How Can AI Be Used to Ensure Clinical Data Quality is Optimal?

Jennifer Bradford, Director of Data Science

Sheelagh Aird, Senior Director, Data Operations

PHASTAR

There is no question that the quality of data is fundamental for a successful clinical trial. Clinical data reported during a trial must be complete and reliable, as it is the foundation for analysis, submission, approval, and the ultimate marketing of a drug. 

Data cleaning is therefore a crucial step in managing a clinical trial, ensuring precision and consistency in the data collected. Keying errors frequently occur during data entry: for example, a transcription error in a date, dose, or range, or a simple spelling mistake in a drug name or disease term, which can affect medical coding.

Of course, clinical trial systems are designed with automated edit checks that verify data at the point of entry and flag potential data entry problems immediately. However, these checks are not foolproof. Manual data review is the complementary approach: data managers raise queries in the system to the clinical trial site. This helps to resolve discrepancies and inconsistencies in the collected data that could otherwise lead to safety issues. Automated checking is clearly preferable where possible, because manual review is time-consuming and expensive.

Advances in artificial intelligence (AI) show promise for studying these queries in context. AI can help improve automated data checks and can add processes that identify potential issues earlier in a study. Applying machine learning (ML) to historical manual queries from different studies, to understand common issues within and across studies, could enable a more targeted approach to optimizing clinical trial data cleaning.

A Review of the Data Collection Process 

Clinical data management teams and each trial site work together to ensure that the data collected is managed conscientiously, reported clearly and accurately, and delivered securely to the data repository, where the CRO or sponsor can access it. An essential part of this process is data cleaning, which ensures the data is consistent and accurate.

Generally, to ensure quality, data is checked for problems such as missing or inconsistent values. This step can be automated as data is entered, through edit checks. It can also be done manually after entry, with any problem raised as a query to the trial site.
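To make the distinction concrete, here is a minimal sketch of an entry-time edit check, assuming a hypothetical vital signs record. The class name, field names, and the 60-250 mmHg plausibility limit are illustrative only, not taken from any study described here. Anything such a check misses is left for manual review and, potentially, a query to the site.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VitalSignsRecord:
    # Hypothetical CRF record; field names are illustrative only.
    subject_id: str
    visit_date: Optional[str]
    systolic_bp: Optional[int]


def edit_check(record: VitalSignsRecord) -> list[str]:
    """Return the issues an entry-time edit check would flag for one record."""
    issues = []
    if not record.visit_date:
        issues.append("Visit date is missing.")
    if record.systolic_bp is None:
        issues.append("Systolic blood pressure is missing.")
    elif not 60 <= record.systolic_bp <= 250:
        # Hypothetical plausibility limits; real limits would come from the study's specifications.
        issues.append(f"Systolic blood pressure {record.systolic_bp} is outside 60-250 mmHg.")
    return issues


# A keying error (1200 instead of 120) is caught at entry.
print(edit_check(VitalSignsRecord("1001", "2023-03-01", 1200)))
```

A check like this catches values that are impossible or missing, but not values that are plausible yet wrong, which is where manual review and queries come in.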

Examples of How to Utilize AI in Manual Data Queries 

So how can AI be used to review the data cleaning process? We'll look at a few examples from clinical trial settings.

In Study 1, for example, a higher-than-expected number of queries was raised: of 21,103 queries, 7,560 (36%) were manual. With the average cost of a manual query, from opening to close, at about $170, this high proportion of manual queries added significant cost to the study.
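As a quick check on these figures, the manual share and the implied total cost of manual queries can be computed directly. This assumes, for illustration, that the approximate $170 average applies uniformly to every manual query.

```python
total_queries = 21_103
manual_queries = 7_560
cost_per_manual_query = 170  # approximate average cost in USD, from opening to close

manual_share = manual_queries / total_queries
manual_cost = manual_queries * cost_per_manual_query

print(f"Manual share: {manual_share:.1%}")                 # Manual share: 35.8%
print(f"Approximate manual query cost: ${manual_cost:,}")  # Approximate manual query cost: $1,285,200
```

On that assumption, the manual queries in Study 1 represent roughly $1.3 million in query-handling cost.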

For each manual query, the available data included the form and variable the query was raised on, the row it was raised on, and the query message.

That information provided the basis for further study aimed at reducing the number of manual queries. One open question, however, was whether themes in the manual queries could be identified without introducing human bias.

The queries can be thought of as a set of documents, and we could look across this set to identify commonly occurring words. However, given the volume of words and their context (i.e., neighboring words), it is difficult to identify meaningful themes by inspection alone. Latent Dirichlet Allocation (LDA) offered a way to identify these themes. LDA is a topic modeling algorithm from machine learning that allows sets of observations to be explained by unobserved groups. Here, the observations are words collected into documents (the queries): each document is modeled as a mixture of a small number of topics, and each word in a document is attributed to one of that document's topics. The purpose is to learn the topic distribution of each document in the collection.
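The article does not say which LDA implementation was used. As an illustration only, the sketch below fits LDA with collapsed Gibbs sampling, one of several standard inference methods, in plain Python. The function name `lda_gibbs`, the hyperparameter defaults (`alpha`, `beta`), the iteration count, and the example token lists are all assumptions; in practice a maintained library implementation would normally be used.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists (one per query). Returns (doc_topic, topic_word):
    doc_topic[d][k] is the estimated proportion of topic k in document d, and
    topic_word[k] maps each vocabulary word to its probability under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    # Count tables: tokens per (doc, topic), per (topic, word), and per topic.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [defaultdict(int) for _ in range(n_topics)]
    n_k = [0] * n_topics

    def assign(d, w, k, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Start from a random topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            assign(d, w, k, +1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts ...
                assign(d, w, z[d][i], -1)
                # ... then resample from p(z = k | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                assign(d, w, z[d][i], +1)

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + n_vocab * beta) for w in vocab}
        for k in range(n_topics)
    ]
    return doc_topic, topic_word


# Hypothetical tokenized queries: two about visit dates, two about dosing.
docs = [
    ["visit", "date", "missing"],
    ["visit", "date", "future"],
    ["dose", "units", "mg"],
    ["dose", "mg", "frequency"],
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
```

`doc_topic` gives each query's mixture of topics, and `topic_word` gives the word probabilities from which a topic's "most common words" would be read off. A fixed seed makes runs reproducible.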

Using LDA, the aim was to determine whether the number of manual queries could be reduced by understanding which forms were problematic. Could this technology lead to more focused edit checks, and could some queries be generated automatically?

First, each query was tokenized (split into words), and common data management words such as confirm, verify, check, please, and thank you were removed. LDA was then applied to the tokenized queries to derive several common topics for further review. The results were visualized in the context of the forms used to collect the data, and study experts were consulted to bring a deeper understanding of that context.
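The tokenization step can be sketched as follows. This is a minimal version assuming a simple regular-expression tokenizer; the small generic English stop-word list is an assumption, while the data management words come from the text. The example query message is hypothetical.

```python
import re

# Illustrative stop-word lists: a few generic English words plus the data management
# terms named in the text ("thank you" is handled as the two tokens "thank" and "you").
GENERIC_STOP_WORDS = {"a", "an", "and", "as", "be", "for", "in", "is", "it", "of", "on", "the", "this", "to", "was"}
DM_STOP_WORDS = {"confirm", "verify", "check", "please", "thank", "you"}
STOP_WORDS = GENERIC_STOP_WORDS | DM_STOP_WORDS


def tokenize_query(message: str) -> list[str]:
    """Lower-case a query message, split it into alphabetic tokens, drop stop words.

    Numbers and punctuation are discarded by the regular expression.
    """
    tokens = re.findall(r"[a-z]+", message.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(tokenize_query("Please confirm the visit date, as it is after the date of death. Thank you."))
# ['visit', 'date', 'after', 'date', 'death']
```

Removing the courtesy and instruction words matters because they appear in almost every query and would otherwise dominate every topic.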

Studies Prove LDA’s Efficacy 

The visualizations gave the study experts insight into the different topics: the most common words in each topic, together with a summary of the context, such as the forms and variables the queries within a topic were raised against. This helped the team understand what was driving the queries within each topic, which in turn provided the basis for exploring how these queries might be reduced in future, for example through enhanced edit checks or, as described below, through a rules-based approach that speeds up discovery of potential data issues.
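A per-topic summary of the kind described here could be tabulated as in the sketch below. It assumes each query has been attributed to its single most probable topic, which is a simplification the article does not specify, and it uses hypothetical query fields `form` and `variable` and hypothetical form names.

```python
from collections import Counter


def summarize_topics(queries, doc_topic, topic_word, n_words=5, n_locations=3):
    """Summarize each topic by its top words and the form/variable pairs its queries hit.

    queries: list of dicts with hypothetical keys "form" and "variable".
    doc_topic: per-query topic proportions, e.g. from an LDA fit.
    topic_word: per-topic dict mapping word -> probability.
    """
    n_topics = len(topic_word)
    # Simplification: attribute each query to its single most probable topic.
    dominant = [max(range(n_topics), key=props.__getitem__) for props in doc_topic]
    summaries = []
    for k in range(n_topics):
        members = [q for q, t in zip(queries, dominant) if t == k]
        top_words = sorted(topic_word[k], key=topic_word[k].get, reverse=True)[:n_words]
        locations = Counter((q["form"], q["variable"]) for q in members)
        summaries.append({
            "topic": k,
            "top_words": top_words,
            "n_queries": len(members),
            "top_locations": locations.most_common(n_locations),
        })
    return summaries


# Hypothetical inputs: three queries and a two-topic model.
queries = [
    {"form": "VITAL_SIGNS", "variable": "visit_date"},
    {"form": "VITAL_SIGNS", "variable": "visit_date"},
    {"form": "EXPOSURE", "variable": "dose"},
]
doc_topic = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
topic_word = [{"date": 0.5, "visit": 0.3, "dose": 0.2}, {"dose": 0.6, "mg": 0.3, "date": 0.1}]
for summary in summarize_topics(queries, doc_topic, topic_word, n_words=2):
    print(summary)
```

Pairing top words with the forms and variables they were raised on is what lets a study expert connect a statistical topic to a concrete data collection problem.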

Study 2 was an extension study that built on the results of Study 1, and results from an earlier study in the same clinical indication were added to help understand overlap and differences. As expected, there was a large amount of overlap. Interestingly, however, the issues that had been addressed in Study 1 differed from those seen in Study 2; for example, central lab issues arose in Study 1 but not in Study 2.

Study 3 was even more compelling. Although it differed from the other studies in phase, sponsor, indication, and population, it still showed some overlap in topics, alongside some differences.
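The article does not describe how overlap between studies was measured. One simple possibility, sketched here under the assumption that study experts have given each study's topics short labels, is a set comparison with a Jaccard overlap score. The function name and the labels in the example are hypothetical, apart from central labs, which the text mentions.

```python
def compare_studies(topics_a: set[str], topics_b: set[str]) -> dict:
    """Report shared and study-specific topic labels plus their Jaccard overlap."""
    shared = topics_a & topics_b
    union = topics_a | topics_b
    return {
        "shared": sorted(shared),
        "only_a": sorted(topics_a - topics_b),
        "only_b": sorted(topics_b - topics_a),
        # Jaccard index: size of intersection over size of union (1.0 if both are empty).
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical topic labels for two studies.
study_1 = {"visit dates", "dosing", "central labs"}
study_2 = {"visit dates", "dosing", "adverse event dates"}
print(compare_studies(study_1, study_2))
```

A label-level comparison like this depends on consistent expert labeling across studies; comparing topic word distributions directly would be an alternative.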

Rules-Based Approach for Further Study 

Given the effectiveness of the AI approach and the interpretation of the different topics, the next step was to investigate whether rules could be generated to identify some of the queries in the 'top' topics. This would be a critical foundation for applying the approach to more studies and for uncovering the overlap and differences between studies.

To test a rules-based approach, however, a dataset would need to be created from a historical data snapshot of Study 1 to assess the impact of the rules. The process would be to work with an expert on a topic result to generate rules that automatically identify the potential data issues, apply these rules to the data snapshot, and then determine how many of the manual queries align with the rule results.
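The evaluation loop can be sketched as below. The article does not specify a rule format, so this assumes each rule is a predicate over one row of the snapshot, and that a rule hit "aligns" with a manual query when both refer to the same form, variable, and row (the fields recorded for each manual query). The `Rule` structure, the example rule, and all data values are hypothetical.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule: the form/variable it targets and a predicate on one row."""
    name: str
    form: str
    variable: str
    flags: Callable[[dict], bool]  # True if the row looks like a potential data issue


def apply_rules(snapshot: list[dict], rules: list[Rule]) -> set[tuple]:
    """Return the (form, variable, row_id) locations flagged by any rule."""
    hits = set()
    for row in snapshot:
        for rule in rules:
            if row["form"] == rule.form and rule.flags(row):
                hits.add((rule.form, rule.variable, row["row_id"]))
    return hits


def alignment(manual_queries: list[dict], hits: set[tuple]) -> tuple[int, int]:
    """Count how many historical manual queries sit at a location a rule flagged."""
    matched = sum(
        1 for q in manual_queries if (q["form"], q["variable"], q["row_id"]) in hits
    )
    return matched, len(manual_queries)


# Hypothetical snapshot rows and a single rule: dose recorded without units.
snapshot = [
    {"form": "EXPOSURE", "row_id": 1, "dose": 100, "units": "mg"},
    {"form": "EXPOSURE", "row_id": 2, "dose": 100, "units": None},
    {"form": "EXPOSURE", "row_id": 3, "dose": 50, "units": ""},
]
rules = [
    Rule("dose_without_units", "EXPOSURE", "units",
         lambda row: row.get("dose") is not None and not row.get("units")),
]
manual_queries = [
    {"form": "EXPOSURE", "variable": "units", "row_id": 2},
    {"form": "VITAL_SIGNS", "variable": "visit_date", "row_id": 7},
]
hits = apply_rules(snapshot, rules)
print(alignment(manual_queries, hits))  # (1, 2)
```

Here one of the two manual queries is matched by the rule, and the rule also flags row 3, which had no manual query; such extra hits would need expert review, since they may be either missed issues or false positives.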

Reducing Manual Queries with AI in Data Management 

If the investigation proved successful, a final step would involve exploring the potential of this approach as part of the general data management process. 

Using LDA, the team determined that the number of manual queries could be reduced by understanding which forms were problematic. This technology would also support more focused edit checks and allow queries to be generated automatically.

The exercise clearly demonstrated the benefits of applying AI to the manual queries generated during the data cleaning process. It showed that this technology offers an opportunity to significantly reduce the number of manual queries in a clinical trial, increasing efficiency while reducing costs.

Although automation and AI techniques play a key role, managing and distributing clinical trial data will remain a human-machine endeavor for the foreseeable future. Machines may be data-driven and more accurate than manual approaches, but they will always require human expertise to provide the critical interpretation needed to understand the data.
