Six Considerations When Implementing Machine Learning Algorithms in Clinical Research

Revolutionary advances in healthcare science and technology have led to a host of novel therapies, and to the increasingly complex research needed to develop them. The methods and tools used to conduct and monitor clinical trials need to keep pace; machine learning is playing a vital role in these advances.

Machine learning provides a crucial opportunity to shape and direct clinical trial research: it helps harmonize and analyze the massive amounts of data generated in the course of a study, giving researchers the means to make faster, more informed decisions. Yet the machines are only as effective as their algorithms and their interaction with human researchers.

In guiding researchers who are using a system with built-in machine learning for the first time, we generally list six key considerations:

1. Clearly Understand the Algorithm Results and How They Can Be Interpreted in a Clinical Trial Context

Machine learning algorithms are designed to generate signals that help researchers track and monitor risks without needing to examine the data manually, point by point. The algorithms themselves can be complicated, yet to interpret their output correctly, researchers need to understand what they are measuring and in what context. Ideally, the results are visualized in a way that is easy to both interpret and act on: signals are generated through a combination of drill-down capabilities, statistical models, and intuitive data visualizations with built-in workflows, which researchers can then use to act on existing risks and to prevent future ones.

Results can be displayed in multiple ways. For instance, anomaly detection algorithms might generate workflows to flag abnormal data, tailored directly to the needs of the specific users responsible for certain patients or sites. Researchers can then investigate the discrepancy and make a judgment as to how to proceed. Meanwhile, clustering algorithms group patients or sites into cohesive clusters, which makes it possible to examine a single patient in the context of other, similar patients. If a patient closely resembles the rest of his or her cluster, there is little cause for alarm; but if that patient stands out not only from the overall norm but from their own cluster, they merit closer investigation.
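As a rough illustration of how these two approaches might work together on patient-level data, here is a minimal sketch using scikit-learn's IsolationForest and KMeans. The feature columns (systolic_bp, alt_iu_l, visit_count) and the toy values are illustrative assumptions, not part of any particular trial or data standard.

```python
# Minimal sketch: anomaly detection plus clustering on patient-level data.
# Column names and values are illustrative assumptions.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Hypothetical patient-level features aggregated from study data.
patients = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003", "P004", "P005", "P006"],
    "systolic_bp": [118, 122, 180, 115, 121, 119],
    "alt_iu_l":    [22,   25,   24,   21,   96,   23],
    "visit_count": [6,    6,    6,    5,    6,    6],
})

features = StandardScaler().fit_transform(
    patients[["systolic_bp", "alt_iu_l", "visit_count"]]
)

# Anomaly detection: flag patients whose overall profile looks unusual.
iso = IsolationForest(contamination=0.2, random_state=0)
patients["anomaly"] = iso.fit_predict(features) == -1

# Clustering: group similar patients so each one is judged against peers,
# not only against the overall study average.
patients["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# Flag anomalous patients; a researcher would then review each one in the
# context of its assigned cluster before deciding how to proceed.
print(patients[patients["anomaly"]])
```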

When researchers have clear visual guidance that highlights and contextualizes anomalies as well as the ability to instantly parse data to answer specific questions, they are able to glean relevant and important insights that improve both patient safety and study efficacy.

2. View Algorithm Results in Light of the Study's Specific Design and Research Objectives

Any given set of data can generate a wide range of insights, especially in a trial with multiple data sets. Yet not all of these insights are valuable, or even pertinent to the objectives of a given clinical trial.

For instance, a patient’s elevated blood pressure may be directly material to a cardiac trial but may prove immaterial to an oncology study. By considering these results in light of their own specific objectives, researchers can gauge whether or not any given data warrants taking action.

Optimally, a machine learning system enables individual researchers to configure the algorithms, then set them to run on a schedule. Those algorithms generate workflows and/or visualizations, which provide the researcher with feedback to help make informed decisions — but the researchers themselves are responsible for determining the relative value of the information and how to act on it.

Moreover, researchers should be able to control the generation of those signals to ensure that users are not flooded with signals that they cannot act upon, especially at the early stages of the study. This includes using thresholds based on the quality of the signals, as well as controlling the frequency of signal generation through scheduling.
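One way to picture this control is a small configuration object that holds a researcher-defined quality threshold and a run schedule. The sketch below is a minimal illustration under assumed field names (min_score, run_every) and a made-up scoring scale; it is not any specific product's API.

```python
# Minimal sketch: researcher-configured signal generation with a quality
# threshold and a run schedule. Field names and scores are illustrative.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class SignalConfig:
    name: str
    min_score: float          # suppress signals below this quality threshold
    run_every: timedelta      # how often the algorithm is executed
    last_run: Optional[datetime] = None

def due(config: SignalConfig, now: datetime) -> bool:
    """Return True if the algorithm is scheduled to run now."""
    return config.last_run is None or now - config.last_run >= config.run_every

def emit_signals(config: SignalConfig, scored_findings):
    """Keep only findings that clear the researcher-defined threshold."""
    return [(item, score) for item, score in scored_findings if score >= config.min_score]

config = SignalConfig(name="lab-outlier-check", min_score=0.8, run_every=timedelta(days=7))
if due(config, datetime.now()):
    findings = [("site-12 glucose drift", 0.91), ("site-03 minor gap", 0.42)]
    for item, score in emit_signals(config, findings):
        print(f"signal: {item} (score {score:.2f})")
    config.last_run = datetime.now()
```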

3. Regularly Monitor and Audit the Algorithms

FDA regulations require that any action taken on clinical trial data be monitored and auditable, and machine learning algorithms are no exception. A trace log that records when an algorithm is created, when its configuration is changed, and when it is executed meets this requirement. Such a log also provides greater insight into precisely which data the algorithm is evaluating and presenting. Other best practices include:

  • Maintaining a detailed log that illustrates each step the algorithm takes to identify any issues with the data.
  • Utilizing a notification system to alert the administrator when the algorithm begins, when it finishes, and if there are any problems during the run.
  • Comparing snapshots of data across various time periods to ensure that nothing has changed drastically.

These strategies ensure that, when an algorithm uncovers an interesting patient result, the researcher can easily (and accurately) decide whether to examine the causes more closely. Such monitoring and auditing processes not only satisfy regulatory demands — and help researchers build their own trust in the algorithm’s outputs — but are a critical component of quality management that leads to better-informed decision making.
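To make the trace-log and notification practices above concrete, here is a minimal sketch of an algorithm audit trail in Python. The event names, the algorithm identifier, and the notification hook are illustrative assumptions, not a regulator-prescribed format.

```python
# Minimal sketch of an algorithm audit trail: every create / configure / run
# event is appended to a structured log with a timestamp, and run failures
# are surfaced to an administrator. Names and fields are assumptions.
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("algorithm_audit")

def record(event: str, algorithm: str, details: dict) -> None:
    """Append one auditable event as a structured log entry."""
    audit_log.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,            # e.g. "created", "config_changed", "run_started"
        "algorithm": algorithm,
        "details": details,
    }))

def run_with_audit(algorithm: str, config: dict, run_fn) -> None:
    """Record start, finish, and any failure of an algorithm run."""
    record("run_started", algorithm, {"config": config})
    try:
        result = run_fn(config)
        record("run_finished", algorithm, {"rows_evaluated": result.get("rows", 0)})
    except Exception as exc:  # notify the administrator on any failure
        record("run_failed", algorithm, {"error": str(exc)})
        raise

record("created", "lab-outlier-check", {"owner": "study-team"})
record("config_changed", "lab-outlier-check", {"threshold": 0.8})
run_with_audit("lab-outlier-check", {"threshold": 0.8}, lambda cfg: {"rows": 12840})
```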

4. Maintain Content-Specific Machine Learning-Based Data Libraries

Machine learning-based data libraries focused on various drug classes, therapeutic areas, disease states, and chemical structures can be used by machine learning systems, bolstering the data to which individual trial results are being compared. These offer significant benefits when used judiciously, including the ability to:

  • View each area through the lens of risk-based monitoring, including verifying data accuracy.
  • Look specifically at the signals detected by the algorithm, so as not to be distracted by the aggregated information.
  • Focus on similar patients from similar sites as a baseline to quickly identify outliers that warrant closer investigation of potential safety or data accuracy issues.

By leveraging these aggregated libraries, researchers can make smarter, faster, and safer decisions.
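As a rough illustration of how such a library might be consulted, the sketch below compares a single patient's observations against an aggregated reference of means and standard deviations for the same therapeutic area. The library values, column names, and the three-standard-deviation cutoff are illustrative placeholders, not real reference data.

```python
# Minimal sketch: compare a patient's observations against an aggregated
# reference library for the same therapeutic area. All values are
# illustrative placeholders.
import pandas as pd

# Hypothetical aggregated library: per-measure mean and standard deviation
# drawn from similar studies in the same therapeutic area.
reference_library = pd.DataFrame({
    "measure": ["systolic_bp", "alt_iu_l", "creatinine_mg_dl"],
    "mean":    [124.0, 28.0, 0.95],
    "std":     [14.0, 10.0, 0.20],
}).set_index("measure")

patient = {"systolic_bp": 176.0, "alt_iu_l": 31.0, "creatinine_mg_dl": 0.9}

# Z-score each observation against the library; a large |z| suggests an
# outlier worth closer review for safety or data-accuracy issues.
for measure, value in patient.items():
    ref = reference_library.loc[measure]
    z = (value - ref["mean"]) / ref["std"]
    if abs(z) > 3:
        print(f"{measure}: value {value} is {z:.1f} SD from the library mean")
```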

5. Validate Your Results

It is crucial to remember that machine learning algorithms are not infallible. Their efficacy and accuracy rest largely on the quality of data being received — as the saying goes, “Garbage in, garbage out.” This adage suggests three key points:

  • Never act on a result without questioning it first. The algorithms should be designed to be passive and exist to help researchers make well-informed decisions, not to dictate actions.
  • Continually update the algorithms based on new inputs and findings, including new data sources, user interaction, and feedback.
  • Maintain ongoing comparisons to known benchmarks to ensure the validity of the underlying algorithms. Those benchmarks could be snapshots of the raw data from different time periods that were used to train the algorithms, against which the accuracy of detecting well-documented signals can be tracked, as sketched below.
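The sketch below shows one way such a benchmark comparison could work: run the current detection logic against a snapshot whose signals were manually confirmed, then report what was recovered, missed, or spuriously flagged. The detect_signals function, the snapshot contents, and the ALT threshold are illustrative assumptions.

```python
# Minimal sketch: validate an algorithm against a benchmark snapshot whose
# signals are already well documented. Data and thresholds are illustrative.
def detect_signals(snapshot):
    """Stand-in for the production algorithm; flags out-of-range lab values."""
    return {row["patient_id"] for row in snapshot if row["alt_iu_l"] > 80}

# Benchmark snapshot taken earlier in the study, with the signals that were
# manually confirmed at the time.
benchmark_snapshot = [
    {"patient_id": "P001", "alt_iu_l": 22},
    {"patient_id": "P002", "alt_iu_l": 96},
    {"patient_id": "P003", "alt_iu_l": 25},
]
known_signals = {"P002"}

detected = detect_signals(benchmark_snapshot)
true_positives = detected & known_signals
missed = known_signals - detected
spurious = detected - known_signals

print(f"recall: {len(true_positives)}/{len(known_signals)}, "
      f"missed: {sorted(missed)}, spurious: {sorted(spurious)}")
```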

6. Remember the Human Element

Given data in a valid format, machine learning algorithms will generate signals regardless of whether that data is meaningful. In many machine learning applications, where the stakes are negligible, users routinely ignore imperfect results, such as the many book recommendations or movie suggestions that fail to pique the searcher's interest. But in clinical trials, false positives or false negatives can have devastating consequences.

Because each clinical trial is unique in its goals, parameters, and patients, the algorithms that help decipher its results must also be unique. They can be based on algorithms used for trials within the same therapeutic area, but they must be adjusted to address the specific needs of the trial at hand. Because machines rely on the data provided, their algorithms must be carefully designed to incorporate the most relevant information. Most importantly, because machines lack personal judgment, researchers must always carefully consider the results, bringing their own knowledge and insights to bear before taking action or making decisions based on the algorithms' output.

A Safer Clinical Trial Environment for Patients and a More Effective, Efficient Clinical Trial Approach for Sponsors

There is no question that machine learning algorithms speed the use and understanding of clinical trial-generated data — analyzing millions of points of data in minutes, eliminating time-consuming point-by-point data review. They enable close to real-time analysis, which supports more informed and timely decisions — decisions which, in turn, keep the trial moving efficiently, potentially enable it to conclude sooner, and reduce costs. Further, because data are evaluated continuously throughout the trial, there is no need to engage in tedious, time-consuming data cleaning at the trial’s end, accelerating the reporting of trial results.

By understanding precisely what the algorithms are designed to do — and what they cannot do — as well as best practices for monitoring, auditing, and validating the results, researchers not only can manage the increasingly complex outputs of today’s trials, but can make those trials faster, safer, and smarter.

