Visualizing for the Future – Prioritizing Data and Increasing Expertise for Enhanced Data Science Decision-Making

Nicola Griffiths, Manager of Clinical Data Operations, Phastar.

Data science is the current buzzword in the pharmaceutical industry. It offers solutions to the challenges of increasing clinical data sources in a multi-provider, decentralized, digitally-enabled environment. But to make sure it is more than just a buzzword, we need to recognize data science as a distinct discipline and an indispensable component of modern-day clinical trials.

Traditionally, data managers worked with Case Report Form (CRF) data. It was our gold standard: it was what we cleaned and what we understood. There were methods in place for collecting it, and it was very regimented. But the data landscape is changing. In 2022, Phase III clinical trials were generating an average of 3.6 million data points – three times more than a decade previously.1 This huge volume of data was also coming from more sources, without the rigorous checks teams were used to.

In this messy data environment, data scientists provide fundamental support to traditional data management teams, helping to streamline cleaning and to identify potential issues that previously might have gone undetected. For example, which data points is the site making lots of changes to? Does that mean the site is not certain about that data? Is there a quality issue?
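As an illustration, here is a minimal sketch of that kind of audit-trail analysis, assuming a hypothetical EDC audit-trail export with one row per change; the column names and sample rows are invented for the example and do not come from any specific system.

```python
import pandas as pd

# Hypothetical audit-trail export: one row per post-entry change.
audit = pd.DataFrame({
    "subject": ["001", "001", "002", "002", "002", "003"],
    "form":    ["AE", "AE", "AE", "VITALS", "AE", "AE"],
    "field":   ["AESTDTC", "AESTDTC", "AESTDTC", "SYSBP", "AESTDTC", "AETERM"],
    "action":  ["update"] * 6,  # only corrections, not initial entry
})

# Count corrections per form/field. Fields corrected far more often than
# their peers may signal site uncertainty or a quality issue worth review.
changes = (
    audit[audit["action"] == "update"]
    .groupby(["form", "field"])
    .size()
    .sort_values(ascending=False)
)
print(changes.head(10))
```

Even a simple count like this turns a vague worry ("sites keep changing things") into a ranked list of fields for the data management team to investigate.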

Data scientists allow us to examine the consistency of data, prioritize what is important and separate out what is not. This allows us to take a more risk-based approach to data cleaning, ultimately saving time and resources.

Recent Trends in Clinical Trials and Changes to Data Management

Data management teams have been buried in data in recent years. This is not just remote monitoring data but also lab data, genomic data, scans, and personal monitoring devices – almost every source imaginable. We receive multiple data sources, from multiple vendors, all managed in a different way before being collated in Data Management departments, and we have the challenge of ensuring the final quality of that data without any ability to influence the collection methods or data sources.

While data scientists can undoubtedly help prioritize this data, we also need to educate those providing data about the need to follow Good Clinical Practice (GCP), i.e., that the information should be recorded, handled, and stored in a way that allows accurate reporting, interpretation, and verification, because data destined for a clinical trial has to be controlled to a very high standard. We need to go back to the source, whether that is labs or third-party suppliers, and have the conversations we had in data management 20 or 30 years ago. How do you control the quality and integrity of your data? Who recorded the original data? Who entered the data into your database? How do you verify the entered data against the source? Do you have an audit trail to track any changes made to that data? How do we know who has changed the data and why? There is little point in the Data Management team having tight control over the data once it reaches us if there is no quality control of the original data.
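To make those audit-trail questions concrete, here is a minimal sketch of how they might be turned into an automated completeness check, assuming a hypothetical vendor export; the column names and sample rows are illustrative only.

```python
import pandas as pd

# Hypothetical vendor audit-trail export: every change should record
# who made it, when, and why (the GCP expectations discussed above).
trail = pd.DataFrame({
    "record_id":  [101, 102, 103],
    "changed_by": ["site_user_1", None, "site_user_2"],
    "changed_on": ["2023-04-01", "2023-04-02", None],
    "reason":     ["transcription error", "", "source document corrected"],
})

missing_who  = trail["changed_by"].isna()
missing_when = trail["changed_on"].isna()
missing_why  = trail["reason"].fillna("").str.strip() == ""

# Flag changes with an incomplete audit entry for follow-up with the vendor.
problems = trail[missing_who | missing_when | missing_why]
print(f"{len(problems)} of {len(trail)} changes lack a complete audit entry")
print(problems)
```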

We also need to make sure we are selecting the most effective form of data for trial design. Often, data managers do not get a seat at the table when it comes to trial design. This can lead to data being collected for the sake of medical interest and sponsors paying for peripheral data which has no impact on, for example, regulatory approval of their drug. Involving data managers in trial design can cut this cost.

This increase in the volume of data, alongside the current economic climate tightening clinical budgets, highlights the need for risk-based data cleaning focused on primary endpoints and key study data points. Currently, I see a lack of understanding about what key, or critical, data is. When discussing critical data with clients, I have been told several times that every field on the adverse event (AE) page is critical data because the study is evaluating the safety of the drug, but this is not true. If, for example, the primary endpoint for safety evaluation is the frequency of treatment-emergent AEs, an extra field added sporadically to the adverse event page asking "was this event caused by COVID-19?" is not critical data. What is critical is the date the event occurred and the date the study drug was administered, because without those you cannot establish whether the event is treatment emergent and should be included in the analysis. Knowing what the critical data is allows us to focus our data cleaning efforts; without a clear understanding of critical data, we cannot move to a more risk-based review led by data scientists.
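As a concrete illustration of why those two dates are the critical data, here is a minimal sketch of the treatment-emergent check. The simple on-or-after-first-dose rule is an assumption for the example; real studies define the treatment-emergent window in the statistical analysis plan.

```python
from datetime import date

def is_treatment_emergent(ae_onset: date, first_dose: date) -> bool:
    """True if the adverse event began on or after the first study-drug dose."""
    return ae_onset >= first_dose

# Without the onset date and the first-dose date, this question cannot be
# answered at all - which is exactly what makes those fields critical.
print(is_treatment_emergent(date(2023, 5, 10), date(2023, 5, 1)))  # True
print(is_treatment_emergent(date(2023, 4, 28), date(2023, 5, 1)))  # False
```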

Once we know what the critical data points are in a study, the next question is where to focus the rest of our data cleaning efforts. Data visualizations produced by data scientists provide us with this understanding. They give the data management teams direction on where else there may be data issues, rather than continuing with the outdated method of cleaning everything. But again, it is about prioritizing and using a risk-based approach. Using a data scientist is a bit like using a metal detector to help you locate where to dig for treasure in a field. You don't have time to scan the whole field, so you must decide which part of the field to scan. Data visualizations take time to program, and we don't have the time or the budget to check the whole database, so the study team must think very carefully about what data is important and where we anticipate errors in the data.
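To illustrate the metal-detector idea, here is a minimal sketch of the kind of visualization a data scientist might produce to direct cleaning effort; the open-query counts below are invented, and in practice they would come from EDC reporting metrics.

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented counts of open queries by site and form.
sites = ["Site 01", "Site 02", "Site 03"]
forms = ["AE", "ConMeds", "Labs", "Vitals"]
open_queries = np.array([
    [12, 3, 25, 4],
    [2,  1,  5, 0],
    [30, 8, 40, 6],
])

# A heatmap shows at a glance where issues cluster, so review effort
# goes to the riskiest site/form combinations first.
fig, ax = plt.subplots()
im = ax.imshow(open_queries, cmap="Reds")
ax.set_xticks(range(len(forms)))
ax.set_xticklabels(forms)
ax.set_yticks(range(len(sites)))
ax.set_yticklabels(sites)
ax.set_title("Open queries by site and form")
fig.colorbar(im, ax=ax, label="Open queries")
plt.show()
```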

Preparing For and Predicting Future Trends

In the future, as budgets and timelines are squeezed, we are going to need to do more risk-based cleaning, supported by data scientists. If we accept that not every data point needs to be perfectly clean, data management teams will spend less time cleaning and sites will spend less time answering queries, which in turn will cut costs and, hopefully, improve the quality of our critical data points.

Extrapolating the principles of risk-based monitoring and applying them to risk-based cleaning could allow a real growth in data science in the years to come.

However, if current trends continue, we will get more and more data in our clinical trials. If we want to reduce the costs of drug development, we also need to tackle this increasing volume of clinical trial data. Every data point that is collected must be entered into a database, cleaned, and stored, and this comes at a cost. We need to take a step back and carefully consider what data we are collecting in our clinical trials and why.

For example, if a sponsor wants to test whether patients prefer writing in a paper or an electronic diary, they are inclined to call it a clinical trial because it involves patients. From a data management perspective, the data then gets entered into a fully validated system, is cleaned to our high-quality standard, and is stored in a secure facility. But the sponsor is not diagnosing or treating anything and is not providing an intervention to the patient's care; they are just asking for a preference, so does this data need the gold-standard approach? The drinks industry runs this type of preference study all the time: "do you prefer elderflower or orange?". The pharmaceutical industry needs to define when an investigation is a clinical trial and when it is not, to allow more opportunities for lower-cost data collection and storage. We need to understand more about whether data needs to be collected to the highest standard, as in our clinical trials, or whether it is only there to add insight.

Conclusion

The increasing complexity and scale of clinical trials is leading to increased demand for specialist data-focused Contract Research Organizations (CROs). But there is also a need to increase genuine collaboration between experts and sponsors. Pharma companies are starting to recognize the value of data management and data scientists, but we need to be involved in every stage – right from trial design.

As we see more big data and growing data sources it will be increasingly important not to collect data for data’s sake. Data managers can help revolutionize how we collect and store data, while data scientists inform a risk-based approach to cleaning and prioritization. Both will ultimately save sponsors money.

Effecting this change is likely to be a slow creep. But by combining new technology with the expertise of data science teams we can ensure the industry is prepared for future trends and can optimize its decision making.

References

1. Tufts Center for the Study of Drug Development. "Rising Protocol Design Complexity Is Driving Rapid Growth in Clinical Trial Data Volume." GlobeNewswire, January 12, 2021. https://www.globenewswire.com/news-release/2021/01/12/2157143/0/en/Rising-Protocol-Design-Complexity-Is-Driving-Rapid-Growth-in-Clinical-Trial-Data-Volume-According-to-Tufts-Center-for-the-Study-of-Drug-Development.html


Nicola Griffiths, Manager of Clinical Data Operations at Phastar, has been working in the industry for over twenty years. Nicola started her career in data management at the CRO Nottingham Clinical Research (later bought by Worldwide Clinical Trials), working as a data entry assistant on paper CRF studies before moving into project management for several years. Nicola then joined Reckitt Benckiser (a consumer healthcare company), where she worked across various roles before returning to data management. At Reckitt Benckiser, Nicola worked on various clinical trials, not just on medicines but also on medical devices, cosmetics, and even general products. Nicola moved into management in the data management department at SQN Clinical (later to become Veristat), before moving to her current role at Phastar.
