Data Exchange Supersedes Structured Document Authoring

James Kelleher, Founder and CEO of Generis Corporation

As regulatory and business processes become increasingly data-first, structured document authoring is facing obsolescence. Next-level structured content authoring places the emphasis on ‘data objects’, reducing the resources and manual steps needed in authoring and version control by a factor of ten. Generis’ James Kelleher outlines practical steps for companies looking to reap the rewards of a data-first approach to process management.

Structured authoring was designed to be the future of document authoring – the ultimate efficiency in presenting information. But the world is now moving away from documents as the medium of business information exchange. In the future, more and more information-based processes will be data-driven. This is great news, suggesting an end to issues with version control and a more traceable line back to the master source of intelligence.

But it does mean that structured document authoring, as a much-anticipated technology proposition, is already obsolete - before it really had a chance to get off the ground. For document management technologists and their customers, this is likely to feel very frustrating. Certainly, it will require a big strategic adjustment. But the upside is that companies now have a chance to leapfrog straight to a solution that is much more fit for purpose: transformational, pliable and sustainable in the long term.

Practical Issues with Reusable Content

The original concept of structured document authoring, which dates back to the 1990s, is based on building routine documents from reusable segments of content. Although the premise has much to recommend it, reducing repetitive labor and the potential for error, it soon comes up against practical limitations – even in a document-centric world. If the approved, reusable content assets are entire paragraphs or sentences, for instance, these will typically need to be tweaked for each use case. Each edit creates a new version of that content, with implications for change management.

In the meantime, the focus of regulators, and of recommended business practice more generally, has shifted towards live data as the primary source of truth, and as a means of transforming processes. This move away from document-based submissions and reporting further erodes the business case for structured document authoring.

Although regulated documents as official records of a product won’t disappear overnight, their ‘Best Before’ date is drawing ever nearer. During the latter stages of the transition to ISO IDMP compliance in the EU, for instance, published documents will be phased out in favor of rolling data-based submissions: data that regulators can choose to analyze in their own way.

Ultimately, data-based information exchange will become the preferred norm for regulatory submissions, PSMFs (pharmacovigilance system master files) and APQRs (annual product quality reviews). In fact, pharmacovigilance case files in Europe are already submitted in data form.

In the interim, data and any published reports created from it must be consistent in detail. This is both to avoid duplication of effort, and to avoid regulators raising concerns and queries.

To achieve that consistency right across an organization and its various processes, companies must embrace a robust new approach to data and content management which transcends individual functions or use cases.

Data is Now the Strategic Focus

Strategically, the focus of new content management investments must now be the data itself, and how this is managed so that it can be used more dynamically for publication - without the risk of a future loss of integrity or consistency between the data and any associated narrative.

Next-level structured content authoring places the emphasis on ‘data objects’. A data object might be ‘Study 123 is a 3-day short-dose study in male rabbits’, for instance. Creating a narrative now means pulling in these ‘data objects’ and inserting minimal ‘joining text’ to make the data more readable in a particular context.
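
As an illustrative sketch only (not the Generis/CARA implementation), a data object of this kind can be thought of as a structured record, with narrative produced by wrapping it in minimal joining text. All class, field and function names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class StudyDataObject:
    """A hypothetical, granular 'data object' describing a study."""
    study_id: str
    duration_days: int
    dose_regimen: str
    sex: str
    species: str

# The single master record - the source of truth for Study 123.
study_123 = StudyDataObject(
    study_id="Study 123",
    duration_days=3,
    dose_regimen="short-dose",
    sex="male",
    species="rabbits",
)

def render_summary(study: StudyDataObject) -> str:
    """Pull in the data object and wrap it in minimal 'joining text'."""
    return (f"{study.study_id} is a {study.duration_days}-day "
            f"{study.dose_regimen} study in {study.sex} {study.species}.")

print(render_summary(study_123))
# -> Study 123 is a 3-day short-dose study in male rabbits.
```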

Here, if core information changes, updates can be made at a source level and automatically cascaded down through all use cases for that data object, without the need for extensive manual intervention. That is, there is no longer the need to track down derivative content or engage in a hefty version control exercise. Simple change rules make it easy to keep everything up to date.
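
Continuing the hypothetical sketch above, a source-level correction is made once to the master record, and everything rendered from it reflects the change automatically:

```python
# A source-level change: the study duration is corrected once, centrally.
study_123.duration_days = 5

# Anything rendered from the master record now reflects the update -
# no derivative documents to track down, no manual version reconciliation.
print(render_summary(study_123))
# -> Study 123 is a 5-day short-dose study in male rabbits.
```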

Conditional text becomes much easier to support, too, once the emphasis is on joining data objects rather than reusing entire sections of narrative. Now, rules can be set to the effect of: ‘If the study is in male rabbits, use X text’.
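
Such a rule might be sketched as a predicate paired with the text to use when it holds. The rules and boilerplate below are invented for illustration, building on the hypothetical record above:

```python
# Hypothetical conditional-text rules: each pairs a predicate over the
# data object with the boilerplate fragment to use when the predicate holds.
CONDITIONAL_TEXT_RULES = [
    (lambda s: s.sex == "male" and s.species == "rabbits",
     "Findings should be interpreted in the context of a male rabbit model."),
    (lambda s: s.duration_days <= 7,
     "This was a short-duration study."),
]

def conditional_text(study: StudyDataObject) -> list[str]:
    """Return every text fragment whose rule matches the data object."""
    return [text for rule, text in CONDITIONAL_TEXT_RULES if rule(study)]

for fragment in conditional_text(study_123):
    print(fragment)
```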

This approach to content preparation offers much more dynamism and flexibility than a structured document authoring scenario. Given the persistent diversity in requirements between regulatory authorities, this controlled flexibility is very useful. ‘Buttons’ could even be created to automatically generate the preferred content configuration required by each authority.
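
One way to picture those per-authority ‘buttons’ is as a configuration lookup that drives which fragments are assembled into the output. The authority names and preferences below are entirely invented for illustration:

```python
# Hypothetical per-authority output preferences; real requirements
# would of course be far richer than this illustration.
AUTHORITY_CONFIG = {
    "Authority A": {"include_conditional_text": True},
    "Authority B": {"include_conditional_text": False},
}

def generate_for_authority(study: StudyDataObject, authority: str) -> str:
    """One 'button' per authority: assemble its preferred configuration."""
    config = AUTHORITY_CONFIG[authority]
    parts = [render_summary(study)]
    if config["include_conditional_text"]:
        parts.extend(conditional_text(study))
    return "\n".join(parts)

print(generate_for_authority(study_123, "Authority A"))
```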

One Unified Data Source

Moving away from documents and even from reusable content requires a different mindset, and this is probably one of the biggest barriers for companies currently.

Relying less on Word might seem to imply that teams will need to become proficient in XML. Yet this perception is tied up with the traditional treatment of content. In the new scenario, the focus is the master data – enriching it to build company-wide knowledge around a given product and its evolving status – and editing can be done in a new breed of user-friendly tools, whether for data, Word or XML.

This is about teams from multiple functions all contributing to and enhancing one unified data source, rather than each continuing to enter their own particular information of interest into their respective systems (Clinical, Regulatory/RIM, etc.).

The new 360-degree data model also removes the convoluted process of content authors having to seek out the respective subject experts/content owners to prepare their section of a report or other document. If all of the required data, in rich granular form, already exists in a definitive resource, the author can simply produce their output with a short series of clicks. (This needn’t mean dispensing with human-based checks and fine-tuning of output content, but it does remove the need for protracted upstream authoring processes.)

The potential benefits of working with data objects are significant. At a conservative estimate, there is scope to reduce the effort of producing a final draft for approval by a factor of ten, thanks to the reduced specialist resources and manual steps needed in authoring and version control. That’s in addition to huge savings in the time and effort that would otherwise be needed to manage components – including decisions about the levels of granularity, rules around re-use, traceability of data into the output, and an entire migration of larger documents into smaller documents/components.

Create an Information Lake

As to how companies might skip a generation of automated content authoring and go straight to the real rewards of a data-first approach to process management, this starts with moving all data to a unified platform to form a definitive ‘information lake’. As long as all of the contributing data is accessible, of good quality, complete and current, and usable for the intended purposes, this dynamic repository could serve as a critical enabler underpinning a range of use cases.

Once the caliber of this core enterprise asset is assured, companies can start to add the automation rules to ensure that the right data is leveraged in the right place at the right time, to support each given process.

Beyond the life sciences sector there are strong precedents for using trusted data objects to construct content – in the airline and automotive industries, for instance, where precision, rigor and safety are as critical as they are in life sciences. Multiple companies are already working with and benefiting from this approach right now. Life science companies have a real opportunity to make a step change in technology, driving competitiveness and better patient outcomes.

James Kelleher is the founder and CEO of Generis Corporation, the creator of CARA™, a data and content management platform that helps companies in regulated industries, like Life Sciences, transform their complex business processes. www.generiscorp.com; [email protected]
