February 17, 2025

Ossified Data in the Life Sciences

Structured databases are designed for easy querying, and generative AI has made great strides in handling unstructured text, but the life sciences industry still faces a significant gap in the data that automated systems can handle. At Artos, we call this Ossified Data: data that was once structured but has been trapped in PDFs, RTFs, or other unstructured formats due to legacy systems, regulatory requirements, or internal data ownership complexities.

The Unique Challenge of Tabular Data

Ossified Data is structured, yet inaccessible. It can’t be easily queried, and AI models struggle to interpret it accurately. The simple answer might be to access the underlying structured data directly, but several things make that hard to solve fully: a lack of standardization across databases, data owners spread across different organizations, the uniqueness of the data generated by different types of drug programs, and legacy requirements from regulators. To make matters trickier, data often has to be moved into even more ossified formats.
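To make "structured, yet inaccessible" concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the rows, the column names, and the helper functions `query_structured` and `ossify` are hypothetical, and the values are invented. It shows the same table before and after it is flattened into page text.

```python
# Hypothetical adverse-event rows; the values are invented for illustration.
rows = [
    {"arm": "Placebo", "event": "Headache", "n": 4},
    {"arm": "Treatment", "event": "Headache", "n": 7},
    {"arm": "Treatment", "event": "Nausea", "n": 2},
]


def query_structured(rows, arm):
    """Total event count for one arm: trivial while the data is structured."""
    return sum(r["n"] for r in rows if r["arm"] == arm)


def ossify(rows):
    """Render the rows as fixed-width text, as a report page might show them."""
    lines = [f"{'Arm':<12}{'Event':<12}{'n':>3}"]
    for r in rows:
        lines.append(f"{r['arm']:<12}{r['event']:<12}{r['n']:>3}")
    return "\n".join(lines)


page_text = ossify(rows)
print(query_structured(rows, "Treatment"))
print(page_text)
```

Once only `page_text` survives, the columns and types exist only as visual layout, so the one-line query is no longer possible without first reconstructing the table.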

Why Traditional Approaches Fail

Current solutions fall short in handling this data effectively:

  • Rule-based systems are too rigid – Drug programs can have unique data requirements, making it difficult to build deterministic rules that generalize across cases.

  • OCR-based solutions distort structure – While OCR can recognize text, it often fails to maintain the structural relationships in tables, leading to misaligned columns and lost context.

  • LLMs are not optimized for tables – Large Language Models process natural language and some structural components well, but they struggle to decide when to rely on a visual reading of a table and when to rely on its text.

  • Manual processing is slow and error-prone – This is how most organizations do it today. They rely on human review to extract and format data, which is inefficient and costly.
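The misaligned-column failure mentioned above is easy to reproduce. The sketch below is illustrative only: the table, the column widths in `WIDTHS` and `STARTS`, and the helper names are all hypothetical, and the values are invented. It does not model any particular OCR engine, and the positional slicing is a simplification, not a description of how Artos handles tables. It compares splitting extracted text on whitespace with slicing it by column position.

```python
WIDTHS = [18, 8, 10, 9]   # hypothetical column widths on the page
STARTS = [0, 18, 26, 36]  # character offset at which each column begins


def render(cells):
    """Lay cells out in fixed-width columns, as extracted page text might look."""
    return "".join(c.ljust(w) for c, w in zip(cells, WIDTHS)).rstrip()


def naive_split(line):
    """Split on any run of whitespace, ignoring where the columns are."""
    return line.split()


def positional_split(line, starts):
    """Slice the line at the known column offsets and trim each cell."""
    ends = starts[1:] + [None]
    return [line[s:e].strip() for s, e in zip(starts, ends)]


lines = [
    render(["Parameter", "Unit", "Low dose", "High dose"]),
    render(["Body weight", "g", "212", "198"]),
    render(["Food consumption", "", "18", "15"]),  # the Unit cell is empty
]

for line in lines:
    print(naive_split(line))
    print(positional_split(line, STARTS))
```

The third row is the instructive one. Whitespace splitting yields four tokens, which matches the column count, yet "consumption" lands in the Unit column and the empty cell disappears. A row can look valid by count and still be wrong, which is why layout information has to survive extraction.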

Is This a Solvable Problem?

We hypothesized that a novel kind of AI-driven pipeline could extract, interpret, and structure Ossified Data accurately, mapping any input format to any output format while preserving context. To test this, we built and refined an AI-powered transformation system designed to:

  • Process input from any format – Extract tabular data from PDFs, RTFs, and other unstructured sources.

  • Maintain structural integrity – Preserve column relationships and contextual dependencies.

  • Adapt to different regulatory needs – Standardize data despite variations in format and layout.
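One way to picture these goals is a format-neutral intermediate table that every input is parsed into and every output is rendered from. The sketch below assumes that design for illustration only; the `Table` class, its fields, and its methods are hypothetical, the values are invented, and none of it is taken from the Artos system.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical format-neutral representation of one extracted table."""
    header: list[str]
    rows: list[list[str]]
    footnotes: list[str] = field(default_factory=list)

    def structural_issues(self) -> list[str]:
        """Report basic integrity problems instead of silently repairing them."""
        issues = []
        if len(set(self.header)) != len(self.header):
            issues.append("duplicate column names")
        for i, row in enumerate(self.rows, start=1):
            if len(row) != len(self.header):
                issues.append(
                    f"row {i}: expected {len(self.header)} cells, got {len(row)}"
                )
        return issues

    def to_csv(self) -> str:
        """Render to CSV; other output formats would be sibling methods."""
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(self.header)
        writer.writerows(self.rows)
        return buf.getvalue()


# Invented example: the second row lost a cell during extraction.
table = Table(
    header=["Parameter", "Unit", "Result"],
    rows=[["Assay", "%", "99.1"], ["Water content", "%"]],
)
print(table.structural_issues())
```

Separating the integrity check from rendering means a damaged table can be flagged for human review before it reaches any output document.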

The Answer: Yes

At Artos, we found that the answer to that question is yes. Across tens of thousands of tables, we have shown that we can accurately extract, transform, and insert tables in all the common situations that life sciences companies face. To confirm this, we had human medical writers review the output for every one of those tables and confirm the performance of the system. Check out the data below:

Where Have We Seen This Be Most Useful?

Transforming Ossified Data into structured formats has proven to be highly valuable in several key areas:

  • Clinical Study Report tables – Ensuring accurate and consistent tabular data for regulatory submissions.

  • Non-clinical tabulated summaries – Streamlining the integration of preclinical data into standardized formats.

  • CMC documentation – Organizing complex manufacturing data for clear regulatory reporting.

  • And much more!

While change management remains a challenge, reliable data transformation is, in our opinion, one of the most exciting use cases for AI.

At Artos, we love tackling problems like this, and we have a lot more like it in the pipeline. If you have thoughts, we'd love to talk!
