September 9, 2024

The Generative AI Sniff Test for Medical and Regulatory Writing Solutions

Conference Season

The team here at Artos figured it might be timely to write about conferences. With folks coming back from summer vacation, conference season is picking up again, and AI has been and will continue to be a central theme as the biotech, pharma, and medical device spaces grapple with how best to deploy these solutions to create value for the sector. Conferences remain an industry staple, a gathering the life sciences in particular rely on to learn and progress.

Especially with all the vendors in the AI space, we decided it might be helpful to write a bit more about how to best assess them.

A New Dynamic: Evaluating AI Tools vs. Traditional Software

At conferences, demos are often a key part of the vendor engagement process. However, it’s important to understand that evaluating Generative AI tools, especially in a demo setting, is different from traditional software evaluations. Unlike traditional, deterministic software, Generative AI tools are probabilistic, meaning their outputs may vary even when the same inputs are used. This dynamic can make the conference demo process feel more ambiguous.

During a conference demo, you might see an AI tool that produces impressive results in front of an audience. However, you should be mindful of the fact that these tools could behave differently in practice, particularly when working with your organization’s proprietary data.

Given that the accuracy of these AI systems is the main determinant of whether they are useful, and given that it’s nearly impossible to assess that accuracy in a demo environment, what’s the best way to determine how good an AI system is?

A demo or trial run with your own data is the only real way to know whether it will work or not. And even in those situations, if it works once, there’s no guarantee that it will work the next time. This makes it tricky to evaluate generative AI solutions, especially in the medical writing space.

A Noisy Space

To make matters more difficult, it’s actually remarkably easy for companies to build a good-looking Generative AI proof-of-concept. For a variety of technical reasons, this wave of AI is far easier to access than past generations of AI tools. Combine that with more natural, text-based inputs and outputs, and the barrier to building an AI system that looks like it works is not high. And this is not just true of AI vendors; it’s also true of internal teams.

This means getting something that looks good is easy; getting something that works reliably at scale is hard.

When you combine the probabilistic nature of Generative AI in the medical or regulatory writing space with the low barrier to building an initial Generative AI product, you end up with the situation the life sciences industry finds itself in now: lots of noise and hype around AI, and lots of difficulty finding AI that actually works.

Cutting Through the Noise

At Artos, as we’ve continued to build out a robust platform of AI solutions for the life sciences, we’ve learned that there are non-negotiables any mature AI product team must understand. By asking questions about these non-negotiables, those evaluating AI can determine how likely an internal team is to succeed at delivering a Generative AI project, or how likely a vendor is to actually have a good AI product.

There are generally two categories of questions to ask:

  1. AI’s shortcomings. AI is known to have problems, such as hallucinations, and in the life sciences in particular it is often not specific enough about details to satisfy deep subject matter experts. Teams that are building (or will build) great AI products will have very detailed answers about how they solve for hallucinations, how they accurately retrieve information from a large corpus of documents, how they ensure that information is presented correctly, and more. Most of these answers can’t be given in just one or two sentences, even with fancy acronyms like RAG (there’s a rough sketch of what RAG involves after this list). Importantly, the best AI teams will be able to distill how their systems work into terms that are understandable to people who aren’t AI or computer science practitioners. So even as a non-technical person, you should feel confident asking these questions and expecting the answers to make sense.

  2. Workflows. The teams who have thought most about AI realize that AI is only part of the solution (you can read our blog post Why Generative AI is not Enough for more). You should be able to understand exactly how an AI system fits into your workflow: how do you kick off the process of creating a draft? Editing it? Adding new information to it? Getting it reviewed by multiple people? These questions reveal how well the people building an AI product understand how it is supposed to be used, which is a helpful proxy for which products are actually good or, in the case of teams building internally, which projects are likely to succeed.
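
As a rough, purely illustrative sketch of what “retrieving information from a large corpus” can look like, here is a minimal retrieval-augmented generation (RAG) flow in Python: rank candidate passages against a question, then constrain the model to answer only from the passages it is shown and to cite them. The scoring function, placeholder passages, and prompt wording below are hypothetical stand-ins, not a description of Artos’s system or any vendor’s.

    # Minimal, hypothetical RAG sketch: retrieve relevant passages, then build a
    # grounded prompt that forces the model to cite its sources or admit a gap.
    from typing import List, Tuple

    def score(query: str, passage: str) -> float:
        """Toy relevance score: fraction of query words that appear in the passage."""
        q_words = set(query.lower().split())
        return len(q_words & set(passage.lower().split())) / max(len(q_words), 1)

    def retrieve(query: str, corpus: List[str], k: int = 3) -> List[Tuple[float, str]]:
        """Return the top-k passages ranked by the toy relevance score."""
        return sorted(((score(query, p), p) for p in corpus), reverse=True)[:k]

    def build_prompt(query: str, passages: List[Tuple[float, str]]) -> str:
        """Ask the model to answer ONLY from the numbered sources and cite them."""
        sources = "\n".join(f"[{i + 1}] {p}" for i, (_, p) in enumerate(passages))
        return (
            "Answer using ONLY the numbered sources below. Cite a source number "
            "for every claim; if the sources do not contain the answer, say so.\n\n"
            f"Sources:\n{sources}\n\nQuestion: {query}"
        )

    # Placeholder corpus; a real system would index thousands of document chunks.
    corpus = ["Placeholder passage about study enrollment.",
              "Placeholder passage about adverse events.",
              "Placeholder passage about the primary endpoint."]
    question = "What was the primary endpoint?"
    prompt = build_prompt(question, retrieve(question, corpus))
    print(prompt)  # In a real pipeline, this prompt would be sent to an LLM for drafting.

The point is not the code itself: a mature team should be able to walk you through each of these steps, and the much harder production versions of them, in plain language.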

Conclusion

Approaching AI with these questions can help many in the life sciences overcome the practical, execution-oriented hurdles associated with Generative AI right now, especially in medical and regulatory writing. It can help filter vendors, internal AI initiatives, and more, so teams can spend more time interacting with and investing in AI that actually works.

