Real-World Evidence in the AI Era: What You Can Unlock Depends on What You Build On

Written by Guy Amster, Senior Principal Machine Learning Engineer, Flatiron Health

Real-world evidence (RWE) has long held the promise of transforming how we understand cancer—turning the reality of everyday patient care into insights that inform research, clinical development, regulatory decisions, and treatment strategies. In the era of AI, that promise feels closer than ever. 

But as someone who has spent seven years building AI systems for oncology data, I've watched a gap widen between what AI can do and what it does in practice. In a survey of 90 real-world data (RWD) purchase decision-makers, 89% identified data quality and completeness as extremely or very important, followed closely by 84% who emphasized the importance of cohort size.

Together, these findings point to a clear shift in the market: as we rely on data for more decisions, accuracy, depth, and representativeness are becoming more important, and they will matter even more as that shift accelerates. AI does not solve for weak data foundations; it amplifies whatever foundation it is built on.

The First Bottleneck: The Data You Don’t Have

In oncology, RWE is most valuable when the clinical picture is complex. However, complex cases are also where weak data foundations break down first.

The full patient journey lives across biomarker reports, pathology results, and evolving treatment decisions, much of which can't be captured in structured fields. An EHR isn't a research database; it’s a tool built for doctors to treat patients, and turning that record into usable evidence requires knowing what data should exist, where it is likely to appear, and how to interpret it in context.

In the pre-AI era, missing data often looked like missing data. Now, AI can fill the gap with something that sounds plausible—a hallucination. If left undetected, those errors can move downstream, distorting analyses and masking novel or unexpected signals in the data.

That's why the first step—before any curation, before any modeling—is assembling a dataset that reflects the actual complexity of cancer care. At Flatiron, our clinical and data experts work together to build a high-fidelity map of the patient experience, so the data foundation is strong enough to support the models and decisions built on top of it.

The Second Bottleneck: You Can’t Outperform Your Labels

Even with the right data, there is a second constraint that is just as fundamental: you cannot build higher-quality outputs than the signals used to train and evaluate your models. Foundation models can't match human experts "out of the box," but they can get there with iteration, if the right labels are available. 

In oncology, those labels are best when they come from expert human abstraction: clinicians and trained professionals who interpret nuance, context, and ambiguity in ways that models cannot replicate independently. There's a persistent narrative that AI reduces the need for human expertise. From where I sit, the opposite is true. Every high-performing system I've built has required more investment in human labeling, not less—because without high-quality labels, you cannot reliably measure whether each iteration is actually improving performance. You cannot achieve gram-level precision using a scale that only measures in kilograms, and no amount of architectural cleverness changes that.

At Flatiron, scaling AI has meant doubling down on this interplay between human expertise and machine extraction, iterating continuously until performance is both measurable and meaningful. The goal is not to remove humans from the loop, but to scale their expertise in a way that preserves clinical fidelity.

Proving, Not Assuming, Quality

The next problem: modern AI systems can generate datasets that are internally consistent, statistically plausible, and analytically convenient—characteristics that used to signal quality, but no longer prove quality. Today, AI-generated data can pass superficial checks while still failing where it matters most: rare cohorts, complex eligibility criteria, and multi-variable analyses, where small errors compound into significant distortions.

For RWE users, this shifts the burden of proof. It's no longer enough to ask whether a dataset "looks right." The question is whether it's been rigorously evaluated against the realities of clinical care and whether its limitations are understood.

At Flatiron, we've built the VALID framework, a methodology for evaluating AI-curated oncology data across three pillars: variable-level performance metrics that benchmark LLM output against expert human abstraction; automated verification checks that identify internal inconsistencies and implausible values; and replication analyses that compare LLM-derived findings to established clinical results.
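To make the first two pillars concrete, here is a minimal sketch in Python. It is not Flatiron's implementation of VALID: the function names, the record fields (patient_id, diagnosis_date, death_date), and the sample values are all hypothetical, chosen only to illustrate scoring one extracted variable against expert abstraction and flagging one kind of implausible value. Replication analyses are not covered here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float


def binary_metrics(expert: list[bool], model: list[bool]) -> Metrics:
    """Score model-extracted values for one yes/no variable against expert labels."""
    if len(expert) != len(model):
        raise ValueError("expert and model lists must be the same length")
    pairs = list(zip(expert, model))
    tp = sum(1 for e, m in pairs if e and m)        # both say yes
    fp = sum(1 for e, m in pairs if not e and m)    # model says yes, expert says no
    fn = sum(1 for e, m in pairs if e and not m)    # model misses an expert yes
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(precision, recall, f1)


def implausible_records(records: list[dict]) -> list[str]:
    """Return IDs of records whose recorded death date precedes diagnosis."""
    return [
        r["patient_id"]
        for r in records
        if r["death_date"] is not None and r["death_date"] < r["diagnosis_date"]
    ]


# Hypothetical labels for one variable (e.g. "biomarker positive") on five patients.
expert_labels = [True, True, False, False, True]
model_labels = [True, False, False, True, True]
scores = binary_metrics(expert_labels, model_labels)

# Hypothetical curated records; P2 has a death date before its diagnosis date.
sample_records = [
    {"patient_id": "P1", "diagnosis_date": date(2020, 1, 15), "death_date": None},
    {"patient_id": "P2", "diagnosis_date": date(2021, 6, 1), "death_date": date(2021, 3, 1)},
    {"patient_id": "P3", "diagnosis_date": date(2019, 9, 30), "death_date": date(2022, 2, 10)},
]
flagged = implausible_records(sample_records)
```

In this toy data the model agrees with the experts on two of three positive cases and raises one false positive, and the check flags only patient P2. A real evaluation would cover many variables, use far larger samples, and report uncertainty around each metric.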

The Real Inflection Point: From Data to Decisions

RWE is increasingly being integrated into complex clinical decisions. A clinical development team can pressure-test trial designs using digital twin approaches before enrolling a single patient. Researchers can study outcomes in rare or underrepresented populations with confidence in the underlying data. But these systems can only produce clinically sound results when the underlying data is trustworthy.

In a world where advanced models are widely accessible, the differentiator isn't who has the best algorithms. It's who has built systems that consistently produce answers reflecting clinical reality across edge cases, complex patient populations, and the full spectrum of decisions that biopharma organizations need to make.

For life sciences teams, this changes the mandate. The question is no longer “What can AI do?” It’s “What can we rely on it to do, repeatedly, in the most complex scenarios?” The organizations that win in this next phase will not be those with the most tools. They will be those whose tools have access to every piece of data and can replicate expert-level quality, even in the most complex clinical settings.

Because in oncology, the goal isn’t just to generate insight. It’s to generate insight you can act on, with confidence.

Interested in how Flatiron is shaping the future of oncology? Join us at the ASCO Annual Meeting to learn more.

The editorial staff had no role in this post's creation.