AI Medical Scribes Hallucinating Conditions During Patient Visits

This article digs into the Ontario auditor general’s report on AI medical scribes. The report found that 20 government-approved platforms weren’t properly evaluated, and testing turned up hallucinations, inaccuracies, and missing data.

Thousands of Ontario doctors already use these tools in real clinics. That’s got folks worried about patient safety and whether AI-generated notes can really be trusted.

Zooming out, the piece also brings up similar debates in the U.S. about clinical AI tools like OpenEvidence. It’s pretty clear this isn’t just a local issue: there’s a bigger conversation happening about oversight and clinical risk.

Ontario’s audit findings and their implications

The Ontario audit looked at 20 government-approved AI scribe platforms. Procurement testing turned up serious problems: hallucinations, incorrect information, and incomplete data.

The report warns that these issues could lead to bad treatment plans or even harm patients. There’s a real tension here: people want faster, more consistent documentation, but flawed notes could sneak errors into patient records.

Ontario’s procurement minister said the hallucinations only showed up during testing, not in real patient care. Still, the audit points out that about 5,000 doctors in Ontario already use these scribes.

Auditor Shelley Spence even asked her own doctor to review transcripts after a visit. That’s a pretty telling sign that people are worried about how reliable these AI notes really are in everyday care.

Implications for patient safety and clinical workflow

If AI-generated notes are wrong, those mistakes can ripple into diagnoses, prescriptions, or follow-ups. That’s a real risk to patient safety and care continuity.

The Ontario case is part of a bigger debate—how do we balance the time-saving perks of AI scribes with the need for accuracy and accountability?

The OpenEvidence example and the broader AI-scribe landscape

The article also mentions OpenEvidence, a U.S. platform that’s been criticized for overstating results from small studies. Some doctors say OpenEvidence gives incomplete answers or makes findings sound stronger than they are when data is limited.

This seems to be a pattern. AI tools might look good in controlled tests but can stumble when they’re used with real patients who don’t fit the mold.

Clinicians like saving time and getting more consistent notes. But there’s a growing sense that we don’t yet know how these tools will really perform in the wild. The Ontario findings, along with similar worries in the U.S., make it clear we need more ongoing oversight and transparency about what AI notes can (and can’t) actually do.

Policy, oversight, and practice implications

There’s a big takeaway here: we need stricter evaluation and ongoing oversight before AI scribes go mainstream in healthcare. The Ontario report calls for solid validation, strong monitoring, clear accountability, and transparent reporting.

If we skip those steps, automation could backfire—misdocumentation and patient harm are real risks.

Recommended safeguards and actions

Set up thorough validation protocols that cover a wide range of real-world situations and patient types
Keep monitoring after deployment, with regular audits and a simple way to report errors
Label AI-generated notes clearly, and make sure clinicians double-check critical details
Offer targeted training so clinicians know what AI can and can’t do, and how to handle risks
Build governance structures so someone is accountable for AI outputs and decision support
Publish dashboards and incident stats to help everyone keep improving

Practical takeaways for healthcare teams

As AI scribes keep spreading, clinicians and administrators need to look past the hype. Patient safety and reliability should come first.

The Ontario findings give a cautious playbook: focus on verification, transparency, and good governance if you want to roll out AI documentation tools responsibly.

Actions for practice

Try using AI scribes to help with documentation, but don’t let them replace your clinical judgment.
Always have a human review any AI-generated notes before sharing them with care teams or patients.
Integrate AI tools with existing electronic health records so documentation stays consistent and changes can be tracked.
Set up clear escalation pathways so you know what to do when you spot inaccuracies or something just seems off.
Keep clinicians involved in evaluating and giving feedback on these tools. Their input really does make a difference in improving performance.

Here is the source article for this story: Doctors’ AI Systems Are Hallucinating Nonexistent Medical Issues During Appointments With Patients
