Agentic AI in Medicine: Where Assistance Should End and Autonomy Begins
Angelo Materlik · Jan 12 · 4 min read
The conversation around AI in healthcare often jumps from helpful pattern recognition to the idea of autonomous agents that diagnose, order tests, and update records on their own. Some studies claim agentic systems can outperform clinicians on certain benchmarks; skeptics worry about overconfidence, bias, and responsibility when things go wrong. Both sides have a point. The critical question is not whether medicine should use agentic AI, but where assistance should end and autonomy should begin.
Agentic AI differs from traditional predictive models in one key respect: agency. A model that labels a CT scan is advisory; a system that automatically triages studies, places orders, or drafts a discharge summary is acting in the world. That action carries implicit claims about confidence, context, and consequence. In clinical environments—regulated, safety-critical, and time-pressured—the bar for autonomy must be higher and differently defined.
The right framing is not “AI versus clinicians.” It is “task decomposition and appropriate autonomy.” Many clinical workflows are composed of micro-tasks with very different risk profiles. Some can be automated with reversible, contained actions. Others demand human judgment with full accountability.
A practical framework (a small code sketch follows this list):
- Classify tasks by risk and reversibility. Low-risk, reversible tasks (drafting documentation, pre-populating forms, sorting queues) are suitable for agentic AI with human review. High-risk, irreversible decisions (final diagnosis, therapy initiation, order submission without review) should remain under clinician authority.
- Require uncertainty awareness. An agent that never abstains is unsafe. Systems must estimate uncertainty, detect out-of-distribution inputs, and be able to defer, escalate, or request more information.
- Build guardrails at multiple layers. Capabilities should be constrained by design (what actions are permitted), by policy (who can approve), and by context (when actions are allowed). Logging and auditability are non-negotiable.
- Evaluate beyond accuracy. Metrics like calibration, abstention rate, time-to-intervention, and harm-aware outcomes matter more than raw accuracy on a test set. What matters is whether the agent improves care without increasing risk.
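To make the first three points concrete, here is a minimal sketch, in Python, of how an agent's permitted autonomy could be gated by task risk, reversibility, and model uncertainty. The task names, the `defer_threshold`, and the assumption of a calibrated uncertainty score are illustrative, not a reference implementation.

```python
from dataclasses import dataclass
from enum import Enum

class Autonomy(Enum):
    AUTONOMOUS_WITH_REVIEW = "act now, queue for human review"
    PROPOSE_ONLY = "draft a proposal; a clinician must approve"
    DEFER = "abstain and escalate to a clinician"

@dataclass
class Task:
    name: str
    high_risk: bool    # e.g., therapy initiation vs. drafting a note
    reversible: bool   # can the action be undone without harm?

def permitted_autonomy(task: Task, uncertainty: float,
                       defer_threshold: float = 0.3) -> Autonomy:
    """Gate autonomy by uncertainty first, then by risk and reversibility.

    `uncertainty` is assumed to be a calibrated score in [0, 1] from the
    underlying model; the threshold here is purely illustrative.
    """
    if uncertainty >= defer_threshold:
        return Autonomy.DEFER                 # uncertainty awareness: abstain
    if task.high_risk or not task.reversible:
        return Autonomy.PROPOSE_ONLY          # clinician keeps final authority
    return Autonomy.AUTONOMOUS_WITH_REVIEW    # low-risk, reversible micro-task

# Example: drafting a summary vs. submitting a medication order.
print(permitted_autonomy(Task("draft_discharge_summary", high_risk=False, reversible=True), 0.1))
print(permitted_autonomy(Task("submit_medication_order", high_risk=True, reversible=False), 0.1))
```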
Where agentic AI can help today:
- Radiology worklist triage: Agents can reorder studies based on preliminary findings (e.g., suspected intracranial hemorrhage) detected by image models, ensuring urgent cases surface first. The agent acts on the queue, but a clinician confirms the diagnosis.
- Documentation assistants: Agents can draft discharge summaries from structured data and clinical notes, extract key problems and medications, and suggest billing codes. Clinicians review and sign. The action (drafting) is reversible and saves time.
- Medication reconciliation: Agents compare medication lists across records, flag discrepancies, and propose reconciled lists. Clinicians decide, but the agent does the tedious part.
- Sepsis alerts with action templates: An agent can monitor vitals and labs, flag possible sepsis, and prepare order sets and documentation templates. A clinician approves the orders.
These uses share a pattern: the agent initiates helpful actions inside guardrails, and a human with legal responsibility approves the final step. The sketch below illustrates the first example.
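Here is a minimal sketch of worklist triage under that pattern: the agent reorders the reading queue from a preliminary hemorrhage-probability score, while diagnosis and sign-off stay with the radiologist. The study fields, score, and threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Study:
    accession: str
    minutes_waiting: int
    hemorrhage_prob: float  # preliminary score from an image model (hypothetical)

def triage_order(worklist: list[Study], urgent_threshold: float = 0.8) -> list[Study]:
    """Reorder the queue: suspected-urgent studies first, then by wait time.

    The agent acts only on ordering, which is reversible; the radiologist
    still reads every study and confirms or rejects the preliminary finding.
    """
    def key(study: Study):
        suspected_urgent = study.hemorrhage_prob >= urgent_threshold
        return (not suspected_urgent, -study.minutes_waiting)
    return sorted(worklist, key=key)

queue = [
    Study("A1001", minutes_waiting=40, hemorrhage_prob=0.05),
    Study("A1002", minutes_waiting=5,  hemorrhage_prob=0.92),  # surfaces first
    Study("A1003", minutes_waiting=90, hemorrhage_prob=0.10),
]
for study in triage_order(queue):
    print(study.accession)
```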
By contrast, places to avoid autonomy (for now):
- Definitive diagnosis without human confirmation, especially for rare or atypical cases where models are least reliable.
- Independent ordering of high-risk tests or therapies.
- Communicating bad news to patients or caregivers.
- Overwriting the medical record without a review loop.
This is not timidity; it is a design choice consistent with safety engineering in other domains.
Liability and governance matter. If an agent makes an unsafe recommendation that a clinician accepts, who is responsible? Today, clinicians sign off on final diagnoses and orders; that convention should hold until legal frameworks evolve. Vendors should accept an appropriate share of the risk and provide transparent documentation about model limitations, testing, and monitoring.
Regulation is catching up. Under the EU AI Act and medical device regulations, many agentic functions will be classified as higher risk, requiring rigorous quality systems (ISO 13485), post-market surveillance, and change control. Hospitals should treat agentic AI like any other safety-critical technology, with formal procurement, validation, and monitoring processes.
Operationalizing agentic AI requires attention to workflow:
- Clear handoff moments: The agent proposes an action; the clinician accepts, modifies, or rejects. Interfaces should make reviewing and editing fast and obvious.
- Escalation pathways: If the agent detects high uncertainty or conflicting signals, it should escalate to the appropriate role (e.g., attending physician) rather than push ahead.
- Monitoring and feedback loops: Capture acceptance/rejection rates, categorize failure modes, and use them to retrain or recalibrate (a small sketch follows this list). If clinicians stop trusting the agent, usage will quietly decay.
- Training and change management: Clinicians need to understand capabilities, limits, and how to override. The goal is to reduce cognitive load, not add another noisy channel.
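As a sketch of what the handoff record and feedback loop might look like, each proposal logs the clinician's decision, and rejection reasons are counted per failure category. The field names, decision labels, and failure-mode tags are assumptions.

```python
from collections import Counter
from dataclasses import dataclass

DECISIONS = ("accepted", "modified", "rejected", "escalated")

@dataclass
class Handoff:
    proposal_id: str
    task: str                         # e.g., "discharge_summary"
    decision: str                     # one of DECISIONS, recorded at review
    failure_mode: str | None = None   # tagged when rejected

def acceptance_rate(log: list[Handoff]) -> float:
    """Share of proposals accepted as-is; a quiet drop signals lost trust."""
    return sum(h.decision == "accepted" for h in log) / max(len(log), 1)

def failure_modes(log: list[Handoff]) -> Counter:
    """Count rejection reasons to prioritize retraining or recalibration."""
    return Counter(h.failure_mode for h in log if h.decision == "rejected")

log = [
    Handoff("p1", "discharge_summary", "accepted"),
    Handoff("p2", "med_reconciliation", "rejected", "missed_home_medication"),
    Handoff("p3", "discharge_summary", "escalated"),
]
print(acceptance_rate(log), failure_modes(log))
```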
Evaluation must reflect reality. Benchmarks that reward single-answer accuracy are not enough. Prefer evaluations that simulate workflow: time savings, error reduction, appropriate deferral, and outcomes. Calibration is especially important; an agent that says “I don’t know” appropriately is safer than one that is confident and wrong.
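As a minimal sketch of what "evaluate beyond accuracy" can mean in practice, here is how abstention rate and a simple expected calibration error could be computed from logged predictions. The logging format (confidence of `None` meaning the agent deferred) and the bin count are assumptions.

```python
def abstention_rate(predictions: list[tuple[float | None, bool]]) -> float:
    """Fraction of cases where the agent deferred (confidence logged as None)."""
    return sum(conf is None for conf, _ in predictions) / max(len(predictions), 1)

def expected_calibration_error(predictions: list[tuple[float | None, bool]],
                               bins: int = 10) -> float:
    """Simple ECE: average gap between stated confidence and observed accuracy."""
    answered = [(conf, correct) for conf, correct in predictions if conf is not None]
    if not answered:
        return 0.0
    total, ece = len(answered), 0.0
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        bucket = [(c, ok) for c, ok in answered
                  if lo <= c < hi or (b == bins - 1 and c == 1.0)]
        if bucket:
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(ok for _, ok in bucket) / len(bucket)
            ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece

# (confidence, was_correct); None means the agent abstained and escalated.
log = [(0.95, True), (0.90, False), (0.60, True), (None, False), (0.30, False)]
print(abstention_rate(log), round(expected_calibration_error(log), 3))
```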
Finally, transparency with patients is critical. If agentic systems touch care, patients deserve to know in general terms how automation is used, what protections exist, and how their data is safeguarded. Trust is easier to maintain than to rebuild after a misstep.
Agentic AI has a place in medicine today—inside carefully defined boundaries. Focus on reversible, low-risk actions that save time and attention for clinicians. Build in uncertainty awareness, escalation, and auditability. Evaluate beyond accuracy, and align liability with existing clinical responsibility. Done this way, agentic AI becomes a quiet force multiplier: it does the tedious work, surfaces the urgent, and lets clinicians spend more time making the judgments only they can make.
Listen to the full conversation: https://feeds.transistor.fm/born-kepler


