
An Agentic Clinical AI Assistant, Tested in Real Clinical Reality

  • Writer: Matthew Hellyar
What a world-class system looks like — proven across 200+ documents, 30+ patients, and four evaluation series with zero hallucinations.


There is a question Respocare is asked more than any other.


What does an agentic clinical AI assistant actually do in a real clinical environment?

Not in a demo. Not in a pitch deck. Inside the messy, contradictory, longitudinal reality of a specialist managing a patient with eight conditions, twenty-eight documents, a contested allergy history, and fifteen minutes to make a clinical decision.

The answer is no longer hypothetical.


Over the past nine months, Respocare Connect AI has been evaluated across four progressive clinical series — each one harder, more adversarial, and more clinically representative than the last. The cumulative result, audited and documented, is the most rigorous evaluation record any agentic clinical AI platform in South Africa has produced.


Four evaluation series. 200+ clinical documents. 30+ patients. Zero hallucinations.


This article walks you through what that evaluation actually looked like — not the marketing version, the methodology version — and what it means for the future of healthcare in South Africa and beyond.



What an Agentic Clinical AI Assistant Is


Before the evidence, a definition. The category is new enough that the language is still being decided.


An agentic clinical AI assistant is a reasoning system. Not a transcription tool. Not a chatbot. Not a search engine with a clinical badge.


It operates across a patient's full longitudinal record — every visit, every letter, every lab, every imaging report — and it does five things in sequence:


  • It retrieves the relevant clinical evidence before producing any output.

  • It reasons across that evidence to surface patterns, trajectories, and contradictions.

  • It acts by generating clinical decision support: pre-visit briefs, handover documents, care gap audits.

  • It refuses to produce a recommendation when the required evidence does not exist in the record.

  • It escalates when clinical complexity exceeds what AI should resolve alone.

Five behaviours. They are non-negotiable. Together, they are what separates a clinical AI assistant from a confident-sounding text generator.
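The five behaviours can be read as one control flow. The Python sketch below is a hypothetical illustration, not Respocare's implementation, which this article does not publish: the `Evidence` record, the set of required fields per request, the numeric `complexity` score with its 0.8 escalation threshold, and the document identifier `doc-07` are all assumptions. What it shows is the ordering: evidence is gathered and checked for gaps before any output is produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    field: str       # hypothetical field name, e.g. "allergies"
    value: str
    source_doc: str  # hypothetical identifier of the source document


def assist(record: list[Evidence], required_fields: set[str],
           complexity: float, escalation_threshold: float = 0.8) -> dict:
    """Hypothetical retrieve / refuse / escalate / act flow for one request."""
    # Retrieve: keep only the evidence this request needs, before generating anything.
    retrieved = [e for e in record if e.field in required_fields]

    # Refuse: if a required field has no evidence, name what is missing and stop.
    missing = required_fields - {e.field for e in retrieved}
    if missing:
        return {"action": "refuse", "missing": sorted(missing)}

    # Escalate: above an assumed complexity threshold, hand the case to a clinician.
    if complexity > escalation_threshold:
        return {"action": "escalate", "evidence": retrieved}

    # Reason and act: here reduced to formatting retrieved, cited evidence into a brief.
    lines = [f"{e.field}: {e.value} [{e.source_doc}]" for e in retrieved]
    return {"action": "brief", "lines": lines}


record = [Evidence("allergies", "Penicillin (confirmed 2011)", "doc-07")]
print(assist(record, {"allergies"}, complexity=0.2))
# {'action': 'brief', 'lines': ['allergies: Penicillin (confirmed 2011) [doc-07]']}
print(assist(record, {"allergies", "renal_function"}, complexity=0.2))
# {'action': 'refuse', 'missing': ['renal_function']}
```

Because the gap check runs before anything is written, a refusal can name exactly what is missing rather than hedging inside generated prose.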


The world-class question is whether a system can perform all five — consistently, under pressure, across adversarial conditions designed specifically to make it fail.

That is what Respocare Connect AI has been tested against.



The Real-World Test: Meet Margaret Venter


The evaluation was built around a synthetic but clinically realistic reference patient.

Margaret Venter. 56 years old. MRN ES4-001.


Margaret was not designed to be easy. She was designed to be the kind of patient who breaks lesser systems.


  • Eight simultaneous active conditions — hypothyroidism, type 2 diabetes (new diagnosis), hypertension, hyperlipidaemia, a pulmonary nodule under surveillance, gastro-oesophageal reflux, pre-existing osteopaenia, and a long-standing mood disorder.

  • Twenty-eight clinical documents across four visits — GP notes, specialist letters, lab reports, imaging, allergy histories, medication lists, discharge summaries.

  • Three recorded allergies: Penicillin (confirmed 2011), Latex (confirmed 2015), and Sulphonamides (suspected, never fully clarified).

  • Six deliberately embedded clinical traps — contradictions, missed follow-ups, prescribing risks, and edge cases designed to test whether the system would fabricate confidence where evidence was thin.


Margaret was given to Respocare Connect AI the same way a real patient would be: as a folder of clinical documents, in different formats, from different sources, written by different clinicians, across different visits. No structured prompts. No pre-cleaned data. Just the record as a clinician actually encounters it.


Then the system was asked to do what a clinician would do — produce a clinical position. Reason across the record. Surface what mattered.



What the System Did


Across all four visits and every clinical prompt, here is what Respocare Connect AI demonstrated.


It retrieved every confirmed allergy in every document where it was present. Penicillin was indexed correctly across all eight document types. Latex was tracked from its 2015 confirmation onward. The suspected Sulphonamide allergy was surfaced as suspected, not as confirmed — a small distinction with significant prescribing implications.
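The confirmed-versus-suspected distinction is easiest to preserve when status is stored as data rather than inferred from wording. A minimal Python sketch, assuming a simple two-state status field; the class and function names are hypothetical, and the article does not describe how Respocare actually models allergies. Only the three allergies listed for Margaret Venter are used.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AllergyStatus(Enum):
    CONFIRMED = "confirmed"
    SUSPECTED = "suspected"


@dataclass(frozen=True)
class Allergy:
    substance: str
    status: AllergyStatus
    year_confirmed: Optional[int] = None  # only meaningful when status is CONFIRMED


def render_allergies(allergies: list[Allergy]) -> list[str]:
    """Render each allergy with its recorded status; SUSPECTED is never upgraded."""
    lines = []
    for a in allergies:
        if a.status is AllergyStatus.CONFIRMED and a.year_confirmed is not None:
            lines.append(f"{a.substance}: confirmed {a.year_confirmed}")
        else:
            lines.append(f"{a.substance}: {a.status.value}")
    return lines


margaret = [
    Allergy("Penicillin", AllergyStatus.CONFIRMED, 2011),
    Allergy("Latex", AllergyStatus.CONFIRMED, 2015),
    Allergy("Sulphonamides", AllergyStatus.SUSPECTED),
]
print(render_allergies(margaret))
# ['Penicillin: confirmed 2011', 'Latex: confirmed 2015', 'Sulphonamides: suspected']
```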


It tracked Margaret's TSH trajectory across visits. From 18.4 mIU/L at first presentation — a critically elevated value — through to 2.1 mIU/L on the third measurement. Hypothyroid, now controlled. The system did not present the latest value in isolation. It presented the line.


It detected her new diabetes diagnosis. HbA1c 6.4% at diagnosis, 6.1% at follow-up — trending in the right direction. The system flagged the diagnosis as new, identified the controlling medication, and surfaced the trajectory.


It tracked her blood pressure response. 148/92 mmHg at presentation, 128/80 mmHg on review. The antihypertensive was working. The system named the medication, the dose, and the response.
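Presenting "the line" rather than the latest value amounts to keeping every measurement, in order, with its source. The Python sketch below is a hypothetical illustration: the `Measurement` type, the document identifiers such as `lab-01`, and the output format are assumptions. It uses only values stated in this article; the second TSH reading is not given here, so it is omitted rather than invented. The direction label compares only the first and last values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    analyte: str
    sequence: int     # order in which the value was recorded (1 = first)
    value: float
    unit: str
    source_doc: str   # hypothetical document identifier used as the citation


def trajectory(measurements: list[Measurement], analyte: str) -> str:
    """Render every recorded value for one analyte in order, not just the latest."""
    series = sorted(
        (m for m in measurements if m.analyte == analyte),
        key=lambda m: m.sequence,
    )
    if not series:
        return f"{analyte}: no values in record"
    points = " -> ".join(f"{m.value:g} {m.unit} [{m.source_doc}]" for m in series)
    # Direction compares the first and last values only.
    first, last = series[0].value, series[-1].value
    direction = "falling" if last < first else "rising" if last > first else "unchanged"
    return f"{analyte}: {points} ({direction})"


record = [
    Measurement("TSH", 3, 2.1, "mIU/L", "lab-03"),
    Measurement("TSH", 1, 18.4, "mIU/L", "lab-01"),
    Measurement("HbA1c", 1, 6.4, "%", "lab-02"),
    Measurement("HbA1c", 2, 6.1, "%", "lab-04"),
]
print(trajectory(record, "TSH"))
# TSH: 18.4 mIU/L [lab-01] -> 2.1 mIU/L [lab-03] (falling)
print(trajectory(record, "HbA1c"))
# HbA1c: 6.4 % [lab-02] -> 6.1 % [lab-04] (falling)
```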


It surfaced all six embedded clinical traps: every contradiction in the record, every missed follow-up, every prescribing risk. It missed none, and it fabricated none.


It refused when evidence was insufficient. When asked to recommend a course of action that required clinical information not present in the documents, the system did not generate a confident-sounding answer. It stated what evidence was needed and stopped. This behaviour — refusal under pressure — is the one most clinical AI systems fail.


It cited every clinical statement to a source document. Every TSH value, every medication dose, every clinical finding traceable to the document it came from.

Across 28 documents, four visits, six embedded clinical traps, and every prompt output: zero hallucinations.


Series 4 average score: 9.79 / 10.


The evaluation was signed off as GO. The commercial-facing conclusion in the report was direct: a GP could hand Margaret Venter over to a locum colleague using only the platform's output, and that colleague could treat her safely.

That is the capability claim. It is unusual in clinical AI because it is testable.



The Cumulative Record


Margaret's evaluation — Series 4 — was the most demanding of the programme, but it was not the first.

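  • Series 1: 15 documents, 2 visits. Focus: foundational retrieval. Hallucinations: 0.

  • Series 2: 20 documents, 3 visits. Focus: clinical safety triggers. Hallucinations: 0.

  • Series 3: 37 documents, 8 visits. Focus: adversarial proving ground. Score: 9.2 / 10. Hallucinations: 0.

  • Series 4: 28 documents, 4 visits. Focus: Phase 2 live evaluation. Score: 9.79 / 10. Hallucinations: 0.

  • Cumulative (full programme): 200+ documents, 30+ patients. Score: improving. Hallucinations: 0.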

The zero-hallucination record is not a marketing number. It is a structural property of the architecture.

Respocare Connect AI is built on retrieval-augmented generation — the architectural principle that the AI must retrieve verified information from the patient's documents before generating any output. The rule is one sentence long: if it is not in the documents, it is not in the output.


That rule is what produces the zero. It is also what produces the refusal. A system that retrieves first cannot fabricate a fact that was never in the source material. It can only return what is there, flag what is missing, and refuse what it cannot defend.
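The rule "if it is not in the documents, it is not in the output" can be sketched as a retrieval step followed by a grounding check. This is a deliberately toy Python illustration under stated assumptions: word-overlap retrieval and verbatim substring matching stand in for whatever retrieval and verification Respocare actually uses, which this article does not describe, and the `Document` type, function names, and document identifiers are hypothetical. Each claim that survives carries a citation; anything not found in a retrieved document is flagged instead of emitted.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str   # hypothetical document identifier
    text: str


def retrieve(docs: list[Document], query: str) -> list[Document]:
    """Toy retrieval: return documents sharing at least one word with the query."""
    terms = set(re.findall(r"\w+", query.lower()))
    return [d for d in docs if terms & set(re.findall(r"\w+", d.text.lower()))]


def grounded_output(docs: list[Document], query: str, claims: list[str]) -> dict:
    """Keep only claims found verbatim in a retrieved document; flag the rest."""
    evidence = retrieve(docs, query)
    cited, unsupported = [], []
    for claim in claims:
        source = next(
            (d.doc_id for d in evidence if claim.lower() in d.text.lower()), None
        )
        if source is None:
            unsupported.append(claim)   # not in the documents -> not in the output
        else:
            cited.append(f"{claim} [{source}]")
    return {"cited": cited, "unsupported": unsupported}


docs = [
    Document("allergy-hx", "Allergies: Penicillin confirmed 2011. Latex confirmed 2015."),
    Document("gp-visit-1", "TSH 18.4 mIU/L at first presentation."),
]
print(grounded_output(docs, "penicillin allergy",
                      ["Penicillin confirmed 2011", "Amoxicillin tolerated"]))
# {'cited': ['Penicillin confirmed 2011 [allergy-hx]'], 'unsupported': ['Amoxicillin tolerated']}
```

The check is strict by design: a claim that exists somewhere in the record but was not retrieved for this query is still flagged, which pushes errors toward refusal rather than toward unsupported output.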


This is what world-class architecture looks like at this stage of clinical AI — not the absence of gaps, but the discipline to find them, name them, and fix them before they reach a patient.


What This Means for South African Healthcare — and the World


South Africa is the launch market for Respocare Connect AI. It is not the limit.


The architecture is global from day one. The platform is HIPAA and POPIA compliant simultaneously, by architectural design, not by retrofit. Active conversations with healthcare partners in Dubai, the United States, and the United Kingdom are already shaping the deployment roadmap.


There are perhaps ten companies in the world right now credibly building agentic clinical AI. Respocare Connect AI is one of them — and the only one whose full evaluation methodology has been published from a South African base.

For South African specialists, GPs, nurses, dentists, allied health professionals, and psychologists, this is what the platform offers:


  • Time returned. Clinicians lose nearly 28 hours per week to administrative work. An agentic assistant performs the retrieval and synthesis work before the consultation begins.

  • Clinical clarity. The full longitudinal patient record, synthesised into a clinical position the clinician can interrogate in seconds.

  • Trust earned through transparency. Every clinical statement traceable to its source document. Every refusal explained. Every limitation named.

  • Safety architecture by design. Three-layer clinical safety, patient scoping at the identity layer, mandatory allergy pre-retrieval, refusal as a first-class behaviour.
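Two of these safeguards, patient scoping and mandatory allergy pre-retrieval, can be sketched in a few lines. The Python below is a hypothetical illustration, not Respocare's design: the `PatientScopedStore` class, the document kinds `allergy_history` and `gp_note`, and the assumption that the patient ID comes from an authenticated session are all invented for the example. `ES4-001` is Margaret Venter's MRN from this article; `OTHER-002` is a made-up second patient.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Doc:
    patient_id: str
    kind: str    # hypothetical labels, e.g. "allergy_history", "gp_note"
    text: str


class PatientScopedStore:
    """Hypothetical store that only ever exposes one patient's documents."""

    def __init__(self, docs: list[Doc], patient_id: str):
        # Scope is fixed once, at construction (assumed: from the authenticated
        # session), so later queries cannot reach another patient's record.
        self._docs = [d for d in docs if d.patient_id == patient_id]

    def search(self, kind: Optional[str] = None) -> list[Doc]:
        return [d for d in self._docs if kind is None or d.kind == kind]


def build_context(store: PatientScopedStore, kind: str) -> list[Doc]:
    """Allergy documents are always retrieved first, whatever was requested."""
    allergies = store.search("allergy_history")
    requested = [d for d in store.search(kind) if d.kind != "allergy_history"]
    return allergies + requested


docs = [
    Doc("ES4-001", "allergy_history", "Penicillin confirmed 2011"),
    Doc("ES4-001", "gp_note", "BP 148/92"),
    Doc("OTHER-002", "gp_note", "Unrelated patient"),
]
store = PatientScopedStore(docs, "ES4-001")
print([d.text for d in build_context(store, "gp_note")])
# ['Penicillin confirmed 2011', 'BP 148/92']
```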


It is not a product that demands attention. It is intelligence that disappears into practice.


The Principle Respocare Was Built On


Keep healthcare human. Make technology invisible.

Every architectural decision behind Respocare Connect AI is governed by this principle. If a feature improves performance but compromises clinical judgment, it does not ship. If a behaviour increases speed but reduces the clinician's ability to interrogate the output, it does not ship.


The clinician is always in the driving seat. The AI hands over fully cited, fully bounded synthesis. The clinical decision remains where it belongs.


This is not a marketing line. It is the constraint that determined what got built.



What's Next


Series 5 — the autonomous agent evaluation — is the next milestone in the trial programme. The objective is to move from documentation support into proactive clinical reasoning, with the same architectural discipline that has held across Series 1 through 4.


The first public preview of the interactive clinical tutorial is also imminent. Fifteen scenes. One synthetic demo patient. The full longitudinal reasoning of Respocare Connect AI rendered visible.


For specialists, clinicians, and healthcare partners interested in trial participation, early access, or strategic partnership, the door is open.


The frontline of clinical AI in South Africa is small. It will not stay small.

Continue reading


  • The Agentic Report — the weekly Wednesday dispatch from the trial floor, free at respocareinsights.io




About the author

Matthew Hellyar is the Founder and Chief Developer of Respocare Connect AI and a Strategic Partner at Respocare. He writes weekly in The Agentic Report from inside the Phase 2 clinical evaluation programme.


