A clinical-grade triage agent across four regions.

Under NDA (name redacted) · Healthcare AI · HIPAA · United States · 14 weeks · 2025

62% faster intake vs. nurse-only baseline

0 PHI incidents in 6 months live

47 clinics live across 4 regions

8,000 eval cases reviewed by clinicians

The brief.

A US healthcare operator running 47 clinics across four regions needed to triage symptom-intake forms in seconds. Speed mattered: patients in the queue wanted to know whether to wait, drive to an ER, or book a routine appointment. But the one failure mode that could not be allowed to happen was missing a high-acuity case. A delayed cardiac symptom misrouted to “routine” is the kind of outcome that ends companies.

The operator had tried two off-the-shelf vendors. Both produced systems that worked in the demo and quietly degraded once real intake forms — with their typos, code-switching and missing fields — started flowing through.

Devmint partnered with the operator's clinical leadership and engineering team to ship the agent to production, along with the eval, the audit log, and the on-call rotation that a HIPAA-bound clinical system actually requires.

What we shipped.

Routing · 01 · 8-route specialty triage. Cardiology, urgent care, primary care, dermatology, ENT, orthopedics, pediatrics, OB/GYN. Each intake form is classified against the routing graph with confidence scores, not a yes/no flag.
Safety · 02 · Hard-coded escalation. Specific symptom patterns (chest pain, stroke indicators, pediatric fever above thresholds) escalate to a nurse-on-call within 15 seconds. No LLM judgement is involved in these cases; the rules are deterministic.
Compliance · 03 · On-prem PHI inference. PHI-bearing fields are processed on the operator's own infrastructure with a locally hosted model. No identifiable data leaves the operator's network; the cloud LLM only sees de-identified structured features.
Audit · 04 · Replay-ready audit log. Every decision, every retrieval, and every model version is logged. A compliance officer can reconstruct any single triage decision and replay it against a newer model. The audit log was a precondition for the contract.
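The ordering described in the routing and safety items (deterministic escalation rules first, model-scored routing only when no rule fires) can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the production orchestration: the `triage` function, the `Route` codes, the symptom codes, and the threshold constants are all hypothetical, and the numeric values are placeholders rather than clinical guidance. The model is abstracted as a `scoreRoutes` callback so the sketch makes no claims about any real LLM API.

```typescript
// Hypothetical route codes for the eight specialties named in the case study.
type Route =
  | "cardiology" | "urgent_care" | "primary_care" | "dermatology"
  | "ent" | "orthopedics" | "pediatrics" | "ob_gyn";

// Hypothetical de-identified intake features; the real schema is not described.
interface IntakeFeatures {
  ageYears: number;
  temperatureC?: number; // may be missing on real-world forms
  symptoms: string[];    // normalized symptom codes (hypothetical)
}

type TriageResult =
  | { kind: "escalate"; rule: string } // deterministic path, no model call
  | { kind: "route"; route: Route; scores: Record<Route, number> };

// PLACEHOLDER values for illustration only. Real thresholds and symptom
// lists are owned by clinical leadership, not by this sketch.
const PEDIATRIC_AGE_LIMIT_YEARS = 18;
const PEDIATRIC_FEVER_THRESHOLD_C = 39.0;
const STROKE_INDICATORS = ["facial_droop", "arm_weakness", "slurred_speech"];

interface EscalationRule {
  name: string;
  matches: (f: IntakeFeatures) => boolean;
}

// Plain code, not a prompt: every rule is deterministic and unit-testable.
const ESCALATION_RULES: EscalationRule[] = [
  { name: "chest_pain", matches: (f) => f.symptoms.includes("chest_pain") },
  { name: "stroke_indicator", matches: (f) => f.symptoms.some((s) => STROKE_INDICATORS.includes(s)) },
  {
    name: "pediatric_fever",
    matches: (f) =>
      f.ageYears < PEDIATRIC_AGE_LIMIT_YEARS &&
      (f.temperatureC ?? 0) > PEDIATRIC_FEVER_THRESHOLD_C,
  },
];

// Rules run first; the scorer (a stand-in for the model) is only consulted
// when no escalation rule fires.
function triage(
  f: IntakeFeatures,
  scoreRoutes: (f: IntakeFeatures) => Record<Route, number>,
): TriageResult {
  for (const rule of ESCALATION_RULES) {
    if (rule.matches(f)) return { kind: "escalate", rule: rule.name };
  }
  const scores = scoreRoutes(f);
  // Pick the highest-scoring route, but keep every score for the reviewer.
  let best: Route = "primary_care";
  let bestScore = -Infinity;
  for (const [route, score] of Object.entries(scores) as [Route, number][]) {
    if (score > bestScore) {
      best = route;
      bestScore = score;
    }
  }
  return { kind: "route", route: best, scores };
}
```

Keeping the rules outside any prompt is what makes them deterministic: a chest-pain intake escalates without the scorer ever being called. Returning the full score map rather than only the winning route mirrors the "confidence scores, not a yes/no flag" design.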

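A replay-ready audit log mostly comes down to capturing enough at decision time to re-run the decision later. The sketch below assumes a hypothetical `AuditRecord` row shape and a hypothetical `replay` helper; the case study only says that decisions, retrievals, and model versions are logged to a Postgres audit table, so every field name here is illustrative. The model is again a plain callback rather than any real API.

```typescript
// Hypothetical shape of one audit-log row (field names are illustrative).
interface AuditRecord {
  decisionId: string;
  loggedAt: string;                       // ISO-8601 timestamp
  modelVersion: string;                   // pinned model identifier
  inputFeatures: Record<string, unknown>; // de-identified features the model saw
  retrievedDocIds: string[];              // every retrieval that fed the decision
  output: string;                         // routing or escalation emitted
}

// Stand-in for "some model version": inputs and retrievals in, decision out.
type Model = (features: Record<string, unknown>, docIds: string[]) => string;

interface ReplayResult {
  decisionId: string;
  originalVersion: string;
  replayVersion: string;
  originalOutput: string;
  replayOutput: string;
  changed: boolean;
}

// Re-run one logged decision against another model version, using exactly
// the inputs and retrievals captured at decision time.
function replay(record: AuditRecord, replayVersion: string, model: Model): ReplayResult {
  const replayOutput = model(record.inputFeatures, record.retrievedDocIds);
  return {
    decisionId: record.decisionId,
    originalVersion: record.modelVersion,
    replayVersion,
    originalOutput: record.output,
    replayOutput,
    changed: replayOutput !== record.output,
  };
}
```

Storing retrieval IDs alongside the inputs means a replay sees the same context the original decision saw (assuming the referenced documents are themselves versioned), so any difference in output can be attributed to the model version rather than to changed retrieval.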
How we built it.

The framing we landed on with the operator was that the agent is a triage assistant, not a triage decision-maker. Every escalation and every routing recommendation is reviewed by a nurse-on-call before it reaches the patient. The agent does the speed work (parsing, classification, initial routing); the human confirms.

That framing is what made the engagement defensible. It also shaped the eval. We didn't evaluate the agent against “correct decision”; we evaluated it against “nurse-on-call sees this in under 30 seconds with the right context attached.” The 62% intake speed improvement is measured against that target, not against a headline accuracy number.

The eval harness ran against 8,000 historical cases, each reviewed by a board-certified clinician, and every PR ran the full eval. A new region could not go live until the agent cleared the 99.4% floor on high-acuity correctness.
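The go-live gate can be expressed as a simple check over clinician-labelled cases. The case study does not define "high-acuity correctness", so this sketch assumes one plausible reading: recall on the cases a clinician labelled high-acuity. The `EvalCase` shape and the function names are hypothetical; only the 99.4% floor comes from the text.

```typescript
// Hypothetical eval-case shape; the real harness and its labels are not described.
interface EvalCase {
  id: string;
  clinicianLabel: "high_acuity" | "routine"; // board-certified clinician's label
  agentFlaggedHighAcuity: boolean;           // what the agent decided
}

const HIGH_ACUITY_FLOOR = 0.994; // the 99.4% floor stated in the case study

// Assumed definition: the share of clinician-labelled high-acuity cases
// that the agent also flagged (recall on the high-acuity class).
function highAcuityCorrectness(cases: EvalCase[]): number {
  const highAcuity = cases.filter((c) => c.clinicianLabel === "high_acuity");
  if (highAcuity.length === 0) return 0; // no evidence: never pass the gate
  const caught = highAcuity.filter((c) => c.agentFlaggedHighAcuity).length;
  return caught / highAcuity.length;
}

// Gate used before a region goes live (and, in CI, on every PR).
function canGoLive(cases: EvalCase[]): { score: number; pass: boolean } {
  const score = highAcuityCorrectness(cases);
  return { score, pass: score >= HIGH_ACUITY_FLOOR };
}
```

Routine cases deliberately do not move this score: the gate exists to catch missed high-acuity cases, and treating an empty high-acuity set as a failure keeps the gate from passing on no evidence.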

The stack.

AI & infra

Anthropic Claude (cloud, de-id)
Llama 3 70B (on-prem, PHI)
Custom triage orchestration
Postgres + audit log table
Datadog · custom SLO dashboards

Application

Next.js 15 · TypeScript
Nurse-on-call dashboard (real-time)
FHIR R4 ingest
On-prem deploy + Terraform
Sentry · PagerDuty for SEV-1

“We hired Devmint to build an agent. What we got was an engineering team that taught ours how to think about agents in regulated work. The runbook they left behind is still the document we reference.”

— VP Engineering · Healthcare AI

Outcomes.

Across six months of production, the agent has triaged 47,302 intake forms with zero PHI incidents, an 11.2% escalation rate to nurse-on-call (the design target was 8–12%), and a 62% reduction in time-to-routing against the pre-launch baseline.

The audit log was used in a formal HIPAA review three months after launch and passed without findings. Devmint continues on a quarterly retainer running the operate-and-improve programme.
