Engagement · Fintech SaaS · Case study · 01

An AI underwriting copilot that cut review time from 4 days to 90 minutes.

Under NDA, name redacted · Fintech SaaS · Series A · United Kingdom · 10 weeks · Q1 2026
90 min: Avg review time (down from 4 business days)
312%: ARR growth · 12 mo (post-launch)
7-figure: Annual ops saved (measured at month six)
100%: Auditable decisions (every decision logged)

The brief.

A UK-based fintech lender — building credit products for small businesses — was buried in PDFs. Their underwriting team of fourteen analysts was spending an average of four business days reviewing each loan application. The bottleneck wasn't the decision itself; it was the work that came before it.

Each application required cross-referencing a 200-page internal policy manual, eighteen years of regulatory bulletins, the company's own case history, and a stack of supporting documents from the applicant. Analysts were good — but they were doing the same retrieval-and-comparison work hundreds of times a week, and the volume was about to triple following a planned product launch.

The company had tried two AI vendors in the previous twelve months. One delivered a chatbot that “summarised” policy in plausible-sounding hallucinations. The other delivered a Python script that the team described, in writing, as “actively dangerous to send anywhere near a real application.”

What we shipped.

  • Component · 01: Hybrid retrieval over 14,000 documents. Dense vectors (pgvector with bge-large embeddings) combined with BM25 lexical search, re-ranked by a cross-encoder. Dense-only retrieval failed the long tail of regulatory edge cases; hybrid recovered them.
  • Component · 02: Multi-step underwriting agent. Decomposes the application into a structured checklist, retrieves the relevant policy for each item, drafts a finding with citations, flags conflicts, and produces a final recommendation for a human to sign off on.
  • Component · 03: Auditable decision log. Every retrieval, every model call, every prompt version, every output is logged with input hashes and replay-ready inputs. The UK FCA review passed cleanly on first attempt.
  • Component · 04: Cost-and-latency router. Three models: a fast, cheap one for 70% of straightforward retrieval, a stronger one for ambiguous comparisons, and an offline batch model for nightly policy embedding refresh. Per-application unit cost is on the board pack.
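To make the hybrid retrieval idea concrete, here is a minimal sketch of one common way to merge a dense ranking with a BM25 ranking before re-ranking: reciprocal rank fusion. The fusion method, the function name, the constant k=60 and the document ids are all illustrative assumptions; this case study does not state how the two result lists were actually combined.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores 1 / (k + rank) in every list it appears in,
    so a document found by only one retriever still makes the fused list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one checklist item (ids are made up).
dense_hits = ["policy-12", "bulletin-2009-04", "case-881"]
lexical_hits = ["bulletin-2011-17", "policy-12", "case-881"]

fused = reciprocal_rank_fusion([dense_hits, lexical_hits])
print(fused)
```

In this toy example the bulletin that only the lexical search found still appears in the fused list, which is the behaviour the hybrid approach relies on for edge cases. A cross-encoder would then re-rank the top of that list.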

How we built it.

We started with a single, narrow question: what does a senior analyst actually do in those four days? Week one, our principal sat with three analysts on the floor and watched them work. The answer turned out to be roughly 70% retrieval, 20% comparison, and 10% judgment — the part you don't automate.

That decomposition mattered. It meant the right system wasn't a chatbot. It wasn't a “summariser.” It was a structured agent that did the retrieval and comparison work an analyst would do — with citations, with auditability, with a hard human checkpoint before any decision was finalised.
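As an illustration of what "auditability with a hard human checkpoint" can mean in code, the sketch below logs each step with a hash of its inputs and refuses to finalise a decision without a named reviewer. The record fields, function names and the choice of SHA-256 are assumptions made for this example, not the production schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


def input_hash(payload: dict) -> str:
    """Hash a canonical JSON form of the inputs so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class AuditRecord:
    step: str                 # e.g. "retrieval" or "draft_finding" (hypothetical names)
    prompt_version: str
    inputs: dict              # kept verbatim so the step can be replayed
    output: str
    input_sha256: str = ""
    approved_by: Optional[str] = None

    def __post_init__(self):
        if not self.input_sha256:
            self.input_sha256 = input_hash(self.inputs)

    def to_log_line(self) -> str:
        """Serialise the record as one JSON line for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)


def finalise(record: AuditRecord, reviewer: Optional[str]) -> AuditRecord:
    """Hard checkpoint: no decision is final without a named human reviewer."""
    if not reviewer:
        raise PermissionError("a human reviewer must sign off before a decision is final")
    record.approved_by = reviewer
    return record
```

Storing the raw inputs next to their hash is what makes a record replayable: the same inputs can be re-run later, and the hash shows they were not altered in between.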

We also wrote the evaluation harness in week one, before writing any prompt. The eval was 200 historical applications, each with a known correct decision and the policy passages the original reviewer cited. Every prompt change, model swap and retrieval tweak was scored against that set. By week ten, the agent matched senior-analyst decisions on 96.3% of the eval set — better than the agreement rate between senior analysts on the same set.
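A harness of this kind can be small. The sketch below scores an agent against labelled historical cases on two measures: agreement with the known decision, and how many of the originally cited policy passages the agent also cited. The class, the function names, the second metric and the stub agent are assumptions for illustration; the real harness is not shown in this case study.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Set, Tuple

# An agent takes an application id and returns (decision, cited passage ids).
Agent = Callable[[str], Tuple[str, Set[str]]]


@dataclass(frozen=True)
class EvalCase:
    application_id: str
    expected_decision: str
    cited_passages: FrozenSet[str]


def score(cases: Iterable[EvalCase], agent: Agent) -> dict:
    """Return decision agreement and mean citation recall over the eval set."""
    cases = list(cases)
    if not cases:
        raise ValueError("evaluation set is empty")
    agree = 0
    recall_total = 0.0
    for case in cases:
        decision, cited = agent(case.application_id)
        agree += decision == case.expected_decision
        expected = case.cited_passages
        # A case with no reference citations counts as fully recalled.
        recall_total += len(expected & set(cited)) / len(expected) if expected else 1.0
    n = len(cases)
    return {"decision_agreement": agree / n, "citation_recall": recall_total / n}


# Two made-up cases and a stub agent standing in for the real system.
cases = [
    EvalCase("app-1", "approve", frozenset({"p1", "p2"})),
    EvalCase("app-2", "decline", frozenset({"p3"})),
]


def stub_agent(application_id: str) -> Tuple[str, Set[str]]:
    canned = {"app-1": ("approve", {"p1"}), "app-2": ("approve", {"p3"})}
    return canned[application_id]


result = score(cases, stub_agent)
print(result)
```

Running the same function after every prompt change, model swap or retrieval tweak gives a single comparable number per change, which is the point of writing the harness before the first prompt.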

The stack.

AI & retrieval
  • Anthropic Claude · OpenAI
  • bge-large embeddings
  • pgvector + BM25 hybrid
  • Custom Python orchestration
  • Langfuse · OpenTelemetry
Application + infra
  • Next.js 15 · TypeScript
  • Postgres · Redis
  • Inngest for jobs
  • AWS (eu-west-2)
  • Sentry · Datadog

They shipped a working version in three weeks. Our previous vendor spent eight months on something worse. Devmint just operates differently.

CTO · Fintech SaaS · London

Outcomes.

The numbers in the strip at the top of this page were measured at the twelve-month anniversary of production cutover, except the ops savings figure, which was measured at month six. All of them come from the company's own analytics, not from us.

Average review time held steady at 90 minutes per application across a 4.1× increase in volume. Over the same period the underwriting team grew only from 14 to 18 analysts; the company added headcount because it kept growing, not because the system needed more humans.

Per-application unit cost (model spend, infra, share of engineering) settled at roughly 2.3% of the cost of the equivalent fully loaded analyst time. The unit economics are part of the board pack: visible, defensible, scoped.

Devmint has continued on a quarterly retainer since launch, running the operate-and-improve programme: weekly eval reviews, monthly cost reviews, and three feature releases per quarter.

Your case study next

Have an underwriting problem? Or a different one?

A 30-minute call is enough to know if we're the right team. A written proposal lands within 48 hours.