The brief.
A mid-market logistics operator's back office was drowning. Twelve people. Each morning started with several hundred PDF bills of lading from carriers, line items that didn't match the shipped manifests, and an inbox of supplier emails asking variants of “where's my truck.”
The work was structured. It was repetitive. Almost none of it required judgement. But none of the operator's tools — their WMS, their email, their accounting system — talked to each other. So the work was “humans copying numbers between systems and writing emails that follow templates,” eight hours a day, twelve people deep.
The operator didn't want to fire anyone. They wanted to redeploy them to actual logistics work — exception handling, supplier negotiation, route planning — and stop hiring against the next wave of volume.
What we shipped.
- Intake · 01 · Multi-source document parser. PDF bills of lading, scanned manifests, email attachments. Layout-aware extraction with confidence scoring. Low-confidence items queue for human review; high-confidence items go straight through.
- Reconcile · 02 · Line-item matching. Each parsed line item is matched against the expected shipped manifest in the WMS. Discrepancies are flagged with the specific delta (quantity, SKU, price), not as a generic 'mismatch'.
- Draft · 03 · Supplier-specific email tone. The response email is drafted in the right tone for each supplier: formal for the regulator-facing relationships, casual for the long-standing partners. Each supplier has a tone profile curated by the operator's ops team.
- Review · 04 · One-click approve. The whole flow lands in the operator's existing tool, Microsoft Teams, as a card with the draft, the reasoning, and a single approve button. Average human review time: 12 seconds.
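To make the reconcile step concrete, here is a minimal Python sketch of line-item matching that reports the specific delta rather than a generic mismatch. It is an illustration, not the production code: the `LineItem` and `Delta` types, the `reconcile` function, and the assumption that each SKU appears at most once per document are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    sku: str
    quantity: int
    unit_price: Decimal


@dataclass(frozen=True)
class Delta:
    sku: str
    field: str      # "sku", "quantity", or "price"
    expected: str   # value from the WMS manifest
    actual: str     # value parsed from the carrier document


def reconcile(parsed: list[LineItem], manifest: list[LineItem]) -> list[Delta]:
    """Compare parsed document lines with the expected manifest, keyed by SKU.

    Assumes (hypothetically) that a SKU appears at most once per document.
    """
    expected = {item.sku: item for item in manifest}
    seen: set[str] = set()
    deltas: list[Delta] = []

    for item in parsed:
        seen.add(item.sku)
        want = expected.get(item.sku)
        if want is None:
            # The document lists a SKU the manifest does not expect.
            deltas.append(Delta(item.sku, "sku", "absent", "present"))
            continue
        if item.quantity != want.quantity:
            deltas.append(Delta(item.sku, "quantity", str(want.quantity), str(item.quantity)))
        if item.unit_price != want.unit_price:
            deltas.append(Delta(item.sku, "price", str(want.unit_price), str(item.unit_price)))

    # SKUs the manifest expects but the document never mentions.
    for sku in sorted(expected.keys() - seen):
        deltas.append(Delta(sku, "sku", "present", "absent"))
    return deltas
```

An empty list means the document reconciles cleanly. Each `Delta` carries enough detail to name the exact quantity, SKU, or price difference in the drafted supplier email.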
How we built it.
The framing the operator's COO gave us was important: “don't build a new tool. We have enough tools.” That meant the entire system had to live inside the surfaces the team already used — Teams for review, Outlook for sending, their WMS for state. No new dashboard, no new login, no new app to learn.
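As an illustration of what living inside Teams can look like, the review card with the draft, the reasoning, and a single approve button could be expressed as an Adaptive Card payload. The sketch below builds one in Python. The function name, the card wording, and the `data` fields on the approve action are assumptions; the case study does not describe the real card layout.

```python
import json


def build_review_card(document_id: str, draft_email: str, reasoning: str) -> dict:
    """Build an Adaptive Card payload with the draft, the reasoning, and one approve action."""
    return {
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "type": "AdaptiveCard",
        "version": "1.4",
        "body": [
            {"type": "TextBlock", "text": f"Review: {document_id}", "weight": "Bolder", "wrap": True},
            {"type": "TextBlock", "text": "Draft email", "isSubtle": True},
            {"type": "TextBlock", "text": draft_email, "wrap": True},
            {"type": "TextBlock", "text": "Reasoning", "isSubtle": True},
            {"type": "TextBlock", "text": reasoning, "wrap": True},
        ],
        "actions": [
            {
                "type": "Action.Submit",
                "title": "Approve",
                # Hypothetical payload the bot would receive when the button is clicked.
                "data": {"action": "approve", "documentId": document_id},
            }
        ],
    }


if __name__ == "__main__":
    card = build_review_card(
        "BOL-0001",
        "Hello, we received 8 units of A-100 against an expected 10.",
        "Quantity on the bill of lading differs from the WMS manifest.",
    )
    print(json.dumps(card, indent=2))
```

Keeping the card to a single action matches the one-click intent: the reviewer either approves or leaves the card alone.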
The architectural answer was a quiet automation that does the work and brings only the exceptions to humans. The 12-second average review time is what makes it sustainable — the human isn't evaluating “is this right.” They're evaluating “is anything obviously wrong.”
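The exceptions-only pattern comes down to a routing decision per document. The following Python sketch shows one way to make it, assuming that a document goes straight through only when its weakest extracted field clears a confidence threshold. The 0.95 threshold, the weakest-field policy, and all names are hypothetical; the case study states only that low-confidence items queue for review and high-confidence items pass through.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical cut-off; the real threshold is not stated in the case study.
AUTO_SEND_MIN_CONFIDENCE = 0.95


class Route(Enum):
    AUTO_SEND = "auto_send"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class ParsedDocument:
    document_id: str
    field_confidences: dict[str, float]  # extraction confidence per field, 0.0 to 1.0


def route(doc: ParsedDocument, threshold: float = AUTO_SEND_MIN_CONFIDENCE) -> Route:
    """Send a document straight through only when every field is high-confidence."""
    if not doc.field_confidences:
        # Nothing was extracted, so a human has to look at it.
        return Route.HUMAN_REVIEW
    weakest = min(doc.field_confidences.values())
    return Route.AUTO_SEND if weakest >= threshold else Route.HUMAN_REVIEW
```

Gating on the weakest field rather than the average is a conservative choice: one unreadable quantity is enough to send the whole document to a reviewer.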
Of incoming documents, 97.1% are auto-reconciled and the response email is sent without any human touch. The 2.9% that go to review are the ones that actually need a human, which is also what gives the remaining 4-person team something meaningful to do.
The stack.
AI & parsing
- Anthropic Claude · OpenAI
- Custom layout-aware PDF parser
- Per-supplier tone profiles
- Confidence-scored extraction
- Langfuse for traces
Integration
- Microsoft Teams card surface
- Outlook send via Graph API
- WMS integration (Manhattan)
- Inngest for orchestration
- Postgres for audit log
“We didn't lose anyone. We redeployed eight people into work they actually wanted to do. That was the goal. The 97% auto-reconciliation rate was the math that paid for it.”
Outcomes.
Across the 90 days following production cutover, the system processed 127,420 documents. The team shrank from 12 to 4 FTEs on this workflow through redeployment, not termination. The 4 remaining people handle the 2.9% of cases that need a human, plus the higher-value exception work the operator was never able to staff before.
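For a sense of scale, the figures above can be combined into a rough daily review load. The Python sketch below is back-of-envelope arithmetic, assuming the 2.9% review rate and the 12-second average review time apply evenly across all 90 days; the function and variable names are illustrative.

```python
DOCUMENTS = 127_420      # documents processed in the 90 days after cutover
DAYS = 90
REVIEW_RATE = 0.029      # share of documents routed to a human
REVIEW_SECONDS = 12      # average human review time per card


def review_load(documents: int, days: int, review_rate: float, review_seconds: float) -> dict[str, float]:
    """Estimate how many documents reach a human and how long the approvals take."""
    reviewed = round(documents * review_rate)
    return {
        "auto_processed": documents - reviewed,
        "reviewed": reviewed,
        "reviewed_per_day": reviewed / days,
        "review_minutes_per_day": reviewed / days * review_seconds / 60,
    }


load = review_load(DOCUMENTS, DAYS, REVIEW_RATE, REVIEW_SECONDS)
print(load["reviewed"])                           # 3695
print(load["auto_processed"])                     # 123725
print(round(load["reviewed_per_day"], 1))         # 41.1
print(round(load["review_minutes_per_day"], 1))   # 8.2
```

Under these assumptions, approvals account for roughly eight minutes a day in total. That would suggest the remaining team's time goes mostly to the exception work behind the flagged cases rather than to the approval clicks themselves.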
The operator extended the engagement into a quarterly retainer for two adjacent workflows, customs documentation and supplier invoice reconciliation, currently in the build phase.