The brief.
A fast-growing D2C fashion brand was launching three new collections a month, and the team writing product descriptions, alt text and SEO copy had become the bottleneck. Each new collection meant 800–1,200 new SKUs, six locales, and a brand voice that took the founder thirty minutes per product to get right.
The team had tried generic AI writing tools and a freelancer pool. The first produced bland, off-brand copy that hurt their differentiation. The second was inconsistent — some pages read like the brand, others read like Amazon listings. Neither could keep up with the volume.
Devmint partnered with the brand's product and marketing teams to ship a content generation agent that's on-brand by default, image-conditioned, and accountable to a measurable conversion outcome.
What we shipped.
- Generation · 01: Image-conditioned copy. The agent sees the product image alongside the structured attributes (material, colour, season, occasion). Copy is generated against both, not from a SKU sheet in isolation.
- Voice · 02: Brand-voice eval. Every generated page is scored against 1,200 hand-written reference pages on tone, sentence cadence, CTA style and emotive register. Below 88 → reviewer queue. Above 92 → ships.
- Locale · 03: 6-market variants. EN, FR, ES, DE, IT and NL are not translations of each other. Each locale is generated with its own brand-voice dictionary so the tone reads native, not localised.
- Outcome · 04: Conversion-tracked output. Every generated page is tagged in the analytics layer. The brand can see, per SKU, which generated copy converted higher than its predecessor, and feed that signal back into the eval set.
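The score gate in the brand-voice eval can be pictured as a small routing function. What follows is a minimal sketch, not the production code: the `ScoredPage` type, the `route` function, the 0-100 score scale and the `"hold"` outcome are all assumptions made for illustration. Only the two thresholds (88 and 92) come from the description above, which does not say what happens to pages scoring between them.

```python
from dataclasses import dataclass

# Thresholds quoted in the brand-voice eval description.
REVIEW_BELOW = 88.0
SHIP_ABOVE = 92.0


@dataclass(frozen=True)
class ScoredPage:
    """Hypothetical record for one generated page after scoring."""
    sku: str
    locale: str
    score: float  # brand-voice score, assumed to be on a 0-100 scale


def route(page: ScoredPage) -> str:
    """Return 'ship', 'review' or 'hold' for one scored page."""
    if page.score > SHIP_ABOVE:
        return "ship"
    if page.score < REVIEW_BELOW:
        return "review"
    # ASSUMPTION: the handling of scores from 88 to 92 is not described,
    # so this sketch parks those pages instead of guessing.
    return "hold"


# Illustrative pages only; SKUs and scores are made up.
pages = [
    ScoredPage("SKU-001", "en", 95.1),
    ScoredPage("SKU-001", "fr", 84.0),
    ScoredPage("SKU-002", "de", 90.0),
]
decisions = {(p.sku, p.locale): route(p) for p in pages}
```

Scoring each locale variant separately, as the sketch does, matches the idea that the six locales are independent pieces of copy: a French page can land in the reviewer queue while the English page for the same SKU ships.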
How we built it.
The framing the founder asked for was: “don't make me write product copy at midnight ever again, and don't hurt the brand.” Those were the two non-negotiables.
The technical answer was a generation pipeline with the brand-voice eval as the spine. Without that eval, the system would've shipped bland copy at scale — the worst possible outcome. With the eval, the system was forced to match the standard the founder had been holding for years.
The 4-week A/B test was the operational outcome contract. Both arms had identical product images, identical prices, identical SKUs. The only variable was the PDP copy: human-written baseline vs. agent-generated. The agent arm showed a +19% conversion lift on assisted sessions, statistically significant at week three.
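For readers who want to see how a lift like this is typically checked, here is a minimal sketch of a pooled two-proportion z-test. The case study does not say which statistical test was used, so the choice of test is an assumption, and the session and conversion counts below are invented purely so the arithmetic lands on a 19% relative lift. They are not the brand's data.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare arm B (agent copy) against arm A (human baseline).

    Returns (relative_lift, z_score, two_sided_p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    relative_lift = (p_b - p_a) / p_a
    return relative_lift, z, p_value


# HYPOTHETICAL counts, chosen only to illustrate the calculation.
lift, z, p = two_proportion_z_test(conv_a=600, n_a=20_000, conv_b=714, n_b=20_000)
print(f"lift={lift:.1%} z={z:.2f} p={p:.4f}")
```

Note that the same relative lift can be significant or not depending on traffic: with far fewer sessions per arm, a 19% lift would not clear a conventional 0.05 threshold.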
The stack.
AI & eval
- Anthropic Claude (multimodal)
- Custom brand-voice scorer
- Per-locale dictionaries
- Langfuse for traces
- Versioned eval against 1,200 refs
Pipeline + integration
- Next.js admin surface
- Shopify integration · push to PDP
- Postgres + S3 (images)
- Inngest for batch generation
- Per-SKU conversion tagging
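The per-SKU conversion tagging is what lets generated copy be compared with the copy it replaced. A minimal sketch of that comparison, under stated assumptions: the `CopyVersionStats` record, the `feedback_candidates` function and the `min_sessions` floor are hypothetical names and values, and the real analytics schema is not described in this write-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyVersionStats:
    """Hypothetical per-SKU analytics row for one version of PDP copy."""
    sku: str
    version: str  # illustrative tag, e.g. "human-v1" or "agent-v2"
    sessions: int
    orders: int

    @property
    def conversion_rate(self) -> float:
        return self.orders / self.sessions if self.sessions else 0.0


def feedback_candidates(previous, current, min_sessions=500):
    """Return SKUs whose current copy out-converted its predecessor.

    ASSUMPTION: min_sessions is an arbitrary floor that skips SKUs with
    too little traffic to compare; the real cut-off is not documented.
    """
    prev_by_sku = {row.sku: row for row in previous}
    winners = []
    for cur in current:
        prev = prev_by_sku.get(cur.sku)
        if prev is None:
            continue  # no predecessor to compare against
        if min(prev.sessions, cur.sessions) < min_sessions:
            continue  # not enough traffic on one side
        if cur.conversion_rate > prev.conversion_rate:
            winners.append(cur.sku)
    return winners


# Made-up rows for illustration.
before = [
    CopyVersionStats("SKU-001", "human-v1", 1000, 30),
    CopyVersionStats("SKU-002", "human-v1", 1000, 40),
    CopyVersionStats("SKU-003", "human-v1", 100, 2),
]
after = [
    CopyVersionStats("SKU-001", "agent-v1", 1000, 36),
    CopyVersionStats("SKU-002", "agent-v1", 1000, 38),
    CopyVersionStats("SKU-003", "agent-v1", 100, 5),
]
to_feed_back = feedback_candidates(before, after)
```

Pages returned by a filter like this would be the natural candidates to add to the reference set the brand-voice eval scores against.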
“We launched three collections in three weeks instead of three months. The PDP copy reads like I wrote it. Two months later the conversion data made the decision for us.”
Outcomes.
Across the three-month rollout, the agent generated content for 40,000 SKUs across six locales. The A/B test ran for four weeks on a matched cohort and showed a statistically significant +19% PDP conversion lift on the agent arm.
The brand has retained Devmint on a quarterly retainer to maintain the brand-voice eval, add locales (Portuguese on roadmap for the next quarter), and tune the conversion-feedback loop.