Catalog & PDP generation agent for 40,000 SKUs.

Under NDA — name redactedE-commerceD2C brand6 weeks · 2025

11×Faster listingsvs. human writer baseline

+19%PDP conversion4-week A/B vs. control

6Locales liveEN, FR, ES, DE, IT, NL

40,000SKUs generatedacross 3 collection waves

The brief.

A fast-growing D2C fashion brand was launching three new collections a month — and bottlenecking on the team writing product descriptions, alt text and SEO copy. Each new collection meant 800–1,200 new SKUs, six locales, and a brand voice that took the founder thirty minutes per product to get right.

The team had tried generic AI writing tools and a freelancer pool. The first produced bland, off-brand copy that hurt their differentiation. The second was inconsistent — some pages read like the brand, others read like Amazon listings. Neither could keep up with the volume.

Devmint partnered with the brand's product and marketing teams to ship a content generation agent that's on-brand by default, image-conditioned, and accountable to a measurable conversion outcome.

What we shipped.

Generation · 01Image-conditioned copyThe agent sees the product image alongside the structured attributes (material, colour, season, occasion). Copy is generated against both — not from a SKU sheet in isolation.
Voice · 02Brand-voice evalEvery generated page is scored against 1,200 hand-written reference pages on tone, sentence cadence, CTA style and emotive register. Below 88 → reviewer queue. Above 92 → ships.
Locale · 036-market variantsEN, FR, ES, DE, IT, NL — not translations of each other. Each locale is generated with its own brand-voice dictionary so the tone reads native, not localised.
Outcome · 04Conversion-tracked outputEvery generated page is tagged in the analytics layer. The brand can see, per SKU, which generated copy converted higher than its predecessor — and feed that signal back into the eval set.

How we built it.

The framing the founder asked for was: “don't make me write product copy at midnight ever again, and don't hurt the brand.” Those were the two non-negotiables.

The technical answer was a generation pipeline with the brand-voice eval as the spine. Without that eval, the system would've shipped bland copy at scale — the worst possible outcome. With the eval, the system was forced to match the standard the founder had been holding for years.

The 4-week A/B test was the operational outcome contract. Both arms had identical product images, identical prices, identical SKUs. The only variable was the PDP copy: human-written baseline vs. agent-generated. +19% conversion lift on assisted sessions for the agent arm, statistically significant at week three.

The stack.

AI & eval

Anthropic Claude (multimodal)
Custom brand-voice scorer
Per-locale dictionaries
Langfuse for traces
Versioned eval against 1,200 refs

Pipeline + integration

Next.js admin surface
Shopify integration · push to PDP
Postgres + S3 (images)
Inngest for batch generation
Per-SKU conversion tagging

“We launched three collections in three weeks instead of three months. The PDP copy reads like I wrote it. Two months later the conversion data made the decision for us.”

— Founder · D2C E-commerce

Outcomes.

Across the three-month rollout, the agent generated content for 40,000 SKUs across six locales. The A/B test ran for four weeks on a matched cohort — +19% PDP conversion lift on the agent arm, statistically significant.

The brand has retained Devmint on a quarterly retainer to maintain the brand-voice eval, add locales (Portuguese on roadmap for the next quarter), and tune the conversion-feedback loop.

More work

Other recent builds.

Engagement · Internal Ops · 5 weeks

An ops automation that turned a 12-person team into a 4-person team.

Live · Retail · 12 weeks