Building in Public — Day 7: AI Agent for UNTP Credentials (Feb 19, 2026)
Day 6 was a rest day. Not a planned one — but somewhere around midday I hit a wall. The kind where you're staring at the screen and nothing useful is coming out. Between the pace of the last week and landing the Tier 2 integration earlier than expected, I decided to call it early and let the brain recover. No regrets. Day 7 was noticeably more productive for it.
The Strategy
Before diving into what we built, here's the thinking behind today's approach: get Digital Facility Records (DFR) working end-to-end first, then replicate for other credential types.
DFRs share a lot of structural DNA with Digital Product Passports — familiar territory with some added complexity around facility details and geo coordinates. The architecture was designed with replication in mind: adding a new credential type in n8n requires exactly 2 nodes (schema + rules). Everything else is shared infrastructure.
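The "exactly 2 nodes" idea can be sketched as data: each credential type contributes only its schema and rules, and everything downstream is shared. This is a minimal illustration, not the actual n8n node layout — the names (`CREDENTIAL_TYPES`, `build_system_prompt`) and placeholder strings are hypothetical.

```python
# Each credential type contributes exactly two pieces of type-specific
# knowledge: its schema and its rules. Placeholder strings stand in for
# the real content.
CREDENTIAL_TYPES = {
    "dpp": {  # Digital Product Passport
        "schema": "<DPP JSON Schema>",
        "rules": "<DPP authoring rules>",
    },
    "dfr": {  # Digital Facility Record
        "schema": "<DFR JSON Schema>",
        "rules": "<DFR authoring rules>",
    },
}

def build_system_prompt(cred_type: str) -> str:
    """Shared context builder: pull the type-specific schema + rules,
    then wrap them in infrastructure common to every credential type."""
    entry = CREDENTIAL_TYPES[cred_type]
    return "\n\n".join([entry["schema"], entry["rules"], "<shared instructions>"])
```

Adding a new credential type then means adding one entry — two values — and nothing else in the pipeline changes.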
The real monster waiting at the end of this road is DCCs (Digital Conformity Credentials) — they need custom vocabulary and schema support that goes beyond engineering into governance territory. But that's a problem for future-us.
What We Built
Unified n8n webhook endpoint for generate and chat. Instead of routing generate vs. chat requests through different n8n paths, a single webhook now handles both modes. Cleaner, less to maintain, and it set us up for the shared context builder architecture that made the rest of today possible.
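A single entry point dispatching on mode looks roughly like the sketch below. The field names (`mode`, `type`, `payload`) are assumptions about the request shape, not the project's actual contract, and the handlers are stubs.

```python
# One webhook handler for both modes: branch on "mode" after running the
# shared context builder, so generate and chat reuse the same pipeline.

def build_context(body: dict) -> dict:
    """Shared context builder used by both modes."""
    return {"credential_type": body.get("type"), "payload": body.get("payload")}

def run_generate(ctx: dict) -> str:
    return f"generated credential for {ctx['credential_type']}"

def run_chat(ctx: dict) -> str:
    return f"chat reply for {ctx['credential_type']}"

def handle_webhook(body: dict) -> dict:
    mode = body.get("mode")
    context = build_context(body)
    if mode == "generate":
        return {"mode": mode, "result": run_generate(context)}
    if mode == "chat":
        return {"mode": mode, "result": run_chat(context)}
    return {"error": f"unknown mode: {mode}"}
```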
Three-block system prompt with Anthropic prompt caching. The schema and rules block alone runs to roughly 5,000 tokens, which is expensive if you're paying for it on every chat message. With prompt caching, that block gets cached across requests — confirmed cache hits in Anthropic's usage stats — which cuts input token costs by roughly 90% for chat interactions. The structure is simple: a cached prefix (schema + rules), a small middle block that varies by action type, and dynamic context at the end.
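In Anthropic's Messages API this maps to a `system` array of text blocks with a `cache_control` marker on the last stable block — everything up to and including that block is cached. A minimal sketch, with placeholder block contents and a hypothetical builder function:

```python
# Three-block system prompt for Anthropic prompt caching. The cache_control
# marker on the first block caches the stable prefix (schema + rules);
# the action block and dynamic context stay uncached.

SCHEMA_AND_RULES = "<~5k tokens of credential schema and authoring rules>"

def build_system_blocks(action: str, dynamic_context: str) -> list[dict]:
    action_instructions = {
        "generate": "Produce a complete credentialSubject from the inputs.",
        "chat": "Propose targeted edits to the current credentialSubject.",
    }[action]
    return [
        # Block 1 — stable prefix, cached across requests.
        {"type": "text", "text": SCHEMA_AND_RULES,
         "cache_control": {"type": "ephemeral"}},
        # Block 2 — small block that varies by action type.
        {"type": "text", "text": action_instructions},
        # Block 3 — fully dynamic context, appended last.
        {"type": "text", "text": dynamic_context},
    ]
```

The list is passed as the `system` parameter of `client.messages.create(...)`; cache hits show up as `cache_read_input_tokens` in the response usage.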
Split-panel AI editor. Left side shows a live JSON diff with per-field Accept/Reject buttons. Right side is the chat interface. Users can cherry-pick individual field changes from the AI's proposals without accepting everything wholesale. It turned out cleaner than expected — the diff auto-clears as fields are accepted.
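The per-field diff driving the Accept/Reject buttons reduces to two small operations: compute the fields where the proposal differs, and apply one accepted field at a time. A simplified sketch over flat dicts (the real credentialSubject is nested; function names are illustrative):

```python
# Per-field diff for the split-panel editor: recomputing the diff after each
# accept is what makes it auto-clear field by field.

def diff_fields(current: dict, proposed: dict) -> dict:
    """Fields where the AI's proposal differs from the current document."""
    return {k: v for k, v in proposed.items() if current.get(k) != v}

def accept_field(current: dict, proposed: dict, field: str) -> dict:
    """Accept a single proposed change; returns the updated document."""
    updated = dict(current)
    updated[field] = proposed[field]
    return updated
```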
Tier 2 validation wired into chat. Hit "Run Tier 2 Tests" and any validation errors are sent directly to the AI, which proposes targeted fixes. In generate mode, the AI self-validates its output up to 3 times before returning, with a circuit breaker to prevent infinite loops. In chat mode, validation is deliberately manual: users decide when it runs, which saves a full API round-trip per message and keeps them in control of the workflow.
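The generate-mode loop is a standard validate-and-retry pattern. A sketch under stated assumptions — `generate` and `validate` stand in for the real model call and Tier 2 test run, and `MAX_ATTEMPTS` mirrors the 3-attempt circuit breaker:

```python
# Self-validation loop for generate mode: feed validation errors back into
# the next generation attempt, and break the circuit after 3 tries.

MAX_ATTEMPTS = 3

def generate_with_validation(generate, validate):
    """generate(errors) -> candidate JSON; validate(candidate) -> error list."""
    errors: list = []
    candidate = None
    for _ in range(MAX_ATTEMPTS):
        candidate = generate(errors)   # errors from the previous pass, if any
        errors = validate(candidate)
        if not errors:
            return candidate, []       # valid: return early
    # Circuit breaker tripped: surface the remaining errors to the user.
    return candidate, errors
```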
Why n8n?
Two reasons we're building the AI workflow layer in n8n rather than baking it directly into the application code.
First, iteration speed. Prompts need tuning. A/B testing different approaches, adjusting instructions, tweaking the system prompt structure — all of that happens without pushing new code to the server. We can make a change, test it, and move on in minutes instead of going through a full deploy cycle.
Second, shareability. n8n has a large practitioner community, and workflows are easy to export and share. As we build this out, we can put our work directly into the hands of people who speak the same language — no translation layer required.
Wins
The biggest simplification came from consolidating per-type n8n routing into shared context builders. We started with 5 nodes per credential type and ended up at 2. The type-specific knowledge lives in exactly two places — schema and rules — not scattered across the pipeline.
The per-field accept/reject UX is genuinely useful. Being able to see exactly what the AI wants to change and selectively accept it makes the editing workflow feel collaborative rather than all-or-nothing.
Also caught a sneaky bug: chat mode was missing the entire schema + rules block from the system prompt. The AI was still giving reasonable answers from conversation context alone — which says something about how capable these models are — but it's also a good reminder that missing-context bugs can hide in plain sight for a while.
Learnings
Don't over-specialize early. We kept consolidating as we realized type-specific knowledge belongs in two places, not scattered everywhere. Start general, specialize only where it actually matters. This applies to n8n workflow design as much as anything else.
Three-block system prompts are the sweet spot for prompt caching when you have action-variant instructions. Stable cached prefix, small variant middle block, dynamic context at the end.
Not all credential types are the same kind of problem. DTEs (Digital Traceability Events) are structured relationship records — deterministic forms probably serve them better than AI generation. DCCs are the real challenge ahead, requiring governance expert input alongside the engineering work.
Where We Left Off
Text and semantic generation are working well end-to-end. The AI produces well-structured credentialSubject JSON that passes UNTP schema validation. Chat refinement is tested and working.
Next up: wiring in document uploads — PDFs, evidence files through Directus/S3 — so the AI can read reference documents and generate richer conformity claims based on real evidence rather than structured inputs alone.
Following along? The sprint continues. More updates as we build.