Beaumont Leadership Deck — Agent Stories · 2026-06-30

Story 1 of 2
Voice Engine — 6-Agent Writing Pipeline
A multi-agent system that makes every piece sound like me, not like AI — with a measurable bar for failure.
Pipeline diagram: INPUT (brief: role + background) → SCOUT (company research, pre-brief grounding) → SCRIBE (two-pass drafter, Claude, voice fingerprint) → INSPECTOR (deterministic gate, pure Python, no AI; fail → back to Scribe) → CRITIC (AI-tell reviewer, Gemini) → WEAVER (structure, linkage) → MILLER (StoryBrand gate, persuasive only) → OUTPUT (published: 15 pieces shipped; 4 CLs, 7 LinkedIn, 4 docs). Cross-model: Gemini reviews Claude.

Slide Bullets

  • Scout grounds every piece in real company research before Scribe writes a word — fixes the "generic AI output" problem at the source
  • Scribe drafts against a voice fingerprint derived from a 4-year writing corpus: measurable patterns (avg sentence length, hedging rate, claim density), not adjectives
  • Inspector is pure Python — forbidden words, sentence length, em-dash count — deterministic gate fires before any AI review runs
  • Critic runs on Gemini reviewing Claude's output: cross-model adversarial review catches the specific tells each model misses in its own drafts
  • Miller gates persuasive pieces against StoryBrand structure — hero, stakes, guide, plan, CTA — only when intent: persuasive
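
The Inspector bullet above names three deterministic checks: forbidden words, sentence length, and em-dash count. A minimal sketch of such a gate follows. The word list, both thresholds, and the names `inspect` and `GateResult` are illustrative assumptions, not the values or code the real Inspector uses.

```python
import re
from dataclasses import dataclass, field

# Illustrative assumptions only: the real word list and limits are not in the deck.
FORBIDDEN_WORDS = {"delve", "leverage", "synergy"}
MAX_AVG_SENTENCE_WORDS = 22
MAX_EM_DASHES = 2
EM_DASH = "\u2014"


@dataclass
class GateResult:
    passed: bool
    failures: list[str] = field(default_factory=list)


def inspect(text: str) -> GateResult:
    """Run the three deterministic checks; no model calls involved."""
    failures = []

    # Check 1: forbidden words (case-insensitive, whole words).
    words = re.findall(r"[A-Za-z']+", text.lower())
    hits = sorted(set(words) & FORBIDDEN_WORDS)
    if hits:
        failures.append(f"forbidden words: {', '.join(hits)}")

    # Check 2: average sentence length in words.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    if sentences:
        avg = sum(len(s.split()) for s in sentences) / len(sentences)
        if avg > MAX_AVG_SENTENCE_WORDS:
            failures.append(f"average sentence length {avg:.1f} words")

    # Check 3: em-dash count.
    dashes = text.count(EM_DASH)
    if dashes > MAX_EM_DASHES:
        failures.append(f"{dashes} em-dashes")

    return GateResult(passed=not failures, failures=failures)
```

A failing result would send the draft back to Scribe with `failures` attached; a passing one would move it on to Critic.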

Proof Points

  • 15 pieces shipped through the full pipeline: 4 cover letters, 7 LinkedIn rewrites, 4 articles and internal docs
  • Every piece clears 4–5 sequential agent gates before it ships: 60–75 gate evaluations across the 15 pieces
  • Bar is binary: if Jeff rewrites >30% of words, the engine has failed V1. Not "feels better" — a hard number set before Phase 1 began
  • outcome-architect issued a Phase 1 authorization at 95% confidence after confirming that all four failure modes were addressed in the architecture
Origin: A NerdWallet cover letter missed Riskalyze and RightCapital entirely — two platforms central to the role. Generic AI output with no contextual grounding. That failure became the spec.
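
The 30% rewrite bar in the proof points can be checked mechanically. The deck states only the threshold, so the measurement below is an assumption: it counts a draft word as rewritten when a word-level diff (Python's `difflib`) cannot match it in the final text. The names `rewrite_ratio` and `engine_passed` are hypothetical.

```python
from difflib import SequenceMatcher

REWRITE_LIMIT = 0.30  # threshold stated in the deck


def rewrite_ratio(draft: str, final: str) -> float:
    """Fraction of draft words that did not survive into the final text."""
    draft_words = draft.split()
    final_words = final.split()
    if not draft_words:
        return 0.0
    matcher = SequenceMatcher(a=draft_words, b=final_words, autojunk=False)
    # Matching blocks are runs of words kept in order; the last block is a size-0 sentinel.
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - kept / len(draft_words)


def engine_passed(draft: str, final: str) -> bool:
    """True when the editor rewrote no more than 30% of the draft's words."""
    return rewrite_ratio(draft, final) <= REWRITE_LIMIT
```

Other definitions are possible (character-level diffs, or counting inserted words as well); whichever one is used should be fixed in advance so the bar stays binary.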

Story 2 of 2
Ship Pipeline — 4-Agent Code Quality Gate
A non-developer shipping production code through a structured 4-agent pipeline — with a security gate that literally blocks the deploy command if it doesn't pass.
Pipeline diagram: OUTCOME-ARCHITECT (ODF audit + /goal auth block; no build without this pass) → GEMINI-RESEARCHER (codebase exploration + architecture plan; maps before building) → GEMINI-BUILDER (implementation, 80–90% of all code; Claude orchestrates, Jeff reviews) → SECURITY-REVIEWER (OWASP, auth, secrets, CVE scan, injection; pre-ship, every time) → DEPLOY GATE (pip-audit, evals; hook intercepts; exit 2 = deploy blocked, fix required) → on pass: DEPLOYED, LIVE (Modal, Cloudflare; deploy-tested each ship; auth, headers, CSP confirmed).

Slide Bullets

  • outcome-architect runs an ODF audit before any code is written — issues a /goal authorization block; build cannot start without it (voice-engine cleared at 95% confidence)
  • gemini-researcher maps the codebase and delivers an architecture plan; gemini-builder handles 80–90% of all implementation — Claude orchestrates, Jeff reviews and approves
  • gemini-security-reviewer audits every deploy: OWASP Top 10, auth patterns, secrets exposure, injection risks, dependency CVEs
  • The deploy gate is literal: a PreToolUse(Bash) hook intercepts every modal deploy command and exits with code 2, blocking the deploy, if pip-audit or evals fail
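
The deploy gate bullet above can be sketched as a small hook script. This is a minimal sketch under stated assumptions, not the project's actual hook: it assumes the hook receives the pending tool call as JSON on stdin with the shell command under `tool_input.command`, and that exit code 2 blocks the call. The evals command in `CHECKS` is hypothetical, as are the names `gate` and `main`.

```python
import json
import subprocess
import sys

# Assumed check commands; the evals entry point is a hypothetical placeholder.
CHECKS = [
    ["pip-audit"],
    ["python", "-m", "pytest", "evals"],
]


def is_deploy(command: str) -> bool:
    """True when the shell command invokes `modal deploy`."""
    return "modal deploy" in command


def gate(command: str, run=subprocess.run) -> int:
    """Return 0 to allow the command, 2 to block it."""
    if not is_deploy(command):
        return 0
    for check in CHECKS:
        if run(check).returncode != 0:
            print(f"deploy blocked: {' '.join(check)} failed", file=sys.stderr)
            return 2
    return 0


def main(stdin=sys.stdin) -> int:
    """Read the hook event (assumed JSON shape) and return the exit code."""
    event = json.load(stdin)
    command = event.get("tool_input", {}).get("command", "")
    return gate(command)

# A hook script would end with: sys.exit(main())
```

Passing `run` as a parameter keeps the decision logic testable without actually invoking pip-audit or the evals.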

Proof Points

  • 15 security vulnerabilities closed on SearchOps in one project: 6 High, 9 Medium — all caught before production
  • 29 CVEs surfaced and cleared by an automated dependency scan; 7 more resolved in a single session (starlette, python-multipart, requests)
  • Three-agent quality gate ran clean on SearchOps v1 public launch: outcome-architect PASS, gemini-researcher 42/42 tests passing with no issues, gemini-security-reviewer zero findings
  • SearchOps shipped solo in ~15 sessions: 4-layer scoring engine, 63 passing tests, full security audit, production-deployed on Modal
  • 232× DB query improvement on hot paths, 19–24× on dashboard counts — surfaced by gemini-researcher before performance became a production problem