Dogfood findings — v4.6 verification¶
Self-critique of design-ai v4.6 after running an end-to-end realistic scenario: bootstrap a Korean B2B SaaS HR onboarding flow.
Output produced: examples/cases/dogfood-v4-korean-hr-onboarding.md.
This document captures what worked since v3, what gaps emerged, and the fixes shipped in this dogfood pass.
What worked well¶
1. v4.5 family-completed specs paid off¶
Three families I exercised end-to-end in the deliverable:
- Form family — FormControl / FormLabel / FormHelperText cited directly. The aria-invalid + aria-describedby contract was right there in the spec; I didn't have to derive it.
- Dialog family — DialogTitle.id ↔ Dialog.aria-labelledby, DialogContentText.id ↔ Dialog.aria-describedby. The contract is explicit in three separate specs that cross-reference each other. v3 made me research this.
- List family — ListItem with secondaryAction vs ListItemButton for clickable rows. The boundary is documented; I didn't conflate them as I did in v3 dogfood.
The polish-effort difference was visible: when a primitive's spec answered my question without me thinking, I composed faster.
2. v4.4 TS-AST extractor produced honest scaffolds¶
When I needed Select props I checked examples/component-select.md; for LoadingButton I checked nothing existed (gap, see below). The 21 v4.5 drafts with the DRAFT banner + accurate API table were useful as references even unpolished — the API table answers "what props exist" and that's the question I actually asked.
The DRAFT banner was honest: I didn't quote anatomy / tokens from drafts, only the API.
3. Korean knowledge held up across contexts¶
korean-typography.md line-height bump (10%) applied without thinking. korean-conversational-conventions.md 해요체 vs 합쇼체 gave me the register decision (B2B onboarding → 해요체).
The KR-specific knowledge files compose naturally — I didn't have to glue them.
4. /stability-review dogfooded itself¶
Running python3 tools/audit/stability-review.py --today 2026-12 in the middle of the dogfood was a real test of v4.6. It executed correctly, surfaced one false positive (see fix #2 below), and the report format was actionable.
5. Single audit runner saved time¶
npm run audit (one command, 6 audits, 0.8s) ran during dogfood without context-switch. v3.x's six-separate-commands cost mental overhead.
What's missing / what broke¶
1. LoadingButton has no spec¶
Used in two places in the deliverable (form submit, dialog confirm) — but no examples/component-loading-button.md exists. The mui-x/x-loading-button is canonical; should be in v4.x coverage.
Action shipped: Generate via v2 extractor.
2. knowledge/COVERAGE.md flagged by stability-review as false positive¶
The generated index file lacks stability: because it's auto-regenerated. The stability tool flagged it correctly per its rules, but adopters will reasonably skip it. Either:
- Skip generated artifacts in stability-review.py, OR
- Have the coverage generator emit stability: stable (the file IS as-fresh-as-last-regen).
Action shipped: Skip generated artifacts (path-based: knowledge/COVERAGE.md).
3. No knowledge file specifically for "B2B onboarding form patterns"¶
There's knowledge/patterns/form-design.md (general), knowledge/patterns/onboarding.md (general). Neither covers B2B-specific decisions like:
- Auto-save on blur (vs explicit save button).
- Multi-step pacing (3-5 steps vs 7+).
- Sensitive data fields (주민등록번호, 통장 사본) — required disclosures.
- Bilingual hire flows (KR primary + EN toggle).
Action shipped: Add knowledge/patterns/b2b-onboarding-flows.md capturing the decisions surfaced in this dogfood.
4. LoadingButton use-case not in skills¶
design-system-builder skill doesn't invoke a "loading state pattern" sub-skill. When I needed an async-aware button I had to know it from MUI knowledge. A skill or knowledge file for "Async control patterns" would help.
Action: Add to roadmap (low priority; the inline knowledge in component-button.md covers the common case).
5. No "Korean B2B SaaS" entry in palettes-by-product-type.md¶
I had to derive the muted-teal seed myself (matching trust + sensitive data). A row in the curated palette table for "Korean B2B SaaS — sensitive data" with example seeds would scaffold this.
Action shipped: Add row to knowledge/colors/palettes-by-product-type.md.
6. v4.5 drafts mention "polished" specs in cross-refs but those don't always exist yet¶
E.g., component-form-helper-text.md (DRAFT) cross-refs component-form-control.md (polished — exists ✓) but several other drafts cross-ref polished sub-component specs that haven't been polished yet. Cross-ref consistency check might surface these.
Action: Future audit (Phase 39+); not blocking.
7. The v4.5 polished specs assume OutlinedInput exists; check that¶
Form spec uses <OutlinedInput>. Verified: examples/component-outlined-input.md exists (v4.5 draft). Cross-ref pattern works but the draft banner makes the user uncertain whether the API table can be trusted. The TS-AST extractor's accuracy claim should be repeated in each draft's banner.
Action shipped: Update v2 spec template banner to explicitly state "API table is AST-extracted and accurate; narrative sections are placeholders".
What's missing — declined to fix this pass¶
These surfaced but are out-of-scope for v4.7:
- Korean B2B layout density patterns — KR B2B apps tend to be denser than Western counterparts. A
knowledge/patterns/korean-density-conventions.mdwould help. Defer to v4.8+. - A "spec-component-from-mui-x" extractor — MUI X (DataGrid, DatePicker, etc.) lives in a different package. Current v2 extractor doesn't search there. Roadmap.
- Step-progress component family — used in onboarding but already partially covered. Polish backlog.
Time comparison¶
| Phase | v3 dogfood (fintech) | v4 dogfood (HR) |
|---|---|---|
| Brief → palette + tokens | ~12 min | ~6 min (palette skill polished + Korean conventions baked in) |
| First component spec | ~15 min (had to invent FormControl composition) | ~5 min (cited 5 family-completed specs) |
| Confirmation dialog | ~10 min | ~3 min |
| UX audit | ~8 min | ~5 min |
| Findings doc | ~10 min | (this doc) |
| Stability review | (didn't exist) | <1 min |
v4 ~3-5x faster on Form/Dialog/List heavy work because the spec corpus answered the questions instead of me re-deriving them.
What I shipped this dogfood¶
Fixes applied during the v4.7 commit:
- ✅ Generated
examples/component-loading-button.mdvia v2 extractor. - ✅
tools/audit/stability-review.pyskips generated artifacts (path-based skip-list). - ✅ Added
knowledge/patterns/b2b-onboarding-flows.md. - ✅ Added "Korean B2B SaaS — sensitive data" row in
knowledge/colors/palettes-by-product-type.md. - ✅ Updated v2 spec template banner with explicit accuracy claim.
What this validates¶
- v4.0 graduation was correct. The 8 stable surfaces I exercised (skills / commands / knowledge / examples / CLI / agents / docs / audits) all held up.
- v4.5 family completion was the right call. The 8 polished specs returned 3-5x productivity on form/dialog/list work.
- v4.6 stability automation works. One real false positive surfaced and got fixed.
What this does NOT validate¶
- VS Code extension under real load (didn't use it during this dogfood — adopter would).
- npm install path on a fresh machine (would need a clean clone test).
- Multi-language doc site rendering (mkdocs build was last verified at v3.12 release).
These belong in a separate install / e2e test.
Cross-reference¶
docs/DOGFOOD-FINDINGS.md— v1 dogfood (Korean fintech, ~v1.0)examples/cases/dogfood-v4-korean-hr-onboarding.md— the deliverable from this rundocs/SESSION-LOG.md— v2 → v4 narrative