ADR-016: Markdown as Source of Truth for Dual-Format System
Date: 2025-11-10
Status: ✅ Accepted
Deciders: CTO (Claude Code AI), CEO
Tags: #documentation #architecture #source-control
Context
The Pyramid Architecture requires a dual-format system:
- Markdown (MD): Compressed, AI-optimized, minimal tokens
- HTML: Verbose, human-friendly, examples and diagrams
Question: Which format should be the authoritative source?
Options:
1. MD as source → Generate HTML via build system
2. HTML as source → Generate MD via build system
3. Both independent → Manually keep in sync
4. Database as source → Generate both from DB
Decision
Markdown files are the single source of truth. HTML is generated automatically from MD via the build system.
Format:
- Source: Documentation/1-CONTEXT/*.md (version controlled in Git)
- Generated: Documentation/HTML/*.html (ephemeral, not committed)
- Build command: python3 Documentation/.meta/build/scripts/build.py
Content Flow:
MD (source, Git) → Build System → HTML (generated, not in Git)
                        ↑
              Expansions (.expansions.yml)
Rationale
Why MD as Source?
Git-Friendly:
- Plain text diffs work perfectly
- Merge conflicts are manageable
- History tracking is clean
- Code review is straightforward
AI-Editable:
- Claude Code can directly edit MD files
- No parsing/rendering complexity
- Fast iterations
- No tool dependencies
Build System Benefits:
- MD → HTML is well-established (markdown library)
- HTML → MD is lossy and error-prone
- One-way builds are simpler than bidirectional sync
Token Budget Control:
- MD files stay compressed
- Token count is measurable (file size ÷ 4)
- Pre-commit hook can validate before commit
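The file size ÷ 4 heuristic can be sketched as a small check. This is a minimal illustration, not the project's actual hook: the function names and the choice of UTF-8 byte length as "file size" are assumptions, and the 50,000-token default simply mirrors the budget quoted under Validation.

```python
from pathlib import Path

BYTES_PER_TOKEN = 4  # heuristic from this ADR: file size ÷ 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count as UTF-8 byte length divided by 4."""
    return len(text.encode("utf-8")) // BYTES_PER_TOKEN


def total_tokens(md_dir: Path) -> int:
    """Sum the estimates for every *.md file directly inside md_dir."""
    return sum(
        estimate_tokens(path.read_text(encoding="utf-8"))
        for path in sorted(md_dir.glob("*.md"))
    )


def within_budget(tokens: int, budget: int = 50_000) -> bool:
    """True when the estimated total fits the (assumed) budget."""
    return tokens <= budget
```

A hook could call `total_tokens(Path("Documentation/1-CONTEXT"))` and reject the commit when `within_budget` returns False. The estimate is only as good as the ÷ 4 heuristic; a real tokenizer would give different numbers.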
Why NOT HTML as Source?
- Not Git-friendly (large, complex diffs)
- Hard for AI to edit (parsing HTML is complex)
- HTML → MD conversion loses information
- Token budget impossible to enforce
Why NOT Both Independent?
- Manual sync is error-prone
- Divergence is inevitable
- Double the maintenance burden
- No clear authority on conflicts
Why NOT Database as Source?
- Adds unnecessary complexity
- Version control becomes harder
- Build system more complex
- No token budget visibility
Consequences
Positive
- Single Source of Truth: No ambiguity, MD is always authoritative
- Git-Native: Perfect integration with version control
- AI-Friendly: Claude Code can edit directly
- Build Automation: HTML regenerates automatically
- Token Budget Enforceable: Pre-commit hook validates MD file sizes
- Fast Iteration: Edit MD, rebuild HTML in seconds
Negative
- Build Step Required: Can't just open HTML without building
- Generated Files: HTML must be rebuilt after MD changes
- Deployment Dependency: Need build system in deployment pipeline
Mitigations
- Auto-Build: Build system runs automatically via pre-commit hook
- Fast Builds: Entire build takes <5 seconds
- CI/CD Integration: Can automate builds on push to main
Implementation
Current State:
- ✅ All MD files in Documentation/1-CONTEXT/
- ✅ Build system operational (build.py)
- ✅ HTML generated to ../HTML/
- ✅ .gitignore excludes generated HTML
- ✅ Pre-commit hook validates and rebuilds
Build Process:
# Manual build
python3 Documentation/.meta/build/scripts/build.py
# Automatic (pre-commit hook)
# Runs on every commit to Documentation/
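A hook along the following lines would cover the automatic path. It is a sketch under assumptions rather than the hook actually installed: it assumes the hook is a Python script, that staged files are listed with `git diff --cached --name-only`, and that a failing build should abort the commit. The function names are hypothetical, and the validation step is omitted.

```python
import subprocess
import sys

DOCS_PREFIX = "Documentation/"
BUILD_SCRIPT = "Documentation/.meta/build/scripts/build.py"


def touches_docs(paths: list[str]) -> bool:
    """True if any staged path lies under Documentation/."""
    return any(path.startswith(DOCS_PREFIX) for path in paths)


def staged_paths() -> list[str]:
    """Ask Git for the files staged in the current commit."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def main() -> int:
    """Rebuild the HTML only when documentation files are staged."""
    if not touches_docs(staged_paths()):
        return 0  # nothing to rebuild
    # A non-zero exit code from the build aborts the commit.
    return subprocess.run([sys.executable, BUILD_SCRIPT]).returncode
```

Saved as an executable `.git/hooks/pre-commit` script, the file would end with `sys.exit(main())` so that Git sees the build's exit code.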
Expansion System:
- MD contains directive comments: <!-- EXPAND:type:key -->
- Expansion data in: *.expansions.yml files (same directory as MD)
- Build system injects expansions into HTML only
- MD stays compressed
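The injection step can be illustrated with a short sketch. It assumes the directive comments survive into the generated HTML and that the `.expansions.yml` data has already been parsed into a dictionary keyed by `(type, key)`; the real file layout is not specified here, so that shape, the function name, and the sample data are all hypothetical.

```python
import re

# Matches directives such as <!-- EXPAND:example:login-flow -->
DIRECTIVE = re.compile(r"<!--\s*EXPAND:([\w-]+):([\w-]+)\s*-->")


def inject_expansions(html: str, expansions: dict[tuple[str, str], str]) -> str:
    """Replace each directive with its expansion; leave unknown ones untouched."""

    def substitute(match: re.Match) -> str:
        kind, key = match.group(1), match.group(2)
        return expansions.get((kind, key), match.group(0))

    return DIRECTIVE.sub(substitute, html)


# Hypothetical sample data, for illustration only.
page = "<p>Login uses tokens.</p>\n<!-- EXPAND:example:login-flow -->"
data = {("example", "login-flow"): "<pre>POST /login</pre>"}
print(inject_expansions(page, data))
```

Because the substitution runs on the HTML output only, the MD source keeps the one-line directive and its token count is unaffected.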
Alternatives Considered
Alternative 1: HTML as Source, Generate MD
Rejected because:
- HTML → MD is lossy (formatting, structure lost)
- Not Git-friendly (complex diffs)
- Hard for AI to edit
- Token budget impossible to enforce
Alternative 2: Both Independent, Manual Sync
Rejected because:
- Inevitable divergence
- Double maintenance burden
- Human error in keeping formats in sync
- No clear authority on conflicts
Alternative 3: AsciiDoc or reStructuredText as Source
Rejected because:
- Less common (smaller ecosystem)
- More complex syntax than Markdown
- Tooling not as mature
- No significant benefit over Markdown
Alternative 4: YAML or JSON as Source
Rejected because:
- Not human-readable for long content
- Poor diff experience
- Not designed for prose
- Would need custom tooling
Validation
Source Control: ✅
- MD files tracked in Git
- Clean diffs for code review
- Merge conflicts manageable
AI Editing: ✅
- Claude Code edits MD directly
- Fast iterations (no parsing needed)
Build System: ✅
- 5-second build time
- Consistent output
- Zero errors
Token Budget: ✅
- 11,017 tokens current (22% of 50K budget)
- Enforceable via pre-commit hook
Review Schedule
- Next Review: After 6 months of usage
- Review Trigger: If MD → HTML builds become problematic
- Success Criteria: Zero divergence between MD and HTML
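One way to watch the zero-divergence criterion is to flag HTML that is missing or older than its source. The sketch below assumes one HTML file per MD file with the same stem, which this ADR does not state, and the function name is hypothetical. A modification-time comparison only detects stale output; rebuilding and diffing would be a stronger check.

```python
from pathlib import Path


def stale_pages(md_dir: Path, html_dir: Path) -> list[str]:
    """Return stems of MD files whose HTML is missing or older than the source."""
    stale = []
    for md in sorted(md_dir.glob("*.md")):
        html = html_dir / f"{md.stem}.html"
        if not html.exists() or html.stat().st_mtime < md.stat().st_mtime:
            stale.append(md.stem)
    return stale
```

An empty list from `stale_pages(Path("Documentation/1-CONTEXT"), Path("Documentation/HTML"))` would indicate that every page has been rebuilt since its source last changed.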
Notes
This decision is fundamental to the dual-format system. By making MD the source of truth, we ensure:
- Git-friendly workflow
- AI-editable documentation
- Enforceable token budgets
- Automated HTML generation
The expansion system (.expansions.yml files) allows us to keep MD compressed while making HTML verbose, solving the dual-audience problem elegantly.
CEO's explicit guidance: "I think the MD version should be the source of truth. This will allow you to get out important details and then embellish them with human readable nuances and examples."