ADR-016: Markdown as Source of Truth for Dual-Format System

Date: 2025-11-10
Status: ✅ Accepted
Deciders: CTO (Claude Code AI), CEO
Tags: #documentation #architecture #source-control

Context

The Pyramid Architecture requires a dual-format system:
- Markdown (MD): Compressed, AI-optimized, minimal tokens
- HTML: Verbose, human-friendly, examples and diagrams

Question: Which format should be the authoritative source?

Options:
1. MD as source → Generate HTML via build system
2. HTML as source → Generate MD via build system
3. Both independent → Manually keep in sync
4. Database as source → Generate both from DB

Decision

Markdown files are the single source of truth. HTML is generated automatically from MD via the build system.

Format:
- Source: Documentation/1-CONTEXT/*.md (version controlled in Git)
- Generated: Documentation/HTML/*.html (ephemeral, not committed)
- Build command: python3 Documentation/.meta/build/scripts/build.py

Content Flow:

MD (source, Git) → Build System → HTML (generated, not in Git)
                 ↓
              Expansions (.expansions.yml)
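
The one-way flow above implies a fixed mapping from each source file to its generated output and its expansions sidecar. A minimal sketch of that mapping follows. The directory names come from this ADR, but the helper functions and the `<stem>.expansions.yml` naming are assumptions for illustration, not the actual contents of build.py.

```python
from pathlib import Path

# Directory names are taken from this ADR; the helpers below are hypothetical.
SOURCE_DIR = Path("Documentation/1-CONTEXT")
OUTPUT_DIR = Path("Documentation/HTML")


def html_path_for(md_path: Path) -> Path:
    """Map a source .md file to the generated .html file it produces."""
    relative = md_path.relative_to(SOURCE_DIR)
    return OUTPUT_DIR / relative.with_suffix(".html")


def expansions_path_for(md_path: Path) -> Path:
    """Locate the sidecar expansions file next to the MD file.

    Assumed naming: <stem>.expansions.yml in the same directory.
    """
    return md_path.with_name(md_path.stem + ".expansions.yml")
```

Because the mapping is a pure function of the source path, the HTML directory can be deleted and regenerated at any time without losing information.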

Rationale

Why MD as Source?

Git-Friendly:
- Plain text diffs work perfectly
- Merge conflicts are manageable
- History tracking is clean
- Code review is straightforward

AI-Editable:
- Claude Code can directly edit MD files
- No parsing/rendering complexity
- Fast iterations
- No tool dependencies

Build System Benefits:
- MD → HTML is well-established (markdown library)
- HTML → MD is lossy and error-prone
- One-way builds are simpler than bidirectional sync

Token Budget Control:
- MD files stay compressed
- Token count can be estimated from file size (file size ÷ 4)
- Pre-commit hook can validate the budget before each commit
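
The file size ÷ 4 rule can be sketched in a few lines. This is an illustrative estimate only: the function names are hypothetical, and dividing by four is the heuristic stated in this ADR, not an exact tokenizer count.

```python
from pathlib import Path

BYTES_PER_TOKEN = 4  # heuristic from this ADR: file size ÷ 4


def estimate_tokens(size_bytes: int) -> int:
    """Estimate a token count from a size in bytes (integer division)."""
    return size_bytes // BYTES_PER_TOKEN


def estimate_file_tokens(path: Path) -> int:
    """Estimate the token count of one file from its size on disk."""
    return estimate_tokens(path.stat().st_size)
```

For example, `estimate_tokens(4_000)` returns `1000`. Since the estimate depends only on file size, it is cheap enough to run on every commit.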

Why NOT HTML as Source?

Not Git-friendly (large, complex diffs)
Hard for AI to edit (parsing HTML is complex)
HTML → MD conversion loses information
Token budget impossible to enforce

Why NOT Both Independent?

Manual sync is error-prone
Divergence would occur inevitably
Double the maintenance burden
No clear authority on conflicts

Why NOT Database as Source?

Adds unnecessary complexity
Version control becomes harder
Build system more complex
No token budget visibility

Consequences

Positive

Single Source of Truth: No ambiguity, MD is always authoritative
Git-Native: Perfect integration with version control
AI-Friendly: Claude Code can edit directly
Build Automation: HTML regenerates automatically
Token Budget Enforceable: Pre-commit hook validates MD file sizes
Fast Iteration: Edit MD, rebuild HTML in seconds

Negative

Build Step Required: Can't just open HTML without building
Generated Files: HTML must be rebuilt after MD changes
Deployment Dependency: Need build system in deployment pipeline

Mitigations

Auto-Build: Build system runs automatically via pre-commit hook
Fast Builds: Entire build takes <5 seconds
CI/CD Integration: Can automate builds on push to main

Implementation

Current State:
- ✅ All MD files in Documentation/1-CONTEXT/
- ✅ Build system operational (build.py)
- ✅ HTML generated to ../HTML/
- ✅ .gitignore excludes generated HTML
- ✅ Pre-commit hook validates and rebuilds

Build Process:

# Manual build
python3 Documentation/.meta/build/scripts/build.py

# Automatic (pre-commit hook)
# Runs on every commit to Documentation/
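
The automatic path could look roughly like the sketch below: the hook checks whether any staged path lies under Documentation/ and, if so, runs the build command shown above. The function names and the injectable `runner` parameter are assumptions made for illustration and testability; the real hook is not shown in this ADR.

```python
import subprocess
import sys
from pathlib import PurePosixPath

# Build script path is taken from this ADR; everything else is hypothetical.
BUILD_COMMAND = [sys.executable, "Documentation/.meta/build/scripts/build.py"]
WATCHED_ROOT = PurePosixPath("Documentation")


def needs_rebuild(staged_paths) -> bool:
    """Return True if any staged path lies under Documentation/."""
    return any(WATCHED_ROOT in PurePosixPath(p).parents for p in staged_paths)


def run_hook(staged_paths, runner=subprocess.run) -> int:
    """Run the build when documentation changed; return an exit code.

    A non-zero return value would make the pre-commit hook reject the commit.
    """
    if not needs_rebuild(staged_paths):
        return 0
    result = runner(BUILD_COMMAND)
    return result.returncode
```

In a real hook, `staged_paths` would typically come from the output of `git diff --cached --name-only`, and the return value would be passed to `sys.exit`.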

Expansion System:
- MD contains directive comments
- Expansion data lives in *.expansions.yml files (same directory as the MD file)
- Build system injects expansions into HTML only
- MD stays compressed
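
To make the injection step concrete, here is a minimal sketch. The directive syntax `<!-- expand: key -->` is purely hypothetical, since this ADR does not show the real one. The sketch also assumes the directive survives into the rendered HTML as a comment and that the expansions file has already been loaded into a plain dictionary.

```python
import re

# Hypothetical directive syntax; the real one is not shown in this ADR.
DIRECTIVE = re.compile(r"<!--\s*expand:\s*([\w-]+)\s*-->")


def inject_expansions(html: str, expansions: dict) -> str:
    """Replace each directive comment with its expansion text, if one exists."""

    def substitute(match):
        key = match.group(1)
        # Leave unknown directives untouched so missing data is easy to spot.
        return expansions.get(key, match.group(0))

    return DIRECTIVE.sub(substitute, html)
```

Because the substitution runs only on the HTML output, the MD source keeps the short directive and never carries the verbose expansion text.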

Alternatives Considered

Alternative 1: HTML as Source, Generate MD

Rejected because:
- HTML → MD is lossy (formatting, structure lost)
- Not Git-friendly (complex diffs)
- Hard for AI to edit
- Token budget impossible to enforce

Alternative 2: Both Independent, Manual Sync

Rejected because:
- Inevitable divergence
- Double maintenance burden
- Human error in keeping formats in sync
- No clear authority on conflicts

Alternative 3: AsciiDoc or reStructuredText as Source

Rejected because:
- Less common (smaller ecosystem)
- More complex syntax than Markdown
- Tooling not as mature
- No significant benefit over Markdown

Alternative 4: YAML or JSON as Source

Rejected because:
- Not human-readable for long content
- Poor diff experience
- Not designed for prose
- Would need custom tooling

Validation

Source Control: ✅
- MD files tracked in Git
- Clean diffs for code review
- Merge conflicts manageable

AI Editing: ✅
- Claude Code edits MD directly
- Fast iterations (no parsing needed)

Build System: ✅
- 5-second build time
- Consistent output
- Zero errors

Token Budget: ✅
- 11,017 tokens current (22% of 50K budget)
- Enforceable via pre-commit hook
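
The budget figure can be reproduced with simple arithmetic. The sketch below uses the 50K budget and the 11,017-token count reported above; the function names are hypothetical.

```python
TOKEN_BUDGET = 50_000  # budget stated in this ADR


def budget_usage(tokens: int, budget: int = TOKEN_BUDGET) -> float:
    """Return the fraction of the token budget currently in use."""
    return tokens / budget


def within_budget(tokens: int, budget: int = TOKEN_BUDGET) -> bool:
    """Return True when the token count does not exceed the budget."""
    return tokens <= budget
```

With the current count, `round(budget_usage(11_017) * 100)` gives `22`, matching the reported 22%, and `within_budget(11_017)` is `True`.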

Review Schedule

Next Review: After 6 months of usage
Review Trigger: If MD → HTML builds become problematic
Success Criteria: Zero divergence between MD and HTML
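
One way to monitor the zero-divergence criterion is to flag any MD file whose HTML output is missing or older than the source. The sketch below uses file modification times as a proxy for divergence, which is an assumption: it detects stale builds, not content differences, and the function name is hypothetical.

```python
from pathlib import Path


def stale_sources(source_dir: Path, output_dir: Path) -> list:
    """List MD files whose generated HTML is missing or older than the source."""
    stale = []
    for md in sorted(source_dir.glob("*.md")):
        html = output_dir / (md.stem + ".html")
        # A missing or older HTML file means the build has not caught up.
        if not html.exists() or html.stat().st_mtime < md.stat().st_mtime:
            stale.append(md)
    return stale
```

An empty result means every source file has an up-to-date output. A non-empty result at review time would suggest the automatic rebuild is being skipped somewhere.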

Notes

This decision is fundamental to the dual-format system. By making MD the source of truth, we ensure:
- Git-friendly workflow
- AI-editable documentation
- Enforceable token budgets
- Automated HTML generation

The expansion system (.expansions.yml files) allows us to keep MD compressed while making HTML verbose, solving the dual-audience problem elegantly.

CEO's explicit guidance: "I think the MD version should be the source of truth. This will allow you to get out important details and then embellish them with human readable nuances and examples."