EPGOAT Documentation - Architectural Decisions

ADR-016: Markdown as Source of Truth for Dual-Format System

Date: 2025-11-10
Status: ✅ Accepted
Deciders: CTO (Claude Code AI), CEO
Tags: #documentation #architecture #source-control


Context

The Pyramid Architecture requires a dual-format system:

  • Markdown (MD): Compressed, AI-optimized, minimal tokens
  • HTML: Verbose, human-friendly, examples and diagrams

Question: Which format should be the authoritative source?

Options:

  1. MD as source → Generate HTML via build system
  2. HTML as source → Generate MD via build system
  3. Both independent → Manually keep in sync
  4. Database as source → Generate both from DB


Decision

Markdown files are the single source of truth. HTML is generated automatically from MD via the build system.

Format:

  • Source: Documentation/1-CONTEXT/*.md (version controlled in Git)
  • Generated: Documentation/HTML/*.html (ephemeral, not committed)
  • Build command: python3 Documentation/.meta/build/scripts/build.py

Content Flow:

MD (source, Git) → Build System → HTML (generated, not in Git)
                 ↓
              Expansions (.expansions.yml)
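
The source-to-output mapping described above can be sketched as a small path helper. This is only an illustration: the function name `html_target` is hypothetical, and the flat layout (one HTML file per MD file, same base name) is an assumption, not something confirmed by build.py.

```python
from pathlib import PurePosixPath

# Directories taken from this ADR; the flat one-to-one layout is an assumption.
SOURCE_DIR = PurePosixPath("Documentation/1-CONTEXT")
OUTPUT_DIR = PurePosixPath("Documentation/HTML")


def html_target(md_path: str) -> PurePosixPath:
    """Map a source .md path to the path of its generated .html file."""
    path = PurePosixPath(md_path)
    if path.parent != SOURCE_DIR or path.suffix != ".md":
        raise ValueError(f"not a documentation source file: {md_path}")
    # Same base name, different directory and extension.
    return OUTPUT_DIR / path.with_suffix(".html").name
```

Because the mapping only runs in one direction, there is no code path that writes back into the source directory, which is the point of the one-way build.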

Rationale

Why MD as Source?

Git-Friendly:

  • Plain text diffs work perfectly
  • Merge conflicts are manageable
  • History tracking is clean
  • Code review is straightforward

AI-Editable:

  • Claude Code can directly edit MD files
  • No parsing/rendering complexity
  • Fast iterations
  • No tool dependencies

Build System Benefits:

  • MD → HTML is well-established (markdown library)
  • HTML → MD is lossy and error-prone
  • One-way builds are simpler than bidirectional sync

Token Budget Control:

  • MD files stay compressed
  • Token count is measurable (file size ÷ 4)
  • Pre-commit hook can validate before commit
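
As a rough illustration of the "file size ÷ 4" heuristic, the sketch below estimates tokens from file sizes. The function names are hypothetical, and it assumes the size is measured in bytes and that only the `.md` files directly under one directory are counted; the real hook may measure differently.

```python
from pathlib import Path

BYTES_PER_TOKEN = 4  # heuristic from this ADR: file size ÷ 4


def estimate_tokens(size_bytes: int) -> int:
    """Approximate the token count of a file from its size in bytes."""
    return size_bytes // BYTES_PER_TOKEN


def estimate_tree_tokens(root: Path) -> dict[str, int]:
    """Estimate tokens for every .md file directly under root."""
    return {
        path.name: estimate_tokens(path.stat().st_size)
        for path in sorted(root.glob("*.md"))
    }
```

The estimate needs no tokenizer and no rendering step, which is what makes it cheap enough to run on every commit.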

Why NOT HTML as Source?

  • Not Git-friendly (large, complex diffs)
  • Hard for AI to edit (parsing HTML is complex)
  • HTML → MD conversion loses information
  • Token budget impossible to enforce

Why NOT Both Independent?

  • Manual sync is error-prone
  • Divergence would occur inevitably
  • Double the maintenance burden
  • No clear authority on conflicts

Why NOT Database as Source?

  • Adds unnecessary complexity
  • Version control becomes harder
  • Build system more complex
  • No token budget visibility

Consequences

Positive

  1. Single Source of Truth: No ambiguity; MD is always authoritative
  2. Git-Native: Perfect integration with version control
  3. AI-Friendly: Claude Code can edit directly
  4. Build Automation: HTML regenerates automatically
  5. Token Budget Enforceable: Pre-commit hook validates MD file sizes
  6. Fast Iteration: Edit MD, rebuild HTML in seconds

Negative

  1. Build Step Required: Can't just open HTML without building
  2. Generated Files: HTML must be rebuilt after MD changes
  3. Deployment Dependency: Need build system in deployment pipeline

Mitigations

  1. Auto-Build: Build system runs automatically via pre-commit hook
  2. Fast Builds: Entire build takes <5 seconds
  3. CI/CD Integration: Can automate builds on push to main
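
A minimal sketch of how the auto-build mitigation might be wired up, assuming the hook is a Python script that inspects the staged files and rebuilds only when something under Documentation/ changed. The helper names are hypothetical; `git diff --cached --name-only` is the standard way to list staged paths, and the build command is the one given in this ADR.

```python
import subprocess
from pathlib import PurePosixPath

DOCS_ROOT = PurePosixPath("Documentation")
BUILD_CMD = ["python3", "Documentation/.meta/build/scripts/build.py"]


def touches_docs(staged_paths: list[str]) -> bool:
    """Return True if any staged path lives under Documentation/."""
    return any(DOCS_ROOT in PurePosixPath(p).parents for p in staged_paths)


def staged_files() -> list[str]:
    """List the paths staged for the current commit."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def main() -> int:
    """Rebuild HTML when documentation changed; non-zero blocks the commit."""
    if not touches_docs(staged_files()):
        return 0
    return subprocess.run(BUILD_CMD).returncode
```

A hook script built on this would end with `raise SystemExit(main())`, so that a failed build aborts the commit.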

Implementation

Current State:

  • ✅ All MD files in Documentation/1-CONTEXT/
  • ✅ Build system operational (build.py)
  • ✅ HTML generated to ../HTML/
  • ✅ .gitignore excludes generated HTML
  • ✅ Pre-commit hook validates and rebuilds

Build Process:

# Manual build
python3 Documentation/.meta/build/scripts/build.py

# Automatic (pre-commit hook)
# Runs on every commit to Documentation/

Expansion System:

  • MD contains directive comments: <!-- EXPAND:type:key -->
  • Expansion data in: *.expansions.yml files (same directory as MD)
  • Build system injects expansions into HTML only
  • MD stays compressed
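
The directive mechanism could work roughly as sketched below. This is not the actual build.py logic: the function name is hypothetical, the nested `type → key → text` dictionary stands in for whatever structure the parsed .expansions.yml really has, and leaving unknown directives untouched is an assumed policy.

```python
import re

# Matches directive comments of the form <!-- EXPAND:type:key -->
DIRECTIVE = re.compile(r"<!--\s*EXPAND:(?P<type>[\w-]+):(?P<key>[\w-]+)\s*-->")


def inject_expansions(text: str, expansions: dict[str, dict[str, str]]) -> str:
    """Replace each EXPAND directive with its expansion text, if one exists."""

    def replace(match: re.Match) -> str:
        try:
            return expansions[match["type"]][match["key"]]
        except KeyError:
            # Assumed policy: an unknown directive is left as-is.
            return match.group(0)

    return DIRECTIVE.sub(replace, text)
```

Because the substitution runs only on the HTML side of the build, the MD source keeps the short directive comment and its token count is unaffected by how long the expansion is.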


Alternatives Considered

Alternative 1: HTML as Source, Generate MD

Rejected because:

  • HTML → MD is lossy (formatting, structure lost)
  • Not Git-friendly (complex diffs)
  • Hard for AI to edit
  • Token budget impossible to enforce

Alternative 2: Both Independent, Manual Sync

Rejected because:

  • Inevitable divergence
  • Double maintenance burden
  • Human error in keeping formats in sync
  • No clear authority on conflicts

Alternative 3: AsciiDoc or reStructuredText as Source

Rejected because:

  • Less common (smaller ecosystem)
  • More complex syntax than Markdown
  • Tooling not as mature
  • No significant benefit over Markdown

Alternative 4: YAML or JSON as Source

Rejected because:

  • Not human-readable for long content
  • Poor diff experience
  • Not designed for prose
  • Would need custom tooling


Validation

Source Control: ✅

  • MD files tracked in Git
  • Clean diffs for code review
  • Merge conflicts manageable

AI Editing: ✅

  • Claude Code edits MD directly
  • Fast iterations (no parsing needed)

Build System: ✅

  • 5-second build time
  • Consistent output
  • Zero errors

Token Budget: ✅

  • 11,017 tokens current (22% of 50K budget)
  • Enforceable via pre-commit hook
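
The budget figure above can be reproduced with a small check like the following. It is a sketch only: the function name is hypothetical, and rounding to the nearest whole percent is an assumption about how the 22% was derived.

```python
TOKEN_BUDGET = 50_000  # the 50K budget stated in this ADR


def budget_report(tokens: int, budget: int = TOKEN_BUDGET) -> tuple[bool, int]:
    """Return (within_budget, percent of budget used, rounded to a whole percent)."""
    percent = round(100 * tokens / budget)
    return tokens <= budget, percent


within_budget, percent_used = budget_report(11_017)
print(f"{percent_used}% of budget used, within budget: {within_budget}")
# → 22% of budget used, within budget: True
```

A pre-commit hook could call the same function with an estimated token count and reject the commit when the first element of the result is False.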


Review Schedule

  • Next Review: After 6 months of usage
  • Review Trigger: If MD → HTML builds become problematic
  • Success Criteria: Zero divergence between MD and HTML

Notes

This decision is fundamental to the dual-format system. By making MD the source of truth, we ensure:

  • Git-friendly workflow
  • AI-editable documentation
  • Enforceable token budgets
  • Automated HTML generation

The expansion system (.expansions.yml files) allows us to keep MD compressed while making HTML verbose, solving the dual-audience problem elegantly.

CEO's explicit guidance: "I think the MD version should be the source of truth. This will allow you to get out important details and then embellish them with human readable nuances and examples."