2025-11-12 Documentation Website Redesign

EPGOAT Documentation - Living Documents

Project: Documentation Website Redesign

📍 Current Context

Last Updated: 2025-11-12 (session start)
Current Task: Design Phase - Creating comprehensive design document
Breadcrumbs: Design Phase
Status: Documenting architecture decisions, page structure, and implementation plan
Next Step: Review design document with CEO for approval before implementation


✅ Progress Tracker

  • [in-progress] Phase 1: Design & Planning ← YOU ARE HERE
  • [in-progress] 1.1: Create design document ← ACTIVE
  • [ ] 1.2: Get CEO approval on design
  • [ ] 1.3: Finalize technical specifications
  • [ ] Phase 2: Dashboard Implementation
  • [ ] 2.1: Create dashboard landing page
  • [ ] 2.2: Implement statistics module
  • [ ] 2.3: Add quick links section
  • [ ] 2.4: Integrate GitHub API (private repo)
  • [ ] Phase 3: Page Granularity & Navigation
  • [ ] 3.1: Implement page splitting logic
  • [ ] 3.2: Create task-oriented navigation
  • [ ] 3.3: Build breadcrumb system
  • [ ] 3.4: Generate individual topic pages
  • [ ] Phase 4: Search & Cross-References
  • [ ] 4.1: Implement semantic search (lunr.js + TF-IDF)
  • [ ] 4.2: Build cross-reference system
  • [ ] 4.3: Add hover tooltips
  • [ ] 4.4: Create search index at build time
  • [ ] Phase 5: Documentation Agent
  • [ ] 5.1: Design trigger system
  • [ ] 5.2: Implement event detection
  • [ ] 5.3: Create auto-update workflows
  • [ ] 5.4: Integrate with TODO-BACKLOG sync
  • [ ] Phase 6: Testing & Deployment
  • [ ] 6.1: Test on desktop browsers
  • [ ] 6.2: Test mobile/tablet responsiveness
  • [ ] 6.3: Validate all links
  • [ ] 6.4: Deploy to production

🔀 Tangent Log

Active Tangents

[No active tangents yet]

Completed Tangents

[No completed tangents yet]


📋 Original Design Document

Problem Statement

The current HTML documentation has several issues:

  1. Unorganized: No clear entry point or dashboard
  2. Long single pages: Each section is one long page (hard to navigate)
  3. No quick links: Can't quickly jump to important work (TODO-BACKLOG, Active Projects)
  4. Poor discoverability: Hard to find related documentation
  5. No semantic search: Keyword search only
  6. Manual maintenance: Documentation gets stale, TODO-BACKLOG has unmarked completed work
  7. Build process confusion: MD → HTML workflow not clear

Goals

  1. Create intuitive dashboard with statistics and quick navigation
  2. Granular page structure with individual topics on separate pages
  3. Semantic search for better content discovery
  4. Automatic cross-referencing with hover tooltips
  5. Task-oriented navigation aligned with developer workflows
  6. Documentation agent that keeps docs current without manual intervention
  7. Clarify and automate the MD → HTML build process

Success Criteria

  • ✅ Dashboard landing page with 10+ live statistics
  • ✅ Individual pages for every major documentation topic
  • ✅ Semantic search returns relevant results (not just keyword matches)
  • ✅ Cross-references automatically generated with hover previews
  • ✅ Documentation updates trigger automatically on code/decision changes
  • ✅ TODO-BACKLOG stays synchronized with actual work status
  • ✅ Build process happens automatically (no manual Make commands)
  • ✅ Zero broken internal links
  • ✅ Mobile-friendly (responsive design)

1. Dashboard Architecture

1.1 Landing Page (/index.html)

Purpose: Single entry point showing system health and quick navigation

Layout:

┌────────────────────────────────────────────────────┐
│  EPGOAT Documentation                              │
│  Last Updated: 2025-11-12 09:45                    │
├────────────────────────────────────────────────────┤
│                                                    │
│  📊 SYSTEM HEALTH                                  │
│  ┌──────────────┬──────────────┬──────────────┐  │
│  │ Coverage     │ Broken Links │ Stale Docs   │  │
│  │ 89%          │ 0            │ 2            │  │
│  └──────────────┴──────────────┴──────────────┘  │
│                                                    │
│  🔗 QUICK LINKS                                    │
│  • TODO-BACKLOG (3 pending)                       │
│  • Active Projects (1 in progress)                │
│  • Recent ADRs (2 this week)                      │
│  • Quick Start Guide                              │
│  • Database Schema (53 tables)                    │
│  • API Reference (v2)                             │
│                                                    │
│  📈 PROJECT STATISTICS                             │
│  • Database Tables: 53                            │
│  • API Endpoints: 47                              │
│  • Test Coverage: 78%                             │
│  • Migrations: 21                                 │
│  • Active TODOs: 3                                │
│  • Completed Items: 12                            │
│                                                    │
│  🔍 SEARCH                                         │
│  [Search documentation...]  [Semantic Search 🧠]  │
│                                                    │
│  📚 NAVIGATION                                     │
│  • Getting Started                                │
│  • Development Workflow                           │
│  • Work Management                                │
│  • Technical Reference                            │
│  • Decisions & History                            │
│  • Executive Dashboard                            │
└────────────────────────────────────────────────────┘

1.2 Statistics Module

Data Sources:
  • Static (generated at build time):
    - Documentation coverage (% of code with docs)
    - Database tables count
    - API endpoints count
    - Migration count
  • Dynamic (GitHub API for private repo):
    - Last updated timestamp
    - Broken links count
    - Stale docs count
    - Active projects count
    - Pending TODOs count
    - Completed items this week
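
A minimal build-time sketch of how the static statistics could be gathered. The migrations path and the stats.json output location are assumptions for illustration; the real build script may organize this differently.

# hypothetical: collect_static_stats.py (paths are illustrative)
import json
import re
from pathlib import Path

def collect_static_stats(repo_root: Path) -> dict:
    """Count tables and migrations at build time for the dashboard."""
    # Assumed layout: one .sql file per migration
    migrations = list((repo_root / 'migrations').glob('*.sql'))

    # Count distinct CREATE TABLE statements across all migrations
    create_table = re.compile(r'CREATE TABLE\s+(?:IF NOT EXISTS\s+)?(\w+)', re.IGNORECASE)
    tables = set()
    for sql_file in migrations:
        tables.update(t.lower() for t in create_table.findall(sql_file.read_text()))

    return {'migration_count': len(migrations), 'table_count': len(tables)}

if __name__ == '__main__':
    stats = collect_static_stats(Path('.'))
    Path('Documentation/HTML/assets/stats.json').write_text(json.dumps(stats))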

GitHub API Integration:
  • Authentication: Personal Access Token (PAT) with repo scope
  • Rate Limits: 5,000 requests/hour (authenticated, private repo)
  • Endpoints:
    - GET /repos/{owner}/{repo}/contents/{path} - Read TODO-BACKLOG.md, ACTIVE-WORK.md
    - GET /repos/{owner}/{repo}/commits - Get last update timestamps
    - Parse markdown files client-side to count pending items
  • Caching: Cache results for 5 minutes (client-side localStorage)
  • Fallback: Show "⏳ Loading..." if API is slow, show cached data if API fails

Implementation:

// dashboard.js
async function fetchGitHubStats() {
  const token = CONFIG.GITHUB_TOKEN; // From config.js (gitignored)
  const headers = { 'Authorization': `Bearer ${token}` };

  // Fetch TODO-BACKLOG
  const todoResp = await fetch(
    'https://api.github.com/repos/aflores3/epgoat-internal/contents/Documentation/LLM/01-Work-In-Progress/TODO-BACKLOG.md',
    { headers }
  );
  const todoJson = await todoResp.json();
  // GitHub returns base64 with embedded newlines; strip them before decoding
  const todoContent = atob(todoJson.content.replace(/\n/g, ''));
  const pendingCount = (todoContent.match(/🔴|🟠|🟡/g) || []).length;

  return { pendingTodos: pendingCount /* ...other stats */ };
}

Links (with live counts):
  • TODO-BACKLOG (badge: "3 pending")
  • Active Projects (badge: "1 in progress")
  • Recent ADRs (badge: "2 this week")
  • Quick Start Guide
  • Database Schema (badge: "53 tables")
  • API Reference (badge: "v2")

Behavior: Links only (no inline content expansion)


2. Page Granularity Strategy

2.1 Hybrid Approach (Option C)

Three-Level Hierarchy:

  1. Top Level: Functional area overview pages
     - /reference/database/index.html - Database overview
     - /guides/development/index.html - Development guides overview

  2. Mid Level: Grouped topic pages
     - /reference/database/functional-areas/epg-system.html - EPG-related tables
     - /reference/database/functional-areas/user-management.html - User tables

  3. Bottom Level: Individual detail pages
     - /reference/database/tables/events.html - Events table details
     - /reference/database/tables/epg_data.html - EPG data table details

Example Navigation Path:

Dashboard
  → Technical Reference
    → Database Schema
      → EPG System (functional area)
        → Events Table (individual table)

2.2 Page Splitting Rules

Database Documentation:
  • Overview page: Schema statistics, recent migrations, functional areas list
  • Functional area pages (6-8 pages):
    - EPG System (events, epg_data, schedule, programme)
    - Team Management (teams, team_aliases, team_discovery)
    - Channel Management (channels, channel_names, channel_patterns)
    - User Management (users, subscriptions, preferences)
    - Provider Management (providers, playlists, matches)
    - Infrastructure (migrations, audit logs, health checks)
  • Individual table pages (53 pages), each covering:
    - Table purpose
    - All columns with types, constraints, descriptions
    - Relationships (foreign keys in/out)
    - Indexes
    - Triggers
    - Usage examples (code snippets)
    - Related tables (cross-links)

Other Documentation:
  • Standards: One page per standard (Python, TypeScript, Git, etc.)
  • Guides: One page per guide topic
  • ADRs: One page per decision
  • Projects: One page per project
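
As a sketch of the splitting logic, the functional-area groupings above can be expressed as a static mapping that drives page generation. The module name and mapping literals below are illustrative, not the final implementation:

# hypothetical: split_database_docs.py (mirrors the groupings above)
FUNCTIONAL_AREAS = {
    'epg-system': ['events', 'epg_data', 'schedule', 'programme'],
    'team-management': ['teams', 'team_aliases', 'team_discovery'],
    'channel-management': ['channels', 'channel_names', 'channel_patterns'],
    'user-management': ['users', 'subscriptions', 'preferences'],
    'provider-management': ['providers', 'playlists', 'matches'],
    'infrastructure': ['migrations', 'audit_logs', 'health_checks'],
}

def pages_to_generate():
    """Yield (output_path, page_kind, subject) for every database page."""
    yield ('reference/database/index.html', 'overview', None)
    for area, tables in FUNCTIONAL_AREAS.items():
        yield (f'reference/database/functional-areas/{area}.html', 'area', area)
        for table in tables:
            yield (f'reference/database/tables/{table}.html', 'table', table)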

2.3 File Structure

Documentation/HTML/
├── index.html (dashboard)
├── reference/
│   ├── database/
│   │   ├── index.html (overview)
│   │   ├── functional-areas/
│   │   │   ├── epg-system.html
│   │   │   ├── team-management.html
│   │   │   └── ...
│   │   └── tables/
│   │       ├── events.html
│   │       ├── teams.html
│   │       └── ...
│   └── api/
│       ├── index.html
│       └── endpoints/
│           ├── events-search.html
│           └── ...
├── guides/
│   ├── index.html
│   ├── getting-started.html
│   ├── database-migrations.html
│   └── ...
├── standards/
│   ├── index.html
│   ├── python.html
│   ├── typescript.html
│   └── ...
└── decisions/
    ├── index.html
    ├── adr-001-supabase.html
    └── ...

3. Semantic Search Implementation

3.1 Two-Stage Search

  1. Stage 1: Keyword Search (lunr.js)
     - Fast full-text search
     - Returns exact matches

  2. Stage 2: Semantic Similarity (TF-IDF)
     - Finds related documents
     - Shows "Related Documentation" section

Why This Approach:
  • ✅ Zero runtime cost (pre-computed at build time)
  • ✅ Client-side only (no server/API needed)
  • ✅ Fast (keyword search is instant, similarity pre-computed)
  • ✅ No dependencies on external services
  • ✅ Works offline
  • ✅ Privacy-preserving (no data sent to third parties)

3.2 Implementation Details

Build Time (Python):

# Documentation/LLM/.meta/build/scripts/build_search_index.py

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import json

def build_search_index(all_docs):
    """Build both keyword index and similarity matrix."""

    # Stage 1: Keyword index for lunr.js
    keyword_index = []
    for doc in all_docs:
        keyword_index.append({
            'id': doc['id'],
            'title': doc['title'],
            'content': doc['content'],
            'url': doc['url']
        })

    # Stage 2: TF-IDF similarity matrix
    vectorizer = TfidfVectorizer(stop_words='english', max_features=500)
    tfidf_matrix = vectorizer.fit_transform([d['content'] for d in all_docs])
    similarity_matrix = cosine_similarity(tfidf_matrix)

    # For each doc, store top 5 most similar docs
    similarity_index = {}
    for i, doc in enumerate(all_docs):
        similar_indices = similarity_matrix[i].argsort()[-6:-1][::-1]  # Top 5 (excluding self)
        similarity_index[doc['id']] = [
            {
                'id': all_docs[j]['id'],
                'title': all_docs[j]['title'],
                'url': all_docs[j]['url'],
                'similarity': float(similarity_matrix[i][j])
            }
            for j in similar_indices
        ]

    return {
        'keyword_index': keyword_index,
        'similarity_index': similarity_index
    }

# Write to search-index.json
with open('Documentation/HTML/assets/search-index.json', 'w') as f:
    json.dump(build_search_index(all_docs), f)

Runtime (JavaScript):

// Documentation/HTML/assets/search.js

// Load pre-computed index
const searchData = await fetch('/assets/search-index.json').then(r => r.json());

// Stage 1: Keyword search with lunr.js
const idx = lunr(function () {
  this.ref('id');
  this.field('title', { boost: 10 });
  this.field('content');

  searchData.keyword_index.forEach(doc => this.add(doc));
});

function search(query) {
  // Keyword search
  const results = idx.search(query);

  // For each result, add related docs from similarity index
  return results.map(result => ({
    ...result,
    related: searchData.similarity_index[result.ref] || []
  }));
}

UI:

┌─────────────────────────────────────────┐
│ Search: "database migrations"     [🔍]  │
├─────────────────────────────────────────┤
│ 📄 Database Migration Guide             │
│    Complete guide to writing and...    │
│    /guides/database-migrations.html     │
│                                         │
│ 📄 Database Standards                   │
│    Schema conventions, soft deletes...  │
│    /standards/database.html             │
│                                         │
│ 🔗 Related Documentation:               │
│    • ADR-001: Supabase Migration        │
│    • Database Schema Reference          │
│    • Migration Template                 │
└─────────────────────────────────────────┘

3.3 Search Index Generation

When to Regenerate:
  • Every build (part of build.py)
  • ~2 seconds for typical documentation size
  • Output: Documentation/HTML/assets/search-index.json (~200KB compressed)

What's Indexed:
  • All markdown source files
  • Titles, headings, body content
  • Code examples (with lower weight)
  • Table names, API endpoints (with higher weight)
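
One possible way to apply those weights before TF-IDF is plain term repetition: boost high-weight fields by repeating them, and strip fenced code so it contributes less to similarity. This is a sketch under those assumptions, not the final build logic (the lunr.js side already handles keyword weighting via the title boost shown in 3.2):

# hypothetical pre-processing for the similarity stage
import re

def weighted_text(doc: dict) -> str:
    """Build the text fed to TfidfVectorizer with crude field weighting."""
    # Remove fenced code blocks so code contributes less to similarity
    body = re.sub(r'```.*?```', ' ', doc['content'], flags=re.DOTALL)
    # Repeat titles and table names to raise their term frequency
    boosted = (doc['title'] + ' ') * 3 + ' '.join(doc.get('table_names', []) * 2)
    return boosted + ' ' + body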


4. Cross-Reference System

4.1 Behavior

  • Detect references to other documentation (tables, APIs, guides)
  • Auto-generate links
  • Show tooltip preview on hover
  • Click to navigate to full page

Example:

<!-- In EPG System functional area page -->
<p>The <a href="/reference/database/tables/events.html"
      class="doc-ref"
      data-preview="events table stores sports events from TheSportsDB and ESPN APIs">
   events table
</a> stores all scheduled sports events.</p>

Hover Behavior:

┌─────────────────────────────────────────┐
│ The [events table] stores all...       │
│       ↓ (hover)                         │
│     ╔═══════════════════════════════╗  │
│     ║ events                        ║  │
│     ║ Stores sports events from     ║  │
│     ║ TheSportsDB and ESPN APIs     ║  │
│     ║                               ║  │
│     ║ Click to view full details → ║  │
│     ╚═══════════════════════════════╝  │
└─────────────────────────────────────────┘

4.2 Auto-Detection Rules

Database Tables:
  • Pattern: events table, epg_data, events (in backticks)
  • Links to: /reference/database/tables/{table_name}.html
  • Preview: First sentence of table purpose

API Endpoints:
  • Pattern: /api/v2/events/search, events search endpoint
  • Links to: /reference/api/endpoints/{endpoint_name}.html
  • Preview: Endpoint description + method

Guides:
  • Pattern: Database Migration Guide, [Database Migration Guide]
  • Links to: /guides/database-migrations.html
  • Preview: First paragraph

ADRs:
  • Pattern: ADR-001, Supabase Migration decision
  • Links to: /decisions/adr-001-supabase.html
  • Preview: Decision summary

Standards:
  • Pattern: Python Standards, TypeScript coding standards
  • Links to: /standards/{standard_name}.html
  • Preview: Key rules summary

4.3 Implementation

Build Time (Python):

# Documentation/LLM/.meta/build/scripts/add_cross_references.py

import re
from bs4 import BeautifulSoup

REFERENCE_PATTERNS = {
    'table': (
        r'\b(\w+)\s+table\b',
        lambda m: f'/reference/database/tables/{m.group(1)}.html'
    ),
    'guide': (
        r'\[(.*?)\]\(((?!http).*?\.md)\)',  # Internal markdown links
        lambda m: f'/guides/{m.group(2).replace(".md", ".html")}'
    ),
    # ... more patterns
}

def add_cross_references(html_content, doc_id):
    """Add data-preview attributes and links to references."""
    soup = BeautifulSoup(html_content, 'html.parser')

    for pattern_type, (pattern, url_func) in REFERENCE_PATTERNS.items():
        for match in re.finditer(pattern, soup.get_text()):
            # Find the text node and wrap in <a> with preview
            ref_text = match.group(1)
            preview = get_preview(ref_text, pattern_type)

            # Insert link with preview data
            link = soup.new_tag('a',
                href=url_func(match),
                **{'class': 'doc-ref', 'data-preview': preview}
            )
            # ... wrap text in link

    return str(soup)

Runtime (JavaScript):

// Documentation/HTML/assets/cross-references.js

document.querySelectorAll('.doc-ref').forEach(link => {
  link.addEventListener('mouseenter', (e) => {
    const preview = e.target.getAttribute('data-preview');
    showTooltip(e.target, preview);
  });

  link.addEventListener('mouseleave', (e) => {
    hideTooltip();
  });
});

function showTooltip(element, content) {
  const tooltip = document.createElement('div');
  tooltip.className = 'doc-preview-tooltip';
  tooltip.textContent = content;

  const rect = element.getBoundingClientRect();
  tooltip.style.top = rect.bottom + 5 + 'px';
  tooltip.style.left = rect.left + 'px';

  document.body.appendChild(tooltip);
}

5. Task-Oriented Navigation

5.1 Navigation Structure

Top-Level Categories (aligned with developer workflows):

  1. Getting Started (/getting-started/)
     - Quick Start Guide
     - Installation & Setup
     - First EPG Generation
     - Understanding the Codebase

  2. Development (/development/)
     - Development Workflow
     - Running Tests
     - Code Quality Tools
     - Debugging Guide

  3. Work Management (/work/)
     - TODO-BACKLOG
     - Active Projects
     - Completed Work Archive
     - Weekly Planning

  4. Technical Reference (/reference/)
     - Database Schema
     - API Reference
     - Configuration Files
     - Third-Party APIs (TheSportsDB, ESPN)

  5. Standards (/standards/)
     - Core Principles
     - Python Standards
     - TypeScript Standards
     - Git Workflow
     - Database Standards

  6. Decisions & History (/decisions/)
     - Architecture Decision Records (ADRs)
     - Migration History
     - Project Postmortems
     - Lessons Learned

  7. Executive Dashboard (/executive/)
     - CEO Inbox
     - CTO Updates
     - Quarterly Objectives
     - Decisions Pending

5.2 Navigation Component

Sidebar (persistent):

┌─────────────────────────┐
│ EPGOAT Documentation    │
├─────────────────────────┤
│ 🏠 Dashboard            │
│                         │
│ 🚀 Getting Started      │
│   • Quick Start         │
│   • Installation        │
│   • First EPG           │
│                         │
│ 💻 Development          │
│   • Workflow            │
│   • Testing             │
│   • Debugging           │
│                         │
│ 📋 Work Management      │
│   • TODO-BACKLOG   [3]  │
│   • Active Projects [1] │
│   • Archive             │
│                         │
│ 📚 Technical Reference  │
│   • Database            │
│   • API                 │
│   • Config Files        │
│                         │
│ 📐 Standards            │
│   • Core Principles     │
│   • Python              │
│   • TypeScript          │
│   • Git                 │
│                         │
│ 🏛️ Decisions & History  │
│   • ADRs                │
│   • Migrations          │
│   • Postmortems         │
│                         │
│ 👔 Executive            │
│   • CEO Inbox           │
│   • CTO Updates         │
│   • Objectives          │
└─────────────────────────┘

Breadcrumbs (top of page):

Dashboard → Technical Reference → Database Schema → EPG System → Events Table
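
A minimal sketch of deriving this trail from a page's output path at build time; the TITLES mapping and the treatment of grouping directories are assumptions. The functional-area step shown above ("EPG System") would come from the table→area mapping rather than the path:

# hypothetical: breadcrumbs.py
TITLES = {
    'reference': 'Technical Reference',
    'database': 'Database Schema',
    'functional-areas': None,  # grouping dirs don't appear in the trail
    'tables': None,
}

def breadcrumbs(path: str, page_title: str) -> list[str]:
    """Build the trail for a page like reference/database/tables/events.html."""
    trail = ['Dashboard']
    for segment in path.split('/')[:-1]:
        title = TITLES.get(segment, segment.replace('-', ' ').title())
        if title:
            trail.append(title)
    trail.append(page_title)
    return trail

# breadcrumbs('reference/database/tables/events.html', 'Events Table')
# -> ['Dashboard', 'Technical Reference', 'Database Schema', 'Events Table']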

Page-Level Navigation (bottom of page):

← Previous: EPG Data Table  |  Next: Schedule Table →

Related Documentation:
• TheSportsDB API Integration Guide
• ADR-011: Dual API Source System
• Team Management Functional Area

6. Documentation Agent System

6.1 Purpose

Problem: Documentation gets stale because:
  • Code changes aren't reflected in docs
  • Decisions are made but ADRs not updated
  • TODO-BACKLOG shows completed work as pending
  • New topics aren't documented

Solution: Event-driven Documentation Agent that:
  • Detects changes requiring documentation updates
  • Suggests specific documentation changes
  • Auto-updates simple cases (timestamps, stats)
  • Keeps TODO-BACKLOG synchronized
  • Runs BEFORE commits (not after)

6.2 Trigger System

When Agent Activates:

  1. Code Changes (via git hooks):
     - Any change to backend/epgoat/domain/models.py → Update Database Schema docs
     - Any .sql migration file → Update Database Schema docs (already covered by existing hook)
     - Any services/*.py file → Update API Reference docs
     - Any config/*.yml file → Update Configuration Reference

  2. Decision Changes (via file watching):
     - New .md file in Documentation/LLM/06-Decisions/ → Update Decisions index
     - Changes to existing ADR → Update related documentation references

  3. Documentation Changes (via file watching):
     - Any .md change in Documentation/LLM/ → Rebuild affected HTML
     - New topic added → Update navigation and search index

  4. Work Status Changes (via scheduled check):
     - Changes to TodoWrite state → Sync with TODO-BACKLOG.md
     - Living document progress → Update Work Management dashboard
     - Commit messages with "feat:", "fix:" → Check if a TODO item can be marked complete
How Agent Runs:
  • Local: Git hooks (pre-commit, post-commit) - not GitHub Actions (conserve minutes)
  • LLM-driven: Use maintain-documentation skill + automation scripts
  • Timing: BEFORE PR merge (not after)

6.3 Agent Modes

Mode 1: Auto-Update (no user intervention)
  • Update "Last Updated" timestamps
  • Rebuild search index
  • Update statistics (table count, API endpoint count)
  • Regenerate database docs from migrations

Mode 2: Suggest Updates (prompt user)
  • Code change detected → "Would you like to update {doc_name}?"
  • New decision made → "Should I create ADR-XXX for {decision}?"
  • TODO marked complete in commit → "Mark TODO as complete in BACKLOG?"

Mode 3: Validate (block commit if failed)
  • Broken internal links → Block commit
  • Token budget exceeded (Layer 1 >50K) → Block commit
  • Build failure → Block commit
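
A minimal sketch of the Mode 3 link check, assuming validation walks the generated HTML tree; a non-zero exit code is what lets the pre-commit hook block the commit:

# hypothetical: validate_links.py
import sys
from pathlib import Path
from bs4 import BeautifulSoup

HTML_ROOT = Path('Documentation/HTML')

def broken_internal_links() -> list[tuple[str, str]]:
    """Return (page, href) pairs whose internal target file is missing."""
    broken = []
    for page in HTML_ROOT.rglob('*.html'):
        soup = BeautifulSoup(page.read_text(), 'html.parser')
        for a in soup.find_all('a', href=True):
            href = a['href'].split('#')[0]
            if not href or href.startswith(('http://', 'https://', 'mailto:')):
                continue  # external links and pure anchors are out of scope
            target = HTML_ROOT / href.lstrip('/')
            if not target.exists():
                broken.append((str(page), a['href']))
    return broken

if __name__ == '__main__':
    bad = broken_internal_links()
    for page, href in bad:
        print(f'❌ {page}: broken link {href}')
    sys.exit(1 if bad else 0)  # non-zero blocks the commit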

6.4 TODO-BACKLOG Sync

Problem: Work completed but not marked in TODO-BACKLOG.md

Root Cause Analysis (to be investigated):
  • Manual updates forgotten
  • No automatic sync between TodoWrite and TODO-BACKLOG
  • Completion happens in a different session/context

Solution:

# Documentation/scripts/sync_todo_backlog.py

def sync_todo_with_backlog():
    """Sync TodoWrite state with TODO-BACKLOG.md."""

    # 1. Parse current TODO-BACKLOG.md
    backlog = parse_todo_backlog()

    # 2. Check git commits for completion signals
    recent_commits = git.log('--since="1 week ago"', '--oneline')
    completed_items = extract_completed_items(recent_commits)

    # 3. Check living documents for completed tasks
    living_docs = find_living_documents()
    for doc in living_docs:
        completed_items.extend(extract_completed_tasks(doc))

    # 4. Update TODO-BACKLOG.md
    for item in completed_items:
        mark_complete_in_backlog(backlog, item)

    # 5. Write back
    write_todo_backlog(backlog)

When to Run:
  • Pre-commit hook: Check for TODOs mentioned in commit message
  • Post-commit hook: Extract completed work from commit and mark in BACKLOG
  • Daily: Scan living documents and mark completed tasks
  • On demand: make sync-todo-backlog

6.5 Integration with maintain-documentation Skill

Relationship:

  • maintain-documentation skill: Real-time linter during active coding
    - Runs when you're actively writing code
    - Checks if code changes affect documentation
    - Prompts for immediate updates
    - Part of your active workflow

  • Documentation Agent: Comprehensive periodic maintenance
    - Runs on git hooks (pre-commit, post-commit)
    - Scans entire codebase for stale docs
    - Auto-updates simple cases
    - Catches things missed during active coding

Both are needed: the skill is proactive during work; the agent is the comprehensive sweep.

6.6 Implementation Phases

Phase 1: Git hooks for automatic triggers

#!/bin/bash
# .git/hooks/pre-commit-documentation-agent
# Runs before commit, blocks if documentation invalid

python3 Documentation/scripts/documentation_agent.py --mode validate

if [ $? -ne 0 ]; then
  echo "❌ Documentation validation failed"
  echo "Run: make update-docs"
  exit 1
fi

Phase 2: Event detection and suggestion

# Documentation/scripts/documentation_agent.py

class DocumentationAgent:
    def detect_changes(self):
        """Detect what changed and what docs need updating."""
        changed_files = git.diff('--cached', '--name-only')

        suggestions = []
        for file in changed_files:
            if 'backend/epgoat/domain/models.py' in file:
                suggestions.append({
                    'file': file,
                    'docs_affected': ['Database Schema'],
                    'action': 'regenerate',
                    'auto': True  # Can auto-update
                })
            elif 'services/' in file:
                suggestions.append({
                    'file': file,
                    'docs_affected': ['API Reference'],
                    'action': 'manual_update',
                    'auto': False  # Needs human review
                })

        return suggestions

Phase 3: Auto-update execution

def execute_auto_updates(suggestions):
    """Execute automatic documentation updates."""
    for suggestion in suggestions:
        if suggestion['auto']:
            if 'Database Schema' in suggestion['docs_affected']:
                run_command('make regenerate-db-docs')
            elif 'search index' in suggestion['docs_affected']:
                run_command('python3 Documentation/LLM/.meta/build/scripts/build_search_index.py')

7. Build Process: LLM-Enriched Documentation Generation

7.1 The "Teacher Model" Approach

Key Insight: MD and HTML serve different audiences with different needs

Markdown (Documentation/LLM/):
  • Audience: AI assistants (Claude Code, etc.)
  • Style: Concise, token-efficient, reference format
  • Purpose: Quick lookups, programmatic access, minimal context
  • Example: "events table: stores sports events from APIs. 30 columns."

HTML (Documentation/HTML/):
  • Audience: Freshly graduated CS students (entry-level developers)
  • Style: Verbose, educational, explanatory
  • Purpose: Learning, understanding, onboarding
  • Example: Full explanation with context, examples, diagrams, use cases

The Build Process as "Teaching": The LLM acts as a teacher during the build process:
  • Takes concise MD notes (the "outline")
  • Expands them into rich educational content (the "lesson")
  • Adds context, examples, diagrams, analogies
  • Explains "why" not just "what"
  • Anticipates junior developer questions

7.2 Enrichment Process

What the LLM adds during build:

  1. Expanded Explanations
     - MD: "soft delete via record_status field"
     - HTML: "We use soft deletes instead of hard deletes (DELETE statements) to preserve historical data. The record_status field can be 'active', 'archived', or 'deleted'. This approach allows us to recover accidentally deleted data and maintain audit trails. For example, if a user deletes an event, we set record_status='deleted' instead of removing the row entirely."

  2. Real Code Examples
     - MD: "Usage: api_events.py, services/event_matcher.py"
     - HTML: Complete code snippets showing actual usage with line numbers and explanations:

       # backend/epgoat/services/event_matcher.py:45-52
       def find_upcoming_events():
           """Find all active events in the next 7 days."""
           return db.query(Event).filter(
               Event.record_status == 'active',  # Only active events
               Event.start_time >= datetime.now(),
               Event.start_time <= datetime.now() + timedelta(days=7)
           ).all()

  3. Visual Aids
     - MD: "Foreign key: team_id → teams.id"
     - HTML: ER diagram showing relationships, arrows, cardinality

  4. Context and Rationale
     - MD: "Dual API sources: TheSportsDB, ESPN"
     - HTML: "Why we use two APIs: TheSportsDB provides comprehensive historical data but sometimes lags on live scores. ESPN offers real-time updates but limited historical coverage. By combining both, we get the best of both worlds. See ADR-011 for the full decision rationale."

  5. Common Pitfalls / Gotchas
     - MD: (none)
     - HTML: "⚠️ Common Mistake: Don't query events without checking record_status. Always filter by record_status='active' unless you specifically need archived/deleted records. See Database Standards for soft delete patterns."

  6. Related Documentation Links
     - MD: (minimal)
     - HTML: Extensive cross-references with context: "📚 Learn more about team discovery in the Team Management Guide. See how this table is used in the Event Matching Workflow. Check ADR-017 for the team alias consolidation decision."

7.3 New Workflow (LLM-Enriched Build)

Workflow:

1. LLM writes:     Documentation/LLM/05-Reference/Database/Schema.md (CONCISE)
2. Auto-trigger:   Git pre-commit hook detects .md change
3. LLM enrichment: Build system sends MD to LLM with enrichment prompt
4. LLM expands:    Generates verbose, educational HTML with examples/diagrams
5. Stage HTML:     Hook stages generated HTML files
6. Commit once:    Single commit includes both MD source and HTML output

Implementation:

# Documentation/LLM/.meta/build/scripts/build_with_enrichment.py

import os

from anthropic import Anthropic

def enrich_documentation(md_content: str, doc_type: str) -> str:
    """Use LLM to expand concise MD into verbose educational HTML."""

    prompt = f"""You are a senior software engineer teaching a freshly graduated CS student.

Take this concise technical documentation and expand it into rich, educational content:

{md_content}

Your expanded version should:
1. Explain concepts thoroughly (assume junior developer knowledge level)
2. Add real code examples from the codebase with explanations
3. Include visual aids where helpful (Mermaid diagrams, ASCII art)
4. Provide context and rationale (the "why" not just "what")
5. Highlight common pitfalls and gotchas
6. Add cross-references to related documentation
7. Use analogies when explaining complex concepts
8. Anticipate questions a junior developer would ask

Output format: HTML (semantic tags, accessible, clean structure)

Target length: 3-5x longer than the input (be verbose and thorough)
"""

    client = Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY'))

    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=8000,
        messages=[{"role": "user", "content": prompt}]
    )

    return response.content[0].text

def build_documentation():
    """Build all documentation with LLM enrichment."""

    for md_file in find_all_md_files('Documentation/LLM/'):
        # Read concise MD
        md_content = read_file(md_file)

        # Enrich with LLM
        enriched_html = enrich_documentation(md_content, doc_type='reference')

        # Add styling, navigation, cross-references
        final_html = apply_template(enriched_html)

        # Write to HTML output
        html_path = convert_path_to_html(md_file)
        write_file(html_path, final_html)

Git Hook:

# .git/hooks/pre-commit-build-docs

#!/bin/bash

MD_CHANGED=$(git diff --cached --name-only | grep "Documentation/LLM/.*\.md$" || true)

if [ -n "$MD_CHANGED" ]; then
  echo "📝 Markdown documentation changed, enriching with LLM..."

  # Run LLM-enriched build system
  python3 Documentation/LLM/.meta/build/scripts/build_with_enrichment.py

  if [ $? -eq 0 ]; then
    git add Documentation/HTML/
    echo "✅ HTML documentation enriched and staged"
  else
    echo "❌ Build failed, commit blocked"
    exit 1
  fi
fi

7.4 Benefits of LLM Enrichment

For Junior Developers:
  • ✅ Educational content tailored to their level
  • ✅ Learn "why" decisions were made, not just "what" exists
  • ✅ Code examples show real-world usage patterns
  • ✅ Visual aids help understand complex relationships
  • ✅ Common pitfalls prevent mistakes

For Senior Developers:
  • ✅ Write concise MD (faster, less repetitive)
  • ✅ LLM handles the "teaching" automatically
  • ✅ Consistent explanation quality
  • ✅ Scales without manual effort

For Documentation Maintenance:
  • ✅ Single source of truth (concise MD)
  • ✅ Version control stays clean (MD diffs readable)
  • ✅ Enrichment is reproducible (same MD = same HTML)
  • ✅ Can regenerate all HTML from MD at any time

For the Business:
  • ✅ Better onboarding (new hires learn faster)
  • ✅ Reduced training time (docs teach effectively)
  • ✅ Knowledge preservation (tribal knowledge documented)
  • ✅ Lower cost than hiring technical writers

7.5 Build Performance Considerations

Cost:
  • LLM API calls: ~$0.01-0.05 per page (Claude Sonnet)
  • Full documentation rebuild: ~$2-5 for entire site
  • Incremental builds: Only changed pages (~$0.01-0.10)

Time:
  • Enrichment: ~3-5 seconds per page
  • Full rebuild: ~2-3 minutes for 30-40 pages
  • Incremental: ~3-10 seconds per changed page

Optimization:
  • Cache enriched content (only rebuild if MD changed) - see the sketch below
  • Parallel processing (enrich multiple pages simultaneously)
  • Use Haiku for simple pages, Sonnet for complex ones
  • Skip enrichment for non-reference docs (ADRs, guides already verbose)

When to Enrich:
  • Reference documentation (Database, API) → Always enrich
  • Standards documentation → Enrich
  • Guides / How-tos → Already verbose, minimal enrichment
  • ADRs → Already verbose, skip enrichment
  • Executive docs (CEO-INBOX, etc.) → Skip enrichment
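
A minimal sketch of that enrichment cache, keyed on a hash of the MD source so unchanged pages skip the LLM call entirely; the cache directory is an assumption:

# hypothetical: enrichment_cache.py
import hashlib
from pathlib import Path

CACHE_DIR = Path('Documentation/LLM/.meta/build/.enrichment-cache')

def enrich_with_cache(md_content: str, doc_type: str, enrich_fn) -> str:
    """Call the LLM (enrich_fn) only when the MD source has changed."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(f'{doc_type}:{md_content}'.encode()).hexdigest()
    cached = CACHE_DIR / f'{key}.html'
    if cached.exists():
        return cached.read_text()  # same MD in, same HTML out - zero API cost
    html = enrich_fn(md_content, doc_type)
    cached.write_text(html)
    return html

Wrapping enrich_documentation from section 7.3 with this function turns full rebuilds into incremental ones automatically.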

7.6 Example: Before and After

Markdown Source (Schema.md):

## events

Sports events from TheSportsDB and ESPN APIs.

**Columns**:
- id (UUID, PK)
- thesportsdb_id (TEXT)
- espn_id (TEXT)
- sport_name (TEXT)
- league (TEXT)
- start_time (TIMESTAMP)
- record_status (TEXT)

**Relationships**:
- team_home_id → teams.id
- team_away_id → teams.id

**Usage**: api_events.py:45, event_matcher.py:78

Enriched HTML Output (Database-Schema.html, events section):

<section id="events-table">
  <h2>🏟️ Events Table</h2>

  <div class="overview">
    <p>The <code>events</code> table is the heart of EPGOAT's EPG system. It stores scheduled sports events from two primary sources: <strong>TheSportsDB</strong> (comprehensive historical data) and <strong>ESPN</strong> (real-time updates).</p>

    <p><strong>Why two APIs?</strong> We use a dual-source strategy (see <a href="/decisions/adr-011-dual-api-sources.html">ADR-011</a>) because no single API provides both comprehensive historical coverage AND real-time accuracy. TheSportsDB excels at breadth, ESPN excels at freshness. By combining them, we get the best of both worlds.</p>
  </div>

  <div class="visual-aid">
    <h3>Relationship Diagram</h3>
    <pre class="mermaid">
    erDiagram
      EVENTS ||--o{ TEAMS : "home team"
      EVENTS ||--o{ TEAMS : "away team"
      EVENTS }o--|| EPG_DATA : "generates"
    </pre>
  </div>

  <div class="columns-detail">
    <h3>Column Details</h3>

    <div class="column">
      <h4><code>id</code> (UUID, Primary Key)</h4>
      <p>Unique identifier for each event. We use UUIDs instead of auto-incrementing integers because events come from multiple external sources and UUIDs prevent ID collisions.</p>
      <p><strong>Example</strong>: <code>a3f2d8e7-12c4-4b5a-9d3e-7c8f2a1b4e6c</code></p>
    </div>

    <div class="column">
      <h4><code>thesportsdb_id</code> (TEXT, Nullable)</h4>
      <p>External ID from TheSportsDB API. Nullable because some events only exist in ESPN (e.g., very recent matchups).</p>
      <p><strong>When to use</strong>: When querying TheSportsDB API for additional event details. Store this to avoid re-fetching known events.</p>
      <p><strong>⚠️ Gotcha</strong>: Not all events have a TheSportsDB ID! Always check for NULL before using.</p>
    </div>

    <div class="column">
      <h4><code>espn_id</code> (TEXT, Nullable)</h4>
      <p>External ID from ESPN API. Nullable for the same reason as thesportsdb_id.</p>
      <p><strong>Format</strong>: Usually numeric string like <code>"401547392"</code></p>
    </div>

    <div class="column">
      <h4><code>sport_name</code> (TEXT, Required)</h4>
      <p>The sport type (e.g., "NFL", "NBA", "Soccer"). Normalized to match our sport_emojis.yml configuration.</p>
      <p><strong>Why normalized?</strong> TheSportsDB might say "American Football" while ESPN says "NFL". We normalize to a canonical form for consistency. See <code>backend/epgoat/config/sport_emojis.yml</code> for the full mapping.</p>
    </div>

    <div class="column">
      <h4><code>record_status</code> (TEXT, Required)</h4>
      <p>Soft delete status: <code>'active'</code>, <code>'archived'</code>, or <code>'deleted'</code>.</p>
      <p><strong>⚠️ Critical</strong>: NEVER use hard deletes (DELETE statements) on this table! Always use soft deletes by setting <code>record_status='deleted'</code>. This preserves historical data for analytics and allows recovery from accidental deletions.</p>
      <p><strong>Querying pattern</strong>: Always filter by <code>record_status='active'</code> unless you specifically need archived/deleted records.</p>
    </div>
  </div>

  <div class="code-examples">
    <h3>Code Examples</h3>

    <div class="example">
      <h4>Finding Upcoming Events</h4>
      <p>Here's how we query for events in the next 7 days (from <code>backend/epgoat/services/event_matcher.py:78-85</code>):</p>
      <pre><code class="language-python">
def find_upcoming_events(sport: str = None) -> list[Event]:
    """Find all active events in the next 7 days.

    Args:
        sport: Optional filter by sport name (e.g., 'NFL')

    Returns:
        List of Event objects sorted by start time
    """
    query = db.query(Event).filter(
        Event.record_status == 'active',  # Only active events
        Event.start_time >= datetime.now(),
        Event.start_time <= datetime.now() + timedelta(days=7)
    )

    if sport:
        query = query.filter(Event.sport_name == sport)

    return query.order_by(Event.start_time).all()
      </code></pre>
      <p><strong>Note</strong>: Notice how we ALWAYS filter by <code>record_status='active'</code>. This is the soft delete pattern.</p>
    </div>

    <div class="example">
      <h4>Creating a New Event</h4>
      <p>When ingesting events from APIs (from <code>backend/epgoat/api/api_events.py:45-58</code>):</p>
      <pre><code class="language-python">
def create_event_from_api(api_data: dict, source: str) -> Event:
    """Create event from TheSportsDB or ESPN API response.

    Args:
        api_data: Raw API response dictionary
        source: 'thesportsdb' or 'espn'
    """
    event = Event(
        id=uuid.uuid4(),
        sport_name=normalize_sport_name(api_data['sport']),
        league=api_data['league'],
        start_time=parse_timestamp(api_data['dateEvent']),
        record_status='active'  # New events are always active
    )

    # Store external ID based on source
    if source == 'thesportsdb':
        event.thesportsdb_id = api_data['idEvent']
    elif source == 'espn':
        event.espn_id = str(api_data['id'])

    db.add(event)
    db.commit()
    return event
      </code></pre>
    </div>
  </div>

  <div class="related-docs">
    <h3>Related Documentation</h3>
    <ul>
      <li>📖 <a href="/guides/event-matching.html">Event Matching Workflow</a> - How we match IPTV channels to events</li>
      <li>📖 <a href="/reference/api/thesportsdb.html">TheSportsDB API Integration</a> - Fetching event data</li>
      <li>📖 <a href="/reference/api/espn.html">ESPN API Integration</a> - Real-time event updates</li>
      <li>🏛️ <a href="/decisions/adr-011-dual-api-sources.html">ADR-011: Dual API Source Strategy</a> - Why we use two APIs</li>
      <li>📐 <a href="/standards/database.html#soft-deletes">Database Standards: Soft Delete Pattern</a> - record_status usage</li>
    </ul>
  </div>

  <div class="indexes-triggers">
    <h3>Indexes</h3>
    <p>The following indexes improve query performance:</p>
    <ul>
      <li><code>idx_events_start_time</code> - Fast lookups by date range</li>
      <li><code>idx_events_sport_name</code> - Filter by sport efficiently</li>
      <li><code>idx_events_record_status</code> - Soft delete queries</li>
    </ul>
    <p><strong>Why these indexes?</strong> Our most common query pattern is "find active events for a specific sport in a date range". These indexes cover that exact pattern.</p>
  </div>
</section>

Result:
  • MD: ~150 words, 10 lines → Quick AI reference
  • HTML: ~1,200 words, detailed teaching → A junior dev can learn the system

This is the "teacher model" in action!


8. Mobile & Responsive Design

8.1 Requirements

Priority: Desktop primary; mobile/tablet nice to have

Breakpoints:
  • Desktop: ≥1200px (default layout)
  • Tablet: 768px - 1199px (simplified sidebar)
  • Mobile: <768px (hamburger menu)

Responsive Behavior:
  • Dashboard: Single column on mobile, grid on desktop
  • Navigation: Collapsible sidebar on mobile
  • Search: Full-width on mobile
  • Tables: Horizontal scroll on mobile
  • Code blocks: Horizontal scroll on mobile

8.2 CSS Framework

Recommendation: Tailwind CSS or custom CSS Grid

Why not Bootstrap: Heavy, opinionated, not needed for simple docs site

Implementation:

/* Documentation/HTML/assets/style.css */

/* Mobile first */
.dashboard-stats {
  display: grid;
  grid-template-columns: 1fr;
  gap: 1rem;
}

/* Tablet */
@media (min-width: 768px) {
  .dashboard-stats {
    grid-template-columns: repeat(2, 1fr);
  }
}

/* Desktop */
@media (min-width: 1200px) {
  .dashboard-stats {
    grid-template-columns: repeat(3, 1fr);
  }

  .layout {
    display: grid;
    grid-template-columns: 250px 1fr;
  }
}

9. Branding & Style

9.1 Current Decision

CEO preference: "Leave as-is for now"

Rationale: Focus on content organization and functionality first, visual polish later

Current style: Clean, minimal, readable (based on existing build system output)

9.2 Future Considerations

When ready to update branding:
  • EPGOAT color palette
  • Logo placement
  • Custom fonts
  • Dark mode support
  • Print stylesheet


10. Implementation Timeline

10.1 Phases

Phase 1: Foundation (Week 1)
  • Dashboard landing page
  • Basic statistics (static)
  • Quick links section
  • Task-oriented navigation structure

Phase 2: Page Granularity (Week 2)
  • Implement page splitting logic
  • Generate functional area pages
  • Generate individual table pages
  • Breadcrumb system

Phase 3: Search & Cross-References (Weeks 2-3)
  • Build search index generation
  • Implement lunr.js keyword search
  • Add TF-IDF similarity computation
  • Auto-generate cross-reference links
  • Hover tooltip system

Phase 4: Dynamic Features (Week 3)
  • GitHub API integration
  • Live statistics
  • Cache layer
  • Error handling

Phase 5: Documentation Agent (Week 4)
  • Git hooks implementation
  • Change detection system
  • Auto-update logic
  • TODO-BACKLOG sync
  • Validation system

Phase 6: Polish & Testing (Week 4)
  • Mobile responsiveness
  • Cross-browser testing
  • Link validation
  • Performance optimization
  • Documentation of the documentation system

10.2 Milestones

  • ✅ Design document approved (Day 1)
  • ⏳ Dashboard functional (Day 7)
  • ⏳ All pages split and cross-linked (Day 14)
  • ⏳ Search working (Day 18)
  • ⏳ Documentation agent auto-updating (Day 25)
  • ⏳ Production ready (Day 30)

11. Success Metrics

11.1 Quantitative

  • Page load time: <1 second for any page
  • Search results: <100ms for keyword search
  • Zero broken links: 0 internal links broken
  • Documentation coverage: >90% of code documented
  • TODO-BACKLOG accuracy: 100% of completed work marked
  • Auto-update success rate: >95% of documentation updates automatic

11.2 Qualitative

  • Findability: Can find any topic in <3 clicks from dashboard
  • Clarity: New developers can start from Quick Start Guide
  • Maintenance: Documentation stays current without manual intervention
  • Confidence: CEO trusts documentation is accurate and complete

12. Open Questions for CEO

Before starting implementation:

  1. GitHub API Token: Who will provide the Personal Access Token for private repo access?
  2. Deployment: Where will the HTML documentation be hosted? (Cloudflare Pages, GitHub Pages, other?)
  3. Priority: Which phase should be implemented first if timeline is tight?
  4. Build process: Does auto-build on commit solve your concern about the MD → HTML process?
  5. TODO-BACKLOG sync: Should we investigate why it's out of sync before implementing the agent?

13. References

  • Existing Build System: Documentation/LLM/.meta/build/scripts/build.py
  • Database Doc Generator: Documentation/scripts/generate_database_docs.py
  • Living Document Template: .claude/skills/manage-living-documents/references/living-document-template.md
  • Documentation Standards: Documentation/LLM/02-Standards/Documentation-Standards.md
  • lunr.js: https://lunrjs.com/ (MIT license, 8KB gzipped)
  • GitHub API: https://docs.github.com/en/rest (5,000 requests/hour for private repos)

14. Next Steps

  1. Get CEO approval on this design document
  2. Answer open questions (GitHub token, deployment, priorities)
  3. Investigate TODO-BACKLOG sync issues (why is completed work not marked?)
  4. Start Phase 1 (Dashboard implementation)
  5. Iterate based on feedback

Document Status: [in-progress] - Awaiting CEO review and approval
Created: 2025-11-12
Last Updated: 2025-11-12