Project: Documentation Website Redesign
📍 Current Context
Last Updated: 2025-11-12 (session start)
Current Task: Design Phase - Creating comprehensive design document
Breadcrumbs: Design Phase
Status: Documenting architecture decisions, page structure, and implementation plan
Next Step: Review design document with CEO for approval before implementation
✅ Progress Tracker
- [in-progress] Phase 1: Design & Planning ← YOU ARE HERE
- [in-progress] 1.1: Create design document ← ACTIVE
- [ ] 1.2: Get CEO approval on design
- [ ] 1.3: Finalize technical specifications
- [ ] Phase 2: Dashboard Implementation
- [ ] 2.1: Create dashboard landing page
- [ ] 2.2: Implement statistics module
- [ ] 2.3: Add quick links section
- [ ] 2.4: Integrate GitHub API (private repo)
- [ ] Phase 3: Page Granularity & Navigation
- [ ] 3.1: Implement page splitting logic
- [ ] 3.2: Create task-oriented navigation
- [ ] 3.3: Build breadcrumb system
- [ ] 3.4: Generate individual topic pages
- [ ] Phase 4: Search & Cross-References
- [ ] 4.1: Implement semantic search (lunr.js + TF-IDF)
- [ ] 4.2: Build cross-reference system
- [ ] 4.3: Add hover tooltips
- [ ] 4.4: Create search index at build time
- [ ] Phase 5: Documentation Agent
- [ ] 5.1: Design trigger system
- [ ] 5.2: Implement event detection
- [ ] 5.3: Create auto-update workflows
- [ ] 5.4: Integrate with TODO-BACKLOG sync
- [ ] Phase 6: Testing & Deployment
- [ ] 6.1: Test on desktop browsers
- [ ] 6.2: Test mobile/tablet responsiveness
- [ ] 6.3: Validate all links
- [ ] 6.4: Deploy to production
🔀 Tangent Log
Active Tangents
[No active tangents yet]
Completed Tangents
[No completed tangents yet]
📋 Original Design Document
Problem Statement
The current HTML documentation has several issues:
1. Unorganized: No clear entry point or dashboard
2. Long single pages: Each section is one long page (hard to navigate)
3. No quick links: Can't quickly jump to important work (TODO-BACKLOG, Active Projects)
4. Poor discoverability: Hard to find related documentation
5. No semantic search: Keyword search only
6. Manual maintenance: Documentation gets stale, and TODO-BACKLOG shows completed work as still pending
7. Build process confusion: The MD → HTML workflow is unclear
Goals
- Create intuitive dashboard with statistics and quick navigation
- Granular page structure with individual topics on separate pages
- Semantic search for better content discovery
- Automatic cross-referencing with hover tooltips
- Task-oriented navigation aligned with developer workflows
- Documentation agent that keeps docs current without manual intervention
- Clarify and automate the MD → HTML build process
Success Criteria
- ✅ Dashboard landing page with 10+ live statistics
- ✅ Individual pages for every major documentation topic
- ✅ Semantic search returns relevant results (not just keyword matches)
- ✅ Cross-references automatically generated with hover previews
- ✅ Documentation updates trigger automatically on code/decision changes
- ✅ TODO-BACKLOG stays synchronized with actual work status
- ✅ Build process happens automatically (no manual Make commands)
- ✅ Zero broken internal links
- ✅ Mobile-friendly (responsive design)
1. Dashboard Architecture
1.1 Landing Page (/index.html)
Purpose: Single entry point showing system health and quick navigation
Layout:
┌────────────────────────────────────────────────────┐
│ EPGOAT Documentation │
│ Last Updated: 2025-11-12 09:45 │
├────────────────────────────────────────────────────┤
│ │
│ 📊 SYSTEM HEALTH │
│ ┌──────────────┬──────────────┬──────────────┐ │
│ │ Coverage │ Broken Links │ Stale Docs │ │
│ │ 89% │ 0 │ 2 │ │
│ └──────────────┴──────────────┴──────────────┘ │
│ │
│ 🔗 QUICK LINKS │
│ • TODO-BACKLOG (3 pending) │
│ • Active Projects (1 in progress) │
│ • Recent ADRs (2 this week) │
│ • Quick Start Guide │
│ • Database Schema (53 tables) │
│ • API Reference (v2) │
│ │
│ 📈 PROJECT STATISTICS │
│ • Database Tables: 53 │
│ • API Endpoints: 47 │
│ • Test Coverage: 78% │
│ • Migrations: 21 │
│ • Active TODOs: 3 │
│ • Completed Items: 12 │
│ │
│ 🔍 SEARCH │
│ [Search documentation...] [Semantic Search 🧠] │
│ │
│ 📚 NAVIGATION │
│ • Getting Started │
│ • Development Workflow │
│ • Work Management │
│ • Technical Reference │
│ • Decisions & History │
│ • Executive Dashboard │
└────────────────────────────────────────────────────┘
1.2 Statistics Module
Data Sources:
- Static (generated at build time):
  - Documentation coverage (% of code with docs)
  - Database table count
  - API endpoint count
  - Migration count
- Dynamic (GitHub API for private repo):
  - Last updated timestamp
  - Broken link count
  - Stale doc count
  - Active project count
  - Pending TODO count
  - Completed items this week
GitHub API Integration:
- Authentication: Personal Access Token (PAT) with repo scope
- Rate Limits: 5,000 requests/hour (authenticated, private repo)
- Endpoints:
- GET /repos/{owner}/{repo}/contents/{path} - Read TODO-BACKLOG.md, ACTIVE-WORK.md
- GET /repos/{owner}/{repo}/commits - Get last update timestamps
- Parse markdown files client-side to count pending items
- Caching: Cache results for 5 minutes (client-side localStorage)
- Fallback: Show "⏳ Loading..." if API slow, show cached data if API fails
Implementation:
// dashboard.js
async function fetchGitHubStats() {
  const token = CONFIG.GITHUB_TOKEN; // From config.js (gitignored)
  const headers = { 'Authorization': `Bearer ${token}` };

  // Fetch TODO-BACKLOG
  const todoResp = await fetch(
    'https://api.github.com/repos/aflores3/epgoat-internal/contents/Documentation/LLM/01-Work-In-Progress/TODO-BACKLOG.md',
    { headers }
  );
  const todoContent = atob((await todoResp.json()).content);
  const pendingCount = (todoContent.match(/🔴|🟠|🟡/g) || []).length;
  return { pendingTodos: pendingCount /* , ... other stats */ };
}
1.3 Quick Links Section
Links (with live counts):
- TODO-BACKLOG (badge: "3 pending")
- Active Projects (badge: "1 in progress")
- Recent ADRs (badge: "2 this week")
- Quick Start Guide
- Database Schema (badge: "53 tables")
- API Reference (badge: "v2")
Behavior: Links only (no inline content expansion)
2. Page Granularity Strategy
2.1 Hybrid Approach (Option C)
Three-Level Hierarchy:
- Top Level: Functional area overview pages
  - /reference/database/index.html - Database overview
  - /guides/development/index.html - Development guides overview
- Mid Level: Grouped topic pages
  - /reference/database/functional-areas/epg-system.html - EPG-related tables
  - /reference/database/functional-areas/user-management.html - User tables
- Bottom Level: Individual detail pages
  - /reference/database/tables/events.html - Events table details
  - /reference/database/tables/epg_data.html - EPG data table details
Example Navigation Path:
Dashboard
→ Technical Reference
→ Database Schema
→ EPG System (functional area)
→ Events Table (individual table)
2.2 Page Splitting Rules
Database Documentation:
- Overview page: Schema statistics, recent migrations, functional areas list
- Functional area pages (6-8 pages):
  - EPG System (events, epg_data, schedule, programme)
  - Team Management (teams, team_aliases, team_discovery)
  - Channel Management (channels, channel_names, channel_patterns)
  - User Management (users, subscriptions, preferences)
  - Provider Management (providers, playlists, matches)
  - Infrastructure (migrations, audit logs, health checks)
- Individual table pages (53 pages), each covering:
  - Table purpose
  - All columns with types, constraints, descriptions
  - Relationships (foreign keys in/out)
  - Indexes
  - Triggers
  - Usage examples (code snippets)
  - Related tables (cross-links)
Other Documentation:
- Standards: One page per standard (Python, TypeScript, Git, etc.)
- Guides: One page per guide topic
- ADRs: One page per decision
- Projects: One page per project
2.3 File Structure
Documentation/HTML/
├── index.html (dashboard)
├── reference/
│ ├── database/
│ │ ├── index.html (overview)
│ │ ├── functional-areas/
│ │ │ ├── epg-system.html
│ │ │ ├── team-management.html
│ │ │ └── ...
│ │ └── tables/
│ │ ├── events.html
│ │ ├── teams.html
│ │ └── ...
│ └── api/
│ ├── index.html
│ └── endpoints/
│ ├── events-search.html
│ └── ...
├── guides/
│ ├── index.html
│ ├── getting-started.html
│ ├── database-migrations.html
│ └── ...
├── standards/
│ ├── index.html
│ ├── python.html
│ ├── typescript.html
│ └── ...
└── decisions/
├── index.html
├── adr-001-supabase.html
└── ...
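The build script that emits this tree needs a deterministic mapping from MD source paths to HTML output paths. A minimal sketch of such a helper (the name `convert_path_to_html` matches the function used in the build script later in this document, but the prefix-stripping and lowercasing rules shown here are assumptions):

```python
from pathlib import Path

def convert_path_to_html(md_path: str) -> str:
    """Map a Markdown source path under Documentation/LLM/ to its
    HTML output path under Documentation/HTML/.

    Hypothetical sketch: the real mapping rules (numbered-folder
    prefixes, lowercasing) would need to match the actual build system.
    """
    rel = Path(md_path).relative_to("Documentation/LLM")
    # Drop numeric ordering prefixes like "05-Reference" -> "reference"
    parts = [p.split("-", 1)[-1].lower() if p[:2].isdigit() else p.lower()
             for p in rel.parts]
    return str(Path("Documentation/HTML").joinpath(*parts).with_suffix(".html"))
```

Keeping this mapping in one place means navigation, cross-references, and the search index all agree on URLs.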
3. Semantic Search Implementation
3.1 Hybrid Approach (Recommended)
Two-Stage Search:
1. Stage 1: Keyword Search (lunr.js)
   - Fast full-text search
   - Returns exact matches
2. Stage 2: Semantic Similarity (TF-IDF)
   - Finds related documents
   - Powers the "Related Documentation" section
Why This Approach:
- ✅ Zero runtime cost (similarity pre-computed at build time)
- ✅ Client-side only (no server/API needed)
- ✅ Fast (keyword search is instant; similarity is pre-computed)
- ✅ No dependencies on external services
- ✅ Works offline
- ✅ Privacy-preserving (no data sent to third parties)
3.2 Implementation Details
Build Time (Python):
# Documentation/LLM/.meta/build/scripts/build_search_index.py
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import json
def build_search_index(all_docs):
    """Build both the keyword index and the similarity matrix."""
    # Stage 1: Keyword index for lunr.js
    keyword_index = []
    for doc in all_docs:
        keyword_index.append({
            'id': doc['id'],
            'title': doc['title'],
            'content': doc['content'],
            'url': doc['url']
        })

    # Stage 2: TF-IDF similarity matrix
    vectorizer = TfidfVectorizer(stop_words='english', max_features=500)
    tfidf_matrix = vectorizer.fit_transform([d['content'] for d in all_docs])
    similarity_matrix = cosine_similarity(tfidf_matrix)

    # For each doc, store the top 5 most similar docs
    similarity_index = {}
    for i, doc in enumerate(all_docs):
        similar_indices = similarity_matrix[i].argsort()[-6:-1][::-1]  # Top 5 (excluding self)
        similarity_index[doc['id']] = [
            {
                'id': all_docs[j]['id'],
                'title': all_docs[j]['title'],
                'url': all_docs[j]['url'],
                'similarity': float(similarity_matrix[i][j])
            }
            for j in similar_indices
        ]

    return {
        'keyword_index': keyword_index,
        'similarity_index': similarity_index
    }

# Write to search-index.json
with open('Documentation/HTML/assets/search-index.json', 'w') as f:
    json.dump(build_search_index(all_docs), f)
Runtime (JavaScript):
// Documentation/HTML/assets/search.js
// Load pre-computed index
const searchData = await fetch('/assets/search-index.json').then(r => r.json());
// Stage 1: Keyword search with lunr.js
const idx = lunr(function () {
  this.ref('id');
  this.field('title', { boost: 10 });
  this.field('content');
  searchData.keyword_index.forEach(doc => this.add(doc));
});

function search(query) {
  // Keyword search
  const results = idx.search(query);
  // For each result, attach related docs from the similarity index
  return results.map(result => ({
    ...result,
    related: searchData.similarity_index[result.ref] || []
  }));
}
UI:
┌─────────────────────────────────────────┐
│ Search: "database migrations" [🔍] │
├─────────────────────────────────────────┤
│ 📄 Database Migration Guide │
│ Complete guide to writing and... │
│ /guides/database-migrations.html │
│ │
│ 📄 Database Standards │
│ Schema conventions, soft deletes... │
│ /standards/database.html │
│ │
│ 🔗 Related Documentation: │
│ • ADR-001: Supabase Migration │
│ • Database Schema Reference │
│ • Migration Template │
└─────────────────────────────────────────┘
3.3 Search Index Generation
When to Regenerate:
- Every build (part of build.py)
- ~2 seconds for typical documentation size
- Output: Documentation/HTML/assets/search-index.json (~200KB compressed)
What's Indexed:
- All markdown source files
- Titles, headings, body content
- Code examples (with lower weight)
- Table names and API endpoints (with higher weight)
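Plain TF-IDF has no built-in notion of field boosts, so one common approximation is to repeat high-weight fields in the text fed to the vectorizer. A sketch of that idea (the weights and field names here are illustrative, not the project's actual values):

```python
def weighted_text(doc: dict) -> str:
    """Combine document fields into one indexable string, approximating
    field weights by repetition (a common trick for plain TF-IDF).

    Weights are illustrative only.
    """
    parts = []
    parts += [doc.get("title", "")] * 3       # titles weighted high
    parts += doc.get("table_names", []) * 2   # table/API names boosted
    parts += [doc.get("content", "")]         # body at normal weight
    parts += [doc.get("code", "")]            # code appears once: lower weight
    return " ".join(p for p in parts if p)
```

The resulting strings would replace `d['content']` in the `fit_transform` call above.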
4. Cross-Reference System
4.1 Option C: Hover Tooltips + Links (Recommended)
Behavior:
- Detect references to other documentation (tables, APIs, guides)
- Auto-generate links
- Show a tooltip preview on hover
- Click to navigate to the full page
Example:
<!-- In EPG System functional area page -->
<p>The <a href="/reference/database/tables/events.html"
class="doc-ref"
data-preview="events table stores sports events from TheSportsDB and ESPN APIs">
events table
</a> stores all scheduled sports events.</p>
Hover Behavior:
┌─────────────────────────────────────────┐
│ The [events table] stores all... │
│ ↓ (hover) │
│ ╔═══════════════════════════════╗ │
│ ║ events ║ │
│ ║ Stores sports events from ║ │
│ ║ TheSportsDB and ESPN APIs ║ │
│ ║ ║ │
│ ║ Click to view full details → ║ │
│ ╚═══════════════════════════════╝ │
└─────────────────────────────────────────┘
4.2 Auto-Detection Rules
Database Tables:
- Pattern: events table, epg_data, events (backticks)
- Links to: /reference/database/tables/{table_name}.html
- Preview: First sentence of table purpose
API Endpoints:
- Pattern: /api/v2/events/search, events search endpoint
- Links to: /reference/api/endpoints/{endpoint_name}.html
- Preview: Endpoint description + method
Guides:
- Pattern: Database Migration Guide, [Database Migration Guide]
- Links to: /guides/database-migrations.html
- Preview: First paragraph
ADRs:
- Pattern: ADR-001, Supabase Migration decision
- Links to: /decisions/adr-001-supabase.html
- Preview: Decision summary
Standards:
- Pattern: Python Standards, TypeScript coding standards
- Links to: /standards/{standard_name}.html
- Preview: Key rules summary
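A minimal sketch of how a couple of these patterns could be detected (the regexes and URL shapes are illustrative; real ADR URLs include a slug such as adr-001-supabase.html, and the production rule set lives in add_cross_references.py):

```python
import re

# Illustrative patterns only -- the authoritative rule set lives in
# add_cross_references.py and may differ.
TABLE_REF = re.compile(r'\b([a-z_][a-z0-9_]*)\s+table\b')
ADR_REF = re.compile(r'\bADR-(\d{3})\b')

def detect_refs(text: str) -> list[tuple[str, str]]:
    """Return (kind, target URL) pairs for references found in text."""
    refs = []
    for m in TABLE_REF.finditer(text):
        refs.append(('table', f'/reference/database/tables/{m.group(1)}.html'))
    for m in ADR_REF.finditer(text):
        # Simplified: omits the human-readable slug real ADR pages carry
        refs.append(('adr', f'/decisions/adr-{m.group(1)}.html'))
    return refs
```

Each detected pair would then be resolved to a preview string and wrapped in a `doc-ref` link at build time.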
4.3 Implementation
Build Time (Python):
# Documentation/LLM/.meta/build/scripts/add_cross_references.py
import re
from bs4 import BeautifulSoup
REFERENCE_PATTERNS = {
    'table': (
        r'\b(\w+)\s+table\b',
        lambda m: f'/reference/database/tables/{m.group(1)}.html'
    ),
    'guide': (
        r'\[(.*?)\]\(((?!http).*?\.md)\)',  # Internal markdown links
        lambda m: f'/guides/{m.group(2).replace(".md", ".html")}'
    ),
    # ... more patterns
}

def add_cross_references(html_content, doc_id):
    """Add data-preview attributes and links to references."""
    soup = BeautifulSoup(html_content, 'html.parser')
    for pattern_type, (pattern, url_func) in REFERENCE_PATTERNS.items():
        for match in re.finditer(pattern, soup.get_text()):
            # Find the text node and wrap it in <a> with a preview
            ref_text = match.group(1)
            preview = get_preview(ref_text, pattern_type)
            # Insert link with preview data
            link = soup.new_tag('a',
                href=url_func(match),
                **{'class': 'doc-ref', 'data-preview': preview}
            )
            # ... wrap text in link
    return str(soup)
Runtime (JavaScript):
// Documentation/HTML/assets/cross-references.js
document.querySelectorAll('.doc-ref').forEach(link => {
  link.addEventListener('mouseenter', (e) => {
    const preview = e.target.getAttribute('data-preview');
    showTooltip(e.target, preview);
  });
  link.addEventListener('mouseleave', (e) => {
    hideTooltip();
  });
});

function showTooltip(element, content) {
  const tooltip = document.createElement('div');
  tooltip.className = 'doc-preview-tooltip';
  tooltip.textContent = content;
  const rect = element.getBoundingClientRect();
  tooltip.style.top = rect.bottom + 5 + 'px';
  tooltip.style.left = rect.left + 'px';
  document.body.appendChild(tooltip);
}
5. Task-Oriented Navigation
5.1 Navigation Structure
Top-Level Categories (aligned with developer workflows):
1. Getting Started (/getting-started/)
   - Quick Start Guide
   - Installation & Setup
   - First EPG Generation
   - Understanding the Codebase
2. Development (/development/)
   - Development Workflow
   - Running Tests
   - Code Quality Tools
   - Debugging Guide
3. Work Management (/work/)
   - TODO-BACKLOG
   - Active Projects
   - Completed Work Archive
   - Weekly Planning
4. Technical Reference (/reference/)
   - Database Schema
   - API Reference
   - Configuration Files
   - Third-Party APIs (TheSportsDB, ESPN)
5. Standards (/standards/)
   - Core Principles
   - Python Standards
   - TypeScript Standards
   - Git Workflow
   - Database Standards
6. Decisions & History (/decisions/)
   - Architecture Decision Records (ADRs)
   - Migration History
   - Project Postmortems
   - Lessons Learned
7. Executive Dashboard (/executive/)
   - CEO Inbox
   - CTO Updates
   - Quarterly Objectives
   - Decisions Pending
5.2 Navigation Component
Sidebar (persistent):
┌─────────────────────────┐
│ EPGOAT Documentation │
├─────────────────────────┤
│ 🏠 Dashboard │
│ │
│ 🚀 Getting Started │
│ • Quick Start │
│ • Installation │
│ • First EPG │
│ │
│ 💻 Development │
│ • Workflow │
│ • Testing │
│ • Debugging │
│ │
│ 📋 Work Management │
│ • TODO-BACKLOG [3] │
│ • Active Projects [1] │
│ • Archive │
│ │
│ 📚 Technical Reference │
│ • Database │
│ • API │
│ • Config Files │
│ │
│ 📐 Standards │
│ • Core Principles │
│ • Python │
│ • TypeScript │
│ • Git │
│ │
│ 🏛️ Decisions & History │
│ • ADRs │
│ • Migrations │
│ • Postmortems │
│ │
│ 👔 Executive │
│ • CEO Inbox │
│ • CTO Updates │
│ • Objectives │
└─────────────────────────┘
Breadcrumbs (top of page):
Dashboard → Technical Reference → Database Schema → EPG System → Events Table
Page-Level Navigation (bottom of page):
← Previous: EPG Data Table | Next: Schedule Table →
Related Documentation:
• TheSportsDB API Integration Guide
• ADR-011: Dual API Source System
• Team Management Functional Area
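The previous/next footer links can be derived at build time from an ordered page list. A small sketch, assuming the build system already produces sibling pages in order (the `prev_next_links` helper and the page-dict shape are hypothetical):

```python
def prev_next_links(pages: list[dict], current_url: str):
    """Given build-ordered pages (each {'url': ..., 'title': ...}),
    return the previous and next page for footer navigation,
    or None at either edge of the sequence."""
    idx = next(i for i, p in enumerate(pages) if p['url'] == current_url)
    prev_page = pages[idx - 1] if idx > 0 else None
    next_page = pages[idx + 1] if idx < len(pages) - 1 else None
    return prev_page, next_page
```

Rendering "← Previous | Next →" then becomes a template concern, with `None` suppressing the link at the first and last page.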
6. Documentation Agent System
6.1 Purpose
Problem: Documentation gets stale because:
- Code changes aren't reflected in docs
- Decisions are made but ADRs aren't updated
- TODO-BACKLOG shows completed work as pending
- New topics aren't documented
Solution: An event-driven Documentation Agent that:
- Detects changes requiring documentation updates
- Suggests specific documentation changes
- Auto-updates simple cases (timestamps, stats)
- Keeps TODO-BACKLOG synchronized
- Runs BEFORE commits (not after)
6.2 Trigger System
When Agent Activates:
1. Code Changes (via git hooks):
   - Any change to backend/epgoat/domain/models.py → Update Database Schema docs
   - Any .sql migration file → Update Database Schema docs (already covered by existing hook)
   - Any services/*.py file → Update API Reference docs
   - Any config/*.yml file → Update Configuration Reference
2. Decision Changes (via file watching):
   - New .md file in Documentation/LLM/06-Decisions/ → Update Decisions index
   - Changes to an existing ADR → Update related documentation references
3. Documentation Changes (via file watching):
   - Any .md change in Documentation/LLM/ → Rebuild affected HTML
   - New topic added → Update navigation and search index
4. Work Status Changes (via scheduled check):
   - Changes to TodoWrite state → Sync with TODO-BACKLOG.md
   - Living document progress → Update Work Management dashboard
   - Commit messages with "feat:", "fix:" → Check if a TODO item can be marked complete
How Agent Runs:
- Local: Git hooks (pre-commit, post-commit) - not GitHub Actions (conserves minutes)
- LLM-driven: Uses the maintain-documentation skill plus automation scripts
- Timing: BEFORE PR merge (not after)
6.3 Agent Modes
Mode 1: Auto-Update (no user intervention)
- Update "Last Updated" timestamps
- Rebuild search index
- Update statistics (table count, API endpoint count)
- Regenerate database docs from migrations
Mode 2: Suggest Updates (prompt user)
- Code change detected → "Would you like to update {doc_name}?"
- New decision made → "Should I create ADR-XXX for {decision}?"
- TODO marked complete in commit → "Mark TODO as complete in BACKLOG?"
Mode 3: Validate (block commit on failure)
- Broken internal links → Block commit
- Token budget exceeded (Layer 1 >50K) → Block commit
- Build failure → Block commit
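The broken-link check can be a simple filesystem pass over the generated HTML. A hedged sketch (it ignores #anchors and external URLs, and only handles root-relative hrefs; the real validator may do more):

```python
import re
from pathlib import Path

def find_broken_links(html_root: str) -> list[tuple[str, str]]:
    """Return (page, href) pairs for root-relative links whose target
    file does not exist under html_root.

    Simplified sketch: skips #anchors and external URLs.
    """
    root = Path(html_root)
    broken = []
    for page in root.rglob('*.html'):
        for href in re.findall(r'href="(/[^"#]+)"', page.read_text(encoding='utf-8')):
            if not (root / href.lstrip('/')).exists():
                broken.append((str(page), href))
    return broken
```

The pre-commit hook would then exit non-zero whenever this list is non-empty, blocking the commit.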
6.4 TODO-BACKLOG Sync
Problem: Work completed but not marked in TODO-BACKLOG.md
Root Cause Analysis (to be investigated):
- Manual updates forgotten
- No automatic sync between TodoWrite and TODO-BACKLOG
- Completion happens in a different session/context
Solution:
# Documentation/scripts/sync_todo_backlog.py
def sync_todo_with_backlog():
    """Sync TodoWrite state with TODO-BACKLOG.md."""
    # 1. Parse current TODO-BACKLOG.md
    backlog = parse_todo_backlog()

    # 2. Check git commits for completion signals
    recent_commits = git.log('--since="1 week ago"', '--oneline')
    completed_items = extract_completed_items(recent_commits)

    # 3. Check living documents for completed tasks
    living_docs = find_living_documents()
    for doc in living_docs:
        completed_items.extend(extract_completed_tasks(doc))

    # 4. Update TODO-BACKLOG.md
    for item in completed_items:
        mark_complete_in_backlog(backlog, item)

    # 5. Write back
    write_todo_backlog(backlog)
When to Run:
- Pre-commit hook: Check for TODOs mentioned in commit message
- Post-commit hook: Extract completed work from commit and mark in BACKLOG
- Daily: Scan living documents and mark completed tasks
- On demand: make sync-todo-backlog
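One possible shape for the `extract_completed_items` helper referenced in the sync script, assuming a TODO-### ID convention appears in conventional-commit messages (that convention is an assumption, not an established project format):

```python
import re

def extract_completed_items(commit_log: str) -> list[str]:
    """Pull TODO identifiers from 'git log --oneline' output, e.g.
    'abc1234 feat: add search (closes TODO-042)'.

    Hypothetical sketch: adjust the ID pattern to the backlog's
    real item format.
    """
    done = []
    for line in commit_log.splitlines():
        message = line.split(' ', 1)[-1]  # drop the short commit hash
        if re.match(r'(feat|fix)[(:]', message):  # completion signals only
            done += re.findall(r'TODO-(\d+)', message)
    return done
```

Restricting the scan to "feat:"/"fix:" commits mirrors the completion signals listed in the trigger system above.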
6.5 Integration with maintain-documentation Skill
Relationship:
- maintain-documentation skill: Real-time linter during active coding
  - Runs while you're actively writing code
  - Checks whether code changes affect documentation
  - Prompts for immediate updates
  - Part of your active workflow
- Documentation Agent: Comprehensive periodic maintenance
- Runs on git hooks (pre-commit, post-commit)
- Scans entire codebase for stale docs
- Auto-updates simple cases
- Catches things missed during active coding
Both are needed: Skill is proactive during work, Agent is comprehensive sweep
6.6 Implementation Phases
Phase 1: Git hooks for automatic triggers
#!/bin/bash
# .git/hooks/pre-commit-documentation-agent
# Runs before commit; blocks if documentation is invalid
python3 Documentation/scripts/documentation_agent.py --mode validate
if [ $? -ne 0 ]; then
    echo "❌ Documentation validation failed"
    echo "Run: make update-docs"
    exit 1
fi
Phase 2: Event detection and suggestion
# Documentation/scripts/documentation_agent.py
class DocumentationAgent:
    def detect_changes(self):
        """Detect what changed and which docs need updating."""
        changed_files = git.diff('--cached', '--name-only')
        suggestions = []
        for file in changed_files:
            if 'backend/epgoat/domain/models.py' in file:
                suggestions.append({
                    'file': file,
                    'docs_affected': ['Database Schema'],
                    'action': 'regenerate',
                    'auto': True  # Can auto-update
                })
            elif 'services/' in file:
                suggestions.append({
                    'file': file,
                    'docs_affected': ['API Reference'],
                    'action': 'manual_update',
                    'auto': False  # Needs human review
                })
        return suggestions
Phase 3: Auto-update execution
def execute_auto_updates(suggestions):
    """Execute automatic documentation updates."""
    for suggestion in suggestions:
        if suggestion['auto']:
            if 'Database Schema' in suggestion['docs_affected']:
                run_command('make regenerate-db-docs')
            elif 'search index' in suggestion['docs_affected']:
                run_command('python3 Documentation/LLM/.meta/build/scripts/build_search_index.py')
7. Build Process: LLM-Enriched Documentation Generation
7.1 The "Teacher Model" Approach
Key Insight: MD and HTML serve different audiences with different needs
Markdown (Documentation/LLM/):
- Audience: AI assistants (Claude Code, etc.)
- Style: Concise, token-efficient, reference format
- Purpose: Quick lookups, programmatic access, minimal context
- Example: "events table: stores sports events from APIs. 30 columns."
HTML (Documentation/HTML/):
- Audience: Freshly graduated CS students (entry-level developers)
- Style: Verbose, educational, explanatory
- Purpose: Learning, understanding, onboarding
- Example: Full explanation with context, examples, diagrams, use cases
The Build Process as "Teaching": The LLM acts as a teacher during the build process:
- Takes concise MD notes (the "outline")
- Expands them into rich educational content (the "lesson")
- Adds context, examples, diagrams, analogies
- Explains "why", not just "what"
- Anticipates junior developer questions
7.2 Enrichment Process
What the LLM adds during build:
1. Expanded Explanations
   - MD: "soft delete via record_status field"
   - HTML: "We use soft deletes instead of hard deletes (DELETE statements) to preserve historical data. The record_status field can be 'active', 'archived', or 'deleted'. This approach allows us to recover accidentally deleted data and maintain audit trails. For example, if a user deletes an event, we set record_status='deleted' instead of removing the row entirely."
2. Real Code Examples
   - MD: "Usage: api_events.py, services/event_matcher.py"
   - HTML: Complete code snippets showing actual usage, with line numbers and explanations:

     # backend/epgoat/services/event_matcher.py:45-52
     def find_upcoming_events():
         """Find all active events in the next 7 days."""
         return db.query(Event).filter(
             Event.record_status == 'active',  # Only active events
             Event.start_time >= datetime.now(),
             Event.start_time <= datetime.now() + timedelta(days=7)
         ).all()

3. Visual Aids
   - MD: "Foreign key: team_id → teams.id"
   - HTML: ER diagram showing relationships, arrows, cardinality
4. Context and Rationale
   - MD: "Dual API sources: TheSportsDB, ESPN"
   - HTML: "Why we use two APIs: TheSportsDB provides comprehensive historical data but sometimes lags on live scores. ESPN offers real-time updates but limited historical coverage. By combining both, we get the best of both worlds. See ADR-011 for the full decision rationale."
5. Common Pitfalls / Gotchas
   - MD: (none)
   - HTML: "⚠️ Common Mistake: Don't query events without checking record_status. Always filter by record_status='active' unless you specifically need archived/deleted records. See Database Standards for soft delete patterns."
6. Related Documentation Links
   - MD: (minimal)
   - HTML: Extensive cross-references with context: "📚 Learn more about team discovery in the Team Management Guide. See how this table is used in the Event Matching Workflow. Check ADR-017 for the team alias consolidation decision."
7.3 New Workflow (LLM-Enriched Build)
Workflow:
1. LLM writes: Documentation/LLM/05-Reference/Database/Schema.md (CONCISE)
2. Auto-trigger: Git pre-commit hook detects .md change
3. LLM enrichment: Build system sends MD to LLM with enrichment prompt
4. LLM expands: Generates verbose, educational HTML with examples/diagrams
5. Stage HTML: Hook stages generated HTML files
6. Commit once: Single commit includes both MD source and HTML output
Implementation:
# Documentation/LLM/.meta/build/scripts/build_with_enrichment.py
import os

from anthropic import Anthropic

def enrich_documentation(md_content: str, doc_type: str) -> str:
    """Use an LLM to expand concise MD into verbose educational HTML."""
    prompt = f"""You are a senior software engineer teaching a freshly graduated CS student.
Take this concise technical documentation and expand it into rich, educational content:

{md_content}

Your expanded version should:
1. Explain concepts thoroughly (assume junior developer knowledge level)
2. Add real code examples from the codebase with explanations
3. Include visual aids where helpful (Mermaid diagrams, ASCII art)
4. Provide context and rationale (the "why", not just the "what")
5. Highlight common pitfalls and gotchas
6. Add cross-references to related documentation
7. Use analogies when explaining complex concepts
8. Anticipate questions a junior developer would ask

Output format: HTML (semantic tags, accessible, clean structure)
Target length: 3-5x longer than the input (be verbose and thorough)
"""
    client = Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY'))
    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=8000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

def build_documentation():
    """Build all documentation with LLM enrichment."""
    for md_file in find_all_md_files('Documentation/LLM/'):
        # Read concise MD
        md_content = read_file(md_file)
        # Enrich with LLM
        enriched_html = enrich_documentation(md_content, doc_type='reference')
        # Add styling, navigation, cross-references
        final_html = apply_template(enriched_html)
        # Write to HTML output
        html_path = convert_path_to_html(md_file)
        write_file(html_path, final_html)
Git Hook:
#!/bin/bash
# .git/hooks/pre-commit-build-docs
MD_CHANGED=$(git diff --cached --name-only | grep "Documentation/LLM/.*\.md$" || true)
if [ -n "$MD_CHANGED" ]; then
    echo "📝 Markdown documentation changed, enriching with LLM..."
    # Run the LLM-enriched build system
    python3 Documentation/LLM/.meta/build/scripts/build_with_enrichment.py
    if [ $? -eq 0 ]; then
        git add Documentation/HTML/
        echo "✅ HTML documentation enriched and staged"
    else
        echo "❌ Build failed, commit blocked"
        exit 1
    fi
fi
7.4 Benefits of LLM Enrichment
For Junior Developers:
- ✅ Educational content tailored to their level
- ✅ Learn "why" decisions were made, not just "what" exists
- ✅ Code examples show real-world usage patterns
- ✅ Visual aids help understand complex relationships
- ✅ Common pitfalls prevent mistakes

For Senior Developers:
- ✅ Write concise MD (faster, less repetitive)
- ✅ LLM handles the "teaching" automatically
- ✅ Consistent explanation quality
- ✅ Scales without manual effort

For Documentation Maintenance:
- ✅ Single source of truth (concise MD)
- ✅ Version control stays clean (MD diffs readable)
- ✅ Enrichment is repeatable (HTML can be regenerated from MD)
- ✅ Can regenerate all HTML from MD at any time

For the Business:
- ✅ Better onboarding (new hires learn faster)
- ✅ Reduced training time (docs teach effectively)
- ✅ Knowledge preservation (tribal knowledge documented)
- ✅ Lower cost than hiring technical writers
7.5 Build Performance Considerations
Cost:
- LLM API calls: ~$0.01-0.05 per page (Claude Sonnet)
- Full documentation rebuild: ~$2-5 for the entire site
- Incremental builds: Only changed pages (~$0.01-0.10)

Time:
- Enrichment: ~3-5 seconds per page
- Full rebuild: ~2-3 minutes for 30-40 pages
- Incremental: ~3-10 seconds per changed page

Optimization:
- Cache enriched content (only rebuild if MD changed)
- Parallel processing (enrich multiple pages simultaneously)
- Use Haiku for simple pages, Sonnet for complex ones
- Skip enrichment for non-reference docs (ADRs and guides are already verbose)
When to Enrich:
- Reference documentation (Database, API) → Always enrich
- Standards documentation → Enrich
- Guides / How-tos → Already verbose; minimal enrichment
- ADRs → Already verbose; skip enrichment
- Executive docs (CEO-INBOX, etc.) → Skip enrichment
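This routing can live as a small policy map in the build script. A sketch with assumed doc-type keys and placeholder model names (the actual model IDs and rules would come from the build configuration):

```python
# Illustrative routing only; model names and doc-type rules are
# assumptions, not the project's confirmed configuration.
ENRICHMENT_POLICY = {
    'reference': 'claude-sonnet',   # dense reference docs: full enrichment
    'standards': 'claude-sonnet',
    'guide': 'claude-haiku',        # already verbose: light touch
    'adr': None,                    # skip enrichment entirely
    'executive': None,
}

def pick_model(doc_type: str):
    """Return the model to enrich with, or None to pass MD through as-is."""
    return ENRICHMENT_POLICY.get(doc_type, 'claude-haiku')
```

A `None` result would make the build copy the converted MD straight to HTML without an LLM call, keeping cost proportional to the pages that actually benefit.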
7.6 Example: Before and After
Markdown Source (Schema.md):
## events
Sports events from TheSportsDB and ESPN APIs.
**Columns**:
- id (UUID, PK)
- thesportsdb_id (TEXT)
- espn_id (TEXT)
- sport_name (TEXT)
- league (TEXT)
- start_time (TIMESTAMP)
- record_status (TEXT)
**Relationships**:
- team_home_id → teams.id
- team_away_id → teams.id
**Usage**: api_events.py:45, event_matcher.py:78
Enriched HTML Output (Database-Schema.html, events section):
<section id="events-table">
<h2>🏟️ Events Table</h2>
<div class="overview">
<p>The <code>events</code> table is the heart of EPGOAT's EPG system. It stores scheduled sports events from two primary sources: <strong>TheSportsDB</strong> (comprehensive historical data) and <strong>ESPN</strong> (real-time updates).</p>
<p><strong>Why two APIs?</strong> We use a dual-source strategy (see <a href="/decisions/adr-011-dual-api-sources.html">ADR-011</a>) because no single API provides both comprehensive historical coverage AND real-time accuracy. TheSportsDB excels at breadth, ESPN excels at freshness. By combining them, we get the best of both worlds.</p>
</div>
<div class="visual-aid">
<h3>Relationship Diagram</h3>
<pre class="mermaid">
erDiagram
EVENTS ||--o{ TEAMS : "home team"
EVENTS ||--o{ TEAMS : "away team"
EVENTS }o--|| EPG_DATA : "generates"
</pre>
</div>
<div class="columns-detail">
<h3>Column Details</h3>
<div class="column">
<h4><code>id</code> (UUID, Primary Key)</h4>
<p>Unique identifier for each event. We use UUIDs instead of auto-incrementing integers because events come from multiple external sources and UUIDs prevent ID collisions.</p>
<p><strong>Example</strong>: <code>a3f2d8e7-12c4-4b5a-9d3e-7c8f2a1b4e6c</code></p>
</div>
<div class="column">
<h4><code>thesportsdb_id</code> (TEXT, Nullable)</h4>
<p>External ID from TheSportsDB API. Nullable because some events only exist in ESPN (e.g., very recent matchups).</p>
<p><strong>When to use</strong>: When querying TheSportsDB API for additional event details. Store this to avoid re-fetching known events.</p>
<p><strong>⚠️ Gotcha</strong>: Not all events have a TheSportsDB ID! Always check for NULL before using.</p>
</div>
<div class="column">
<h4><code>espn_id</code> (TEXT, Nullable)</h4>
<p>External ID from ESPN API. Nullable for the same reason as thesportsdb_id.</p>
<p><strong>Format</strong>: Usually numeric string like <code>"401547392"</code></p>
</div>
<div class="column">
<h4><code>sport_name</code> (TEXT, Required)</h4>
<p>The sport type (e.g., "NFL", "NBA", "Soccer"). Normalized to match our sport_emojis.yml configuration.</p>
<p><strong>Why normalized?</strong> TheSportsDB might say "American Football" while ESPN says "NFL". We normalize to a canonical form for consistency. See <code>backend/epgoat/config/sport_emojis.yml</code> for the full mapping.</p>
</div>
<div class="column">
<h4><code>record_status</code> (TEXT, Required)</h4>
<p>Soft delete status: <code>'active'</code>, <code>'archived'</code>, or <code>'deleted'</code>.</p>
<p><strong>⚠️ Critical</strong>: NEVER use hard deletes (DELETE statements) on this table! Always use soft deletes by setting <code>record_status='deleted'</code>. This preserves historical data for analytics and allows recovery from accidental deletions.</p>
<p><strong>Querying pattern</strong>: Always filter by <code>record_status='active'</code> unless you specifically need archived/deleted records.</p>
</div>
</div>
<div class="code-examples">
<h3>Code Examples</h3>
<div class="example">
<h4>Finding Upcoming Events</h4>
<p>Here's how we query for events in the next 7 days (from <code>backend/epgoat/services/event_matcher.py:78-85</code>):</p>
<pre><code class="language-python">
def find_upcoming_events(sport: str | None = None) -&gt; list[Event]:
    """Find all active events in the next 7 days.

    Args:
        sport: Optional filter by sport name (e.g., 'NFL')

    Returns:
        List of Event objects sorted by start time
    """
    query = db.query(Event).filter(
        Event.record_status == 'active',  # Only active events
        Event.start_time &gt;= datetime.now(),
        Event.start_time &lt;= datetime.now() + timedelta(days=7)
    )
    if sport:
        query = query.filter(Event.sport_name == sport)
    return query.order_by(Event.start_time).all()
</code></pre>
<p><strong>Note</strong>: We always filter by <code>record_status='active'</code>; this is the soft delete pattern.</p>
</div>
<div class="example">
<h4>Creating a New Event</h4>
<p>When ingesting events from APIs (from <code>backend/epgoat/api/api_events.py:45-58</code>):</p>
<pre><code class="language-python">
def create_event_from_api(api_data: dict, source: str) -&gt; Event:
    """Create event from TheSportsDB or ESPN API response.

    Args:
        api_data: Raw API response dictionary
        source: 'thesportsdb' or 'espn'
    """
    event = Event(
        id=uuid.uuid4(),
        sport_name=normalize_sport_name(api_data['sport']),
        league=api_data['league'],
        start_time=parse_timestamp(api_data['dateEvent']),
        record_status='active'  # New events are always active
    )
    # Store external ID based on source
    if source == 'thesportsdb':
        event.thesportsdb_id = api_data['idEvent']
    elif source == 'espn':
        event.espn_id = str(api_data['id'])
    db.add(event)
    db.commit()
    return event
</code></pre>
</div>
</div>
<div class="related-docs">
<h3>Related Documentation</h3>
<ul>
<li>📖 <a href="/guides/event-matching.html">Event Matching Workflow</a> - How we match IPTV channels to events</li>
<li>📖 <a href="/reference/api/thesportsdb.html">TheSportsDB API Integration</a> - Fetching event data</li>
<li>📖 <a href="/reference/api/espn.html">ESPN API Integration</a> - Real-time event updates</li>
<li>🏛️ <a href="/decisions/adr-011-dual-api-sources.html">ADR-011: Dual API Source Strategy</a> - Why we use two APIs</li>
<li>📐 <a href="/standards/database.html#soft-deletes">Database Standards: Soft Delete Pattern</a> - record_status usage</li>
</ul>
</div>
<div class="indexes-triggers">
<h3>Indexes</h3>
<p>The following indexes improve query performance:</p>
<ul>
<li><code>idx_events_start_time</code> - Fast lookups by date range</li>
<li><code>idx_events_sport_name</code> - Filter by sport efficiently</li>
<li><code>idx_events_record_status</code> - Soft delete queries</li>
</ul>
<p><strong>Why these indexes?</strong> Our most common query pattern is "find active events for a specific sport in a date range". These indexes cover that exact pattern.</p>
</div>
</section>
Result:
- MD: ~150 words, 10 lines → Quick AI reference
- HTML: ~1,200 words, detailed teaching → Junior dev can learn the system
This is the "teacher model" in action!
8. Mobile & Responsive Design
8.1 Requirements
Priority: Desktop primary, mobile/tablet nice-to-have

Breakpoints:
- Desktop: ≥1200px (default layout)
- Tablet: 768px - 1199px (simplified sidebar)
- Mobile: <768px (hamburger menu)

Responsive Behavior:
- Dashboard: Single column on mobile, grid on desktop
- Navigation: Collapsible sidebar on mobile
- Search: Full-width on mobile
- Tables: Horizontal scroll on mobile
- Code blocks: Horizontal scroll on mobile
8.2 CSS Framework
Recommendation: Tailwind CSS or custom CSS Grid
Why not Bootstrap: Heavy and opinionated; more than a simple docs site needs
Implementation:
/* Documentation/HTML/assets/style.css */

/* Mobile first */
.dashboard-stats {
  display: grid;
  grid-template-columns: 1fr;
  gap: 1rem;
}

/* Tablet */
@media (min-width: 768px) {
  .dashboard-stats {
    grid-template-columns: repeat(2, 1fr);
  }
}

/* Desktop */
@media (min-width: 1200px) {
  .dashboard-stats {
    grid-template-columns: repeat(3, 1fr);
  }
  .layout {
    display: grid;
    grid-template-columns: 250px 1fr;
  }
}
9. Branding & Style
9.1 Current Decision
CEO preference: "Leave as-is for now"
Rationale: Focus on content organization and functionality first, visual polish later
Current style: Clean, minimal, readable (based on existing build system output)
9.2 Future Considerations
When ready to update branding:
- EPGOAT color palette
- Logo placement
- Custom fonts
- Dark mode support
- Print stylesheet
10. Implementation Timeline
10.1 Phases
Phase 1: Foundation (Week 1)
- Dashboard landing page
- Basic statistics (static)
- Quick links section
- Task-oriented navigation structure
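The "basic statistics (static)" item needs only a few lines of stdlib Python at build time. A minimal sketch, assuming the Markdown sources sit under a single root directory (the function name and stat keys are illustrative, not the final API):

```python
from pathlib import Path

def collect_stats(md_root: Path) -> dict[str, int]:
    """Count pages and total words across all Markdown sources."""
    pages = list(md_root.rglob("*.md"))
    words = sum(len(p.read_text(encoding="utf-8").split()) for p in pages)
    return {"pages": len(pages), "words": words}
```

The build script would run this once per build and bake the numbers into the dashboard HTML; no runtime cost.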
Phase 2: Page Granularity (Week 2)
- Implement page splitting logic
- Generate functional area pages
- Generate individual table pages
- Breadcrumb system
Phase 3: Search & Cross-References (Week 2-3)
- Build search index generation
- Implement lunr.js keyword search
- Add TF-IDF similarity computation
- Auto-generate cross-reference links
- Hover tooltip system
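The TF-IDF similarity step needs no heavy dependency and can live inside build.py. A pure-stdlib sketch of the idea (naive whitespace tokenization, no stemming or stop-word removal; the real build step would be more careful):

```python
import math
from collections import Counter

def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Compute a TF-IDF vector (term -> weight) per document."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    df = Counter()  # document frequency per term
    for tokens in tokenized.values():
        df.update(set(tokens))
    n = len(docs)
    vectors = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return vectors

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0
```

At build time, the top-N most similar pages for each page would become the auto-generated "Related Documentation" links.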
Phase 4: Dynamic Features (Week 3)
- GitHub API integration
- Live statistics
- Cache layer
- Error handling
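A sketch of how the GitHub API call and cache layer could fit together, assuming a `GITHUB_TOKEN` environment variable and using only documented REST fields (`open_issues_count`, `pushed_at`); the TTL, repo slug, and function names are placeholders:

```python
import json
import os
import time
import urllib.request

_CACHE: dict[str, tuple[float, dict]] = {}
TTL_SECONDS = 300  # re-fetch at most every 5 minutes to respect rate limits

def fetch_repo_stats(repo: str, fetcher=None) -> dict:
    """Return stats for `repo`, hitting the API only when the cache is stale."""
    now = time.time()
    cached = _CACHE.get(repo)
    if cached and now - cached[0] < TTL_SECONDS:
        return cached[1]
    fetcher = fetcher or _fetch_from_github
    stats = fetcher(repo)
    _CACHE[repo] = (now, stats)
    return stats

def _fetch_from_github(repo: str) -> dict:
    """Fetch repo metadata from the GitHub REST API (requires a PAT for private repos)."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{repo}",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return {"open_issues": data["open_issues_count"], "pushed_at": data["pushed_at"]}
```

The injectable `fetcher` parameter also gives us the error-handling seam: on API failure, the dashboard can fall back to the last cached value instead of breaking.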
Phase 5: Documentation Agent (Week 4)
- Git hooks implementation
- Change detection system
- Auto-update logic
- TODO-BACKLOG sync
- Validation system
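The change detection step could map `git diff` output to stale doc pages. A sketch, runnable from a git hook; the trigger rules below are illustrative (the real mapping comes out of the trigger system designed in 5.1):

```python
import subprocess

# Hypothetical trigger rules: source path prefix -> doc page to regenerate
TRIGGER_MAP = {
    "backend/epgoat/models/": "Database-Schema",
    "backend/epgoat/api/": "API-Reference",
    "backend/epgoat/config/sport_emojis.yml": "Database-Schema",
}

def changed_files(rev_range: str = "HEAD~1..HEAD") -> list[str]:
    """List files changed in the given revision range."""
    out = subprocess.run(
        ["git", "diff", "--name-only", rev_range],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def stale_docs(files: list[str]) -> set[str]:
    """Map changed file paths to documentation pages needing regeneration."""
    return {
        doc
        for path in files
        for prefix, doc in TRIGGER_MAP.items()
        if path.startswith(prefix)
    }
```

A post-commit hook would call `stale_docs(changed_files())` and queue only the affected pages for rebuild, rather than regenerating everything.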
Phase 6: Polish & Testing (Week 4)
- Mobile responsiveness
- Cross-browser testing
- Link validation
- Performance optimization
- Documentation of the documentation system
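Link validation can run at build time over the generated HTML. A stdlib-only sketch, assuming internal links are root-relative and map directly onto the output directory layout (adjust to the real build output):

```python
from html.parser import HTMLParser
from pathlib import Path

class LinkCollector(HTMLParser):
    """Collect root-relative <a href> targets, dropping #fragments."""
    def __init__(self):
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.startswith("/"):
                self.links.append(href.split("#")[0])

def broken_links(html_root: Path) -> list[tuple[str, str]]:
    """Return (page, href) pairs whose internal href has no matching file."""
    broken = []
    for page in html_root.rglob("*.html"):
        parser = LinkCollector()
        parser.feed(page.read_text(encoding="utf-8"))
        for href in parser.links:
            if not (html_root / href.lstrip("/")).exists():
                broken.append((str(page.relative_to(html_root)), href))
    return broken
```

Failing the build when `broken_links()` is non-empty would enforce the "zero broken links" metric in section 11.1 automatically.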
10.2 Milestones
- ✅ Design document approved (Day 1)
- ⏳ Dashboard functional (Day 7)
- ⏳ All pages split and cross-linked (Day 14)
- ⏳ Search working (Day 18)
- ⏳ Documentation agent auto-updating (Day 25)
- ⏳ Production ready (Day 30)
11. Success Metrics
11.1 Quantitative
- Page load time: <1 second for any page
- Search results: <100ms for keyword search
- Zero broken links: 0 internal links broken
- Documentation coverage: >90% of code documented
- TODO-BACKLOG accuracy: 100% of completed work marked
- Auto-update success rate: >95% of documentation updates automatic
11.2 Qualitative
- Findability: Can find any topic in <3 clicks from dashboard
- Clarity: New developers can start from Quick Start Guide
- Maintenance: Documentation stays current without manual intervention
- Confidence: CEO trusts documentation is accurate and complete
12. Open Questions for CEO
Before starting implementation:
- GitHub API Token: Who will provide the Personal Access Token for private repo access?
- Deployment: Where will the HTML documentation be hosted? (Cloudflare Pages, GitHub Pages, other?)
- Priority: Which phase should be implemented first if timeline is tight?
- Build process: Does auto-build on commit solve your concern about the MD → HTML process?
- TODO-BACKLOG sync: Should we investigate why it's out of sync before implementing the agent?
13. References
- Existing Build System: Documentation/LLM/.meta/build/scripts/build.py
- Database Doc Generator: Documentation/scripts/generate_database_docs.py
- Living Document Template: .claude/skills/manage-living-documents/references/living-document-template.md
- Documentation Standards: Documentation/LLM/02-Standards/Documentation-Standards.md
- lunr.js: https://lunrjs.com/ (MIT license, 8KB gzipped)
- GitHub API: https://docs.github.com/en/rest (5,000 requests/hour for authenticated requests)
14. Next Steps
- Get CEO approval on this design document
- Answer open questions (GitHub token, deployment, priorities)
- Investigate TODO-BACKLOG sync issues (why is completed work not marked?)
- Start Phase 1 (Dashboard implementation)
- Iterate based on feedback
Document Status: [in-progress] - Awaiting CEO review and approval
Created: 2025-11-12
Last Updated: 2025-11-12