# Engineering Standards & Best Practices (Educational Version)

> Note: This is the educational, human-readable version with examples and detailed explanations. For the AI-optimized version, see 1-CONTEXT/_STANDARDS.md.

## Engineering Standards (AI-Optimized)

**Purpose**: Ultra-compressed engineering rules for Claude Code
**Token Budget**: ~3K tokens (part of 50K Layer 1 budget)
**Status**: MANDATORY - ALL code must follow
**Last Updated**: 2025-11-10

**Full Details**: Documentation/02-Standards/ (12 files with examples)
## Core Principles (10 Rules - MANDATORY)

### 1. SOLID Principles
- **Single Responsibility**: One class = one reason to change
- **Open/Closed**: Open for extension, closed for modification (use abstractions/protocols)
- **Liskov Substitution**: Subtypes must be substitutable for base types
- **Interface Segregation**: Don't force unused methods on clients
- **Dependency Inversion**: Depend on abstractions, not concretions
### 📊 SOLID Principles Overview

```mermaid
graph TD
    SOLID[SOLID Principles]
    SOLID --> S[S - Single Responsibility]
    SOLID --> O[O - Open/Closed]
    SOLID --> L[L - Liskov Substitution]
    SOLID --> I[I - Interface Segregation]
    SOLID --> D[D - Dependency Inversion]
    S --> S1[One class = one reason to change]
    S --> S2[Separate concerns into focused classes]
    O --> O1[Open for extension]
    O --> O2[Closed for modification]
    O --> O3[Use abstractions/protocols]
    L --> L1[Subtypes substitutable for base types]
    L --> L2[Maintain parent contracts]
    I --> I1[Small, focused interfaces]
    I --> I2[Clients depend only on what they use]
    D --> D1[Depend on abstractions]
    D --> D2[Not on concrete implementations]

    style SOLID fill:#FFD700
    style S fill:#90EE90
    style O fill:#87CEEB
    style L fill:#FFB6C1
    style I fill:#DDA0DD
    style D fill:#F0E68C
```
*SOLID Principles are the foundation of clean architecture in EPGOAT. Every class, function, and module should follow these principles.*

**Real Impact**: EPGOAT's enrichment pipeline is a perfect example:
- **S**: Each handler has one responsibility (cache, API, regex, etc.)
- **O**: Add new handlers without modifying the pipeline
- **L**: All handlers implement the same protocol
- **I**: Handlers implement only EnrichmentHandler (focused interface)
- **D**: The pipeline depends on the EnrichmentHandler protocol, not concrete classes
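The protocol-based pipeline described above can be sketched in a few lines. This is a minimal illustration, not EPGOAT's actual implementation: only the `EnrichmentHandler` name comes from the text; `CacheHandler`, `RegexHandler`, `Pipeline`, and their data are hypothetical.

```python
from typing import Dict, List, Optional, Protocol


class EnrichmentHandler(Protocol):
    """Focused interface every handler implements (I, D)."""

    def enrich(self, channel_name: str) -> Optional[str]:
        """Return enrichment for the channel, or None if not handled."""
        ...


class CacheHandler:
    """Single responsibility: serve previously cached results (S)."""

    def __init__(self) -> None:
        self._cache: Dict[str, str] = {"NBA 01": "Lakers vs Celtics"}

    def enrich(self, channel_name: str) -> Optional[str]:
        return self._cache.get(channel_name)


class RegexHandler:
    """Single responsibility: extract the payload after the colon (S)."""

    def enrich(self, channel_name: str) -> Optional[str]:
        if ":" in channel_name:
            return channel_name.split(":", 1)[1].strip()
        return None


class Pipeline:
    """Depends only on the EnrichmentHandler abstraction (D);
    new handlers are added without modifying this class (O)."""

    def __init__(self, handlers: List[EnrichmentHandler]) -> None:
        self._handlers = handlers

    def run(self, channel_name: str) -> Optional[str]:
        for handler in self._handlers:  # every handler is substitutable (L)
            result = handler.enrich(channel_name)
            if result is not None:
                return result
        return None


pipeline = Pipeline([CacheHandler(), RegexHandler()])
print(pipeline.run("NBA 01"))                    # hit in cache
print(pipeline.run("NFL 02: Patriots vs Jets"))  # falls through to regex
```

Adding a new handler means writing one class with an `enrich` method and appending it to the list; `Pipeline` itself never changes.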
### 2. DRY (Don't Repeat Yourself)
- Extract common logic after the 2nd duplication
- Single source of truth for all knowledge

### 📖 DRY Principle - Don't Repeat Yourself

The DRY (Don't Repeat Yourself) principle states that every piece of knowledge should have a single, authoritative representation in the system.
Why DRY Matters:
Duplication creates three major problems:
- Maintenance Burden: Change logic in one place, must remember to change all copies
- Bug Multiplication: Fix bug in one place, bug remains in other copies
- Inconsistency Risk: Logic diverges over time (one copy updated, others forgotten)
When to Apply DRY:
- After 2nd duplication: First time = write it. Second time = copy it. Third time = extract it.
- Knowledge duplication: Same business logic in multiple places
- Data duplication: Same constant/config in multiple files
- Algorithm duplication: Same processing logic repeated
When NOT to Apply DRY:
- Coincidental similarity: Code looks similar but represents different concepts
- Different change reasons: Logic changes for different business reasons
- Premature abstraction: Extracting before pattern is clear
**Real EPGOAT Examples:**

1. **BaseRepository (Good DRY)**:
   - All repositories need soft delete logic
   - Extracted to a BaseRepository base class
   - Single source of truth for CRUD operations
   - Changes to soft delete logic update all repositories

2. **Sport Emojis (Good DRY)**:
   - Sport-to-emoji mapping used in multiple places
   - Centralized in `config/sport_emojis.yml`
   - Single file to update for new sports
   - Loaded once, used everywhere

3. **Pattern Matching (Good DRY)**:
   - Channel regex patterns needed by multiple handlers
   - Centralized in provider configs
   - Single source of truth for patterns
   - Easy to test and modify
**Anti-Pattern Example:**

```python
# BAD - Duplicated validation logic
def create_event(data):
    if not data.get("name"):
        raise ValueError("Name required")
    if not data.get("date"):
        raise ValueError("Date required")
    if not data.get("league"):
        raise ValueError("League required")
    return Event(**data)

def update_event(id, data):
    if not data.get("name"):
        raise ValueError("Name required")
    if not data.get("date"):
        raise ValueError("Date required")
    if not data.get("league"):
        raise ValueError("League required")
    event = db.get(id)
    event.update(**data)
    return event


# GOOD - Extract validation
def validate_event_data(data):
    required = ["name", "date", "league"]
    for field in required:
        if not data.get(field):
            raise ValueError(f"{field} required")

def create_event(data):
    validate_event_data(data)
    return Event(**data)

def update_event(id, data):
    validate_event_data(data)
    event = db.get(id)
    event.update(**data)
    return event
```
Key Takeaway: DRY reduces maintenance burden and prevents bugs by ensuring each piece of knowledge exists in exactly one place.
### 3. KISS (Keep It Simple, Stupid)
- Simplest solution that works
- Avoid over-engineering

### 4. YAGNI (You Aren't Gonna Need It)
- Don't add functionality until needed
- Resist speculative features

### 5. Composition Over Inheritance
- Prefer has-a over is-a
- Use dependency injection
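Composition plus dependency injection can be sketched briefly; all names here (`EventSource`, `EventService`, etc.) are illustrative, not from the EPGOAT codebase:

```python
from typing import List, Protocol


class EventSource(Protocol):
    """Abstraction for anything that can provide events."""

    def fetch(self, date: str) -> List[str]: ...


class ApiEventSource:
    """Production source backed by a remote API (stubbed here)."""

    def fetch(self, date: str) -> List[str]:
        return [f"event-from-api-{date}"]


class FakeEventSource:
    """Test double: injected instead of the real API source."""

    def fetch(self, date: str) -> List[str]:
        return ["fake-event"]


class EventService:
    """HAS-A source (composition), injected via the constructor."""

    def __init__(self, source: EventSource) -> None:
        self._source = source  # dependency injection: trivial to swap or mock

    def events_for(self, date: str) -> List[str]:
        return self._source.fetch(date)


service = EventService(FakeEventSource())
print(service.events_for("2025-01-01"))  # ['fake-event']
```

Because `EventService` composes a source rather than inheriting from one, tests inject `FakeEventSource` and never touch the network.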
### 6. Fail Fast
- Validate inputs immediately
- Raise exceptions early
- No silent failures

### 📖 Fail Fast Principle - Validate Early
The Fail Fast principle means detecting and reporting errors as early as possible, preferably at the point where invalid data enters the system.
Why Fail Fast?
- Easier Debugging: Error reported at source, not after propagating through layers
- Clearer Error Messages: Context is available when error detected
- Prevent Corruption: Invalid data stopped before affecting database/state
- Better User Experience: Immediate feedback vs mysterious failures later
Where to Apply Fail Fast:
- API Boundaries: Validate all input immediately
- Function Entry: Check preconditions first
- Configuration Loading: Validate config at startup
- Type Conversions: Verify conversions succeed
- Resource Access: Check resources exist before use
**Real EPGOAT Examples:**

**1. Configuration Validation at Startup:**

```python
class Config:
    """Application configuration."""

    @classmethod
    def validate(cls) -> None:
        """Validate all required config is present (FAIL FAST)."""
        required = {
            "THESPORTSDB_API_KEY": cls.THESPORTSDB_API_KEY,
            "SUPABASE_URL": cls.SUPABASE_URL,
            "SUPABASE_KEY": cls.SUPABASE_KEY,
        }
        missing = [k for k, v in required.items() if not v]
        if missing:
            raise ValueError(
                f"Missing required environment variables: {', '.join(missing)}\n"
                f"Copy .env.example to .env and fill in values."
            )

# Called at application startup:
Config.validate()  # Fails immediately if config is bad
```
**Benefits:**
- App doesn't start with invalid config
- Error message tells the user exactly what's missing
- Prevents runtime failures after minutes of processing
**2. Input Validation at API Boundary:**

```python
def search_events(
    query: str,
    start_date: str,
    end_date: str
) -> List[Event]:
    """Search events with date range (FAIL FAST validation)."""
    # Validate inputs immediately (FAIL FAST):
    if not query or not query.strip():
        raise ValueError("Query cannot be empty")
    if len(query) < 2:
        raise ValueError("Query must be at least 2 characters")

    try:
        start = datetime.strptime(start_date, "%Y-%m-%d")
    except ValueError:
        raise ValueError(f"Invalid start_date format: {start_date} (expected YYYY-MM-DD)")

    try:
        end = datetime.strptime(end_date, "%Y-%m-%d")
    except ValueError:
        raise ValueError(f"Invalid end_date format: {end_date} (expected YYYY-MM-DD)")

    if start > end:
        raise ValueError(f"start_date ({start_date}) must be before end_date ({end_date})")

    # Now safe to proceed:
    return db.search(query, start, end)
```
**Benefits:**
- Invalid input caught immediately
- Clear error messages for users
- Database never sees invalid data
- No partial state corruption
**3. Precondition Checking:**

```python
def generate_xmltv(matches: List[MatchedEvent]) -> str:
    """Generate XMLTV from matched events (FAIL FAST)."""
    # Check preconditions first:
    if not matches:
        raise ValueError("Cannot generate XMLTV with empty matches list")

    for match in matches:
        if not match.event_id:
            raise ValueError(f"Match missing event_id: {match}")
        if not match.start_time:
            raise ValueError(f"Match missing start_time: {match}")

    # Proceed with generation:
    return _build_xmltv(matches)
```
**Anti-Pattern (Fail Late):**

```python
def process_events(events):
    """Process events (BAD - fails late)."""
    results = []
    for event in events:
        # Process without validation
        result = expensive_api_call(event)
        result = expensive_transformation(result)
        result = expensive_database_write(result)

        # FAIL LATE - after expensive operations!
        if not result.get("name"):
            print(f"Warning: Event missing name: {event}")
            continue

        results.append(result)
    return results
```
**Problems:**
- Expensive operations wasted on invalid data
- Error discovered after partial processing
- Silent failure (just prints a warning)
- Difficult to debug (where did the invalid data come from?)
Key Takeaway: Validate inputs at system boundaries and function entry points. Fail immediately with clear error messages. Never process invalid data.
### 7. Explicit Over Implicit
- Clear, obvious code > clever code
- No magic behavior

### 8. Separation of Concerns
- Domain logic separate from infrastructure
- Business logic separate from presentation

### 9. Immutability When Possible
- Prefer immutable data structures
- Reduce side effects
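A small sketch of immutability in Python using frozen dataclasses; the `Event` fields here are illustrative, not EPGOAT's actual model:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Event:
    """Immutable value object: attributes cannot be reassigned."""

    name: str
    league: str


event = Event(name="Lakers vs Celtics", league="NBA")

# Mutation is rejected at runtime (FrozenInstanceError is an AttributeError):
try:
    event.name = "changed"  # type: ignore[misc]
except AttributeError:
    print("frozen dataclass blocks mutation")

# Instead, derive a NEW value with the change applied:
renamed = replace(event, name="Celtics vs Lakers")
print(renamed.name)  # Celtics vs Lakers
print(event.name)    # original unchanged: Lakers vs Celtics
```

Because the original is never mutated, it can be shared freely across threads and caches without defensive copying.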
### 10. Performance is a Feature
- Measure before optimizing
- Use caching, indexing, batch operations
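Both halves of this rule (measure first, then cache) can be sketched with stdlib tools; the normalization function and timer helper are hypothetical examples, not EPGOAT code:

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def normalize_team_name(raw: str) -> str:
    """Normalization whose result is cached after the first call."""
    return raw.strip().lower().replace("  ", " ")


def timed(fn, *args):
    """Measure before optimizing: a crude timing helper."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


result, elapsed = timed(normalize_team_name, "  Los Angeles  Lakers ")
print(result)  # los angeles lakers

# The second identical call is served from the cache:
normalize_team_name("  Los Angeles  Lakers ")
print(normalize_team_name.cache_info().hits)  # 1
```

`cache_info()` gives you the measurement (hits vs misses) that justifies keeping the cache; if hits stay near zero, remove it.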
Full Details: Documentation/02-Standards/00-Core-Principles.md (300+ lines with examples)
## Python Standards (MANDATORY for .py files)

### Naming (STRICT)
- Functions/variables: `snake_case` (e.g., `get_event_by_id`, `user_name`)
- Classes: `PascalCase` (e.g., `EventRepository`, `HTTPClient`)
- Constants: `UPPER_SNAKE_CASE` (e.g., `MAX_RETRIES`, `API_BASE_URL`)
- Private members: Leading underscore (`_cache`, `_validate`)
### Type Hints (100% REQUIRED)
- ALL functions must have complete type hints
- Use `Optional[T]` for nullable returns
- Use `List[T]`, `Dict[K, V]`, `Set[T]` for collections
- Explicit `-> None` for void functions
- Tool: `mypy` for validation
```python
# Required format:
def get_events(
    start_date: str,
    end_date: str,
    leagues: List[str]
) -> List[Event]:
    """Docstring here."""
    pass
```
### 🎯 Why 100% Type Hints Required?

Type hints are mandatory in EPGOAT for four critical reasons:

**1. Catch Bugs Before Runtime**
```python
# Without type hints - bug ships to production:
def get_event_by_id(id):
    return db.query("SELECT * FROM events WHERE id = ?", id)

event = get_event_by_id("123")  # String instead of int - BUG!
# No warning, fails at runtime in production


# With type hints - mypy catches it:
def get_event_by_id(id: int) -> Event:
    return db.query("SELECT * FROM events WHERE id = ?", id)

event = get_event_by_id("123")  # mypy error: Expected int, got str
# Bug caught in development!
```
**2. IDE Autocomplete and Refactoring**
- IDE knows types → better autocomplete
- Refactoring is safer (IDE tracks types)
- "Find usages" works correctly
- Inline documentation (hover for types)
**3. Self-Documenting Code**

```python
# What does this return?
def search_events(query, filters):
    pass


# Clear without reading the implementation:
def search_events(
    query: str,
    filters: Optional[Dict[str, Any]] = None
) -> List[Event]:
    pass
```
**4. Safer Refactoring**
- Change return type → mypy finds all affected code
- Rename parameter → IDE updates all callers
- Remove parameter → mypy reports broken calls

**Real Cost Savings:**
- Bugs caught in CI, not production
- Refactoring takes minutes, not hours
- Onboarding faster (types explain code)

**Enforcement**: `make type-check` runs in CI (blocks merge if it fails)
### ⚠️ Missing Type Hints - Caught by mypy

**Problem**: Functions without type hints are harder to use, test, and maintain. IDEs can't provide autocomplete, and mypy can't catch type errors.

**Solution**:
```python
# ❌ BAD - mypy error: Missing type hints
def get_events(date, leagues=None):
    if leagues is None:
        leagues = []
    return query_database(date, leagues)

# mypy output:
# error: Function is missing a type annotation
# error: Function is missing a return type annotation


# ✅ GOOD - Complete type hints
from typing import Optional, List

def get_events(
    date: str,
    leagues: Optional[List[str]] = None
) -> List[Event]:
    """Fetch events for date and leagues."""
    if leagues is None:
        leagues = []
    return query_database(date, leagues)

# mypy output: Success: no issues found
```

**Run type check**:
```
make type-check          # From project root
# or
mypy backend/epgoat/     # Direct mypy call
```
**Common Type Hint Issues**:
1. **Missing return type**:
```python
def process(): # Error: missing return annotation
return result
def process() -> ProcessResult: # Fixed
return result
```
2. **Missing parameter types**:
```python
def search(query): # Error: parameter annotation
pass
def search(query: str) -> List[Result]: # Fixed
pass
```
3. **Using old-style types**:
```python
def get_items() -> list: # Warning: use List[T]
pass
def get_items() -> List[Item]: # Fixed
pass
```
### Docstrings (MANDATORY for public APIs)
- Use Google style
- Required for: public functions, classes, modules
- Format:
```python
"""
Brief one-line summary.
Longer description if needed.
Args:
param1: Description
param2: Description
Returns:
Description of return value
Raises:
ExceptionType: When and why
"""
```
### Code Quality Tools (CI enforced)
- **Formatting**: Black (line length: 100)
- **Import sorting**: isort
- **Linting**: Ruff (replaces flake8, pylint)
- **Type checking**: mypy (strict mode)
- Run: `make lint`, `make format`, `make type-check`
### 📊 Python Quality Tools Pipeline
```mermaid
graph LR
    Code[Source Code] --> Black[Black Formatter]
    Black --> Isort[isort Import Sorter]
    Isort --> Ruff[Ruff Linter]
    Ruff --> Mypy[mypy Type Checker]
    Mypy --> Pytest[pytest Test Runner]
    Pytest --> Coverage[Coverage Report]
    Coverage --> CI{CI Passes?}
    CI -->|Yes| Merge[Merge to Main]
    CI -->|No| Fix[Fix Issues]
    Fix --> Code

    style Code fill:#E8F5E9
    style Black fill:#FFB6C1
    style Isort fill:#FFB6C1
    style Ruff fill:#FFD700
    style Mypy fill:#FFD700
    style Pytest fill:#87CEEB
    style Coverage fill:#87CEEB
    style Merge fill:#90EE90
    style Fix fill:#FF6B6B
```

*EPGOAT Quality Pipeline: All code passes through 6 automated checks before merge.*

**Commands:**
- `make format`: Run Black + isort (auto-fix)
- `make lint`: Run Ruff (detect issues)
- `make type-check`: Run mypy (type validation)
- `make test`: Run pytest with coverage
- `make ci`: Run all checks (CI simulation)

**CI Enforcement**: GitHub Actions runs `make ci` on every PR. Must pass to merge.
### Key Rules
- No `except:` without an exception type (use `except Exception:` at minimum)
- No mutable default arguments (use `None`, then `if param is None: param = []`)
- Functions must be <50 lines (extract helper functions)
- Max 4 levels of nesting (extract to functions)
- Use f-strings for formatting (not `%` or `.format()`)
- Use `pathlib.Path` for file paths (not string manipulation)
### ⚠️ Bare Except Clauses - Catches Too Much

**Problem**: Using `except:` without specifying an exception type catches EVERYTHING, including KeyboardInterrupt, SystemExit, and bugs in your code.

**Solution**:
```python
# ❌ DANGEROUS
try:
    process_events()
except:  # Catches EVERYTHING (even Ctrl+C!)
    log_error("Failed")


# ✅ GOOD - Specific exception
try:
    process_events()
except EventProcessingError as e:
    log_error(f"Event processing failed: {e}")


# ✅ ALSO GOOD - Multiple specific exceptions
try:
    process_events()
except (APIError, DatabaseError, ValidationError) as e:
    log_error(f"Processing failed: {e}")
    raise  # Re-raise to propagate


# ✅ Last resort - Catch Exception (not ALL exceptions)
try:
    process_events()
except Exception as e:  # Still allows KeyboardInterrupt, SystemExit
    log_error(f"Unexpected error: {e}")
    raise
```
**Exception Hierarchy**:
- `BaseException` (top - includes system exceptions)
  - `SystemExit` (sys.exit())
  - `KeyboardInterrupt` (Ctrl+C)
  - `Exception` (all "normal" exceptions)
    - `ValueError`, `TypeError`, `RuntimeError`, etc.
**Best Practice**:
1. Catch specific exceptions you expect
2. If unknown, catch `Exception` (not bare `except:`)
3. Always log the exception details
4. Re-raise if you can't handle it
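The mutable-default rule above can be illustrated with a short sketch (the `add_league` helpers are hypothetical, not EPGOAT code):

```python
from typing import List, Optional


# ❌ BAD - the default list is created ONCE and shared across calls:
def add_league_bad(league: str, leagues: List[str] = []) -> List[str]:
    leagues.append(league)
    return leagues


print(add_league_bad("NBA"))  # ['NBA']
print(add_league_bad("NFL"))  # ['NBA', 'NFL'] - surprise! state leaked


# ✅ GOOD - use None as a sentinel, create a fresh list per call:
def add_league(league: str, leagues: Optional[List[str]] = None) -> List[str]:
    if leagues is None:
        leagues = []
    leagues.append(league)
    return leagues


print(add_league("NBA"))  # ['NBA']
print(add_league("NFL"))  # ['NFL'] - calls are independent
```

The default value is evaluated once at function definition time, which is why the bad version accumulates state across unrelated calls.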
### ⚠️ Long Functions - Over 50 Lines

**Problem**: Functions longer than 50 lines are hard to understand, test, and maintain. Length usually indicates the function is doing too many things (an SRP violation).
**Solution**:
```python
# ❌ BAD - 80+ line function doing everything
def generate_epg(provider: str, date: str) -> str:
    # Load config (10 lines)
    config = load_yaml(f"config/{provider}.yml")
    patterns = config["patterns"]
    vod_filters = config["vod_filters"]

    # Fetch M3U (10 lines)
    response = requests.get(config["m3u_url"])
    m3u_content = response.text

    # Parse M3U (15 lines)
    channels = []
    for line in m3u_content.split("\n"):
        if line.startswith("#EXTINF"):
            # ... parsing logic ...
            channels.append(channel)

    # Match events (20 lines)
    matched = []
    for channel in channels:
        for pattern in patterns:
            if re.match(pattern, channel.name):
                # ... matching logic ...
                matched.append(match)

    # Generate XMLTV (20 lines)
    xml = '<?xml version="1.0"?>\n'
    for match in matched:
        # ... XML generation ...
        xml += programme
    return xml


# ✅ GOOD - Extract to focused functions
def generate_epg(provider: str, date: str) -> str:
    """Generate EPG for provider on date (orchestrator)."""
    config = load_provider_config(provider)
    m3u_content = fetch_m3u(config.m3u_url)
    channels = parse_m3u(m3u_content)
    matched = match_channels_to_events(channels, config, date)
    xmltv = generate_xmltv(matched)
    return xmltv

def load_provider_config(provider: str) -> ProviderConfig:
    """Load provider configuration from YAML."""
    # 5-10 lines

def fetch_m3u(url: str) -> str:
    """Fetch M3U playlist from URL."""
    # 5-10 lines

def parse_m3u(content: str) -> List[Channel]:
    """Parse M3U content into Channel objects."""
    # 10-15 lines

def match_channels_to_events(
    channels: List[Channel],
    config: ProviderConfig,
    date: str
) -> List[MatchedEvent]:
    """Match channels to events using patterns."""
    # 15-20 lines

def generate_xmltv(matches: List[MatchedEvent]) -> str:
    """Generate XMLTV XML from matched events."""
    # 15-20 lines
```
**Benefits of Small Functions:**
- Easy to name (clear purpose)
- Easy to test (focused scope)
- Easy to understand (single concept)
- Easy to reuse (composable)
- Easy to debug (isolated logic)
### 🎯 Why Functions Must Be <50 Lines?

EPGOAT enforces a 50-line limit on functions because long functions are the root cause of multiple quality issues:

**1. Testing Becomes Impossible**

```python
# 100-line function doing everything:
def generate_epg(provider, date):
    # Load config (10 lines)
    # Fetch M3U (10 lines)
    # Parse M3U (20 lines)
    # Match events (30 lines)
    # Generate XMLTV (30 lines)
    return xmltv

# How do you test M3U parsing alone? You can't!
# How do you test matching without fetching? You can't!
# How do you mock API calls? You can't!
# Result: Either no tests, or tests that are 200+ lines
```
**2. Single Responsibility Principle Violation**
- Function doing 5 things = 5 reasons to change
- Any change risks breaking all 5 things
- Can't reuse individual steps
**3. Understanding Requires a Mental Stack**

```
Reading a 100-line function:
Line 1-20:   "Ok, loading config..."
Line 21-40:  "Wait, now fetching M3U... what were we doing?"
Line 41-60:  "Parsing... I forgot what config was loaded"
Line 61-80:  "Matching... which M3U parser was that again?"
Line 81-100: "XMLTV... I need to re-read this entire function"

Reading 5x 10-line functions:
generate_epg():  "Oh, it orchestrates 5 steps"
load_config():   "This loads config"
fetch_m3u():     "This fetches M3U"
... each function is instantly understood
```
**4. Bugs Hide in Complexity**
- 100-line function: 10+ code paths
- 10-line function: 1-2 code paths
- More paths = more bugs

**5. Code Review is Painful**
- 100-line function diff: 30+ minutes to review
- 5x 20-line function diffs: 5 minutes each (25 min total)
- Small functions = focused reviews
**Real EPGOAT Example:**

```python
# Before refactoring - 120 lines:
def process_provider(provider_slug, date):
    # 120 lines of mixed concerns
    pass


# After refactoring - 5 functions, 15-25 lines each:
def process_provider(provider_slug: str, date: str) -> EPGResult:
    """Process provider EPG generation (orchestrator)."""
    config = load_provider_config(provider_slug)
    channels = fetch_and_parse_channels(config)
    matches = enrich_channels(channels, config, date)
    xmltv = generate_xmltv_output(matches)
    save_results(matches, provider_slug)
    return EPGResult(xmltv=xmltv, matches=len(matches))

# Benefits:
# - Each function tested independently
# - Easy to understand flow
# - Easy to modify one step
# - Easy to reuse steps
# - Easy to parallelize (future optimization)
```
**Why 50 Lines?**
- Fits on one screen (no scrolling)
- ~3-5 logical sections max
- Human short-term memory limit (~7 items)
- Empirically supported in "Code Complete" (McConnell)

**Enforcement**: Code review flags functions >50 lines for extraction
Full Details: Documentation/02-Standards/01-Python-Standards.md (400+ lines with examples)
## Git Workflow (MANDATORY)

### Commit Messages (Conventional Commits)

**Format**: `<type>(<scope>): <description>`
Valid Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation only
- refactor: Code refactoring (no behavior change)
- test: Adding/updating tests
- chore: Maintenance (deps, configs)
- style: Formatting (no logic change)
- perf: Performance improvement
- ci: CI/CD changes
- build: Build system changes
Rules:
- Type is MANDATORY
- Scope is optional (e.g., feat(api):, fix(parser):)
- Description is MANDATORY (imperative mood, lowercase, no period)
- Max 72 characters
- Hook enforces these rules (will reject bad commits)
**Examples:**

```
# ✅ GOOD
feat(api): add event search endpoint
fix(parser): handle channels with missing tvg-id
docs: update architecture diagram
refactor(services): extract caching logic

# ❌ BAD (will be rejected)
Add search endpoint     # Missing type
Update stuff            # Vague, missing type
.                       # Unacceptable
Added new feature       # Past tense, missing type
```
### 🎯 Why Conventional Commits Enforced?

EPGOAT enforces Conventional Commits via a Git hook because consistent commit messages provide massive long-term value:

**1. Instant Understanding of Changes**
```
# Without convention - meaningless:
git log --oneline
a3f2d1 Update stuff
b7e4c9 Fix bug
c2d8e5 Changes
d9f3a2 More updates

# With convention - clear intent:
git log --oneline
a3f2d1 feat(api): add event search endpoint
b7e4c9 fix(parser): handle missing tvg-id attributes
c2d8e5 refactor(services): extract caching logic
d9f3a2 docs: update EPG generation guide
```
**Benefit**: 10 seconds to understand vs 10 minutes reading diffs

**2. Automated Changelog Generation**

```
# Generate release notes automatically:
git log --grep="^feat" --oneline v1.0.0..v2.0.0

# Output ready for release notes:
feat(api): add event search endpoint
feat(enrichment): add LLM fallback handler
feat(providers): add provider onboarding CLI
```

**Benefit**: 5-minute changelog vs 2-hour manual curation
**3. Semantic Versioning Integration**
- `feat:` → minor version bump (1.0.0 → 1.1.0)
- `fix:` → patch version bump (1.0.0 → 1.0.1)
- `BREAKING CHANGE:` → major bump (1.0.0 → 2.0.0)

**Benefit**: Correct versioning without a human decision
**4. Easy to Find Changes**

```
# Find all API changes:
git log --grep="^feat(api)"

# Find all bug fixes in parser:
git log --grep="^fix(parser)"

# Find all breaking changes:
git log --grep="BREAKING CHANGE"
```

**Benefit**: Answer "when did we change X?" in 10 seconds
**5. PR Review Context**

PR title: `feat(enrichment): add cross-provider caching`

The reviewer immediately knows:
- New feature (not a fix)
- Enrichment system (not API or database)
- Caching-related (knows what to look for)

**Benefit**: Faster reviews with correct context
**Real Time Savings:**
- Monthly changelog: 2 hours → 5 minutes (saves 1h 55m)
- Finding when a bug was introduced: 30 min → 2 min (saves 28m)
- PR context gathering: 10 min → 0 min (saves 10m)
- Versioning decisions: 15 min → automated (saves 15m)

**Total**: ~3 hours/month saved = 36 hours/year per developer

**Enforcement**: Git hook rejects non-compliant commits immediately
Full Details: Documentation/02-Standards/06-Git-Workflow.md
## Testing Standards

### Coverage Requirements
- Minimum: 80% code coverage
- Target: 90%+ for critical paths (matching, parsing, XMLTV generation)
- Tool: pytest + pytest-cov

### Test Structure (AAA Pattern)
```python
def test_event_matching():
    # ARRANGE - Set up test data
    channel = Channel(name="NBA 01: Lakers vs Celtics")

    # ACT - Execute the operation
    result = matcher.match(channel)

    # ASSERT - Verify expectations
    assert result is not None
    assert result.team1 == "Lakers"
```
### 🎯 Why AAA Test Pattern?

The Arrange-Act-Assert pattern is mandatory because it makes tests:

**1. Easy to Understand**
```python
# Without AAA - unclear intent:
def test_match():
    cache = EnhancedMatchCache(24)
    cache.store_match("id1", "NBA 01", "NBA", "Lakers", 123, "NBA", "Basketball", 0.95)
    assert cache.find_match("id1").matched_event_id == 123
    cache.store_match("id2", "NFL 01", "NFL", "Patriots", 456, "NFL", "Football", 0.90)
    result = cache.find_match("id2")
    assert result is not None


# With AAA - crystal clear:
def test_match():
    # ARRANGE - Set up cache with stored match
    cache = EnhancedMatchCache(expiration_hours=24)
    cache.store_match(
        tvg_id="nba-lakers",
        channel_name="NBA 01: Lakers vs Celtics",
        channel_family="NBA",
        channel_payload="Lakers vs Celtics",
        matched_event_id=12345,
        league="NBA",
        sport="Basketball",
        confidence=0.95
    )

    # ACT - Look up the cached match
    result = cache.find_match(tvg_id="nba-lakers")

    # ASSERT - Verify we got the right match
    assert result is not None, "Should find cached match"
    assert result.matched_event_id == 12345
    assert result.sport == "Basketball"
```
**2. Easy to Debug When Failing**

```python
# Failure message without AAA:
# "AssertionError on line 247"
# ... which assertion failed? What was being tested?

# Failure message with AAA:
# ARRANGE passed (setup successful)
# ACT passed (function executed)
# ASSERT failed: "Should find cached match"
# → Immediately know: the lookup failed, not the setup or execution
```
**3. Easy to Maintain**
- Each section has one responsibility
- Add setup → modify ARRANGE
- Change behavior → modify ACT
- Add verification → modify ASSERT
- Sections are independent

**4. Follows Given-When-Then BDD**
- ARRANGE = Given (context)
- ACT = When (action)
- ASSERT = Then (outcome)
- Maps to user stories naturally

**Real EPGOAT Impact:**
- 770 tests use the AAA pattern
- When tests fail, intent is immediately clear
- New engineers understand tests quickly
- Test reviews focus on logic, not structure
### Test Types
- **Unit tests**: Test individual functions/classes in isolation
- **Integration tests**: Test component interactions
- **End-to-end tests**: Test the full EPG generation pipeline
- Location: `backend/epgoat/tests/`
### 📊 Testing Pyramid - EPGOAT Test Strategy

```mermaid
graph TD
    E2E[End-to-End Tests<br/>Full EPG Pipeline<br/>~10 tests<br/>Slow, Comprehensive]
    Integration[Integration Tests<br/>Multi-Component<br/>~50 tests<br/>Medium Speed]
    Unit[Unit Tests<br/>Individual Functions<br/>~700 tests<br/>Fast, Focused]
    E2E --> Integration
    Integration --> Unit
    Unit --> Examples1[Pattern matching<br/>M3U parsing<br/>Time extraction]
    Integration --> Examples2[Enrichment pipeline<br/>Cache + API + DB<br/>Provider config loading]
    E2E --> Examples3[generate_epg<br/>Full workflow<br/>End-to-end EPG]

    style E2E fill:#FF6B6B
    style Integration fill:#FFD700
    style Unit fill:#90EE90
```

*Test Pyramid: More unit tests (fast), fewer integration tests (slower), minimal E2E tests (slowest).*

**EPGOAT Distribution:**
- Unit: 700+ tests (90%) - fast (0.1s each)
- Integration: 50+ tests (7%) - medium (1-5s each)
- E2E: 10+ tests (3%) - slow (10-30s each)

**Why This Distribution?**
- Unit tests catch 80% of bugs: fast feedback (full suite runs in 30s), easy to debug (isolated failures)
- Integration tests verify components work together
- E2E tests ensure user workflows work end-to-end

**Run Tests:**
- `pytest tests/test_patterns.py` (unit)
- `pytest tests/test_integration.py` (integration)
- `pytest tests/` (all tests)
### Mocking
- Mock external APIs (TheSportsDB, ESPN, Claude)
- Use `pytest.fixture` for reusable test data
- Use `monkeypatch` for patching
- Never hit real APIs in tests
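The never-hit-real-APIs rule can be sketched with the stdlib's `unittest.mock.patch`, which works the same way as pytest's `monkeypatch` but runs standalone; `fetch_event`, the URL, and `FakeResponse` are hypothetical, not EPGOAT code:

```python
import json
import urllib.request
from unittest.mock import patch


def fetch_event(event_id: int) -> dict:
    """Production code: calls a remote TheSportsDB-style API."""
    with urllib.request.urlopen(f"https://example.invalid/events/{event_id}") as resp:
        return json.loads(resp.read())


class FakeResponse:
    """Stand-in for the HTTP response, supporting the with-statement."""

    def __init__(self, payload: dict) -> None:
        self._body = json.dumps(payload).encode()

    def read(self) -> bytes:
        return self._body

    def __enter__(self) -> "FakeResponse":
        return self

    def __exit__(self, *exc) -> None:
        pass


def test_fetch_event_parses_response() -> None:
    # Patch the network call so the test never leaves the process:
    fake = FakeResponse({"id": 1, "name": "Lakers vs Celtics"})
    with patch("urllib.request.urlopen", return_value=fake):
        event = fetch_event(1)
    assert event["name"] == "Lakers vs Celtics"


test_fetch_event_parses_response()
print("test passed without touching the network")
```

With pytest, the same substitution is `monkeypatch.setattr(urllib.request, "urlopen", ...)` inside a test function.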
### Test Naming
- Pattern: `test_<function_name>_<scenario>`
- Examples: `test_parse_channel_with_teams`, `test_match_when_no_events_found`

**Run Tests**: `make test`, `make test-coverage`, `pytest tests/test_patterns.py -v`

**Full Details**: Documentation/02-Standards/04-Testing-Standards.md
## Database Standards (Critical)

### Data is Forever (NEVER DELETE)
- Use soft deletes ONLY: `record_status` field ('active', 'archived', 'deleted')
- NEVER use `DELETE` statements in production code
- Historical data enables future analytics + API
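A minimal sketch of the soft-delete pattern, using stdlib SQLite as a stand-in for the real database; the schema here is illustrative, not EPGOAT's:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        record_status TEXT NOT NULL DEFAULT 'active'
    )"""
)
conn.execute("INSERT INTO events (name) VALUES (?)", ("Lakers vs Celtics",))

# Soft delete: flip record_status, never run DELETE:
conn.execute("UPDATE events SET record_status = 'deleted' WHERE id = ?", (1,))

# Normal reads filter to active rows:
active = conn.execute(
    "SELECT COUNT(*) FROM events WHERE record_status = 'active'"
).fetchone()[0]
print(active)  # 0 - hidden from normal queries

# But the row still exists for analytics and recovery:
total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(total)   # 1

# Recovery is a one-line update:
conn.execute("UPDATE events SET record_status = 'active' WHERE id = ?", (1,))
```

In practice a repository base class would append `WHERE record_status = 'active'` to every read so callers never see "deleted" rows by accident.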
### 🎯 Why Soft Deletes Despite Storage Cost?

EPGOAT uses soft deletes despite the storage cost because the benefits far exceed the minimal cost:

**Cost Analysis:**

```
Current database size: ~100MB
Expected growth: 500MB/year (5 years = 2.5GB)
Storage cost: $0.10/GB/month (Supabase)
Hard delete savings: $0.25/month = $3/year
Soft delete cost: $0.25/month = $3/year
Net storage cost: $3/year 🥱
```
**Benefits (Massive ROI):**

**1. Self-Improvement (Core Principle #2)**
- Analyze deleted matches → improve matching
- "70% of deleted events were false positives" → fix regex
- Learn from mistakes → better patterns
- Historical trend analysis

*Value: Improved matching accuracy saves engineering time*

**2. API Monetization (Core Principle #5)**

```
# Paid API endpoint (future revenue):
GET /api/v2/events?season=2024&league=NBA&include_deleted=true

# Premium feature:
"Historical Data Access" tier: $49/month
"All NBA games from 2020-2025" → needs deleted data
```

*Value: Potential revenue stream ($588/year per customer)*
**3. Data Recovery**
- User: "I accidentally deleted 100 events!"
- Solution: `UPDATE events SET record_status='active' WHERE id IN (...)`
- 5-minute fix vs "sorry, data is gone"

*Value: Customer satisfaction + retention*

**4. Compliance and Audit**
- "Who deleted what when?"
- Regulatory requirements (audit trail)
- Debugging production issues

*Value: Risk mitigation (regulatory fines are expensive!)*

**Real Trade-off:**
- Cost: $3/year in storage
- Value: Better matching + revenue + recovery + compliance
- ROI: 100:1+ (conservative estimate)

**Decision**: Soft deletes are a no-brainer for EPGOAT
### Schema Changes
- ALL changes via migrations (numbered: `001_initial.sql`, `002_add_teams.sql`)
- Never modify existing migrations (create a new one)
- Test migrations: up + down
- Location: `backend/epgoat/infrastructure/database/migrations/`
### Queries
- Use parameterized queries (prevent SQL injection)
- Add indexes for frequently queried columns
- Use `EXPLAIN` for slow queries
- Batch inserts (use `executemany`, not loops)
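The batch-insert rule can be sketched with stdlib SQLite (illustrative schema, not EPGOAT's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(i, f"event-{i}") for i in range(1000)]

# ❌ SLOW - one statement round trip per row:
# for row in rows:
#     conn.execute("INSERT INTO events VALUES (?, ?)", row)

# ✅ FAST - a single batched call (also parameterized):
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 1000
```

Note the placeholders do double duty: `executemany` batches the work, and the `?` parameters keep the statement injection-safe.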
### 🎯 Why Parameterized Queries Are Non-Negotiable?

Parameterized queries are the ONLY acceptable way to build SQL in EPGOAT. String formatting/concatenation is a fireable offense.

**Why So Strict?**

**1. SQL Injection Can Destroy the Business**

```python
# One malicious input can:
input = "x'; DROP TABLE events; DROP TABLE participants; --"

# Unsafe query:
query = f"SELECT * FROM events WHERE name = '{input}'"

# Executes:
# SELECT * FROM events WHERE name = 'x';
# DROP TABLE events;
# DROP TABLE participants;
# --'

# Result: ENTIRE DATABASE DELETED
# Recovery time: Hours to days
# Business impact: CRITICAL
# Customer trust: DESTROYED
```
**Real-World Example**: In 2019, a SQL injection attack on a major retailer exposed 56 million credit card numbers. The company paid $150M in fines and lawsuits.

**2. Customer Data Protection (Core Principle #1)**
- SQL injection can expose ALL customer data
- Violates GDPR, CCPA, and other regulations
- Legal liability: millions in fines
- Reputational damage: permanent
3. It's Easy to Get Right
# Wrong way (5 characters shorter):
query = f"SELECT * FROM events WHERE league = '{league}'"
# Right way:
query = "SELECT * FROM events WHERE league = ?"
db.query(query, league)
# Difference: 5 characters
# Risk reduction: 100%
4. No Performance Penalty - Parameterized queries are FASTER (query plan caching) - Driver handles escaping efficiently - Zero performance trade-off
Enforcement in EPGOAT:
- Code review rejects any string formatting in SQL
- BaseRepository uses parameterized queries only
- Field names validated against whitelist
- `grep "f\"SELECT" backend/epgoat/` → 0 results required
Bottom Line: One SQL injection can end the company. Parameterized queries cost nothing and eliminate the risk. Non-negotiable.
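The difference is easy to verify; a sketch with the stdlib `sqlite3` module (schema is hypothetical) shows the parameterized form treating the malicious string as plain data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO events (name) VALUES ('Lakers vs Celtics')")

malicious = "x'; DROP TABLE events; --"

# Parameterized: the driver binds the value; it is never parsed as SQL
rows = conn.execute("SELECT * FROM events WHERE name = ?", (malicious,)).fetchall()
print(rows)  # → [] (no match, and no statement was injected)

# The table still exists:
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # → 1
```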
⚠️ N+1 Query Problem - Database Performance
Problem: Loading related data in a loop causes N+1 database queries instead of 1-2. Kills performance with hundreds/thousands of records.
**Solution**:
```python
# ❌ BAD - N+1 queries (1 for events, N for participants)
events = db.query("SELECT * FROM events WHERE date = ?", date)  # 1 query for events
for event in events:  # N queries (one per event!)
    participants = db.query(
        "SELECT * FROM participants WHERE event_id = ?", event.id
    )
    event.participants = participants
# Total: 1 + N queries (N = number of events)
# 100 events = 101 queries! 🐢

# ✅ GOOD - Eager loading (2 queries total)
events = db.query("SELECT * FROM events WHERE date = ?", date)  # 1 query for events
event_ids = [e.id for e in events]
participants = db.query(
    "SELECT * FROM participants WHERE event_id IN ({})".format(
        ",".join("?" * len(event_ids))
    ),
    *event_ids,
)  # 1 query for all participants

# Group participants by event_id
participants_by_event = {}
for p in participants:
    participants_by_event.setdefault(p.event_id, []).append(p)

# Attach to events
for event in events:
    event.participants = participants_by_event.get(event.id, [])
# Total: 2 queries (regardless of N!) 🚀

# ✅ EVEN BETTER - Join in database
result = db.query(
    """
    SELECT
        e.*,
        p.id AS participant_id,
        p.name AS participant_name,
        p.type AS participant_type
    FROM events e
    LEFT JOIN event_participants ep ON e.id = ep.event_id
    LEFT JOIN participants p ON ep.participant_id = p.id
    WHERE e.date = ?
    """,
    date,
)  # 1 query total! 🚀🚀🚀

# Process results:
events = {}
for row in result:
    if row.id not in events:
        events[row.id] = Event(id=row.id, name=row.name)
    if row.participant_id:
        events[row.id].participants.append(
            Participant(id=row.participant_id, name=row.participant_name)
        )
```
**Detection**:
- Watch for queries in loops
- Enable query logging in development
- Use performance profiler
**EPGOAT Example**:
Event deduplication service loads ALL events for date in one query,
not one query per event.
### ⚠️ Timezone-Naive Datetimes - Dangerous Assumption
**Problem**: Using `datetime.now()` without timezone creates naive datetime that
assumes local timezone. Causes bugs when deployed to different regions.
**Solution**:
```python
from datetime import datetime, timezone
import pytz

# ❌ BAD - Timezone-naive datetime
now = datetime.now()  # What timezone is this?
event_time = datetime(2025, 11, 10, 19, 0)  # 7 PM... where?

# When comparing:
if now > event_time:  # Might be wrong if different timezones!
    print("Event has started")

# ✅ GOOD - Timezone-aware datetime
now = datetime.now(timezone.utc)  # Explicit UTC
event_time = datetime(2025, 11, 10, 19, 0, tzinfo=timezone.utc)

# Even better: Use pytz for named timezones
eastern = pytz.timezone("America/New_York")
now = datetime.now(eastern)  # Current time in Eastern
event_time = eastern.localize(datetime(2025, 11, 10, 19, 0))

# Convert between timezones:
utc_time = event_time.astimezone(timezone.utc)
pacific_time = event_time.astimezone(pytz.timezone("America/Los_Angeles"))

# Real EPGOAT example:
def parse_event_time(time_str: str, tz_name: str) -> datetime:
    """Parse event time with timezone.

    Args:
        time_str: Time string (e.g., "7:00 PM", "19:00")
        tz_name: Timezone name (e.g., "America/Chicago")

    Returns:
        Timezone-aware datetime in specified timezone
    """
    tz = pytz.timezone(tz_name)
    # Accept both 12-hour and 24-hour formats, per the docstring
    for fmt in ("%I:%M %p", "%H:%M"):
        try:
            naive_time = datetime.strptime(time_str, fmt)
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"Unrecognized time format: {time_str}")
    # Combine with today's date
    naive_dt = datetime.combine(datetime.today(), naive_time.time())
    # Localize to timezone (handles DST)
    aware_dt = tz.localize(naive_dt)
    return aware_dt
```
EPGOAT Timezone Handling:
- M3U channels have times in various timezones ("7:00 PM ET")
- XMLTV requires UTC with offset ("20251110230000 +0000")
- Users specify generation timezone (`--tz` flag)
- All conversions use pytz for accuracy
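The end-to-end conversion can be sketched with the stdlib `zoneinfo` module (EPGOAT's code uses pytz; the helper name and input format here are illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_xmltv(local_dt_str: str, tz_name: str) -> str:
    """Convert a naive local time string to the XMLTV UTC timestamp format."""
    naive = datetime.strptime(local_dt_str, "%Y-%m-%d %H:%M")
    aware = naive.replace(tzinfo=ZoneInfo(tz_name))
    utc = aware.astimezone(timezone.utc)
    return utc.strftime("%Y%m%d%H%M%S %z")


# 7:00 PM Eastern on Nov 10, 2025 (EST, UTC-5) is midnight UTC on Nov 11
print(to_xmltv("2025-11-10 19:00", "America/New_York"))  # → 20251111000000 +0000
```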
Full Details: Documentation/02-Standards/07-Database-Standards.md
Error Handling
Exceptions
- Use specific exception types (not generic `Exception`)
- Custom exceptions inherit from `EPGOATError` base class
- Include context in error messages
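A minimal sketch of what such a hierarchy could look like (the `EPGOATError` base class comes from the standard above; the subclass name and fields are hypothetical):

```python
class EPGOATError(Exception):
    """Base class for all EPGOAT errors."""


class EnrichmentError(EPGOATError):
    """Raised when an enrichment handler fails; carries context."""

    def __init__(self, handler: str, event_id: int, reason: str) -> None:
        self.handler = handler
        self.event_id = event_id
        super().__init__(f"{handler} failed for event {event_id}: {reason}")


# Callers catch the specific type, never a bare `except:`
try:
    raise EnrichmentError("api", 42, "timeout")
except EnrichmentError as exc:
    print(exc)  # → api failed for event 42: timeout
```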
Logging
- Use Python `logging` module
- Levels: DEBUG (dev), INFO (operations), WARNING (recoverable), ERROR (failures), CRITICAL (system issues)
- Include: timestamp, level, module, message
- Never log sensitive data (API keys, passwords, PII)
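A sketch of a formatter covering the required fields (the format string and logger name are assumptions, not EPGOAT's actual config):

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
# timestamp, level, module, message - the four required fields
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))

logger = logging.getLogger("epgoat.enrichment")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("Enriched %d events", 120)
# Never log secrets: log an identifier, not the key itself
logger.warning("API quota low for key id=%s", "key-3")

print(stream.getvalue())
```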
Validation
- Validate at boundaries (API input, file parsing)
- Use Pydantic models for complex validation
- Fail fast (validate early, raise immediately)
Full Details: Documentation/02-Standards/08-Error-Handling.md
Security Standards
Secrets Management
- NEVER commit secrets to Git
- Use environment variables (`.env` file locally, Workers env vars in prod)
- Rotate API keys quarterly
- Use different keys for staging + production
Input Validation
- Sanitize all user input
- Prevent SQL injection (parameterized queries)
- Prevent XSS (escape HTML)
- Prevent command injection (avoid `os.system`, use `subprocess` with list args)
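The command-injection rule in a sketch (the hostile filename is illustrative; `sys.executable` is used so the demo is portable):

```python
import subprocess
import sys

filename = "playlist.m3u; rm -rf /"  # hostile input

# ❌ BAD - the shell parses the string, so the injected command would run
# os.system(f"some-tool {filename}")

# ✅ GOOD - list args: the filename is a single argv entry, never shell-parsed
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # → playlist.m3u; rm -rf /  (printed as data, not executed)
```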
Authentication
- Use token-based auth (JWT)
- Hash passwords (bcrypt, argon2)
- Implement rate limiting
- Use HTTPS everywhere
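Rate limiting from the list above, as a minimal token-bucket sketch (capacity and refill numbers are arbitrary; a production limiter would also need per-client keys and thread safety):

```python
import time


class TokenBucket:
    """Allow at most `capacity` requests per `refill_period` seconds."""

    def __init__(self, capacity: int, refill_period: float) -> None:
        self.capacity = capacity
        self.refill_rate = capacity / refill_period  # tokens per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=3, refill_period=60.0)
print([bucket.allow() for _ in range(5)])  # → [True, True, True, False, False]
```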
Full Details: Documentation/02-Standards/09-Security-Standards.md
Code Review Checklist
Before committing code, verify:
- [ ] Follows naming conventions (snake_case, PascalCase)
- [ ] 100% type hints present
- [ ] Docstrings for public APIs
- [ ] Tests written (AAA pattern)
- [ ] Coverage ≥80%
- [ ] No bare `except:` clauses
- [ ] No mutable defaults
- [ ] Functions <50 lines
- [ ] Passes `make lint`
- [ ] Passes `make type-check`
- [ ] Commit message follows Conventional Commits
- [ ] No secrets in code
Common Violations
Python:
- Missing type hints → Add them (100% required)
- camelCase names → Use snake_case
- No docstrings → Add Google-style docstrings
- Bare `except:` → Specify exception type
- Mutable defaults → Use None + conditional
- Long functions → Extract helpers (<50 lines)
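The mutable-default violation from the list above, sketched (function and argument names are illustrative):

```python
# ❌ BAD - the default list is created once and shared across calls
def add_channel_bad(name, channels=[]):
    channels.append(name)
    return channels

print(add_channel_bad("ESPN"))  # → ['ESPN']
print(add_channel_bad("TNT"))   # → ['ESPN', 'TNT']  (state leaked between calls!)

# ✅ GOOD - None + conditional
def add_channel(name, channels=None):
    if channels is None:
        channels = []  # fresh list per call
    channels.append(name)
    return channels

print(add_channel("ESPN"))  # → ['ESPN']
print(add_channel("TNT"))   # → ['TNT']
```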
Git:
- Bad commit message → Use Conventional Commits format
- Missing type → Add feat:, fix:, etc.
- Past tense → Use imperative mood ("add" not "added")
Testing:
- No tests → Write unit tests (AAA pattern)
- Low coverage → Add tests for uncovered paths
- Hitting real APIs → Mock external calls
Enforcement
Automated (CI/CD):
- `make lint`: Ruff checks
- `make format-check`: Black formatting
- `make type-check`: mypy type validation
- `make test`: pytest with coverage
- Commit hook: Conventional Commits validation
Manual (Code Review):
- Architecture patterns (SOLID, DRY)
- Security vulnerabilities
- Performance issues
- Documentation quality
Skill Integration:
- /engineering-standards check: Auto-check code against standards
- /engineering-standards fix: Auto-fix violations
- /engineering-standards refactor: Refactor file to meet standards
For Examples: See Documentation/02-Standards/ (all standards with extensive examples)
For Questions: Use /engineering-standards skill for automated checking