EPG Generation Guide
Status: Active
Last Updated: 2025-11-10
Related Docs: Backend Overview, Command Reference
Code Location: backend/epgoat/
Comprehensive guide to EPG (Electronic Program Guide) generation for IPTV services
Table of Contents
- Overview
- Architecture
- Modules
- Configuration
- Usage
- Adding New Patterns
- Testing
- Development
- Troubleshooting
Overview
The EPG Generator processes M3U playlists and generates XMLTV-formatted Electronic Program Guides for IPTV services. It intelligently matches channel names against sport league patterns, extracts event information, schedules programming blocks, and outputs standards-compliant XMLTV files.
Key Features
- Pattern-Based Matching: 100+ regex patterns for sport leagues and streaming services
- Time Extraction: Parses event times from channel names with timezone conversion
- Smart Scheduling: Fills pre-event, live event, and post-event blocks intelligently
- Schema Validation: Pydantic-based validation for configuration integrity
- Comprehensive Testing: 250+ test cases covering all modules
- Modular Design: Clean separation of concerns across 6 specialized modules
Architecture
The system follows a modular architecture with clear separation of concerns:
┌───────────────────────────────────────────────────┐
│                 epg_generator.py                  │
│               (Main Orchestration)                │
└───────────────────────────────────────────────────┘
                          │
       ┌──────────────────┼──────────────────┐
       │                  │                  │
       ▼                  ▼                  ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   models    │    │   config    │    │  patterns   │
│             │    │             │    │             │
│ Data classes│    │ YAML load   │    │ Regex match │
│ Constants   │    │ Validation  │    │ Classify    │
└─────────────┘    └─────────────┘    └─────────────┘
       │                  │                  │
       └──────────────────┼──────────────────┘
                          │
       ┌──────────────────┼──────────────────┐
       │                  │                  │
       ▼                  ▼                  ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   parsers   │    │ schedulers  │    │    xmltv    │
│             │    │             │    │             │
│ M3U parsing │    │ Block fill  │    │ XML output  │
│ Time extract│    │ Schedule    │    │ Formatting  │
└─────────────┘    └─────────────┘    └─────────────┘
Data Flow
- Parse M3U → Extract channel entries with metadata
- Match Patterns → Identify sport families and extract event info
- Parse Times → Extract and convert event times to target timezone
- Build Schedules → Fill pre-event, live, and post-event blocks
- Validate → Check for overlaps and duration issues
- Generate XMLTV → Output XML document with channels and programmes
Modules
models.py
Core data models and constants.
Location: backend/epgoat/domain/models.py
Classes:
- M3UEntry: Represents a single M3U playlist entry
- ChannelClassification: Classification result with diagnostic info
- EPGConstants: Configuration constants for timing and thresholds
Example:
from backend.epgoat.domain.models import M3UEntry, EPGConstants
entry = M3UEntry(
    attrs={"tvg-id": "nba01", "tvg-name": "NBA 01"},
    display_name="NBA 01: Lakers vs Celtics",
    url="http://example.com/stream",
)
print(f"Event duration: {EPGConstants.DEFAULT_EVENT_DURATION_MIN} minutes")
config.py
Configuration loading and management with schema validation.
Location: backend/epgoat/domain/config.py
Functions:
- load_sport_config(filename, validate=True): Load and validate YAML configs
- get_sport_emoji(family_name): Get emoji for sport family
- get_sport_category(family_name): Get XMLTV category for sport
Example:
from backend.epgoat.domain.config import get_sport_emoji, get_sport_category
emoji = get_sport_emoji("NBA") # Returns '🏀'
category = get_sport_category("NBA") # Returns 'Sports / Basketball / NBA'
patterns.py
Pattern matching and channel classification (300+ lines, 100+ patterns).
Location: backend/epgoat/domain/patterns.py
Functions:
- match_prefix_and_shell(name): Match channel name against patterns
- classify_channel(name, family, match_obj): Classify as "generic" or "event"
- validate_patterns(): Validate all patterns compile correctly
Example:
from backend.epgoat.domain.patterns import match_prefix_and_shell, classify_channel
matched, family, match_obj = match_prefix_and_shell("NBA 01: Lakers vs Celtics")
if matched:
    classif = classify_channel("NBA 01: Lakers vs Celtics", family, match_obj)
    print(f"Classification: {classif.classification}")
    print(f"Payload: {classif.payload}")
parsers.py
M3U parsing and time extraction (374 lines).
Location: backend/epgoat/domain/parsers.py
Functions:
- parse_m3u(path): Parse M3U file and return list of entries
- try_parse_time(payload, year, tz, date_context): Extract time from text
- validate_url(url): Validate URL format
- is_vod_url(url): Check if URL is VOD content
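To make the M3U side concrete, here is a minimal sketch of how an #EXTINF line can be paired with its stream URL. It is not the code in parsers.py: SimpleEntry and parse_m3u_text are hypothetical names used for illustration, and the handling of commas and attributes is deliberately simplified.

```python
import re
from dataclasses import dataclass, field

# Matches key="value" attribute pairs such as tvg-id="nba01".
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


@dataclass
class SimpleEntry:
    """Hypothetical stand-in for M3UEntry (same three field names)."""

    attrs: dict[str, str] = field(default_factory=dict)
    display_name: str = ""
    url: str = ""


def parse_m3u_text(text: str) -> list[SimpleEntry]:
    """Pair each #EXTINF line with the first non-comment line after it."""
    entries: list[SimpleEntry] = []
    pending: SimpleEntry | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF"):
            attrs = dict(ATTR_RE.findall(line))
            # Drop the attributes first so commas inside quoted values
            # cannot be mistaken for the name separator.
            _, _, name = ATTR_RE.sub("", line).partition(",")
            pending = SimpleEntry(attrs=attrs, display_name=name.strip())
        elif line and not line.startswith("#") and pending is not None:
            pending.url = line
            entries.append(pending)
            pending = None
    return entries
```

The display name is everything after the first comma that remains once the quoted attributes are removed, which is where the pattern matcher later looks for prefixes such as "NBA 01:".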
Supported Time Formats:
- @ 07:30 PM ET - 12-hour with timezone
- @ 19:30 ET - 24-hour with timezone
- Oct 22 03:00 PM ET - With date
- 2025-10-23 14:30:00 - ISO format
- @ 8pm PT - Hour only
Example:
from backend.epgoat.domain.parsers import parse_m3u, try_parse_time
from zoneinfo import ZoneInfo
import datetime as dt
entries = parse_m3u("playlist.m3u")
central = ZoneInfo("America/Chicago")
event_time = try_parse_time("Game @ 07:30 PM ET", 2025, central, dt.date.today())
schedulers.py
Programme scheduling and block filling (372 lines).
Location: backend/epgoat/domain/schedulers.py
Functions:
- add_block(programs, cid, title, start, end): Add programme block
- fill_pre_event(...): Fill blocks before event start
- fill_post_event(...): Fill blocks after event end
- fill_no_programming(...): Fill with "No Event Scheduled"
- validate_schedule(programs, cid): Check for overlaps
Example:
from backend.epgoat.domain.schedulers import add_block, fill_pre_event
import datetime as dt
from zoneinfo import ZoneInfo
programs = {}
central = ZoneInfo("America/Chicago")
event_start = dt.datetime(2025, 10, 22, 19, 0, 0, tzinfo=central)
day_start = dt.datetime(2025, 10, 22, 0, 0, 0, tzinfo=central)
fill_pre_event(programs, "nba-01", day_start, event_start,
               "Lakers vs Celtics", dt.date(2025, 10, 22),
               dt.date(2025, 10, 22), block_minutes=120)
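The block-filling idea can be illustrated independently of the real scheduler. The sketch below splits the gap before an event into fixed-size blocks and checks a block list for overlaps; split_into_blocks and find_overlaps are hypothetical helpers, not the implementations behind fill_pre_event or validate_schedule, and UTC is used only to keep the example simple.

```python
import datetime as dt

Block = tuple[dt.datetime, dt.datetime]


def split_into_blocks(start: dt.datetime, end: dt.datetime, block_minutes: int = 120) -> list[Block]:
    """Cover [start, end) with consecutive blocks; the last one may be shorter."""
    if block_minutes <= 0:
        raise ValueError("block_minutes must be positive")
    step = dt.timedelta(minutes=block_minutes)
    blocks: list[Block] = []
    cursor = start
    while cursor < end:
        block_end = min(cursor + step, end)
        blocks.append((cursor, block_end))
        cursor = block_end
    return blocks


def find_overlaps(blocks: list[Block]) -> list[tuple[Block, Block]]:
    """Return neighbouring pairs (after sorting by start) that overlap.

    An empty result means no two blocks overlap at all.
    """
    ordered = sorted(blocks)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] < a[1]]


day_start = dt.datetime(2025, 10, 22, 0, 0, tzinfo=dt.timezone.utc)
event_start = dt.datetime(2025, 10, 22, 19, 0, tzinfo=dt.timezone.utc)
blocks = split_into_blocks(day_start, event_start, block_minutes=120)
```

With a 19:00 event and 120-minute blocks, the pre-event window is covered by nine full blocks plus a final one-hour block that ends exactly at the event start.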
xmltv.py
XMLTV document generation (115 lines).
Location: backend/epgoat/domain/xmltv.py
Functions:
- build_xmltv(processed, match_data, programs, tz_name, target_date): Generate XMLTV
Example:
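import datetime as dt
from backend.epgoat.domain.xmltv import build_xmltv
# entries, match_results, and program_dict come from the earlier
# parsing, matching, and scheduling steps.
xmltv_content = build_xmltv(
    processed=entries,
    match_data=match_results,
    programs=program_dict,
    tz_name="America/Chicago",
    target_date=dt.date(2025, 10, 22),
)
# Write explicitly as UTF-8 so emoji in titles survive on every platform.
with open("epg.xml", "w", encoding="utf-8") as f:
    f.write(xmltv_content)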
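To show what the generated document looks like structurally, here is a small sketch that builds a one-channel XMLTV tree with the standard library. It is not build_xmltv: the function build_minimal_xmltv and its arguments are hypothetical, and only the channel, display-name, programme, and title elements are emitted.

```python
import datetime as dt
import xml.etree.ElementTree as ET

# XMLTV timestamps look like "20251022190000 -0500".
XMLTV_TIME_FORMAT = "%Y%m%d%H%M%S %z"


def build_minimal_xmltv(
    channel_id: str,
    display_name: str,
    programmes: list[tuple[str, dt.datetime, dt.datetime]],
) -> str:
    """Return an XMLTV string for one channel and its (title, start, stop) list."""
    tv = ET.Element("tv")
    channel = ET.SubElement(tv, "channel", id=channel_id)
    ET.SubElement(channel, "display-name").text = display_name
    for title, start, stop in programmes:
        programme = ET.SubElement(
            tv,
            "programme",
            start=start.strftime(XMLTV_TIME_FORMAT),
            stop=stop.strftime(XMLTV_TIME_FORMAT),
            channel=channel_id,
        )
        ET.SubElement(programme, "title").text = title
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(tv, encoding="unicode")


cdt = dt.timezone(dt.timedelta(hours=-5))
xml_text = build_minimal_xmltv(
    "nba-01",
    "NBA 01",
    [("Lakers vs Celtics",
      dt.datetime(2025, 10, 22, 19, 0, tzinfo=cdt),
      dt.datetime(2025, 10, 22, 22, 0, tzinfo=cdt))],
)
```

The datetimes must be timezone-aware; a naive datetime would produce an empty %z and therefore a timestamp without an offset.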
schemas.py
Pydantic schemas for configuration validation (170 lines).
Location: backend/epgoat/domain/schemas.py
Classes:
- SportEmojiConfig: Validates emoji mappings
- SportCategoryConfig: Validates category hierarchies
Example:
from backend.epgoat.domain.schemas import validate_sport_emojis
config = {
    'nba': '🏀',
    'nfl': '🏈',
    '_default': '🔴'
}
schema = validate_sport_emojis(config) # Validates or raises ValueError
Configuration
Configuration files are located in backend/config/:
sport_emojis.yml
Maps sport family names to emojis:
# Basketball
nba: '🏀'
ncaab: '🏀'
# American Football
nfl: '🏈'
ncaaf: '🏈'
# Default
_default: '🔴'
Requirements:
- Must include _default key
- All values must be valid unicode emojis
- Keys are case-insensitive when looked up
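The lookup rules above (case-insensitive keys, fallback to _default) can be sketched in a few lines. lookup_emoji is a hypothetical helper written for this illustration and the mapping values are examples; get_sport_emoji in config.py may be implemented differently.

```python
def lookup_emoji(mapping: dict[str, str], family_name: str) -> str:
    """Return the emoji for a family, ignoring case, else the _default entry."""
    if "_default" not in mapping:
        raise ValueError("mapping must include a _default key")
    normalized = {key.lower(): value for key, value in mapping.items()}
    return normalized.get(family_name.lower(), normalized["_default"])


# Example values only; the real file defines many more families.
SAMPLE_EMOJIS = {"nba": "🏀", "nfl": "🏈", "_default": "🔴"}
```

Because keys are normalized before the lookup, "NBA", "nba", and "Nba" all resolve to the same entry, and an unknown family falls back to the default instead of raising.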
sport_categories.yml
Maps sport families to XMLTV categories:
# Basketball
nba: 'Sports / Basketball / NBA'
ncaab: 'Sports / Basketball / NCAA'
# Generic
_default: 'Sports'
Requirements:
- Must include _default key
- Hierarchical categories must start with "Sports"
- Parts separated by " / " (space-slash-space)
- No empty parts allowed
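As an illustration of these requirements, the sketch below checks them with plain Python rather than Pydantic. The function names validate_category and validate_categories are hypothetical, and the project's SportCategoryConfig schema may enforce the rules differently.

```python
def validate_category(category: str) -> list[str]:
    """Split a category on ' / ' and enforce the rules listed above."""
    parts = category.split(" / ")
    if any(not part.strip() for part in parts):
        raise ValueError(f"empty part in category: {category!r}")
    # Only hierarchical (multi-part) categories must start with "Sports".
    if len(parts) > 1 and parts[0] != "Sports":
        raise ValueError(f"hierarchical category must start with 'Sports': {category!r}")
    return parts


def validate_categories(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Validate every entry and require the _default key."""
    if "_default" not in mapping:
        raise ValueError("mapping must include a _default key")
    return {family: validate_category(value) for family, value in mapping.items()}
```

For example, 'Sports / Basketball / NBA' splits into three parts and passes, while 'Sports /  / NBA' is rejected because the middle part is empty.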
Usage
Basic Usage
# From repository root
cd backend/epgoat
# Generate EPG for today
python epg_generator.py --m3u playlist.m3u --tz "America/Chicago"
# Generate EPG for specific date
python epg_generator.py --m3u playlist.m3u --tz "America/Chicago" --date 2025-10-22
# Enable verbose logging
python epg_generator.py --m3u playlist.m3u --tz "America/Chicago" --verbose
# Specify output path
python epg_generator.py --m3u playlist.m3u --tz "America/Chicago" --out epg.xml
Refreshing League and Team Reference Data
Daily enrichment relies on local JSON mirrors of TheSportsDB leagues and teams. Seed or refresh these caches with:
# From backend/epgoat
python utilities/seed_thesportsdb.py --refresh --verbose
This command writes data/leagues_db.json and data/teams_db.json, which power league inference and team lookups. It is safe to run repeatedly; unchanged datasets will simply be overwritten.
Tip: The enrichment services automatically trigger a lightweight seeding pass the first time these files are missing, using the default TheSportsDB API key (or THESPORTSDB_API_KEY if set). Explicit refreshes remain the recommended way to keep the mirrors up to date for daily operations.
Command-Line Options
| Option | Description | Default |
|---|---|---|
| --m3u PATH | M3U playlist file or URL | Required |
| --tz TIMEZONE | Target timezone (IANA format) | America/Chicago |
| --date YYYY-MM-DD | Target date | Today |
| --out PATH | Output XML file path | epg_output.xml |
| --verbose | Enable debug logging | False |
Tracking API Mismatches
The generator persists every enrichment miss to dist/mismatches.db. Each run logs the database path:
Mismatch tracker database: dist/mismatches.db
Inspect unresolved cases with the analysis helper:
# From backend/epgoat
python utilities/analyze_mismatches.py --db dist/mismatches.db
Add --family <family-slug> (and optionally --family-limit 40) to print concrete samples with record IDs, payloads, and parsed teams so you can map fixes back to specific database rows.
Programmatic Usage
from backend.epgoat.domain.parsers import parse_m3u
from backend.epgoat.domain.patterns import match_prefix_and_shell, classify_channel
from backend.epgoat.domain.schedulers import fill_pre_event, add_block
from backend.epgoat.domain.xmltv import build_xmltv
from zoneinfo import ZoneInfo
import datetime as dt
# Parse M3U
entries = parse_m3u("playlist.m3u")
# Process channels
programs = {}
processed = []
match_data = []
for entry in entries:
    matched, family, match_obj = match_prefix_and_shell(entry.display_name)
    if matched:
        processed.append(entry)
        match_data.append((family, match_obj))
        # Build schedule for this channel
        # ... (schedule building logic)
# Generate XMLTV
xmltv = build_xmltv(processed, match_data, programs, "America/Chicago", dt.date.today())
Adding New Patterns
To add support for a new sport league or streaming service:
1. Add Pattern to patterns.py
Edit the ALLOWED_CHANNEL_PATTERNS list in backend/epgoat/domain/patterns.py:
ALLOWED_CHANNEL_PATTERNS = [
    # ... existing patterns ...
    # Your new pattern
    (r'^NEW-LEAGUE\s+\d+\s*:?', 'NEW-LEAGUE'),
]
Pattern Guidelines:
- Use ^ to anchor to start of string
- Use \s+ for flexible whitespace
- Use \d+ to match channel numbers
- Make colon optional with :? for streaming services
- Use re.IGNORECASE flag (automatically applied)
2. Add Emoji Mapping
Edit backend/config/sport_emojis.yml:
# Your new league
new-league: '⚽' # Choose appropriate emoji
3. Add Category Mapping
Edit backend/config/sport_categories.yml:
# Your new league
new-league: 'Sports / Soccer / New League'
4. Test Your Pattern
from backend.epgoat.domain.patterns import match_prefix_and_shell
# Test matching
matched, family, match_obj = match_prefix_and_shell("NEW-LEAGUE 01: Team A vs Team B")
assert matched
assert family == "NEW-LEAGUE"
5. Add Unit Tests
Add test cases to backend/epgoat/tests/test_patterns.py:
def test_new_league_patterns(self):
    """Test NEW-LEAGUE patterns."""
    test_cases = [
        ("NEW-LEAGUE 01: Team A vs Team B", True, "NEW-LEAGUE"),
        ("NEW-LEAGUE 05: Championship", True, "NEW-LEAGUE"),
    ]
    for channel_name, should_match, expected_family in test_cases:
        matched, family, _ = match_prefix_and_shell(channel_name)
        assert matched == should_match
        if should_match:
            assert family == expected_family
Testing
The project has comprehensive test coverage with 250+ tests across 6 test files.
Running Tests
# From repository root
cd backend/epgoat
# Run all tests
pytest
# Run specific test file
pytest tests/test_patterns.py -v
# Run with coverage
pytest --cov=. --cov-report=html
# Run integration tests only
pytest tests/test_integration.py -v
Test Files
| File | Tests | Coverage |
|---|---|---|
| tests/test_patterns.py | 60+ | Pattern matching and classification |
| tests/test_parsers.py | 50+ | M3U parsing and time extraction |
| tests/test_schedulers.py | 40+ | Schedule building and validation |
| tests/test_config.py | 30+ | Configuration loading |
| tests/test_schemas.py | 50+ | Schema validation |
| tests/test_integration.py | 20+ | End-to-end workflows |
Test Coverage Areas
- ✅ All 100+ channel patterns
- ✅ All 8 time parsing formats
- ✅ Timezone conversions (ET, CT, PT, etc.)
- ✅ Schedule block filling (pre/live/post-event)
- ✅ XMLTV generation
- ✅ Configuration validation
- ✅ Error handling and edge cases
Development
Setup Development Environment
# Clone repository
git clone <repository-url>
cd epgoat-internal/backend/epgoat
# Create virtual environment
python -m venv venv
source venv/bin/activate # On macOS/Linux
# Install dependencies
pip install -r requirements.txt
# Run tests
pytest -v
# Run generator
python epg_generator.py --m3u sample.m3u --verbose
Code Style
- Python 3.11+ required
- Type hints required (100%)
- Google-style docstrings for all public functions
- Max line length: 100 characters
- Use logging instead of print()
- Follow Engineering Standards
Adding New Features
- Create feature branch: git checkout -b feature/my-feature
- Implement changes with tests
- Run full test suite: pytest
- Run linting: make lint (from repository root)
- Commit with descriptive message (conventional commits format)
- Push and create pull request
Project Structure
backend/epgoat/
├── epg_generator.py          # Main entry point
├── domain/                   # Core business logic
│   ├── models.py             # Data classes (72 lines)
│   ├── config.py             # Configuration (90 lines)
│   ├── patterns.py           # Pattern matching (300 lines)
│   ├── parsers.py            # M3U and time parsing (374 lines)
│   ├── schedulers.py         # Schedule building (372 lines)
│   ├── xmltv.py              # XML generation (115 lines)
│   └── schemas.py            # Pydantic validation (170 lines)
├── tests/                    # Test files (1,680 lines)
│   ├── test_patterns.py
│   ├── test_parsers.py
│   ├── test_schedulers.py
│   ├── test_config.py
│   ├── test_schemas.py
│   └── test_integration.py
├── utilities/                # Helper scripts
│   ├── seed_thesportsdb.py
│   └── analyze_mismatches.py
└── README.md                 # Package documentation

backend/config/
├── sport_emojis.yml          # 89 emoji mappings
└── sport_categories.yml      # 77 category mappings
Performance Considerations
- Pattern matching: O(n×m) where n=channels, m=patterns
- Schedule building: O(n) per channel
- XML generation: O(n) where n=total programmes
- Memory usage: ~1MB per 1000 channels
For large playlists (10,000+ channels), consider:
- Filtering by specific sport families first
- Caching pattern compilation results
- Using multiprocessing for parallel processing
Troubleshooting
Common Issues
Issue: ModuleNotFoundError: No module named 'pydantic'
pip install -r requirements.txt
Issue: Configuration validation fails
# Check config syntax
python -c "from backend.epgoat.domain.config import SPORT_EMOJIS; print(len(SPORT_EMOJIS))"
# Disable validation temporarily
python -c "from backend.epgoat.domain.config import load_sport_config; config = load_sport_config('sport_emojis.yml', validate=False)"
Issue: Pattern not matching channels
from backend.epgoat.domain.patterns import match_prefix_and_shell
matched, family, match_obj = match_prefix_and_shell("YOUR CHANNEL NAME")
print(f"Matched: {matched}, Family: {family}")
Issue: Time parsing fails
from backend.epgoat.domain.parsers import try_parse_time
from zoneinfo import ZoneInfo
import datetime as dt
result = try_parse_time("YOUR TIME STRING", 2025, ZoneInfo("America/Chicago"), dt.date.today())
print(f"Parsed time: {result}")
Issue: Low match rates
Run the mismatch analysis tool to debug low match rates (see Tracking API Mismatches section above).
Related Documentation
- Backend Overview - Backend architecture
- Command Reference - All CLI commands
- Quick Start - Getting started guide
Version: 2.4.0 (Documentation Migration)
Migration: Migrated from backend/epgoat/README.md in Phase 4.2