EPG Generation Guide

Status: Active
Last Updated: 2025-11-10
Related Docs: Backend Overview, Command Reference
Code Location: backend/epgoat/

Comprehensive guide to EPG (Electronic Program Guide) generation for IPTV services

Overview
Architecture
Modules
Configuration
Usage
Adding New Patterns
Testing
Development
Troubleshooting

Overview

The EPG Generator processes M3U playlists and generates XMLTV-formatted Electronic Program Guides for IPTV services. It matches channel names against sport league regex patterns, extracts event information, schedules programming blocks, and outputs standards-compliant XMLTV files.

Key Features

Pattern-Based Matching: 100+ regex patterns for sport leagues and streaming services
Time Extraction: Parses event times from channel names with timezone conversion
Smart Scheduling: Fills pre-event, live event, and post-event blocks around each parsed event time
Schema Validation: Pydantic-based validation for configuration integrity
Comprehensive Testing: 250+ test cases covering all modules
Modular Design: Clean separation of concerns across 6 specialized modules

Architecture

The system follows a modular architecture with clear separation of concerns:

┌─────────────────────────────────────────────────────────────────┐
│                      epg_generator.py                            │
│                   (Main Orchestration)                           │
└─────────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
         ▼                     ▼                     ▼
  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
  │   models    │      │   config    │      │  patterns   │
  │             │      │             │      │             │
  │ Data classes│      │ YAML load   │      │ Regex match │
  │ Constants   │      │ Validation  │      │ Classify    │
  └─────────────┘      └─────────────┘      └─────────────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
         ▼                     ▼                     ▼
  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
  │   parsers   │      │ schedulers  │      │    xmltv    │
  │             │      │             │      │             │
  │ M3U parsing │      │ Block fill  │      │ XML output  │
  │ Time extract│      │ Schedule    │      │ Formatting  │
  └─────────────┘      └─────────────┘      └─────────────┘

Data Flow

Parse M3U → Extract channel entries with metadata
Match Patterns → Identify sport families and extract event info
Parse Times → Extract and convert event times to target timezone
Build Schedules → Fill pre-event, live, and post-event blocks
Validate → Check for overlaps and duration issues
Generate XMLTV → Output XML document with channels and programmes

Modules

models.py

Core data models and constants.

Location: backend/epgoat/domain/models.py

Classes:
- M3UEntry: Represents a single M3U playlist entry
- ChannelClassification: Classification result with diagnostic info
- EPGConstants: Configuration constants for timing and thresholds

Example:

from backend.epgoat.domain.models import M3UEntry, EPGConstants

entry = M3UEntry(
    attrs={"tvg-id": "nba01", "tvg-name": "NBA 01"},
    display_name="NBA 01: Lakers vs Celtics",
    url="http://example.com/stream"
)

print(f"Event duration: {EPGConstants.DEFAULT_EVENT_DURATION_MIN} minutes")
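
To make the three M3UEntry fields concrete, here is a minimal sketch of how an #EXTINF line and the URL line after it could map onto attrs, display_name, and url. The Entry class and parse_m3u_text function are hypothetical stand-ins written for this illustration; they are not the project's parse_m3u, which may handle more cases.

```python
import re
from dataclasses import dataclass, field


# Hypothetical stand-in for M3UEntry; the real class lives in models.py.
@dataclass
class Entry:
    attrs: dict = field(default_factory=dict)
    display_name: str = ""
    url: str = ""


# key="value" pairs such as tvg-id="nba01"
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_m3u_text(text: str) -> list[Entry]:
    """Pair each #EXTINF line with the URL line that follows it."""
    entries = []
    pending = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF"):
            attrs = dict(ATTR_RE.findall(line))
            # Drop the quoted attributes first so commas inside them
            # cannot be mistaken for the name separator.
            _, _, name = ATTR_RE.sub("", line).partition(",")
            pending = Entry(attrs=attrs, display_name=name.strip())
        elif line and not line.startswith("#") and pending is not None:
            pending.url = line
            entries.append(pending)
            pending = None
    return entries


sample = '''#EXTM3U
#EXTINF:-1 tvg-id="nba01" tvg-name="NBA 01",NBA 01: Lakers vs Celtics
http://example.com/stream
'''
entries = parse_m3u_text(sample)
print(entries[0].display_name)
```

The sample yields one entry whose fields match the M3UEntry example above.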

config.py

Configuration loading and management with schema validation.

Location: backend/epgoat/domain/config.py

Functions:
- load_sport_config(filename, validate=True): Load and validate YAML configs
- get_sport_emoji(family_name): Get emoji for sport family
- get_sport_category(family_name): Get XMLTV category for sport

Example:

from backend.epgoat.domain.config import get_sport_emoji, get_sport_category

emoji = get_sport_emoji("NBA")  # Returns '🏀'
category = get_sport_category("NBA")  # Returns 'Sports / Basketball / NBA'

patterns.py

Pattern matching and channel classification (300+ lines, 100+ patterns).

Location: backend/epgoat/domain/patterns.py

Functions:
- match_prefix_and_shell(name): Match channel name against patterns
- classify_channel(name, family, match_obj): Classify as "generic" or "event"
- validate_patterns(): Validate all patterns compile correctly

Example:

from backend.epgoat.domain.patterns import match_prefix_and_shell, classify_channel

matched, family, match_obj = match_prefix_and_shell("NBA 01: Lakers vs Celtics")
if matched:
    classif = classify_channel("NBA 01: Lakers vs Celtics", family, match_obj)
    print(f"Classification: {classif.classification}")
    print(f"Payload: {classif.payload}")
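
The two-step flow above (match a prefix, then classify) can be sketched without the package. The pattern list, the first-match-wins order, and the rule that any text after the prefix makes a channel an "event" are all assumptions made for this illustration; the real classify_channel may apply different criteria.

```python
import re

# Hypothetical two-entry pattern list in the (regex, family) shape.
PATTERNS = [
    (re.compile(r"^NBA\s+\d+\s*:?", re.IGNORECASE), "NBA"),
    (re.compile(r"^NFL\s+\d+\s*:?", re.IGNORECASE), "NFL"),
]


def match_prefix(name: str):
    """Return (matched, family, match_obj) for the first pattern that hits."""
    for regex, family in PATTERNS:
        m = regex.match(name)
        if m:
            return True, family, m
    return False, None, None


def classify(name: str, match_obj: re.Match) -> tuple[str, str]:
    """Assumed rule: text after the prefix means "event", none means "generic"."""
    payload = name[match_obj.end():].strip()
    return ("event" if payload else "generic"), payload


for channel in ("NBA 01: Lakers vs Celtics", "nba 02", "HBO 1"):
    matched, family, m = match_prefix(channel)
    print(channel, "->", classify(channel, m) if matched else "no match")
```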

parsers.py

M3U parsing and time extraction (374 lines).

Location: backend/epgoat/domain/parsers.py

Functions:
- parse_m3u(path): Parse M3U file and return list of entries
- try_parse_time(payload, year, tz, date_context): Extract time from text
- validate_url(url): Validate URL format
- is_vod_url(url): Check if URL is VOD content

Supported Time Formats:
- @ 07:30 PM ET - 12-hour with timezone
- @ 19:30 ET - 24-hour with timezone
- Oct 22 03:00 PM ET - With date
- 2025-10-23 14:30:00 - ISO format
- @ 8pm PT - Hour only

Example:

from backend.epgoat.domain.parsers import parse_m3u, try_parse_time
from zoneinfo import ZoneInfo
import datetime as dt

entries = parse_m3u("playlist.m3u")
central = ZoneInfo("America/Chicago")
event_time = try_parse_time("Game @ 07:30 PM ET", 2025, central, dt.date.today())
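
As a rough illustration of the timezone conversion involved, the sketch below handles only the 12-hour "@ 07:30 PM ET" format. The parse_12h function, the regex, and the abbreviation table are assumptions for this example and say nothing about how try_parse_time is implemented.

```python
import datetime as dt
import re
from zoneinfo import ZoneInfo

# Assumed abbreviation table; the real parser may support more zones.
TZ_ABBREVIATIONS = {
    "ET": ZoneInfo("America/New_York"),
    "CT": ZoneInfo("America/Chicago"),
    "PT": ZoneInfo("America/Los_Angeles"),
}

TIME_12H = re.compile(
    r"@\s*(\d{1,2}):(\d{2})\s*(AM|PM)\s+(ET|CT|PT)\b", re.IGNORECASE
)


def parse_12h(payload: str, date: dt.date, target_tz: ZoneInfo):
    """Return the event time in target_tz, or None if nothing matches."""
    m = TIME_12H.search(payload)
    if not m:
        return None
    hour, minute = int(m.group(1)), int(m.group(2))
    meridiem, abbr = m.group(3).upper(), m.group(4).upper()
    # 12 AM maps to hour 0 and 12 PM to hour 12.
    hour = hour % 12 + (12 if meridiem == "PM" else 0)
    source = dt.datetime(
        date.year, date.month, date.day, hour, minute,
        tzinfo=TZ_ABBREVIATIONS[abbr],
    )
    return source.astimezone(target_tz)


central = ZoneInfo("America/Chicago")
event_time = parse_12h("Game @ 07:30 PM ET", dt.date(2025, 10, 22), central)
print(event_time.isoformat())
```

On 2025-10-22 both zones observe daylight time, so 7:30 PM Eastern becomes 6:30 PM Central.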

schedulers.py

Programme scheduling and block filling (372 lines).

Location: backend/epgoat/domain/schedulers.py

Functions:
- add_block(programs, cid, title, start, end): Add programme block
- fill_pre_event(...): Fill blocks before event start
- fill_post_event(...): Fill blocks after event end
- fill_no_programming(...): Fill with "No Event Scheduled"
- validate_schedule(programs, cid): Check for overlaps

Example:

from backend.epgoat.domain.schedulers import add_block, fill_pre_event
import datetime as dt
from zoneinfo import ZoneInfo

programs = {}
central = ZoneInfo("America/Chicago")
event_start = dt.datetime(2025, 10, 22, 19, 0, 0, tzinfo=central)
day_start = dt.datetime(2025, 10, 22, 0, 0, 0, tzinfo=central)

fill_pre_event(programs, "nba-01", day_start, event_start,
              "Lakers vs Celtics", dt.date(2025, 10, 22),
              dt.date(2025, 10, 22), block_minutes=120)
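
The idea behind block filling and overlap checking can be sketched with the standard library alone. The fill_blocks and find_overlaps helpers and the "Upcoming: ..." title are hypothetical; the real fill_pre_event and validate_schedule take different arguments and may choose titles and block boundaries differently.

```python
import datetime as dt


def fill_blocks(start, end, title, block_minutes=120):
    """Split [start, end) into consecutive blocks; the last may be shorter."""
    blocks = []
    step = dt.timedelta(minutes=block_minutes)
    cursor = start
    while cursor < end:
        stop = min(cursor + step, end)
        blocks.append((cursor, stop, title))
        cursor = stop
    return blocks


def find_overlaps(blocks):
    """Return neighbouring pairs (sorted by start) where one runs into the next."""
    ordered = sorted(blocks, key=lambda block: block[0])
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] < a[1]]


tz = dt.timezone(dt.timedelta(hours=-5))  # fixed UTC-5 offset, for illustration
day_start = dt.datetime(2025, 10, 22, 0, 0, tzinfo=tz)
event_start = dt.datetime(2025, 10, 22, 19, 0, tzinfo=tz)

pre_event = fill_blocks(day_start, event_start, "Upcoming: Lakers vs Celtics")
print(len(pre_event), "blocks, overlaps:", find_overlaps(pre_event))
```

Nineteen hours in 120-minute steps gives nine full blocks plus a final 60-minute block that ends exactly at the event start.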

xmltv.py

XMLTV document generation (115 lines).

Location: backend/epgoat/domain/xmltv.py

Functions:
- build_xmltv(processed, match_data, programs, tz_name, target_date): Generate XMLTV

Example:

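import datetime as dt

from backend.epgoat.domain.xmltv import build_xmltv

# entries, match_results, and program_dict come from the earlier parsing,
# matching, and scheduling steps.
xmltv_content = build_xmltv(
    processed=entries,
    match_data=match_results,
    programs=program_dict,
    tz_name="America/Chicago",
    target_date=dt.date(2025, 10, 22)
)

# Write as UTF-8 so non-ASCII characters such as emoji are preserved
with open("epg.xml", "w", encoding="utf-8") as f:
    f.write(xmltv_content)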
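
For readers unfamiliar with the output format, this is a minimal sketch of the XMLTV shape: a tv root, channel elements with a display-name, and programme elements whose start and stop use the "YYYYMMDDhhmmss +hhmm" timestamp form. The build_minimal_xmltv and xmltv_time helpers are illustrative only and are not the project's build_xmltv, which emits more fields.

```python
import datetime as dt
import xml.etree.ElementTree as ET


def xmltv_time(moment: dt.datetime) -> str:
    """Format an aware datetime as an XMLTV timestamp."""
    return moment.strftime("%Y%m%d%H%M%S %z")


def build_minimal_xmltv(channel_id, display_name, programmes):
    """Build one channel and its programmes; return the XML as a string."""
    tv = ET.Element("tv")
    channel = ET.SubElement(tv, "channel", id=channel_id)
    ET.SubElement(channel, "display-name").text = display_name
    for start, stop, title in programmes:
        programme = ET.SubElement(
            tv, "programme",
            start=xmltv_time(start), stop=xmltv_time(stop), channel=channel_id,
        )
        ET.SubElement(programme, "title").text = title
    return ET.tostring(tv, encoding="unicode")


tz = dt.timezone(dt.timedelta(hours=-5))  # fixed UTC-5 offset, for illustration
start = dt.datetime(2025, 10, 22, 19, 0, tzinfo=tz)
stop = start + dt.timedelta(hours=3)
xml_text = build_minimal_xmltv("nba-01", "NBA 01", [(start, stop, "Lakers vs Celtics")])
print(xml_text)
```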

schemas.py

Pydantic schemas for configuration validation (170 lines).

Location: backend/epgoat/domain/schemas.py

Classes:
- SportEmojiConfig: Validates emoji mappings
- SportCategoryConfig: Validates category hierarchies

Example:

from backend.epgoat.domain.schemas import validate_sport_emojis

config = {
    'nba': '🏀',
    'nfl': '🏈',
    '_default': '🔴'
}

schema = validate_sport_emojis(config)  # Validates or raises ValueError

Configuration

Configuration files are located in backend/config/:

sport_emojis.yml

Maps sport family names to emojis:

# Basketball
nba: '🏀'
ncaab: '🏀'

# American Football
nfl: '🏈'
ncaaf: '🏈'

# Default
_default: '🔴'

Requirements:
- Must include _default key
- All values must be valid unicode emojis
- Keys are case-insensitive when looked up
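
The case-insensitive lookup with a _default fallback might look like the sketch below. The lookup_emoji helper is hypothetical and only mirrors the behaviour described for get_sport_emoji; it does not check that values are valid emojis.

```python
def lookup_emoji(mapping: dict, family: str) -> str:
    """Case-insensitive lookup that falls back to the _default entry."""
    normalized = {key.lower(): value for key, value in mapping.items()}
    # A missing _default raises KeyError, reflecting the "must include" rule.
    return normalized.get(family.lower(), normalized["_default"])


emojis = {"nba": "🏀", "nfl": "🏈", "_default": "🔴"}
print(lookup_emoji(emojis, "NBA"), lookup_emoji(emojis, "Curling"))
```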

sport_categories.yml

Maps sport families to XMLTV categories:

# Basketball
nba: 'Sports / Basketball / NBA'
ncaab: 'Sports / Basketball / NCAA'

# Generic
_default: 'Sports'

Requirements:
- Must include _default key
- Hierarchical categories must start with "Sports"
- Parts separated by " / " (space-slash-space)
- No empty parts allowed
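
These category rules are simple enough to check by hand before committing a config change. The sketch below uses plain Python rather than the project's Pydantic schemas; category_errors is a hypothetical helper and its messages are invented for the example.

```python
def category_errors(categories: dict) -> list[str]:
    """Collect rule violations instead of stopping at the first one."""
    errors = []
    if "_default" not in categories:
        errors.append("missing _default key")
    for key, value in categories.items():
        parts = value.split(" / ")
        if parts[0] != "Sports":
            errors.append(f"{key}: must start with 'Sports'")
        if any(not part.strip() for part in parts):
            errors.append(f"{key}: empty part")
        # A stray slash or extra padding means the separator was not " / ".
        if any(part != part.strip() or "/" in part for part in parts):
            errors.append(f"{key}: parts must be separated by ' / '")
    return errors


good = {"nba": "Sports / Basketball / NBA", "_default": "Sports"}
bad = {"nba": "Basketball / NBA"}
print(category_errors(good))
print(category_errors(bad))
```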

Usage

Basic Usage

# From repository root
cd backend/epgoat

# Generate EPG for today
python epg_generator.py --m3u playlist.m3u --tz "America/Chicago"

# Generate EPG for specific date
python epg_generator.py --m3u playlist.m3u --tz "America/Chicago" --date 2025-10-22

# Enable verbose logging
python epg_generator.py --m3u playlist.m3u --tz "America/Chicago" --verbose

# Specify output path
python epg_generator.py --m3u playlist.m3u --tz "America/Chicago" --out epg.xml

Refreshing League and Team Reference Data

Daily enrichment relies on local JSON mirrors of TheSportsDB leagues and teams. Seed or refresh these caches with:

# From backend/epgoat
python utilities/seed_thesportsdb.py --refresh --verbose

This command writes data/leagues_db.json and data/teams_db.json, which power league inference and team lookups. It is safe to run repeatedly; unchanged datasets will simply be overwritten.

Tip: The enrichment services automatically trigger a lightweight seeding pass the first time these files are missing, using the default TheSportsDB API key (or THESPORTSDB_API_KEY if set). Explicit refreshes remain the recommended way to keep the mirrors up to date for daily operations.

Command-Line Options

Option	Description	Default
`--m3u PATH`	M3U playlist file or URL	Required
`--tz TIMEZONE`	Target timezone (IANA format)	`America/Chicago`
`--date YYYY-MM-DD`	Target date	Today
`--out PATH`	Output XML file path	`epg_output.xml`
`--verbose`	Enable debug logging	False

Tracking API Mismatches

The generator persists every enrichment miss to dist/mismatches.db. Each run logs the database path:

Mismatch tracker database: dist/mismatches.db

Inspect unresolved cases with the analysis helper:

# From backend/epgoat
python utilities/analyze_mismatches.py --db dist/mismatches.db

Add --family <family-slug> (and optionally --family-limit 40) to print concrete samples with record IDs, payloads, and parsed teams so you can map fixes back to specific database rows.

Programmatic Usage

from backend.epgoat.domain.parsers import parse_m3u
from backend.epgoat.domain.patterns import match_prefix_and_shell, classify_channel
from backend.epgoat.domain.schedulers import fill_pre_event, add_block
from backend.epgoat.domain.xmltv import build_xmltv
from zoneinfo import ZoneInfo
import datetime as dt

# Parse M3U
entries = parse_m3u("playlist.m3u")

# Process channels
programs = {}
processed = []
match_data = []

for entry in entries:
    matched, family, match_obj = match_prefix_and_shell(entry.display_name)
    if matched:
        processed.append(entry)
        match_data.append((family, match_obj))

        # Build schedule for this channel
        # ... (schedule building logic)

# Generate XMLTV
xmltv = build_xmltv(processed, match_data, programs, "America/Chicago", dt.date.today())

Adding New Patterns

To add support for a new sport league or streaming service:

1. Add Pattern to patterns.py

Edit the ALLOWED_CHANNEL_PATTERNS list in backend/epgoat/domain/patterns.py:

ALLOWED_CHANNEL_PATTERNS = [
    # ... existing patterns ...

    # Your new pattern
    (r'^NEW-LEAGUE\s+\d+\s*:?', 'NEW-LEAGUE'),
]

Pattern Guidelines:
- Use ^ to anchor to start of string
- Use \s+ for flexible whitespace
- Use \d+ to match channel numbers
- Make the colon optional with :? for streaming services
- Matching is case-insensitive: the re.IGNORECASE flag is applied automatically
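
A quick way to see what these guidelines mean in practice is to compile the example pattern on its own and try a few channel names. The sample names are made up for this illustration, and the flag is passed explicitly here only because the snippet runs outside patterns.py.

```python
import re

NEW_LEAGUE = re.compile(r"^NEW-LEAGUE\s+\d+\s*:?", re.IGNORECASE)

samples = [
    "NEW-LEAGUE 01: Team A vs Team B",  # colon present
    "new-league   7 Team A vs Team B",  # lower case, extra spaces, no colon
    "NEW-LEAGUE: Team A vs Team B",     # no channel number
    "HD NEW-LEAGUE 01: Team A",         # prefix not at start of string
]

results = {name: bool(NEW_LEAGUE.match(name)) for name in samples}
for name, matched in results.items():
    print(f"{matched!s:5}  {name}")
```

The first two names match; the last two do not, because the channel number is required and the pattern is anchored to the start of the string.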

2. Add Emoji Mapping

Edit backend/config/sport_emojis.yml:

# Your new league
new-league: '⚽'  # Choose appropriate emoji

3. Add Category Mapping

Edit backend/config/sport_categories.yml:

# Your new league
new-league: 'Sports / Soccer / New League'

4. Test Your Pattern

from backend.epgoat.domain.patterns import match_prefix_and_shell

# Test matching
matched, family, match_obj = match_prefix_and_shell("NEW-LEAGUE 01: Team A vs Team B")
assert matched
assert family == "NEW-LEAGUE"

5. Add Unit Tests

Add test cases to backend/epgoat/tests/test_patterns.py:

def test_new_league_patterns(self):
    """Test NEW-LEAGUE patterns."""
    test_cases = [
        ("NEW-LEAGUE 01: Team A vs Team B", True, "NEW-LEAGUE"),
        ("NEW-LEAGUE 05: Championship", True, "NEW-LEAGUE"),
    ]
    for channel_name, should_match, expected_family in test_cases:
        matched, family, _ = match_prefix_and_shell(channel_name)
        assert matched == should_match
        if should_match:
            assert family == expected_family

Testing

The project has comprehensive test coverage with 250+ tests across 6 test files.

Running Tests

# From repository root
cd backend/epgoat

# Run all tests
pytest

# Run specific test file
pytest tests/test_patterns.py -v

# Run with coverage
pytest --cov=. --cov-report=html

# Run integration tests only
pytest tests/test_integration.py -v

Test Files

File	Tests	Coverage
`tests/test_patterns.py`	60+	Pattern matching and classification
`tests/test_parsers.py`	50+	M3U parsing and time extraction
`tests/test_schedulers.py`	40+	Schedule building and validation
`tests/test_config.py`	30+	Configuration loading
`tests/test_schemas.py`	50+	Schema validation
`tests/test_integration.py`	20+	End-to-end workflows

Test Coverage Areas

✅ All 100+ channel patterns
✅ All 8 time parsing formats
✅ Timezone conversions (ET, CT, PT, etc.)
✅ Schedule block filling (pre/live/post-event)
✅ XMLTV generation
✅ Configuration validation
✅ Error handling and edge cases

Development

Setup Development Environment

# Clone repository
git clone <repository-url>
cd epgoat-internal/backend/epgoat

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Run tests
pytest -v

# Run generator
python epg_generator.py --m3u sample.m3u --verbose

Code Style

Python 3.11+ required
Type hints required (100%)
Google-style docstrings for all public functions
Max line length: 100 characters
Use logging instead of print()
Follow Engineering Standards

Adding New Features

Create feature branch: git checkout -b feature/my-feature
Implement changes with tests
Run full test suite: pytest
Run linting: make lint (from repository root)
Commit with descriptive message (conventional commits format)
Push and create pull request

Project Structure

backend/epgoat/
├── epg_generator.py           # Main entry point
├── domain/                    # Core business logic
│   ├── models.py             # Data classes (72 lines)
│   ├── config.py             # Configuration (90 lines)
│   ├── patterns.py           # Pattern matching (300 lines)
│   ├── parsers.py            # M3U and time parsing (374 lines)
│   ├── schedulers.py         # Schedule building (372 lines)
│   ├── xmltv.py             # XML generation (115 lines)
│   └── schemas.py           # Pydantic validation (170 lines)
├── tests/                    # Test files (1,680 lines)
│   ├── test_patterns.py
│   ├── test_parsers.py
│   ├── test_schedulers.py
│   ├── test_config.py
│   ├── test_schemas.py
│   └── test_integration.py
├── utilities/               # Helper scripts
│   ├── seed_thesportsdb.py
│   └── analyze_mismatches.py
└── README.md                # Package documentation

backend/config/
├── sport_emojis.yml         # 89 emoji mappings
└── sport_categories.yml     # 77 category mappings

Performance Considerations

Pattern matching: O(n×m) where n=channels, m=patterns
Schedule building: O(n) per channel
XML generation: O(n) where n=total programmes
Memory usage: ~1MB per 1000 channels

For large playlists (10,000+ channels), consider:
- Filtering by specific sport families first
- Caching pattern compilation results
- Using multiprocessing for parallel processing
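
The first two suggestions can be combined in a few lines. This is a sketch under assumptions: RAW_PATTERNS, compiled_patterns, and match_family are hypothetical names, and the project may already compile its patterns once at import time.

```python
import re
from functools import lru_cache

# Hypothetical pattern table in the (regex, family) shape.
RAW_PATTERNS = (
    (r"^NBA\s+\d+\s*:?", "NBA"),
    (r"^NFL\s+\d+\s*:?", "NFL"),
)


@lru_cache(maxsize=None)
def compiled_patterns(families=None):
    """Compile once per distinct family filter; later calls reuse the result."""
    return tuple(
        (re.compile(raw, re.IGNORECASE), family)
        for raw, family in RAW_PATTERNS
        if families is None or family in families
    )


def match_family(name, families=None):
    """Return the first matching family, or None. Pass a frozenset to filter."""
    for regex, family in compiled_patterns(families):
        if regex.match(name):
            return family
    return None


print(match_family("NBA 01: Lakers vs Celtics"))
print(match_family("NBA 01: Lakers vs Celtics", frozenset({"NFL"})))
```

The filter is a frozenset because lru_cache requires hashable arguments.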

Troubleshooting

Common Issues

Issue: ModuleNotFoundError: No module named 'pydantic'

pip install -r requirements.txt

Issue: Configuration validation fails

# Check config syntax
python -c "from backend.epgoat.domain.config import SPORT_EMOJIS; print(len(SPORT_EMOJIS))"

# Disable validation temporarily
python -c "from backend.epgoat.domain.config import load_sport_config; config = load_sport_config('sport_emojis.yml', validate=False)"

Issue: Pattern not matching channels

from backend.epgoat.domain.patterns import match_prefix_and_shell
matched, family, match_obj = match_prefix_and_shell("YOUR CHANNEL NAME")
print(f"Matched: {matched}, Family: {family}")

Issue: Time parsing fails

from backend.epgoat.domain.parsers import try_parse_time
from zoneinfo import ZoneInfo
import datetime as dt

result = try_parse_time("YOUR TIME STRING", 2025, ZoneInfo("America/Chicago"), dt.date.today())
print(f"Parsed time: {result}")

Issue: Low match rates

Run the mismatch analysis tool to debug low match rates (see Tracking API Mismatches section above).

Related Documentation

Backend Overview - Backend architecture
Command Reference - All CLI commands
Quick Start - Getting started guide

Last Updated: 2025-11-10
Version: 2.4.0 (Documentation Migration)
Migration: Migrated from backend/epgoat/README.md in Phase 4.2
