EPG Generation Guide

Status: Active
Last Updated: 2025-11-10
Related Docs: Backend Overview, Command Reference
Code Location: backend/epgoat/


Comprehensive guide to EPG (Electronic Program Guide) generation for IPTV services

Overview

The EPG Generator processes M3U playlists and generates XMLTV-formatted Electronic Program Guides for IPTV services. It matches channel names against sport league patterns, extracts event information, schedules programming blocks, and outputs standards-compliant XMLTV files.

Key Features

  • Pattern-Based Matching: 100+ regex patterns for sport leagues and streaming services
  • Time Extraction: Parses event times from channel names with timezone conversion
  • Smart Scheduling: Fills pre-event, live event, and post-event blocks intelligently
  • Schema Validation: Pydantic-based validation for configuration integrity
  • Comprehensive Testing: 250+ test cases covering all modules
  • Modular Design: Clean separation of concerns across 6 specialized modules

Architecture

The system follows a modular architecture with clear separation of concerns:

┌──────────────────────────────────────────────────────────────────┐
│                      epg_generator.py                            │
│                   (Main Orchestration)                           │
└──────────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
         ▼                     ▼                     ▼
  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
  │   models    │      │   config    │      │  patterns   │
  │             │      │             │      │             │
  │ Data classes│      │ YAML load   │      │ Regex match │
  │ Constants   │      │ Validation  │      │ Classify    │
  └─────────────┘      └─────────────┘      └─────────────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
         ▼                     ▼                     ▼
  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
  │   parsers   │      │ schedulers  │      │    xmltv    │
  │             │      │             │      │             │
  │ M3U parsing │      │ Block fill  │      │ XML output  │
  │ Time extract│      │ Schedule    │      │ Formatting  │
  └─────────────┘      └─────────────┘      └─────────────┘

Data Flow

  1. Parse M3U → Extract channel entries with metadata
  2. Match Patterns → Identify sport families and extract event info
  3. Parse Times → Extract and convert event times to target timezone
  4. Build Schedules → Fill pre-event, live, and post-event blocks
  5. Validate → Check for overlaps and duration issues
  6. Generate XMLTV → Output XML document with channels and programmes

Modules

models.py

Core data models and constants.

Location: backend/epgoat/domain/models.py

Classes:

  • M3UEntry: Represents a single M3U playlist entry
  • ChannelClassification: Classification result with diagnostic info
  • EPGConstants: Configuration constants for timing and thresholds

Example:

from backend.epgoat.domain.models import M3UEntry, EPGConstants

entry = M3UEntry(
    attrs={"tvg-id": "nba01", "tvg-name": "NBA 01"},
    display_name="NBA 01: Lakers vs Celtics",
    url="http://example.com/stream"
)

print(f"Event duration: {EPGConstants.DEFAULT_EVENT_DURATION_MIN} minutes")

config.py

Configuration loading and management with schema validation.

Location: backend/epgoat/domain/config.py

Functions:

  • load_sport_config(filename, validate=True): Load and validate YAML configs
  • get_sport_emoji(family_name): Get emoji for sport family
  • get_sport_category(family_name): Get XMLTV category for sport

Example:

from backend.epgoat.domain.config import get_sport_emoji, get_sport_category

emoji = get_sport_emoji("NBA")  # Returns '🏀'
category = get_sport_category("NBA")  # Returns 'Sports / Basketball / NBA'

patterns.py

Pattern matching and channel classification (300+ lines, 100+ patterns).

Location: backend/epgoat/domain/patterns.py

Functions:

  • match_prefix_and_shell(name): Match channel name against patterns
  • classify_channel(name, family, match_obj): Classify as "generic" or "event"
  • validate_patterns(): Validate that all patterns compile correctly

Example:

from backend.epgoat.domain.patterns import match_prefix_and_shell, classify_channel

matched, family, match_obj = match_prefix_and_shell("NBA 01: Lakers vs Celtics")
if matched:
    classif = classify_channel("NBA 01: Lakers vs Celtics", family, match_obj)
    print(f"Classification: {classif.classification}")
    print(f"Payload: {classif.payload}")
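This guide does not document the exact rules inside classify_channel. As a rough mental model only, the sketch below assumes a channel is an "event" when text remains after the matched prefix and "generic" otherwise. Classification, NBA_PREFIX and classify_sketch are hypothetical names, not the module's API.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for one entry in ALLOWED_CHANNEL_PATTERNS.
NBA_PREFIX = re.compile(r"^NBA\s+\d+\s*:?", re.IGNORECASE)


@dataclass
class Classification:
    classification: str      # "event" or "generic"
    payload: Optional[str]   # text after the matched prefix, if any


def classify_sketch(name: str, match_obj: re.Match) -> Classification:
    """Assumed rule: anything left after the prefix is event information."""
    payload = name[match_obj.end():].strip()
    if payload:
        return Classification("event", payload)
    return Classification("generic", None)


name = "NBA 01: Lakers vs Celtics"
print(classify_sketch(name, NBA_PREFIX.match(name)))
# Classification(classification='event', payload='Lakers vs Celtics')
```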

parsers.py

M3U parsing and time extraction (374 lines).

Location: backend/epgoat/domain/parsers.py

Functions:

  • parse_m3u(path): Parse M3U file and return list of entries
  • try_parse_time(payload, year, tz, date_context): Extract time from text
  • validate_url(url): Validate URL format
  • is_vod_url(url): Check if URL is VOD content

Supported Time Formats:

  • @ 07:30 PM ET (12-hour with timezone)
  • @ 19:30 ET (24-hour with timezone)
  • Oct 22 03:00 PM ET (with date)
  • 2025-10-23 14:30:00 (ISO format)
  • @ 8pm PT (hour only)
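The sketch below makes the timezone conversion concrete. It handles only the three "@ ..." shapes listed above (12-hour, 24-hour and hour-only) and converts the result into a target zone. TIME_RE, TZ_ABBREVIATIONS and parse_event_time are illustrative assumptions. The real try_parse_time also takes a year and a date context, supports the dated and ISO formats, and may resolve abbreviations differently.

```python
import datetime as dt
import re
from typing import Optional
from zoneinfo import ZoneInfo

# Assumed abbreviation table; zoneinfo applies DST (EDT/EST, etc.) automatically.
TZ_ABBREVIATIONS = {
    "ET": "America/New_York",
    "CT": "America/Chicago",
    "MT": "America/Denver",
    "PT": "America/Los_Angeles",
}

# Matches "@ 07:30 PM ET", "@ 19:30 ET" and "@ 8pm PT".
TIME_RE = re.compile(
    r"@\s*(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?"  # hour, optional minutes
    r"\s*(?P<ampm>[AP]M)?"                            # optional AM/PM marker
    r"\s+(?P<tz>[ECMP]T)\b",                          # timezone abbreviation
    re.IGNORECASE,
)


def parse_event_time(text: str, day: dt.date,
                     target: ZoneInfo) -> Optional[dt.datetime]:
    """Return the event start converted to `target`, or None if no time is found."""
    m = TIME_RE.search(text)
    if m is None:
        return None
    hour = int(m["hour"])
    minute = int(m["minute"] or 0)
    ampm = (m["ampm"] or "").upper()
    # Convert 12-hour clock values to the 24-hour clock.
    if ampm == "PM" and hour != 12:
        hour += 12
    elif ampm == "AM" and hour == 12:
        hour = 0
    source = ZoneInfo(TZ_ABBREVIATIONS[m["tz"].upper()])
    local = dt.datetime(day.year, day.month, day.day, hour, minute, tzinfo=source)
    return local.astimezone(target)


central = ZoneInfo("America/Chicago")
print(parse_event_time("Game @ 07:30 PM ET", dt.date(2025, 10, 22), central))
# 2025-10-22 18:30:00-05:00
```

Because the abbreviations map to IANA zones instead of fixed offsets, the same code stays correct on either side of a DST change.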

Example:

from backend.epgoat.domain.parsers import parse_m3u, try_parse_time
from zoneinfo import ZoneInfo
import datetime as dt

entries = parse_m3u("playlist.m3u")
central = ZoneInfo("America/Chicago")
event_time = try_parse_time("Game @ 07:30 PM ET", 2025, central, dt.date.today())

schedulers.py

Programme scheduling and block filling (372 lines).

Location: backend/epgoat/domain/schedulers.py

Functions:

  • add_block(programs, cid, title, start, end): Add programme block
  • fill_pre_event(...): Fill blocks before event start
  • fill_post_event(...): Fill blocks after event end
  • fill_no_programming(...): Fill with "No Event Scheduled"
  • validate_schedule(programs, cid): Check for overlaps
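validate_schedule is described only as an overlap check. The sketch below is one minimal way to do that check. It assumes each programme is a (start, end, title) tuple, because this guide does not document the real programs structure. The approach is to sort by start time, track the programme that ends last so far, and flag any later programme that starts before it ends. find_overlaps is a hypothetical name.

```python
import datetime as dt
from typing import Optional

# Assumed programme shape for this sketch: (start, end, title).
Programme = tuple[dt.datetime, dt.datetime, str]


def find_overlaps(programmes: list[Programme]) -> list[tuple[str, str]]:
    """Return (earlier_title, later_title) pairs whose time ranges overlap."""
    overlaps: list[tuple[str, str]] = []
    latest_end: Optional[dt.datetime] = None
    latest_title = ""
    # Walk in start order, remembering the programme that ends last so far;
    # any later start before that end is an overlap.
    for start, end, title in sorted(programmes, key=lambda p: p[0]):
        if latest_end is not None and start < latest_end:
            overlaps.append((latest_title, title))
        if latest_end is None or end > latest_end:
            latest_end, latest_title = end, title
    return overlaps
```

Tracking the latest end, instead of comparing only adjacent pairs, also catches a long programme that overlaps several shorter ones after it.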

Example:

from backend.epgoat.domain.schedulers import add_block, fill_pre_event
import datetime as dt
from zoneinfo import ZoneInfo

programs = {}
central = ZoneInfo("America/Chicago")
event_start = dt.datetime(2025, 10, 22, 19, 0, 0, tzinfo=central)
day_start = dt.datetime(2025, 10, 22, 0, 0, 0, tzinfo=central)

fill_pre_event(programs, "nba-01", day_start, event_start,
              "Lakers vs Celtics", dt.date(2025, 10, 22),
              dt.date(2025, 10, 22), block_minutes=120)
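Pre-event filling cuts the window from day_start to event_start into block_minutes-sized slots. The sketch below shows that slicing with a hypothetical split_into_blocks helper. This guide does not say whether fill_pre_event aligns blocks to the window start or to the event start, or how it titles them, so treat those choices as assumptions.

```python
import datetime as dt

Block = tuple[dt.datetime, dt.datetime]


def split_into_blocks(start: dt.datetime, end: dt.datetime,
                      block_minutes: int) -> list[Block]:
    """Cut [start, end) into consecutive blocks; the last one may be shorter."""
    if block_minutes <= 0:
        raise ValueError("block_minutes must be positive")
    step = dt.timedelta(minutes=block_minutes)
    blocks: list[Block] = []
    cursor = start
    while cursor < end:
        block_end = min(cursor + step, end)  # truncate the final block at `end`
        blocks.append((cursor, block_end))
        cursor = block_end
    return blocks


day_start = dt.datetime(2025, 10, 22, 0, 0)
event_start = dt.datetime(2025, 10, 22, 19, 0)
print(len(split_into_blocks(day_start, event_start, 120)))  # 10
```

Adding a timedelta to a zoneinfo-aware datetime is wall-clock arithmetic. On a DST changeover day, a block may therefore not last exactly block_minutes of elapsed time. Slicing in UTC avoids that.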

xmltv.py

XMLTV document generation (115 lines).

Location: backend/epgoat/domain/xmltv.py

Functions:

  • build_xmltv(processed, match_data, programs, tz_name, target_date): Generate XMLTV

Example:

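import datetime as dt

from backend.epgoat.domain.xmltv import build_xmltv

# entries, match_results and program_dict come from the earlier parsing,
# matching and scheduling steps (see Programmatic Usage below)
xmltv_content = build_xmltv(
    processed=entries,
    match_data=match_results,
    programs=program_dict,
    tz_name="America/Chicago",
    target_date=dt.date(2025, 10, 22)
)

# Write as UTF-8 so emoji and other non-ASCII text survive on any platform
with open("epg.xml", "w", encoding="utf-8") as f:
    f.write(xmltv_content)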
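For orientation, the sketch below builds a one-channel document of the same general shape using the standard library's xml.etree.ElementTree. minimal_xmltv is a hypothetical helper, not build_xmltv. The element names (tv, channel, display-name, programme, title, category) and the "YYYYMMDDhhmmss +zzzz" timestamp format come from the XMLTV DTD.

```python
import datetime as dt
import xml.etree.ElementTree as ET
from zoneinfo import ZoneInfo

XMLTV_TIME = "%Y%m%d%H%M%S %z"  # e.g. "20251022190000 -0500"


def minimal_xmltv(channel_id: str, name: str,
                  programmes: list[tuple[dt.datetime, dt.datetime, str, str]]) -> str:
    """Build a one-channel XMLTV document from (start, stop, title, category) tuples."""
    tv = ET.Element("tv")
    channel = ET.SubElement(tv, "channel", id=channel_id)
    ET.SubElement(channel, "display-name").text = name
    for start, stop, title, category in programmes:
        prog = ET.SubElement(
            tv, "programme",
            start=start.strftime(XMLTV_TIME),
            stop=stop.strftime(XMLTV_TIME),
            channel=channel_id,
        )
        ET.SubElement(prog, "title").text = title
        ET.SubElement(prog, "category").text = category
    return ET.tostring(tv, encoding="unicode")


central = ZoneInfo("America/Chicago")
start = dt.datetime(2025, 10, 22, 19, 0, tzinfo=central)
stop = dt.datetime(2025, 10, 22, 21, 30, tzinfo=central)
xml_text = minimal_xmltv("nba-01", "NBA 01",
                         [(start, stop, "Lakers vs Celtics", "Sports / Basketball / NBA")])
print(xml_text)
```

The DTD requires every channel element to come before the programme elements. A multi-channel version should therefore emit all channels first.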

schemas.py

Pydantic schemas for configuration validation (170 lines).

Location: backend/epgoat/domain/schemas.py

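Classes:

  • SportEmojiConfig: Validates emoji mappings
  • SportCategoryConfig: Validates category hierarchies

Functions:

  • validate_sport_emojis(config): Validate an emoji mapping, raising ValueError if it is invalid (used in the example below)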

Example:

from backend.epgoat.domain.schemas import validate_sport_emojis

config = {
    'nba': '🏀',
    'nfl': '🏈',
    '_default': '🔴'
}

schema = validate_sport_emojis(config)  # Validates or raises ValueError

Configuration

Configuration files are located in backend/config/:

sport_emojis.yml

Maps sport family names to emojis:

# Basketball
nba: '🏀'
ncaab: '🏀'

# American Football
nfl: '🏈'
ncaaf: '🏈'

# Default
_default: '🔴'

Requirements:

  • Must include a _default key
  • All values must be valid Unicode emojis
  • Keys are case-insensitive when looked up
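The lookup rules above (case-insensitive keys with a _default fallback) amount to roughly the following sketch. lookup_emoji is a hypothetical helper. In the package, get_sport_emoji fills this role and may normalize family names differently.

```python
def lookup_emoji(mapping: dict[str, str], family: str) -> str:
    """Case-insensitive lookup with the mandatory _default fallback."""
    if "_default" not in mapping:
        raise ValueError("sport_emojis.yml must define _default")
    # Normalize keys once so "NBA", "nba" and "Nba" all resolve the same way.
    normalized = {key.lower(): value for key, value in mapping.items()}
    return normalized.get(family.lower(), normalized["_default"])


emojis = {"nba": "🏀", "nfl": "🏈", "_default": "🔴"}
print(lookup_emoji(emojis, "NBA"))      # 🏀
print(lookup_emoji(emojis, "Cricket"))  # 🔴
```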

sport_categories.yml

Maps sport families to XMLTV categories:

# Basketball
nba: 'Sports / Basketball / NBA'
ncaab: 'Sports / Basketball / NCAA'

# Generic
_default: 'Sports'

Requirements:

  • Must include a _default key
  • Hierarchical categories must start with "Sports"
  • Parts are separated by " / " (space-slash-space)
  • No empty parts allowed
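These requirements translate into a few string checks. The plain-Python sketch below (a hypothetical check_categories, no Pydantic) mirrors the listed rules for quick local sanity checks. SportCategoryConfig in schemas.py remains the authoritative validator and may word its errors differently.

```python
def check_categories(mapping: dict[str, str]) -> list[str]:
    """Return rule violations; an empty list means the mapping passes."""
    errors: list[str] = []
    if "_default" not in mapping:
        errors.append("missing _default key")
    for key, category in mapping.items():
        parts = category.split(" / ")
        if parts[0] != "Sports":
            errors.append(f"{key}: must start with 'Sports'")
        if any(not part.strip() for part in parts):
            errors.append(f"{key}: empty part in {category!r}")
    return errors


config = {
    "nba": "Sports / Basketball / NBA",
    "ncaab": "Sports / Basketball / NCAA",
    "_default": "Sports",
}
print(check_categories(config))  # []
```

A separator without spaces, such as "Sports/Basketball", is reported as a "must start with 'Sports'" error because the whole string becomes the first part.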

Usage

Basic Usage

# From repository root
cd backend/epgoat

# Generate EPG for today
python epg_generator.py --m3u playlist.m3u --tz "America/Chicago"

# Generate EPG for specific date
python epg_generator.py --m3u playlist.m3u --tz "America/Chicago" --date 2025-10-22

# Enable verbose logging
python epg_generator.py --m3u playlist.m3u --tz "America/Chicago" --verbose

# Specify output path
python epg_generator.py --m3u playlist.m3u --tz "America/Chicago" --out epg.xml

Refreshing League and Team Reference Data

Daily enrichment relies on local JSON mirrors of TheSportsDB leagues and teams. Seed or refresh these caches with:

# From backend/epgoat
python utilities/seed_thesportsdb.py --refresh --verbose

This command writes data/leagues_db.json and data/teams_db.json, which power league inference and team lookups. It is safe to run repeatedly; each run simply overwrites the existing files.

Tip: The enrichment services automatically trigger a lightweight seeding pass the first time these files are missing, using the default TheSportsDB API key (or THESPORTSDB_API_KEY if set). Explicit refreshes remain the recommended way to keep the mirrors up to date for daily operations.

Command-Line Options

| Option            | Description                   | Default         |
|-------------------|-------------------------------|-----------------|
| --m3u PATH        | M3U playlist file or URL      | Required        |
| --tz TIMEZONE     | Target timezone (IANA format) | America/Chicago |
| --date YYYY-MM-DD | Target date                   | Today           |
| --out PATH        | Output XML file path          | epg_output.xml  |
| --verbose         | Enable debug logging          | False           |
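The options table maps naturally onto argparse. The sketch below reproduces the documented flags and defaults to show how they combine. build_parser is hypothetical and is not the parser inside epg_generator.py, whose help text and validation may differ.

```python
import argparse
import datetime as dt
from zoneinfo import ZoneInfo


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(description="Generate an XMLTV EPG from an M3U playlist")
    parser.add_argument("--m3u", required=True, help="M3U playlist file or URL")
    parser.add_argument("--tz", default="America/Chicago", help="Target timezone (IANA name)")
    # Parse YYYY-MM-DD into a date; None means "today", resolved after parsing.
    parser.add_argument("--date", type=dt.date.fromisoformat, default=None,
                        help="Target date as YYYY-MM-DD (default: today)")
    parser.add_argument("--out", default="epg_output.xml", help="Output XML file path")
    parser.add_argument("--verbose", action="store_true", help="Enable debug logging")
    return parser


args = build_parser().parse_args(["--m3u", "playlist.m3u", "--date", "2025-10-22"])
# Resolve "today" in the target timezone rather than the machine's local zone.
target_date = args.date or dt.datetime.now(ZoneInfo(args.tz)).date()
print(args.tz, target_date, args.out, args.verbose)
# America/Chicago 2025-10-22 epg_output.xml False
```

Leaving --date as None and resolving it afterwards lets "today" be computed in the target timezone. This matters on a server running in UTC. This guide does not say whether the real generator does this.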

Tracking API Mismatches

The generator persists every enrichment miss to dist/mismatches.db. Each run logs the database path:

Mismatch tracker database: dist/mismatches.db

Inspect unresolved cases with the analysis helper:

# From backend/epgoat
python utilities/analyze_mismatches.py --db dist/mismatches.db

Add --family <family-slug> (and optionally --family-limit 40) to print concrete samples with record IDs, payloads, and parsed teams so you can map fixes back to specific database rows.

Programmatic Usage

from backend.epgoat.domain.parsers import parse_m3u
from backend.epgoat.domain.patterns import match_prefix_and_shell, classify_channel
from backend.epgoat.domain.schedulers import fill_pre_event, add_block
from backend.epgoat.domain.xmltv import build_xmltv
from zoneinfo import ZoneInfo
import datetime as dt

# Parse M3U
entries = parse_m3u("playlist.m3u")

# Process channels
programs = {}
processed = []
match_data = []

for entry in entries:
    matched, family, match_obj = match_prefix_and_shell(entry.display_name)
    if matched:
        processed.append(entry)
        match_data.append((family, match_obj))

        # Build schedule for this channel
        # ... (schedule building logic)

# Generate XMLTV
xmltv = build_xmltv(processed, match_data, programs, "America/Chicago", dt.date.today())

Adding New Patterns

To add support for a new sport league or streaming service:

1. Add Pattern to patterns.py

Edit the ALLOWED_CHANNEL_PATTERNS list in backend/epgoat/domain/patterns.py:

ALLOWED_CHANNEL_PATTERNS = [
    # ... existing patterns ...

    # Your new pattern
    (r'^NEW-LEAGUE\s+\d+\s*:?', 'NEW-LEAGUE'),
]

Pattern Guidelines:

  • Use ^ to anchor the pattern to the start of the string
  • Use \s+ for flexible whitespace
  • Use \d+ to match channel numbers
  • Make the colon optional with :? for streaming services
  • Do not add re.IGNORECASE yourself; the flag is applied automatically
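Before adding an entry to ALLOWED_CHANNEL_PATTERNS, it can help to test the regex on its own with the same case-insensitive flag. This standalone check uses the example pattern from step 1. The sample channel names are made up.

```python
import re

# Same pattern as the step 1 example, compiled with the flag patterns.py applies.
NEW_LEAGUE = re.compile(r"^NEW-LEAGUE\s+\d+\s*:?", re.IGNORECASE)

samples = [
    "NEW-LEAGUE 01: Team A vs Team B",  # standard event channel
    "new-league 7",                     # lower case, no colon
    "NEW-LEAGUE:01",                    # no whitespace before the number
    "Watch NEW-LEAGUE 01",              # prefix not at the start
]
for name in samples:
    m = NEW_LEAGUE.match(name)
    print(f"{name!r:38} -> {m.group(0) if m else 'no match'}")
```

The last two samples fail by design. The \s+ requires whitespace before the number, and the ^ anchor rejects prefixes that appear mid-name.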

2. Add Emoji Mapping

Edit backend/config/sport_emojis.yml:

# Your new league
new-league: '⚽'  # Choose appropriate emoji

3. Add Category Mapping

Edit backend/config/sport_categories.yml:

# Your new league
new-league: 'Sports / Soccer / New League'

4. Test Your Pattern

from backend.epgoat.domain.patterns import match_prefix_and_shell

# Test matching
matched, family, match_obj = match_prefix_and_shell("NEW-LEAGUE 01: Team A vs Team B")
assert matched
assert family == "NEW-LEAGUE"

5. Add Unit Tests

Add test cases to backend/epgoat/tests/test_patterns.py:

from backend.epgoat.domain.patterns import match_prefix_and_shell


def test_new_league_patterns():
    """Test NEW-LEAGUE patterns."""
    test_cases = [
        ("NEW-LEAGUE 01: Team A vs Team B", True, "NEW-LEAGUE"),
        ("NEW-LEAGUE 05: Championship", True, "NEW-LEAGUE"),
    ]
    for channel_name, should_match, expected_family in test_cases:
        matched, family, _ = match_prefix_and_shell(channel_name)
        assert matched == should_match
        if should_match:
            assert family == expected_family

Testing

The project has comprehensive test coverage with 250+ tests across 6 test files.

Running Tests

# From repository root
cd backend/epgoat

# Run all tests
pytest

# Run specific test file
pytest tests/test_patterns.py -v

# Run with coverage
pytest --cov=. --cov-report=html

# Run integration tests only
pytest tests/test_integration.py -v

Test Files

| File                      | Tests | Coverage                              |
|---------------------------|-------|---------------------------------------|
| tests/test_patterns.py    | 60+   | Pattern matching and classification   |
| tests/test_parsers.py     | 50+   | M3U parsing and time extraction       |
| tests/test_schedulers.py  | 40+   | Schedule building and validation      |
| tests/test_config.py      | 30+   | Configuration loading                 |
| tests/test_schemas.py     | 50+   | Schema validation                     |
| tests/test_integration.py | 20+   | End-to-end workflows                  |

Test Coverage Areas

  • ✅ All 100+ channel patterns
  • ✅ All 8 time parsing formats
  • ✅ Timezone conversions (ET, CT, PT, etc.)
  • ✅ Schedule block filling (pre/live/post-event)
  • ✅ XMLTV generation
  • ✅ Configuration validation
  • ✅ Error handling and edge cases

Development

Setup Development Environment

# Clone repository
git clone <repository-url>
cd epgoat-internal/backend/epgoat

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On macOS/Linux (on Windows: venv\Scripts\activate)

# Install dependencies
pip install -r requirements.txt

# Run tests
pytest -v

# Run generator
python epg_generator.py --m3u sample.m3u --verbose

Code Style

  • Python 3.11+ required
  • Type hints required (100%)
  • Google-style docstrings for all public functions
  • Max line length: 100 characters
  • Use logging instead of print()
  • Follow Engineering Standards

Adding New Features

  1. Create feature branch: git checkout -b feature/my-feature
  2. Implement changes with tests
  3. Run full test suite: pytest
  4. Run linting: make lint (from repository root)
  5. Commit with descriptive message (conventional commits format)
  6. Push and create pull request

Project Structure

backend/epgoat/
├── epg_generator.py           # Main entry point
├── domain/                    # Core business logic
│   ├── models.py              # Data classes (72 lines)
│   ├── config.py              # Configuration (90 lines)
│   ├── patterns.py            # Pattern matching (300 lines)
│   ├── parsers.py             # M3U and time parsing (374 lines)
│   ├── schedulers.py          # Schedule building (372 lines)
│   ├── xmltv.py               # XML generation (115 lines)
│   └── schemas.py             # Pydantic validation (170 lines)
├── tests/                     # Test files (1,680 lines)
│   ├── test_patterns.py
│   ├── test_parsers.py
│   ├── test_schedulers.py
│   ├── test_config.py
│   ├── test_schemas.py
│   └── test_integration.py
├── utilities/                 # Helper scripts
│   ├── seed_thesportsdb.py
│   └── analyze_mismatches.py
└── README.md                  # Package documentation

backend/config/
├── sport_emojis.yml           # 89 emoji mappings
└── sport_categories.yml       # 77 category mappings

Performance Considerations

  • Pattern matching: O(n×m) where n=channels, m=patterns
  • Schedule building: O(n) per channel
  • XML generation: O(n) where n=total programmes
  • Memory usage: ~1MB per 1000 channels

For large playlists (10,000+ channels), consider:

  • Filtering by specific sport families first
  • Caching pattern compilation results
  • Using multiprocessing for parallel processing
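The O(n×m) figure comes from trying each pattern against each channel name. The sketch below applies the caching suggestion: it compiles the pattern table once at import time and reuses it for every lookup. It assumes first-match-wins ordering, which this guide does not confirm. PATTERNS, COMPILED and match_family are hypothetical names, and the two patterns are placeholders.

```python
import re
from typing import Optional

# Placeholder table in the (regex, family) shape used by ALLOWED_CHANNEL_PATTERNS.
PATTERNS = [
    (r"^NBA\s+\d+\s*:?", "NBA"),
    (r"^NFL\s+\d+\s*:?", "NFL"),
]

# Compile once at import time instead of on every channel lookup.
COMPILED = [(re.compile(regex, re.IGNORECASE), family) for regex, family in PATTERNS]


def match_family(name: str) -> Optional[str]:
    """Return the family of the first pattern that matches, or None."""
    for regex, family in COMPILED:
        if regex.match(name):
            return family
    return None


channels = ["NBA 01: Lakers vs Celtics", "nfl 12", "Cooking Channel"]
print([match_family(c) for c in channels])  # ['NBA', 'NFL', None]
```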

Troubleshooting

Common Issues

Issue: ModuleNotFoundError: No module named 'pydantic'

pip install -r requirements.txt

Issue: Configuration validation fails

# Check config syntax (run from the repository root so the backend package is importable)
python -c "from backend.epgoat.domain.config import SPORT_EMOJIS; print(len(SPORT_EMOJIS))"

# Disable validation temporarily
python -c "from backend.epgoat.domain.config import load_sport_config; config = load_sport_config('sport_emojis.yml', validate=False)"

Issue: Pattern not matching channels

from backend.epgoat.domain.patterns import match_prefix_and_shell
matched, family, match_obj = match_prefix_and_shell("YOUR CHANNEL NAME")
print(f"Matched: {matched}, Family: {family}")

Issue: Time parsing fails

from backend.epgoat.domain.parsers import try_parse_time
from zoneinfo import ZoneInfo
import datetime as dt

result = try_parse_time("YOUR TIME STRING", 2025, ZoneInfo("America/Chicago"), dt.date.today())
print(f"Parsed time: {result}")

Issue: Low match rates

Run the mismatch analysis tool to debug low match rates (see Tracking API Mismatches section above).


Last Updated: 2025-11-10
Version: 2.4.0 (Documentation Migration)
Migration: Migrated from backend/epgoat/README.md in Phase 4.2