Pattern Matching

EPGOAT Documentation - User Guides

Pattern Addition Guide

Status: Active
Last Updated: 2025-11-02
Related Docs: EPG Matching Pipeline, Command Reference
Code Location: backend/epgoat/domain/patterns.py, backend/config/sport_emojis.yml, backend/config/sport_categories.yml


Quick reference for adding new sport leagues and channel patterns

Quick Start

Adding support for a new sport league takes four changes, followed by a verification run:

  1. Add regex pattern: backend/epgoat/domain/patterns.py
  2. Add emoji mapping: backend/config/sport_emojis.yml
  3. Add category mapping: backend/config/sport_categories.yml
  4. Add tests: test_patterns.py

Step-by-Step Example

Let's add support for a new league called "XFL" (Extreme Football League).

Step 1: Add Pattern to patterns.py

Edit backend/epgoat/domain/patterns.py and add to ALLOWED_CHANNEL_PATTERNS:

ALLOWED_CHANNEL_PATTERNS = [
    # ... existing patterns ...

    # XFL - Extreme Football League
    (r'^XFL\s+\d+\s*:', 'XFL'),
]

Pattern Breakdown:

  • ^ - Start of string (required)
  • XFL - League name (case-insensitive due to IGNORECASE flag)
  • \s+ - One or more whitespace characters
  • \d+ - One or more digits (channel number)
  • \s* - Optional whitespace
  • : - Colon (required for this pattern)
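To see the breakdown in action, the following minimal sketch compiles the same expression with re.IGNORECASE and tries it on a few channel names. The sample names and the is_xfl_channel helper are illustrative assumptions, not part of patterns.py.

```python
import re

# Same expression as the XFL entry above, compiled case-insensitively.
XFL_RX = re.compile(r'^XFL\s+\d+\s*:', re.IGNORECASE)


def is_xfl_channel(name: str) -> bool:
    """Return True when the name starts with an XFL prefix (hypothetical helper)."""
    return XFL_RX.match(name) is not None


samples = [
    "XFL 01: Dragons vs Guardians",               # standard form
    "xfl 7 : Lowercase with space before colon",  # IGNORECASE and \s*
    "XFL01: Missing whitespace",                  # \s+ needs at least one space
    "XFL 01 Dragons",                             # the colon is required
]
results = {name: is_xfl_channel(name) for name in samples}
for name, ok in results.items():
    print(f"{ok!s:5} {name}")
```

The first two names match; the last two fail on the \s+ and : elements respectively.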

Pattern Variations:

For streaming services where colon is optional:

(r'^XFL\s+\d+\s*:?', 'XFL'),  # Colon optional

For patterns with pipe separator:

(r'^XFL\s*\|\s*\d+', 'XFL |'),  # Note the space in family name

For patterns with special names:

(r'^XFL\s+Game\s+Pass\s+\d+', 'XFL Game Pass'),
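As a rough illustration of how the three variations divide up channel names, the sketch below reports which variant matches each sample. The families_for helper and the sample names are assumptions made for this example; the real matcher in patterns.py may behave differently.

```python
import re

# The three variations from this section, paired with their family names.
VARIANTS = [
    (re.compile(r'^XFL\s+\d+\s*:?', re.IGNORECASE), 'XFL'),
    (re.compile(r'^XFL\s*\|\s*\d+', re.IGNORECASE), 'XFL |'),
    (re.compile(r'^XFL\s+Game\s+Pass\s+\d+', re.IGNORECASE), 'XFL Game Pass'),
]


def families_for(name: str) -> list[str]:
    """List every variant family whose pattern matches the name (hypothetical helper)."""
    return [family for rx, family in VARIANTS if rx.match(name)]


for sample in ("XFL 01", "XFL | 03: Game", "XFL Game Pass 2"):
    print(f"{sample!r:20} -> {families_for(sample)}")
```

Each sample is claimed by exactly one variant, which is what you want before adding several patterns for the same league.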

Step 2: Add Emoji Mapping

Edit backend/config/sport_emojis.yml:

# ... existing mappings ...

# Extreme Football
xfl: '🏈'

Important:

  • Key must be lowercase
  • Value must be a valid unicode emoji
  • Use existing emoji if appropriate (e.g., '🏈' for football)
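The rules above can be approximated with a quick standalone check before editing the YAML file. This is only a sketch: check_emoji_mapping is a hypothetical helper, independent of the project's own validate_sport_emojis, and "not plain ASCII" is a rough stand-in for "valid emoji".

```python
def check_emoji_mapping(mapping: dict[str, str]) -> list[str]:
    """Return a list of problems found in an emoji mapping (hypothetical check)."""
    problems = []
    for key, value in mapping.items():
        if key != key.lower():
            problems.append(f"{key!r}: key must be lowercase")
        # Rough heuristic only: an emoji is never an empty or plain-ASCII string.
        if not value or value.isascii():
            problems.append(f"{key!r}: value {value!r} does not look like an emoji")
    return problems


good = {"xfl": "🏈"}
bad = {"XFL": "football"}
print(check_emoji_mapping(good))
print(check_emoji_mapping(bad))
```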

Common Emojis:

  • 🏀 Basketball
  • 🏈 American Football
  • ⚽ Soccer
  • 🏒 Hockey
  • ⚾ Baseball
  • 🥊 Combat Sports
  • 🎾 Tennis
  • 🏉 Rugby
  • 🏎️ Motorsports
  • 🔴 Generic/Streaming

Step 3: Add Category Mapping

Edit backend/config/sport_categories.yml:

# ... existing mappings ...

# Extreme Football
xfl: 'Sports / American Football / XFL'

Format Rules:

  • Must start with Sports
  • Parts separated by " / " (space-slash-space)
  • Follow hierarchy: Sports / <Sport Type> / <League>
  • No empty parts
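A small sketch of these format rules as code may help when reviewing a new entry. check_category is a hypothetical helper, not the project's validate_sport_categories, and it assumes the hierarchy always has exactly three parts.

```python
def check_category(value: str) -> list[str]:
    """Return the format rules a category string violates (hypothetical check)."""
    problems = []
    parts = value.split(" / ")  # the separator is space-slash-space
    if parts[0] != "Sports":
        problems.append("must start with 'Sports'")
    if len(parts) != 3:  # assumption: Sports / <Sport Type> / <League>
        problems.append("expected 'Sports / <Sport Type> / <League>'")
    if any(not part.strip() or "/" in part for part in parts):
        problems.append("empty part or slash without surrounding spaces")
    return problems


print(check_category("Sports / American Football / XFL"))
print(check_category("Sports/Basketball/NBA"))
```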

Common Sport Types:

  • Basketball
  • American Football
  • Soccer
  • Ice Hockey
  • Baseball
  • Combat Sports
  • Tennis
  • Motorsports

Step 4: Add Tests

Edit test_patterns.py and add test cases:

def test_xfl_patterns(self):
    """Test XFL patterns."""
    test_cases = [
        ("XFL 01: Dragons vs Guardians", True, "XFL"),
        ("XFL 05: Championship Game", True, "XFL"),
        ("XFL 10: FHD", True, "XFL"),
    ]
    for channel_name, should_match, expected_family in test_cases:
        matched, family, _ = match_prefix_and_shell(channel_name)
        assert matched == should_match, f"Failed for: {channel_name}"
        if should_match:
            assert family == expected_family

Add to the appropriate test class in test_patterns.py:

  • TestSportLeaguePatterns for sport leagues
  • TestStreamingServicePatterns for streaming services

Step 5: Verify

Run tests to verify:

# Run pattern tests
pytest test_patterns.py::TestSportLeaguePatterns::test_xfl_patterns -v

# Run all pattern tests
pytest test_patterns.py -v

# Test manually
python -c "
from patterns import match_prefix_and_shell
from config import get_sport_emoji, get_sport_category

matched, family, match_obj = match_prefix_and_shell('XFL 01: Dragons vs Guardians')
print(f'Matched: {matched}')
print(f'Family: {family}')
print(f'Emoji: {get_sport_emoji(family)}')
print(f'Category: {get_sport_category(family)}')
"

Expected output:

Matched: True
Family: XFL
Emoji: 🏈
Category: Sports / American Football / XFL

Pattern Examples

Standard League Pattern

Channel names like: NBA 01: Lakers vs Celtics

(r'^NBA\s+\d+\s*:', 'NBA'),

Streaming Service (Optional Colon)

Channel names like: ESPN+ 01 or ESPN+ 01:

(r'^ESPN\+\s+\d+\s*:?', 'ESPN+'),

Note: Escape + with \+

League with Pipe Separator

Channel names like: NFL | 03: Game

(r'^NFL\s*\|\s*\d+', 'NFL |'),

Note: Family name includes the space and pipe

League with Sub-Brand

Channel names like: NFL Game Pass 1: RedZone

(r'^NFL\s+Game\s+Pass\s+\d+', 'NFL Game Pass'),

International Services

Channel names like: DAZN CA 01: Boxing

(r'^DAZN\s+CA\s+\d+\s*:?', 'DAZN CA'),

Multi-Word Leagues

Channel names like: UEFA Champions League 01: Final

(r'^UEFA\s+Champions\s+League\s+\d+', 'UEFA Champions League'),

Special Characters

Channel names like: SEC+ 03: Game

(r'^SEC\+\s+\d+\s*:?', 'SEC+'),

Escape special regex characters: + . * ? [ ] ( ) { } ^ $ | \
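If you would rather not escape by hand, the standard library's re.escape does it for you. The sketch below builds a prefix pattern from a literal brand name; prefix_pattern is a hypothetical helper for illustration and is not part of patterns.py.

```python
import re


def prefix_pattern(brand: str) -> str:
    """Build a channel-prefix regex from a literal brand name (hypothetical helper)."""
    # re.escape backslash-escapes regex metacharacters such as '+'.
    return r'^' + re.escape(brand) + r'\s+\d+\s*:?'


pattern = prefix_pattern("SEC+")
print(pattern)
print(bool(re.match(pattern, "SEC+ 03: Game", re.IGNORECASE)))
print(bool(re.match(pattern, "SECC 03: Game", re.IGNORECASE)))
```

The generated expression is the same as the hand-written SEC+ pattern above, and "SECC 03: Game" is rejected because the plus sign is treated literally.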

Pattern Testing Checklist

Before committing your pattern:

  • [ ] Pattern compiles without errors
  • [ ] Pattern matches expected channel names
  • [ ] Pattern doesn't match unrelated channels (test negative cases)
  • [ ] Emoji is valid unicode character
  • [ ] Category follows hierarchical format
  • [ ] Unit tests added and passing
  • [ ] Manual verification completed
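The first three checklist items can be scripted. The following sketch assumes patterns are compiled with re.IGNORECASE and applied with match; check_pattern and the sample names are hypothetical.

```python
import re


def check_pattern(pattern, positives, negatives):
    """Return a list of checklist failures for one pattern (hypothetical helper)."""
    try:
        rx = re.compile(pattern, re.IGNORECASE)
    except re.error as exc:
        return [f"does not compile: {exc}"]
    failures = [f"missed: {name}" for name in positives if not rx.match(name)]
    failures += [f"false positive: {name}" for name in negatives if rx.match(name)]
    return failures


ok = check_pattern(
    r'^XFL\s+\d+\s*:',
    positives=["XFL 01: Dragons vs Guardians"],
    negatives=["NFL 01: Game", "XFL Classic"],
)
broken = check_pattern(r'^XFL(\s+\d+', positives=[], negatives=[])
print(ok)
print(broken)
```

An empty list means the pattern compiled, matched every positive case, and rejected every negative case.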

Common Mistakes

❌ Missing Anchor

(r'NBA\s+\d+\s*:', 'NBA'),  # Wrong - could match mid-string

Correct:

(r'^NBA\s+\d+\s*:', 'NBA'),  # Anchored to start
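The anchor matters most when a pattern is applied with search rather than match; this guide does not state which one the pipeline uses everywhere, so the sketch below shows both. The WNBA channel name is an invented example.

```python
import re

unanchored = re.compile(r'NBA\s+\d+\s*:', re.IGNORECASE)
anchored = re.compile(r'^NBA\s+\d+\s*:', re.IGNORECASE)

name = "WNBA 01: Aces vs Liberty"

# search() scans the whole string, so the unanchored pattern finds "NBA 01:" inside "WNBA 01:".
print(unanchored.search(name))
# With ^, the same search fails because the string starts with "W".
print(anchored.search(name))
# match() only tries position 0, which hides the bug until someone switches to search().
print(unanchored.match(name))
```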

❌ Not Escaping Special Characters

(r'^ESPN+\s+\d+', 'ESPN+'),  # Wrong - + is a regex operator

Correct:

(r'^ESPN\+\s+\d+', 'ESPN+'),  # Escaped with \+
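A quick check makes the difference concrete: unescaped, N+ means "one or more N", so the wrong pattern never matches a literal plus sign. The channel names here are illustrative.

```python
import re

wrong = re.compile(r'^ESPN+\s+\d+', re.IGNORECASE)
right = re.compile(r'^ESPN\+\s+\d+', re.IGNORECASE)

for name in ("ESPN+ 01", "ESPN 01", "ESPNN 01"):
    print(f"{name!r:12} wrong={bool(wrong.match(name))} right={bool(right.match(name))}")
```

The unescaped version accepts "ESPN 01" and "ESPNN 01" but rejects "ESPN+ 01", the opposite of what is intended.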

❌ Family Name Doesn't Match Pattern

(r'^NBA\s+\d+\s*:', 'Basketball'),  # Wrong - use 'NBA'

Correct:

(r'^NBA\s+\d+\s*:', 'NBA'),  # Family matches league name
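One likely reason this matters is that the family name doubles as the config lookup key. The sketch below assumes lookups lowercase the family before consulting the YAML mapping, which is consistent with the lowercase keys required in Step 2 but is not confirmed by this guide; EMOJI_STANDIN and lookup_emoji are hypothetical stand-ins.

```python
# Stand-in for the contents of sport_emojis.yml (illustrative only).
EMOJI_STANDIN = {"nba": "🏀"}


def lookup_emoji(family, default=None):
    """Look up an emoji by lowercased family name (assumed behavior)."""
    return EMOJI_STANDIN.get(family.lower(), default)


print(lookup_emoji("NBA"))         # key 'nba' exists
print(lookup_emoji("Basketball"))  # no 'basketball' key, so the default is returned
```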

❌ Category Doesn't Start with Sports

nba: 'Basketball / NBA'  # Wrong

Correct:

nba: 'Sports / Basketball / NBA'

❌ Incorrect Category Separator

nba: 'Sports/Basketball/NBA'  # Wrong - no spaces

Correct:

nba: 'Sports / Basketball / NBA'  # Space-slash-space

Pattern Priority

Patterns are checked in order, so more specific patterns should come before generic ones:

ALLOWED_CHANNEL_PATTERNS = [
    # Specific patterns first
    (r'^NFL\s+Game\s+Pass\s+\d+', 'NFL Game Pass'),
    (r'^NFL\s+Multi\s+Screen', 'NFL Multi Screen'),

    # Generic pattern last
    (r'^NFL\s+\d+\s*:', 'NFL'),
]

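With the exact patterns above, the generic NFL pattern would not match "NFL Game Pass 1", because it requires digits immediately after the league name. Ordering becomes decisive as soon as patterns overlap: a looser generic pattern such as ^NFL\b, placed first, would claim "NFL Game Pass 1" before the specific pattern is tried. Keeping specific patterns first guards against that.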
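The effect of ordering can be sketched with a first-match-wins loop. This assumes the matcher returns the family of the first pattern that matches; first_family and the deliberately loose ^NFL\b pattern are illustrative and not taken from patterns.py.

```python
import re

SPECIFIC_FIRST = [
    (r'^NFL\s+Game\s+Pass\s+\d+', 'NFL Game Pass'),
    (r'^NFL\b', 'NFL'),  # deliberately loose generic pattern
]
GENERIC_FIRST = list(reversed(SPECIFIC_FIRST))


def first_family(name, patterns):
    """Return the family of the first matching pattern, or None (hypothetical helper)."""
    for pattern, family in patterns:
        if re.match(pattern, name, re.IGNORECASE):
            return family
    return None


name = "NFL Game Pass 1: RedZone"
print(first_family(name, SPECIFIC_FIRST))
print(first_family(name, GENERIC_FIRST))
```

With the specific pattern first the name resolves to 'NFL Game Pass'; with the loose generic pattern first it is swallowed by 'NFL'.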

Advanced Patterns

Optional Time in Channel Name

Some channels include time: NBA 01 @ 7:30 PM: Lakers vs Celtics

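The required-colon pattern does not match this name, because @ rather than : follows the channel number. Use the optional-colon form so that only the prefix needs to match:

(r'^NBA\s+\d+\s*:?', 'NBA'),  # Matches "NBA 01 "; the colon is optional

The rest (@ 7:30 PM: Lakers vs Celtics) becomes the "shell" that's parsed for event info.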
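A minimal sketch of the prefix/shell split, assuming the shell is simply whatever follows the matched prefix. split_prefix_and_shell is a hypothetical stand-in, not the project's match_prefix_and_shell, and it uses the optional-colon form because a required colon cannot match when @ follows the channel number.

```python
import re

# Optional colon: a strict r'^NBA\s+\d+\s*:' would not match "NBA 01 @ ...".
NBA_RX = re.compile(r'^NBA\s+\d+\s*:?', re.IGNORECASE)


def split_prefix_and_shell(name):
    """Split a channel name into (prefix, shell), or return None (hypothetical helper)."""
    m = NBA_RX.match(name)
    if m is None:
        return None
    return name[:m.end()].strip(), name[m.end():].strip()


prefix, shell = split_prefix_and_shell("NBA 01 @ 7:30 PM: Lakers vs Celtics")
print(prefix)
print(shell)
```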

Channels with Multiple Formats

If a league uses inconsistent naming, add multiple patterns:

ALLOWED_CHANNEL_PATTERNS = [
    (r'^MLS\s+\d+\s*:', 'MLS'),           # "MLS 01: Game"
    (r'^MLS\s*\|\s*\d+', 'MLS |'),        # "MLS | 01"
    (r'^MLS\s+Espanol\s+\d+', 'MLS Espanol'),  # "MLS Espanol 01"
]

Different family names allow different configurations while keeping the base league recognizable.

Testing Your Pattern

Unit Test Template

def test_your_league_patterns(self):
    """Test YOUR-LEAGUE patterns."""
    test_cases = [
        # Format: (channel_name, should_match, expected_family)
        ("YOUR-LEAGUE 01: Event Name", True, "YOUR-LEAGUE"),
        ("YOUR-LEAGUE 05: Another Event", True, "YOUR-LEAGUE"),
        ("YOUR-LEAGUE 99:", True, "YOUR-LEAGUE"),
        ("NOT-YOUR-LEAGUE 01:", False, None),  # Negative case
    ]
    for channel_name, should_match, expected_family in test_cases:
        matched, family, _ = match_prefix_and_shell(channel_name)
        assert matched == should_match, f"Failed for: {channel_name}"
        if should_match:
            assert family == expected_family

Integration Test

Verify the complete workflow:

def test_your_league_workflow(self):
    """Test complete workflow for YOUR-LEAGUE."""
    channel_name = "YOUR-LEAGUE 01: Team A vs Team B"

    # Step 1: Pattern matching
    matched, family, match_obj = match_prefix_and_shell(channel_name)
    assert matched is True
    assert family == "YOUR-LEAGUE"

    # Step 2: Classification
    classif = classify_channel(channel_name, family, match_obj)
    assert classif.classification == "event"
    assert "Team A vs Team B" in classif.payload

    # Step 3: Config lookups
    emoji = get_sport_emoji(family)
    category = get_sport_category(family)
    assert emoji == '🏈'  # Your expected emoji
    assert "YOUR-LEAGUE" in category

Validation

After adding your pattern, validate the configuration:

# Validate emoji config
python -c "from schemas import validate_sport_emojis; from config import load_sport_config; validate_sport_emojis(load_sport_config('sport_emojis.yml', validate=False))"

# Validate category config
python -c "from schemas import validate_sport_categories; from config import load_sport_config; validate_sport_categories(load_sport_config('sport_categories.yml', validate=False))"

# Run all validation tests
pytest test_schemas.py -v

Debugging Tips

Pattern Not Matching

import re

pattern = r'^YOUR-LEAGUE\s+\d+\s*:'
channel = "YOUR-LEAGUE 01: Event"

rx = re.compile(pattern, re.IGNORECASE)
match = rx.match(channel)

if match:
    print(f"Matched: {match.group()}")
    print(f"Match object: {match}")
else:
    print("No match - check pattern")

Check Existing Patterns

from patterns import ALLOWED_CHANNEL_PATTERNS

# List all patterns
for pattern, family in ALLOWED_CHANNEL_PATTERNS:
    print(f"{family:30} {pattern}")

Verify Config Loading

from config import SPORT_EMOJIS, SPORT_CATEGORIES

print(f"Emoji for 'your-league': {SPORT_EMOJIS.get('your-league')}")
print(f"Category for 'your-league': {SPORT_CATEGORIES.get('your-league')}")

Resources

Getting Help

If you're stuck:

  1. Check existing patterns for similar examples
  2. Run pytest test_patterns.py -v to see all test cases
  3. Use regex101.com to test your pattern
  4. Check the test files for usage examples

See Also: README.md for full documentation