Api Canonical Integration Guide

EPGOAT Documentation - User Guides

API Canonical Data & Error Tracking Integration Guide

Status: ✅ Implemented Date: 2025-11-07 Author: Claude (CTO)

Overview

This guide documents the new API canonical data system and system-wide error tracking infrastructure. This solves the critical problem of API naming inconsistencies between ESPN and TheSportsDB that were breaking downstream queries.

Problem Statement

The Issue

Events table contains data from both ESPN and TheSportsDB APIs with different naming conventions:

  • ESPN: Calls it "football"
  • TheSportsDB: Expects "American Football"
  • Result: If team inherits "football" from ESPN event, later TheSportsDB queries fail

The Solution

  1. Canonical Tables: TheSportsDB as source of truth
  2. Mapping Tables: ESPN-specific names mapped to canonical with confidence scoring
  3. Normalization: All API data normalized to canonical at write time
  4. Error Tracking: Automatic GitHub issues for ALL API failures system-wide

Architecture

Database Schema

api_sports_canonical (TheSportsDB source of truth)
├── canonical_name (UNIQUE)
├── thesportsdb_sport_id
├── is_verified
└── discovery_source

api_leagues_canonical (TheSportsDB source of truth)
├── canonical_name (UNIQUE)
├── sport_id (FK → api_sports_canonical)
├── thesportsdb_league_id
├── country, is_active
└── is_verified

espn_sport_mappings (ESPN → Canonical)
├── espn_name (UNIQUE)
├── canonical_sport_id (FK → api_sports_canonical)
├── confidence (0.0 - 1.0)
└── is_verified

espn_league_mappings (ESPN → Canonical)
├── espn_name (UNIQUE)
├── canonical_league_id (FK → api_leagues_canonical)
├── confidence (0.0 - 1.0)
└── is_verified

api_call_failures (System-wide error tracking)
├── api_name, endpoint, error_type
├── request_params, error_message, stack_trace
├── github_issue_number, github_issue_url
├── resolution_status, occurrence_count
└── UNIQUE(api_name, endpoint, error_type, resolution_status)

Data Flow

ESPN Event → ESPN Names → Mapping Tables → Canonical Names → TheSportsDB API ✅
          ↓
     (normalized at write time)

Installation & Setup

Step 1: Run Database Migrations

cd backend/epgoat

# Run migrations 015 and 016
npx supabase db execute epgoat-db --file=infrastructure/backend/epgoat/infrastructure/database/migrations/015_add_api_canonical_sports_leagues.sql
npx supabase db execute epgoat-db --file=infrastructure/backend/epgoat/infrastructure/database/migrations/016_add_espn_mappings_and_api_failures.sql

Step 2: Bootstrap Canonical Data (ONE-TIME)

# Ensure environment variables are set
source .env.supabase

# Run bootstrap script (populates canonical tables from TheSportsDB)
python utilities/bootstrap_canonical_data.py --verbose

# Expected output:
# ✓ Fetched 89 sports from TheSportsDB
# ✓ Inserted 89 sports into api_sports_canonical
# ✓ Fetched 450+ leagues from TheSportsDB
# ✓ Inserted 450+ leagues into api_leagues_canonical

Step 3: Configure GitHub Integration

# Add to .env or GitHub Secrets
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxx"
export GITHUB_REPO="user/epgoat-internal"

Step 4: Set Up Weekly Sync (Automated)

The GitHub Action .github/workflows/sync-canonical-data.yml runs automatically every Monday at 00:00 UTC.

Manual trigger:

# From GitHub Actions UI, or locally:
python utilities/sync_canonical_data.py --create-issues --verbose

Usage

Wrapping API Calls with Error Tracking

from infrastructure.error_handling import APIErrorHandler, track_api_call
from supabase import create_client

# Initialize error handler (once at startup)
db = create_client(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY)
error_handler = APIErrorHandler(
    db=db,
    github_token=os.getenv("GITHUB_TOKEN"),
    github_repo=os.getenv("GITHUB_REPO"),
    environment="production",
)

# Wrap API calls
@track_api_call(error_handler, api_name="thesportsdb", endpoint="/search_teams.php")
def search_teams(team_name: str) -> List[Dict]:
    """Search teams via TheSportsDB API."""
    response = requests.get(
        f"https://www.thesportsdb.com/api/v1/json/{API_KEY}/searchteams.php",
        params={"t": team_name},
        timeout=10,
    )
    response.raise_for_status()
    return response.json().get("teams", [])

# Usage
try:
    teams = search_teams("Lakers")
except Exception as e:
    # Error automatically logged to database
    # GitHub issue automatically created
    # Original exception re-raised
    logger.error(f"Failed to search teams: {e}")

Option 2: Manual Logging

from infrastructure.error_handling import APIErrorHandler

error_handler = APIErrorHandler(db=db)

try:
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()
except Exception as e:
    # Manually log failure
    error_handler.log_failure(
        api_name="example_api",
        endpoint="/data",
        error=e,
        request_params={"param": "value"},
        http_method="GET",
    )
    raise  # Re-raise to preserve original behavior

Normalizing ESPN Data to Canonical

from supabase import create_client

db = create_client(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY)

def normalize_espn_sport(espn_sport_name: str) -> str:
    """Normalize ESPN sport name to canonical.

    Args:
        espn_sport_name: ESPN sport name (e.g., "football")

    Returns:
        Canonical sport name (e.g., "American Football")
    """
    # Lookup ESPN mapping
    result = (
        db.table("espn_sport_mappings")
        .select("canonical_sport_id")
        .eq("espn_name", espn_sport_name)
        .limit(1)
        .execute()
    )

    if not result.data:
        logger.warning(f"No mapping found for ESPN sport: {espn_sport_name}")
        return espn_sport_name  # Fallback to original

    canonical_sport_id = result.data[0]["canonical_sport_id"]

    # Fetch canonical name
    canonical_result = (
        db.table("api_sports_canonical")
        .select("canonical_name")
        .eq("id", canonical_sport_id)
        .limit(1)
        .execute()
    )

    return canonical_result.data[0]["canonical_name"]

# Usage
espn_sport = "football"  # From ESPN API
canonical_sport = normalize_espn_sport(espn_sport)
print(canonical_sport)  # "American Football"

# Now use canonical_sport for TheSportsDB queries
teams = thesportsdb_client.search_teams_by_sport(canonical_sport)

Querying Canonical Data

# Get all sports
sports = db.table("api_sports_canonical").select("*").execute()

# Get leagues for a specific sport
basketball_sport = db.table("api_sports_canonical").select("id").eq("canonical_name", "Basketball").single().execute()
basketball_leagues = (
    db.table("api_leagues_canonical")
    .select("*")
    .eq("sport_id", basketball_sport.data["id"])
    .eq("is_active", True)
    .execute()
)

# Get ESPN mapping for a sport
espn_mapping = (
    db.table("espn_sport_mappings")
    .select("*, canonical_sport:canonical_sport_id(*)")
    .eq("espn_name", "football")
    .single()
    .execute()
)

print(f"ESPN 'football' maps to: {espn_mapping.data['canonical_sport']['canonical_name']}")

Monitoring

Check API Failures

-- Open failures
SELECT api_name, endpoint, error_type, occurrence_count, github_issue_number
FROM api_call_failures
WHERE resolution_status = 'open'
ORDER BY occurrence_count DESC, last_occurred_at DESC;

-- Most frequent failures
SELECT api_name, COUNT(*) as failure_count
FROM api_call_failures
GROUP BY api_name
ORDER BY failure_count DESC;

-- Recent failures (last 7 days)
SELECT *
FROM api_call_failures
WHERE created_at >= NOW() - INTERVAL '7 days'
ORDER BY created_at DESC;

Check Mapping Quality

-- Low-confidence sport mappings
SELECT espn_name, confidence, verified_at
FROM espn_sport_mappings
WHERE confidence < 0.8 OR is_verified = FALSE
ORDER BY confidence ASC;

-- Low-confidence league mappings
SELECT espn_name, confidence, verified_at
FROM espn_league_mappings
WHERE confidence < 0.8 OR is_verified = FALSE
ORDER BY confidence ASC;

-- Unverified canonical data
SELECT canonical_name, discovery_source, discovered_at
FROM api_sports_canonical
WHERE is_verified = FALSE;

SELECT canonical_name, discovery_source, discovered_at
FROM api_leagues_canonical
WHERE is_verified = FALSE;

GitHub Issues

All API failures and canonical data changes automatically create GitHub issues with these labels:

  • api-failure - API call failed
  • canonical-data - Canonical data change detected
  • needs-investigation - Requires human review
  • data-sync - Weekly sync report
  • {api_name} - Specific API name (e.g., thesportsdb, espn)

Maintenance

Weekly Sync (Automated)

The GitHub Action runs automatically every Monday at 00:00 UTC. It:

  1. Fetches current sports/leagues from TheSportsDB
  2. Compares against canonical tables
  3. Auto-adds new entries with is_verified=false
  4. Marks inactive leagues as is_active=false
  5. Creates GitHub issue summarizing changes

Manual run:

python utilities/sync_canonical_data.py --create-issues --verbose

Verifying Mappings

When low-confidence mappings are discovered:

  1. Review the GitHub issue created by sync script
  2. Verify the mapping is correct
  3. Update the database:
UPDATE espn_sport_mappings
SET confidence = 1.0,
    is_verified = TRUE,
    verified_at = NOW(),
    verified_by = 'your_username',
    verification_notes = 'Manually verified against ESPN and TheSportsDB APIs'
WHERE espn_name = 'football';

Resolving API Failures

When GitHub issue is created for API failure:

  1. Investigate root cause (check issue details)
  2. Fix the issue (update mappings, fix code, etc.)
  3. Mark as resolved:
error_handler.mark_resolved(
    failure_id=123,
    resolution_status="fixed",
    resolution_notes="Updated ESPN sport mapping for 'football' → 'American Football'",
    fix_version="abc123",  # Git SHA
    resolved_by="your_username",
)

Or via SQL:

UPDATE api_call_failures
SET resolution_status = 'fixed',
    resolved_at = NOW(),
    resolved_by = 'your_username',
    resolution_notes = 'Fixed ESPN sport mapping',
    fix_version = 'abc123'
WHERE id = 123;

Best Practices

1. Always Normalize at Write Time

Bad: Store ESPN names directly

event = {"sport": "football", "league": "NFL"}
db.table("events").insert(event).execute()

Good: Normalize before writing

canonical_sport = normalize_espn_sport("football")  # "American Football"
canonical_league = normalize_espn_league("NFL")     # "NFL" (already canonical)

event = {"sport": canonical_sport, "league": canonical_league}
db.table("events").insert(event).execute()

2. Wrap All API Calls

Bad: Naked API call

def search_teams(name: str):
    return requests.get(f"https://api.example.com/search?t={name}").json()

Good: Wrapped with error tracking

@track_api_call(error_handler, api_name="example_api", endpoint="/search")
def search_teams(name: str):
    response = requests.get(f"https://api.example.com/search?t={name}")
    response.raise_for_status()
    return response.json()

3. Review GitHub Issues Regularly

  • Check for new api-failure issues weekly
  • Review low-confidence mappings (confidence < 0.8)
  • Verify and update canonical data from sync reports

4. Test Mappings Before Production

# Test ESPN → Canonical mapping
espn_sport = "football"
canonical_sport = normalize_espn_sport(espn_sport)

# Verify TheSportsDB accepts canonical name
teams = thesportsdb_client.search_teams_by_sport(canonical_sport)
assert len(teams) > 0, f"No teams found for {canonical_sport}"

Troubleshooting

Migration Fails

Error: relation "api_sports_canonical" already exists

Solution: Tables already exist, skip migration or drop tables first:

DROP TABLE IF EXISTS api_call_failures;
DROP TABLE IF EXISTS espn_league_mappings;
DROP TABLE IF EXISTS espn_sport_mappings;
DROP TABLE IF EXISTS api_leagues_canonical;
DROP TABLE IF EXISTS api_sports_canonical;

Bootstrap Fails with "No data returned"

Error: TheSportsDB API returns empty response

Solution: 1. Check API key is valid 2. Verify TheSportsDB API is operational 3. Try with free tier key "3": --api-key 3

GitHub Issues Not Created

Error: API failures logged but no GitHub issues

Solution: 1. Verify GITHUB_TOKEN is set and valid 2. Verify GITHUB_REPO format is owner/repo 3. Check token has repo scope permissions 4. Enable issue creation: create_issues=True in handler init

ESPN Mapping Not Found

Error: No mapping found for ESPN sport: football

Solution: 1. Run discovery script: python utilities/discover_espn_mappings.py (TO BE CREATED) 2. Manually add mapping:

INSERT INTO espn_sport_mappings (espn_name, canonical_sport_id, confidence, mapping_method)
SELECT 'football', id, 1.0, 'manual'
FROM api_sports_canonical
WHERE canonical_name = 'American Football';

Files Created

Database Migrations

  • infrastructure/backend/epgoat/infrastructure/database/migrations/015_add_api_canonical_sports_leagues.sql
  • infrastructure/backend/epgoat/infrastructure/database/migrations/016_add_espn_mappings_and_api_failures.sql

Utilities

  • utilities/bootstrap_canonical_data.py - One-time canonical data bootstrap
  • utilities/sync_canonical_data.py - Weekly sync to detect changes

Infrastructure

  • infrastructure/error_handling/__init__.py
  • infrastructure/error_handling/api_error_handler.py - Error handler class
  • infrastructure/error_handling/api_decorators.py - @track_api_call decorator

CI/CD

  • .github/workflows/sync-canonical-data.yml - Weekly automated sync

Documentation

  • backend/epgoat/docs/API-CANONICAL-INTEGRATION-GUIDE.md (this file)

Next Steps

  1. Migrations Applied: Run migrations 015 and 016
  2. Bootstrap Complete: Run bootstrap_canonical_data.py
  3. Create ESPN Discovery Script: Auto-discover ESPN mappings from events table
  4. Integrate Error Handler: Add @track_api_call to existing API calls
  5. Write Tests: Unit tests for bootstrap, sync, error handler
  6. Update Documentation: Add to main architecture docs

Support

For questions or issues: 1. Check this guide for troubleshooting steps 2. Review GitHub issues with labels api-failure, canonical-data 3. Check database for error details: SELECT * FROM api_call_failures 4. Contact CTO (Claude) for architectural questions


Last Updated: 2025-11-07 Version: 1.0.0 Status: ✅ Production Ready