API Canonical Data & Error Tracking Integration Guide
Status: ✅ Implemented Date: 2025-11-07 Author: Claude (CTO)
Overview
This guide documents the new API canonical data system and system-wide error tracking infrastructure. This solves the critical problem of API naming inconsistencies between ESPN and TheSportsDB that were breaking downstream queries.
Problem Statement
The Issue
Events table contains data from both ESPN and TheSportsDB APIs with different naming conventions:
- ESPN: Calls it "football"
- TheSportsDB: Expects "American Football"
- Result: If team inherits "football" from ESPN event, later TheSportsDB queries fail
The Solution
- Canonical Tables: TheSportsDB as source of truth
- Mapping Tables: ESPN-specific names mapped to canonical with confidence scoring
- Normalization: All API data normalized to canonical at write time
- Error Tracking: Automatic GitHub issues for ALL API failures system-wide
Architecture
Database Schema
api_sports_canonical (TheSportsDB source of truth)
├── canonical_name (UNIQUE)
├── thesportsdb_sport_id
├── is_verified
└── discovery_source
api_leagues_canonical (TheSportsDB source of truth)
├── canonical_name (UNIQUE)
├── sport_id (FK → api_sports_canonical)
├── thesportsdb_league_id
├── country, is_active
└── is_verified
espn_sport_mappings (ESPN → Canonical)
├── espn_name (UNIQUE)
├── canonical_sport_id (FK → api_sports_canonical)
├── confidence (0.0 - 1.0)
└── is_verified
espn_league_mappings (ESPN → Canonical)
├── espn_name (UNIQUE)
├── canonical_league_id (FK → api_leagues_canonical)
├── confidence (0.0 - 1.0)
└── is_verified
api_call_failures (System-wide error tracking)
├── api_name, endpoint, error_type
├── request_params, error_message, stack_trace
├── github_issue_number, github_issue_url
├── resolution_status, occurrence_count
└── UNIQUE(api_name, endpoint, error_type, resolution_status)
Data Flow
ESPN Event → ESPN Names → Mapping Tables → Canonical Names → TheSportsDB API ✅
↓
(normalized at write time)
Installation & Setup
Step 1: Run Database Migrations
cd backend/epgoat
# Run migrations 015 and 016
npx supabase db execute epgoat-db --file=infrastructure/backend/epgoat/infrastructure/database/migrations/015_add_api_canonical_sports_leagues.sql
npx supabase db execute epgoat-db --file=infrastructure/backend/epgoat/infrastructure/database/migrations/016_add_espn_mappings_and_api_failures.sql
Step 2: Bootstrap Canonical Data (ONE-TIME)
# Ensure environment variables are set
source .env.supabase
# Run bootstrap script (populates canonical tables from TheSportsDB)
python utilities/bootstrap_canonical_data.py --verbose
# Expected output:
# ✓ Fetched 89 sports from TheSportsDB
# ✓ Inserted 89 sports into api_sports_canonical
# ✓ Fetched 450+ leagues from TheSportsDB
# ✓ Inserted 450+ leagues into api_leagues_canonical
Step 3: Configure GitHub Integration
# Add to .env or GitHub Secrets
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxx"
export GITHUB_REPO="user/epgoat-internal"
Step 4: Set Up Weekly Sync (Automated)
The GitHub Action .github/workflows/sync-canonical-data.yml runs automatically every Monday at 00:00 UTC.
Manual trigger:
# From GitHub Actions UI, or locally:
python utilities/sync_canonical_data.py --create-issues --verbose
Usage
Wrapping API Calls with Error Tracking
Option 1: Decorator (Recommended)
from infrastructure.error_handling import APIErrorHandler, track_api_call
from supabase import create_client
# Initialize error handler (once at startup)
db = create_client(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY)
error_handler = APIErrorHandler(
db=db,
github_token=os.getenv("GITHUB_TOKEN"),
github_repo=os.getenv("GITHUB_REPO"),
environment="production",
)
# Wrap API calls
@track_api_call(error_handler, api_name="thesportsdb", endpoint="/search_teams.php")
def search_teams(team_name: str) -> List[Dict]:
"""Search teams via TheSportsDB API."""
response = requests.get(
f"https://www.thesportsdb.com/api/v1/json/{API_KEY}/searchteams.php",
params={"t": team_name},
timeout=10,
)
response.raise_for_status()
return response.json().get("teams", [])
# Usage
try:
teams = search_teams("Lakers")
except Exception as e:
# Error automatically logged to database
# GitHub issue automatically created
# Original exception re-raised
logger.error(f"Failed to search teams: {e}")
Option 2: Manual Logging
from infrastructure.error_handling import APIErrorHandler
error_handler = APIErrorHandler(db=db)
try:
response = requests.get("https://api.example.com/data")
response.raise_for_status()
except Exception as e:
# Manually log failure
error_handler.log_failure(
api_name="example_api",
endpoint="/data",
error=e,
request_params={"param": "value"},
http_method="GET",
)
raise # Re-raise to preserve original behavior
Normalizing ESPN Data to Canonical
from supabase import create_client
db = create_client(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY)
def normalize_espn_sport(espn_sport_name: str) -> str:
"""Normalize ESPN sport name to canonical.
Args:
espn_sport_name: ESPN sport name (e.g., "football")
Returns:
Canonical sport name (e.g., "American Football")
"""
# Lookup ESPN mapping
result = (
db.table("espn_sport_mappings")
.select("canonical_sport_id")
.eq("espn_name", espn_sport_name)
.limit(1)
.execute()
)
if not result.data:
logger.warning(f"No mapping found for ESPN sport: {espn_sport_name}")
return espn_sport_name # Fallback to original
canonical_sport_id = result.data[0]["canonical_sport_id"]
# Fetch canonical name
canonical_result = (
db.table("api_sports_canonical")
.select("canonical_name")
.eq("id", canonical_sport_id)
.limit(1)
.execute()
)
return canonical_result.data[0]["canonical_name"]
# Usage
espn_sport = "football" # From ESPN API
canonical_sport = normalize_espn_sport(espn_sport)
print(canonical_sport) # "American Football"
# Now use canonical_sport for TheSportsDB queries
teams = thesportsdb_client.search_teams_by_sport(canonical_sport)
Querying Canonical Data
# Get all sports
sports = db.table("api_sports_canonical").select("*").execute()
# Get leagues for a specific sport
basketball_sport = db.table("api_sports_canonical").select("id").eq("canonical_name", "Basketball").single().execute()
basketball_leagues = (
db.table("api_leagues_canonical")
.select("*")
.eq("sport_id", basketball_sport.data["id"])
.eq("is_active", True)
.execute()
)
# Get ESPN mapping for a sport
espn_mapping = (
db.table("espn_sport_mappings")
.select("*, canonical_sport:canonical_sport_id(*)")
.eq("espn_name", "football")
.single()
.execute()
)
print(f"ESPN 'football' maps to: {espn_mapping.data['canonical_sport']['canonical_name']}")
Monitoring
Check API Failures
-- Open failures
SELECT api_name, endpoint, error_type, occurrence_count, github_issue_number
FROM api_call_failures
WHERE resolution_status = 'open'
ORDER BY occurrence_count DESC, last_occurred_at DESC;
-- Most frequent failures
SELECT api_name, COUNT(*) as failure_count
FROM api_call_failures
GROUP BY api_name
ORDER BY failure_count DESC;
-- Recent failures (last 7 days)
SELECT *
FROM api_call_failures
WHERE created_at >= NOW() - INTERVAL '7 days'
ORDER BY created_at DESC;
Check Mapping Quality
-- Low-confidence sport mappings
SELECT espn_name, confidence, verified_at
FROM espn_sport_mappings
WHERE confidence < 0.8 OR is_verified = FALSE
ORDER BY confidence ASC;
-- Low-confidence league mappings
SELECT espn_name, confidence, verified_at
FROM espn_league_mappings
WHERE confidence < 0.8 OR is_verified = FALSE
ORDER BY confidence ASC;
-- Unverified canonical data
SELECT canonical_name, discovery_source, discovered_at
FROM api_sports_canonical
WHERE is_verified = FALSE;
SELECT canonical_name, discovery_source, discovered_at
FROM api_leagues_canonical
WHERE is_verified = FALSE;
GitHub Issues
All API failures and canonical data changes automatically create GitHub issues with these labels:
api-failure- API call failedcanonical-data- Canonical data change detectedneeds-investigation- Requires human reviewdata-sync- Weekly sync report{api_name}- Specific API name (e.g.,thesportsdb,espn)
Maintenance
Weekly Sync (Automated)
The GitHub Action runs automatically every Monday at 00:00 UTC. It:
- Fetches current sports/leagues from TheSportsDB
- Compares against canonical tables
- Auto-adds new entries with
is_verified=false - Marks inactive leagues as
is_active=false - Creates GitHub issue summarizing changes
Manual run:
python utilities/sync_canonical_data.py --create-issues --verbose
Verifying Mappings
When low-confidence mappings are discovered:
- Review the GitHub issue created by sync script
- Verify the mapping is correct
- Update the database:
UPDATE espn_sport_mappings
SET confidence = 1.0,
is_verified = TRUE,
verified_at = NOW(),
verified_by = 'your_username',
verification_notes = 'Manually verified against ESPN and TheSportsDB APIs'
WHERE espn_name = 'football';
Resolving API Failures
When GitHub issue is created for API failure:
- Investigate root cause (check issue details)
- Fix the issue (update mappings, fix code, etc.)
- Mark as resolved:
error_handler.mark_resolved(
failure_id=123,
resolution_status="fixed",
resolution_notes="Updated ESPN sport mapping for 'football' → 'American Football'",
fix_version="abc123", # Git SHA
resolved_by="your_username",
)
Or via SQL:
UPDATE api_call_failures
SET resolution_status = 'fixed',
resolved_at = NOW(),
resolved_by = 'your_username',
resolution_notes = 'Fixed ESPN sport mapping',
fix_version = 'abc123'
WHERE id = 123;
Best Practices
1. Always Normalize at Write Time
❌ Bad: Store ESPN names directly
event = {"sport": "football", "league": "NFL"}
db.table("events").insert(event).execute()
✅ Good: Normalize before writing
canonical_sport = normalize_espn_sport("football") # "American Football"
canonical_league = normalize_espn_league("NFL") # "NFL" (already canonical)
event = {"sport": canonical_sport, "league": canonical_league}
db.table("events").insert(event).execute()
2. Wrap All API Calls
❌ Bad: Naked API call
def search_teams(name: str):
return requests.get(f"https://api.example.com/search?t={name}").json()
✅ Good: Wrapped with error tracking
@track_api_call(error_handler, api_name="example_api", endpoint="/search")
def search_teams(name: str):
response = requests.get(f"https://api.example.com/search?t={name}")
response.raise_for_status()
return response.json()
3. Review GitHub Issues Regularly
- Check for new
api-failureissues weekly - Review low-confidence mappings (confidence < 0.8)
- Verify and update canonical data from sync reports
4. Test Mappings Before Production
# Test ESPN → Canonical mapping
espn_sport = "football"
canonical_sport = normalize_espn_sport(espn_sport)
# Verify TheSportsDB accepts canonical name
teams = thesportsdb_client.search_teams_by_sport(canonical_sport)
assert len(teams) > 0, f"No teams found for {canonical_sport}"
Troubleshooting
Migration Fails
Error: relation "api_sports_canonical" already exists
Solution: Tables already exist, skip migration or drop tables first:
DROP TABLE IF EXISTS api_call_failures;
DROP TABLE IF EXISTS espn_league_mappings;
DROP TABLE IF EXISTS espn_sport_mappings;
DROP TABLE IF EXISTS api_leagues_canonical;
DROP TABLE IF EXISTS api_sports_canonical;
Bootstrap Fails with "No data returned"
Error: TheSportsDB API returns empty response
Solution:
1. Check API key is valid
2. Verify TheSportsDB API is operational
3. Try with free tier key "3": --api-key 3
GitHub Issues Not Created
Error: API failures logged but no GitHub issues
Solution:
1. Verify GITHUB_TOKEN is set and valid
2. Verify GITHUB_REPO format is owner/repo
3. Check token has repo scope permissions
4. Enable issue creation: create_issues=True in handler init
ESPN Mapping Not Found
Error: No mapping found for ESPN sport: football
Solution:
1. Run discovery script: python utilities/discover_espn_mappings.py (TO BE CREATED)
2. Manually add mapping:
INSERT INTO espn_sport_mappings (espn_name, canonical_sport_id, confidence, mapping_method)
SELECT 'football', id, 1.0, 'manual'
FROM api_sports_canonical
WHERE canonical_name = 'American Football';
Files Created
Database Migrations
infrastructure/backend/epgoat/infrastructure/database/migrations/015_add_api_canonical_sports_leagues.sqlinfrastructure/backend/epgoat/infrastructure/database/migrations/016_add_espn_mappings_and_api_failures.sql
Utilities
utilities/bootstrap_canonical_data.py- One-time canonical data bootstraputilities/sync_canonical_data.py- Weekly sync to detect changes
Infrastructure
infrastructure/error_handling/__init__.pyinfrastructure/error_handling/api_error_handler.py- Error handler classinfrastructure/error_handling/api_decorators.py- @track_api_call decorator
CI/CD
.github/workflows/sync-canonical-data.yml- Weekly automated sync
Documentation
backend/epgoat/docs/API-CANONICAL-INTEGRATION-GUIDE.md(this file)
Next Steps
- ✅ Migrations Applied: Run migrations 015 and 016
- ✅ Bootstrap Complete: Run
bootstrap_canonical_data.py - ⏳ Create ESPN Discovery Script: Auto-discover ESPN mappings from events table
- ⏳ Integrate Error Handler: Add
@track_api_callto existing API calls - ⏳ Write Tests: Unit tests for bootstrap, sync, error handler
- ⏳ Update Documentation: Add to main architecture docs
Support
For questions or issues:
1. Check this guide for troubleshooting steps
2. Review GitHub issues with labels api-failure, canonical-data
3. Check database for error details: SELECT * FROM api_call_failures
4. Contact CTO (Claude) for architectural questions
Last Updated: 2025-11-07 Version: 1.0.0 Status: ✅ Production Ready