Data Freshness

Model Graph isn't a static list. A background worker runs daily automated ingestion from multiple sources to keep the registry current. This page explains the ingestion tiers, update frequency, and how conflicts between sources are resolved.

Ingestion Tiers

Data flows into the registry from four tiers of sources. Each tier has its own authority level; the three automated tiers refresh daily, and manual curation happens as needed.

Tier 1: Provider Model Listing APIs

The most authoritative automated source. Each major provider exposes a model listing API that the worker polls daily:

Provider	Endpoint	Data Available
OpenAI	`GET /v1/models`	Model IDs, creation timestamps
Anthropic	`GET /v1/models`	Model IDs, display names, creation timestamps
Google	`GET /v1beta/models`	Model names, token limits, supported actions
Mistral	`GET /v1/models`	Model IDs, deprecation dates, replacement models, aliases, context lengths
Cohere	`GET /v1/models`	Model names, deprecation boolean, context lengths
xAI	`GET /v1/models`	Model IDs (OpenAI-compatible format)

Tip: Mistral's API is the richest. It is the only major provider that exposes deprecation dates, replacement models, and aliases directly in its model listing API.
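As a rough illustration of what the worker might do with a Tier 1 response, the sketch below flattens an OpenAI-style listing (a `data` array of objects with `id` and a Unix `created` timestamp) into registry records. The function name, the output field names, and the choice to store the creation timestamp as a UTC date are assumptions for illustration, not the worker's actual code. Providers with a different response shape would need their own normalizer.

```python
from datetime import datetime, timezone


def normalize_openai_listing(payload: dict, provider: str) -> list[dict]:
    """Flatten an OpenAI-style model listing into simple records.

    Hypothetical helper: the output field names are illustrative only.
    """
    records = []
    for item in payload.get("data", []):
        created = item.get("created")
        # Convert the Unix timestamp to an ISO date in UTC, if present.
        created_date = (
            datetime.fromtimestamp(created, tz=timezone.utc).date().isoformat()
            if isinstance(created, (int, float))
            else None
        )
        records.append(
            {
                "provider": provider,
                "model_id": item["id"],
                "created_date": created_date,
            }
        )
    return records


# Example payload in the OpenAI listing shape (values are made up).
sample = {
    "object": "list",
    "data": [{"id": "example-model", "object": "model", "created": 1700000000}],
}
print(normalize_openai_listing(sample, provider="openai"))
```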

Tier 2: Deprecation Data

deprecations.info is a community-maintained, daily-scraped aggregation of deprecation announcements from all major providers. It provides:

announcement_date → mapped to deprecation_date
shutdown_date → mapped to sunset_date
replacement_models → mapped to successor_model_id

This fills the deprecation gap for providers whose APIs don't include deprecation information (which is most of them).
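The field mapping above can be sketched as a small translation function. This is a minimal sketch: the function name is hypothetical, and it assumes `replacement_models` may arrive as a list, in which case the first entry is used as the single successor. The real ingester may choose differently.

```python
# Source field -> registry field, as listed above.
FIELD_MAP = {
    "announcement_date": "deprecation_date",
    "shutdown_date": "sunset_date",
    "replacement_models": "successor_model_id",
}


def map_deprecation_entry(entry: dict) -> dict:
    """Translate one deprecations.info entry into registry field names."""
    mapped = {}
    for source_field, registry_field in FIELD_MAP.items():
        value = entry.get(source_field)
        # Skip fields that are missing or empty so they never blank out
        # a value the registry already holds.
        if value in (None, "", []):
            continue
        # Assumption: a list of replacements collapses to its first entry.
        if source_field == "replacement_models" and isinstance(value, list):
            value = value[0]
        mapped[registry_field] = value
    return mapped
```

Skipping empty values keeps the mapping purely additive, which matches the role of Tier 2 as a supplement rather than a replacement for provider data.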

Tier 3: Hugging Face Hub

For open-source models, the worker syncs metadata from the Hugging Face Hub for well-known model organizations:

Meta Llama (meta-llama/*)
Mistral AI (mistralai/*)
Cohere (CohereForAI/*)
Google (google/*)
Qwen (Qwen/*)
DeepSeek (deepseek-ai/*)

Data extracted includes model IDs, creation dates, tags, and parameter counts (parsed from model names or config.json). Each model gets a canonical_url pointing to its Hugging Face page.
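Parsing a parameter count from a model name could look roughly like the following. The regular expression and function name are assumptions for illustration. The sketch deliberately returns `None` for names it cannot read unambiguously, such as mixture-of-experts names like `8x7B`, where a fallback to `config.json` would be more reliable.

```python
import re
from typing import Optional

# A number followed by B (billions) or M (millions), standing alone as a
# token: not preceded by a letter, digit, or dot, and not followed by a
# letter or digit.
_PARAM_PATTERN = re.compile(
    r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)([BbMm])(?![A-Za-z0-9])"
)
_MULTIPLIERS = {"b": 1_000_000_000, "m": 1_000_000}


def parse_param_count(model_id: str) -> Optional[int]:
    """Return the parameter count encoded in a model name, or None."""
    match = _PARAM_PATTERN.search(model_id)
    if match is None:
        return None
    number, unit = match.groups()
    return round(float(number) * _MULTIPLIERS[unit.lower()])


for name in [
    "meta-llama/Llama-3.1-8B-Instruct",
    "Qwen/Qwen2.5-0.5B",
    "mistralai/Mixtral-8x7B-v0.1",
]:
    print(name, parse_param_count(name))
```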

Tier 4: Manual Curation

An initial seed migration populates the database with known providers, families, models, and aliases. Admin API endpoints allow manual corrections for edge cases — adding missing aliases, fixing dates, setting successor links, or adjusting statuses.

Update Frequency

Source	Frequency	Trigger
Provider APIs (Tier 1)	Daily at 02:00 UTC	Automated cron job
deprecations.info (Tier 2)	Daily at 02:00 UTC	Automated cron job
Hugging Face Hub (Tier 3)	Daily at 02:00 UTC	Automated cron job
Manual curation (Tier 4)	As needed	Admin API call

All automated sources run in a single daily ingestion cycle. Each source runs independently — if one provider's API is down, the rest still complete successfully.
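One way to get this isolation is to wrap each source in its own error boundary, so an exception in one ingester is recorded and the loop moves on. The sketch below assumes each ingester is a zero-argument callable returning a count of processed models. The function and result field names are hypothetical.

```python
from typing import Callable


def run_ingestion_cycle(ingesters: dict[str, Callable[[], int]]) -> list[dict]:
    """Run every ingester once, recording success or failure per source."""
    results = []
    for name, ingest in ingesters.items():
        try:
            processed = ingest()
            results.append(
                {"source": name, "status": "success",
                 "models_processed": processed, "error_message": None}
            )
        except Exception as exc:
            # A failing source is logged and skipped; later sources still run.
            results.append(
                {"source": name, "status": "failed",
                 "models_processed": 0, "error_message": str(exc)}
            )
    return results


def _unreachable_provider() -> int:
    raise ConnectionError("provider API unreachable")


cycle = run_ingestion_cycle(
    {"openai-api": lambda: 12, "broken-api": _unreachable_provider, "huggingface": lambda: 40}
)
for result in cycle:
    print(result["source"], result["status"])
```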

Conflict Resolution

When multiple sources provide data about the same model, conflicts are resolved using a priority hierarchy:

1. Manual curation (highest priority): admin corrections override everything
2. Provider APIs: authoritative for their own models (release dates, names, context windows)
3. deprecations.info: supplements with deprecation/sunset dates; never overwrites provider-sourced data
4. Hugging Face: authoritative for open-source metadata (canonical URLs, parameter counts)
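The hierarchy above can be sketched as a field-by-field merge in which the highest-priority source that supplies a value wins. The source labels, function name, and record shape are assumptions for illustration. The actual resolver may track provenance differently.

```python
# Lower number = higher priority, following the hierarchy above.
SOURCE_PRIORITY = {
    "manual": 0,
    "provider_api": 1,
    "deprecations_info": 2,
    "huggingface": 3,
}


def resolve_conflicts(candidates: list[dict]) -> dict:
    """Merge per-source field values; the highest-priority value wins.

    Each candidate looks like {"source": "...", "fields": {...}}.
    """
    merged: dict = {}
    ordered = sorted(candidates, key=lambda c: SOURCE_PRIORITY[c["source"]])
    for candidate in ordered:
        for field, value in candidate["fields"].items():
            # Keep the first non-null value seen, which comes from the
            # highest-priority source that supplied the field.
            if value is not None and field not in merged:
                merged[field] = value
    return merged


print(resolve_conflicts([
    {"source": "deprecations_info",
     "fields": {"release_date": "2023-01-01", "sunset_date": "2025-06-30"}},
    {"source": "provider_api", "fields": {"release_date": "2023-03-14"}},
]))
```

Because the merge works per field, a lower-priority source can still contribute fields that nothing above it provides, such as a sunset date from deprecations.info or a canonical URL from Hugging Face.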

Key Rules

Deprecation data from Tier 2 never overwrites release dates or other metadata from Tier 1
Every ingestion run is idempotent — upserts on natural keys (slug for models, alias for aliases)
Every run is logged in the ingestion_runs table with counts and error details
Individual ingester failures don't affect other ingesters — partial success is better than total failure
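To illustrate the idempotency rule, here is a minimal sketch of an upsert keyed on `slug`, using SQLite from the Python standard library. The table layout and column names are hypothetical, and the registry's real database and schema may differ. The point is only that re-running the same batch updates rows in place instead of duplicating them.

```python
import sqlite3

# Hypothetical, simplified schema: slug is the natural key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS models (
    slug TEXT PRIMARY KEY,
    display_name TEXT,
    context_window INTEGER
)
"""

UPSERT = """
INSERT INTO models (slug, display_name, context_window)
VALUES (:slug, :display_name, :context_window)
ON CONFLICT(slug) DO UPDATE SET
    display_name = excluded.display_name,
    context_window = excluded.context_window
"""


def upsert_models(conn: sqlite3.Connection, records: list[dict]) -> int:
    """Insert or update records by slug; return the total row count."""
    conn.execute(SCHEMA)
    conn.executemany(UPSERT, records)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM models").fetchone()[0]


conn = sqlite3.connect(":memory:")
batch = [{"slug": "example-model", "display_name": "Example", "context_window": 8192}]
print(upsert_models(conn, batch))  # first run inserts the row
print(upsert_models(conn, batch))  # second run updates it; the count is unchanged
```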

Monitoring Ingestion

Admins can monitor ingestion health via the admin endpoints:

# View recent ingestion runs
curl -H "Authorization: Bearer $API_KEY" \
  https://api.modelgraph.ai/api/v1/admin/ingestion-runs

# Manually trigger a refresh for a specific source
curl -X POST -H "Authorization: Bearer $API_KEY" \
  https://api.modelgraph.ai/api/v1/admin/ingest/openai-api

Each run reports models_added, models_updated, and any error_message for debugging.
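Once the run list has been fetched and decoded, a short helper can surface failures at a glance. The sketch below assumes the endpoint returns a JSON array of run objects carrying the three fields named above. That response shape and the function name are assumptions, so adjust them to the actual payload.

```python
def summarize_runs(runs: list[dict]) -> dict:
    """Aggregate counts and collect error messages from ingestion runs."""
    # A run counts as failed when it carries a non-empty error_message.
    failed = [run for run in runs if run.get("error_message")]
    return {
        "total_runs": len(runs),
        "failed_runs": len(failed),
        "models_added": sum(run.get("models_added", 0) for run in runs),
        "models_updated": sum(run.get("models_updated", 0) for run in runs),
        "errors": [run["error_message"] for run in failed],
    }


# Made-up run records in the assumed shape.
example_runs = [
    {"models_added": 2, "models_updated": 15, "error_message": None},
    {"models_added": 0, "models_updated": 0, "error_message": "HTTP 503"},
]
print(summarize_runs(example_runs))
```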
