
Data Freshness

Model Graph isn't a static list. A background worker runs daily automated ingestion from multiple sources to keep the registry current. This page explains the ingestion tiers, update frequency, and how conflicts between sources are resolved.

Ingestion Tiers

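Data flows into the registry from four tiers of sources, each with its own authority level and refresh trigger. Tier numbers identify the kind of source, not its precedence; the order used to settle disagreements is given under Conflict Resolution.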

Tier 1: Provider Model Listing APIs

The most authoritative automated source. Each major provider exposes a model listing API that the worker polls daily:

| Provider | Endpoint | Data Available |
| --- | --- | --- |
| OpenAI | GET /v1/models | Model IDs, creation timestamps |
| Anthropic | GET /v1/models | Model IDs, display names, creation timestamps |
| Google | GET /v1beta/models | Model names, token limits, supported actions |
| Mistral | GET /v1/models | Model IDs, deprecation dates, replacement models, aliases, context lengths |
| Cohere | GET /v1/models | Model names, deprecation boolean, context lengths |
| xAI | GET /v1/models | Model IDs (OpenAI-compatible format) |
Tip: Mistral's API is the richest of these. It is the only major provider that exposes deprecation dates, replacement models, and aliases directly in its model listing API.
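As a rough illustration of what a Tier 1 ingester might do with a listing response, the sketch below normalizes an OpenAI-style payload into simple records. The `ListedModel` type and `parse_openai_style_listing` function are hypothetical names, and the payload shape (a `data` array whose items carry `id` and a Unix-seconds `created` field) is assumed from the OpenAI-compatible format mentioned in the table; other providers would need their own parsers.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class ListedModel:
    """Hypothetical normalized record for one entry in a provider listing."""
    provider: str
    model_id: str
    created_at: Optional[datetime]


def parse_openai_style_listing(provider: str, payload: dict[str, Any]) -> list[ListedModel]:
    """Convert a {"data": [{"id": ..., "created": ...}]} payload into records."""
    models = []
    for item in payload.get("data", []):
        created = item.get("created")
        # Assumption: "created" is a Unix timestamp in seconds when present.
        created_at = (
            datetime.fromtimestamp(created, tz=timezone.utc)
            if isinstance(created, (int, float))
            else None
        )
        models.append(ListedModel(provider, item["id"], created_at))
    return models


# Illustrative payload with made-up model IDs.
sample = {
    "object": "list",
    "data": [
        {"id": "example-model-1", "object": "model", "created": 1700000000},
        {"id": "example-model-2", "object": "model"},
    ],
}
listed = parse_openai_style_listing("openai", sample)
```

Entries without a creation timestamp are kept with `created_at` set to `None` rather than dropped, so a later tier can still fill in the date.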

Tier 2: Deprecation Data

deprecations.info is a community-maintained, daily-scraped aggregation of deprecation announcements from all major providers. It provides:

  • announcement_date → mapped to deprecation_date
  • shutdown_date → mapped to sunset_date
  • replacement_models → mapped to successor_model_id

This fills the deprecation gap for providers whose APIs don't include deprecation information (which is most of them).
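The field mapping above could be applied along these lines. This is a minimal sketch: `map_deprecation_entry` is a hypothetical helper, and it assumes `replacement_models` arrives as a list and that the first entry becomes the single `successor_model_id`. How the real worker chooses among several replacements is not stated here.

```python
from typing import Any, Optional

# Source field -> registry field, as listed above.
FIELD_MAP = {
    "announcement_date": "deprecation_date",
    "shutdown_date": "sunset_date",
}


def map_deprecation_entry(entry: dict[str, Any]) -> dict[str, Optional[str]]:
    """Translate one deprecations.info entry into registry field names."""
    mapped = {target: entry.get(source) for source, target in FIELD_MAP.items()}
    replacements = entry.get("replacement_models") or []
    # Assumption: with several replacements listed, take the first one.
    mapped["successor_model_id"] = replacements[0] if replacements else None
    return mapped


# Illustrative entry with made-up values.
mapped_entry = map_deprecation_entry({
    "announcement_date": "2024-06-01",
    "shutdown_date": "2025-01-15",
    "replacement_models": ["example-model-v2", "example-model-v2-mini"],
})
```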

Tier 3: Hugging Face Hub

For open-source models, the worker syncs metadata from the Hugging Face Hub for well-known model organizations:

  • Meta Llama (meta-llama/*)
  • Mistral AI (mistralai/*)
  • Cohere (CohereForAI/*)
  • Google (google/*)
  • Qwen (Qwen/*)
  • DeepSeek (deepseek-ai/*)

Data extracted includes model IDs, creation dates, tags, and parameter counts (parsed from model names or config.json). Each model gets a canonical_url pointing to its Hugging Face page.
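Parsing a parameter count out of a model name might look something like the following. The regular expression and the `parse_param_count` helper are illustrative assumptions, not the worker's actual logic. The sketch deliberately returns `None` for names it cannot read unambiguously (including mixture-of-experts names such as `8x7B`), on the assumption that `config.json` would be consulted in those cases.

```python
import re
from typing import Optional

# A number followed by B or M, not glued to other letters, digits, or dots.
_PARAM_PATTERN = re.compile(
    r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)([BM])(?![A-Za-z0-9])",
    re.IGNORECASE,
)
_SCALE = {"b": 1_000_000_000, "m": 1_000_000}


def parse_param_count(model_name: str) -> Optional[int]:
    """Return the parameter count hinted at by the name, or None if unclear."""
    match = _PARAM_PATTERN.search(model_name)
    if match is None:
        return None
    size, unit = match.groups()
    return round(float(size) * _SCALE[unit.lower()])


llama_params = parse_param_count("meta-llama/Llama-3.1-8B-Instruct")
qwen_params = parse_param_count("Qwen/Qwen2.5-0.5B")
mixtral_params = parse_param_count("mistralai/Mixtral-8x7B-v0.1")
```

The lookbehind keeps version numbers such as `3.1` or `2.5` from being mistaken for sizes, since a size token has to start right after a separator like `-` or `/`.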

Tier 4: Manual Curation

An initial seed migration populates the database with known providers, families, models, and aliases. Admin API endpoints allow manual corrections for edge cases — adding missing aliases, fixing dates, setting successor links, or adjusting statuses.

Update Frequency

| Source | Frequency | Trigger |
| --- | --- | --- |
| Provider APIs (Tier 1) | Daily at 02:00 UTC | Automated cron job |
| deprecations.info (Tier 2) | Daily at 02:00 UTC | Automated cron job |
| Hugging Face Hub (Tier 3) | Daily at 02:00 UTC | Automated cron job |
| Manual curation (Tier 4) | As needed | Admin API call |

All automated sources run in a single daily ingestion cycle. Each source runs independently — if one provider's API is down, the rest still complete successfully.
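One way to get this kind of failure isolation is to wrap each source in its own error handler, as in the sketch below. The `run_ingestion_cycle` function, the result fields, and the source names other than `openai-api` are hypothetical; the real worker may schedule and report sources differently.

```python
from typing import Any, Callable


def run_ingestion_cycle(ingesters: dict[str, Callable[[], int]]) -> dict[str, dict[str, Any]]:
    """Run every ingester once; a failure in one never stops the others."""
    results: dict[str, dict[str, Any]] = {}
    for name, ingest in ingesters.items():
        try:
            results[name] = {"status": "ok", "models_seen": ingest(), "error_message": None}
        except Exception as exc:
            # Record the failure for this source only and carry on.
            results[name] = {"status": "failed", "models_seen": 0, "error_message": str(exc)}
    return results


def broken_source() -> int:
    """Stand-in for a provider whose API is down."""
    raise ConnectionError("provider API unavailable")


cycle = run_ingestion_cycle({
    "openai-api": lambda: 42,
    "mistral-api": broken_source,
    "huggingface-hub": lambda: 310,
})
```

In this example the simulated outage is captured in the `mistral-api` entry while the sources after it still run to completion.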

Conflict Resolution

When multiple sources provide data about the same model, conflicts are resolved using a priority hierarchy:

  1. Manual curation (highest priority) — admin corrections override everything
  2. Provider APIs — authoritative for their own models (release dates, names, context windows)
  3. deprecations.info — supplements with deprecation/sunset dates; never overwrites provider-sourced data
  4. Hugging Face — authoritative for open-source metadata (canonical URLs, parameter counts)
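The hierarchy above can be sketched as a field-by-field merge in which the first source in priority order to supply a non-empty value wins. The source keys, field names, and `resolve_fields` function below are assumptions made for illustration, and the sample values are invented.

```python
from typing import Any

# Highest priority first, following the numbered list above.
SOURCE_PRIORITY = ["manual", "provider_api", "deprecations_info", "hugging_face"]


def resolve_fields(candidates: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Pick each field's value from the highest-priority source that has one."""
    resolved: dict[str, Any] = {}
    for source in SOURCE_PRIORITY:
        for field, value in candidates.get(source, {}).items():
            # Lower-priority sources only fill fields that are still empty.
            if value is not None and field not in resolved:
                resolved[field] = value
    return resolved


resolved = resolve_fields({
    "provider_api": {
        "display_name": "Example Model",
        "release_date": "2024-01-15",
        "context_length": 128000,
    },
    "deprecations_info": {"release_date": "2024-02-01", "sunset_date": "2025-06-30"},
    "hugging_face": {"parameter_count": 7_000_000_000, "context_length": None},
    "manual": {"sunset_date": "2025-09-30"},
})
```

Here the provider's `release_date` survives the conflicting value from deprecations.info, and the manual `sunset_date` overrides the scraped one.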

Key Rules

  • Deprecation data from Tier 2 never overwrites release dates or other metadata from Tier 1
  • Every ingestion run is idempotent — upserts on natural keys (slug for models, alias for aliases)
  • Every run is logged in the ingestion_runs table with counts and error details
  • Individual ingester failures don't affect other ingesters — partial success is better than total failure
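To make the idempotency and logging rules concrete, here is a minimal sketch using SQLite's upsert syntax with the model slug as the natural key. The table columns and the `ingest` function are assumptions. Only the `ingestion_runs` table name and the `models_added`, `models_updated`, and `error_message` fields come from this page, and the real registry may use a different database entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE models (slug TEXT PRIMARY KEY, display_name TEXT, context_length INTEGER);
CREATE TABLE ingestion_runs (
    id INTEGER PRIMARY KEY, source TEXT,
    models_added INTEGER, models_updated INTEGER, error_message TEXT
);
""")


def ingest(conn: sqlite3.Connection, source: str, records: list[dict]) -> tuple[int, int]:
    """Upsert records keyed on slug and log the run; returns (added, updated)."""
    added = updated = 0
    for rec in records:
        existing = conn.execute(
            "SELECT display_name, context_length FROM models WHERE slug = ?",
            (rec["slug"],),
        ).fetchone()
        conn.execute(
            "INSERT INTO models (slug, display_name, context_length) "
            "VALUES (:slug, :display_name, :context_length) "
            "ON CONFLICT(slug) DO UPDATE SET "
            "display_name = excluded.display_name, "
            "context_length = excluded.context_length",
            rec,
        )
        if existing is None:
            added += 1
        elif existing != (rec["display_name"], rec["context_length"]):
            updated += 1
    conn.execute(
        "INSERT INTO ingestion_runs (source, models_added, models_updated, error_message) "
        "VALUES (?, ?, ?, NULL)",
        (source, added, updated),
    )
    conn.commit()
    return added, updated


# Made-up records; running the same batch twice must not duplicate rows.
batch = [
    {"slug": "example-model-1", "display_name": "Example Model 1", "context_length": 8192},
    {"slug": "example-model-2", "display_name": "Example Model 2", "context_length": 32768},
]
first_run = ingest(conn, "openai-api", batch)
second_run = ingest(conn, "openai-api", batch)
```

The second run touches the same rows but reports nothing added or updated, which is the property that makes re-running a failed or partial cycle safe.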

Monitoring Ingestion

Admins can monitor ingestion health via the admin endpoints:

# View recent ingestion runs
curl -H "Authorization: Bearer $API_KEY" \
  https://api.modelgraph.ai/api/v1/admin/ingestion-runs

# Manually trigger a refresh for a specific source
curl -X POST -H "Authorization: Bearer $API_KEY" \
  https://api.modelgraph.ai/api/v1/admin/ingest/openai-api

Each run reports models_added, models_updated, and any error_message for debugging.
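A small script can turn those per-run fields into a health summary. The sketch below assumes the endpoint's JSON has already been parsed into a list of run objects carrying `models_added`, `models_updated`, and `error_message`. The actual response envelope is not documented on this page, so the `summarize_runs` helper and the sample data are hypothetical.

```python
from typing import Any


def summarize_runs(runs: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate counts across runs and collect any error messages."""
    failed = [run for run in runs if run.get("error_message")]
    return {
        "total_runs": len(runs),
        "failed_runs": len(failed),
        "models_added": sum(run.get("models_added", 0) for run in runs),
        "models_updated": sum(run.get("models_updated", 0) for run in runs),
        "errors": [run["error_message"] for run in failed],
    }


# Made-up runs standing in for a parsed response.
summary = summarize_runs([
    {"models_added": 3, "models_updated": 12, "error_message": None},
    {"models_added": 0, "models_updated": 0, "error_message": "HTTP 503 from provider"},
    {"models_added": 1, "models_updated": 4, "error_message": None},
])
```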