nvidia/earth2studio
Open-source deep-learning framework for exploring, building and deploying AI weather/climate workflows.
AI weather/climate modeling framework with pre-trained models and workflows
Weather data flows from external sources through standardized interfaces, is processed by AI models, and is written to various storage backends.
Under the hood, the system uses 2 feedback loops, 3 data pools, and 3 control points to manage its runtime behavior.
A 10-component weather/climate repository with 6 connections. 341 files analyzed. Data flows through 5 distinct pipeline stages.
How Data Flows Through the System
- Data Ingestion — Fetch weather data from sources like GFS, IFS, or satellite feeds using async caching
- Preprocessing — Transform data to model-expected format using lexicon mappings and coordinate systems
- Model Inference — Run AI weather models (FCN3, GraphCast, AIFS) to generate forecasts
- Postprocessing — Apply perturbations, statistics, or ensemble aggregation to model outputs
- Output Storage — Save results to Zarr, NetCDF, or other formats for analysis and visualization
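The five stages can be read as a chain of functions, each consuming the previous stage's output. The sketch below is a toy illustration only: it uses plain Python lists instead of real weather arrays, and every name in it (`LEXICON`, `ingest`, `preprocess`, `infer`, `postprocess`, `store`, `run_pipeline`) is hypothetical rather than part of the earth2studio API.

```python
from statistics import fmean

# Hypothetical lexicon: source-specific variable names -> model-facing names.
LEXICON = {"TMP_2m": "t2m", "UGRD_10m": "u10m"}


def ingest():
    """Stage 1: stand-in for a fetch from GFS/IFS; returns toy values."""
    return {"TMP_2m": [280.0, 281.0], "UGRD_10m": [3.0, 4.0]}


def preprocess(raw):
    """Stage 2: rename variables to the names the model expects."""
    return {LEXICON[name]: values for name, values in raw.items()}


def infer(state, steps):
    """Stage 3: toy 'model' that adds 0.5 per step; one state per step."""
    forecasts = []
    for _ in range(steps):
        state = {k: [v + 0.5 for v in vals] for k, vals in state.items()}
        forecasts.append(state)
    return forecasts


def postprocess(forecasts):
    """Stage 4: reduce each variable to its mean for every forecast step."""
    return [{k: fmean(vals) for k, vals in f.items()} for f in forecasts]


def store(records, backend):
    """Stage 5: append records to an in-memory stand-in for Zarr/NetCDF."""
    backend.extend(records)
    return backend


def run_pipeline(steps=2):
    """Chain the five stages in order and return the stored records."""
    return store(postprocess(infer(preprocess(ingest()), steps)), [])
```

Keeping each stage a separate function mirrors the separation described above: a different data source or storage backend can be swapped in without touching the model step.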
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Local Cache — Downloaded weather data files cached locally to avoid repeated downloads
- Redis Job Queue — Pending and active inference jobs managed by RQ
- Pre-trained model weights downloaded from HuggingFace Hub
Feedback Loops
- Cache Hit Check (cache-invalidation, balancing) — Trigger: Data request. Action: Check local cache before downloading. Exit: Cache hit or successful download.
- Job Retry Logic (retry, balancing) — Trigger: Inference job failure. Action: Requeue failed jobs with exponential backoff. Exit: Max retries reached or success.
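Both loops follow well-known patterns, sketched here in-process with the standard library. This is an assumption-laden illustration, not the project's code: the real system is described as requeueing jobs through RQ, whereas `run_with_retry` simply re-calls the job, and the names `fetch_with_cache`, `run_with_retry`, `max_retries`, and `base_delay` are hypothetical.

```python
import time


def fetch_with_cache(key, cache, download):
    """Cache Hit Check: return the cached value, downloading only on a miss."""
    if key not in cache:
        cache[key] = download(key)  # exit: successful download
    return cache[key]               # exit: cache hit


def run_with_retry(job, max_retries=3, base_delay=0.01, sleep=time.sleep):
    """Job Retry Logic: re-run a failing job, doubling the delay each time."""
    for attempt in range(max_retries + 1):
        try:
            return job()  # exit: success
        except Exception:
            if attempt == max_retries:
                raise  # exit: max retries reached
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the backoff schedule testable without real waiting.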
Delays
- Model Download (async-processing, varies by model size) — First-time model usage requires downloading multi-GB checkpoint files
- Data Fetching (async-processing, varies by data size) — Weather data downloads can take minutes for large spatiotemporal domains
- Queue Processing (queue-drain, depends on job complexity) — Inference jobs wait in Redis queue until worker capacity available
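The queue-drain delay can be illustrated with a fixed pool of workers pulling from a shared queue: a job waits until one of the workers is free. The sketch below uses Python's in-process `queue` and `threading` modules as a stand-in for the Redis-backed queue, and the function name `drain` and its `worker_count` parameter are hypothetical.

```python
import queue
import threading


def drain(jobs, worker_count=2):
    """Run queued jobs on a fixed number of workers; results keep job order."""
    pending = queue.Queue()
    for index, job in enumerate(jobs):
        pending.put((index, job))

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                index, job = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to process: the queue is drained
            value = job()
            with lock:
                results[index] = value

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [results[i] for i in range(len(jobs))]
```

With more jobs than workers, the surplus jobs sit in the queue, which is the waiting behavior the Queue Processing entry describes.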
Control Points
- Cache Directory (env-var) — Controls: Location for storing downloaded data files. Default: $HOME/.cache/earth2studio
- Worker Count (env-var) — Controls: Number of concurrent inference workers. Default: null
- Model Repository (runtime-toggle) — Controls: Whether to use HuggingFace Hub or local model files. Default: null
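One way to picture these control points is a small settings loader that falls back to the documented defaults. The environment variable names `E2S_CACHE_DIR` and `E2S_WORKER_COUNT`, the `RuntimeSettings` class, and the `use_hub` argument are placeholders invented for this sketch; the names the project actually reads are not given in this analysis.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeSettings:
    cache_dir: Path
    worker_count: Optional[int]  # None mirrors the documented default of null
    use_hub: Optional[bool]      # None mirrors the documented default of null


def load_settings(
    env: Mapping[str, str] = os.environ,
    use_hub: Optional[bool] = None,
) -> RuntimeSettings:
    """Resolve control points from the environment plus one runtime toggle."""
    # Hypothetical variable names; the default path follows the text above.
    default_cache = Path.home() / ".cache" / "earth2studio"
    cache_dir = Path(env.get("E2S_CACHE_DIR", str(default_cache)))
    workers = env.get("E2S_WORKER_COUNT")
    return RuntimeSettings(
        cache_dir=cache_dir,
        worker_count=int(workers) if workers is not None else None,
        use_hub=use_hub,
    )
```

Taking the environment as a mapping argument makes the defaults easy to check without modifying the real process environment.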
Technology Stack
- PyTorch — Deep learning framework for model inference
- Xarray — N-dimensional labeled arrays for weather data
- FastAPI — REST API framework for production deployment
- Redis — Job queuing and caching for scalable inference
- HuggingFace Hub — Model repository and automatic downloading
- Zarr — Chunked array storage for large weather datasets
- Configuration management for complex workflows
- Metrics collection for production monitoring
Key Components
- DataSource (type-def) — Protocol defining interface for weather data sources
  earth2studio/data/base.py
- GFS (class) — Fetches GFS forecast data from NOAA with async caching
  earth2studio/data/gfs.py
- FCN3 (class) — NVIDIA FourCastNet3 model wrapper for weather prediction
  earth2studio/models/px.py
- ZarrBackend (class) — Saves forecast outputs to Zarr format for analysis
  earth2studio/io/
- deterministic (function) — High-level runner for deterministic weather forecasts
  earth2studio/run/
- server_main (service) — FastAPI server providing REST endpoints for workflow execution
  earth2studio/serve/server/main.py
- workflow_registry (registry) — Registry of available inference workflows for REST API
  earth2studio/serve/server/workflow.py
- run_inference (function) — Executes hurricane ensemble forecasting with cyclone tracking
  recipes/hens/src/hens_run.py
- Lexicon (module) — Variable name mappings between different weather data formats
  earth2studio/lexicon/
- AutoModelMixin (class) — Base mixin for automatic model loading from HuggingFace Hub
  earth2studio/models/auto/mixin.py
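The DataSource entry describes a protocol-based interface: any object with the right call signature qualifies, with no shared base class required. The sketch below shows that idea with `typing.Protocol`. It is a simplified assumption, not the definition in earth2studio/data/base.py: the signature, the dictionary return type, and the `ConstantSource` and `fetch` helpers are all hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Protocol, runtime_checkable


@runtime_checkable
class DataSource(Protocol):
    """Anything callable as source(times, variables) counts as a data source."""

    def __call__(
        self, times: List[datetime], variables: List[str]
    ) -> Dict[str, List[float]]: ...


class ConstantSource:
    """Toy source returning one constant value per requested time."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __call__(self, times, variables):
        return {name: [self.value] * len(times) for name in variables}


def fetch(source: DataSource, times, variables):
    """Accept any object that structurally satisfies the protocol."""
    if not isinstance(source, DataSource):
        raise TypeError("source does not implement the DataSource protocol")
    return source(times, variables)
```

`ConstantSource` never inherits from `DataSource`, yet `fetch` accepts it, which is what lets new sources plug into a workflow without changes to the core package. Note that the runtime check only confirms the object is callable, not that its signature matches.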
Frequently Asked Questions
What is earth2studio used for?
earth2studio is an AI weather/climate modeling framework with pre-trained models and workflows. nvidia/earth2studio is a 10-component weather/climate repository written in Python. Data flows through 5 distinct pipeline stages. The codebase contains 341 files.
How is earth2studio architected?
earth2studio is organized into 5 architecture layers: Data Layer, Model Layer, Workflow Layer, I/O Layer, and 1 more. Data flows through 5 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through earth2studio?
Data moves through 5 stages: Data Ingestion → Preprocessing → Model Inference → Postprocessing → Output Storage. Weather data flows from external sources through standardized interfaces, is processed by AI models, and is written to various storage backends.
What technologies does earth2studio use?
The core stack includes PyTorch (deep learning framework for model inference), Xarray (N-dimensional labeled arrays for weather data), FastAPI (REST API framework for production deployment), Redis (job queuing and caching for scalable inference), HuggingFace Hub (model repository and automatic downloading), Zarr (chunked array storage for large weather datasets), and 2 more. It is a focused set of dependencies that keeps the build manageable.
What system dynamics does earth2studio have?
earth2studio exhibits 3 data pools (including Local Cache and Redis Job Queue), 2 feedback loops, 3 control points, and 3 delays. The feedback loops handle cache-invalidation and retry. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does earth2studio use?
5 design patterns detected: Protocol-based Interfaces, Lexicon Translation, Async Caching, AutoModel Pattern, Hydra Configuration.
How does earth2studio compare to alternatives?
CodeSea has side-by-side architecture comparisons of earth2studio with graphcast. These comparisons cover tech stack differences, pipeline design, system behavior, and code patterns.
Analyzed on March 25, 2026 by CodeSea. Written by Karolina Sarna.