berriai/litellm

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing, and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM]

43,933 stars · Python · 7 components

Routes 100+ LLM API calls through a unified gateway with cost tracking and access controls


Under the hood, the system uses 3 feedback loops, 4 data pools, and 5 control points to manage its runtime behavior.

A 7-component repository. 5,169 files analyzed. Data flows through 7 distinct pipeline stages.

How Data Flows Through the System

Requests enter through the proxy server's HTTP endpoints, get authenticated against the database, then pass through the router which selects an available LLM provider. The core completion function transforms the request to the provider's format, makes the API call, normalizes the response back to OpenAI format, and returns it through middleware that handles logging, caching, and cost tracking. Enterprise hooks can intercept at multiple points for content moderation and compliance.

  1. HTTP request ingestion — FastAPI proxy server receives OpenAI-compatible requests at /chat/completions and other endpoints, parsing JSON body into request objects
  2. Authentication and authorization — Proxy extracts API key from Authorization header, queries PrismaClient to validate key and load UserAPIKeyAuth with permissions and budget limits [ChatCompletionRequest → UserAPIKeyAuth] (config: general_settings.master_key)
  3. Router model selection — Router examines model field in request, applies routing strategy (round-robin, least-latency, etc.) to select from available deployments in model_list [ChatCompletionRequest → selected provider config] (config: model_list, litellm_settings.routing_strategy)
  4. Provider API transformation — LLMProvider subclass converts OpenAI-format request to provider's native format, handling authentication headers and parameter mapping [ChatCompletionRequest → provider-specific request] (config: litellm_settings.drop_params)
  5. LLM API call execution — HTTP client makes actual API call to selected LLM provider (OpenAI, Anthropic, etc.) with transformed request [provider-specific request → provider-specific response] (config: litellm_settings.request_timeout, litellm_settings.num_retries)
  6. Response normalization — Provider adapter converts response back to OpenAI ModelResponse format, standardizing field names and structures across all providers [provider-specific response → ModelResponse]
  7. Apply response middleware — CustomLogger callbacks process the response for cost tracking, usage logging, caching, and enterprise guardrails before returning to client [ModelResponse → logged ModelResponse] (config: litellm_settings.success_callback, litellm_settings.cache)
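
The same pipeline is visible from the SDK side: litellm's documented completion() API accepts an OpenAI-style request and hands back an OpenAI-format ModelResponse regardless of provider. A minimal sketch of stages 4 through 7; the model name and environment variable are illustrative placeholders:

```python
# Minimal sketch of stages 4-7 from the SDK side: an OpenAI-format request
# goes in, litellm transforms it for the chosen provider, calls the API, and
# returns a normalized ModelResponse. Model name and env var are illustrative.
import os
import litellm

os.environ["OPENAI_API_KEY"] = "sk-..."  # provider credential used in stage 5

response = litellm.completion(
    model="gpt-4o-mini",  # the key the router/provider selection works from
    messages=[{"role": "user", "content": "Say hello in one word."}],
    temperature=0.2,
)

# Stage 6: every provider's reply is normalized to the OpenAI schema.
print(response.choices[0].message.content)
print(response.usage)  # token counts consumed by stage 7 for cost tracking

```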

Data Models

The data structures that flow between stages — the contracts that hold the system together.

ChatCompletionRequest litellm/types/utils.py
OpenAI-compatible dict with model: str, messages: List[dict], temperature: float, max_tokens: int, plus provider-specific parameters
Created from incoming HTTP request, normalized to OpenAI format, then transformed to provider-specific format before API call
ModelResponse litellm/types/utils.py
Standardized response with choices: List[Choice], usage: Usage dict, model: str, created: int timestamp
Generated from provider API response, normalized to OpenAI format, then passed through logging and caching middleware
UserAPIKeyAuth litellm/proxy/_types.py
Pydantic model with api_key: str, user_id: str, team_id: str, permissions: dict, budget limits and usage tracking fields
Loaded from database during auth, cached in memory, updated with usage tracking after each request
RouterConfig litellm/router_utils/router_config.py
Dict with model_list: List[ModelConfig], routing_strategy: str, fallback_models: List[str], retry_policy: dict
Loaded from YAML config file at startup, parsed into router data structures for request routing decisions
ProxyConfig litellm/proxy/_types.py
Configuration object with model_list, general_settings, litellm_settings, and environment-specific parameters from YAML files
Parsed from proxy_server_config.yaml at startup, drives all proxy server behavior and feature enablement
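
To make these configuration contracts concrete, here is a hedged sketch of how the fields named above (model_list entries with model_name and litellm_params, litellm_settings.routing_strategy, general_settings.master_key) fit together, written as a Python dict mirroring the YAML proxy config. Any key not named in this section is an illustrative assumption:

```python
# Hedged sketch of the proxy configuration shape described above, expressed as
# a Python dict mirroring proxy_server_config.yaml. Only model_list/model_name/
# litellm_params, routing_strategy, and general_settings.master_key come from
# this section; the remaining values are illustrative.
proxy_config = {
    "model_list": [
        {
            "model_name": "gpt-4o",              # public alias clients request
            "litellm_params": {
                "model": "azure/gpt-4o",         # provider-prefixed target model
                "api_key": "os.environ/AZURE_API_KEY",
                "api_base": "https://example.openai.azure.com",
            },
        },
    ],
    "litellm_settings": {
        "routing_strategy": "least-latency",     # see stage 3
        "request_timeout": 600,                  # see stage 5
        "num_retries": 2,
    },
    "general_settings": {
        "master_key": "sk-1234",                 # proxy admin key (stage 2)
    },
}

```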

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Contract unguarded

The core completion function assumes all provider-specific LLMProvider classes implement the same interface for request transformation and response normalization, but there's no abstract base class or validation to enforce this contract

If this fails: When new providers are added with missing or incorrectly named methods, requests silently fail with AttributeError or return malformed responses that break downstream consumers

litellm/main.py:completion
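
A hedged sketch of how that contract could be made explicit with an abstract base class; the class and method names below are hypothetical, not litellm's actual internals:

```python
# Hypothetical sketch: making the transform/normalize contract explicit so a
# provider missing a method fails at instantiation, not mid-request.
# Class and method names are illustrative, not litellm's real internals.
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseLLMProvider(ABC):
    @abstractmethod
    def transform_request(self, openai_request: Dict[str, Any]) -> Dict[str, Any]:
        """Convert an OpenAI-format request into the provider's native format."""

    @abstractmethod
    def transform_response(self, raw_response: Dict[str, Any]) -> Dict[str, Any]:
        """Convert the provider's native response back to the OpenAI ModelResponse shape."""


# A subclass that omits either method raises TypeError when instantiated,
# instead of an AttributeError deep inside a live request.

```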
critical Shape weakly guarded

Router assumes model_list configuration contains deployments with consistent structure (model_name, litellm_params, etc.) but only validates top-level dict existence, not required nested fields

If this fails: Missing required fields like api_key or api_base in deployment configs cause KeyError crashes during actual API calls, not during configuration validation

litellm/router.py:Router
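
A hedged sketch of deployment-level validation using Pydantic (already in the stack); the field names follow the model_list structure described above, everything else is illustrative:

```python
# Hypothetical sketch: validate each model_list entry at config-load time so a
# missing api_key/api_base fails fast instead of raising KeyError mid-request.
from typing import List, Optional
from pydantic import BaseModel


class LiteLLMParams(BaseModel):
    model: str
    api_key: Optional[str] = None
    api_base: Optional[str] = None


class Deployment(BaseModel):
    model_name: str
    litellm_params: LiteLLMParams


def validate_model_list(raw_model_list: List[dict]) -> List[Deployment]:
    # pydantic raises ValidationError naming the offending entry and field
    return [Deployment(**entry) for entry in raw_model_list]

```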
critical Domain unguarded

Authentication middleware assumes API keys in Authorization header follow 'Bearer sk-...' or 'sk-...' format but doesn't validate the actual key structure or length before database queries

If this fails: Malformed API keys cause expensive database scans or SQL injection vulnerabilities if the key contains special characters that aren't properly escaped

litellm/proxy/proxy_server.py:authentication
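
A hedged sketch of a cheap shape check before the database lookup; the prefix rule, length bounds, and helper name are illustrative:

```python
# Hypothetical sketch: reject obviously malformed keys before querying the
# database. Prefix/length rules and the helper name are illustrative.
import re

_API_KEY_RE = re.compile(r"^(sk-)?[A-Za-z0-9_-]{16,128}$")


def extract_api_key(authorization_header: str) -> str:
    token = authorization_header.removeprefix("Bearer ").strip()
    if not _API_KEY_RE.fullmatch(token):
        raise ValueError("malformed API key")  # mapped to HTTP 401 by the proxy
    return token

```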
critical Temporal unguarded

Cache system assumes cached ModelResponse objects remain valid and compatible with current response schema, but doesn't version cache entries or validate schema on retrieval

If this fails: When ModelResponse structure changes between versions, clients receive cached responses with missing or wrongly-typed fields, causing silent data corruption

litellm/caching/caching.py:DualCache
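
One hedged mitigation sketch: embed a schema version in the cache key so entries written under an older ModelResponse schema simply miss instead of deserializing into the wrong shape. Names are illustrative:

```python
# Hypothetical sketch: version cached responses so schema changes invalidate
# old entries rather than silently returning stale shapes.
import hashlib
import json

RESPONSE_SCHEMA_VERSION = "v2"  # bump whenever ModelResponse fields change


def cache_key(request: dict) -> str:
    canonical = json.dumps(request, sort_keys=True)
    digest = hashlib.sha256(canonical.encode()).hexdigest()
    return f"litellm:{RESPONSE_SCHEMA_VERSION}:{digest}"

```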
warning Resource unguarded

Model health tracking assumes in-memory success/failure counters won't overflow or consume unbounded memory, but doesn't implement cleanup for inactive models or cap the number of tracked deployments

If this fails: Long-running proxy servers with many model deployments experience memory leaks as health metrics accumulate indefinitely, eventually causing OOM crashes

litellm/router.py:health_tracking
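
A hedged sketch of bounding per-deployment health state with a fixed-size rolling window; the structures are illustrative, not the router's actual internals:

```python
# Hypothetical sketch: cap memory used by health tracking with a fixed-size
# rolling window per deployment instead of unbounded counters.
from collections import defaultdict, deque

WINDOW = 200  # keep only the last N outcomes per deployment

_outcomes: dict[str, deque] = defaultdict(lambda: deque(maxlen=WINDOW))


def record_outcome(deployment_id: str, success: bool) -> None:
    _outcomes[deployment_id].append(success)


def failure_rate(deployment_id: str) -> float:
    window = _outcomes[deployment_id]
    return 0.0 if not window else 1 - sum(window) / len(window)

```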
warning Ordering unguarded

Database operations assume UserAPIKeyAuth records are updated atomically for usage tracking, but concurrent requests can race to update the same user's budget/usage counters

If this fails: Multiple simultaneous requests from the same API key can cause budget enforcement to fail, allowing users to exceed spending limits until the next database sync

litellm/proxy/utils.py:PrismaClient
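
A hedged sketch of closing the race by pushing the budget update into a single atomic statement at the database instead of a read-modify-write in Python; the table, columns, and db handle are illustrative:

```python
# Hypothetical sketch: one atomic UPDATE enforces the budget even under
# concurrent requests. Table/column names and the db handle are illustrative.
async def record_spend(db, api_key_hash: str, cost_usd: float) -> bool:
    row = await db.fetchrow(
        """
        UPDATE api_keys
           SET spend = spend + $2
         WHERE key_hash = $1
           AND spend + $2 <= max_budget
     RETURNING spend
        """,
        api_key_hash,
        cost_usd,
    )
    return row is not None  # False means budget exceeded: reject the request

```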
warning Scale unguarded

Request transformation assumes message content and parameters fit within reasonable size limits, but doesn't validate total request payload size before sending to LLM providers

If this fails: Extremely large requests (multi-MB prompts) get sent to providers that reject them with cryptic errors, wasting API quota and causing confusing timeouts

litellm/main.py:completion
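
A hedged sketch of a pre-flight size check; the 1 MiB limit and error type are illustrative:

```python
# Hypothetical sketch: reject oversized payloads before spending provider
# quota. The limit is illustrative.
import json

MAX_REQUEST_BYTES = 1 * 1024 * 1024


def check_request_size(request: dict) -> None:
    size = len(json.dumps(request).encode("utf-8"))
    if size > MAX_REQUEST_BYTES:
        raise ValueError(
            f"request payload is {size} bytes, over the {MAX_REQUEST_BYTES}-byte limit"
        )

```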
warning Environment unguarded

Provider-specific classes assume environment variables and API keys are available when making requests, but don't validate credentials are valid or have sufficient permissions until the actual API call

If this fails: Invalid or expired provider API keys cause authentication failures that surface as generic HTTP 401/403 errors, making it hard to diagnose which specific provider credential is broken

litellm/llms/*/LLMProvider
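
A hedged sketch of a startup-time presence check for provider credentials; the env-var mapping shown is a small illustrative subset, not litellm's full provider table:

```python
# Hypothetical sketch: refuse to start (or at least warn) when a configured
# provider's credential env var is missing, instead of surfacing a generic
# 401 later. The env-var mapping is an illustrative subset.
import os

PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "azure": "AZURE_API_KEY",
}


def missing_provider_credentials(configured_providers: list) -> list:
    return [
        PROVIDER_ENV_VARS[p]
        for p in configured_providers
        if p in PROVIDER_ENV_VARS and not os.getenv(PROVIDER_ENV_VARS[p])
    ]

```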
warning Contract unguarded

Success callbacks in CustomLogger assume the ModelResponse object passed to them is complete and immutable, but there's no enforcement preventing callbacks from modifying the response object

If this fails: Poorly written logging callbacks can accidentally modify response data, causing later callbacks or the final client response to contain corrupted usage metrics or response content

litellm/integrations/custom_logger.py:CustomLogger
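
A hedged sketch of isolating callbacks from the live response by handing each one its own copy; the dispatch function is illustrative:

```python
# Hypothetical sketch: give each success callback a deep copy so a misbehaving
# logger cannot mutate the response the client ultimately receives.
import copy
import logging


def run_success_callbacks(callbacks, response):
    for callback in callbacks:
        try:
            callback(copy.deepcopy(response))
        except Exception:
            # a broken logger should never break the request path
            logging.exception("success callback %r failed", callback)

```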
warning Temporal unguarded

Failover logic assumes fallback models in the configuration are always available and healthy when primary models fail, but doesn't validate fallback model health before attempting the retry

If this fails: When primary and all fallback models are simultaneously unhealthy, requests fail with the last fallback's error message instead of a clear 'all models unavailable' error

litellm/router.py:fallback_models
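
A hedged sketch of filtering fallbacks by health before retrying and raising a clear "all models unavailable" error; the function and health-check hook are illustrative:

```python
# Hypothetical sketch: only retry against fallbacks the health tracker still
# considers usable, and raise a clear error when nothing is left.
def pick_fallback(fallback_models: list, is_healthy) -> str:
    healthy = [m for m in fallback_models if is_healthy(m)]
    if not healthy:
        raise RuntimeError(
            "all models unavailable: the primary and every configured "
            "fallback are currently unhealthy"
        )
    return healthy[0]

```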

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Response cache (cache)
DualCache stores LLM responses with TTL to avoid repeat API calls for identical requests
User database (database)
PostgreSQL/SQLite stores API keys, user permissions, team configurations, and usage metrics
Model health tracking (in-memory)
Router maintains success/failure rates and latency metrics for each model deployment
Configuration state (in-memory)
Loaded YAML config drives all proxy behavior including model routing and feature flags
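
A hedged sketch of the two-tier read path the response cache describes (in-memory first, then Redis, with backfill); the names and Redis client setup are illustrative, not DualCache's actual implementation:

```python
# Hypothetical sketch of a two-tier read path like the DualCache described
# above: check the in-process dict first, fall back to Redis, and backfill.
# Names and the Redis connection details are illustrative.
import redis

local_cache: dict = {}
redis_client = redis.Redis(host="localhost", port=6379)


def get_cached_response(key: str):
    if key in local_cache:                 # L1: in-memory
        return local_cache[key]
    value = redis_client.get(key)          # L2: Redis
    if value is not None:
        decoded = value.decode()
        local_cache[key] = decoded         # backfill L1 for the next hit
        return decoded
    return None

```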

Feedback Loops

Delays

Control Points

Technology Stack

FastAPI (framework)
HTTP server framework for the proxy gateway, handling OpenAI-compatible REST endpoints
Prisma (database)
Database ORM for user management, API key storage, and usage tracking in the proxy server
Redis (database)
L2 cache backend in DualCache system for storing LLM responses and configuration data
httpx (library)
Async HTTP client for making API calls to 100+ LLM providers with retry and timeout handling
Pydantic (library)
Data validation and serialization for request/response models and configuration schemas
Docker (infra)
Containerization for proxy server deployment with hardened security configurations
PostgreSQL (database)
Primary database backend for multi-tenant proxy deployments with full ACID compliance
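
Given this stack, a hedged sketch of what a client call into the proxy looks like: an OpenAI-format POST to /chat/completions with the virtual key in the Authorization header, sent here with httpx. The base URL, key, and model alias are placeholders for a specific deployment:

```python
# Hedged sketch: calling the proxy's OpenAI-compatible endpoint with httpx.
# Base URL, virtual key, and model alias are deployment-specific placeholders.
import httpx

resp = httpx.post(
    "http://localhost:4000/chat/completions",
    headers={"Authorization": "Bearer sk-your-virtual-key"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=60.0,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])

```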

Key Components

Package Structure

litellm-core (app)
Core SDK and proxy server implementation that handles LLM API unification, routing, and middleware.
enterprise (library)
Enterprise-specific hooks and guardrails for advanced security and compliance features.
litellm-proxy-extras (tooling)
Database utilities and deployment helpers for the proxy server infrastructure.

Frequently Asked Questions

What is litellm used for?

litellm routes 100+ LLM API calls through a unified gateway with cost tracking and access controls. berriai/litellm is a 7-component repository written in Python; data flows through 7 distinct pipeline stages, and the codebase contains 5,169 analyzed files.

How is litellm architected?

litellm is organized into 4 architecture layers: LLM Interface Layer, Proxy Gateway Layer, Router & Load Balancer, Enterprise Security. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through litellm?

Data moves through 7 stages: HTTP request ingestion → Authentication and authorization → Router model selection → Provider API transformation → LLM API call execution → Response normalization → Apply response middleware. Requests enter through the proxy server's HTTP endpoints, get authenticated against the database, then pass through the router, which selects an available LLM provider. The core completion function transforms the request to the provider's format, makes the API call, normalizes the response back to OpenAI format, and returns it through middleware that handles logging, caching, and cost tracking. Enterprise hooks can intercept at multiple points for content moderation and compliance.

What technologies does litellm use?

The core stack includes FastAPI (HTTP server framework for the proxy gateway, handling OpenAI-compatible REST endpoints), Prisma (Database ORM for user management, API key storage, and usage tracking in the proxy server), Redis (L2 cache backend in DualCache system for storing LLM responses and configuration data), httpx (Async HTTP client for making API calls to 100+ LLM providers with retry and timeout handling), Pydantic (Data validation and serialization for request/response models and configuration schemas), Docker (Containerization for proxy server deployment with hardened security configurations), and 1 more. A focused set of dependencies that keeps the build manageable.

What system dynamics does litellm have?

litellm exhibits 4 data pools (including the response cache and user database), 3 feedback loops, 5 control points, and 3 delays. The feedback loops handle self-correction and retry. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does litellm use?

4 design patterns detected: Provider Adapter Pattern, Plugin Hook System, Multi-tier Caching, Config-driven Architecture.

How does litellm compare to alternatives?

CodeSea has side-by-side architecture comparisons of litellm with vllm, covering tech stack differences, pipeline design, system behavior, and code patterns.

Analyzed on April 20, 2026 by CodeSea.