openai/openai-python
The official Python library for the OpenAI API
Provides typed Python bindings for sending HTTP requests to OpenAI's REST API
Under the hood, the system uses 2 feedback loops, 2 data pools, and 4 control points to manage its runtime behavior.
A 9-component library spanning 1,227 analyzed files. Data flows through 5 distinct pipeline stages.
How Data Flows Through the System
Requests flow from user code through typed resource methods into BaseClient HTTP handling, then responses return through parsing layers. For streaming, Server-Sent Events are decoded into chunks, accumulated with delta logic, and emitted as typed events. Authentication tokens are obtained from providers (API key, Azure AD, workload identity) and attached to all requests.
- Initialize client with authentication — OpenAI() constructor processes api_key, azure_endpoint, or workload_identity config, creates httpx.Client with base URL, headers, and timeout settings [Client configuration → OpenAI client instance]
- Create API request — Resource methods like client.chat.completions.create() validate parameters against Pydantic models, transform to FinalRequestOptions with URL path, HTTP method, headers [ChatCompletionMessage → FinalRequestOptions]
- Execute HTTP request — BaseClient._request() adds authentication headers, handles retries with exponential backoff, sends HTTP request via httpx, validates response status [FinalRequestOptions → Raw HTTP response]
- Parse response — Response JSON is deserialized into Pydantic models like ChatCompletion, with parse_chat_completion() extracting structured outputs when response_format is specified [Raw HTTP response → ParsedChatCompletion]
- Stream processing — For streaming requests, ChatCompletionStream processes SSE chunks, accumulates deltas using accumulate_delta(), emits ContentDeltaEvent and ContentDoneEvent [ChatCompletionChunk → ChatCompletionStreamEvent]
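The parameter-to-request transformation in the second step can be sketched in plain Python. This is an illustrative stand-in, not the SDK's actual code: `build_request_options` is a hypothetical helper, and the real library builds a `FinalRequestOptions` object with far richer validation via Pydantic.

```python
# Hypothetical sketch of step 2: shaping user parameters into a
# request-options structure. The real SDK uses FinalRequestOptions.

def build_request_options(model: str, messages: list[dict], stream: bool = False) -> dict:
    """Validate minimal parameters and shape them into request options."""
    if not model:
        raise ValueError("model is required")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    return {
        "method": "post",
        "url": "/chat/completions",
        "json": {"model": model, "messages": messages, "stream": stream},
    }

options = build_request_options("gpt-4o-mini", [{"role": "user", "content": "Hi"}])
```

Everything past this point (auth headers, retries, the actual HTTP send) happens inside BaseClient, so resource methods stay thin.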
Data Models
The data structures that flow between stages — the contracts that hold the system together.
src/openai/types/chat/chat_completion_message.py — Pydantic model with role: str, content: str|None, name: Optional[str], tool_calls: Optional[List[ChatCompletionMessageToolCall]]
Created from user input, sent in the API request payload, returned in the completion response, parsed back to a typed object
src/openai/types/chat/chat_completion_chunk.py — Pydantic model with id: str, object: Literal['chat.completion.chunk'], choices: List[Choice], created: int — represents a streaming delta
Generated by the OpenAI API in streaming mode, parsed from SSE events, accumulated into the final completion
src/openai/types/shared_params/function_definition.py — Pydantic model with name: str, description: Optional[str], parameters: object (a JSON schema dict)
Created from Pydantic models via pydantic_function_tool, serialized as a JSON schema, sent to the API
src/openai/types/chat/parsed_chat_completion.py — Generic[ResponseFormatT] extending ChatCompletion with a parsed: Optional[ResponseFormatT] field
Created by the parsing library from a raw ChatCompletion, adds the typed parsed field, returned to the user
src/openai/auth.py — Dict with client_id: str, identity_provider_id: str, token_provider: Optional[Callable] for cloud auth
Configured by the user for cloud environments, used by BaseClient to obtain short-lived tokens
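The ParsedChatCompletion idea — a completion that carries both the raw content and a typed object decoded from it — can be sketched with stdlib tools. The `ParsedCompletion` wrapper, `Answer` model, and `parse_content` helper below are all hypothetical stand-ins for the SDK's Pydantic-based types.

```python
# Illustrative sketch of the ParsedChatCompletion contract: keep the raw
# JSON text and attach a typed `parsed` field decoded from it.
import json
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

@dataclass
class Answer:
    """Hypothetical response schema a caller might request."""
    city: str
    population: int

@dataclass
class ParsedCompletion(Generic[T]):
    content: str                # raw JSON text returned by the model
    parsed: Optional[T] = None  # typed object decoded from content

def parse_content(content: str) -> ParsedCompletion[Answer]:
    data = json.loads(content)
    return ParsedCompletion(content=content, parsed=Answer(**data))

result = parse_content('{"city": "Paris", "population": 2100000}')
```

The generic parameter is what lets static type checkers see `result.parsed.city` as a `str` rather than an untyped dict lookup.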
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
JSON schemas generated from Pydantic models will be valid under OpenAI's 'strict' mode constraints, which requires specific property patterns and forbids certain schema constructs
If this fails: If a Pydantic model uses features incompatible with OpenAI's strict mode (like anyOf, oneOf, or flexible typing), the API request silently fails or returns unexpected structured outputs
src/openai/lib/_pydantic.py:to_strict_json_schema
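Two of strict mode's documented constraints — every object schema must set `additionalProperties` to `false`, and every property must appear in `required` — can be checked mechanically. The `validate_strict` walker below is a hypothetical sketch of such a check, not the SDK's `to_strict_json_schema` logic.

```python
# Illustrative validator for two strict-mode rules. A real check would
# also cover unsupported constructs like anyOf at the root.

def validate_strict(schema: dict, path: str = "$") -> list[str]:
    problems = []
    if schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        props = schema.get("properties", {})
        required = set(schema.get("required", []))
        for name, sub in props.items():
            if name not in required:
                problems.append(f"{path}.{name}: must be listed in required")
            problems += validate_strict(sub, f"{path}.{name}")
    return problems

# A schema that a naive generator might emit -- it violates both rules.
schema = {"type": "object", "properties": {"name": {"type": "string"}}}
issues = validate_strict(schema)
```

Running a check like this before sending the request surfaces the failure loudly instead of letting the API reject or silently mishandle the schema.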
Streaming delta chunks arrive in the correct order and contain compatible field types that can be merged without conflict
If this fails: If chunks arrive out of order or contain incompatible data types (e.g., string vs list), delta accumulation produces corrupted final objects with mixed or missing content
src/openai/lib/streaming/_deltas.py:accumulate_delta
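The merge behavior this assumption depends on can be sketched with a small recursive function: string fields concatenate, nested dicts merge, and everything else is overwritten. `merge_delta` is a hypothetical stand-in that mimics the shape of `accumulate_delta`, not the SDK's actual implementation.

```python
# Illustrative delta merge for streaming chunks. If a chunk carries an
# incompatible type (e.g., a list where a str was accumulating), the
# fallback branch silently overwrites -- the corruption described above.

def merge_delta(snapshot: dict, delta: dict) -> dict:
    for key, value in delta.items():
        if key not in snapshot or snapshot[key] is None:
            snapshot[key] = value
        elif isinstance(value, str) and isinstance(snapshot[key], str):
            snapshot[key] += value          # content arrives in fragments
        elif isinstance(value, dict) and isinstance(snapshot[key], dict):
            merge_delta(snapshot[key], value)
        else:
            snapshot[key] = value           # last write wins
    return snapshot

state: dict = {}
for chunk in [{"role": "assistant"}, {"content": "Hel"}, {"content": "lo"}]:
    merge_delta(state, chunk)
```

Because the merge rules are keyed on runtime types, out-of-order or mistyped chunks never raise — they just produce a wrong snapshot, which is why this assumption fails silently.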
When response_format parameter contains a Pydantic model, the API response will contain valid JSON matching that model's schema in the content field
If this fails: If OpenAI returns malformed JSON or JSON that doesn't match the expected schema, parsing fails with cryptic validation errors instead of graceful fallback to raw text
src/openai/lib/_parsing/_completions.py:parse_chat_completion
The hardcoded set of deployment endpoints remains synchronized with Azure OpenAI's actual supported deployment patterns
If this fails: If Azure adds new deployment endpoints or changes routing patterns, requests to new endpoints bypass deployment-based URL rewriting and fail with 404 errors
src/openai/lib/azure.py:_deployments_endpoints
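The failure mode becomes concrete with a sketch of deployment-based URL rewriting. The endpoint set and `rewrite_url` helper here are hypothetical illustrations; the real set lives in src/openai/lib/azure.py and the real rewrite builds full Azure URLs.

```python
# Illustrative deployment routing: only paths in a known set get the
# deployment segment injected. Unknown paths pass through unchanged,
# which is exactly how a stale set produces 404s on new endpoints.

DEPLOYMENT_ENDPOINTS = {"/chat/completions", "/completions", "/embeddings"}

def rewrite_url(path: str, deployment: str) -> str:
    if path in DEPLOYMENT_ENDPOINTS:
        return f"/deployments/{deployment}{path}"
    return path  # bypasses rewriting -- the failure mode described above
```

A hardcoded allowlist is cheap and predictable, but it trades away forward compatibility: every new Azure endpoint requires a library release.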
Azure AD token providers return valid, non-expired tokens synchronously without network timeouts or credential failures
If this fails: If token provider blocks, times out, or returns expired tokens, all API requests hang or fail with authentication errors that don't clearly indicate the token source
src/openai/lib/azure.py:AzureADTokenProvider
Server-Sent Events arrive as complete, parseable JSON chunks terminated by proper [DONE] markers
If this fails: If the stream contains partial JSON, malformed chunks, or missing [DONE] markers, the parser hangs indefinitely or crashes with JSON decode errors
src/openai/lib/streaming/chat/_completions.py:ChatCompletionStream
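The SSE framing this assumption rests on is simple to sketch: each event arrives as a `data:` line whose payload is JSON, and the stream ends with the `[DONE]` sentinel. The loop below is an illustrative decoder with a list standing in for the network stream; the SDK's real parser handles multi-line events and reconnection.

```python
# Illustrative SSE decoding loop: parse "data:" payloads as JSON chunks
# and stop at the [DONE] sentinel.
import json

raw_lines = [
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    'data: {"choices": [{"delta": {"content": "!"}}]}',
    "data: [DONE]",
]

chunks = []
for line in raw_lines:
    if not line.startswith("data: "):
        continue
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break  # a missing marker means this loop never ends cleanly
    chunks.append(json.loads(payload))

text = "".join(c["choices"][0]["delta"]["content"] for c in chunks)
```

If a payload is truncated mid-JSON, `json.loads` raises immediately; if `[DONE]` never arrives, a real network loop keeps waiting — the two failure modes named above.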
Streaming assistant responses contain manageable numbers of events that fit in memory during processing
If this fails: For very long assistant runs with thousands of tool calls or messages, the event handler accumulates unbounded state leading to memory exhaustion
src/openai/lib/streaming/_assistants.py:AssistantEventHandler
Pydantic models used as function tools contain only JSON-serializable field types and avoid circular references
If this fails: Models with non-serializable fields (datetime, custom objects) or circular references cause JSON schema generation to fail or produce invalid tool definitions that the API rejects
src/openai/lib/_tools.py:pydantic_function_tool
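The tool-definition shape that `pydantic_function_tool` produces can be approximated without Pydantic by mapping plain type hints to JSON-schema types. `function_tool`, `GetWeather`, and the `JSON_TYPES` table below are all hypothetical; the point is the structure of the output and where non-serializable fields blow up.

```python
# Illustrative tool-definition builder from class annotations. Fields
# whose types have no JSON-schema equivalent are rejected loudly here,
# rather than producing an invalid definition the API would reject.
from typing import get_type_hints

JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

class GetWeather:
    city: str
    days: int

def function_tool(cls) -> dict:
    hints = get_type_hints(cls)
    unsupported = [n for n, t in hints.items() if t not in JSON_TYPES]
    if unsupported:  # the non-serializable-field failure described above
        raise TypeError(f"cannot map fields to JSON schema: {unsupported}")
    return {
        "type": "function",
        "function": {
            "name": cls.__name__,
            "parameters": {
                "type": "object",
                "properties": {n: {"type": JSON_TYPES[t]} for n, t in hints.items()},
                "required": list(hints),
            },
        },
    }

tool = function_tool(GetWeather)
```

A field typed as, say, `datetime` would hit the `unsupported` branch here; in the real pipeline the equivalent failure surfaces later and less clearly, at schema generation or at the API.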
The generic ResponseFormatT parameter maintains type safety throughout the parsing pipeline without runtime type checking
If this fails: Type mismatches between expected and actual parsed responses pass static analysis but cause runtime failures when accessing typed fields that don't exist
src/openai/lib/_parsing/_completions.py:ResponseFormatT
JSON parsing via jiter can handle arbitrarily large streaming chunks without memory limits or parsing timeouts
If this fails: Extremely large API responses (e.g., massive function call arguments) cause JSON parsing to consume excessive memory or time, blocking the event loop
src/openai/lib/streaming/chat/_completions.py:from_json
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- HTTP Response Cache — the httpx client maintains a connection pool and optional response caching for efficiency
- Delta Accumulator — a temporary buffer that progressively builds complete objects from streaming delta chunks
Feedback Loops
- HTTP Retry Loop (retry, balancing) — Trigger: HTTP 429, 5xx errors or network failures. Action: BaseClient waits with exponential backoff (2^attempt seconds) then retries request. Exit: Success response, max retries reached (3), or non-retriable error.
- Streaming Accumulation (accumulation, reinforcing) — Trigger: New SSE chunk received. Action: accumulate_delta() merges new chunk data with existing snapshot using field-specific merge rules. Exit: Stream ends with [DONE] marker or connection closes.
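The HTTP Retry Loop can be sketched as a bounded loop with exponential backoff. `request_with_retries`, the `RETRIABLE` status set, and the injected `send_request` callable are hypothetical; delays are recorded rather than slept so the sketch runs instantly, and the real client also honors `Retry-After` headers and adds jitter.

```python
# Illustrative retry loop: retriable statuses trigger exponentially
# increasing backoff (2^attempt), bounded by a maximum retry count.

RETRIABLE = {429, 500, 502, 503, 504}
MAX_RETRIES = 3

def request_with_retries(send_request) -> tuple[int, list[float]]:
    delays = []
    for attempt in range(MAX_RETRIES + 1):
        status = send_request()
        if status not in RETRIABLE:
            return status, delays          # success or non-retriable error
        if attempt < MAX_RETRIES:
            delays.append(2 ** attempt)    # 1s, 2s, 4s between attempts
    return status, delays                  # retries exhausted

# Simulate two retriable failures followed by success.
responses = iter([429, 500, 200])
status, delays = request_with_retries(lambda: next(responses))
```

The exit conditions mirror the loop described above: a non-retriable status returns immediately, and the attempt counter bounds the loop even when the API never recovers.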
Delays
- Request Timeout (timeout, ~600 seconds default) — HTTP requests are cancelled and raise timeout exception if they exceed configured duration
- Retry Backoff (rate-limit, ~2^attempt seconds) — Exponentially increasing delays between retry attempts to avoid overwhelming the API
- Streaming Buffer (async-processing, variable with network conditions) — SSE chunks are buffered and parsed asynchronously, creating natural backpressure
Control Points
- API Key Authentication (env-var) — Controls: All API request authentication via OPENAI_API_KEY environment variable. Default: null
- Base URL Override (env-var) — Controls: API endpoint targeting via OPENAI_BASE_URL - enables custom deployments or proxies. Default: https://api.openai.com/v1
- Max Retries (runtime-toggle) — Controls: Number of retry attempts for failed requests before giving up. Default: 3
- Response Format (architecture-switch) — Controls: Enables structured output parsing when Pydantic model is provided to response_format parameter. Default: null
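The two env-var control points resolve the way most twelve-factor clients do: a required key and an overridable base URL with a default. `resolve_config` is a hypothetical helper that takes the environment as a dict for testability; the real client reads these values in its constructor.

```python
# Illustrative resolution of the env-var control points: OPENAI_API_KEY
# is mandatory, OPENAI_BASE_URL falls back to the public endpoint.

def resolve_config(env: dict) -> dict:
    api_key = env.get("OPENAI_API_KEY")
    if api_key is None:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    }

config = resolve_config({"OPENAI_API_KEY": "sk-example"})
```

Setting OPENAI_BASE_URL is what lets the same client target a proxy or a self-hosted compatible endpoint without code changes.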
Technology Stack
HTTP client library providing sync/async request capabilities with connection pooling, retries, and timeout handling
Data validation and serialization for all API request/response models, with JSON schema generation for structured outputs
Code generation framework that builds the SDK from OpenAPI specifications, creating typed resource classes and models
Test framework for unit and integration tests across sync/async clients, streaming, and parsing functionality
Fast JSON parsing for high-performance deserialization of API responses and streaming chunks
Key Components
- BaseClient (gateway) — Handles HTTP requests with authentication, retries, response parsing, and error handling for all API calls
src/openai/_base_client.py
- OpenAI (orchestrator) — Main synchronous client that exposes all API resource endpoints through typed properties like .chat.completions
src/openai/_client.py
- CompletionsResource (adapter) — Provides a typed create() method for chat completions with support for streaming and structured outputs
src/openai/resources/chat/completions/completions.py
- parse_chat_completion (transformer) — Converts raw ChatCompletion JSON into typed Pydantic models based on the response_format parameter
src/openai/lib/_parsing/_completions.py
- ChatCompletionStream (processor) — Accumulates streaming chat completion chunks into progressive snapshots with delta events
src/openai/lib/streaming/chat/_completions.py
- pydantic_function_tool (encoder) — Converts Pydantic models into OpenAI function tool definitions with JSON schema parameters
src/openai/lib/_tools.py
- to_strict_json_schema (encoder) — Generates strict JSON schemas from Pydantic models that conform to OpenAI's structured outputs format
src/openai/lib/_pydantic.py
- AzureOpenAI (adapter) — Specialized client for Azure OpenAI with deployment-based URL routing and Azure AD authentication
src/openai/lib/azure.py
- AssistantEventHandler (orchestrator) — Event-driven handler for streaming assistant responses with run steps, messages, and tool calls
src/openai/lib/streaming/_assistants.py
Frequently Asked Questions
What is openai-python used for?
openai/openai-python provides typed Python bindings for sending HTTP requests to OpenAI's REST API. It is a 9-component library written in Python; data flows through 5 distinct pipeline stages, and the codebase contains 1,227 files.
How is openai-python architected?
openai-python is organized into 5 architecture layers: HTTP Foundation, API Resources, Response Parsing, Streaming Abstractions, and 1 more. Data flows through 5 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through openai-python?
Data moves through 5 stages: Initialize client with authentication → Create API request → Execute HTTP request → Parse response → Stream processing. Requests flow from user code through typed resource methods into BaseClient HTTP handling, then responses return through parsing layers. For streaming, Server-Sent Events are decoded into chunks, accumulated with delta logic, and emitted as typed events. Authentication tokens are obtained from providers (API key, Azure AD, workload identity) and attached to all requests. This pipeline design reflects a complex multi-stage processing system.
What technologies does openai-python use?
The core stack includes httpx (HTTP client library providing sync/async request capabilities with connection pooling, retries, and timeout handling), Pydantic (Data validation and serialization for all API request/response models, with JSON schema generation for structured outputs), Stainless (Code generation framework that builds the SDK from OpenAPI specifications, creating typed resource classes and models), pytest (Test framework for unit and integration tests across sync/async clients, streaming, and parsing functionality), jiter (Fast JSON parsing for high-performance deserialization of API responses and streaming chunks). A focused set of dependencies that keeps the build manageable.
What system dynamics does openai-python have?
openai-python exhibits 2 data pools (HTTP Response Cache, Delta Accumulator), 2 feedback loops, 4 control points, 3 delays. The feedback loops handle retry and accumulation. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does openai-python use?
5 design patterns detected: Resource Composition, Sync/Async Parallel, Streaming Delta Accumulation, Pydantic Schema Generation, Legacy API Deprecation.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.