openai/openai-python
The official Python library for the OpenAI API
Provides typed Python bindings for sending HTTP requests to OpenAI's REST API
Under the hood, the system uses 2 feedback loops, 2 data pools, and 4 control points to manage its runtime behavior.
A 9-component library spanning 1,227 analyzed files. Data flows through 5 distinct pipeline stages.
How Data Flows Through the System
Requests flow from user code through typed resource methods into BaseClient HTTP handling, then responses return through parsing layers. For streaming, Server-Sent Events are decoded into chunks, accumulated with delta logic, and emitted as typed events. Authentication tokens are obtained from providers (API key, Azure AD, workload identity) and attached to all requests.
- Initialize client with authentication — OpenAI() constructor processes api_key, azure_endpoint, or workload_identity config, creates httpx.Client with base URL, headers, and timeout settings [Client configuration → OpenAI client instance]
- Create API request — Resource methods like client.chat.completions.create() validate parameters against Pydantic models, transform to FinalRequestOptions with URL path, HTTP method, headers [ChatCompletionMessage → FinalRequestOptions]
- Execute HTTP request — BaseClient._request() adds authentication headers, handles retries with exponential backoff, sends HTTP request via httpx, validates response status [FinalRequestOptions → Raw HTTP response]
- Parse response — Response JSON is deserialized into Pydantic models like ChatCompletion, with parse_chat_completion() extracting structured outputs when response_format is specified [Raw HTTP response → ParsedChatCompletion]
- Stream processing — For streaming requests, ChatCompletionStream processes SSE chunks, accumulates deltas using accumulate_delta(), emits ContentDeltaEvent and ContentDoneEvent [ChatCompletionChunk → ChatCompletionStreamEvent]
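The parameter-to-request transformation in the second step can be sketched in plain Python. This is an illustrative stand-in, not the SDK's actual code: `build_request_options` is a hypothetical helper, and the real library builds a `FinalRequestOptions` object with far richer validation via Pydantic.

```python
# Hypothetical sketch of step 2: shaping user parameters into a
# request-options structure. The real SDK uses FinalRequestOptions.

def build_request_options(model: str, messages: list[dict], stream: bool = False) -> dict:
    """Validate minimal parameters and shape them into request options."""
    if not model:
        raise ValueError("model is required")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    return {
        "method": "post",
        "url": "/chat/completions",
        "json": {"model": model, "messages": messages, "stream": stream},
    }

options = build_request_options("gpt-4o-mini", [{"role": "user", "content": "Hi"}])
```

Everything past this point (auth headers, retries, the actual HTTP send) happens inside BaseClient, so resource methods stay thin.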
Data Models
The data structures that flow between stages — the contracts that hold the system together.
src/openai/types/chat/chat_completion_message.py — Pydantic model with role: str, content: str|None, name: Optional[str], tool_calls: Optional[List[ChatCompletionMessageToolCall]]
Created from user input, sent in the API request payload, returned in the completion response, parsed back to a typed object
src/openai/types/chat/chat_completion_chunk.py — Pydantic model with id: str, object: Literal['chat.completion.chunk'], choices: List[Choice], created: int — represents a streaming delta
Generated by the OpenAI API in streaming mode, parsed from SSE events, accumulated into the final completion
src/openai/types/shared_params/function_definition.py — Pydantic model with name: str, description: Optional[str], parameters: object (a JSON schema dict)
Created from Pydantic models via pydantic_function_tool, serialized as a JSON schema, sent to the API
src/openai/types/chat/parsed_chat_completion.py — Generic[ResponseFormatT] extending ChatCompletion with a parsed: Optional[ResponseFormatT] field
Created by the parsing library from a raw ChatCompletion, adds the typed parsed field, returned to the user
src/openai/auth.py — Dict with client_id: str, identity_provider_id: str, token_provider: Optional[Callable] for cloud auth
Configured by the user for cloud environments, used by BaseClient to obtain short-lived tokens
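The ParsedChatCompletion idea — a completion that carries both the raw content and a typed object decoded from it — can be sketched with stdlib tools. The `ParsedCompletion` wrapper, `Answer` model, and `parse_content` helper below are all hypothetical stand-ins for the SDK's Pydantic-based types.

```python
# Illustrative sketch of the ParsedChatCompletion contract: keep the raw
# JSON text and attach a typed `parsed` field decoded from it.
import json
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

@dataclass
class Answer:
    """Hypothetical response schema a caller might request."""
    city: str
    population: int

@dataclass
class ParsedCompletion(Generic[T]):
    content: str                # raw JSON text returned by the model
    parsed: Optional[T] = None  # typed object decoded from content

def parse_content(content: str) -> ParsedCompletion[Answer]:
    data = json.loads(content)
    return ParsedCompletion(content=content, parsed=Answer(**data))

result = parse_content('{"city": "Paris", "population": 2100000}')
```

The generic parameter is what lets static type checkers see `result.parsed.city` as a `str` rather than an untyped dict lookup.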
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
JSON schemas generated from Pydantic models will be valid under OpenAI's 'strict' mode constraints, which requires specific property patterns and forbids certain schema constructs
If this fails: If a Pydantic model uses features incompatible with OpenAI's strict mode (like anyOf, oneOf, or flexible typing), the API request silently fails or returns unexpected structured outputs
src/openai/lib/_pydantic.py:to_strict_json_schema
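Two of strict mode's documented constraints — every object schema must set `additionalProperties` to `false`, and every property must appear in `required` — can be checked mechanically. The `validate_strict` walker below is a hypothetical sketch of such a check, not the SDK's `to_strict_json_schema` logic.

```python
# Illustrative validator for two strict-mode rules. A real check would
# also cover unsupported constructs like anyOf at the root.

def validate_strict(schema: dict, path: str = "$") -> list[str]:
    problems = []
    if schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        props = schema.get("properties", {})
        required = set(schema.get("required", []))
        for name, sub in props.items():
            if name not in required:
                problems.append(f"{path}.{name}: must be listed in required")
            problems += validate_strict(sub, f"{path}.{name}")
    return problems

# A schema that a naive generator might emit -- it violates both rules.
schema = {"type": "object", "properties": {"name": {"type": "string"}}}
issues = validate_strict(schema)
```

Running a check like this before sending the request surfaces the failure loudly instead of letting the API reject or silently mishandle the schema.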
Streaming delta chunks arrive in the correct order and contain compatible field types that can be merged without conflict
If this fails: If chunks arrive out of order or contain incompatible data types (e.g., string vs list), delta accumulation produces corrupted final objects with mixed or missing content
src/openai/lib/streaming/_deltas.py:accumulate_delta
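The merge behavior this assumption depends on can be sketched with a small recursive function: string fields concatenate, nested dicts merge, and everything else is overwritten. `merge_delta` is a hypothetical stand-in that mimics the shape of `accumulate_delta`, not the SDK's actual implementation.

```python
# Illustrative delta merge for streaming chunks. If a chunk carries an
# incompatible type (e.g., a list where a str was accumulating), the
# fallback branch silently overwrites -- the corruption described above.

def merge_delta(snapshot: dict, delta: dict) -> dict:
    for key, value in delta.items():
        if key not in snapshot or snapshot[key] is None:
            snapshot[key] = value
        elif isinstance(value, str) and isinstance(snapshot[key], str):
            snapshot[key] += value          # content arrives in fragments
        elif isinstance(value, dict) and isinstance(snapshot[key], dict):
            merge_delta(snapshot[key], value)
        else:
            snapshot[key] = value           # last write wins
    return snapshot

state: dict = {}
for chunk in [{"role": "assistant"}, {"content": "Hel"}, {"content": "lo"}]:
    merge_delta(state, chunk)
```

Because the merge rules are keyed on runtime types, out-of-order or mistyped chunks never raise — they just produce a wrong snapshot, which is why this assumption fails silently.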
When response_format parameter contains a Pydantic model, the API response will contain valid JSON matching that model's schema in the content field
If this fails: If OpenAI returns malformed JSON or JSON that doesn't match the expected schema, parsing fails with cryptic validation errors instead of graceful fallback to raw text
src/openai/lib/_parsing/_completions.py:parse_chat_completion
The hardcoded set of deployment endpoints remains synchronized with Azure OpenAI's actual supported deployment patterns
If this fails: If Azure adds new deployment endpoints or changes routing patterns, requests to new endpoints bypass deployment-based URL rewriting and fail with 404 errors
src/openai/lib/azure.py:_deployments_endpoints
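The failure mode becomes concrete with a sketch of deployment-based URL rewriting. The endpoint set and `rewrite_url` helper here are hypothetical illustrations; the real set lives in src/openai/lib/azure.py and the real rewrite builds full Azure URLs.

```python
# Illustrative deployment routing: only paths in a known set get the
# deployment segment injected. Unknown paths pass through unchanged,
# which is exactly how a stale set produces 404s on new endpoints.

DEPLOYMENT_ENDPOINTS = {"/chat/completions", "/completions", "/embeddings"}

def rewrite_url(path: str, deployment: str) -> str:
    if path in DEPLOYMENT_ENDPOINTS:
        return f"/deployments/{deployment}{path}"
    return path  # bypasses rewriting -- the failure mode described above
```

A hardcoded allowlist is cheap and predictable, but it trades away forward compatibility: every new Azure endpoint requires a library release.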
Azure AD token providers return valid, non-expired tokens synchronously without network timeouts or credential failures
If this fails: If token provider blocks, times out, or returns expired tokens, all API requests hang or fail with authentication errors that don't clearly indicate the token source
src/openai/lib/azure.py:AzureADTokenProvider
Server-Sent Events arrive as complete, parseable JSON chunks terminated by proper [DONE] markers
If this fails: If the stream contains partial JSON, malformed chunks, or missing [DONE] markers, the parser hangs indefinitely or crashes with JSON decode errors
src/openai/lib/streaming/chat/_completions.py:ChatCompletionStream
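The SSE framing this assumption rests on is simple to sketch: each event arrives as a `data:` line whose payload is JSON, and the stream ends with the `[DONE]` sentinel. The loop below is an illustrative decoder with a list standing in for the network stream; the SDK's real parser handles multi-line events and reconnection.

```python
# Illustrative SSE decoding loop: parse "data:" payloads as JSON chunks
# and stop at the [DONE] sentinel.
import json

raw_lines = [
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    'data: {"choices": [{"delta": {"content": "!"}}]}',
    "data: [DONE]",
]

chunks = []
for line in raw_lines:
    if not line.startswith("data: "):
        continue
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break  # a missing marker means this loop never ends cleanly
    chunks.append(json.loads(payload))

text = "".join(c["choices"][0]["delta"]["content"] for c in chunks)
```

If a payload is truncated mid-JSON, `json.loads` raises immediately; if `[DONE]` never arrives, a real network loop keeps waiting — the two failure modes named above.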
Streaming assistant responses contain manageable numbers of events that fit in memory during processing
If this fails: For very long assistant runs with thousands of tool calls or messages, the event handler accumulates unbounded state leading to memory exhaustion
src/openai/lib/streaming/_assistants.py:AssistantEventHandler
Pydantic models used as function tools contain only JSON-serializable field types and avoid circular references
If this fails: Models with non-serializable fields (datetime, custom objects) or circular references cause JSON schema generation to fail or produce invalid tool definitions that the API rejects
src/openai/lib/_tools.py:pydantic_function_tool
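The tool-definition shape that `pydantic_function_tool` produces can be approximated without Pydantic by mapping plain type hints to JSON-schema types. `function_tool`, `GetWeather`, and the `JSON_TYPES` table below are all hypothetical; the point is the structure of the output and where non-serializable fields blow up.

```python
# Illustrative tool-definition builder from class annotations. Fields
# whose types have no JSON-schema equivalent are rejected loudly here,
# rather than producing an invalid definition the API would reject.
from typing import get_type_hints

JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

class GetWeather:
    city: str
    days: int

def function_tool(cls) -> dict:
    hints = get_type_hints(cls)
    unsupported = [n for n, t in hints.items() if t not in JSON_TYPES]
    if unsupported:  # the non-serializable-field failure described above
        raise TypeError(f"cannot map fields to JSON schema: {unsupported}")
    return {
        "type": "function",
        "function": {
            "name": cls.__name__,
            "parameters": {
                "type": "object",
                "properties": {n: {"type": JSON_TYPES[t]} for n, t in hints.items()},
                "required": list(hints),
            },
        },
    }

tool = function_tool(GetWeather)
```

A field typed as, say, `datetime` would hit the `unsupported` branch here; in the real pipeline the equivalent failure surfaces later and less clearly, at schema generation or at the API.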
The generic ResponseFormatT parameter maintains type safety throughout the parsing pipeline without runtime type checking
If this fails: Type mismatches between expected and actual parsed responses pass static analysis but cause runtime failures when accessing typed fields that don't exist
src/openai/lib/_parsing/_completions.py:ResponseFormatT
JSON parsing via jiter can handle arbitrarily large streaming chunks without memory limits or parsing timeouts
If this fails: Extremely large API responses (e.g., massive function call arguments) cause JSON parsing to consume excessive memory or time, blocking the event loop
src/openai/lib/streaming/chat/_completions.py:from_json
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- HTTP Response Cache — the httpx client maintains a connection pool and optional response caching for efficiency
- Delta Accumulator — a temporary buffer that progressively builds complete objects from streaming delta chunks
Feedback Loops
- HTTP Retry Loop (retry, balancing) — Trigger: HTTP 429, 5xx errors or network failures. Action: BaseClient waits with exponential backoff (2^attempt seconds) then retries request. Exit: Success response, max retries reached (3), or non-retriable error.
- Streaming Accumulation (accumulation, reinforcing) — Trigger: New SSE chunk received. Action: accumulate_delta() merges new chunk data with existing snapshot using field-specific merge rules. Exit: Stream ends with [DONE] marker or connection closes.
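The HTTP Retry Loop can be sketched as a bounded loop with exponential backoff. `request_with_retries`, the `RETRIABLE` status set, and the injected `send_request` callable are hypothetical; delays are recorded rather than slept so the sketch runs instantly, and the real client also honors `Retry-After` headers and adds jitter.

```python
# Illustrative retry loop: retriable statuses trigger exponentially
# increasing backoff (2^attempt), bounded by a maximum retry count.

RETRIABLE = {429, 500, 502, 503, 504}
MAX_RETRIES = 3

def request_with_retries(send_request) -> tuple[int, list[float]]:
    delays = []
    for attempt in range(MAX_RETRIES + 1):
        status = send_request()
        if status not in RETRIABLE:
            return status, delays          # success or non-retriable error
        if attempt < MAX_RETRIES:
            delays.append(2 ** attempt)    # 1s, 2s, 4s between attempts
    return status, delays                  # retries exhausted

# Simulate two retriable failures followed by success.
responses = iter([429, 500, 200])
status, delays = request_with_retries(lambda: next(responses))
```

The exit conditions mirror the loop described above: a non-retriable status returns immediately, and the attempt counter bounds the loop even when the API never recovers.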
Delays
- Request Timeout (timeout, ~600 seconds default) — HTTP requests are cancelled and raise timeout exception if they exceed configured duration
- Retry Backoff (rate-limit, ~2^attempt seconds) — Exponentially increasing delays between retry attempts to avoid overwhelming the API
- Streaming Buffer (async-processing, variable with network conditions) — SSE chunks are buffered and parsed asynchronously, creating natural backpressure
Control Points
- API Key Authentication (env-var) — Controls: All API request authentication via OPENAI_API_KEY environment variable. Default: null
- Base URL Override (env-var) — Controls: API endpoint targeting via OPENAI_BASE_URL - enables custom deployments or proxies. Default: https://api.openai.com/v1
- Max Retries (runtime-toggle) — Controls: Number of retry attempts for failed requests before giving up. Default: 3
- Response Format (architecture-switch) — Controls: Enables structured output parsing when Pydantic model is provided to response_format parameter. Default: null
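The two env-var control points resolve the way most twelve-factor clients do: a required key and an overridable base URL with a default. `resolve_config` is a hypothetical helper that takes the environment as a dict for testability; the real client reads these values in its constructor.

```python
# Illustrative resolution of the env-var control points: OPENAI_API_KEY
# is mandatory, OPENAI_BASE_URL falls back to the public endpoint.

def resolve_config(env: dict) -> dict:
    api_key = env.get("OPENAI_API_KEY")
    if api_key is None:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    }

config = resolve_config({"OPENAI_API_KEY": "sk-example"})
```

Setting OPENAI_BASE_URL is what lets the same client target a proxy or a self-hosted compatible endpoint without code changes.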
Technology Stack
HTTP client library providing sync/async request capabilities with connection pooling, retries, and timeout handling
Data validation and serialization for all API request/response models, with JSON schema generation for structured outputs
Code generation framework that builds the SDK from OpenAPI specifications, creating typed resource classes and models
Test framework for unit and integration tests across sync/async clients, streaming, and parsing functionality
Fast JSON parsing for high-performance deserialization of API responses and streaming chunks
Key Components
- BaseClient (gateway) — Handles HTTP requests with authentication, retries, response parsing, and error handling for all API calls
src/openai/_base_client.py
- OpenAI (orchestrator) — Main synchronous client that exposes all API resource endpoints through typed properties like .chat.completions
src/openai/_client.py
- CompletionsResource (adapter) — Provides a typed create() method for chat completions with support for streaming and structured outputs
src/openai/resources/chat/completions/completions.py
- parse_chat_completion (transformer) — Converts raw ChatCompletion JSON into typed Pydantic models based on the response_format parameter
src/openai/lib/_parsing/_completions.py
- ChatCompletionStream (processor) — Accumulates streaming chat completion chunks into progressive snapshots with delta events
src/openai/lib/streaming/chat/_completions.py
- pydantic_function_tool (encoder) — Converts Pydantic models into OpenAI function tool definitions with JSON schema parameters
src/openai/lib/_tools.py
- to_strict_json_schema (encoder) — Generates strict JSON schemas from Pydantic models that conform to OpenAI's structured outputs format
src/openai/lib/_pydantic.py
- AzureOpenAI (adapter) — Specialized client for Azure OpenAI with deployment-based URL routing and Azure AD authentication
src/openai/lib/azure.py
- AssistantEventHandler (orchestrator) — Event-driven handler for streaming assistant responses with run steps, messages, and tool calls
src/openai/lib/streaming/_assistants.py
Frequently Asked Questions
What is openai-python used for?
openai/openai-python provides typed Python bindings for sending HTTP requests to OpenAI's REST API. It is a 9-component library written in Python; data flows through 5 distinct pipeline stages, and the codebase contains 1,227 files.
How is openai-python architected?
openai-python is organized into 5 architecture layers: HTTP Foundation, API Resources, Response Parsing, Streaming Abstractions, and 1 more. Data flows through 5 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through openai-python?
Data moves through 5 stages: Initialize client with authentication → Create API request → Execute HTTP request → Parse response → Stream processing. Requests flow from user code through typed resource methods into BaseClient HTTP handling, then responses return through parsing layers. For streaming, Server-Sent Events are decoded into chunks, accumulated with delta logic, and emitted as typed events. Authentication tokens are obtained from providers (API key, Azure AD, workload identity) and attached to all requests. This pipeline design reflects a complex multi-stage processing system.
What technologies does openai-python use?
The core stack includes httpx (HTTP client library providing sync/async request capabilities with connection pooling, retries, and timeout handling), Pydantic (Data validation and serialization for all API request/response models, with JSON schema generation for structured outputs), Stainless (Code generation framework that builds the SDK from OpenAPI specifications, creating typed resource classes and models), pytest (Test framework for unit and integration tests across sync/async clients, streaming, and parsing functionality), jiter (Fast JSON parsing for high-performance deserialization of API responses and streaming chunks). A focused set of dependencies that keeps the build manageable.
What system dynamics does openai-python have?
openai-python exhibits 2 data pools (HTTP Response Cache, Delta Accumulator), 2 feedback loops, 4 control points, 3 delays. The feedback loops handle retry and accumulation. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does openai-python use?
5 design patterns detected: Resource Composition, Sync/Async Parallel, Streaming Delta Accumulation, Pydantic Schema Generation, Legacy API Deprecation.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.