openai/openai-python

The official Python library for the OpenAI API

30,543 stars · Python · 9 components

Provides typed Python bindings for sending HTTP requests to OpenAI's REST API

Requests flow from user code through typed resource methods into BaseClient HTTP handling, then responses return through parsing layers. For streaming, Server-Sent Events are decoded into chunks, accumulated with delta logic, and emitted as typed events. Authentication tokens are obtained from providers (API key, Azure AD, workload identity) and attached to all requests.

Under the hood, the system uses 2 feedback loops, 2 data pools, and 4 control points to manage its runtime behavior.

A 9-component library. 1227 files analyzed. Data flows through 5 distinct pipeline stages.

How Data Flows Through the System


  1. Initialize client with authentication — OpenAI() constructor processes api_key, azure_endpoint, or workload_identity config, creates httpx.Client with base URL, headers, and timeout settings [Client configuration → OpenAI client instance]
  2. Create API request — Resource methods like client.chat.completions.create() validate parameters against Pydantic models, transform to FinalRequestOptions with URL path, HTTP method, headers [ChatCompletionMessage → FinalRequestOptions]
  3. Execute HTTP request — BaseClient._request() adds authentication headers, handles retries with exponential backoff, sends HTTP request via httpx, validates response status [FinalRequestOptions → Raw HTTP response]
  4. Parse response — Response JSON is deserialized into Pydantic models like ChatCompletion, with parse_chat_completion() extracting structured outputs when response_format is specified [Raw HTTP response → ParsedChatCompletion]
  5. Stream processing — For streaming requests, ChatCompletionStream processes SSE chunks, accumulates deltas using accumulate_delta(), emits ContentDeltaEvent and ContentDoneEvent [ChatCompletionChunk → ChatCompletionStreamEvent]
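
The retry behavior in step 3 can be sketched in isolation. Everything below is illustrative (`request_with_retries`, `RETRYABLE_STATUSES`, and the defaults are not the SDK's internals — the real client exposes a configurable `max_retries`), but it shows the exponential-backoff-with-jitter pattern applied to transient failures:

```python
import random

# Illustrative constants; the real client's retry count and delays are
# configurable rather than fixed like this.
MAX_RETRIES = 2
RETRYABLE_STATUSES = {408, 409, 429, 500, 502, 503, 504}

def backoff_delay(attempt: int, initial: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with jitter: min(cap, initial * 2^attempt), scaled."""
    base = min(cap, initial * (2 ** attempt))
    return base * (0.5 + random.random() / 2)  # jitter in [0.5 * base, base]

def request_with_retries(send, sleep=lambda s: None):
    """Call send() until it returns a non-retryable status or retries run out."""
    for attempt in range(MAX_RETRIES + 1):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == MAX_RETRIES:
            return status, body
        sleep(backoff_delay(attempt))
```

Here `send` stands in for the actual httpx call; injecting it (and `sleep`) keeps the control flow testable without a network.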

Data Models

The data structures that flow between stages — the contracts that hold the system together.

ChatCompletionMessage src/openai/types/chat/chat_completion_message.py
Pydantic model with role: str, content: Optional[str], name: Optional[str], tool_calls: Optional[List[ChatCompletionMessageToolCall]]
Created from user input, sent in API request payload, returned in completion response, parsed back to typed object
ChatCompletionChunk src/openai/types/chat/chat_completion_chunk.py
Pydantic model with id: str, object: Literal['chat.completion.chunk'], choices: List[Choice], created: int; represents a single streaming delta
Generated by OpenAI API in streaming mode, parsed from SSE events, accumulated into final completion
FunctionDefinition src/openai/types/shared_params/function_definition.py
Pydantic model with name: str, description: Optional[str], parameters: object (JSON schema dict)
Created from Pydantic models via pydantic_function_tool, serialized as JSON schema, sent to API
ParsedChatCompletion src/openai/types/chat/parsed_chat_completion.py
Generic[ResponseFormatT] extending ChatCompletion with parsed: Optional[ResponseFormatT] field
Created by parsing library from raw ChatCompletion, adds typed parsed field, returned to user
WorkloadIdentity src/openai/auth.py
Dict with client_id: str, identity_provider_id: str, token_provider: Optional[Callable] for cloud auth
Configured by user for cloud environments, used by BaseClient to obtain short-lived tokens
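
To make the shapes above concrete, here is a dependency-free sketch of the ChatCompletionMessage contract using stdlib dataclasses in place of Pydantic. The real models add validation, serialization, and extra fields; `ToolCall` here is a stand-in for ChatCompletionMessageToolCall:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ToolCall:
    """Stand-in for ChatCompletionMessageToolCall (simplified)."""
    id: str
    name: str
    arguments: str  # JSON-encoded argument string, as in the API payload

@dataclass
class Message:
    """Mirrors ChatCompletionMessage's documented fields."""
    role: str
    content: Optional[str] = None
    name: Optional[str] = None
    tool_calls: Optional[List[ToolCall]] = None

msg = Message(role="assistant", content="Hello!")
```

The same shape flows in both directions: built from user input on the way out, and parsed back from the response JSON on the way in.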

Hidden Assumptions

Things this code relies on but never validates — the kind of assumptions that cause silent failures when the system changes.

critical Domain weakly guarded

JSON schemas generated from Pydantic models will be valid under OpenAI's 'strict' mode constraints, which requires specific property patterns and forbids certain schema constructs

If this fails: If a Pydantic model uses features incompatible with OpenAI's strict mode (like anyOf, oneOf, or flexible typing), the API request silently fails or returns unexpected structured outputs

src/openai/lib/_pydantic.py:to_strict_json_schema
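
A rough feel for the constraint: strict mode requires, among other things, that every object schema sets additionalProperties to false and lists all of its properties as required. The checker below is hypothetical (the SDK's to_strict_json_schema rewrites schemas rather than rejecting them), but it illustrates the kind of construct that trips the assumption:

```python
def strict_mode_violations(schema: dict, path: str = "$") -> list:
    """Walk a JSON schema and flag two patterns strict mode rejects:
    objects without additionalProperties: false, and optional properties.
    A flat sketch; it only recurses into object properties."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = set(props) - set(schema.get("required", []))
        if missing:
            problems.append(f"{path}: properties not required: {sorted(missing)}")
        for name, sub in props.items():
            problems.extend(strict_mode_violations(sub, f"{path}.{name}"))
    return problems
```

Running this over a schema generated from a permissive Pydantic model would surface exactly the silent mismatches the assumption describes.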
critical Shape unguarded

Streaming delta chunks arrive in the correct order and contain compatible field types that can be merged without conflict

If this fails: If chunks arrive out of order or contain incompatible data types (e.g., string vs list), delta accumulation produces corrupted final objects with mixed or missing content

src/openai/lib/streaming/_deltas.py:accumulate_delta
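
The merge logic this assumption rests on looks roughly like the sketch below (simplified from the idea behind accumulate_delta): strings concatenate, dicts merge recursively, and list items merge by the index field each chat-completion chunk carries. A type mismatch between chunks — say a string where the accumulator holds a dict — is exactly the unguarded case:

```python
def accumulate_delta(acc: dict, delta: dict) -> dict:
    """Merge one streaming delta into the accumulated object (a sketch)."""
    for key, value in delta.items():
        if key not in acc or acc[key] is None:
            acc[key] = value                      # first sighting of the field
        elif isinstance(value, str):
            acc[key] += value                     # text deltas concatenate
        elif isinstance(value, dict):
            accumulate_delta(acc[key], value)     # nested objects merge
        elif isinstance(value, list):
            for item in value:                    # list items merge by index
                idx = item.get("index", 0)
                while len(acc[key]) <= idx:
                    acc[key].append({})
                accumulate_delta(acc[key][idx], item)
    return acc
```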
critical Contract weakly guarded

When response_format parameter contains a Pydantic model, the API response will contain valid JSON matching that model's schema in the content field

If this fails: If OpenAI returns malformed JSON or JSON that doesn't match the expected schema, parsing fails with cryptic validation errors instead of graceful fallback to raw text

src/openai/lib/_parsing/_completions.py:parse_chat_completion
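
One way to see the failure mode is a parse-with-fallback sketch. `Weather` and `parse_content` are hypothetical; the actual parse_chat_completion raises on schema mismatch rather than falling back, which is precisely why the errors read as cryptic validation failures:

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Weather:
    """Hypothetical response_format model."""
    city: str
    temp_c: float

def parse_content(content: str):
    """Try to parse structured output; fall back to raw text on failure.
    Returns (parsed_or_None, raw_content)."""
    try:
        data = json.loads(content)
        return Weather(**data), content
    except (json.JSONDecodeError, TypeError):
        # Malformed JSON, or JSON whose keys don't match the model
        return None, content
```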
warning Temporal unguarded

The hardcoded set of deployment endpoints remains synchronized with Azure OpenAI's actual supported deployment patterns

If this fails: If Azure adds new deployment endpoints or changes routing patterns, requests to new endpoints bypass deployment-based URL rewriting and fail with 404 errors

src/openai/lib/azure.py:_deployments_endpoints
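
The routing this assumption describes amounts to a set-membership check before URL construction. The endpoint set and `azure_url` helper below are illustrative stand-ins for the private table in src/openai/lib/azure.py, showing why an endpoint missing from the set silently skips the rewrite:

```python
# Illustrative subset; the real set is maintained inside the SDK.
_DEPLOYMENTS_ENDPOINTS = {"/chat/completions", "/completions", "/embeddings"}

def azure_url(path: str, deployment: str) -> str:
    """Prefix known endpoints with the deployment-scoped path segment;
    anything not in the set passes through unmodified."""
    if path in _DEPLOYMENTS_ENDPOINTS:
        return f"/deployments/{deployment}{path}"
    return path
```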
critical Environment unguarded

Azure AD token providers return valid, non-expired tokens synchronously without network timeouts or credential failures

If this fails: If token provider blocks, times out, or returns expired tokens, all API requests hang or fail with authentication errors that don't clearly indicate the token source

src/openai/lib/azure.py:AzureADTokenProvider
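
A defensive wrapper shows what guarding this assumption could look like. `CachedTokenProvider` is hypothetical — the SDK calls the configured token provider directly per request — but caching with explicit expiry makes stale-token failures visible instead of silent:

```python
import time

class CachedTokenProvider:
    """Wrap a token-returning callable with expiry tracking (a sketch)."""

    def __init__(self, fetch, ttl: float = 300.0, clock=time.monotonic):
        self._fetch = fetch          # callable returning a bearer token string
        self._ttl = ttl              # assumed token lifetime in seconds
        self._clock = clock          # injectable for testing
        self._token = None
        self._expires_at = 0.0

    def token(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch()          # refresh before expiry
            self._expires_at = now + self._ttl
        return self._token
```

Injecting `clock` keeps the expiry logic testable without waiting on wall time; a production version would also bound `fetch` with a timeout.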
critical Ordering weakly guarded

Server-Sent Events arrive as complete, parseable JSON chunks terminated by proper [DONE] markers

If this fails: If the stream contains partial JSON, malformed chunks, or missing [DONE] markers, the parser hangs indefinitely or crashes with JSON decode errors

src/openai/lib/streaming/chat/_completions.py:ChatCompletionStream
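
The contract can be illustrated with a minimal SSE decoder. `iter_sse_data` is a sketch, not the SDK's decoder (which also buffers partial lines and handles multi-line data and event fields), but it shows the [DONE] sentinel acting as the only stop signal:

```python
import json

def iter_sse_data(lines):
    """Decode Server-Sent Event lines into parsed JSON chunks.
    Yields each data: payload and stops at the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue                          # skip comments and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return                            # without this, iteration never ends
        yield json.loads(payload)             # raises on partial/malformed JSON
```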
warning Scale unguarded

Streaming assistant responses contain manageable numbers of events that fit in memory during processing

If this fails: For very long assistant runs with thousands of tool calls or messages, the event handler accumulates unbounded state leading to memory exhaustion

src/openai/lib/streaming/_assistants.py:AssistantEventHandler
warning Domain unguarded

Pydantic models used as function tools contain only JSON-serializable field types and avoid circular references

If this fails: Models with non-serializable fields (datetime, custom objects) or circular references cause JSON schema generation to fail or produce invalid tool definitions that the API rejects

src/openai/lib/_tools.py:pydantic_function_tool
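
The JSON-serializability requirement can be demonstrated with a flat-dataclass stand-in for pydantic_function_tool. `GetWeather` and `function_tool` are hypothetical; Pydantic handles nested models and far richer types, but the failure mode — a field type with no JSON-schema equivalent — is the same:

```python
from dataclasses import dataclass, fields

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

@dataclass
class GetWeather:
    """Hypothetical tool-parameter model."""
    city: str
    days: int

def function_tool(model) -> dict:
    """Build a function-tool definition from a flat dataclass (a sketch)."""
    props = {}
    for f in fields(model):
        json_type = _JSON_TYPES.get(f.type)
        if json_type is None:
            # Non-serializable field types surface here instead of at the API
            raise TypeError(f"field {f.name!r} has no JSON-schema type")
        props[f.name] = {"type": json_type}
    return {
        "type": "function",
        "function": {
            "name": model.__name__,
            "parameters": {
                "type": "object",
                "properties": props,
                "required": [f.name for f in fields(model)],
                "additionalProperties": False,
            },
        },
    }
```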
warning Contract weakly guarded

The generic ResponseFormatT parameter maintains type safety throughout the parsing pipeline without runtime type checking

If this fails: Type mismatches between expected and actual parsed responses pass static analysis but cause runtime failures when accessing typed fields that don't exist

src/openai/lib/_parsing/_completions.py:ResponseFormatT
info Resource unguarded

JSON parsing via jiter can handle arbitrarily large streaming chunks without memory limits or parsing timeouts

If this fails: Extremely large API responses (e.g., massive function call arguments) cause JSON parsing to consume excessive memory or time, blocking the event loop

src/openai/lib/streaming/chat/_completions.py:from_json

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

HTTP Connection Pool (cache)
httpx client maintains a pool of keep-alive connections reused across requests; response caching is not built in and requires a custom transport
Delta Accumulator (buffer)
Temporary buffer that progressively builds complete objects from streaming delta chunks

Technology Stack

httpx (library)
HTTP client library providing sync/async request capabilities with connection pooling, retries, and timeout handling
Pydantic (library)
Data validation and serialization for all API request/response models, with JSON schema generation for structured outputs
Stainless (build)
Code generation framework that builds the SDK from OpenAPI specifications, creating typed resource classes and models
pytest (testing)
Test framework for unit and integration tests across sync/async clients, streaming, and parsing functionality
jiter (library)
Fast JSON parsing for high-performance deserialization of API responses and streaming chunks

Frequently Asked Questions

What is openai-python used for?

openai/openai-python provides typed Python bindings for sending HTTP requests to OpenAI's REST API. It is a 9-component library written in Python; data flows through 5 distinct pipeline stages, and the codebase contains 1227 files.

How is openai-python architected?

openai-python is organized into 5 architecture layers: HTTP Foundation, API Resources, Response Parsing, Streaming Abstractions, and 1 more. Data flows through 5 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through openai-python?

Data moves through 5 stages: Initialize client with authentication → Create API request → Execute HTTP request → Parse response → Stream processing. Requests pass from user code through typed resource methods into BaseClient, and responses return through the parsing layers; streaming responses are decoded from Server-Sent Events, accumulated with delta logic, and emitted as typed events, with authentication tokens attached to every request.

What technologies does openai-python use?

The core stack includes httpx (HTTP client library providing sync/async request capabilities with connection pooling, retries, and timeout handling), Pydantic (Data validation and serialization for all API request/response models, with JSON schema generation for structured outputs), Stainless (Code generation framework that builds the SDK from OpenAPI specifications, creating typed resource classes and models), pytest (Test framework for unit and integration tests across sync/async clients, streaming, and parsing functionality), jiter (Fast JSON parsing for high-performance deserialization of API responses and streaming chunks). A focused set of dependencies that keeps the build manageable.

What system dynamics does openai-python have?

openai-python exhibits 2 data pools (HTTP Response Cache, Delta Accumulator), 2 feedback loops, 4 control points, 3 delays. The feedback loops handle retry and accumulation. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does openai-python use?

5 design patterns detected: Resource Composition, Sync/Async Parallel, Streaming Delta Accumulation, Pydantic Schema Generation, Legacy API Deprecation.

Analyzed on April 20, 2026 by CodeSea.