microsoft/semantic-kernel
Integrate cutting-edge LLM technology quickly and easily into your apps
SDK for building AI agents and workflows with pluggable LLMs and memory
Under the hood, the system uses 3 feedback loops, 4 data pools, and 4 control points to manage its runtime behavior.
An 8-component ML inference system. 4,274 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
User input flows through the Kernel orchestrator to agents, which use LLM connectors to generate responses while calling plugins for external functionality. Memory stores provide context retrieval through vector similarity search, while process frameworks coordinate multi-step workflows with state persistence and event handling across distributed services.
- Initialize Kernel with services and plugins — Kernel.builder() registers LLM connectors (OpenAIChatCompletion), memory stores (VectorStoreRecordCollection), and plugins (KernelPlugin) through dependency injection [KernelBuilder → Kernel]
- Create agent with kernel and instructions — ChatCompletionAgent constructor takes kernel, name, instructions, and optional execution_settings to create an agent instance with conversation state [Kernel → ChatCompletionAgent]
- Process user message through agent — Agent.invoke_async() converts user input to ChatMessageContent, maintains conversation history in ChatHistory, and sends to LLM connector [ChatMessageContent → ChatMessageContent]
- Execute tool calls from LLM response — If LLM response contains function_calls, agent iterates through them, looks up KernelFunction in registered plugins, and invokes with extracted arguments [FunctionCall → FunctionResult]
- Retrieve context from vector memory — Memory search uses EmbeddingGenerator to convert query text to vectors, then VectorStoreRecordCollection.search() finds similar MemoryRecord entries [TextContent → MemoryRecord]
- Orchestrate multi-step process workflow — ProcessStepBuilder creates workflow steps that emit events, maintain state in ProcessData, and coordinate between local functions and external services via gRPC/SignalR [ProcessData → ProcessEvent]
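Taken together, the first few stages look roughly like the following Python sketch. Class and method names (Kernel, add_service, add_plugin, ChatCompletionAgent, get_response) follow the semantic-kernel Python SDK but vary by version; treat this as an illustrative sketch, not a verbatim contract.

```python
# Minimal sketch of the kernel -> agent -> plugin flow (Python SDK).
# Method names are illustrative; check the installed version's API.
import asyncio

from semantic_kernel import Kernel
from semantic_kernel.agents import ChatCompletionAgent
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.functions import kernel_function


class TimePlugin:
    """A trivial plugin the agent can call as a tool."""

    @kernel_function(description="Return the current UTC time.")
    def utc_now(self) -> str:
        from datetime import datetime, timezone
        return datetime.now(timezone.utc).isoformat()


async def main() -> None:
    # 1. Initialize the Kernel with an LLM connector and a plugin.
    kernel = Kernel()
    kernel.add_service(OpenAIChatCompletion(ai_model_id="gpt-4o-mini"))
    kernel.add_plugin(TimePlugin(), plugin_name="time")

    # 2. Create an agent bound to the kernel with instructions.
    agent = ChatCompletionAgent(
        kernel=kernel,
        name="assistant",
        instructions="Answer briefly; call tools when needed.",
    )

    # 3-4. Process a user message; tool calls are resolved by the agent.
    response = await agent.get_response(messages="What time is it in UTC?")
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
```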
Data Models
The data structures that flow between stages — the contracts that hold the system together.
- ChatMessageContent (python/semantic_kernel/contents/chat_message_content.py) — class with role: AuthorRole, content: str, name: Optional[str], metadata: dict, and items: list[KernelContent] for multimodal content. Created from user input or LLM responses, flows through agents and plugins, stored in conversation history.
- KernelFunction (python/semantic_kernel/functions/kernel_function.py) — class with name: str, plugin_name: str, description: str, parameters: list[KernelParameterMetadata], return_parameter: KernelReturnParameterMetadata, method: Callable. Registered with the kernel at startup, discovered by agents for tool calling, invoked during execution.
- processId (dotnet/samples/Demos/ProcessWithCloudEvents/ProcessWithCloudEvents.Client/src/services/grpc/gen/documentGeneration.ts) — protobuf message with processId: string identifying the running process instance. Generated when starting a process, passed between process steps, used to correlate events across distributed services.
- MemoryRecord (python/semantic_kernel/memory/memory_record.py) — dataclass with id: str, text: str, embedding: ndarray, metadata: dict, is_reference: bool, external_source_name: str, description: str. Created from text chunks with embeddings, stored in vector databases, retrieved during similarity searches.
- KernelArguments (python/semantic_kernel/kernel_arguments.py) — dict-like container with typed parameters and execution_settings: dict[str, PromptExecutionSettings]. Built from user input and context, passed to functions during execution, modified by plugins.
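To make one of these contracts concrete, here is a standalone sketch mirroring the MemoryRecord shape described above. It is an illustrative mirror, not the SDK's actual class; the record_from_chunk helper and the stub embedder are assumptions for the example.

```python
# Standalone mirror of the MemoryRecord shape -- a sketch, not the SDK class.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class MemoryRecordSketch:
    id: str
    text: str
    embedding: np.ndarray
    metadata: dict = field(default_factory=dict)
    is_reference: bool = False
    external_source_name: str = ""
    description: str = ""


def record_from_chunk(chunk_id: str, text: str, embed) -> MemoryRecordSketch:
    """Build a record from a text chunk; `embed` is any text->vector callable."""
    return MemoryRecordSketch(id=chunk_id, text=text, embedding=embed(text))


# Example with a stub embedder (a real one would call an embedding model):
record = record_from_chunk("chunk-1", "Kernels orchestrate plugins.",
                           embed=lambda t: np.zeros(1536, dtype=np.float32))
```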
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
SignalR hub is running at hardcoded URL 'http://localhost:5125/pfevents' and has methods 'UserRequestFeatureDocumentation', 'RequestUserReviewDocumentationFromProcess', etc.
If this fails: Client fails silently or throws connection errors if hub URL changes, hub is unavailable, or method names don't match server implementation
dotnet/samples/Demos/ProcessFrameworkWithSignalR/src/ProcessFramework.Aspire.SignalR.ReactFrontend/src/services/signalr/documentGeneration.client.ts:connection
HTTP API server is running at 'http://localhost:5125' with endpoints '/api/generate-doc' and '/api/reviewed-doc' accepting specific request schemas
If this fails: All HTTP requests fail with network errors, 404s, or 400s if server is down, URLs change, or request/response schemas don't match backend
dotnet/samples/Demos/ProcessFrameworkWithSignalR/src/ProcessFramework.Aspire.SignalR.ReactFrontend/src/services/signalr/ProcessFrameworkClient.ts:API_BASE_URL
bertscore.compute() expects the 'predictions' and 'references' arrays to have matching lengths and to contain only valid, non-empty text strings
If this fails: BERT scoring fails with index errors or produces meaningless scores if array lengths differ or contain null/empty strings
dotnet/samples/Demos/QualityCheck/python-server/app/main.py:bertscore.compute
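A defensive wrapper illustrating the check the QualityCheck server presumably omits. The evaluate.load("bertscore") call is the Hugging Face evaluate API; the validation logic is an assumed addition, not code from the sample.

```python
# Validate inputs before handing them to bertscore.compute(), which assumes
# equal-length arrays of non-empty strings. Sketch, not the sample's code.
import evaluate

bertscore = evaluate.load("bertscore")


def safe_bertscore(predictions: list[str], references: list[str], lang: str = "en"):
    if len(predictions) != len(references):
        raise ValueError(
            f"length mismatch: {len(predictions)} predictions "
            f"vs {len(references)} references"
        )
    if any(not isinstance(s, str) or not s.strip()
           for s in predictions + references):
        raise ValueError("all entries must be non-empty strings")
    return bertscore.compute(
        predictions=predictions, references=references, lang=lang
    )
```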
Environment variables CONTAINER_APP_NAME and CONTAINER_APP_ENV_DNS_SUFFIX are always set and contain valid Azure Container App values
If this fails: Manifest generation creates invalid URLs with 'None' values, breaking bot registration and webhook routing
python/samples/demos/copilot_studio_skill/src/api/app.py:copilot_manifest
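A hedged sketch of a fail-fast guard that would surface the missing variables instead of interpolating 'None' into the manifest. The variable names come from the assumption above; the guard itself and the URL shape are illustrative.

```python
# Fail fast when required Azure Container App variables are missing, instead
# of silently building a manifest URL containing the string "None".
import os

REQUIRED_VARS = ("CONTAINER_APP_NAME", "CONTAINER_APP_ENV_DNS_SUFFIX")


def container_app_base_url() -> str:
    missing = [v for v in REQUIRED_VARS if not os.environ.get(v)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {missing}")
    name = os.environ["CONTAINER_APP_NAME"]
    suffix = os.environ["CONTAINER_APP_ENV_DNS_SUFFIX"]
    return f"https://{name}.{suffix}"
```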
System has sufficient disk space and network connectivity to download 'Unbabel/wmt22-cometkiwi-da' model (~2GB) on first request
If this fails: First comet_score request hangs indefinitely or crashes with disk space errors; subsequent requests may fail if download was incomplete
dotnet/samples/Demos/QualityCheck/python-server/app/main.py:comet_score
The OAuth authorization callback arrives at the HTTP server within a reasonable timeframe and contains the expected 'code' parameter
If this fails: OAuth flow hangs forever if user closes browser, takes too long, or if authorization server sends different parameter names
python/samples/demos/mcp_with_oauth/agent/main.py:CallbackHandler
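One way to bound that wait, sketched with the standard library. The sample's actual CallbackHandler may differ; the port, timeout, and helper names here are assumptions.

```python
# Bounded wait for the OAuth redirect: if no ?code=... arrives within the
# timeout, raise instead of blocking forever. Illustrative sketch only.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

auth_code: dict = {}
got_code = threading.Event()


class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        params = parse_qs(urlparse(self.path).query)
        if "code" in params:
            auth_code["code"] = params["code"][0]
            got_code.set()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"You may close this window.")


def wait_for_code(port: int = 8400, timeout: float = 120.0) -> str:
    server = HTTPServer(("localhost", port), CallbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        if not got_code.wait(timeout):
            raise TimeoutError("no OAuth callback received before timeout")
        return auth_code["code"]
    finally:
        server.shutdown()
```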
Process events 'PublishDocumentation' and 'RequestUserReview' contain structured message objects that handlers can process without type checking
If this fails: Event handlers receive unexpected data types or malformed messages, causing runtime errors or silent data corruption in UI state
dotnet/samples/Demos/ProcessFrameworkWithSignalR/src/ProcessFramework.Aspire.SignalR.ReactFrontend/src/services/signalr/documentGeneration.client.ts:subscribeToProcessEvents
SignalR event handlers ('RequestUserReview', 'PublishDocumentation') are registered before connection.start() completes and no events arrive during startup
If this fails: Early process events are lost if server emits them immediately after connection establishment but before handlers are fully registered
dotnet/samples/Demos/ProcessFrameworkWithSignalR/src/ProcessFramework.Aspire.SignalR.ReactFrontend/src/services/signalr/documentGeneration.client.ts:constructor
All text content is assumed to be English ('lang="en"' is hardcoded), regardless of the actual input language
If this fails: BERT scores are meaningless for non-English text, leading to incorrect quality assessments for multilingual content
dotnet/samples/Demos/QualityCheck/python-server/app/main.py:bertscore
COMET model predictions run on CPU ('accelerator="cpu"') regardless of available hardware or input batch size
If this fails: Processing large translation batches becomes extremely slow (minutes instead of seconds) even when GPUs are available
dotnet/samples/Demos/QualityCheck/python-server/app/main.py:comet_score
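A hedged sketch of selecting the accelerator at runtime rather than hardcoding 'cpu'. The accelerator= keyword is as cited in the assumption above; the torch probe is an assumed addition to the sample.

```python
# Choose the COMET accelerator from available hardware instead of always "cpu".
# Sketch, under the assumption that model.predict() accepts accelerator=...
import torch


def pick_accelerator() -> str:
    return "gpu" if torch.cuda.is_available() else "cpu"


# scores = model.predict(data, batch_size=16, accelerator=pick_accelerator())
```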
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- ChatHistory — Stores conversation messages for each agent session with role-based access and metadata
- VectorStore — Persists embeddings and metadata for semantic search across different vector database implementations
- Process state (ProcessData) — Maintains workflow execution state across distributed process steps with event correlation
- Kernel service container — Dependency injection container for AI services, connectors, and plugins used by agents
Feedback Loops
- Agent conversation loop (recursive, reinforcing) — Trigger: User message or function result. Action: Agent processes input, calls LLM, executes tools if needed, generates response. Exit: No more function calls in the LLM response. (Sketched after this list.)
- Process workflow execution (training-loop, reinforcing) — Trigger: Process event emission. Action: Steps execute in sequence, emit events to trigger next steps, update state. Exit: Workflow reaches terminal step or error condition.
- Memory similarity search refinement (recursive, balancing) — Trigger: Low relevance scores in search results. Action: Adjust embedding generation parameters, re-index content, refine query. Exit: Relevance threshold met or max iterations reached.
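The first loop is the classic tool-calling cycle. Schematically, in Python (a sketch: llm.complete and the message shapes are placeholders, not SDK API):

```python
# Schematic agent conversation loop: call the LLM, execute any requested
# tools, feed results back, and exit when no more function calls come back.
def agent_loop(llm, plugins: dict, history: list[dict]) -> str:
    while True:
        response = llm.complete(history)          # placeholder LLM call
        calls = response.get("function_calls", [])
        if not calls:                             # exit: plain text answer
            return response["content"]
        for call in calls:                        # execute each tool call
            func = plugins[call["plugin"]][call["name"]]
            result = func(**call["arguments"])
            history.append({"role": "tool",
                            "name": call["name"],
                            "content": str(result)})
```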
Delays
- LLM API response time (async-processing, ~200ms-5s) — Agent waits for completion before proceeding with tool calls or response generation
- Vector embedding generation (async-processing, ~50-500ms) — Memory operations wait for text-to-vector conversion before similarity search
- Process step coordination (eventual-consistency, variable) — Distributed steps may execute out of order until event correlation completes
Control Points
- Model selection (architecture-switch) — Controls: Which LLM provider and model variant agents use for completions. Default: configured via environment variables
- Temperature and top_p (hyperparameter) — Controls: Randomness and creativity of LLM responses. Default: temperature 0.7
- Vector similarity threshold (threshold) — Controls: Minimum relevance score for memory retrieval results. Default: 0.8
- Tool calling enablement (feature-flag) — Controls: Whether agents can execute functions during conversation. Default: enabled
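How these knobs typically surface in code, as a hedged sketch: the dict shape and the fallback model name are illustrative, not a specific semantic-kernel settings class.

```python
# The four control points expressed as plain configuration. Values mirror the
# defaults listed above; the dict shape is illustrative, not an SDK contract.
import os

settings = {
    "model": os.environ.get("OPENAI_CHAT_MODEL_ID", "gpt-4o-mini"),  # model selection
    "temperature": 0.7,           # randomness of completions
    "top_p": 1.0,                 # nucleus-sampling cutoff
    "similarity_threshold": 0.8,  # minimum relevance for memory retrieval
    "enable_tool_calls": True,    # whether agents may execute functions
}
```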
Technology Stack
- OpenAI SDK — Provides HTTP client and response models for OpenAI and Azure OpenAI API interactions
- Pydantic — Validates and serializes structured data models for agent inputs/outputs and configuration
- AsyncIO — Enables non-blocking LLM API calls and concurrent plugin execution in the Python implementation
- SignalR — Real-time communication between the process orchestrator and web frontend for live workflow updates
- gRPC — High-performance RPC protocol for process step coordination and external service integration
- Chroma/Pinecone/Azure AI Search — Vector database backends for semantic memory storage and similarity search
- Jinja2/Handlebars — Template engines for dynamic prompt generation with variable substitution and control flow
- Microsoft.Extensions — Provides dependency injection, configuration, and hosting infrastructure for the .NET implementation
Key Components
- Kernel (orchestrator) — Central coordinator that manages plugins, services, and execution context — handles function registration and service dependency injection, and coordinates between agents, memory, and LLM connectors [python/semantic_kernel/kernel.py]
- ChatCompletionAgent (agent) — Wraps LLM chat completion APIs with conversation state management, tool calling capabilities, and structured output parsing [python/semantic_kernel/agents/chat_completion_agent.py]
- VectorStoreRecordCollection (adapter) — Abstract interface for vector database operations — provides upsert, get, delete, and vector search methods across different vector store implementations [python/semantic_kernel/data/vector_store_record_collection.py]
- ProcessStepBuilder (factory) — Builds process step definitions with event routing, state management, and external integration configuration for workflow orchestration [dotnet/src/Experimental/Process.Abstractions/ProcessStepBuilder.cs]
- OpenAIChatCompletion (adapter) — Connector for OpenAI and Azure OpenAI chat APIs — handles authentication, request formatting, response parsing, and streaming for chat completions [python/semantic_kernel/connectors/ai/open_ai/services/open_ai_chat_completion.py]
- PromptTemplate (processor) — Renders prompt templates with variable substitution, conditional logic, and formatting — supports Jinja2 and Handlebars syntax for dynamic prompt generation [python/semantic_kernel/prompt_template/prompt_template.py]
- EmbeddingGenerator (transformer) — Abstract base for converting text into vector embeddings using various embedding models — handles batching, normalization, and caching [python/semantic_kernel/connectors/ai/embeddings/embedding_generator_base.py]
- KernelPlugin (registry) — Container for related functions that can be discovered and invoked by agents — manages function metadata, validation, and access control [python/semantic_kernel/functions/kernel_plugin.py]
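To make the KernelPlugin contract concrete, a minimal plugin as the Python SDK expresses it. The kernel_function decorator and add_plugin call follow semantic_kernel's Python API; the WeatherPlugin class and its method are hypothetical examples.

```python
# A minimal plugin: any class whose methods are decorated with
# @kernel_function can be registered and discovered for tool calling.
from semantic_kernel import Kernel
from semantic_kernel.functions import kernel_function


class WeatherPlugin:
    @kernel_function(name="get_forecast",
                     description="Return a short forecast for a city.")
    def get_forecast(self, city: str) -> str:
        # A real implementation would call a weather API here.
        return f"Forecast for {city}: sunny, 22C"


kernel = Kernel()
kernel.add_plugin(WeatherPlugin(), plugin_name="weather")
# Agents bound to this kernel can now discover and invoke weather.get_forecast.
```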
Frequently Asked Questions
What is semantic-kernel used for?
semantic-kernel is an SDK for building AI agents and workflows with pluggable LLMs and memory. The microsoft/semantic-kernel codebase is an 8-component ML inference system written primarily in C#; data flows through 6 distinct pipeline stages, and the codebase contains 4,274 files.
How is semantic-kernel architected?
semantic-kernel is organized into 5 architecture layers: Agent Layer, Connectors, Memory & Vector Stores, Process Framework, and 1 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through semantic-kernel?
Data moves through 6 stages: Initialize Kernel with services and plugins → Create agent with kernel and instructions → Process user message through agent → Execute tool calls from LLM response → Retrieve context from vector memory → .... User input flows through the Kernel orchestrator to agents, which call LLM connectors and plugins, with memory stores providing context retrieval and process frameworks coordinating multi-step workflows (see "How Data Flows Through the System" above). This pipeline design reflects a complex multi-stage processing system.
What technologies does semantic-kernel use?
The core stack includes OpenAI SDK (Provides HTTP client and response models for OpenAI and Azure OpenAI API interactions), Pydantic (Validates and serializes structured data models for agent inputs/outputs and configuration), AsyncIO (Enables non-blocking LLM API calls and concurrent plugin execution in Python implementation), SignalR (Real-time communication between process orchestrator and web frontend for live workflow updates), gRPC (High-performance RPC protocol for process step coordination and external service integration), Chroma/Pinecone/Azure AI Search (Vector database backends for semantic memory storage and similarity search), and 2 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does semantic-kernel have?
semantic-kernel exhibits 4 data pools (including ChatHistory and VectorStore), 3 feedback loops, 4 control points, and 3 delays. The feedback loops include recursive and training-loop patterns. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does semantic-kernel use?
5 design patterns detected: Plugin Architecture, Provider Abstraction, Streaming Response Handling, Event-Driven Process Orchestration, Dependency Injection Service Location.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.