microsoft/semantic-kernel
Integrate cutting-edge LLM technology quickly and easily into your apps
SDK for building AI agents and workflows with pluggable LLMs and memory
Under the hood, the system uses 3 feedback loops, 4 data pools, and 4 control points to manage its runtime behavior.
An 8-component ML inference system. 4,274 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
User input flows through the Kernel orchestrator to agents, which use LLM connectors to generate responses while calling plugins for external functionality. Memory stores provide context retrieval through vector similarity search, while process frameworks coordinate multi-step workflows with state persistence and event handling across distributed services.
- Initialize Kernel with services and plugins — Kernel.builder() registers LLM connectors (OpenAIChatCompletion), memory stores (VectorStoreRecordCollection), and plugins (KernelPlugin) through dependency injection [KernelBuilder → Kernel]
- Create agent with kernel and instructions — ChatCompletionAgent constructor takes kernel, name, instructions, and optional execution_settings to create an agent instance with conversation state [Kernel → ChatCompletionAgent]
- Process user message through agent — Agent.invoke_async() converts user input to ChatMessageContent, maintains conversation history in ChatHistory, and sends to LLM connector [ChatMessageContent → ChatMessageContent]
- Execute tool calls from LLM response — If LLM response contains function_calls, agent iterates through them, looks up KernelFunction in registered plugins, and invokes with extracted arguments [FunctionCall → FunctionResult]
- Retrieve context from vector memory — Memory search uses EmbeddingGenerator to convert query text to vectors, then VectorStoreRecordCollection.search() finds similar MemoryRecord entries [TextContent → MemoryRecord]
- Orchestrate multi-step process workflow — ProcessStepBuilder creates workflow steps that emit events, maintain state in ProcessData, and coordinate between local functions and external services via gRPC/SignalR [ProcessData → ProcessEvent]
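Taken together, the first few stages look roughly like the following Python sketch. Class and method names (Kernel, add_service, add_plugin, ChatCompletionAgent, get_response) follow the semantic-kernel Python SDK but vary by version; treat this as an illustrative sketch, not a verbatim contract.

```python
# Minimal sketch of the kernel -> agent -> plugin flow (Python SDK).
# Method names are illustrative; check the installed version's API.
import asyncio

from semantic_kernel import Kernel
from semantic_kernel.agents import ChatCompletionAgent
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.functions import kernel_function


class TimePlugin:
    """A trivial plugin the agent can call as a tool."""

    @kernel_function(description="Return the current UTC time.")
    def utc_now(self) -> str:
        from datetime import datetime, timezone
        return datetime.now(timezone.utc).isoformat()


async def main() -> None:
    # 1. Initialize the Kernel with an LLM connector and a plugin.
    kernel = Kernel()
    kernel.add_service(OpenAIChatCompletion(ai_model_id="gpt-4o-mini"))
    kernel.add_plugin(TimePlugin(), plugin_name="time")

    # 2. Create an agent bound to the kernel with instructions.
    agent = ChatCompletionAgent(
        kernel=kernel,
        name="assistant",
        instructions="Answer briefly; call tools when needed.",
    )

    # 3-4. Process a user message; tool calls are resolved by the agent.
    response = await agent.get_response(messages="What time is it in UTC?")
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
```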
Data Models
The data structures that flow between stages — the contracts that hold the system together.
- ChatMessageContent (python/semantic_kernel/contents/chat_message_content.py) — class with role: AuthorRole, content: str, name: Optional[str], metadata: dict, and items: list[KernelContent] for multimodal content. Created from user input or LLM responses, flows through agents and plugins, stored in conversation history.
- KernelFunction (python/semantic_kernel/functions/kernel_function.py) — class with name: str, plugin_name: str, description: str, parameters: list[KernelParameterMetadata], return_parameter: KernelReturnParameterMetadata, method: Callable. Registered with the kernel at startup, discovered by agents for tool calling, invoked during execution.
- processId (dotnet/samples/Demos/ProcessWithCloudEvents/ProcessWithCloudEvents.Client/src/services/grpc/gen/documentGeneration.ts) — protobuf message with processId: string identifying the running process instance. Generated when starting a process, passed between process steps, used to correlate events across distributed services.
- MemoryRecord (python/semantic_kernel/memory/memory_record.py) — dataclass with id: str, text: str, embedding: ndarray, metadata: dict, is_reference: bool, external_source_name: str, description: str. Created from text chunks with embeddings, stored in vector databases, retrieved during similarity searches.
- KernelArguments (python/semantic_kernel/kernel_arguments.py) — dict-like container with typed parameters and execution_settings: dict[str, PromptExecutionSettings]. Built from user input and context, passed to functions during execution, modified by plugins.
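To make one of these contracts concrete, here is a standalone sketch mirroring the MemoryRecord shape described above. It is an illustrative mirror, not the SDK's actual class; the record_from_chunk helper and the stub embedder are assumptions for the example.

```python
# Standalone mirror of the MemoryRecord shape -- a sketch, not the SDK class.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class MemoryRecordSketch:
    id: str
    text: str
    embedding: np.ndarray
    metadata: dict = field(default_factory=dict)
    is_reference: bool = False
    external_source_name: str = ""
    description: str = ""


def record_from_chunk(chunk_id: str, text: str, embed) -> MemoryRecordSketch:
    """Build a record from a text chunk; `embed` is any text->vector callable."""
    return MemoryRecordSketch(id=chunk_id, text=text, embedding=embed(text))


# Example with a stub embedder (a real one would call an embedding model):
record = record_from_chunk("chunk-1", "Kernels orchestrate plugins.",
                           embed=lambda t: np.zeros(1536, dtype=np.float32))
```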
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
SignalR hub is running at hardcoded URL 'http://localhost:5125/pfevents' and has methods 'UserRequestFeatureDocumentation', 'RequestUserReviewDocumentationFromProcess', etc.
If this fails: Client fails silently or throws connection errors if hub URL changes, hub is unavailable, or method names don't match server implementation
dotnet/samples/Demos/ProcessFrameworkWithSignalR/src/ProcessFramework.Aspire.SignalR.ReactFrontend/src/services/signalr/documentGeneration.client.ts:connection
HTTP API server is running at 'http://localhost:5125' with endpoints '/api/generate-doc' and '/api/reviewed-doc' accepting specific request schemas
If this fails: All HTTP requests fail with network errors, 404s, or 400s if server is down, URLs change, or request/response schemas don't match backend
dotnet/samples/Demos/ProcessFrameworkWithSignalR/src/ProcessFramework.Aspire.SignalR.ReactFrontend/src/services/signalr/ProcessFrameworkClient.ts:API_BASE_URL
bertscore.compute() expects the 'predictions' and 'references' arrays to have matching lengths and to contain only valid, non-empty text strings
If this fails: BERT scoring fails with index errors or produces meaningless scores if array lengths differ or contain null/empty strings
dotnet/samples/Demos/QualityCheck/python-server/app/main.py:bertscore.compute
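A defensive wrapper illustrating the check the QualityCheck server presumably omits. The evaluate.load("bertscore") call is the Hugging Face evaluate API; the validation logic is an assumed addition, not code from the sample.

```python
# Validate inputs before handing them to bertscore.compute(), which assumes
# equal-length arrays of non-empty strings. Sketch, not the sample's code.
import evaluate

bertscore = evaluate.load("bertscore")


def safe_bertscore(predictions: list[str], references: list[str], lang: str = "en"):
    if len(predictions) != len(references):
        raise ValueError(
            f"length mismatch: {len(predictions)} predictions "
            f"vs {len(references)} references"
        )
    if any(not isinstance(s, str) or not s.strip()
           for s in predictions + references):
        raise ValueError("all entries must be non-empty strings")
    return bertscore.compute(
        predictions=predictions, references=references, lang=lang
    )
```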
Environment variables CONTAINER_APP_NAME and CONTAINER_APP_ENV_DNS_SUFFIX are always set and contain valid Azure Container App values
If this fails: Manifest generation creates invalid URLs with 'None' values, breaking bot registration and webhook routing
python/samples/demos/copilot_studio_skill/src/api/app.py:copilot_manifest
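A hedged sketch of a fail-fast guard that would surface the missing variables instead of interpolating 'None' into the manifest. The variable names come from the assumption above; the guard itself and the URL shape are illustrative.

```python
# Fail fast when required Azure Container App variables are missing, instead
# of silently building a manifest URL containing the string "None".
import os

REQUIRED_VARS = ("CONTAINER_APP_NAME", "CONTAINER_APP_ENV_DNS_SUFFIX")


def container_app_base_url() -> str:
    missing = [v for v in REQUIRED_VARS if not os.environ.get(v)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {missing}")
    name = os.environ["CONTAINER_APP_NAME"]
    suffix = os.environ["CONTAINER_APP_ENV_DNS_SUFFIX"]
    return f"https://{name}.{suffix}"
```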
System has sufficient disk space and network connectivity to download 'Unbabel/wmt22-cometkiwi-da' model (~2GB) on first request
If this fails: First comet_score request hangs indefinitely or crashes with disk space errors; subsequent requests may fail if download was incomplete
dotnet/samples/Demos/QualityCheck/python-server/app/main.py:comet_score
The OAuth authorization callback arrives at the HTTP server within a reasonable timeframe and contains the expected 'code' parameter
If this fails: OAuth flow hangs forever if user closes browser, takes too long, or if authorization server sends different parameter names
python/samples/demos/mcp_with_oauth/agent/main.py:CallbackHandler
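One way to bound that wait, sketched with the standard library. The sample's actual CallbackHandler may differ; the port, timeout, and helper names here are assumptions.

```python
# Bounded wait for the OAuth redirect: if no ?code=... arrives within the
# timeout, raise instead of blocking forever. Illustrative sketch only.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

auth_code: dict = {}
got_code = threading.Event()


class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        params = parse_qs(urlparse(self.path).query)
        if "code" in params:
            auth_code["code"] = params["code"][0]
            got_code.set()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"You may close this window.")


def wait_for_code(port: int = 8400, timeout: float = 120.0) -> str:
    server = HTTPServer(("localhost", port), CallbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        if not got_code.wait(timeout):
            raise TimeoutError("no OAuth callback received before timeout")
        return auth_code["code"]
    finally:
        server.shutdown()
```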
Process events 'PublishDocumentation' and 'RequestUserReview' contain structured message objects that handlers can process without type checking
If this fails: Event handlers receive unexpected data types or malformed messages, causing runtime errors or silent data corruption in UI state
dotnet/samples/Demos/ProcessFrameworkWithSignalR/src/ProcessFramework.Aspire.SignalR.ReactFrontend/src/services/signalr/documentGeneration.client.ts:subscribeToProcessEvents
SignalR event handlers ('RequestUserReview', 'PublishDocumentation') are registered before connection.start() completes and no events arrive during startup
If this fails: Early process events are lost if server emits them immediately after connection establishment but before handlers are fully registered
dotnet/samples/Demos/ProcessFrameworkWithSignalR/src/ProcessFramework.Aspire.SignalR.ReactFrontend/src/services/signalr/documentGeneration.client.ts:constructor
All text content is assumed to be English ('lang="en"' is hardcoded), regardless of the actual input language
If this fails: BERT scores are meaningless for non-English text, leading to incorrect quality assessments for multilingual content
dotnet/samples/Demos/QualityCheck/python-server/app/main.py:bertscore
COMET model predictions run on CPU ('accelerator="cpu"') regardless of available hardware or input batch size
If this fails: Processing large translation batches becomes extremely slow (minutes instead of seconds) even when GPUs are available
dotnet/samples/Demos/QualityCheck/python-server/app/main.py:comet_score
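A hedged sketch of selecting the accelerator at runtime rather than hardcoding 'cpu'. The accelerator= keyword is as cited in the assumption above; the torch probe is an assumed addition to the sample.

```python
# Choose the COMET accelerator from available hardware instead of always "cpu".
# Sketch, under the assumption that model.predict() accepts accelerator=...
import torch


def pick_accelerator() -> str:
    return "gpu" if torch.cuda.is_available() else "cpu"


# scores = model.predict(data, batch_size=16, accelerator=pick_accelerator())
```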
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- ChatHistory — Stores conversation messages for each agent session with role-based access and metadata
- VectorStore — Persists embeddings and metadata for semantic search across different vector database implementations
- Process state (ProcessData) — Maintains workflow execution state across distributed process steps with event correlation
- Kernel service container — Dependency injection container for AI services, connectors, and plugins used by agents
Feedback Loops
- Agent conversation loop (recursive, reinforcing) — Trigger: User message or function result. Action: Agent processes input, calls LLM, executes tools if needed, generates response. Exit: No more function calls in the LLM response. (Sketched after this list.)
- Process workflow execution (training-loop, reinforcing) — Trigger: Process event emission. Action: Steps execute in sequence, emit events to trigger next steps, update state. Exit: Workflow reaches terminal step or error condition.
- Memory similarity search refinement (recursive, balancing) — Trigger: Low relevance scores in search results. Action: Adjust embedding generation parameters, re-index content, refine query. Exit: Relevance threshold met or max iterations reached.
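The first loop is the classic tool-calling cycle. Schematically, in Python (a sketch: llm.complete and the message shapes are placeholders, not SDK API):

```python
# Schematic agent conversation loop: call the LLM, execute any requested
# tools, feed results back, and exit when no more function calls come back.
def agent_loop(llm, plugins: dict, history: list[dict]) -> str:
    while True:
        response = llm.complete(history)          # placeholder LLM call
        calls = response.get("function_calls", [])
        if not calls:                             # exit: plain text answer
            return response["content"]
        for call in calls:                        # execute each tool call
            func = plugins[call["plugin"]][call["name"]]
            result = func(**call["arguments"])
            history.append({"role": "tool",
                            "name": call["name"],
                            "content": str(result)})
```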
Delays
- LLM API response time (async-processing, ~200ms-5s) — Agent waits for completion before proceeding with tool calls or response generation
- Vector embedding generation (async-processing, ~50-500ms) — Memory operations wait for text-to-vector conversion before similarity search
- Process step coordination (eventual-consistency, variable) — Distributed steps may execute out of order until event correlation completes
Control Points
- Model selection (architecture-switch) — Controls: Which LLM provider and model variant agents use for completions. Default: configured via environment variables
- Temperature and top_p (hyperparameter) — Controls: Randomness and creativity of LLM responses. Default: temperature 0.7
- Vector similarity threshold (threshold) — Controls: Minimum relevance score for memory retrieval results. Default: 0.8
- Tool calling enablement (feature-flag) — Controls: Whether agents can execute functions during conversation. Default: enabled
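How these knobs typically surface in code, as a hedged sketch: the dict shape and the fallback model name are illustrative, not a specific semantic-kernel settings class.

```python
# The four control points expressed as plain configuration. Values mirror the
# defaults listed above; the dict shape is illustrative, not an SDK contract.
import os

settings = {
    "model": os.environ.get("OPENAI_CHAT_MODEL_ID", "gpt-4o-mini"),  # model selection
    "temperature": 0.7,           # randomness of completions
    "top_p": 1.0,                 # nucleus-sampling cutoff
    "similarity_threshold": 0.8,  # minimum relevance for memory retrieval
    "enable_tool_calls": True,    # whether agents may execute functions
}
```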
Technology Stack
- OpenAI SDK — Provides HTTP client and response models for OpenAI and Azure OpenAI API interactions
- Pydantic — Validates and serializes structured data models for agent inputs/outputs and configuration
- AsyncIO — Enables non-blocking LLM API calls and concurrent plugin execution in the Python implementation
- SignalR — Real-time communication between the process orchestrator and web frontend for live workflow updates
- gRPC — High-performance RPC protocol for process step coordination and external service integration
- Chroma/Pinecone/Azure AI Search — Vector database backends for semantic memory storage and similarity search
- Jinja2/Handlebars — Template engines for dynamic prompt generation with variable substitution and control flow
- Microsoft.Extensions — Provides dependency injection, configuration, and hosting infrastructure for the .NET implementation
Key Components
- Kernel (orchestrator) — Central coordinator that manages plugins, services, and execution context — handles function registration and service dependency injection, and coordinates between agents, memory, and LLM connectors [python/semantic_kernel/kernel.py]
- ChatCompletionAgent (agent) — Wraps LLM chat completion APIs with conversation state management, tool calling capabilities, and structured output parsing [python/semantic_kernel/agents/chat_completion_agent.py]
- VectorStoreRecordCollection (adapter) — Abstract interface for vector database operations — provides upsert, get, delete, and vector search methods across different vector store implementations [python/semantic_kernel/data/vector_store_record_collection.py]
- ProcessStepBuilder (factory) — Builds process step definitions with event routing, state management, and external integration configuration for workflow orchestration [dotnet/src/Experimental/Process.Abstractions/ProcessStepBuilder.cs]
- OpenAIChatCompletion (adapter) — Connector for OpenAI and Azure OpenAI chat APIs — handles authentication, request formatting, response parsing, and streaming for chat completions [python/semantic_kernel/connectors/ai/open_ai/services/open_ai_chat_completion.py]
- PromptTemplate (processor) — Renders prompt templates with variable substitution, conditional logic, and formatting — supports Jinja2 and Handlebars syntax for dynamic prompt generation [python/semantic_kernel/prompt_template/prompt_template.py]
- EmbeddingGenerator (transformer) — Abstract base for converting text into vector embeddings using various embedding models — handles batching, normalization, and caching [python/semantic_kernel/connectors/ai/embeddings/embedding_generator_base.py]
- KernelPlugin (registry) — Container for related functions that can be discovered and invoked by agents — manages function metadata, validation, and access control [python/semantic_kernel/functions/kernel_plugin.py]
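To make the KernelPlugin contract concrete, a minimal plugin as the Python SDK expresses it. The kernel_function decorator and add_plugin call follow semantic_kernel's Python API; the WeatherPlugin class and its method are hypothetical examples.

```python
# A minimal plugin: any class whose methods are decorated with
# @kernel_function can be registered and discovered for tool calling.
from semantic_kernel import Kernel
from semantic_kernel.functions import kernel_function


class WeatherPlugin:
    @kernel_function(name="get_forecast",
                     description="Return a short forecast for a city.")
    def get_forecast(self, city: str) -> str:
        # A real implementation would call a weather API here.
        return f"Forecast for {city}: sunny, 22C"


kernel = Kernel()
kernel.add_plugin(WeatherPlugin(), plugin_name="weather")
# Agents bound to this kernel can now discover and invoke weather.get_forecast.
```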
Frequently Asked Questions
What is semantic-kernel used for?
semantic-kernel is an SDK for building AI agents and workflows with pluggable LLMs and memory. The microsoft/semantic-kernel codebase is an 8-component ML inference system written primarily in C#; data flows through 6 distinct pipeline stages, and the codebase contains 4,274 files.
How is semantic-kernel architected?
semantic-kernel is organized into 5 architecture layers: Agent Layer, Connectors, Memory & Vector Stores, Process Framework, and 1 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through semantic-kernel?
Data moves through 6 stages: Initialize Kernel with services and plugins → Create agent with kernel and instructions → Process user message through agent → Execute tool calls from LLM response → Retrieve context from vector memory → .... User input flows through the Kernel orchestrator to agents, which call LLM connectors and plugins, with memory stores providing context retrieval and process frameworks coordinating multi-step workflows (see "How Data Flows Through the System" above). This pipeline design reflects a complex multi-stage processing system.
What technologies does semantic-kernel use?
The core stack includes OpenAI SDK (Provides HTTP client and response models for OpenAI and Azure OpenAI API interactions), Pydantic (Validates and serializes structured data models for agent inputs/outputs and configuration), AsyncIO (Enables non-blocking LLM API calls and concurrent plugin execution in Python implementation), SignalR (Real-time communication between process orchestrator and web frontend for live workflow updates), gRPC (High-performance RPC protocol for process step coordination and external service integration), Chroma/Pinecone/Azure AI Search (Vector database backends for semantic memory storage and similarity search), and 2 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does semantic-kernel have?
semantic-kernel exhibits 4 data pools (including ChatHistory and VectorStore), 3 feedback loops, 4 control points, and 3 delays. The feedback loops include recursive and training-loop patterns. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does semantic-kernel use?
5 design patterns detected: Plugin Architecture, Provider Abstraction, Streaming Response Handling, Event-Driven Process Orchestration, Dependency Injection Service Location.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.