microsoft/semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps

27,745 stars · C# · 8 components

SDK for building AI agents and workflows with pluggable LLMs and memory

User input flows through the Kernel orchestrator to agents, which use LLM connectors to generate responses while calling plugins for external functionality. Memory stores provide context retrieval through vector similarity search, while process frameworks coordinate multi-step workflows with state persistence and event handling across distributed services.

Under the hood, the system uses 3 feedback loops, 4 data pools, and 4 control points to manage its runtime behavior.

An 8-component ML inference system. 4,274 files analyzed. Data flows through 6 distinct pipeline stages.

How Data Flows Through the System


  1. Initialize Kernel with services and plugins — the kernel builder (Kernel.CreateBuilder() in .NET) registers LLM connectors (OpenAIChatCompletion), memory stores (VectorStoreRecordCollection), and plugins (KernelPlugin) through dependency injection [KernelBuilder → Kernel]
  2. Create agent with kernel and instructions — ChatCompletionAgent constructor takes kernel, name, instructions, and optional execution_settings to create an agent instance with conversation state [Kernel → ChatCompletionAgent]
  3. Process user message through agent — Agent.invoke_async() converts user input to ChatMessageContent, maintains conversation history in ChatHistory, and sends to LLM connector [ChatMessageContent → ChatMessageContent]
  4. Execute tool calls from LLM response — If LLM response contains function_calls, agent iterates through them, looks up KernelFunction in registered plugins, and invokes with extracted arguments [FunctionCall → FunctionResult]
  5. Retrieve context from vector memory — Memory search uses EmbeddingGenerator to convert query text to vectors, then VectorStoreRecordCollection.search() finds similar MemoryRecord entries [TextContent → MemoryRecord]
  6. Orchestrate multi-step process workflow — ProcessStepBuilder creates workflow steps that emit events, maintain state in ProcessData, and coordinate between local functions and external services via gRPC/SignalR [ProcessData → ProcessEvent]
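The six steps above can be sketched in miniature. This is a dependency-free illustration of the flow, not the real semantic-kernel API: MiniKernel, MiniAgent, and the "shout" plugin are hypothetical stand-ins for the Kernel, ChatCompletionAgent, and KernelPlugin roles.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the kernel/agent/plugin roles --
# names here are illustrative, not the real semantic-kernel API.

@dataclass
class ChatMessage:
    role: str
    content: str

@dataclass
class MiniKernel:
    plugins: dict = field(default_factory=dict)

    def register_plugin(self, name, fn):
        self.plugins[name] = fn

    def invoke(self, name, **kwargs):
        return self.plugins[name](**kwargs)

@dataclass
class MiniAgent:
    kernel: MiniKernel
    instructions: str
    history: list = field(default_factory=list)

    def invoke(self, user_input: str) -> ChatMessage:
        # Step 3: record the user turn in conversation history
        self.history.append(ChatMessage("user", user_input))
        # Step 4: a real agent lets the LLM decide which tool to call;
        # here we call a registered plugin directly for illustration
        result = self.kernel.invoke("shout", text=user_input)
        reply = ChatMessage("assistant", result)
        self.history.append(reply)
        return reply

kernel = MiniKernel()                        # Step 1: initialize kernel
kernel.register_plugin("shout", lambda text: text.upper())
agent = MiniAgent(kernel, "Be loud.")        # Step 2: create agent
print(agent.invoke("hello world").content)   # Steps 3-4: message + tool call
```

Steps 5 and 6 (vector retrieval and process orchestration) follow the same pattern: the kernel resolves a registered service, invokes it, and threads the result back through conversation or process state.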

Data Models

The data structures that flow between stages — the contracts that hold the system together.

ChatMessageContent python/semantic_kernel/contents/chat_message_content.py
class with role: AuthorRole, content: str, name: Optional[str], metadata: dict, items: list[KernelContent] for multimodal content
Created from user input or LLM responses, flows through agents and plugins, stored in conversation history
KernelFunction python/semantic_kernel/functions/kernel_function.py
class with name: str, plugin_name: str, description: str, parameters: list[KernelParameterMetadata], return_parameter: KernelReturnParameterMetadata, method: Callable
Registered with kernel at startup, discovered by agents for tool calling, invoked during execution
ProcessData dotnet/samples/Demos/ProcessWithCloudEvents/ProcessWithCloudEvents.Client/src/services/grpc/gen/documentGeneration.ts
protobuf message with processId: string identifying the running process instance
Generated when starting a process, passed between process steps, used to correlate events across distributed services
MemoryRecord python/semantic_kernel/memory/memory_record.py
dataclass with id: str, text: str, embedding: ndarray, metadata: dict, is_reference: bool, external_source_name: str, description: str
Created from text chunks with embeddings, stored in vector databases, retrieved during similarity searches
KernelArguments python/semantic_kernel/kernel_arguments.py
dict-like container with typed parameters, execution_settings: dict[str, PromptExecutionSettings]
Built from user input and context, passed to functions during execution, modified by plugins
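The contracts above can be approximated with plain dataclasses. Field names follow the descriptions in this section, but the types are simplified to stay dependency-free (e.g. the embedding is a list of floats rather than an ndarray, and role is a bare string rather than AuthorRole); this is a sketch, not the library's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ChatMessageContent:
    role: str                       # AuthorRole in the real model
    content: str
    name: Optional[str] = None
    metadata: dict = field(default_factory=dict)

@dataclass
class MemoryRecord:
    id: str
    text: str
    embedding: list                 # ndarray in the real model
    metadata: dict = field(default_factory=dict)
    is_reference: bool = False
    external_source_name: str = ""
    description: str = ""

class KernelArguments(dict):
    """Dict-like container for typed parameters, plus per-service
    execution settings keyed by service id."""
    def __init__(self, settings=None, **kwargs):
        super().__init__(**kwargs)
        self.execution_settings = settings or {}
```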

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Contract unguarded

SignalR hub is running at hardcoded URL 'http://localhost:5125/pfevents' and has methods 'UserRequestFeatureDocumentation', 'RequestUserReviewDocumentationFromProcess', etc.

If this fails: Client fails silently or throws connection errors if hub URL changes, hub is unavailable, or method names don't match server implementation

dotnet/samples/Demos/ProcessFrameworkWithSignalR/src/ProcessFramework.Aspire.SignalR.ReactFrontend/src/services/signalr/documentGeneration.client.ts:connection
critical Contract weakly guarded

HTTP API server is running at 'http://localhost:5125' with endpoints '/api/generate-doc' and '/api/reviewed-doc' accepting specific request schemas

If this fails: All HTTP requests fail with network errors, 404s, or 400s if server is down, URLs change, or request/response schemas don't match backend

dotnet/samples/Demos/ProcessFrameworkWithSignalR/src/ProcessFramework.Aspire.SignalR.ReactFrontend/src/services/signalr/ProcessFrameworkClient.ts:API_BASE_URL
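A common guard for both of the hardcoded-endpoint assumptions above is to resolve the base URL from configuration and fail fast when it is malformed. The original clients are TypeScript; this Python sketch shows the pattern only, and the variable name PF_API_BASE_URL is hypothetical.

```python
import os

def resolve_base_url(env=None, default="http://localhost:5125"):
    """Read the service endpoint from the environment instead of
    hardcoding it, and reject obviously invalid values up front."""
    env = os.environ if env is None else env
    url = env.get("PF_API_BASE_URL", default)   # hypothetical variable name
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"PF_API_BASE_URL must be an http(s) URL, got {url!r}")
    return url.rstrip("/")
```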
critical Shape unguarded

bertscore.compute() expects 'predictions' and 'references' arrays to have matching lengths and all contain valid text strings

If this fails: BERT scoring fails with index errors or produces meaningless scores if array lengths differ or contain null/empty strings

dotnet/samples/Demos/QualityCheck/python-server/app/main.py:bertscore.compute
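The shape contract can be made explicit with a small pre-check before calling the metric. This is a sketch of the guard, not code from the repository:

```python
def validate_score_inputs(predictions, references):
    """Guard the contract that bertscore.compute() assumes but never
    checks: equal lengths, every entry a non-empty string."""
    if len(predictions) != len(references):
        raise ValueError(
            f"length mismatch: {len(predictions)} predictions "
            f"vs {len(references)} references")
    for i, (p, r) in enumerate(zip(predictions, references)):
        if not isinstance(p, str) or not p.strip():
            raise ValueError(f"predictions[{i}] is not a non-empty string")
        if not isinstance(r, str) or not r.strip():
            raise ValueError(f"references[{i}] is not a non-empty string")
    return True
```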
critical Environment unguarded

Environment variables CONTAINER_APP_NAME and CONTAINER_APP_ENV_DNS_SUFFIX are always set and contain valid Azure Container App values

If this fails: Manifest generation creates invalid URLs with 'None' values, breaking bot registration and webhook routing

python/samples/demos/copilot_studio_skill/src/api/app.py:copilot_manifest
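One way to surface this assumption is a startup check that fails with a single clear error instead of emitting a manifest full of 'None' values. A sketch:

```python
import os

def require_env(names, env=None):
    """Fail fast when any required variable is unset or blank,
    rather than silently interpolating None into generated URLs."""
    env = os.environ if env is None else env
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError("missing required environment variables: "
                           + ", ".join(missing))
    return {n: env[n] for n in names}
```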
warning Resource unguarded

System has sufficient disk space and network connectivity to download 'Unbabel/wmt22-cometkiwi-da' model (~2GB) on first request

If this fails: First comet_score request hangs indefinitely or crashes with disk space errors; subsequent requests may fail if download was incomplete

dotnet/samples/Demos/QualityCheck/python-server/app/main.py:comet_score
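A cheap mitigation is to check free disk space before kicking off the ~2 GB download, so the request fails immediately with a clear error instead of midway through. This sketch uses the standard library only:

```python
import shutil

def check_disk_space(path=".", required_bytes=2 * 1024**3):
    """Refuse to start a large model download when free disk space
    is clearly insufficient (default threshold: 2 GiB)."""
    free = shutil.disk_usage(path).free
    if free < required_bytes:
        raise OSError(f"need {required_bytes} bytes free, have {free}")
    return free
```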
warning Temporal weakly guarded

OAuth authorization callback arrives at HTTP server within a reasonable timeframe and contains expected 'code' parameter

If this fails: OAuth flow hangs forever if user closes browser, takes too long, or if authorization server sends different parameter names

python/samples/demos/mcp_with_oauth/agent/main.py:CallbackHandler
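The hang can be bounded with a deadline on the wait. This sketch (hypothetical class name, not the repository's CallbackHandler) shows the pattern: the HTTP handler delivers the redirect parameters, and the agent waits with a timeout instead of forever.

```python
import threading

class CallbackWaiter:
    """Wait for an OAuth redirect with a deadline. The HTTP handler
    calls deliver(); the agent thread calls wait()."""
    def __init__(self):
        self._event = threading.Event()
        self.code = None

    def deliver(self, params: dict):
        # A real flow would also verify 'state'; here we only require
        # the 'code' parameter the token exchange depends on.
        self.code = params.get("code")
        self._event.set()

    def wait(self, timeout=120.0):
        if not self._event.wait(timeout):
            raise TimeoutError("no OAuth callback within deadline")
        if self.code is None:
            raise ValueError("callback arrived without a 'code' parameter")
        return self.code
```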
warning Contract unguarded

Process events 'PublishDocumentation' and 'RequestUserReview' contain structured message objects that handlers can process without type checking

If this fails: Event handlers receive unexpected data types or malformed messages, causing runtime errors or silent data corruption in UI state

dotnet/samples/Demos/ProcessFrameworkWithSignalR/src/ProcessFramework.Aspire.SignalR.ReactFrontend/src/services/signalr/documentGeneration.client.ts:subscribeToProcessEvents
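The missing type check can be added at the event boundary. The original handlers are TypeScript; this Python sketch shows the validation pattern, and the expected payload keys ('type', 'message') are illustrative assumptions, not the actual wire format.

```python
KNOWN_EVENTS = ("PublishDocumentation", "RequestUserReview")

def parse_process_event(payload):
    """Type-check an incoming process event before it reaches UI
    state, instead of trusting the wire format blindly."""
    if not isinstance(payload, dict):
        raise TypeError(f"event payload must be a dict, got {type(payload).__name__}")
    event_type = payload.get("type")
    if event_type not in KNOWN_EVENTS:
        raise ValueError(f"unknown event type: {event_type!r}")
    message = payload.get("message")
    if not isinstance(message, str):
        raise TypeError("event 'message' must be a string")
    return event_type, message
```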
warning Ordering weakly guarded

SignalR event handlers ('RequestUserReview', 'PublishDocumentation') are registered before connection.start() completes and no events arrive during startup

If this fails: Early process events are lost if server emits them immediately after connection establishment but before handlers are fully registered

dotnet/samples/Demos/ProcessFrameworkWithSignalR/src/ProcessFramework.Aspire.SignalR.ReactFrontend/src/services/signalr/documentGeneration.client.ts:constructor
warning Domain unguarded

All text content is in English language ('lang="en"') as hardcoded, regardless of actual input language

If this fails: BERT scores are meaningless for non-English text, leading to incorrect quality assessments for multilingual content

dotnet/samples/Demos/QualityCheck/python-server/app/main.py:bertscore
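Making the English-only assumption explicit is a one-line guard: reject inputs declared in other languages rather than scoring them meaninglessly. A sketch (the parameter shape is illustrative, not the service's real request schema):

```python
def assert_supported_lang(lang, supported=("en",)):
    """Reject languages the hardcoded lang='en' scoring cannot
    meaningfully evaluate, instead of returning bogus scores."""
    if lang not in supported:
        raise ValueError(f"scoring only supports {supported}, got {lang!r}")
    return lang
```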
info Scale unguarded

COMET model predictions run on CPU ('accelerator="cpu"') regardless of available hardware or input batch size

If this fails: Processing large translation batches becomes extremely slow (minutes instead of seconds) even when GPUs are available

dotnet/samples/Demos/QualityCheck/python-server/app/main.py:comet_score
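The hardcoded 'cpu' choice can be replaced by a small selection policy. The function below is pure so it stays testable; a real service would feed it torch.cuda.is_available() and the actual batch size. The thresholds are illustrative assumptions.

```python
def pick_accelerator(cuda_available: bool, batch_size: int) -> str:
    """Choose the accelerator from the environment instead of
    hardcoding 'cpu'; prefer GPU for non-trivial batches."""
    if cuda_available and batch_size > 1:
        return "gpu"
    return "cpu"
```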

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

ChatHistory (in-memory)
Stores conversation messages for each agent session with role-based access and metadata
VectorStore (database)
Persists embeddings and metadata for semantic search across different vector database implementations
ProcessState (state-store)
Maintains workflow execution state across distributed process steps with event correlation
KernelServices (registry)
Dependency injection container for AI services, connectors, and plugins used by agents
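The first of these pools, the in-memory conversation store, can be sketched as a bounded container. This is an illustrative stand-in for the real ChatHistory type, with an added size cap (an assumption, not a documented feature) so long sessions do not grow without limit.

```python
from collections import deque

class ChatHistory:
    """In-memory conversation pool: role-tagged messages with
    metadata, bounded to max_messages entries."""
    def __init__(self, max_messages=100):
        self._messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str, **metadata):
        self._messages.append({"role": role, "content": content,
                               "metadata": metadata})

    def by_role(self, role: str):
        # Role-based access, as described above
        return [m for m in self._messages if m["role"] == role]

    def __len__(self):
        return len(self._messages)
```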

Feedback Loops

Delays

Control Points

Technology Stack

OpenAI SDK (library)
Provides HTTP client and response models for OpenAI and Azure OpenAI API interactions
Pydantic (serialization)
Validates and serializes structured data models for agent inputs/outputs and configuration
AsyncIO (runtime)
Enables non-blocking LLM API calls and concurrent plugin execution in Python implementation
SignalR (framework)
Real-time communication between process orchestrator and web frontend for live workflow updates
gRPC (framework)
High-performance RPC protocol for process step coordination and external service integration
Chroma/Pinecone/Azure AI Search (database)
Vector database backends for semantic memory storage and similarity search
Jinja2/Handlebars (library)
Template engines for dynamic prompt generation with variable substitution and control flow
.NET Generic Host (framework)
Provides dependency injection, configuration, and hosting infrastructure for .NET implementation

Key Components


Related ML Inference Repositories

Frequently Asked Questions

What is semantic-kernel used for?

semantic-kernel is an SDK for building AI agents and workflows with pluggable LLMs and memory. microsoft/semantic-kernel is an 8-component ML inference system written in C#; data flows through 6 distinct pipeline stages, and the codebase contains 4,274 files.

How is semantic-kernel architected?

semantic-kernel is organized into 5 architecture layers: Agent Layer, Connectors, Memory & Vector Stores, Process Framework, and 1 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through semantic-kernel?

Data moves through 6 stages: Initialize Kernel with services and plugins → Create agent with kernel and instructions → Process user message through agent → Execute tool calls from LLM response → Retrieve context from vector memory → .... User input flows through the Kernel orchestrator to agents, which use LLM connectors to generate responses while calling plugins for external functionality. Memory stores provide context retrieval through vector similarity search, while process frameworks coordinate multi-step workflows with state persistence and event handling across distributed services. This pipeline design reflects a complex multi-stage processing system.

What technologies does semantic-kernel use?

The core stack includes OpenAI SDK (Provides HTTP client and response models for OpenAI and Azure OpenAI API interactions), Pydantic (Validates and serializes structured data models for agent inputs/outputs and configuration), AsyncIO (Enables non-blocking LLM API calls and concurrent plugin execution in Python implementation), SignalR (Real-time communication between process orchestrator and web frontend for live workflow updates), gRPC (High-performance RPC protocol for process step coordination and external service integration), Chroma/Pinecone/Azure AI Search (Vector database backends for semantic memory storage and similarity search), and 2 more. A focused set of dependencies that keeps the build manageable.

What system dynamics does semantic-kernel have?

semantic-kernel exhibits 4 data pools (e.g., ChatHistory and VectorStore), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle recursive and training-loop behavior. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does semantic-kernel use?

5 design patterns detected: Plugin Architecture, Provider Abstraction, Streaming Response Handling, Event-Driven Process Orchestration, Dependency Injection Service Location.

Analyzed on April 20, 2026 by CodeSea.