great-expectations/great_expectations

Always know what to expect from your data.

11,422 stars · Python · 8 components

Validates data quality through configurable expectations with multi-backend execution

Users define expectations through DataContext, which manages datasources that load data into Batches. Validators execute expectations against batches using ExecutionEngines that compute metrics. Results flow to Renderers for documentation and Stores for persistence. Checkpoints orchestrate this pipeline for production workflows.

Under the hood, the system uses 3 feedback loops, 4 data pools, and 4 control points to manage its runtime behavior.

An 8-component library. 1728 files analyzed. Data flows through 7 distinct pipeline stages.

How Data Flows Through the System

  1. Initialize DataContext — AbstractDataContext loads configuration from stores, initializes datasources, and sets up validation infrastructure [DataContextConfig → DataContext]
  2. Define Expectations — Users create ExpectationConfiguration objects defining data quality rules via expect_* methods on Validator instances
  3. Load Data Batches — Datasource queries data sources and creates Batch objects containing data and metadata using BatchDefinition [BatchRequest → Batch]
  4. Execute Validations — Validator.validate() converts expectations to MetricConfiguration, ExecutionEngine computes metrics, results compared to expectation thresholds [ExpectationConfiguration → ValidationResult]
  5. Aggregate Results — Checkpoint collects ValidationResults from multiple batches and expectations into CheckpointResult [ValidationResult → CheckpointResult]
  6. Render Documentation — Renderer transforms ValidationResults into HTML reports, data docs, and other human-readable formats [ValidationResult → RenderedContent]
  7. Persist Results — Store implementations save ValidationResults, ExpectationSuites, and documentation to filesystem, databases, or cloud storage [Serializable objects]
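The flow above can be illustrated with plain Python. This is a toy sketch of stages 2–5, not the Great Expectations API: `validate_batch`, the dict shapes, and the list-of-rows "batch" are all hypothetical stand-ins chosen to mirror the documented pipeline.

```python
# Toy sketch of the validation flow (hypothetical helpers, not the GX API).

def validate_batch(expectation, batch):
    """Stage 4: evaluate one expectation-config dict against a batch (a list of rows)."""
    if expectation["expectation_type"] == "expect_column_values_to_not_be_null":
        col = expectation["kwargs"]["column"]
        nulls = [row for row in batch if row.get(col) is None]
        return {"success": not nulls, "result": {"unexpected_count": len(nulls)}}
    raise NotImplementedError(expectation["expectation_type"])

# Stage 2: define expectations as configuration dicts.
suite = [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "id"}, "meta": {}},
]

# Stage 3: a "batch" here is just a list of row dicts.
batch = [{"id": 1}, {"id": 2}, {"id": None}]

# Stages 4-5: validate each expectation, then aggregate into a
# checkpoint-style result that summarizes overall success.
results = [validate_batch(e, batch) for e in suite]
checkpoint_result = {"success": all(r["success"] for r in results),
                     "run_results": results}
print(checkpoint_result["success"])  # False: one row has a null id
```

In the real library these roles are played by Validator, ExecutionEngine, and Checkpoint objects; the sketch only shows how configuration dicts flow in and result dicts flow out.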

Data Models

The data structures that flow between stages — the contracts that hold the system together.

ExpectationConfiguration great_expectations/core/expectation_configuration.py
dict with expectation_type: str, kwargs: Dict[str, Any], meta: Dict[str, Any]
Created by users or auto-generated, stored in context, executed by validators, results rendered to documentation
ValidationResult great_expectations/core/expectation_validation_result.py
dict with success: bool, result: Dict[str, Any], exception_info: Optional[Dict], meta: Dict[str, Any]
Generated by expectation execution, aggregated into checkpoint results, rendered for documentation or stored for monitoring
Batch great_expectations/core/batch.py
object with data: Union[DataFrame, Dataset], batch_definition: BatchDefinition, batch_spec: BatchSpec
Created from datasource queries, passed to execution engines, used by validators to run expectations against specific data subsets
MetricConfiguration great_expectations/core/metric_domain_types.py
dict with metric_name: str, metric_domain_kwargs: Dict, metric_value_kwargs: Dict
Derived from expectation configurations, executed by metrics engines, results used for expectation validation
DataContext great_expectations/data_context/data_context/abstract_data_context.py
object with config: DataContextConfig, datasources: Dict, stores: Dict, checkpoints: Dict
Instantiated at startup, manages all GX resources, persists configuration and validation results
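The field lists above can be mirrored with plain dataclasses. This is a simplified orientation sketch; the real classes carry serialization, validation, and identity behavior beyond these fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Simplified stand-ins for two of the models listed above.

@dataclass
class ExpectationConfiguration:
    expectation_type: str               # e.g. "expect_column_values_to_be_between"
    kwargs: Dict[str, Any]              # expectation parameters (column, bounds, ...)
    meta: Dict[str, Any] = field(default_factory=dict)

@dataclass
class ValidationResult:
    success: bool                       # did the data meet the expectation?
    result: Dict[str, Any]              # counts, unexpected values, statistics
    exception_info: Optional[Dict[str, Any]] = None
    meta: Dict[str, Any] = field(default_factory=dict)

cfg = ExpectationConfiguration(
    expectation_type="expect_column_values_to_be_between",
    kwargs={"column": "age", "min_value": 0, "max_value": 120},
)
vr = ValidationResult(success=True,
                      result={"element_count": 100, "unexpected_count": 0})
```

The pairing is the core contract: one ExpectationConfiguration in, one ValidationResult out, with `kwargs` and `result` as the open-ended payloads.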

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Environment unguarded

useThemeConfig().gxCard exists and has the structure {title: string, description: string, buttons: {primary: {href: string, label: string}, secondary: {href: string, label: string}}}

If this fails: Runtime error 'Cannot read properties of undefined' when gxCard is not configured in theme config, breaking card rendering

docs/docusaurus/src/components/GXCard/index.js:useGXCardConfig
critical Environment weakly guarded

GitHub API at 'https://api.github.com/repos/${owner}/${repository}' returns JSON with stargazers_count and forks_count numeric fields

If this fails: formatCompactNumber crashes with TypeError if API returns null/string for counts, or component displays 'NaN' for invalid numeric values

docs/docusaurus/src/components/GithubNavbarItem/index.js:useEffect
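The GithubNavbarItem component is JavaScript, but the guard this assumption calls for is language-agnostic. A sketch in Python of the same defensive pattern (`safe_count` is a hypothetical helper, not part of the codebase):

```python
def safe_count(payload, key):
    """Return a non-negative-looking int for a GitHub repo count field,
    or None if the payload is missing or malformed -- instead of letting a
    null/string value crash the formatter downstream."""
    value = payload.get(key) if isinstance(payload, dict) else None
    # Reject bool explicitly (bool subclasses int in Python), then anything non-int.
    if isinstance(value, bool) or not isinstance(value, int):
        return None  # null, strings, floats from a broken API all fall through here
    return value

repo = {"stargazers_count": 11422, "forks_count": None}
print(safe_count(repo, "stargazers_count"))  # 11422
print(safe_count(repo, "forks_count"))       # None, rather than a TypeError
```

The point is that the shape of a third-party API response is validated at the boundary, so a schema change degrades to a missing badge instead of a render crash.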
warning Resource weakly guarded

GitHub API is accessible and responds within reasonable time without CORS issues

If this fails: Component renders without star/fork counts and setShowGithubBadgeInfo(false) hides badge info, but no visible error to user about network failure

docs/docusaurus/src/components/GithubNavbarItem/index.js:fetch
critical Environment unguarded

Netlify Functions endpoint '/.netlify/functions/createJiraTicketInDocsBoard' exists and accepts the form data structure

If this fails: Feedback submission fails with 404 or 500 errors if endpoint is missing or expects different data shape, silently failing user feedback

docs/docusaurus/src/components/WasThisHelpful/index.js:CREATE_JIRA_TICKET_IN_DOCS_BOARD_ENDPOINT_URL
warning Shape unguarded

Form inputs have 'name' attribute matching formData keys (name, email, selectedValue, description)

If this fails: Form state becomes inconsistent if input name attributes don't match, causing submission to send undefined/old values

docs/docusaurus/src/components/WasThisHelpful/index.js:handleChange
warning Domain unguarded

Intl.NumberFormat with 'compact' notation is supported in all target browsers

If this fails: TypeError in older browsers that don't support compact notation, breaking the entire navbar component

docs/docusaurus/src/components/GithubNavbarItem/index.js:formatCompactNumber
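JavaScript's `Intl.NumberFormat` with compact notation turns 11422 into "11.4K"; where that support is absent, an equivalent can be written by hand. A minimal Python sketch of the same formatting (hypothetical helper, not the component's code):

```python
def format_compact(n):
    """Compact-notation formatting (11422 -> '11.4K'): the kind of manual
    fallback a component could use when Intl.NumberFormat compact notation
    is unsupported in the target browser."""
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if abs(n) >= threshold:
            value = n / threshold
            # One decimal place, trimming a trailing ".0" (2.0M -> 2M).
            return f"{value:.1f}".rstrip("0").rstrip(".") + suffix
    return str(n)

print(format_compact(11422))      # 11.4K
print(format_compact(950))        # 950
print(format_compact(2_000_000))  # 2M
```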
critical Shape unguarded

announcementBar.content is safe HTML string that won't execute malicious scripts

If this fails: XSS vulnerability if content contains malicious JavaScript, allowing arbitrary code execution in user browsers

docs/docusaurus/src/theme/AnnouncementBar/Content/index.js:dangerouslySetInnerHTML
warning Environment weakly guarded

DOM access is available and buttonElement's parent contains code with 'code-block-hide-line' class elements

If this fails: Copy function falls back to originalCode with potentially sensitive hidden lines visible in clipboard if DOM traversal fails

docs/docusaurus/src/theme/CodeBlock/Buttons/CopyButton/index.js:filterHiddenLines
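The actual component filters hidden lines via DOM traversal; the underlying idea is simply "strip marked lines before the text can reach the clipboard." A Python sketch of that idea, with a hypothetical text marker standing in for the `code-block-hide-line` CSS class:

```python
HIDE_MARKER = "# hide-line"  # hypothetical marker; the real component matches
                             # elements carrying the 'code-block-hide-line' class

def copyable_text(source: str) -> str:
    """Drop marked lines so hidden content never reaches the clipboard.
    Filtering the text directly avoids the failure mode noted above, where a
    fallback to the original code would leak hidden lines."""
    kept = [line for line in source.splitlines() if HIDE_MARKER not in line]
    return "\n".join(kept)

snippet = "import os\nSECRET = 'abc'  # hide-line\nprint('ok')"
print(copyable_text(snippet))  # the SECRET line is removed
```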
warning Contract unguarded

The 'to' prop starts with '/' and useVersionedPath hook correctly resolves versioned paths from current page context

If this fails: Navigation links break with incorrect paths if 'to' prop doesn't start with '/' or version context is unavailable

docs/docusaurus/src/components/VersionedLink/index.js:useVersionedPath
info Resource unguarded

Icon URLs in 'icon' prop are accessible and load successfully

If this fails: Broken image icons display if URLs are invalid, but component continues to function without visual feedback about the failure

docs/docusaurus/src/components/LinkCard/index.js:VersionedLink

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

ExpectationStore (file-store)
Persists ExpectationSuite definitions for reuse across validation runs
ValidationResultsStore (database)
Accumulates historical validation results for monitoring and trend analysis
CheckpointStore (file-store)
Stores checkpoint configurations for repeatable validation workflows
MetricStore (cache)
Caches computed metrics to avoid redundant computation across similar expectations
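The MetricStore pool above exists to avoid recomputing the same metric for overlapping expectations. A minimal sketch of that caching pattern, assuming a toy `MetricCache` class (not the library's implementation), keyed the way the metric model above suggests, by metric name plus domain kwargs:

```python
class MetricCache:
    """Toy metric cache: computed values are keyed by (metric name, domain
    kwargs), so repeated expectations over the same column reuse one
    computation instead of rescanning the data."""

    def __init__(self):
        self._cache = {}
        self.computations = 0  # instrumentation for this example

    def get_metric(self, name, domain_kwargs, compute):
        key = (name, tuple(sorted(domain_kwargs.items())))
        if key not in self._cache:
            self.computations += 1
            self._cache[key] = compute()  # only runs on a cache miss
        return self._cache[key]

data = {"age": [21, 35, None, 50]}

def column_mean():
    values = [v for v in data["age"] if v is not None]
    return sum(values) / len(values)

cache = MetricCache()
first = cache.get_metric("column.mean", {"column": "age"}, column_mean)
again = cache.get_metric("column.mean", {"column": "age"}, column_mean)
# Two lookups, one computation: the second call is served from the cache.
```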

Technology Stack

Pydantic (serialization)
Provides type-safe configuration models and validation for DataContext and other core objects
SQLAlchemy (database)
Database abstraction layer for SQL datasources and stores
Pandas (compute)
Primary data manipulation engine for in-memory data processing
Apache Spark (compute)
Distributed computing engine for large-scale data validation
Jinja2 (framework)
Template engine for generating documentation and reports from validation results
Click (framework)
Command-line interface framework for GX CLI tools
Marshmallow (serialization)
Schema serialization and deserialization for configuration objects
Pytest (testing)
Testing framework for unit and integration tests across the codebase

Frequently Asked Questions

What is great_expectations used for?

great_expectations validates data quality through configurable expectations with multi-backend execution. great-expectations/great_expectations is an 8-component library written in Python. Data flows through 7 distinct pipeline stages, and the codebase contains 1728 files.

How is great_expectations architected?

great_expectations is organized into 6 architecture layers: Data Context Layer, Expectation Layer, Execution Engine Layer, Validator Layer, and 2 more. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through great_expectations?

Data moves through 7 stages: Initialize DataContext → Define Expectations → Load Data Batches → Execute Validations → Aggregate Results → Render Documentation → Persist Results. Users define expectations through DataContext, which manages datasources that load data into Batches. Validators execute expectations against batches using ExecutionEngines that compute metrics. Results flow to Renderers for documentation and Stores for persistence. Checkpoints orchestrate this pipeline for production workflows. This pipeline design reflects a complex multi-stage processing system.

What technologies does great_expectations use?

The core stack includes Pydantic (type-safe configuration models and validation for DataContext and other core objects), SQLAlchemy (database abstraction layer for SQL datasources and stores), Pandas (primary data manipulation engine for in-memory data processing), Apache Spark (distributed computing engine for large-scale data validation), Jinja2 (template engine for generating documentation and reports from validation results), Click (command-line interface framework for GX CLI tools), Marshmallow (schema serialization and deserialization for configuration objects), and Pytest (testing framework for unit and integration tests). A focused set of dependencies that keeps the build manageable.

What system dynamics does great_expectations have?

great_expectations exhibits 4 data pools (ExpectationStore, ValidationResultsStore, CheckpointStore, MetricStore), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle self-correction and convergence. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does great_expectations use?

5 design patterns detected: Plugin Architecture, Strategy Pattern, Builder Pattern, Template Method, Observer Pattern.

Analyzed on April 19, 2026 by CodeSea.