sharkdp/bat

A cat(1) clone with wings.

58,379 stars Rust 8 components

Displays file contents with syntax highlighting, line numbers, git integration, and automatic paging

Files enter through command-line arguments or stdin and are checked to determine their syntax. syntect's regex-based parser then tokenizes them into syntax scopes, which are formatted with ANSI color codes and decorations such as line numbers before being written either directly to the terminal or through a pager like less. Git integration runs alongside this pipeline to detect file modifications.

Under the hood, the system uses 2 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.

An 8-component CLI tool. 105 files analyzed. Data flows through 8 distinct pipeline stages.

How Data Flows Through the System

  1. Parse command arguments and load configuration — App::from_args() processes clap arguments, merges with config files and environment variables, validates options, and builds the Config struct with all display preferences
  2. Discover and validate input files — Input::new() processes file paths from arguments, handles glob expansion via wild crate, creates Input enum variants for files or stdin, validates file existence
  3. Load syntax highlighting assets — HighlightingAssets::from_binary() or from_cache() deserializes the SyntaxSet from compressed binary data embedded in the executable or cached files, loading ~200 language definitions
  4. Open and inspect file contents — InputReader::read_to_end() opens files, detects encoding using encoding_rs, removes BOM markers, uses content_inspector to check if file is binary or text [Input → OpenedInput]
  5. Map file to syntax definition — SyntaxMapping::get_syntax_for() examines file extension, shebang line, and filename against syntax definitions, applies user-configured mappings and glob patterns [OpenedInput → MappingTarget] (config: syntax_mapping)
  6. Apply syntax highlighting — Printer uses syntect's HighlightLines to tokenize text with regex patterns, assigns syntax scopes to text ranges, applies theme colors to create ANSI escape sequences [OpenedInput → OutputHandle] (config: theme, colored_output)
  7. Add line decorations and formatting — Printer adds line numbers, git modification markers (+ - ~), file headers, rule separators, handles line range filtering and text wrapping based on terminal width [OutputHandle → OutputHandle] (config: style_components, line_ranges, wrapping_mode)
  8. Output to terminal or pager — OutputType::stdout() writes directly to terminal, or PagerKind spawns external pager process (less/more) and pipes content, handles terminal size detection and color capability [OutputHandle] (config: paging_mode)
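Step 4's text-versus-binary check can be approximated in a few lines. The sketch below is a simplified stand-in, not content_inspector's actual code: it assumes that a leading UTF-8 BOM marks text and that a NUL byte within the first 1024 bytes marks binary data. The names ContentKind and inspect are hypothetical.

```rust
/// Hypothetical classification result for the sketch.
#[derive(Debug, PartialEq)]
pub enum ContentKind {
    Text,
    Binary,
}

const UTF8_BOM: &[u8] = &[0xEF, 0xBB, 0xBF];
/// Assumed sniffing window; only this many leading bytes are inspected.
const SNIFF_LEN: usize = 1024;

/// Returns the content kind plus the bytes with any UTF-8 BOM removed.
pub fn inspect(bytes: &[u8]) -> (ContentKind, &[u8]) {
    // A BOM is treated as proof of text; strip it so it is never printed.
    if let Some(rest) = bytes.strip_prefix(UTF8_BOM) {
        return (ContentKind::Text, rest);
    }
    // Otherwise look for a NUL byte near the start of the data.
    let window = &bytes[..bytes.len().min(SNIFF_LEN)];
    if window.contains(&0) {
        (ContentKind::Binary, bytes)
    } else {
        (ContentKind::Text, bytes)
    }
}
```

Only a bounded prefix is scanned, so the check stays cheap even for very large files.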

Data Models

The data structures that flow between stages — the contracts that hold the system together.

Input src/input.rs
input type built around an enum with variants File(PathBuf), Stdin, and Bytes(Vec<u8>), carrying metadata fields name: Option<String>, title: Option<String>, and kind: Option<String>
Created from command-line file paths or stdin, opened to get raw bytes with metadata, then consumed by the syntax highlighter
Config src/config/mod.rs
struct with colored_output: bool, paging_mode: PagingMode, style_components: StyleComponents, syntax_mapping: SyntaxMapping, theme: String, line_ranges: LineRanges, and ~20 other display options
Built from command-line args merged with config files, used throughout the pipeline to control syntax highlighting, decorations, and output format
SyntaxSet syntect crate
syntect::parsing::SyntaxSet containing compiled regex patterns and grammar rules for ~200 programming languages, loaded from binary assets or cache files
Deserialized lazily from compressed binary data, used to tokenize input text and assign syntax scopes to text ranges
OutputHandle src/output.rs
enum with variants: TerminalStdout(BufWriter<Stdout>), TerminalStderr, PagingWith(Box<dyn Write>), FmtWrite(&mut dyn fmt::Write)
Created based on terminal detection and paging mode, receives all formatted output from the printer, handles terminal colors and paging integration
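The point of an output-handle enum is that the printer writes through one interface whether the destination is a byte stream (terminal, pager stdin) or an in-memory String. A minimal sketch with only two variants, using hypothetical names IoWrite and FmtWrite for the byte-stream and fmt::Write cases; the enum described above has more variants.

```rust
use std::fmt::{self, Write as _};
use std::io::{self, Write as _};

/// Simplified two-variant handle (assumption: the real type has more variants).
pub enum OutputHandle<'a> {
    IoWrite(&'a mut dyn io::Write),
    FmtWrite(&'a mut dyn fmt::Write),
}

impl OutputHandle<'_> {
    /// Writes formatted text to whichever sink this handle wraps.
    /// Defining write_fmt lets the standard write! macro target the handle.
    pub fn write_fmt(&mut self, args: fmt::Arguments<'_>) -> io::Result<()> {
        match self {
            OutputHandle::IoWrite(w) => w.write_fmt(args),
            // fmt::Error carries no details, so wrap it in a generic io::Error.
            OutputHandle::FmtWrite(w) => w
                .write_fmt(args)
                .map_err(|_| io::Error::new(io::ErrorKind::Other, "fmt::Write failed")),
        }
    }
}
```

Because every write goes through one method, the printing code does not need to know whether output ends up on a terminal, in a pager, or in a String.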
HighlightedLineRanges src/line_range.rs
wrapper around Vec<LineRange> where LineRange has start: usize, end: usize, plus methods for intersection testing
Parsed from --highlight-line arguments and used to mark which lines get highlighted during output formatting (the separate --line-range option, parsed into LineRanges, controls which lines are displayed)
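A sketch of the range type with inclusive bounds and an intersection test. It assumes 1-based, inclusive line numbers (so 3:5 covers lines 3, 4, and 5); the LineRanges wrapper here is simplified and omits any bookkeeping a real implementation might add.

```rust
/// Inclusive, 1-based line range (assumed convention for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LineRange {
    pub start: usize,
    pub end: usize,
}

impl LineRange {
    /// True if `line` falls inside the range, bounds included.
    pub fn contains(&self, line: usize) -> bool {
        self.start <= line && line <= self.end
    }

    /// True if the two inclusive ranges share at least one line.
    pub fn intersects(&self, other: &LineRange) -> bool {
        self.start <= other.end && other.start <= self.end
    }
}

/// Wrapper over several ranges; a line is selected if any range holds it.
pub struct LineRanges(pub Vec<LineRange>);

impl LineRanges {
    pub fn contains(&self, line: usize) -> bool {
        self.0.iter().any(|r| r.contains(line))
    }
}
```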

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Environment unguarded

The COLORTERM environment variable, if present, is assumed to contain a plain string such as 'truecolor' or '24bit'; there is no explicit handling for malformed or non-UTF-8 values. Note that std::env::var() does not panic on such input: it returns Err(VarError::NotUnicode), so a crash is only possible if that error is unwrapped rather than handled.

If this fails: If COLORTERM contains invalid UTF-8 bytes (possible with custom shell environments or CI systems) and the lookup error is unwrapped, the process panics instead of gracefully falling back to 8-bit (256-color) output

src/bin/bat/app.rs:is_truecolor_terminal
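A minimal sketch of a COLORTERM check that cannot panic, assuming the accepted values are exactly the 'truecolor' and '24bit' strings named above. The helper truecolor_from is hypothetical; it takes the Result from std::env::var so the logic can be tested without modifying the process environment.

```rust
use std::env::{self, VarError};

/// Pure helper: any error (variable unset or not valid UTF-8) means
/// "not a truecolor terminal", so callers fall back instead of panicking.
pub fn truecolor_from(value: Result<String, VarError>) -> bool {
    match value {
        Ok(v) => v == "truecolor" || v == "24bit",
        Err(_) => false,
    }
}

/// env::var returns Err(VarError::NotUnicode) for invalid UTF-8 rather
/// than panicking, and the error is handled above, so this never panics.
pub fn is_truecolor_terminal() -> bool {
    truecolor_from(env::var("COLORTERM"))
}
```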
critical Domain weakly guarded

The serialized Vec<u8> data in LazyTheme always contains valid syntect Theme data that can be deserialized, but never validates the binary format or version compatibility

If this fails: If embedded theme data becomes corrupted or was built with an incompatible syntect version, deserialization silently fails and users get no syntax highlighting without clear error messages

src/assets/lazy_theme_set.rs:LazyTheme::deserialize
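One way to make corrupt theme data fail loudly is to cache the deserialization outcome, error included, and return it to the caller. The sketch below is hypothetical: it uses std::sync::OnceLock for the lazy step and treats the bytes as a UTF-8 theme name in place of real syntect/bincode deserialization, and Theme is a placeholder struct.

```rust
use std::sync::OnceLock;

/// Placeholder for syntect's Theme (hypothetical).
#[derive(Debug, PartialEq)]
pub struct Theme {
    pub name: String,
}

/// Holds serialized bytes and deserializes them on first access,
/// caching the outcome (success or error) for later calls.
pub struct LazyTheme {
    serialized: Vec<u8>,
    deserialized: OnceLock<Result<Theme, String>>,
}

impl LazyTheme {
    pub fn new(serialized: Vec<u8>) -> Self {
        LazyTheme { serialized, deserialized: OnceLock::new() }
    }

    /// Hypothetical format: the theme name as UTF-8. Corrupt data yields an
    /// explicit error instead of silently producing no highlighting.
    pub fn get(&self) -> Result<&Theme, &str> {
        self.deserialized
            .get_or_init(|| {
                String::from_utf8(self.serialized.clone())
                    .map(|name| Theme { name })
                    .map_err(|e| format!("corrupt theme data: {e}"))
            })
            .as_ref()
            .map_err(|e| e.as_str())
    }
}
```

Returning the error to the caller lets the CLI print a message naming the bad theme rather than quietly dropping highlighting.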
critical Resource unguarded

The embedded SyntaxSet binary data (serialized_syntax_set) fits in available memory when deserialized, typically around 50MB+ for ~200 language definitions, but never checks available memory

If this fails: On memory-constrained systems (containers, embedded devices), deserialization causes OOM kills without warning, especially when multiple bat processes run simultaneously

src/assets.rs:HighlightingAssets
warning Scale weakly guarded

The system's available_parallelism() returns a reasonable number of threads (1-128) but never validates or caps this value

If this fails: On misconfigured systems or VMs that report thousands of CPUs, bat could spawn excessive threads causing resource exhaustion and system instability

src/bin/bat/app.rs:available_parallelism
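A guard for this assumption only needs to clamp the reported value. std::thread::available_parallelism() returns io::Result<NonZeroUsize>; the sketch falls back to 1 on error and caps the result at a hypothetical MAX_THREADS of 64, chosen arbitrarily for illustration.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical upper bound on worker threads for this sketch.
pub const MAX_THREADS: usize = 64;

/// Clamps a reported CPU count into 1..=MAX_THREADS; None falls back to 1.
pub fn capped_threads(reported: Option<NonZeroUsize>) -> usize {
    reported.map_or(1, NonZeroUsize::get).min(MAX_THREADS)
}

/// Queries the OS and applies the cap; lookup errors are treated as 1 CPU.
pub fn worker_count() -> usize {
    capped_threads(thread::available_parallelism().ok())
}
```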
warning Contract unguarded

ANSI escape sequences in test files always use the exact format ESC[38;2;R;G;Bm for RGB foreground colors and ESC[0m for reset, but the syntax tests don't validate that actual output matches these precise byte sequences

If this fails: When syntect changes its ANSI encoding format or color precision, tests may pass visually but break tools that parse bat's output expecting exact escape sequence formats

tests/syntax-tests/highlighted/Go/main.go
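The exact byte sequences are easy to pin down in code. This sketch builds the SGR truecolor foreground sequence ESC[38;2;R;G;Bm and the ESC[0m reset; fg_truecolor, paint, and RESET are hypothetical names, and asserting on their output byte for byte is the kind of check the syntax tests are described as lacking.

```rust
/// Builds an SGR truecolor foreground sequence: ESC [ 38;2;R;G;B m.
pub fn fg_truecolor(r: u8, g: u8, b: u8) -> String {
    format!("\x1b[38;2;{};{};{}m", r, g, b)
}

/// SGR reset sequence: ESC [ 0 m.
pub const RESET: &str = "\x1b[0m";

/// Wraps text in a foreground color followed by a reset.
pub fn paint(text: &str, rgb: (u8, u8, u8)) -> String {
    format!("{}{}{}", fg_truecolor(rgb.0, rgb.1, rgb.2), text, RESET)
}
```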
warning Environment weakly guarded

The NO_COLOR check is assumed to follow the standard, under which colors are disabled only when the variable is present and not an empty string. env::var_os() returns Some("") for NO_COLOR='', so the check must test for a non-empty value rather than mere presence

If this fails: If the check regresses to a presence-only test such as is_some(), setting NO_COLOR='' (empty string) disables colors, contrary to the NO_COLOR standard, which reserves that behavior for non-empty values

src/bin/bat/app.rs:env_no_color
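A sketch of a standard-conforming check, assuming the current no-color.org wording (color is disabled only when NO_COLOR is present and non-empty). The helper no_color_from is hypothetical and takes the Option<OsString> from env::var_os so it can be tested directly.

```rust
use std::env;
use std::ffi::OsString;

/// Colors are disabled only when NO_COLOR is set to a non-empty value;
/// an unset or empty variable leaves colors enabled.
pub fn no_color_from(value: Option<OsString>) -> bool {
    value.map_or(false, |v| !v.is_empty())
}

/// var_os avoids any UTF-8 requirement: NO_COLOR's value is never inspected.
pub fn env_no_color() -> bool {
    no_color_from(env::var_os("NO_COLOR"))
}
```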
warning Temporal weakly guarded

Asset cache validation only checks bat version strings for compatibility but never validates the actual syntect crate version used to build cached syntax definitions

If this fails: When users upgrade bat but syntect's binary format changes between versions, cached assets become incompatible causing silent syntax highlighting failures or crashes

src/assets/assets_metadata.rs
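Closing this gap would mean recording the syntect version alongside the bat version when the cache is built and comparing both on load. The struct below is hypothetical: the syntect_version field is the missing check, not something the source says exists, and exact string equality stands in for whatever compatibility rule a real implementation would use.

```rust
/// Hypothetical metadata stored next to cached assets.
#[derive(Debug, Clone, PartialEq)]
pub struct AssetsMetadata {
    pub bat_version: String,
    /// The extra field this assumption says is missing today.
    pub syntect_version: String,
}

impl AssetsMetadata {
    /// The cache is treated as usable only if both versions match the
    /// running binary; otherwise the caller should rebuild or ignore it.
    pub fn is_compatible_with(&self, bat_version: &str, syntect_version: &str) -> bool {
        self.bat_version == bat_version && self.syntect_version == syntect_version
    }
}
```

With both versions checked, upgrading either bat or its syntect dependency invalidates the cache instead of risking a silent deserialization failure.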
info Domain unguarded

The Go source example hardcodes StdSizes{8, 8} assuming 64-bit architecture (8-byte pointers and int64) but bat's syntax highlighting has no knowledge of target architecture

If this fails: When highlighting Go code that uses different architecture assumptions, the visual syntax highlighting remains correct but semantic analysis tools consuming bat's output might misinterpret size calculations

tests/syntax-tests/source/Go/main.go:sizeof
info Contract weakly guarded

File extension mappings in SyntaxMapping always resolve to existing syntax definitions in the loaded SyntaxSet, but there's no validation that referenced syntaxes are actually available

If this fails: Custom syntax mappings or corrupted syntax sets can map file extensions to non-existent syntaxes, causing bat to fall back to plain text without informing users why their custom mappings failed

src/assets.rs:get_syntax_for
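A validation pass can compare every mapping target against the names in the loaded syntax set and report the misses, so custom mappings (for example bat's --map-syntax '*.conf:INI') fail visibly. The function below is a hypothetical sketch that models the mappings as a glob-to-name HashMap and the loaded syntaxes as a HashSet of names.

```rust
use std::collections::{HashMap, HashSet};

/// Returns (pattern, syntax) pairs whose target syntax is not loaded,
/// so a caller can warn instead of silently falling back to plain text.
pub fn unknown_targets<'a>(
    mappings: &'a HashMap<String, String>, // glob or extension -> syntax name
    available: &HashSet<String>,           // names in the loaded syntax set
) -> Vec<(&'a str, &'a str)> {
    let mut missing: Vec<(&str, &str)> = mappings
        .iter()
        .filter(|(_, syntax)| !available.contains(syntax.as_str()))
        .map(|(pattern, syntax)| (pattern.as_str(), syntax.as_str()))
        .collect();
    missing.sort(); // HashMap order is arbitrary; sort for stable output
    missing
}
```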
info Ordering unguarded

The PrettyPrinter builder pattern expects methods to be called in a logical order (input before language before print) but doesn't enforce ordering dependencies

If this fails: Calling print() before setting input or language() after print() leads to confusing runtime errors instead of clear API misuse messages, making library integration more difficult

src/lib.rs:PrettyPrinter
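The ordering problem can be moved to compile time with a typestate builder. This is a hypothetical design, not bat's PrettyPrinter API: Printer, NoInput, and HasInput are illustrative names, and print() exists only after an input has been added. Because print() takes self by value, calling language() after it does not compile.

```rust
use std::marker::PhantomData;

/// Marker types for the builder's state (hypothetical design).
pub struct NoInput;
pub struct HasInput;

pub struct Printer<State> {
    inputs: Vec<String>,
    language: Option<String>,
    _state: PhantomData<State>,
}

impl Printer<NoInput> {
    pub fn new() -> Self {
        Printer { inputs: Vec::new(), language: None, _state: PhantomData }
    }
}

impl<S> Printer<S> {
    /// Adding an input moves the builder into the HasInput state.
    pub fn input(mut self, path: &str) -> Printer<HasInput> {
        self.inputs.push(path.to_string());
        Printer { inputs: self.inputs, language: self.language, _state: PhantomData }
    }

    /// Language may be set in either state.
    pub fn language(mut self, lang: &str) -> Self {
        self.language = Some(lang.to_string());
        self
    }
}

impl Printer<HasInput> {
    /// Only available once an input exists; consumes the builder.
    pub fn print(self) -> String {
        let lang = self.language.as_deref().unwrap_or("auto-detected");
        format!("{} file(s) as {}", self.inputs.len(), lang)
    }
}
```

The trade-off is a more complex public type signature in exchange for misuse being reported by the compiler instead of at runtime.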

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Embedded syntax assets (in-memory)
Compressed binary data containing ~200 syntax definitions and themes, embedded at compile time from bat's bundled collection of Sublime Text syntaxes and themes
User asset cache (file-store)
Binary cache files storing user-compiled syntax definitions and themes, built via 'bat cache --build' command
Configuration sources (file-store)
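A plain-text config file whose lines hold default command-line flags (lines starting with # are comments), plus environment variables such as BAT_THEME and BAT_PAGER, that override the built-in default display options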
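bat's config file holds ordinary command-line flags rather than a structured format. The sketch below assumes one or more whitespace-separated flags per line and '#' comment lines, and does not handle shell-style quoting; args_from_config and merged_args are hypothetical names. Config-file flags are placed before the real arguments on the assumption that the argument parser lets later flags win.

```rust
/// Turns config-file text into argument tokens. Blank lines and lines
/// starting with '#' are skipped. Simplification: splits on whitespace
/// and does not understand shell quoting.
pub fn args_from_config(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .flat_map(|line| line.split_whitespace().map(String::from))
        .collect()
}

/// Config-file flags come first so that, assuming last-flag-wins parsing,
/// explicit command-line flags override them.
pub fn merged_args(config_text: &str, cli: &[&str]) -> Vec<String> {
    let mut args = args_from_config(config_text);
    args.extend(cli.iter().map(|s| s.to_string()));
    args
}
```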

Technology Stack

syntect (library)
Provides syntax highlighting engine with regex-based parsing and theme application using Sublime Text grammar files
clap (library)
Handles command-line argument parsing with subcommands, validation, help generation, and shell completion support
git2 (library)
Integrates with git repositories to detect file modifications and show status indicators in the left margin
console (library)
Provides terminal capability detection, ANSI escape sequence handling, and cross-platform terminal interaction
encoding_rs (library)
Handles text encoding detection and conversion for files that aren't UTF-8, including BOM processing
minus (library)
Pure Rust pager implementation used as fallback when system pagers (less/more) aren't available
bincode (serialization)
Serializes and deserializes syntax definitions and themes to/from compact binary format for asset caching
globset (library)
Provides glob pattern matching for syntax mapping configuration and file filtering

Frequently Asked Questions

What is bat used for?

Displays file contents with syntax highlighting, line numbers, git integration, and automatic paging. sharkdp/bat is an 8-component CLI tool written in Rust. Data flows through 8 distinct pipeline stages. The codebase contains 105 files.

How is bat architected?

bat is organized into 4 architecture layers: CLI Application, Core Library, Asset Management, Output Processing. Data flows through 8 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through bat?

Data moves through 8 stages: Parse command arguments and load configuration → Discover and validate input files → Load syntax highlighting assets → Open and inspect file contents → Map file to syntax definition → .... Input from files or stdin is tokenized by syntect into syntax scopes, formatted with ANSI colors and decorations such as line numbers, and written to the terminal or a pager, while git integration supplies per-line modification markers.

What technologies does bat use?

The core stack includes syntect (Provides syntax highlighting engine with regex-based parsing and theme application using Sublime Text grammar files), clap (Handles command-line argument parsing with subcommands, validation, help generation, and shell completion support), git2 (Integrates with git repositories to detect file modifications and show status indicators in the left margin), console (Provides terminal capability detection, ANSI escape sequence handling, and cross-platform terminal interaction), encoding_rs (Handles text encoding detection and conversion for files that aren't UTF-8, including BOM processing), minus (Pure Rust pager implementation used as fallback when system pagers (less/more) aren't available), and 2 more. A focused set of dependencies that keeps the build manageable.

What system dynamics does bat have?

bat exhibits 3 data pools (Embedded syntax assets, User asset cache, Configuration sources), 2 feedback loops, 4 control points, and 3 delays. The feedback loops handle polling and cache invalidation. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does bat use?

4 design patterns detected: Lazy Asset Loading, Builder API, Multi-source Configuration, Input Abstraction.

How does bat compare to alternatives?

CodeSea has side-by-side architecture comparisons of bat with fd. These comparisons show tech stack differences, pipeline design, system behavior, and code patterns; see the comparison pages on CodeSea for detailed analysis.

Analyzed on April 20, 2026 by CodeSea. Written by .