sharkdp/fd
A simple, fast and user-friendly alternative to 'find'
Searches filesystem paths using regex patterns with parallel traversal
User arguments flow through clap parsing into a Config struct containing compiled regex patterns and filter settings. The walk() function spawns parallel threads that traverse directories using the ignore crate, respecting gitignore rules and sending discovered paths as WorkerResult enums through crossbeam channels. Each path flows through a filter pipeline checking regex patterns, file types, size, time, and ownership constraints. Surviving paths are either formatted with colors and templates for stdout output, or used to execute user commands with placeholder substitution.
Under the hood, the system uses 3 data pools and 5 control points to manage its runtime behavior.
A 9-component CLI tool. 23 files analyzed. Data flows through 5 distinct pipeline stages.
How Data Flows Through the System
- Parse CLI arguments into configuration — Opts::parse() uses clap to validate command-line arguments, then the run() function builds the Config struct by compiling regex patterns with RegexBuilder, creating file type filters from --type options, and parsing size and time constraints
- Walk filesystem in parallel threads — walk() function creates ignore::WalkParallel with configured thread count, applies gitignore and fdignore rules, spawns worker threads that traverse directory trees and send WorkerResult::Entry or WorkerResult::Error through crossbeam channels [Config → WorkerResult] (config: threads, ignore_hidden, read_vcsignore)
- Filter paths through constraint pipeline — Each WorkerResult flows through pattern.is_match() for regex matching, file_types.should_ignore() for type filtering, size_limits.filter() for size constraints, time_constraints.apply() for modification time bounds, and owner_filter.matches() for Unix ownership validation [WorkerResult → DirEntry] (config: pattern, file_types, size_limits +1)
- Execute commands on matched paths — If --exec or --exec-batch specified, CommandSet.execute() substitutes path placeholders in command templates using FormatTemplate.generate(), spawns processes with std::process::Command, captures output and merges exit codes [DirEntry → Command output] (config: command, path_separator, null_separator)
- Format output with colors and templates — print_entry() applies LsColors styling based on file types, substitutes FormatTemplate placeholders with actual path components using basename/dirname functions, generates terminal hyperlinks if enabled, and writes with null or newline separation [DirEntry → Format output] (config: format, ls_colors, hyperlink +1)
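The walk, send, and filter stages above can be sketched with the standard library alone. This is a simplified illustration, not fd's code. It makes three assumptions: `std::sync::mpsc` stands in for the crossbeam channel, a substring check stands in for the compiled regex, and `WorkerResult` and `collect_matches` are hypothetical, reduced versions of the real types.

```rust
use std::sync::mpsc;
use std::thread;

/// Simplified stand-in for the walker's result type (hypothetical shape).
enum WorkerResult {
    Entry(String),
    Error(String),
}

/// Spawns one worker thread per chunk of paths, sends every path through a
/// channel, and returns the entries containing `needle` plus collected errors.
fn collect_matches(chunks: Vec<Vec<String>>, needle: &str) -> (Vec<String>, Vec<String>) {
    let (tx, rx) = mpsc::channel::<WorkerResult>();

    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            let tx = tx.clone();
            thread::spawn(move || {
                for path in chunk {
                    // Assumption for the sketch: an empty path models an access failure.
                    let msg = if path.is_empty() {
                        WorkerResult::Error("unreadable entry".to_string())
                    } else {
                        WorkerResult::Entry(path)
                    };
                    // The receiver outlives all workers, so sending cannot fail here.
                    tx.send(msg).expect("receiver dropped");
                }
            })
        })
        .collect();

    // Drop the original sender so the receive loop ends once all workers finish.
    drop(tx);

    let mut matches = Vec::new();
    let mut errors = Vec::new();
    for result in rx {
        match result {
            WorkerResult::Entry(path) if path.contains(needle) => matches.push(path),
            WorkerResult::Entry(_) => {} // filtered out
            WorkerResult::Error(msg) => errors.push(msg),
        }
    }

    for handle in handles {
        handle.join().expect("worker panicked");
    }

    // Arrival order depends on thread scheduling, so sort for a stable result.
    matches.sort();
    (matches, errors)
}
```

Dropping the original sender before the receive loop is what lets the consumer terminate: the loop ends only when every sender clone has gone out of scope.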
Data Models
The data structures that flow between stages — the contracts that hold the system together.
Config (src/config.rs): struct with case_sensitive: bool, full_path_base: Option<PathBuf>, ignore_hidden: bool, read_fdignore: bool, pattern: Arc<regex::bytes::Regex>, ls_colors: Option<LsColors>, format: Option<FormatTemplate>, threads: usize, null_separator: bool, and many more search configuration fields
Built from CLI arguments in main(), passed immutably to all processing components, contains compiled regex patterns and filter configurations
DirEntry (src/dir_entry.rs): struct wrapping ignore::DirEntry or PathBuf for broken symlinks, with cached metadata: OnceCell<Option<Metadata>> and style: OnceCell<Option<Style>>
Created during directory traversal, passed through filter chain, consumed by output formatter or command executor
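The lazy metadata cache described above can be sketched with `std::cell::OnceCell`. The `LazyEntry` type below is hypothetical and much smaller than the real DirEntry, which wraps `ignore::DirEntry`. It only shows the caching pattern.

```rust
use std::cell::OnceCell;
use std::fs::{self, Metadata};
use std::path::{Path, PathBuf};

/// Hypothetical, reduced entry type used only to show the caching pattern.
struct LazyEntry {
    path: PathBuf,
    metadata: OnceCell<Option<Metadata>>,
}

impl LazyEntry {
    fn new(path: impl Into<PathBuf>) -> Self {
        Self {
            path: path.into(),
            metadata: OnceCell::new(),
        }
    }

    fn path(&self) -> &Path {
        &self.path
    }

    /// Queries the filesystem on the first call only; later calls reuse the
    /// cached value. A failed lookup is cached as `None` and is not retried.
    fn metadata(&self) -> Option<&Metadata> {
        self.metadata
            .get_or_init(|| fs::metadata(&self.path).ok())
            .as_ref()
    }
}
```

Storing `Option<Metadata>` inside the cell means a filter that asks for metadata several times (size, then time, then owner) pays for at most one system call per entry.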
FormatTemplate (src/fmt/mod.rs): enum with Tokens(Vec<Token>) containing placeholders like Basename, Parent, NoExt, or Text(String) for fixed output
Parsed from user format string, applied to each DirEntry to generate final output paths with substitutions
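A minimal sketch of how a token list could be applied to a path, using only the variant names mentioned above. The `generate` function here is an illustrative assumption built on `std::path`, not the crate's implementation, and it ignores edge cases such as paths without a parent.

```rust
use std::path::Path;

/// Token variants named in the text; the real enum has more.
enum Token {
    Basename,
    Parent,
    NoExt,
    Text(String),
}

/// Concatenates the expansion of each token for the given path.
fn generate(tokens: &[Token], path: &Path) -> String {
    let mut out = String::new();
    for token in tokens {
        match token {
            // Final path component, e.g. "mod.rs".
            Token::Basename => {
                if let Some(name) = path.file_name() {
                    out.push_str(&name.to_string_lossy());
                }
            }
            // Everything before the final component, e.g. "src/fmt".
            Token::Parent => {
                if let Some(parent) = path.parent() {
                    out.push_str(&parent.to_string_lossy());
                }
            }
            // Full path with the extension removed, e.g. "src/fmt/mod".
            Token::NoExt => out.push_str(&path.with_extension("").to_string_lossy()),
            // Fixed text is copied through unchanged.
            Token::Text(text) => out.push_str(text),
        }
    }
    out
}
```

Parsing the format string once into tokens keeps the per-path work to a simple loop, which matters when the template is applied to every search result.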
CommandSet (src/exec/mod.rs): struct with mode: ExecutionMode (OneByOne or Batch) and commands: Vec<CommandTemplate> containing parsed command templates with placeholders
Built from --exec or --exec-batch arguments, validates placeholder usage, executes system commands with path substitution
WorkerResult (src/walk.rs): enum with Entry(DirEntry) for successful path discovery or Error(ignore::Error) for filesystem access failures
Sent from walker threads through crossbeam channels, processed by filter pipeline or error handlers
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Command template parsing assumes arguments are valid UTF-8 when converted with as_ref() and never validates the encoding, so it treats all filesystem paths and command strings as valid Unicode
If this fails: If a filesystem path contains invalid UTF-8 bytes (common on Unix systems), command template parsing silently produces malformed strings or panics during command substitution
src/exec/mod.rs:CommandTemplate::new
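To illustrate the encoding concern, the Unix-only sketch below shows how the standard library reports a byte sequence that is not valid UTF-8. The `describe` helper is hypothetical and is not part of fd.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;

/// Returns (true, text) when the bytes are valid UTF-8, otherwise
/// (false, lossy text) with each invalid sequence replaced by U+FFFD.
fn describe(raw: &[u8]) -> (bool, String) {
    let os = OsStr::from_bytes(raw);
    match os.to_str() {
        Some(text) => (true, text.to_string()),
        None => (false, os.to_string_lossy().into_owned()),
    }
}
```

Code that must pass such names to a child process unchanged would keep them as `OsStr`/`OsString` rather than converting to `String`, since the lossy form no longer names the original file.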
Batch commands assume the first argument (args[0]) is always a valid executable path but only checks has_tokens() - never validates the executable exists or is executable
If this fails: Batch mode will spawn processes that immediately fail with 'command not found' errors, but validation happens at execution time rather than argument parsing
src/exec/mod.rs:CommandSet::new_batch
Format string parsing assumes '{' and '}' characters have equal UTF-8 byte lengths (BRACE_LEN constant) but this is only true for ASCII braces
If this fails: If format strings somehow contain Unicode lookalike brace characters, string slicing will panic with 'byte index not on char boundary' errors
src/fmt/mod.rs:FormatTemplate::parse
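The byte-length assumption can be made concrete with a short sketch. `BRACE_LEN` mirrors the constant named above. The `inner` helper is hypothetical, and it uses `str::get` so that an unexpected character boundary yields `None` rather than a panic.

```rust
/// Byte length of an ASCII brace in UTF-8.
const BRACE_LEN: usize = '{'.len_utf8();

/// Returns the text between a leading '{' and a trailing '}', or None
/// when the token is not wrapped in ASCII braces.
fn inner(token: &str) -> Option<&str> {
    if token.len() < 2 * BRACE_LEN || !token.starts_with('{') || !token.ends_with('}') {
        return None;
    }
    // `get` returns None instead of panicking on a non-boundary index.
    token.get(BRACE_LEN..token.len() - BRACE_LEN)
}
```

Fullwidth lookalikes such as U+FF5B and U+FF5D occupy three bytes each in UTF-8, so a fixed one-byte offset would not land on a character boundary for them. Here they are rejected by the prefix and suffix checks before any slicing happens.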
Test environment assumes CARGO_BIN_EXE_fd environment variable points to a valid executable file but never validates the file exists or is executable
If this fails: Integration tests fail with obscure 'No such file or directory' errors if the environment variable points to a non-existent binary or build artifacts are corrupted
src/main.rs:find_fd_exe (tests)
Jemalloc allocator selection uses compile-time feature flags and platform detection but assumes the target system has sufficient virtual memory for jemalloc's memory mapping strategy
If this fails: On memory-constrained systems or containers with strict memory limits, jemalloc may fail to allocate large virtual memory regions, causing fd to crash with out-of-memory errors where the system allocator would succeed
src/main.rs:jemalloc configuration
Config assumes compiled regex patterns in Arc<Regex> remain valid for the entire program lifetime but never validates that regex compilation succeeded or handles regex engine limits
If this fails: If regex patterns exceed internal complexity limits or contain constructs that cause compilation to fail after Config creation, shared regex access across threads produces undefined behavior or panics
src/config.rs:Config struct
Owner filter parsing assumes Unix user/group name resolution will always succeed for valid names but user/group databases can be unavailable or inconsistent
If this fails: When /etc/passwd or LDAP is unavailable, or when running in containers with different user namespaces, owner filters fail with unclear error messages instead of gracefully degrading
src/filter/owner.rs:OwnerFilter::from_string
Size filter uses hardcoded multiplier constants (TERA = 1000^4) that assume file sizes fit in u64, which caps representable sizes at u64::MAX bytes (16 EiB, about 18.4 EB); 128-bit filesystems or future storage could exceed this limit
If this fails: Files larger than u64::MAX bytes cause size filter arithmetic to overflow silently, producing incorrect size comparisons for very large files
src/filter/size.rs:SizeFilter constants
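A hedged sketch of size parsing that guards the multiplication with `checked_mul`. It assumes fd's command-line convention of a `+` prefix for a minimum size and a `-` prefix for a maximum, and it supports only a subset of unit suffixes. `SizeLimit` and `parse_size` are hypothetical names, not the crate's API.

```rust
#[derive(Debug, PartialEq)]
enum SizeLimit {
    Min(u64),
    Max(u64),
}

const KILO: u64 = 1000;
const KIBI: u64 = 1024;

/// Parses specs such as "+1M" or "-500k"; returns None on malformed input
/// or when the byte count does not fit in u64.
fn parse_size(spec: &str) -> Option<SizeLimit> {
    let (is_min, rest) = if let Some(rest) = spec.strip_prefix('+') {
        (true, rest)
    } else if let Some(rest) = spec.strip_prefix('-') {
        (false, rest)
    } else {
        return None;
    };

    // Split the leading digits from the unit suffix.
    let digits_end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    let (number, unit) = rest.split_at(digits_end);
    let number: u64 = number.parse().ok()?;

    let multiplier = match unit.to_ascii_lowercase().as_str() {
        "b" => 1,
        "k" => KILO,
        "m" => KILO.pow(2),
        "g" => KILO.pow(3),
        "t" => KILO.pow(4),
        "ki" => KIBI,
        "mi" => KIBI.pow(2),
        "gi" => KIBI.pow(3),
        "ti" => KIBI.pow(4),
        _ => return None,
    };

    // checked_mul turns an overflow into None instead of wrapping silently.
    let bytes = number.checked_mul(multiplier)?;
    Some(if is_min {
        SizeLimit::Min(bytes)
    } else {
        SizeLimit::Max(bytes)
    })
}

impl SizeLimit {
    /// True when a file of `size` bytes satisfies the limit.
    fn is_within(&self, size: u64) -> bool {
        match *self {
            SizeLimit::Min(limit) => size >= limit,
            SizeLimit::Max(limit) => size <= limit,
        }
    }
}
```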
Time parsing assumes system timezone database is available and consistent but never handles timezone data corruption or missing zoneinfo files
If this fails: On systems with corrupted tzdata or in containers without timezone information, time filter parsing panics or produces incorrect timestamp comparisons, making time-based searches unreliable
src/filter/time.rs:TimeFilter::from_str
Terminal hyperlink generation assumes stdout is connected to a terminal that supports OSC 8 escape sequences but never validates terminal capabilities
If this fails: When output is redirected to files or piped to programs that don't handle escape sequences, hyperlink codes appear as literal garbage text in the output stream
src/output.rs:print_entry
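For reference, an OSC 8 hyperlink wraps the visible text between an opening sequence carrying the URL and a closing sequence with an empty URL. The sketch below pairs that format with the standard library's terminal check. The function names are illustrative assumptions and do not describe how print_entry is written.

```rust
use std::io::{self, IsTerminal};

/// Wraps `text` in an OSC 8 hyperlink pointing at `url`.
/// Layout: ESC ] 8 ; ; URL ESC \ TEXT ESC ] 8 ; ; ESC \
fn hyperlink(url: &str, text: &str) -> String {
    format!("\x1b]8;;{}\x1b\\{}\x1b]8;;\x1b\\", url, text)
}

/// Emits the escape sequences only when `enabled`; otherwise plain text.
fn render(url: &str, text: &str, enabled: bool) -> String {
    if enabled {
        hyperlink(url, text)
    } else {
        text.to_string()
    }
}

/// True when stdout is attached to a terminal rather than a file or pipe.
fn stdout_is_terminal() -> bool {
    io::stdout().is_terminal()
}
```

A caller could pass `stdout_is_terminal()` as the `enabled` argument so that redirected or piped output stays free of escape codes. This confirms only that a terminal is attached, not that it understands OSC 8.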
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
Arc-wrapped compiled Regex objects shared across worker threads to avoid recompilation overhead during path matching
Crossbeam channels that buffer discovered filesystem paths from parallel walker threads before filter processing
OnceCell containers in DirEntry that lazily cache filesystem metadata and color styling to avoid redundant system calls
Delays
- Filesystem traversal (async-processing) — Worker threads block on filesystem I/O while reading directory contents and file metadata, with parallelism providing throughput despite individual blocking operations
- Command execution (async-processing) — Each spawned command process blocks until completion, with optional output buffering when running multiple threads to prevent interleaved output
- Regex compilation (compilation) — Pattern compilation happens once at startup using RegexBuilder, then cached in Arc for shared access across worker threads
Control Points
- Thread count (runtime-toggle) — Controls: Number of parallel worker threads for filesystem traversal, defaults to number of CPU cores. Default: num_cpus::get()
- Case sensitivity (feature-flag) — Controls: Whether regex pattern matching is case sensitive or uses smart-case detection based on uppercase characters in pattern. Default: smart case (auto-detect)
- Ignore file handling (feature-flag) — Controls: Whether to respect .gitignore, .fdignore, and other VCS ignore files during directory traversal. Default: enabled
- Color output (env-var) — Controls: When to apply ANSI color codes using LsColors - always, never, or auto-detect based on terminal capabilities. Default: auto (isatty detection)
- Memory allocator (architecture-switch) — Controls: Whether to use jemalloc for better performance on supported platforms, with platform-specific conditional compilation. Default: jemalloc on Linux, system allocator on macOS/Windows
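The smart-case control point above can be illustrated with a small sketch. It is a simplification: the uppercase check here looks at raw pattern characters, so a regex escape such as `\S` would count as uppercase, whereas a real implementation would inspect the parsed pattern. `CaseMode` and `is_case_sensitive` are hypothetical names.

```rust
/// How the case-sensitivity decision is made (hypothetical enum).
#[derive(Clone, Copy)]
enum CaseMode {
    Sensitive,
    Insensitive,
    Smart,
}

/// Smart mode is case sensitive only if the pattern has an uppercase character.
fn is_case_sensitive(pattern: &str, mode: CaseMode) -> bool {
    match mode {
        CaseMode::Sensitive => true,
        CaseMode::Insensitive => false,
        CaseMode::Smart => pattern.chars().any(char::is_uppercase),
    }
}
```

With this rule, a search for `readme` matches `README.md`, while a search for `README` matches only the uppercase spelling.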
Technology Stack
ignore: Provides parallel directory traversal with built-in gitignore and VCS ignore file parsing, handling all the complexity of filesystem walking and ignore rule application
regex: Compiles user search patterns into optimized finite automata for fast path matching during filesystem traversal
clap: Parses command-line arguments with automatic help generation, argument validation, and shell completion support using derive macros
crossbeam-channel: Provides high-performance multi-producer multi-consumer channels for passing discovered filesystem paths between walker threads and filter processing
lscolors: Interprets the LS_COLORS environment variable and applies appropriate ANSI color codes based on file types and extensions
tikv-jemallocator: Replaces the system allocator with jemalloc on supported platforms for better memory allocation performance during intensive filesystem operations
aho-corasick: Efficiently matches multiple placeholder patterns simultaneously in format string parsing without backtracking
jiff: Parses time expressions and performs date arithmetic for time-based filtering constraints like 'modified within last week'
Key Components
- Opts (parser) — Uses clap derive macros to parse command-line arguments, validate option combinations, and provide help text
src/cli.rs
- run (orchestrator) — Main coordination function that parses CLI args, builds regex patterns and filters, configures the walker, and either outputs results or executes commands
src/main.rs
- walk (executor) — Creates parallel directory walker using ignore crate, spawns worker threads, applies ignore rules from gitignore files, and sends discovered paths through channels
src/walk.rs
- SizeFilter (validator) — Parses size constraints like '+1M' or '-500k', validates file metadata against size bounds using SI and binary prefixes
src/filter/size.rs
- TimeFilter (validator) — Parses time expressions using jiff crate, compares file modification times against before/after constraints with span arithmetic
src/filter/time.rs
- OwnerFilter (validator) — Validates file ownership on Unix systems by parsing user:group specifications and checking against file metadata uid/gid
src/filter/owner.rs
- FormatTemplate (transformer) — Parses format strings with placeholders like {/} for basename, {//} for parent, applies path transformations using aho-corasick pattern matching
src/fmt/mod.rs
- CommandSet (executor) — Executes user commands on matched paths, handles both one-by-one and batch execution modes, manages process spawning and output buffering
src/exec/mod.rs
- print_entry (formatter) — Renders final output with color schemes from lscolors, applies path separators, generates terminal hyperlinks, handles null vs newline separation
src/output.rs
Frequently Asked Questions
What is fd used for?
fd searches filesystem paths using regex patterns with parallel traversal. sharkdp/fd is a 9-component CLI tool written in Rust. Data flows through 5 distinct pipeline stages. The codebase contains 23 files.
How is fd architected?
fd is organized into 5 architecture layers: CLI Interface, Configuration & Validation, Parallel Walker, Filtering Pipeline, and 1 more. Data flows through 5 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through fd?
Data moves through 5 stages: Parse CLI arguments into configuration → Walk filesystem in parallel threads → Filter paths through constraint pipeline → Execute commands on matched paths → Format output with colors and templates. User arguments flow through clap parsing into a Config struct containing compiled regex patterns and filter settings. The walk() function spawns parallel threads that traverse directories using the ignore crate, respecting gitignore rules and sending discovered paths as WorkerResult enums through crossbeam channels. Each path flows through a filter pipeline checking regex patterns, file types, size, time, and ownership constraints. Surviving paths are either formatted with colors and templates for stdout output, or used to execute user commands with placeholder substitution. This pipeline design reflects a complex multi-stage processing system.
What technologies does fd use?
The core stack includes ignore (Provides parallel directory traversal with built-in gitignore and VCS ignore file parsing, handling all the complexity of filesystem walking and ignore rule application), regex (Compiles user search patterns into optimized finite automata for fast path matching during filesystem traversal), clap (Parses command-line arguments with automatic help generation, argument validation, and shell completion support using derive macros), crossbeam-channel (Provides high-performance multi-producer multi-consumer channels for passing discovered filesystem paths between walker threads and filter processing), lscolors (Interprets the LS_COLORS environment variable and applies appropriate ANSI color codes based on file types and extensions), tikv-jemallocator (Replaces the system allocator with jemalloc on supported platforms for better memory allocation performance during intensive filesystem operations), and 2 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does fd have?
fd exhibits 3 data pools (including compiled regex patterns and worker result channels), 5 control points, and 3 delays. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does fd use?
4 design patterns detected: Parallel Pipeline, Lazy Caching, Template Substitution, Builder Configuration.
How does fd compare to alternatives?
CodeSea has side-by-side architecture comparisons of fd with bat. These comparisons show tech stack differences, pipeline design, system behavior, and code patterns. See the comparison pages above for detailed analysis.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.