#vector-search #filtering #metadata #iqdb #hybrid-search

iqdb-filter

Canonical metadata-filter evaluator for vector search, with validate-on-construction and infallible per-row evaluation. Part of the iQDB family.

4 releases (1 stable)

Uses new Rust 2024

1.0.0 Jun 6, 2026
0.5.0 Jun 6, 2026
0.4.0 Jun 6, 2026
0.3.0 Jun 6, 2026

#3 in Algorithms

30 downloads per month
Used in 5 crates (4 directly)

Apache-2.0 OR MIT

85KB
913 lines

iQDB HYBRID FILTERING


iqdb-filter is the metadata-filtering layer for the iQDB vector-database spine. It is the one place that decides what a Filter means: every index that honours metadata filters delegates here, so the semantics can never drift between implementations.

It evaluates the Filter expression language defined in iqdb-types against a record's metadata, with strict, predictable closed-world rules and validation that bounds every filter before it runs.



MSRV: Rust 1.87 (2024 edition). Validate once, evaluate per row. No panics on hostile input. About 19 ns to evaluate a compound predicate.

Status: stable (1.0). The public API is committed under SemVer for the 1.x series — no breaking changes until 2.0. See CHANGELOG.md.


What it does

  • Canonical evaluator: one implementation of Filter semantics, shared by every metadata-aware index so query results never diverge
  • Validate once, evaluate many: FilterEvaluator::new checks the filter (depth, In cardinality) a single time; evaluate is then infallible and runs per row inside the search loop
  • Closed-world semantics: a leaf over an absent field is false, type mismatches are false, NaN orderings are false, and Not of a false leaf is true (the "records without this field" idiom)
  • DoS-hardened: iterative validation that can't overflow the stack, with bounded depth and In width; the library never panics on adversarial input
  • Scan helpers: prefilter / postfilter apply the evaluator as lazy, allocation-free iterator adapters over a stream of (key, metadata) pairs
  • Strategy selection: a selectivity estimate drives an automatic PreFilter / PostFilter choice, with a tunable threshold
  • Inverted index: an opt-in, per-field MetadataIndex resolves selective Eq / In predicates to a candidate set (a superset of true matches) and backs a sharper, count-based selectivity estimate
  • First-party only: depends solely on iqdb-types, so it does not wait on any other iQDB crate
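The following minimal, self-contained sketch makes the closed-world rules concrete. The `Value`, `Metadata`, and `Filter` types are hypothetical stand-ins written for illustration, not the iqdb-types definitions. The free function `evaluate` is likewise not the crate's `FilterEvaluator::evaluate`. The sketch shows three cases that evaluate to false: an absent field, a type mismatch, and a NaN comparison. It also shows `Not` turning a false leaf into true.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; the real iqdb-types definitions may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Int(i64),
    Float(f64),
    String(String),
}

pub type Metadata = HashMap<String, Value>;

#[derive(Debug, Clone)]
pub enum Filter {
    Eq(String, Value),
    Gt(String, Value),
    In(String, Vec<Value>),
    And(Vec<Filter>),
    Or(Vec<Filter>),
    Not(Box<Filter>),
}

/// Looks up a field; absent metadata behaves like an empty record.
fn field<'a>(meta: Option<&'a Metadata>, name: &str) -> Option<&'a Value> {
    meta.and_then(|m| m.get(name))
}

/// Equality: values of different variants never match.
fn value_eq(a: &Value, b: &Value) -> bool {
    match (a, b) {
        (Value::Bool(x), Value::Bool(y)) => x == y,
        (Value::Int(x), Value::Int(y)) => x == y,
        (Value::Float(x), Value::Float(y)) => x == y, // NaN == NaN is false
        (Value::String(x), Value::String(y)) => x == y,
        _ => false, // type mismatch -> false
    }
}

/// Strict ordering; booleans are left unordered in this sketch.
fn value_gt(a: &Value, b: &Value) -> bool {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => x > y,
        // IEEE 754: every ordering comparison involving NaN is false.
        (Value::Float(x), Value::Float(y)) => x > y,
        (Value::String(x), Value::String(y)) => x > y,
        _ => false,
    }
}

/// Closed-world evaluation. Recursion is acceptable here only because
/// validation has already bounded the filter's depth.
pub fn evaluate(f: &Filter, meta: Option<&Metadata>) -> bool {
    match f {
        Filter::Eq(k, v) => field(meta, k).is_some_and(|x| value_eq(x, v)),
        Filter::Gt(k, v) => field(meta, k).is_some_and(|x| value_gt(x, v)),
        Filter::In(k, vs) => field(meta, k).is_some_and(|x| vs.iter().any(|v| value_eq(x, v))),
        Filter::And(fs) => fs.iter().all(|c| evaluate(c, meta)),
        Filter::Or(fs) => fs.iter().any(|c| evaluate(c, meta)),
        Filter::Not(c) => !evaluate(c, meta),
    }
}
```

Because `Not` just negates its child, `Not(Eq("author", ...))` is true for a record with no `author` at all. This is the "records without this field" idiom.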

Installation

[dependencies]
iqdb-filter = "1.0"

Quick start

Build an evaluator once, then test it against each record's metadata:

use iqdb_filter::FilterEvaluator;
use iqdb_types::{Filter, Metadata, Value};

// published == true AND year > 2000
let filter = Filter::and(vec![
    Filter::eq("published", Value::Bool(true)),
    Filter::gt("year", Value::Int(2000)),
]);
let evaluator = FilterEvaluator::new(filter).expect("valid filter");

let meta: Metadata = [
    ("published".to_string(), Value::Bool(true)),
    ("year".to_string(), Value::Int(2026)),
]
.into_iter()
.collect();

assert!(evaluator.evaluate(Some(&meta)));
assert!(!evaluator.evaluate(None)); // no metadata -> every leaf is false

The Not / absent-field idiom selects records that lack a field, or carry it with a non-matching value:

use iqdb_filter::FilterEvaluator;
use iqdb_types::{Filter, Value};

// "records that are not authored by ada" — including records with no author.
let evaluator =
    FilterEvaluator::new(Filter::not(Filter::eq("author", Value::String("ada".into()))))
        .expect("valid filter");

assert!(evaluator.evaluate(None));

Validation rejects pathological filters up front — bounded by the public caps:

use iqdb_filter::{FilterEvaluator, MAX_IN_VALUES};
use iqdb_types::{Filter, IqdbError, Value};

// An `In` set wider than the cap is refused before it can slow a query.
let huge = vec![Value::Int(0); MAX_IN_VALUES + 1];
let err = FilterEvaluator::new(Filter::is_in("tag", huge)).unwrap_err();
assert_eq!(err, IqdbError::InvalidFilter);
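The caps matter because validation itself has to survive hostile input. The sketch below assumes a pared-down, hypothetical `Filter` enum and made-up caps (`MAX_DEPTH = 32`, `MAX_IN = 1024`). The crate's real `MAX_FILTER_DEPTH` and `MAX_IN_VALUES` values may differ. The walk uses an explicit stack instead of recursion, so a deeply nested filter is rejected without any risk of overflowing the call stack.

```rust
// Hypothetical caps; not the crate's MAX_FILTER_DEPTH / MAX_IN_VALUES values.
pub const MAX_DEPTH: usize = 32;
pub const MAX_IN: usize = 1024;

/// A pared-down, hypothetical filter tree (leaf payloads omitted).
#[derive(Debug)]
pub enum Filter {
    Leaf,
    In(Vec<i64>),
    And(Vec<Filter>),
    Or(Vec<Filter>),
    Not(Box<Filter>),
}

#[derive(Debug, PartialEq)]
pub struct InvalidFilter;

/// Iterative depth-first walk: each stack entry carries its node's depth
/// (root = 1), so nesting depth never turns into call-stack depth.
pub fn validate(root: &Filter) -> Result<(), InvalidFilter> {
    let mut stack: Vec<(&Filter, usize)> = vec![(root, 1)];
    while let Some((node, depth)) = stack.pop() {
        if depth > MAX_DEPTH {
            return Err(InvalidFilter);
        }
        match node {
            Filter::Leaf => {}
            Filter::In(values) if values.len() > MAX_IN => return Err(InvalidFilter),
            Filter::In(_) => {}
            Filter::And(children) | Filter::Or(children) => {
                stack.extend(children.iter().map(|c| (c, depth + 1)));
            }
            Filter::Not(child) => stack.push((&**child, depth + 1)),
        }
    }
    Ok(())
}
```

The stack's memory grows with the filter's size rather than with Rust's fixed thread stack, so the check fails cleanly with `Err` instead of aborting.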

Apply a strategy with the scan helpers, or let the selectivity estimate pick one:

use iqdb_filter::{FilterEvaluator, FilterStrategy, choose_strategy};
use iqdb_types::{Filter, Metadata, Value};

let evaluator = FilterEvaluator::new(Filter::eq("lang", Value::String("rust".into())))
    .expect("valid filter");

// `prefilter` keeps the keys of matching candidates, lazily, before scoring.
let rust: Metadata = [("lang".to_string(), Value::String("rust".into()))]
    .into_iter()
    .collect();
let go: Metadata = [("lang".to_string(), Value::String("go".into()))]
    .into_iter()
    .collect();
let rows = [(0_usize, Some(&rust)), (1, Some(&go))];
let kept: Vec<usize> = evaluator.prefilter(rows).collect();
assert_eq!(kept, [0]);

// An equality predicate is narrow, so the selector recommends pre-filtering.
assert_eq!(choose_strategy(&evaluator), FilterStrategy::PreFilter);
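Below is one plausible shape for a selectivity-driven selector. Everything in it is an assumption made for illustration:

  • the simplified `Filter`
  • the per-leaf selectivity guesses
  • treating `And` / `Or` children as independent
  • the explicit `threshold` parameter, which stands in for the tunable threshold

It is not the crate's `estimate_selectivity` or `choose_strategy`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterStrategy {
    /// Apply the filter first, then score only the survivors.
    PreFilter,
    /// Score first, then drop non-matching results.
    PostFilter,
}

/// Hypothetical, simplified filter shape used only for estimation.
pub enum Filter {
    Eq,
    In(usize), // number of values in the set
    Range,     // a Gt / Lt style predicate
    And(Vec<Filter>),
    Or(Vec<Filter>),
    Not(Box<Filter>),
}

/// Rough fraction of rows expected to match, in [0, 1]. The leaf guesses are
/// illustrative constants, not measured and not taken from the crate.
pub fn estimate_selectivity(f: &Filter) -> f64 {
    match f {
        Filter::Eq => 0.01,
        Filter::In(n) => (0.01 * *n as f64).min(1.0),
        Filter::Range => 0.33,
        // Children treated as independent: AND multiplies probabilities...
        Filter::And(cs) => cs.iter().map(estimate_selectivity).product::<f64>(),
        // ...and OR is one minus the chance that every child fails.
        Filter::Or(cs) => 1.0 - cs.iter().map(|c| 1.0 - estimate_selectivity(c)).product::<f64>(),
        Filter::Not(c) => 1.0 - estimate_selectivity(c),
    }
}

/// Narrow filters go first (pre-filter); broad ones run after scoring.
pub fn choose_strategy(f: &Filter, threshold: f64) -> FilterStrategy {
    if estimate_selectivity(f) <= threshold {
        FilterStrategy::PreFilter
    } else {
        FilterStrategy::PostFilter
    }
}
```

A low estimate means few rows survive. In that case, filtering before scoring is cheap and still leaves every match available for top-k. A high estimate favours scoring first and discarding the few non-matches afterwards.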

For repeated queries, build an opt-in MetadataIndex so a selective predicate resolves to a candidate set instead of scanning every row:

use iqdb_filter::{FilterEvaluator, MetadataIndex};
use iqdb_types::{Filter, Metadata, Value};

let rows = [
    (0_usize, [("lang".to_string(), Value::String("rust".into()))].into_iter().collect::<Metadata>()),
    (1, [("lang".to_string(), Value::String("go".into()))].into_iter().collect::<Metadata>()),
    (2, [("lang".to_string(), Value::String("rust".into()))].into_iter().collect::<Metadata>()),
];

// Index only the `lang` field.
let index = MetadataIndex::build(&["lang"], rows.iter().map(|(k, m)| (*k, Some(m))));

let evaluator = FilterEvaluator::new(Filter::eq("lang", Value::String("rust".into())))
    .expect("valid filter");

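// `candidates` returns a superset of true matches (or `None` when the index
// can't bound the predicate), so confirm each candidate with `evaluate`.
let mut hits: Vec<usize> = match index.candidates(&evaluator) {
    Some(cands) => cands,
    None => (0..rows.len()).collect(), // unbounded predicate -> full scan
};
// Keys equal row positions here, so `rows[k].1` is the candidate's metadata.
hits.retain(|&k| evaluator.evaluate(Some(&rows[k].1)));
hits.sort_unstable();
assert_eq!(hits, [0, 2]);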
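An inverted index of this kind maps each indexed field to its values, and each value to the keys that carry it. The `MiniIndex` and `Predicate` types below are hypothetical and handle string values only. They sketch the idea and are not `MetadataIndex`. An `Eq` or `In` over an indexed field resolves to the union of its postings. An unindexed field or an unsupported predicate returns `None`, which signals a full scan.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical predicate shape: only the forms this toy index understands.
pub enum Predicate<'a> {
    Eq(&'a str, &'a str),
    In(&'a str, &'a [&'a str]),
    /// Anything else (e.g. a range); the index cannot bound it.
    Other,
}

/// field -> value -> keys carrying that value (string values only).
pub struct MiniIndex {
    postings: HashMap<String, HashMap<String, Vec<usize>>>,
}

impl MiniIndex {
    /// Indexes only `fields`; rows lacking a field are simply not posted.
    pub fn build<'a>(
        fields: &[&str],
        rows: impl IntoIterator<Item = (usize, &'a HashMap<String, String>)>,
    ) -> Self {
        // Pre-register every indexed field so "indexed, no matches" (an empty
        // candidate set) stays distinct from "not indexed" (a full scan).
        let mut postings: HashMap<String, HashMap<String, Vec<usize>>> =
            fields.iter().map(|f| (f.to_string(), HashMap::new())).collect();
        for (key, meta) in rows {
            for &field in fields {
                if let (Some(value), Some(by_value)) = (meta.get(field), postings.get_mut(field)) {
                    by_value.entry(value.clone()).or_default().push(key);
                }
            }
        }
        Self { postings }
    }

    /// `Some(keys)` (sorted, deduplicated) when the index can resolve the
    /// predicate; `None` when the caller must fall back to a full scan.
    pub fn candidates(&self, p: &Predicate) -> Option<Vec<usize>> {
        match p {
            Predicate::Eq(field, value) => self.union(field, std::slice::from_ref(value)),
            Predicate::In(field, values) => self.union(field, values),
            Predicate::Other => None,
        }
    }

    fn union(&self, field: &str, values: &[&str]) -> Option<Vec<usize>> {
        let by_value = self.postings.get(field)?; // unindexed field -> None
        let mut keys = HashSet::new();
        for value in values {
            if let Some(posting) = by_value.get(*value) {
                keys.extend(posting.iter().copied());
            }
        }
        let mut out: Vec<usize> = keys.into_iter().collect();
        out.sort_unstable();
        Some(out)
    }
}
```

Returning `Some(vec![])` for an indexed field with no matching value lets the caller skip the scan entirely. Returning `None` forces a full scan.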

Errors

FilterEvaluator::new returns iqdb_types::Result; the only failure is IqdbError::InvalidFilter, returned when a filter exceeds MAX_FILTER_DEPTH nesting or carries an In node wider than MAX_IN_VALUES. After a filter is validated, evaluate is infallible and never panics — including on records with no metadata, type mismatches, and NaN values.
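The validate-once contract follows a type-state pattern. Because the only way to get an evaluator is through a fallible constructor, the per-row method can return a plain `bool`. Here is a minimal sketch of that pattern with hypothetical names (`Evaluator`, `Error`, and a cap of 4 values for a single integer `In` predicate). These are not the crate's types.

```rust
use std::fmt;

/// Hypothetical cap standing in for the crate's MAX_IN_VALUES.
pub const MAX_IN: usize = 4;

/// The single failure mode, playing the role of IqdbError::InvalidFilter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    InvalidFilter,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("invalid filter")
    }
}

impl std::error::Error for Error {}

/// Holds one validated `In` predicate over an integer field. The field is
/// private to this module, so outside code can only build one through `new`.
pub struct Evaluator {
    allowed: Vec<i64>,
}

impl Evaluator {
    /// Fallible; called once per query.
    pub fn new(allowed: Vec<i64>) -> Result<Self, Error> {
        if allowed.len() > MAX_IN {
            return Err(Error::InvalidFilter);
        }
        Ok(Self { allowed })
    }

    /// Infallible; called once per row. A missing field is simply false.
    pub fn evaluate(&self, field: Option<i64>) -> bool {
        field.is_some_and(|v| self.allowed.contains(&v))
    }
}
```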


Status

v1.0.0, stable. The full surface is committed under SemVer for the 1.x series, with no breaking changes until 2.0:

  • the canonical FilterEvaluator (validate-on-construction, with infallible, allocation-free per-row evaluate)
  • the prefilter / postfilter scan helpers
  • estimate_selectivity plus the selector (choose_strategy / StrategySelector)
  • the opt-in per-field MetadataIndex

It is exercised by unit, integration, and property tests, and by a consumer-simulation suite that builds a filtered top-k searcher on the public API alone. Fuzz targets also drive the no-panic and superset contracts over unbounded input. All of it is verified across the CI matrix (Linux, macOS, Windows) on stable and on the 1.87 MSRV.

The one remaining feature, InFilter pushdown into graph traversal, is additive (FilterStrategy is #[non_exhaustive]). It will ship in a later 1.x release once an approximate-index consumer drives it (see the ROADMAP). The full surface is documented in docs/API.md.


Where It Fits

iqdb-filter sits just above the types crate and is consumed by the index layer:

  • iqdb-types — the Filter, Metadata, and Value types it evaluates
  • iqdb-flat / iqdb-hnsw / iqdb-ivf — delegate here for metadata filtering
  • iqdb — exposes filtered search to users

Its only first-party dependency is iqdb-types, so it does not wait on any other iQDB crate.
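From an index's side, delegation can be as small as calling the evaluator inside the scan loop. Below is a hypothetical brute-force (flat) top-k search. A `keep` closure stands in for a validated evaluator's `evaluate` call. The function names, the squared-L2 metric, and the sort-then-truncate selection are illustrative choices, not iqdb-flat's implementation.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-k over `vectors`, keeping only rows for which `keep(key)` is
/// true. Results are (key, distance) pairs, nearest first.
pub fn filtered_top_k(
    vectors: &[Vec<f32>],
    query: &[f32],
    k: usize,
    keep: impl Fn(usize) -> bool,
) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .filter(|(key, _)| keep(*key)) // skip non-matching rows before scoring
        .map(|(key, v)| (key, l2_sq(v, query)))
        .collect();
    // total_cmp gives a total order on f32, so NaN distances cannot panic here.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

A binary heap of size k would avoid sorting every surviving row. Sorting keeps the sketch short.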


Standards

Built to the iQDB Rust standard. See REPS.md (Rust Efficiency & Performance Standards) and dev/DIRECTIVES.md for the engineering law and the definition of done. Before a PR: cargo fmt --all, cargo clippy --all-targets --all-features -- -D warnings, and cargo test --all-features must be clean.


License

Licensed under either of the Apache License, Version 2.0 or the MIT license, at your option.

COPYRIGHT © 2026 JAMES GOBER.

Dependencies

~0.5–1MB
~21K SLoC