4 releases (stable)
Uses the Rust 2024 edition

| Version | Released |
|---|---|
| 1.5.2 | Apr 20, 2026 |
| 1.5.1 | Apr 6, 2026 |
| 1.5.0 | Apr 3, 2026 |
| 0.0.0 | Mar 31, 2026 |
#21 in #chunking
122,222 downloads per month
Used in 9 crates
1MB, 18K SLoC
Data processing pipeline for chunking, deduplication, and file reconstruction, used in the Hugging Face Xet storage tools.
Provides content-defined chunking via gear hashing, deduplication against metadata shards, and file reconstruction from deduplicated chunk references.
xet-data
Data processing pipeline for chunking, deduplication, and file reconstruction. Intended to be used through the API in the hf-xet package.
Overview
- Content-defined chunking — Gear-hash based chunking for deduplication
- Deduplication — Probe and register chunks against metadata shards
- File reconstruction — Reassemble files from deduplicated chunk references
- Progress tracking — Hooks for upload/download progress reporting
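
The first three items above can be illustrated with a small, self-contained sketch. It is not the xet-data API: every name in it (`gear_table`, `chunk_ends`, `chunk_id`, `ChunkStore`, `TOP_6_BITS`) and the chunk size limits are hypothetical, the gear table is generated from an arbitrary seed, and the standard library's `DefaultHasher` stands in for whatever chunk hash the real pipeline uses. An in-memory map plays the role of a metadata shard.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical boundary mask: a cut is allowed when the top 6 hash bits are zero.
const TOP_6_BITS: u64 = 0xFC00_0000_0000_0000;

/// Illustrative gear table: 256 pseudo-random words from a fixed splitmix64 seed.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Split `data` into content-defined chunks and return each chunk's end offset.
fn chunk_ends(data: &[u8], min: usize, max: usize, mask: u64) -> Vec<usize> {
    let table = gear_table();
    let (mut ends, mut start, mut hash) = (Vec::new(), 0usize, 0u64);
    for (i, &byte) in data.iter().enumerate() {
        // Gear hash update: older bytes are shifted out towards the top bits.
        hash = (hash << 1).wrapping_add(table[byte as usize]);
        let len = i + 1 - start;
        if (len >= min && hash & mask == 0) || len >= max {
            ends.push(i + 1);
            start = i + 1;
            hash = 0;
        }
    }
    if start < data.len() {
        ends.push(data.len()); // trailing partial chunk
    }
    ends
}

/// Stand-in chunk identifier; a real store would use a cryptographic hash.
fn chunk_id(chunk: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    chunk.hash(&mut hasher);
    hasher.finish()
}

/// Toy stand-in for a metadata shard: maps chunk ids to stored bytes.
#[derive(Default)]
struct ChunkStore {
    chunks: HashMap<u64, Vec<u8>>,
}

impl ChunkStore {
    /// Chunk `data`, store only unseen chunks, and return the ordered chunk references.
    fn add_file(&mut self, data: &[u8]) -> Vec<u64> {
        let mut refs = Vec::new();
        let mut start = 0;
        for end in chunk_ends(data, 32, 256, TOP_6_BITS) {
            let chunk = &data[start..end];
            let id = chunk_id(chunk);
            self.chunks.entry(id).or_insert_with(|| chunk.to_vec());
            refs.push(id);
            start = end;
        }
        refs
    }

    /// Reassemble a file from chunk references; `None` if any chunk is unknown.
    fn reconstruct(&self, refs: &[u64]) -> Option<Vec<u8>> {
        let mut out = Vec::new();
        for id in refs {
            out.extend_from_slice(self.chunks.get(id)?);
        }
        Some(out)
    }
}

fn main() {
    // Pseudo-random demo data from a simple linear congruential generator.
    let mut x: u32 = 12345;
    let original: Vec<u8> = (0..4096)
        .map(|_| {
            x = x.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
            (x >> 24) as u8
        })
        .collect();
    // A second file: the same content with a few bytes inserted at the front.
    let mut edited = b"new header ".to_vec();
    edited.extend_from_slice(&original);

    let mut store = ChunkStore::default();
    let refs_a = store.add_file(&original);
    let refs_b = store.add_file(&edited);
    println!("chunk references: {}", refs_a.len() + refs_b.len());
    println!("unique chunks stored: {}", store.chunks.len());
    assert_eq!(store.reconstruct(&refs_b), Some(edited));
}
```

Because boundaries depend on content rather than on byte offsets, an insertion near the start of a file tends to change only the nearby chunks, so later chunks can often be matched against ones already stored. The minimum and maximum sizes bound how small or large any chunk can get.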
This crate is part of xet-core.
License
Apache-2.0
Dependencies
~26–41MB, ~692K SLoC