19 releases (4 breaking)

Uses new Rust 2024

0.5.0 May 8, 2026
0.4.0 Mar 24, 2026
0.3.13 Mar 22, 2026
0.2.1 Mar 10, 2026
0.1.0 Feb 26, 2026

#3 in #llm


Used in 9 crates (5 directly)

MIT/Apache

450KB
8K SLoC

pmetal-gguf

GGUF file format support for llama.cpp and Ollama compatibility.

Overview

This crate provides reading and writing support for the GGUF (GPT-Generated Unified Format) file format, enabling compatibility with llama.cpp, Ollama, and other GGUF-compatible inference engines.

Features

  • GGUF Reading: Parse GGUF files and extract metadata/tensors
  • GGUF Writing: Create GGUF files from SafeTensors/PyTorch models
  • Tensor Dequantization: Convert quantized tensors to full precision
  • Metadata Handling: Read/write model metadata and tokenizer info

Usage

Reading GGUF Files

use pmetal_gguf::GgufContent;

// Load GGUF file
let gguf = GgufContent::from_file("model.gguf")?;

// Access metadata
println!("Architecture: {}", gguf.metadata.get("general.architecture")?);
println!("Context length: {}", gguf.metadata.get("llama.context_length")?);

// Iterate tensors
for (name, tensor) in gguf.tensors() {
    println!("{}: {:?}", name, tensor.shape());
}

Dequantizing Tensors

use pmetal_gguf::{GgufContent, dequant};

let gguf = GgufContent::from_file("model-q4.gguf")?;

// Dequantize a specific tensor
let weights = gguf.get_tensor("model.layers.0.self_attn.q_proj.weight")?;
let fp32_weights = dequant::dequantize(&weights)?;

Converting to GGUF

use pmetal_gguf::{GgufWriter, Quantization};

let mut writer = GgufWriter::new("output.gguf")?;

// Set metadata
writer.set_metadata("general.architecture", "llama")?;
writer.set_metadata("general.name", "My Model")?;

// Add tensors with optional quantization
writer.add_tensor("model.embed_tokens.weight", &embeddings, Quantization::None)?;
writer.add_tensor("model.layers.0.self_attn.q_proj.weight", &weights, Quantization::Q4_K)?;

writer.finish()?;

Supported Quantization Types

Type Bits Description
F32 32 Full precision
F16 16 Half precision
Q8_0 8 8-bit quantization
Q4_0 4 4-bit quantization
Q4_K 4 K-quant (higher quality)
Q5_K 5 K-quant
Q6_K 6 K-quant

Modules

Module Description
reader GGUF file parsing
quantize GGUF file creation and quantization
dequant Tensor dequantization
dynamic Dynamic quantization
imatrix Importance matrix support
k_quants K-quant implementations
iq_quants IQ-quant implementations
vec_dot Vectorized dot product kernels

License

MIT OR Apache-2.0

Dependencies

~6.5–10MB
~105K SLoC