2 releases
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.1.1 | Mar 26, 2026 |
| 0.1.0 | Mar 25, 2026 |
piper-phoneme-streaming
piper-phoneme-streaming is a high-performance Rust library for streaming grapheme-to-phoneme (G2P) conversion, that is, turning text into phonemes incrementally. It is built to integrate with streaming Text-to-Speech (TTS) engines such as Piper and others based on espeak-ng.
What Problems Does It Solve?
Typical G2P (Grapheme-to-Phoneme) approaches wait for a full sentence or paragraph before converting text to phonemes. In real-time or streaming TTS applications, this introduces unacceptable latency.
piper-phoneme-streaming addresses this by:
- Streaming natively: Processing text character-by-character and yielding phonemes as soon as there is enough context (e.g., at word boundaries).
- Dynamic Language Detection: Seamlessly handling mixed-language input on the fly. It can automatically detect language boundaries (e.g., mixing English and Vietnamese) and switch phonemization strategies mid-sentence without interrupting the stream.
- Accurate Text Normalization: Built-in strategies to expand abbreviations, dates, numbers, and acronyms sequentially before phonemization.
- espeak-ng Parity: Directly executes espeak-ng's binary phoneme table and dictionary formats so that generated phonemes match exactly what Piper or other models expect.
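To make the word-boundary idea concrete, here is a minimal sketch of a buffer that accepts text in arbitrary chunks and releases each word as soon as a boundary character arrives. `WordBuffer` and its methods are hypothetical names invented for this illustration. They are not part of the piper-phoneme-streaming API, and the crate's actual boundary rules may differ.

```rust
/// Hypothetical helper (not part of piper-phoneme-streaming): buffers pushed
/// text and releases each word once a boundary character is seen.
#[derive(Default)]
pub struct WordBuffer {
    pending: String,
}

impl WordBuffer {
    /// Push a chunk of any size; returns the words completed by this chunk.
    pub fn push(&mut self, chunk: &str) -> Vec<String> {
        let mut ready = Vec::new();
        for ch in chunk.chars() {
            // Assumption: whitespace and basic punctuation end a word.
            if ch.is_whitespace() || matches!(ch, '.' | ',' | '!' | '?' | ';' | ':') {
                if !self.pending.is_empty() {
                    // The buffered word is complete, so hand it downstream.
                    ready.push(std::mem::take(&mut self.pending));
                }
            } else {
                self.pending.push(ch);
            }
        }
        ready
    }

    /// Flush the trailing word when the stream ends without a boundary.
    pub fn finish(&mut self) -> Option<String> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}
```

Pushing "Hel" returns nothing, pushing "lo wor" returns "Hello", and calling `finish` after a final "ld" returns "world". Latency is therefore bounded by the length of one word rather than one sentence.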
How It Works
The library uses a push-based architecture built around StreamingG2P. Pushed text passes through the following stages:
- Text Expansion & Normalization: Input characters are processed by TextExpand, which handles numbers, money, and typical abbreviations interactively.
- Language Detection: If multiple languages are enabled, dynamic heuristics detect the language of incoming text batches on the fly.
- Word Phonemizer: The WordPhonemizer matches the normalized text against the appropriate language's dictionary and runtime rules from embedded espeak-ng data.
- Sentence Upgrade: StreamingSentencePhonemeUpgrade applies sentence-level syntax rules, stress assignments, and intonation corrections before finalizing the phoneme token stream.
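The first two stages above can be sketched with toy stand-ins. Everything in this block (`Lang`, `expand`, `detect`, `process`) is a hypothetical illustration, not the crate's implementation. The real TextExpand and detection heuristics are not described here, and the phonemizer and sentence stages are omitted because they depend on espeak-ng data.

```rust
/// Hypothetical language tag for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lang {
    English,
    Vietnamese,
}

/// Toy normalization: spell an all-digit word digit by digit, in English.
/// A real normalizer would read "42" as "forty-two" and handle money, dates, etc.
pub fn expand(word: &str) -> String {
    const NAMES: [&str; 10] = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    ];
    if !word.is_empty() && word.chars().all(|c| c.is_ascii_digit()) {
        word.chars()
            .map(|c| NAMES[c.to_digit(10).unwrap() as usize])
            .collect::<Vec<_>>()
            .join(" ")
    } else {
        word.to_string()
    }
}

/// Toy detection: a word counts as Vietnamese if it contains a letter that is
/// typical of Vietnamese orthography; otherwise the current language is kept.
/// Assumes precomposed (NFC) input.
pub fn detect(word: &str, current: Lang) -> Lang {
    let is_marker = |c: char| {
        matches!(
            c,
            // Latin Extended Additional: precomposed Vietnamese vowels with tone marks
            '\u{1EA0}'..='\u{1EF9}'
            // A/a with breve, D/d with stroke, O/o with horn, U/u with horn
            | '\u{0102}' | '\u{0103}' | '\u{0110}' | '\u{0111}'
            | '\u{01A0}' | '\u{01A1}' | '\u{01AF}' | '\u{01B0}'
        )
    };
    if word.chars().any(is_marker) {
        Lang::Vietnamese
    } else {
        current
    }
}

/// Run one word through both stages, in the order listed above:
/// normalization first, then language detection on the normalized text.
pub fn process(word: &str, current: Lang) -> (Lang, String) {
    let normalized = expand(word);
    let lang = detect(&normalized, current);
    (lang, normalized)
}
```

Because `detect` falls back to the current language when a word carries no marker, an unaccented Vietnamese word such as "Xin" stays in whatever language preceded it. This is one reason a production detector would likely look at batches of text rather than single words.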
Usage Examples
Streaming Conversion
The streaming API enables progressive consumption of text.
use piper_phoneme_streaming::{StreamingG2P, Language};

fn main() {
    // Initialize the engine with supported languages
    let g2p = StreamingG2P::with_languages(
        &[Language::English, Language::Vietnamese],
        Language::English,
    )
    .unwrap();

    // Create a new streaming session (maintains state across pushed chunks)
    let mut session = g2p.new_session();
    let text = "Hello world. Xin chào thế giới.";

    // Push characters individually or in chunks
    for ch in text.chars() {
        let output = g2p.push_text(&mut session, &ch.to_string()).unwrap();
        for phoneme in output {
            print!("{}", phoneme.token);
        }
    }

    // Flush any remaining buffered phonemes once the stream ends
    let tail = g2p.finish(&mut session).unwrap();
    for phoneme in tail {
        print!("{}", phoneme.token);
    }
}
Normal Conversion
If streaming is not required, you can use the full conversion API to process the entire input at once.
use piper_phoneme_streaming::{FullG2p, Language};

fn main() {
    let g2p = FullG2p::new(Language::English).unwrap();
    let out = g2p.g2p("Hello world!").unwrap();

    // `to_string()` avoids moving `token` out of a borrowed item
    // (assumes `token` implements `Display`).
    let out_str: String = out.iter().map(|t| t.token.to_string()).collect();
    println!("{}", out_str);
}
Adding to Your Project
Add the dependency to your Cargo.toml:
[dependencies]
piper-phoneme-streaming = { path = "..." } # Or specify version if published