Voice Assistant POC - Optimized
Real-time AI with Whisper + TinyLlama • Privacy-first • WebGPU accelerated
Loading AI models... Please wait
Select a language model to power your voice assistant
Fast, lightweight model (~650MB)
Very fast, small model (~350MB)
Classic model, good quality (~550MB)
4-bit quantized model (~118MB)
16-bit model, slower (~270MB)
These models use Google's MediaPipe for enhanced performance and efficiency.
Google's optimized Gemma 3 1B with Q4 quantization (~600MB)
⚡ 40% faster inference with Q4 quantization
Auto-fallback: SmolLM2 1.7B Q4 or TinyLlama if MediaPipe unavailable
Google's Gemma 3 1B standard precision (~1.0GB)
Auto-fallback: SmolLM2 1.7B or TinyLlama if MediaPipe unavailable
Larger Gemma model via MediaPipe (~5.2GB)
Auto-fallback: TinyLlama 1.1B if MediaPipe unavailable
Enter any ONNX text-generation model from Hugging Face