# OmniVersus
Which model is better? Run them side by side. 85KB.
Multi-model comparison with real inference and semantic diff.
## The Problem
You have three GGUF models: a base, a fine-tune, and a different quantization. Which is best for your use case? Today's workflow: load model A in Python (30 seconds and 8GB of RAM), run your prompts, note the results. Unload it. Load model B. Repeat. Compare notes by hand. For 5 models, this takes an hour of manual work.
## The Solution
OmniVersus loads multiple GGUF models via mmap, runs the same prompts through each, and shows a side-by-side comparison: output text, token probabilities, speed, and quality metrics. One command, one binary, 85KB.
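A rough sketch of what happens behind that one command. Everything here is illustrative: `run_model()` is a hypothetical stand-in for a real GGUF inference call, not the OmniVersus API, and the model filenames are made up.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for real GGUF inference: generate up to n
 * tokens from the prompt and return the count actually produced. */
static int run_model(const char *gguf_path, const char *prompt, int n) {
    (void)gguf_path; (void)prompt;
    return n; /* placeholder so the sketch compiles and runs */
}

int main(void) {
    /* Same prompt through every model, timed per model. */
    const char *models[] = { "base.gguf", "finetune.gguf", "base-q4.gguf" };
    const char *prompt = "Summarize mmap in one sentence.";
    for (int i = 0; i < 3; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        int toks = run_model(models[i], prompt, 128);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%-16s %8.1f tok/s\n", models[i], (double)toks / secs);
    }
    return 0;
}
```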
## Why Bare-Metal Matters
Loading multiple LLMs simultaneously is a memory-management challenge that Python handles poorly. OmniVersus uses mmap to load models on demand without copying them into RAM, then runs a complete transformer forward pass for each model in sequence. 85KB vs 4GB+ of PyTorch makes this practical on any machine.
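For the curious, this is roughly what mmap-based loading looks like. It is a generic sketch of the technique using standard POSIX calls, not OmniVersus source; `map_model` is an illustrative name.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a GGUF file read-only. The kernel pages weights in lazily,
 * so several multi-GB models can be mapped at once without
 * copying them into RAM. Returns NULL on failure. */
const void *map_model(const char *path, size_t *len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return NULL; }
    *len = (size_t)st.st_size;
    void *p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

With PROT_READ and MAP_PRIVATE, pages are faulted in only when the weights are actually touched and can be evicted under memory pressure, which is what keeps resident memory bounded even with several models mapped at once.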
## Technical Specifications
| Feature | Value |
|---|---|
| Binary Size | ~85KB |
| Function | Multi-model semantic comparison with real inference |
| Models | 2+ GGUF models side-by-side |
| Dependencies | None — no Python, no PyTorch |
| Comparison | Token output, probabilities, speed, quality |
| Memory | mmap — models loaded on demand |
## Comparison
| | OmniVersus | Manual (Python) | LM Eval Harness |
|---|---|---|---|
| Size | ~85KB | 4GB+ (PyTorch) | 4GB+ (PyTorch) |
| Setup | One command | Load/unload models manually | Complex config |
| Side-by-side output | Built-in | Manual comparison | Benchmark scores only |
| Dependencies | None | Python, torch, transformers | Python, torch, datasets |
| Token probabilities | Per-token comparison | Custom code needed | Aggregate only |
## Use Cases
### Quantization Comparison
Compare Q4_K vs Q6_K vs Q8_0 of the same model on your specific prompts. See exactly where quality differs.
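As a sketch of what a per-token quality diff means, the snippet below compares the probability two quantizations assign to the same reference token, given each model's logits at one position. The logit values and 4-token vocabulary are toy data, not real model output.

```c
#include <math.h>
#include <stdio.h>

/* Probability of token `id` under a numerically stable softmax
 * over `n` logits. */
static double token_prob(const float *logits, int n, int id) {
    double max = logits[0];
    for (int i = 1; i < n; i++) if (logits[i] > max) max = logits[i];
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += exp(logits[i] - max);
    return exp(logits[id] - max) / sum;
}

int main(void) {
    /* Toy 4-token vocabulary; real GGUF vocabs are ~32k-128k. */
    float q6[] = { 2.0f, 0.5f, -1.0f, 0.1f };
    float q4[] = { 1.2f, 0.9f, -0.8f, 0.4f };
    int ref = 0; /* the token the higher-precision model picked */
    double p6 = token_prob(q6, 4, ref), p4 = token_prob(q4, 4, ref);
    printf("p(ref) Q6_K=%.3f  Q4_K=%.3f  delta=%+.3f\n", p6, p4, p6 - p4);
    return 0;
}
```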
### Fine-tune Evaluation
Run your fine-tuned model against the base model on a prompt set. See improvements and regressions per prompt.
### Model Selection
Compare models from different providers (Qwen, Llama, Mistral) on your specific task. Pick the best one with your own data, not leaderboard scores.
## Try Now (Free)
Coming Soon
This product is under active development. Contact us for early access or to be notified when binaries are available.
Talk to the Team