OmniVersus Coming Soon

~85KB

Which model is better? Run them side by side. 85KB.

Multi-model comparison with real inference and semantic diff.

Linux

The problem

You have three GGUF models — a base, a fine-tune, and a different quantization. Which is best for your use case? Today: load model A in Python (30 seconds of load time plus 8GB of RAM), run your prompts, note the results. Close it. Load model B. Repeat. Compare by hand. For five models, that's an hour of manual work.

The solution

OmniVersus loads multiple GGUF models via mmap, runs the same prompts through each, and shows a side-by-side comparison: output text, token probabilities, speed, and quality metrics. One command, one binary, 85KB.
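The data flow is simple to picture. Here is a minimal Python sketch of the same comparison loop — the real tool is a single native binary, and `run_inference` below is a stand-in stub, not OmniVersus's actual API:

```python
# Hypothetical sketch of a side-by-side comparison loop. run_inference is a
# stub standing in for real GGUF inference; only the data flow is real.
from dataclasses import dataclass

@dataclass
class Result:
    model: str
    output: str
    tokens_per_sec: float

def run_inference(model_path: str, prompt: str) -> Result:
    # Stand-in for real inference; returns canned output for illustration.
    return Result(model=model_path, output=f"echo: {prompt}", tokens_per_sec=42.0)

def compare(models: list[str], prompts: list[str]) -> dict[str, list[Result]]:
    # Run every prompt through every model and group results per prompt,
    # so each row of the report is a side-by-side view.
    return {p: [run_inference(m, p) for m in models] for p in prompts}

report = compare(["base.gguf", "finetune.gguf"], ["What is mmap?"])
for prompt, results in report.items():
    print(prompt)
    for r in results:
        print(f"  {r.model}: {r.output} ({r.tokens_per_sec} tok/s)")
```

The key design point: results are grouped per prompt, not per model, so every prompt becomes one side-by-side row instead of N separate transcripts.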

Why Bare-Metal Matters

Loading multiple LLMs simultaneously is a memory management challenge that Python handles poorly. OmniVersus uses mmap to load models on demand without copying them into RAM, and runs a complete transformer forward pass for each model in sequence. 85KB vs 4GB+ of PyTorch makes this practical on any machine.
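The mmap idea can be demonstrated in a few lines: map a GGUF file read-only and inspect its header without reading the weights into memory. Per the GGUF spec, the file starts with the 4-byte magic "GGUF" followed by a little-endian uint32 version. This sketch uses a fake 8-byte header in place of a real model file:

```python
# Map a GGUF file read-only and read its header; pages are faulted in on
# demand by the OS, so the weights are never copied into process memory.
import mmap
import os
import struct
import tempfile

def gguf_version(path: str) -> int:
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            if m[:4] != b"GGUF":          # 4-byte magic per the GGUF spec
                raise ValueError("not a GGUF file")
            return struct.unpack_from("<I", m, 4)[0]  # little-endian uint32

# Demo with a fake header standing in for a multi-gigabyte model file.
fd, path = tempfile.mkstemp()
os.write(fd, b"GGUF" + struct.pack("<I", 3))
os.close(fd)
print(gguf_version(path))  # 3
os.unlink(path)
```

Because the mapping is read-only and shared, mapping five models costs virtual address space, not physical RAM — the OS pages in only the tensors actually touched during inference.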

Technical Specifications

| Feature | Value |
| --- | --- |
| Binary Size | ~85KB |
| Function | Multi-model semantic comparison with real inference |
| Models | 2+ GGUF models side-by-side |
| Dependencies | None — no Python, no PyTorch |
| Comparison | Token output, probabilities, speed, quality |
| Memory | mmap — models loaded on demand |

Comparison

| | OmniVersus | Manual (Python) | LM Eval Harness |
| --- | --- | --- | --- |
| Size | ~85KB | 4GB+ (PyTorch) | 4GB+ (PyTorch) |
| Setup | One command | Load/unload models manually | Complex config |
| Side-by-side output | Built-in | Manual comparison | Benchmark scores only |
| Dependencies | None | Python, torch, transformers | Python, torch, datasets |
| Token probabilities | Per-token comparison | Custom code needed | Aggregate only |

Use Cases

Quantization Comparison

Compare Q4_K vs Q6_K vs Q8_0 of the same model on your specific prompts. See exactly where quality differs.
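"Where quality differs" can be made concrete with a per-token diff: find the first position where two quantizations' outputs diverge and what fraction of positions agree. The token lists below are toy data, not real model output:

```python
# Illustrative per-token diff between two quantizations' outputs: first index
# where the sequences diverge, plus the fraction of positions that agree.
def token_diff(a: list[str], b: list[str]) -> tuple[int, float]:
    first = next((i for i, (x, y) in enumerate(zip(a, b)) if x != y), -1)
    n = max(len(a), len(b))
    agree = sum(x == y for x, y in zip(a, b))
    return first, agree / n if n else 1.0

q6 = ["The", "answer", "is", "42", "."]      # toy Q6_K output
q4 = ["The", "answer", "is", "about", "42"]  # toy Q4_K output
print(token_diff(q6, q4))  # (3, 0.6)
```

A divergence point of 3 tells you the lower quantization tracked the reference for the first three tokens and drifted after that — per-prompt, not as an aggregate benchmark score.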

Fine-tune Evaluation

Run your fine-tuned model against the base model on a prompt set. See improvement and regression per prompt.
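A per-prompt improvement/regression report reduces to a simple tally, assuming each prompt gets a numeric quality score for the base and fine-tuned model. The scores and prompt names below are invented for illustration:

```python
# Toy per-prompt tally: classify each prompt as improved, regressed, or
# unchanged by comparing base vs fine-tune quality scores.
def tally(scores: dict[str, tuple[float, float]]) -> dict[str, list[str]]:
    # scores maps prompt -> (base_score, finetune_score)
    out: dict[str, list[str]] = {"improved": [], "regressed": [], "unchanged": []}
    for prompt, (base, tuned) in scores.items():
        key = "improved" if tuned > base else "regressed" if tuned < base else "unchanged"
        out[key].append(prompt)
    return out

result = tally({
    "summarize": (0.6, 0.8),
    "translate": (0.9, 0.7),
    "classify": (0.5, 0.5),
})
print(result)
```

The point of the per-prompt view is that a fine-tune can improve on average while regressing on specific prompts — an aggregate score hides exactly the rows you care about.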

Model Selection

Compare models from different providers (Qwen, Llama, Mistral) on your specific task. Pick the best one with data, not benchmarks.

Try Now — Free

Coming Soon

This product is under active development. Contact us for early access or to be notified when binaries are available.

Talk to the Team