# OmniVersus
Which model is better? Run them side by side. 85KB.
Multi-model comparison with real inference and semantic diff.
## The Problem
You have three GGUF models: a base, a fine-tune, and a different quantization. Which is best for your use case? Today's workflow: load model A in Python (30 seconds and 8GB of RAM), run your prompts, note the results. Unload it. Load model B. Repeat. Compare notes by hand. For 5 models, this takes an hour of manual work.
## The Solution
OmniVersus loads multiple GGUF models via mmap, runs the same prompts through each, and shows a side-by-side comparison: output text, token probabilities, speed, and quality metrics. One command, one binary, 85KB.
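A rough sketch of what happens behind that one command. Everything here is illustrative: `run_model()` is a hypothetical stand-in for a real GGUF inference call, not the OmniVersus API, and the model filenames are made up.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for real GGUF inference: generate up to n
 * tokens from the prompt and return the count actually produced. */
static int run_model(const char *gguf_path, const char *prompt, int n) {
    (void)gguf_path; (void)prompt;
    return n; /* placeholder so the sketch compiles and runs */
}

int main(void) {
    /* Same prompt through every model, timed per model. */
    const char *models[] = { "base.gguf", "finetune.gguf", "base-q4.gguf" };
    const char *prompt = "Summarize mmap in one sentence.";
    for (int i = 0; i < 3; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        int toks = run_model(models[i], prompt, 128);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%-16s %8.1f tok/s\n", models[i], (double)toks / secs);
    }
    return 0;
}
```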
## Why Bare-Metal Matters
Loading multiple LLMs simultaneously is a memory-management challenge that Python handles poorly. OmniVersus uses mmap to load models on demand without copying them into RAM, then runs a complete transformer forward pass for each model in sequence. 85KB vs 4GB+ of PyTorch makes this practical on any machine.
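For the curious, this is roughly what mmap-based loading looks like. It is a generic sketch of the technique using standard POSIX calls, not OmniVersus source; `map_model` is an illustrative name.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a GGUF file read-only. The kernel pages weights in lazily,
 * so several multi-GB models can be mapped at once without
 * copying them into RAM. Returns NULL on failure. */
const void *map_model(const char *path, size_t *len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return NULL; }
    *len = (size_t)st.st_size;
    void *p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

With PROT_READ and MAP_PRIVATE, pages are faulted in only when the weights are actually touched and can be evicted under memory pressure, which is what keeps resident memory bounded even with several models mapped at once.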
## Technical Specifications
| Feature | Value |
|---|---|
| Binary Size | ~85KB |
| Function | Multi-model semantic comparison with real inference |
| Models | 2+ GGUF models side-by-side |
| Dependencies | None — no Python, no PyTorch |
| Comparison | Token output, probabilities, speed, quality |
| Memory | mmap — models loaded on demand |
## Comparison
| | OmniVersus | Manual (Python) | LM Eval Harness |
|---|---|---|---|
| Size | ~85KB | 4GB+ (PyTorch) | 4GB+ (PyTorch) |
| Setup | One command | Load/unload models manually | Complex config |
| Side-by-side output | Built-in | Manual comparison | Benchmark scores only |
| Dependencies | None | Python, torch, transformers | Python, torch, datasets |
| Token probabilities | Per-token comparison | Custom code needed | Aggregate only |
## Use Cases
### Quantization Comparison
Compare Q4_K vs Q6_K vs Q8_0 of the same model on your specific prompts. See exactly where quality differs.
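As a sketch of what a per-token quality diff means, the snippet below compares the probability two quantizations assign to the same reference token, given each model's logits at one position. The logit values and 4-token vocabulary are toy data, not real model output.

```c
#include <math.h>
#include <stdio.h>

/* Probability of token `id` under a numerically stable softmax
 * over `n` logits. */
static double token_prob(const float *logits, int n, int id) {
    double max = logits[0];
    for (int i = 1; i < n; i++) if (logits[i] > max) max = logits[i];
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += exp(logits[i] - max);
    return exp(logits[id] - max) / sum;
}

int main(void) {
    /* Toy 4-token vocabulary; real GGUF vocabs are ~32k-128k. */
    float q6[] = { 2.0f, 0.5f, -1.0f, 0.1f };
    float q4[] = { 1.2f, 0.9f, -0.8f, 0.4f };
    int ref = 0; /* the token the higher-precision model picked */
    double p6 = token_prob(q6, 4, ref), p4 = token_prob(q4, 4, ref);
    printf("p(ref) Q6_K=%.3f  Q4_K=%.3f  delta=%+.3f\n", p6, p4, p6 - p4);
    return 0;
}
```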
### Fine-tune Evaluation
Run your fine-tuned model against the base model on a prompt set. See improvements and regressions per prompt.
### Model Selection
Compare models from different providers (Qwen, Llama, Mistral) on your specific task. Pick the best one with your own data, not leaderboard scores.
## Try Now (Free)
Coming Soon
This product is under active development. Contact us for early access or to be notified when binaries are available.
Talk to the Team