Four models compete. One neutral judge scores. The best answer wins.
How it works
Three phases. One unbiased judge. The best answer wins.
01
Generate

Your prompt is dispatched to all four competitor models simultaneously, each operating under its own persona: analytical precision, step-by-step reasoning, sharp and practical answers, or nuanced and comprehensive coverage.
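
As a rough illustration of the simultaneous dispatch, here is a minimal sketch using Python's asyncio. The `call_model` function is a hypothetical stand-in that only echoes its inputs; a real version would call each provider's API. The persona strings are taken from the roster, and everything else is an assumption.

```python
import asyncio

# Model names and personas mirror the roster; the mapping itself is illustrative.
PERSONAS = {
    "GPT-OSS 120B": "Analytical precision",
    "DeepSeek R1": "Step-by-step reasoning",
    "Phi-3 Mini 128K": "Sharp & practical",
    "Llama 3.3 70B": "Nuanced & comprehensive",
}

async def call_model(model: str, persona: str, prompt: str) -> str:
    """Hypothetical stand-in for a provider call; it only echoes its inputs."""
    await asyncio.sleep(0)  # yield control, as a network request would
    return f"[{model} | {persona}] {prompt}"

async def generate(prompt: str) -> dict[str, str]:
    """Send the prompt to every competitor concurrently and collect the replies."""
    models = list(PERSONAS)
    replies = await asyncio.gather(
        *(call_model(m, PERSONAS[m], prompt) for m in models)
    )
    # gather() returns results in the order the awaitables were passed in.
    return dict(zip(models, replies))

responses = asyncio.run(generate("Why is the sky blue?"))
```

Because the calls run concurrently, total latency is roughly that of the slowest model rather than the sum of all four.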

02
Judge

A dedicated Gemma 7B model — running on a completely separate API key — scores every response across Accuracy, Clarity, Depth and Creativity. It never competes.
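
The page does not say how the four criterion scores are combined, so the sketch below assumes an unweighted sum on a 0-10 scale. The `rank` function, the placeholder model names, and the scores are all made up for illustration.

```python
# The four criteria named above; lowercase keys are an assumption of this sketch.
CRITERIA = ("accuracy", "clarity", "depth", "creativity")

def rank(scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (model, total) pairs sorted from highest total to lowest."""
    totals = {}
    for model, per_criterion in scores.items():
        missing = [c for c in CRITERIA if c not in per_criterion]
        if missing:
            raise ValueError(f"{model} is missing scores for {missing}")
        # Assumed aggregation: plain sum, every criterion weighted equally.
        totals[model] = sum(per_criterion[c] for c in CRITERIA)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Invented scores for two placeholder models, not real judge output.
example_scores = {
    "Model A": {"accuracy": 9, "clarity": 7, "depth": 8, "creativity": 5},
    "Model B": {"accuracy": 8, "clarity": 9, "depth": 9, "creativity": 6},
}
ranking = rank(example_scores)
```

A weighted sum would be a small change to the `sum(...)` line if some criteria should count for more.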

03
Synthesize

The top-ranked model synthesizes all responses into a single champion answer that combines the strongest insights from every competitor.
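
One way the synthesis step could be set up is sketched below: pick the first model from a best-first ranking, then hand it a prompt containing the question and every candidate answer. Both helper functions, the prompt wording, and the sample data are assumptions, not the app's actual implementation.

```python
def pick_synthesizer(ranking: list[tuple[str, float]]) -> str:
    """Return the model at the top of a best-first ranking."""
    if not ranking:
        raise ValueError("ranking is empty")
    return ranking[0][0]

def build_synthesis_prompt(question: str, responses: dict[str, str]) -> str:
    """Assemble one prompt holding the question and every candidate answer."""
    parts = [f"Question: {question}", "", "Candidate answers:"]
    for model, answer in responses.items():
        parts.append(f"- {model}: {answer}")
    parts.append("")
    parts.append("Combine the strongest points above into one answer.")
    return "\n".join(parts)

# Placeholder ranking and answers, for illustration only.
synthesizer = pick_synthesizer([("Model B", 32), ("Model A", 29)])
synthesis_prompt = build_synthesis_prompt(
    "Why is the sky blue?",
    {
        "Model A": "Rayleigh scattering.",
        "Model B": "Short wavelengths scatter more.",
    },
)
```

The resulting prompt would then be sent to `synthesizer` through the same provider call used in the Generate phase.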

Full Roster
Model | Provider | Role | Persona
GPT-OSS 120B | NVIDIA NIM | COMPETITOR | Analytical precision
DeepSeek R1 | NVIDIA NIM | COMPETITOR | Step-by-step reasoning
Phi-3 Mini 128K | NVIDIA NIM | COMPETITOR | Sharp & practical
Llama 3.3 70B | Groq ⚡ | COMPETITOR | Nuanced & comprehensive
Gemma 7B | NVIDIA NIM | JUDGE ONLY | Neutral · Never competes