Enter your prompt
Four models compete. One neutral judge scores. The best answer wins.
Battle History
Every arena run this session.
No battles yet — enter the arena ⚡
How it works
Three phases. One unbiased judge. The best answer wins.
01
Generate
Your prompt is dispatched to all four models simultaneously, each operating under its own persona: analytical precision, step-by-step reasoning, sharp practicality, or nuanced comprehensiveness.
02
Judge
A dedicated Gemma 7B model — running on a completely separate API key — scores every response across Accuracy, Clarity, Depth and Creativity. It never competes.
03
Synthesize
The top-ranked model synthesizes all responses into one champion answer, combining the strongest insights from every competitor.
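For the curious, the three phases above can be sketched in a few lines of Python. This is a minimal illustration and not the arena's actual implementation: `call_model`, `judge`, and `synthesize` are hypothetical stand-ins that return placeholder values instead of calling NVIDIA NIM or Groq, and the equal-weight sum of the four criteria is an assumption. Only the model names, personas, and criteria come from this page.

```python
import asyncio
from dataclasses import dataclass

# Competitors and personas as listed in the roster on this page.
COMPETITORS = {
    "GPT-OSS 120B": "Analytical precision",
    "DeepSeek R1": "Step-by-step reasoning",
    "Phi-3 Mini 128K": "Sharp & practical",
    "Llama 3.3 70B": "Nuanced & comprehensive",
}
CRITERIA = ("accuracy", "clarity", "depth", "creativity")


@dataclass
class Entry:
    model: str
    answer: str
    scores: dict[str, float]

    @property
    def total(self) -> float:
        # Assumption: criteria are weighted equally.
        return sum(self.scores.values())


async def call_model(model: str, persona: str, prompt: str) -> str:
    """Hypothetical stand-in for a real API call to one competitor."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"[{model} | {persona}] answer to: {prompt}"


async def judge(answer: str) -> dict[str, float]:
    """Hypothetical stand-in for the judge: one score per criterion."""
    await asyncio.sleep(0)
    placeholder = float(len(answer) % 10)  # not a real quality measure
    return {criterion: placeholder for criterion in CRITERIA}


async def synthesize(model: str, prompt: str, entries: list[Entry]) -> str:
    """Hypothetical stand-in: the top-ranked model merges all answers."""
    await asyncio.sleep(0)
    return f"{model} synthesis of {len(entries)} answers to: {prompt}"


async def run_battle(prompt: str) -> tuple[list[Entry], str]:
    # Phase 1 (Generate): send the prompt to every competitor concurrently.
    answers = await asyncio.gather(
        *(call_model(m, p, prompt) for m, p in COMPETITORS.items())
    )
    # Phase 2 (Judge): score each answer; the judge never competes.
    scores = await asyncio.gather(*(judge(a) for a in answers))
    entries = [Entry(m, a, s) for m, a, s in zip(COMPETITORS, answers, scores)]
    ranked = sorted(entries, key=lambda e: e.total, reverse=True)
    # Phase 3 (Synthesize): the top-ranked model writes the champion answer.
    champion = await synthesize(ranked[0].model, prompt, ranked)
    return ranked, champion


if __name__ == "__main__":
    ranked, champion = asyncio.run(run_battle("Why is the sky blue?"))
    for entry in ranked:
        print(f"{entry.total:5.1f}  {entry.model}")
    print(champion)
```

Running the competitors through `asyncio.gather` is what makes the dispatch simultaneous: total latency is bounded by the slowest model and not by the sum of all four.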
Full Roster
◈
GPT-OSS 120B
NVIDIA NIM
COMPETITOR
Analytical precision
◈
DeepSeek R1
NVIDIA NIM
COMPETITOR
Step-by-step reasoning
◈
Phi-3 Mini 128K
NVIDIA NIM
COMPETITOR
Sharp & practical
◈
Llama 3.3 70B
Groq ⚡
COMPETITOR
Nuanced & comprehensive
⚖
Gemma 7B
NVIDIA NIM
JUDGE ONLY
Neutral · Never competes