Legal AI Intelligence
Rankings across 19 legal benchmarks — including LegalBench, bar exam performance, and contract analysis. Data sourced from llm-stats.com.
| # | Model | Maker | Legal Score | Strengths |
|---|---|---|---|---|
| 1 | Claude Fable 5 | | 51.5 | |
| 2 | Qwen3.7 Max | Alibaba Cloud / Qwen Team | 50.8 | Excellent multilingual coverage (50+ languages) |
| 3 | MiniMax M2.1 | MiniMax | 48.1 | 1M+ context window with usable recall |
| 4 | Claude Opus 4.8 | Anthropic | 46.5 | Long-form coherence — voice and structure stay consistent over thousands of tokens |
| 5 | Kimi K2.5 | Moonshot AI | 46.0 | Consistently top-5 on research and long-context retrieval |
Scores represent aggregate performance across all 19 legal benchmarks. Maximum possible score: 100. Data from llm-stats.com as of June 22, 2026.
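For readers who want to compare these figures programmatically, here is a minimal Python sketch that loads the five rows above and computes how far each model trails the top score. The `LegalRanking` class and the `gap_to_leader` helper are hypothetical names chosen for illustration and are not part of any llm-stats API. The scores are copied by hand from the table, not fetched from the site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalRanking:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    rank: int
    model: str
    score: float  # aggregate legal score, out of a maximum of 100


# Values copied by hand from the table above (llm-stats.com, June 22, 2026).
RANKINGS = [
    LegalRanking(1, "Claude Fable 5", 51.5),
    LegalRanking(2, "Qwen3.7 Max", 50.8),
    LegalRanking(3, "MiniMax M2.1", 48.1),
    LegalRanking(4, "Claude Opus 4.8", 46.5),
    LegalRanking(5, "Kimi K2.5", 46.0),
]


def gap_to_leader(rankings):
    """Map each model name to its distance (in points) from the top score."""
    top = max(r.score for r in rankings)
    # Round to one decimal to match the precision of the published scores.
    return {r.model: round(top - r.score, 1) for r in rankings}


if __name__ == "__main__":
    for model, gap in gap_to_leader(RANKINGS).items():
        print(f"{model}: {gap} points behind the leader")
```

On these numbers the whole top five sits within 5.5 points, so a small gap on the aggregate score may matter less than how a model handles the specific task you care about.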
llm-stats evaluates models across 19 tasks designed specifically for legal work, including LegalBench, bar exam performance, and contract analysis.
General benchmark rankings (coding, math) don't reliably predict legal performance. A model that tops HumanEval may still underperform on contract analysis or bar exam tasks.
Drafting intake emails, analyzing contracts, and answering legal research questions each favor different model strengths. The leaderboard helps identify the right tool for each task.
New models are released monthly, so a model that was best-in-class six months ago may since have been surpassed. Check llm-stats regularly or talk to us about AI implementation for your firm.