Top AI Models of 2026: Ranking the Smartest Systems


This article digs into TrackingAI's April 2026 benchmark, which runs leading models through the Mensa Norway visual-pattern test. It shows a fast-closing gap among top AI visual reasoning models and points out how methodology and test design shape the way these results should be read.

But let’s not get carried away—visual-pattern tasks only tap into a sliver of what we’d call intelligence. They’re not some all-encompassing measure of capability.

Benchmark snapshot: top performers and context

Grok-4.20 Expert Mode and OpenAI GPT‑5.4 Pro (Vision) both landed at the top with a score of 145, according to TrackingAI’s April 2026 results. Just behind them, Gemini 3.1 Pro Preview scored 141, making it a pretty tight race at the upper tier.

Meanwhile, Mistral’s best model only managed 97, showing a much wider gap for some major players. Last year, the top benchmark score was 135. So, jumping to 145 in a single year? That’s a hefty leap—AI visual reasoning is moving fast.

The leaderboard (shoutout to Visual Capitalist for the reproduction) gives us a snapshot of how these models are bunching up at the top in this one niche area. It’s not trying to rank them for stuff like coding, factual accuracy, or tool use—just this specific cognitive lane.

Leaders and notable gaps

  • Top score of 145 for Grok-4.20 Expert Mode and OpenAI GPT‑5.4 Pro (Vision).
  • Second place at 141 for Gemini 3.1 Pro Preview.
  • Mistral’s leading model at 97, showing progress isn’t even across the board.
  • Year-over-year jump from 135 in 2025 to 145 in 2026 at the top, highlighting rapid gains in these visual-pattern tasks.

There’s a catch with the April 2026 results. TrackingAI uses a 35-question Mensa Norway test, sending images to vision models but giving non-vision models just verbal prompts.

This difference in how the test is delivered can mess with direct comparisons. So, the ranking really only tells us about visual-pattern recognition, not overall intelligence or how these models do in the wild.
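To make that delivery difference concrete, here is a minimal sketch of what a dual-format quiz harness might look like in Python. Everything in it is an assumption for illustration: the Question and Model types, the answer method, and the prompt wording are hypothetical, since the article doesn't describe TrackingAI's actual code.

```python
from dataclasses import dataclass
from typing import Optional, Protocol

@dataclass
class Question:
    image_path: str            # the visual puzzle shown to vision models
    verbal_description: str    # text-only paraphrase sent to non-vision models
    correct_option: str        # answer key, e.g. "C"

class Model(Protocol):
    supports_vision: bool
    def answer(self, prompt: str, image: Optional[str] = None) -> str: ...

def ask_model(model: Model, q: Question) -> str:
    """Deliver one puzzle in the format the model can actually consume."""
    if model.supports_vision:
        # Vision-capable models receive the raw puzzle image.
        return model.answer(prompt="Which option completes the pattern?",
                            image=q.image_path)
    # Text-only models receive a verbal description of the same puzzle instead.
    return model.answer(prompt=q.verbal_description)

def run_quiz(model: Model, questions: list[Question]) -> int:
    """Score all 35 items by counting exact matches against the answer key."""
    return sum(ask_model(model, q) == q.correct_option for q in questions)
```

Even in this toy version you can see the problem: the two branches hand the model different raw material, so a gap in scores can reflect the input format as much as the model itself.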

Methodology and interpretation: what the numbers mean

TrackingAI points out some quirks in its protocol. If a model refuses to answer, the question is re-asked up to 10 times, and the most recent answer is the one that gets scored.

This could give an edge to models that are more willing to take a stab at a question, while more cautious systems might get left behind. Visual Capitalist calls the leaderboard a frontier snapshot—it’s just a glimpse of reasoning at the cutting edge, not a universal scoreboard for every kind of ability.
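That retry rule is easy to picture as a small loop. The sketch below is hypothetical, not TrackingAI's actual harness: the is_refusal check is a crude placeholder for whatever refusal detection they really use, and the ask callback stands in for a single question attempt (such as the ask_model helper in the earlier sketch).

```python
from typing import Callable

MAX_ATTEMPTS = 10  # retry ceiling described in the protocol above

def is_refusal(reply: str) -> bool:
    """Crude placeholder for whatever refusal detection the real harness uses."""
    return not reply.strip() or "cannot" in reply.lower()

def answer_with_retries(ask: Callable[[], str]) -> str:
    """Re-ask on refusal, up to 10 attempts, and keep the most recent reply."""
    reply = ""
    for _ in range(MAX_ATTEMPTS):
        reply = ask()              # one fresh attempt at the same question
        if not is_refusal(reply):
            break                  # a substantive answer ends the retry loop
    return reply                   # whatever came back last is what gets scored
```

Under this kind of policy, a model that guesses on attempt one and a model that refuses nine times before guessing end up scored the same way, which is exactly why bolder systems can pick up an edge.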

Since the benchmark is all about visual-pattern recognition, vision-enabled models naturally come out looking better. These results don’t really say much about other key AI skills, like factual recall or using tools in complex, non-visual tasks.

All in all, there’s a clear trend: the top of the AI reasoning game is getting crowded, but progress is still all over the map depending on the developer and the model’s architecture.

Implications for researchers and practitioners

For researchers, the April 2026 results highlight just how important it is to align test formats with the skills you’re actually measuring. If vision feeds come into play, you really need to consider how image understanding, prompt phrasing, and the way responses are handled can change the scores.

Practitioners and users might see a clear message here: frontier models are getting better at cognitive-style tasks that look like structured reasoning. Still, their reliability, safety, and ability to work across different domains just aren’t consistent across the board.

Here is the source article for this story: Ranked: The Smartest AI Models of 2026
