Benchmark Results - Search News

China’s LineShine Tops Supercomputer List Despite AI Benchmark Gap

China’s LineShine tops the June 2026 TOP500 supercomputer list, though mixed-precision results leave El Capitan stronger on ...

1mon

Microsoft’s multi-agent AI system tops Anthropic’s Mythos on cybersecurity benchmark

Microsoft's new vulnerability-scanning system, codenamed MDASH, scored 88.45% on the CyberGym benchmark, surpassing single-model systems from Anthropic and OpenAI by using more than 100 specialized AI ...

Tech Times

ChatGPT Pro Is Splitting Into Three: GPT-5.6 Benchmark Reveals Luna, Terra, Sol Pro

ChatGPT Pro tier split may be coming: a June 30 OpenAI genomics paper lists GPT-5.6 Luna Pro, Terra Pro, and Sol Pro — the ...

1mon

MiniMax-M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5-10% of the cost

M3 demonstrates that the next phase of agent development will not just be driven by larger datasets, but by efficient architectural choices.

15d

Why Weibo’s tiny VibeThinker-3B has the AI world arguing over benchmarks again

VibeThinker-3B, a 3-billion-parameter AI model, is challenging OpenAI, Google and DeepSeek on math and coding benchmarks while reigniting the debate over AI scaling, benchmark gaming and small-model reasoning.

Artificial Lawyer

What Legal AI Benchmarks Reveal That Model Names Don’t

By Daniel Lewis, CEO, LegalOn. Foundation models are improving quickly. One useful measure is software engineering: the ...

PCMag

New 3DMark Benchmark Test Will Let You Use Upscaling, Frame Gen to Boost FPS

The Thermal Grizzly stand at Computex 2026 has been running what could be the first public demo of the next-generation 3DMark ...

MIT Technology Review

How to build a better AI benchmark

To fix the way we test and measure models, AI is learning tricks from social science. It’s not easy being one of Silicon Valley’s favorite benchmarks. SWE-Bench (pronounced “swee bench”) launched in ...
