China’s LineShine tops the June 2026 TOP500 supercomputer list, though mixed-precision results leave El Capitan stronger on ...
Microsoft's new vulnerability-scanning system, codenamed MDASH, scored 88.45% on the CyberGym benchmark, surpassing single-model systems from Anthropic and OpenAI by using more than 100 specialized AI ...
ChatGPT Pro tier split may be coming: a June 30 OpenAI genomics paper lists GPT-5.6 Luna Pro, Terra Pro, and Sol Pro — the ...
M3 demonstrates that the next phase of agent development will not just be driven by larger datasets, but by efficient architectural choices.
B, a 3-billion-parameter AI model, is challenging OpenAI, Google and DeepSeek on math and coding benchmarks while reigniting the debate over AI scaling, benchmark gaming and small-model reasoning.
By Daniel Lewis, CEO, LegalOn. Foundation models are improving quickly. One useful measure is software engineering: the ...
The Thermal Grizzly stand at Computex 2026 has been running what could be the first public demo of the next-generation 3DMark ...
To fix the way we test and measure models, AI is learning tricks from social science. It’s not easy being one of Silicon Valley’s favorite benchmarks. SWE-Bench (pronounced “swee bench”) launched in ...