Benchmark Laboratory
Price cut by 66%, performance still at the ceiling? Who is rattled by Claude Opus 4.5's "price-cut strike"? | 302.AI Benchmark Laboratory
On November 25th, while the spotlight of the large-model race was still shifting between GPT-5.1 and Gemini 3 Pro, Anthropic made a strong return with its flagship Claude Opus 4.5. It claimed that this is currently the world's most powerful model for programming, agents, and computer use, with programming ability surpassing that of human experts. The Claude series' most eye-catching trump card has always been its dominant performance in programming. In authoritative real-world…
Done competing on parameters, now competing on "personality"? Grok 4.1 hands-on test: maxed-out EQ, much-improved programming | 302.AI Benchmark Laboratory
Last week, while the whole AI community was watching the latest iterations from the two giants, Google and OpenAI, xAI once again used its signature surprise launch, opening the Grok 4.1 series to all users for free in the early hours of November 18th. This means the Grok 4 series completed a key upgrade in just four months, and the upgrade clearly signals xAI's distinctive competitive strategy: the next frontier for large models may no longer be cold computing power and parameters, but…
Six wins out of six! 4K output, from infographics to ultra-realistic portraits: Nano Banana Pro reclaims the throne | 302.AI Benchmark Laboratory
Before the smoke had cleared from this week's LLM battlefield, Google dropped another blockbuster. On the evening of November 20th, Beijing time, Nano Banana Pro (official model name Gemini-3-Pro-Image-Preview) was officially released. Just three months after the "magic banana" swept the AIGC community with its "turn anything into 3D" trick, it is now making a strong comeback, backed by the powerful Gemini 3 Pro base model. Now that it carries the "Pro" label…
All-round SOTA, or a specialist? Gemini 3 Pro in-depth test: a "god" at building UIs, a "mortal" at deriving algorithms | 302.AI Benchmark Laboratory
To be honest, by the end of 2025 many people may feel a little "tired" of AI. Over the past two years, major vendors have piled on parameters and computing power like crazy, often doubling parameter counts, yet everyday tasks feel much the same. This "compute arms race" has arguably reached the point of diminishing returns. But last night (November 18th, Beijing time), Google quietly released Gemini 3.0, which may finally stir up this stagnant pool. Many people's memories…
Doubao-Seed-Code hands-on test: it competes on price and benchmark scores, but can it compete on real code? | 302.AI Benchmark Laboratory
The AI programming race in the second half of this year has been fierce and fast-paced. First Kimi-K2-0905 forced its way into the top tier, then Zhipu's GLM-4.5 challenged the reigning champion Claude Sonnet 4.5, and MiniMax launched its latest flagship, MiniMax-M2, which topped the open-source leaderboard on its merits. These models arrived one after another like stones thrown into a lake, and every one of them, without exception, emphasized significant gains in programming ability at launch. This trend is clear…
A high-quality 3D model from a single image? ByteDance Seed3D 1.0 hands-on test: impressive, with some shortcomings | 302.AI Benchmark Laboratory
ByteDance's Seed team recently launched its latest work, Seed3D 1.0, a 3D foundation model that combines accuracy with scalability for physical simulation. From a single image it can generate a high-precision 3D model with fine textures and materials that can be used directly for simulation and robot training. The core challenge of current 3D generation technology lies in making "the leap from a photo to a usable three-dimensional world." This requires the model to solve three fundamental problems: first, it cannot generate only one…
When accuracy is no longer the only criterion: a hands-on comparison of three mainstream STT (speech-to-text) models | 302.AI Benchmark Laboratory
Now that multimodal AI has largely mastered vision and complex logical reasoning, the fragility of speech recognition systems against variables such as accents and noise remains a core challenge the field urgently needs to overcome. If AI can understand pictures and reason, why is it still so hard for it to understand an accented conversation? This is a pain point shared by developers and users alike. In speech-to-text (STT), we always seem to face a "technological paradox": model capabilities advance rapidly on paper, yet in real conference rooms, on noisy streets, and in crowded places…
Kimi K2 Thinking hands-on test: complex reasoning is already very capable, deep programming still needs work | 302.AI Benchmark Laboratory
In the summer of 2025, as the main thread of the large-model race shifted from sheer parameter scale to deeper "agentic intelligence", one name set the entire open-source community ablaze: Kimi K2. Released by Moonshot AI on July 11, 2025, this groundbreaking open-source large language model was not only billed as the industry's first trillion-parameter model, with a staggering 1.04 trillion total parameters, but more importantly…
No more "robotic" voices? MiniMax Speech 2.6 hands-on test: low latency plus full voice cloning redefine the experience | 302.AI Benchmark Laboratory
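From monotonous mechanical synthesized voices to AI assistants with a hint of emotion, the AI voice race has always pushed the limits of speaking "faster" and sounding "more human". Now the old benchmarks are being overturned: on October 30th, MiniMax released its latest speech model, Speech 2.6, cutting end-to-end latency to under 250 milliseconds in one stroke and redefining the speed standard for real-time voice interaction. In everyday human conversation, natural pauses last between 300 and 500 milliseconds, while 250 milliseconds is even…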
While rivals have charged into the "2.5 era", is MiniMax Hailuo 2.3 shifting into reverse? | 302.AI Benchmark Laboratory
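In our late-September review, "The first battle of China's AI video '2.5 era': can Wan2.5's 'cinematic feel' and Kling 2.5's 'stable aesthetics' beat Veo 3?", we noted that Chinese AI video models were all stepping into the 2.5 era at the same time. A little over a month later, on October 28th, another heavyweight in this camp, MiniMax's Hailuo, received its own upgrade with the launch of version 2.3. Hailuo 2…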