Benchmark laboratory

All-round SOTA or a specialist? Gemini 3 Pro in-depth test: a “god” of UI building, but a “mortal” at algorithmic reasoning丨302.AI Benchmark laboratory

To be honest, by late 2025 many people may be feeling a bit of AI fatigue. Over the past two years, the major vendors have piled on parameters and compute at a frantic pace, often doubling parameter counts from one release to the next, yet day-to-day tasks feel much the same. This “compute arms race” has, to some extent, reached the point of diminishing returns. But last night (November 18, Beijing time), Google quietly released Gemini 3.0, and it may genuinely stir up these stagnant waters. Many people's memories…
November 19, 2025 • Benchmark laboratory
Doubao-Seed-Code tested: it competes on price and benchmark scores, but can it compete on real-world code?丨302.AI Benchmark laboratory

The AI coding race in the second half of this year has been a fierce, down-to-the-wire contest. First, Kimi-K2-0905 firmly established itself in the top tier; then Zhipu's GLM-4.5 challenged the reigning champion, Claude Sonnet 4.5; and MiniMax launched its latest flagship, MiniMax-M2, which topped the open-source leaderboard on its own merits. It is easy to see that these models, arriving one after another like stones thrown into a lake, all emphasized significant gains in coding ability at launch. The trend is clear…
November 17, 2025 • Benchmark laboratory
High-quality 3D models from a single image: ByteDance Seed3D 1.0 tested, impressive but not without flaws丨302.AI Benchmark laboratory

ByteDance's Seed team recently released its latest work, Seed3D 1.0, a 3D foundation model that combines accuracy with the scalability needed for physics simulation. From a single image, it can generate a high-precision 3D model complete with fine textures and materials, ready to be used directly in simulation and robot training. The core challenge for current 3D generation technology is making “the leap from a photo to a usable three-dimensional world.” This requires the model to solve three fundamental problems: first, it cannot generate only one…
November 14, 2025 • Benchmark laboratory
When accuracy is no longer the only criterion: a hands-on comparison of three mainstream STT (speech-to-text) models丨302.AI Benchmark laboratory

Now that multimodal AI has largely mastered vision and complex logical reasoning, the sensitivity of speech recognition systems to variables such as accents and background noise remains a core challenge that the field urgently needs to overcome. If AI can understand images and reason, why does it still struggle with a conversation in a heavy accent? This is a pain point shared by developers and users alike. In speech-to-text (STT), we always seem to face a “technological paradox”: model capabilities advance rapidly on paper, but in real conference rooms, on noisy streets, and in crowded places…
November 10, 2025 • Benchmark laboratory
Kimi K2 Thinking tested: complex reasoning is already very capable, but deep coding still needs work丨302.AI Benchmark laboratory

In the summer of 2025, as the main axis of competition among large models shifted from sheer parameter scale toward deeper “agentic intelligence”, one name hit the open-source community like a thunderclap: Kimi K2. This groundbreaking open-source large language model, released by Moonshot AI on July 11, 2025, was not only the first model in the industry claimed to reach the trillion-parameter scale, with a staggering 1.04 trillion total parameters, but more importantly…
November 7, 2025 • Benchmark laboratory
Ending the “robotic feel”: MiniMax Speech 2.6 tested, with low latency and full voice cloning that transform the experience丨302.AI Benchmark laboratory

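From flat, mechanical synthesized voices to AI assistants with a hint of emotion, the AI voice race has always focused on pushing the limits of speaking “faster” and sounding “more human”. Yet the old benchmarks are being overturned: on October 30, MiniMax released its latest speech model, Speech 2.6, cutting end-to-end latency to under 250 ms in one stroke and redefining the speed standard for real-time voice interaction. In everyday human conversation, natural pauses last 300 to 500 ms, while 250 ms is even…
November 3, 2025 • Benchmark laboratory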
While rivals race into the 2.5 era, is MiniMax Hailuo 2.3 going into reverse?丨302.AI Benchmark laboratory

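In our late-September review, “The First Battle of the ‘2.5 Era’ of Chinese AI Video: Can Wan2.5's ‘Cinematic Feel’ and Kling 2.5's ‘Stable Aesthetics’ Beat Veo 3?”, we noted that Chinese AI video models were collectively entering the 2.5 era. A little over a month later, on October 28, another major player in this camp, MiniMax's Hailuo, received its official upgrade to version 2.3. Hailuo 2…
October 31, 2025 • Benchmark laboratory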
Can't out-compete the all-round champion? MiniMax-M2 takes the most valuable ground with half the effort丨302.AI Benchmark laboratory

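MiniMax recently open-sourced MiniMax-M2, a large model optimized specifically for coding tasks and agent workflows. The model uses a Mixture-of-Experts (MoE) architecture. MiniMax officially calls it a “small model” because, with only 10 billion activated parameters, it achieves end-to-end tool-calling capability comparable to top-tier models, while its lightweight form makes deployment and scaling easier than ever. MiniMax M2 is clearly positioned: in AI coding and agent development, it aims to become…
October 29, 2025 • Benchmark laboratory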
Doubao-Seed-Translation tested: how far is it from being a true “master translator”?丨302.AI Benchmark laboratory

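In September, ByteDance's Volcano Engine launched Doubao-Seed-Translation, a general-purpose multilingual translation model that supports translation among 28 languages, including Chinese, English, Japanese, Korean, German, and French, covering most of the world's major languages. According to the official announcement, the model approaches Deepseek-R1 in Chinese-English translation quality and, in overall multilingual performance, can even rival top models such as GPT-4o and Gemini-2.5-Pro, demonstrating world-class translation quality.…
October 27, 2025 • Benchmark laboratory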
A budget Sonnet 4 alternative? Claude Haiku 4.5 tested and it's on fire: no loss in performance, half the price丨302.AI Benchmark laboratory

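Five months after Claude Sonnet 4 claimed the top spot among coding models, Anthropic officially announced the newest lightweight member of the Claude family, Claude Haiku 4.5, claiming that it matches Sonnet 4 in coding performance at only one third of the price and more than twice the speed, making it a highly competitive budget alternative to Sonnet 4. The official figures released by Anthropic also clearly support …
October 24, 2025 • Benchmark laboratory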
