Model evaluation

When accuracy is no longer the only criterion: a hands-on comparison of three mainstream STT (speech-to-text) models | 302.AI Benchmark laboratory

Multimodal AI has steadily conquered vision and complex logical reasoning, yet the vulnerability of speech recognition systems to variables such as accent and noise remains a core challenge that this field urgently needs to overcome. When AI can see pictures and reason, why is it so difficult for it to understand a conversation with an accent? This is a common pain point for developers and users alike. In speech-to-text (STT), we always seem to face a “technological paradox”: model capabilities are making rapid progress on paper, but in real conference rooms, on noisy streets, and in places full of people…
November 10, 2025 • Benchmark laboratory
Not just form, but spirit: Vidu Q2 hands-on test, where the “acting school” leads a new direction in the AI video race | 302.AI Benchmark laboratory

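As AI video generation advances from basic prompt understanding toward cinematic-grade image creation, the direction of model evolution is no longer limited to image quality itself. It now extends to camera-movement logic informed by a director's mindset and to the ability to perceive users' deeper intent. “Cinematic-grade” is becoming the core label of the new generation of AI video models. Among the AI video models released in quick succession at the end of September, Wan2.5 and Sora 2 pushed the narrative quality of AI video to a new level with their breakthroughs in audio-video synchronization. Close behind, Shengshu…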
October 20, 2025 • Benchmark laboratory
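Stop talking only about cinematic image quality. Sora 2 review: when AI truly starts speaking Chinese and directing, how realistic is it? | 302.AI Benchmark laboratory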

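On the first day of the National Day holiday, while the buzz in AI video was still focused on Kling 2.5 taking SOTA and Wan2.5 winning wide praise, OpenAI once again pushed video generation into an entirely new narrative dimension with a “nuclear-grade” release: Sora 2, a video model that can not only see the “world” but also hear and understand it. Since Sora first appeared, its near-“replica” simulation of the physical world has completely rewritten the quality benchmark for AI video generation. However, in AIGC creation…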
October 14, 2025 • Benchmark laboratory
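302.AI Benchmark laboratory | Topping the leaderboards across the board: a review of Grok 4, the “world's most powerful AI”, analyzing its real strengths and limitations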

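On July 10, 2025, the global AI field saw another striking technical leap. On that day xAI, Elon Musk's company, officially unveiled its latest-generation large language model, Grok 4. xAI boldly claims that Grok 4 is the “world's most powerful AI” and backs the claim with a series of eye-catching benchmark results. Grok 4 comes not only in a powerful single-agent version but also in a breakthrough multi-agent collaborative version, Grok …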
July 12, 2025 • Benchmark laboratory
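302.AI Benchmark laboratory | From the street to the runway, a new king of realism is crowned! Unboxing review of the text-to-image model Higgsfield Soul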

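In the early hours of June 26, Beijing time, Higgsfield AI announced on the overseas social media platform X the launch of its text-to-image model Higgsfield Soul, officially described as a High Aesthetic Photo Model. Right at launch it went viral on social media thanks to two headline features: “one-click haute couture” and “real-time pose driving”. Soul combines cross-modal texture mapping with controllable human skeletal animation and can, within seconds, turn any sketch or text p…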
July 7, 2025 • Benchmark laboratory
The battle for the AI image generation crown in the first half of 2025: 302.AI assembles an all-star lineup and announces hands-on results for the TOP5 models! A must-read for AIGC enthusiasts

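I. Foreword: evaluating AI image generation models in the first half of 2025, toward a new era of realism and efficiency. Over the past two years, whether you scroll social media, watch TV programs, or notice advertisements on the street, AI-generated images have become woven into every part of our lives. In the first half of 2025, the AI image generation field once again saw explosive growth, with technical breakthroughs and real-world applications accelerating at an unprecedented pace. From the breakthrough progress of models such as ChatGPT and Sora to Chinese-developed large models' rapid…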
June 20, 2025 • Benchmark laboratory
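302.AI Benchmark laboratory | Three of the latest language models: Gemini/Doubao/Minimax put to the test on Gaokao (college entrance exam) math and game programming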

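In June, the major model vendors seemed to be wound up like clockwork, with new releases arriving one after another. On June 11, at the Force 2025 conference, Volcano Engine officially released version 1.6 of the Doubao large model (Doubao-Seed-1.6). The series includes three main versions: the standard Doubao-Seed-1.6, the deep-thinking-enhanced Doubao-Seed-1.6-thinking, and the speed-optimized Doubao-Seed-1.6-fl…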
June 19, 2025 • Benchmark laboratory
The ultimate battle for video model supremacy in the first half of 2025! Seedance 1.0 vs Kling 2.1 vs Veo 3 hands-on test | 302.AI Benchmark laboratory

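On June 11, Beijing time, at its Force 2025 conference, Volcano Engine officially released version 1.6 of the Doubao large model (Doubao-Seed-1.6), the Doubao video generation model Seedance 1.0 pro, the Doubao voice podcast model, and the Doubao real-time voice model. Among them, the newly released Doubao video generation model Seedance 1.0 pro supports text and image input and can generate multi-shot, seamlessly cut 1080…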
June 16, 2025 • Benchmark laboratory
How does the new DeepSeek-R1-0528 differ from the old version? A quick look at the hands-on comparison results | 302.AI Benchmark laboratory

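On the evening of May 28, Beijing time, DeepSeek pushed an update notice to its official community. On May 29, DeepSeek officially announced on social media that the DeepSeek R1 model had completed a minor version upgrade, with the current version now DeepSeek-R1-0528. Topping the Hugging Face model leaderboard. According to key points distilled from DeepSeek's official information: the updated R1 model, on multiple benchmarks in mathematics, programming, and general logic…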
May 30, 2025 • Benchmark laboratory
The latest comparative evaluation of the Claude 4 series: reasoning regresses while front-end programming improves? | 302.AI Benchmark laboratory

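On Thursday, May 22, Eastern Time, Anthropic launched two new models in the Claude 4 series at the company's first “Code with Claude” developer conference: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 and Sonnet 4 are reportedly both hybrid reasoning models that also support Extended thinking (extended…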
May 23, 2025 • Benchmark laboratory
