ElevenLabs

  • When Accuracy Is No Longer the Only Criterion: A Hands-On Comparison of Three Mainstream Speech-to-Text (STT) Models | 302.AI Benchmark Laboratory

    Multimodal AI has steadily conquered vision and complex logical reasoning, yet speech recognition systems remain vulnerable to variables such as accent and noise, and this is still a core challenge the field urgently needs to overcome. When AI can read images and reason, why is it so hard for it to understand a conversation spoken with an accent? This is a shared pain point for developers and users alike. In speech-to-text (STT), we always seem to face a "technological paradox": model capabilities advance rapidly on paper, but in real conference rooms, on noisy streets, and in crowded places…

    November 10, 2025 Benchmark laboratory
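  • Ending the "Machine Feel": MiniMax Speech 2.6 Hands-On Test, Where Low Latency and Full-Timbre Voice Cloning Upend the Experience | 302.AI Benchmark Laboratory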

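    From mechanical, monotonous synthetic voices to AI assistants with a touch of emotion, the race in AI speech has always centered on the limits of speaking "faster" and sounding more "lifelike". But the old benchmarks are being overturned: on October 30, MiniMax released its latest speech model, Speech 2.6, compressing end-to-end latency to under 250 milliseconds and redefining the speed standard for real-time voice interaction. In everyday human conversation, natural pauses fall between 300 and 500 milliseconds, while 250 milliseconds is even…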

    November 3, 2025 Benchmark laboratory
  • 2025 AI Music Model Review: The Lonely Suno and the Chinese Models Chasing It | 302.AI Benchmark Laboratory

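    Before diving into this ten-thousand-character article, take a look at two short videos I just edited. Both soundtracks come from AI music samples generated for this review, and they give a direct sense of where AI music quality stands today. If I had not said so, I suspect very few people could tell by ear that this is AI music. Across today's AIGC landscape, the image and video fields have long been fiercely competitive: a SOTA model that countless users flocked to this week may well be crushed by a new competitor next week, as the tide rises and falls. Yet when we turn our gaze to …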

    September 18, 2025 Benchmark laboratory