Benchmark laboratory
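OpenAI's 10th-Anniversary Answer Sheet, GPT-5.2 Hands-On: The Myth of Disruption Is Gone, So Where Does Its Mission Go Next?丨302.AI Benchmark laboratory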
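On its tenth anniversary, OpenAI made a surprise release of its next-generation large model series, GPT-5.2, on December 12, just one month after the previous generation, GPT-5.1. In that month, with Gemini 3 and Claude Opus 4.5 taking turns stealing the show, industry competition reached a stalemate, and the market shock that once made every launch feel like a disruption has been showing diminishing returns. This time OpenAI did not simply pile on parameters; instead, for the first time, it brought out a three-version lineup with precise…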
GLM-4.6V Hands-On: When a Vision Model Learns to "Act", What Still Separates It from the "Top Tier"?丨302.AI Benchmark laboratory
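On December 8, Zhipu AI officially open-sourced its new-generation multimodal model series, GLM-4.6V, which comprises a 106B version for high-performance scenarios and a 9B Flash version for lightweight local deployment. The upgrade not only pushes the training context window to 128K tokens but also makes a key architectural change: tool calling (Function Call) becomes a native capability of the vision model. This means the model no longer stops at recognizing images, but can…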
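Kling Video 2.6: The Chinese-Language Videos That Trip Up Google's Tongue? I Not Only Speak Them Accurately, I Act Them Well Too!丨302.AI Benchmark laboratory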
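Just two days after Kuaishou released Kling O1, the first unified multimodal video model, on December 1, it rushed out Kling Video 2.6, the first model in the Kling series with native audio. It can generate a complete video with visuals, natural speech, matching sound effects, and ambient atmosphere in a single pass, greatly simplifying the creative workflow. Kling 2.6's core breakthrough lies in deep multimodal coordination, with distinctive technical characteristics: Building on this technology, Kling 2.6 can serve a variety of application scenarios: Previously…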
Year-End Open-Source Image Model Showdown: Z-Image-Turbo vs Flux.2 Dev丨302.AI Benchmark laboratory
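In our previous article, "Is the Top Open-Source Model Flux.2 Still a Contender? Five Rounds of Hands-On Testing Against Nano Banana Pro", we tested the two closed-source versions of Flux.2 (Pro and Flex). In the same week (November 27), Alibaba's Tongyi followed on the heels of Flux and released a brand-new open-source image model: Z-Image-Turbo. Z-Image-Turbo is a distilled version of Z-Image that uses only 8 function evaluations (NFE)…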
A Former Chart-Topper Gets an Update: Is Flux.2 Still a Contender? Five Rounds of Hands-On Testing Against Nano Banana Pro丨302.AI Benchmark laboratory
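On November 25, Black Forest Labs finally iterated Flux, the image model it released in 2024, to version 2.0. As an open-source model, Flux was once unrivaled thanks to its cost-effectiveness and fine-tuning capability, and it nearly supplanted the Stable Diffusion ecosystem. Derivatives such as Flux-1-SRPO, which Tencent Hunyuan fine-tuned for portraits, also earned good reviews from us. But over the past six months, with Google's Nano Banana and ByteDance's SeeDance…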
Master of Aesthetics vs World Simulator: Seedream 4.5 Takes On Nano Banana Pro. Will SOTA Change Hands?丨302.AI Benchmark laboratory
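On December 3, Volcano Engine officially released its new-generation AI image model, Seedream 4.5, returning to the familiar rhythm of Chinese models striking second: shortly after Nano Banana burst onto the scene in late August this year, ByteDance answered with a precisely targeted counter, Seedream 4.0. In our comparative review at the time, Seedream 4.0 won five of six rounds, overtaking Nano Banana across the board. Looking back at version 4.0…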
Hands-On with the Open-Source Benchmark DeepSeek-V3.2: Finding a New Balance Between "Efficiency" and "Depth"丨302.AI Benchmark laboratory
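Right at the start of December, DeepSeek once again released, without advance notice, the highly anticipated V3.2 model series: DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, just two months after DeepSeek-V3.2-Exp came out in late September. This update is not only the product of technical iteration; it also reads as a deliberate probe of the capability ceiling of large models. The two models share the same origin yet have a clear division of labor: one pursues efficient, practical…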
Price Cut by 66% and Performance Still at the Ceiling? Claude Opus 4.5: Who Is Panicking Over This "Price-Cut Strike"?丨302.AI Benchmark laboratory
On November 25, while the spotlight of the large-model race was still shifting between GPT-5.1 and Gemini 3 Pro, Anthropic brought its flagship Claude Opus 4.5 back in force, claiming that it is currently the world's most powerful model for programming, agents, and computer use, with programming ability surpassing that of human experts. The Claude series' most eye-catching trump card has always been its dominant performance in programming. In the authoritative real-world…
After the Parameter Race, a "Personality" Race? Grok 4.1 Hands-On: EQ Maxed Out, Coding Greatly Improved丨302.AI Benchmark laboratory
Last week, while the entire AI community was focused on the latest iterations from the two giants, Google and OpenAI, xAI once again used its signature surprise-launch approach, opening the Grok 4.1 model series to all users for free in the early hours of November 18. This means the Grok 4 series has completed a key upgrade in just four months, and the upgrade clearly signals xAI's distinctive competitive strategy: the next frontier for large models may no longer be cold computing power and parameters, but…
Six Wins in Six Rounds! 4K Output, from Infographics to Ultra-Realistic Portraits: Nano Banana Pro Returns to the Throne丨302.AI Benchmark laboratory
The smoke from this week's LLM battlefield has not yet cleared, and Google has dropped another blockbuster. On the evening of November 20, Beijing time, Nano Banana Pro (official version name Gemini-3-Pro-Image-Preview) was officially opened. Just three months ago, the "magic banana" swept the AIGC community with its "everything can be made 3D" trend; now it is making a strong return, powered by the Gemini 3 Pro base model. Now that it carries the "Pro" label…