302.AI 基准实验室 | OpenAI o4-mini & o3，实测编程效果与多模态能力到底如何？

302.AI • 2025 年 4 月 23 日下午1:38 • 基准实验室 • 2390 views

上周，OpenAI在直播中发布了 o 系列新模型：o4-mini 和 o3。

OpenAI表示，o3是他们目前最强大的推理模型，在分析图像、图表和图形等视觉任务中表现尤为出色。而 o4-mini 则是一个较小的模型，专注于快速且经济高效的推理，特别在数学、编码和视觉任务中实现了优异的性能。

接下来，我们将在 302.AI 平台上分别对 o4-mini 和 o3 进行实测对比，以评估这两大新模型的性能表现。

OpenAI o4-mini & o3模型实测

I. o4-mini实测

（对比模型：DeepSeek R1、o1-mini）

1、简单推理

提示词：

分析下列序列的规律，并填写后续三个元素： 3, 5, 6, 10, 9, 17, 12, 26, 15, ___, ___, ___

题目分析：序列中的规律是交替进行，正确答案为：37, 18, 50。

o4-mini：解析过程较为简洁，答案正确。

o1-mini：奇偶列规律分析正确，但是数字所在位数数错了，导致最后答案是错误的。

DeepSeek R1：分析规律正确，答案正确。

2、模型幻觉测试

提示词：“独在异乡为异客”的前一句是什么？

题目分析：“独在异乡为异客”就是古诗《九月九日忆山东兄弟》的第一句，没有前一句。

o4-mini：答案正确。

o1-mini：答案错误，存在明显的幻觉。

DeepSeek R1：回答正确。

3、编程测试

提示词：请生成一个跑酷游戏，界面必须包含游戏操作说明，开始游戏按钮

o4-mini：游戏界面比较简洁，跳跃正常，观察到右上角的分数随着开始时间一直在增加，不过部分障碍物设置太高，多次尝试仍然是无法越过，这不太合理。

o1-mini：根据操作说明按下空格键可跳跃，但实操发现空格键并未响应，存在明显逻辑问题。

DeepSeek R1：按照操作说明可进行跳跃，但是发现障碍物设置并不合理，完全未起到阻碍的作用，游戏存在明显问题。

其他模型效果：

来看下 o3 的效果，整体还不错。完整度较高，障碍设置合理，分数是根据成功跳过障碍物实时增加的。

II. o3实测

对比模型：Gemini 2.5 pro、Doubao-1.5-Thinking-Pro-Vision

1、地点识别

提示词：图片是在哪拍摄的？

题目解析：对于地标建筑不是特别明显的图片，模型要正确识别难度还是比较大的，图片正确的位置为：位于广州市白云区的麓湖公园。

o3：答案错误。

Gemini 2.5 pro：答案错误。

Doubao-1.5-Thinking-Pro-Vision：回答正确。

2、图片推理

提示词：杯子有多高？

题目分析：根据图片可知存在两个未知数：一个是杯子的高度（题目所问），另一个是杯子叠加的高度。通过设定未知数可以列出方程，根据两个等式求解，以得出杯子的高度。正确答案为 14 厘米。

o3：回答正确。

Gemini 2.5 pro：回答正确。

Doubao-1.5-Thinking-Pro-Vision：回答正确。

3、图片找不同

提示词：图片中共有6处不同，请指出具体在哪里

（右侧为答案）

o3：未能准确找出不同之处，描述不对。

（红色圈出的部分是错误的）

Gemini 2.5 pro：正确指出了三处不同。

（红色圈出的部分是完全错误的）

Doubao-1.5-Thinking-Pro-Vision：正确指出了五处不同。

（红色圈出的部分是完全错误的）

III. 实测总结

1、实测结果整理：

o4-mini & DeepSeek R1 & o1-mini
	简单推理	模型幻觉	编程测试
o4-mini	正确	正确	部分障碍物设置过高
o1-min	错误	错误	存在逻辑问题
DeepSeek R1	正确	正确	障碍物设置过于简单
o3 & Gemini 2.5 pro & Doubao-1.5-Thinking-Pro-Vision
	地点识别	图片推理	图片找不同
o3	错误	正确	未能准确找出
Gemini 2.5 pro	错误	正确	正确找出3处，有3处错误
Doubao-1.5-Thinking-Pro-Vision	正确	正确	正确找出5处，有1处错误

2、实测总结：

通过以上实测，可初步得出以下结论：

o4-mini & DeepSeek R1 & o1-mini

（1）o4-mini 较于 o1-mini 有明显的能力提升：在简单推理与模型幻觉测试中，o4-mini 和 DeepSeek R1 在简单推理和模型幻觉测试中均表现出色，o1-mini 则是表现较差。

（2）轻量级模型在编程能力上还有待提升：三个对比模型在编程任务中均存在不足，o4-mini 在障碍物设置方面存在不合理之处，如障碍物过高以至于无法越过，o1-mini存在明显的逻辑问题，而DeepSeek R1则因障碍物设置过于简单而未能有效发挥作用。

o3 & Gemini 2.5 pro & Doubao-1.5-Thinking-Pro-Vision

（1）o3 模型地点识别任务未达到网络预期水平：地点识别任务中处理随手拍摄且缺乏显著地标的图片时，仅Doubao-1.5模型能够提供准确答案。

（2）各模型在常规图片推理方面具备一定能力，但在复杂视觉任务中仍有较大提升空间：在简单图片推理任务中，各模型均能给出正确答案，但在难度较高的找不同测试中，所有模型均未能准确指出所有不同之处。

如何在302.AI中使用
302.AI的聊天机器人和API超市提供了按需付费无订阅的服务方式，企业和个人用户可按需灵活选用。
1、使用模型对话
使用路径：依次点击使用机器人→聊天机器人→ 选择模型 →创建聊天机器人；
o3/o4-mini：
2、使用模型API
企业用户可以通过302.AI的API超市快速、便捷地调用模型，还能够根据特定项目需求进行定制化开发。
相关文档：使用API→API超市→语言大模型→OpenAI→查看文档；
API名称：
o4-mini：o4-mini
o3：o3

👉立即注册免费试用302.AI，开启你的AI之旅！👈

为什么选择302.AI？

● 灵活付费：无需月费，按需付费，成本可控
● 丰富功能：从文字、图片到视频，应有尽有，满足多种场景需求
● 开源生态：支持开发者深度定制，打造专属AI应用
● 易用性：界面友好，操作简单，快速上手

302.AI 新品发布 | 图像创意站：GPT-Image-1玩法全解析，轻松生成惊艳作品

LLM o3 o4-mini Openai302.AI 基准实验室 | 模型测评

Like (0)

302.AI

302.AI 基准实验室 | GPT-4.1竟吊打GPT-4o！GLM-Z1-AirX又能否超越DeepSeek R1？

Previous 2025 年 4 月 16 日下午10:24

302.AI 深度拆解 | 大白话聊一聊：AI下半场，Agent 的本质与变革

Next 2025 年 4 月 25 日上午11:33

卷不动全能冠军？MiniMax-M2：用一半的力气，拿下最值钱的阵地丨302.AI 基准实验室

MiniMax 日前正式开源了其专为编程任务与 Agent 工作流优化设计的大模型 MiniMax-M2。该模型采用 MoE 混合专家架构，官方称其为“小模型”，是因为仅凭 100 亿激活参数，即可实现媲美顶尖模型的端到端工具调用能力，而其轻量级形态使得部署和扩展变得比以往更加轻松。 MiniMax M2 定位明确，旨在成为 AI 编程与 Agent 开发领…
1天前 • 基准实验室
21000
Doubao-Seed-Translation翻译模型实测：距离真正的“翻译大师”还有多远？丨302.AI 基准实验室

字节跳动旗下火山引擎于 9 月推出其通用多语言翻译模型 Doubao-Seed-Translation，支持包括中、英、日、韩、德、法等 28 种语言互译，基本覆盖了全球大部分主流语种。官方称模型在中英翻译效果上逼近 Deepseek-R1，而在多语言综合表现上，甚至可以对标顶尖模型 GPT-4o 与 Gemini-2.5-Pro，展现出国际一流的翻译水准。…
3天前 • 基准实验室
18900
Sonnet 4 平替？Claude Haiku 4.5 实测杀疯了：性能不输，价格砍半丨302.AI 基准实验室

距 Claude Sonnet 4 问鼎业界编程翘楚五个月后，Anthropic 再度官宣发布其 Claude 家族轻量级新作——Claude Haiku 4.5，并宣称该模型在编码性能上已媲美 Sonnet 4，而价格仅为后者的三分之一，速度更是提升一倍以上，堪称一款极具竞争力的 Sonnet 4 平替。 Anthropic 官方抛出的数据也直观地力证了 …
6天前 • 基准实验室
58400
Claude Sonnet 4.5 对阵 GLM-4.6：中外大模型编程巅峰对决，胜负已分? 丨302.AI 基准实验室

今年十一国庆可谓是大模型界尤为热闹的一个行业节点。就在假期前夕的 9 月 30 日，Anthropic 与智谱先后发布 Claude Sonnet 4.5 与 GLM-4.6。而二者的升级方向都十分默契地指向同一关键战场——编程能力。前有 Anthropic 高调宣称 Claude Sonnet 4.5 是迄今为止最强大的编程模型，后有 GLM-4.6 在…
2025 年 10 月 13 日 • 基准实验室
1.3K00

发表回复

Comments(26)

tlovertonet 2025 年 5 月 23 日上午12:14
This website online can be a stroll-by way of for all the information you needed about this and didn’t know who to ask. Glimpse right here, and you’ll positively discover it.
回复
Lee Fenske 2025 年 6 月 4 日下午4:03
whoah this weblog is fantastic i really like studying your posts. Stay up the great work! You realize, a lot of people are searching around for this information, you could aid them greatly.
回复
réserver un vtc 2025 年 6 月 6 日上午9:02
Wohh exactly what I was looking for, thankyou for putting up.
回复
taxi cdg 2025 年 6 月 6 日上午9:47
You are a very clever person!
回复
droversointeru 2025 年 6 月 8 日上午6:19
Great info and right to the point. I don’t know if this is truly the best place to ask but do you people have any ideea where to hire some professional writers? Thank you :)
回复
Leadership Development 2025 年 6 月 11 日上午3:39
Hiya, I am really glad I have found this information. Today bloggers publish only about gossips and internet and this is actually irritating. A good website with interesting content, this is what I need. Thanks for keeping this site, I will be visiting it. Do you do newsletters? Cant find it.
回复
Blossom Mates 2025 年 6 月 16 日下午5:11
Just wanna remark on few general things, The website design and style is perfect, the content is real good : D.
回复
Tory Tretina 2025 年 7 月 1 日上午4:55
Your place is valueble for me. Thanks!…
回复
Thanh Lamon 2025 年 7 月 1 日下午6:50
I just could not depart your website prior to suggesting that I extremely enjoyed the standard information a person provide for your visitors? Is going to be back often in order to check up on new posts
回复
Raleigh Toothacre 2025 年 7 月 2 日上午3:38
You made some decent points there. I did a search on the topic and found most individuals will consent with your blog.
回复
get 2025 年 7 月 6 日下午10:41
Very interesting topic, thanks for putting up.
回复
302.AI 基准实验室丨全面刷新榜单，“全球最强AI”Grok 4评测：真实实力与局限解析 - 2025 年 7 月 16 日下午3:54
[…] 4的实测对比，参与此次对决的选手包括：Gemini 2.5 Pro、Claude-opus-4、o3以及DeepSeek-R1。究竟Grok […]
回复
Willa Ginkel 2025 年 7 月 28 日下午7:59
F*ckin’ remarkable things here. I’m very happy to peer your article. Thank you a lot and i am looking ahead to contact you. Will you kindly drop me a e-mail?
回复
there 2025 年 7 月 30 日上午2:30
you might have a great weblog right here! would you like to make some invite posts on my weblog?
回复
og bil beskyttelse 2025 年 7 月 31 日上午1:07
Youre so cool! I dont suppose Ive read something like this before. So good to seek out someone with some original ideas on this subject. realy thanks for beginning this up. this web site is something that’s needed on the web, someone with a bit of originality. helpful job for bringing one thing new to the internet!
回复
mold damage clean-up atlanta 2025 年 7 月 31 日上午4:46
of course like your web-site however you need to test the spelling on several of your posts. Several of them are rife with spelling problems and I in finding it very troublesome to tell the reality however I’ll definitely come again again.
回复
Grand Prairie ac repair 2025 年 8 月 6 日下午6:17
I discovered your weblog website on google and examine just a few of your early posts. Continue to maintain up the excellent operate. I just extra up your RSS feed to my MSN News Reader. Looking for ahead to reading more from you afterward!…
回复
hosting services 2025 年 8 月 7 日上午11:14
I’ll immediately grasp your rss as I can’t in finding your email subscription hyperlink or newsletter service. Do you have any? Kindly let me recognize so that I may just subscribe. Thanks.
回复
pestoto 2025 年 8 月 16 日下午5:26
Thanks for a marvelous posting! I actually enjoyed reading it, you will be a great author.I will be sure to bookmark your blog and will eventually come back someday. I want to encourage that you continue your great work, have a nice morning!
回复
live toto macau 2025 年 8 月 17 日下午7:04
You really make it seem really easy with your presentation however I to find this matter to be really one thing that I believe I might never understand. It seems too complex and extremely broad for me. I am taking a look ahead in your next post, I will attempt to get the hang of it!
回复
olxtoto 2025 年 8 月 18 日上午9:17
You can certainly see your enthusiasm within the paintings you write. The arena hopes for more passionate writers like you who aren’t afraid to mention how they believe. At all times go after your heart. “Until you walk a mile in another man’s moccasins you can’t imagine the smell.” by Robert Byrne.
回复
result macau 2025 年 8 月 18 日下午12:28
Very interesting info !Perfect just what I was searching for! “It is our choices…that show what we truly are, far more than our abilities.” by J. K. Rowling.
回复
toto togel 2025 年 8 月 21 日上午10:02
Great post. I used to be checking continuously this blog and I am inspired! Extremely helpful info specially the final phase :) I maintain such information much. I was seeking this particular information for a long time. Thank you and good luck.
回复
slot pakai qris 2025 年 8 月 21 日下午8:51
I am impressed with this site, real I am a fan.
回复
ayuda PFG arquitectura 2025 年 8 月 24 日下午1:42
Thanks for another informative website. Where else could I get that kind of information written in such an ideal way? I have a project that I’m just now working on, and I’ve been on the look out for such information.
回复
Inmobiliaria José Ignacio 2025 年 8 月 24 日下午9:27
Very well written article. It will be supportive to everyone who usess it, including yours truly :). Keep up the good work – i will definitely read more posts.
回复