OpenAI Launches First Model With Reasoning Abilities

TMTPOST -- In a groundbreaking move, OpenAI has unveiled its latest AI model, 'o1,' which promises to redefine the landscape of artificial intelligence with its advanced reasoning capabilities.

Two distinct versions were released: o1-preview and o1-mini. The former is designed for high-level reasoning tasks in mathematics, programming, and scientific inquiries, boasting performance close to that of PhD-level experts. The latter is a more compact model optimized for code generation.

The o1 model is the highly anticipated project previously known internally as 'Strawberry.' Some industry insiders have suggested that 'o1' stands for 'Orion.'

OpenAI has emphasized that this new model represents a fresh start in AI's ability to handle complex reasoning tasks, meriting a naming convention distinct from the 'GPT-4' series. It also marks a new starting point for the AI era: the arrival of large models capable of general complex reasoning.

Despite its advanced capabilities, the current chat experience with o1 remains basic. Unlike GPT-4o, o1 does not offer functions such as browsing the web or handling file analysis tasks. Although it has image analysis capabilities, this feature is temporarily disabled pending further testing. There are also message limits: o1-preview is capped at 30 messages per week, while o1-mini allows 50 messages per week.

Starting Friday, both versions are available to ChatGPT Plus/Team users and via API channels, with enterprise and educational users gaining priority access next week.

OpenAI CEO Sam Altman described o1 as the company's most capable and aligned model yet, but admitted that “o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.”

The training behind o1 is fundamentally different from its predecessors, said OpenAI’s research lead, Jerry Tworek. He said o1 “has been trained using a completely new optimization algorithm and a new training dataset specifically tailored for it.”

OpenAI taught previous GPT models to imitate patterns from their training data. With o1, it trained the model to solve problems on its own by applying a technique known as reinforcement learning, which teaches the system through rewards and penalties. The model then uses a “chain of thought” to process queries, similar to the way humans work through problems step by step.
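The core idea of rewards and penalties can be illustrated with a toy example. The sketch below is a minimal multi-armed-bandit learner, purely hypothetical and far simpler than OpenAI's actual training setup: the agent tries each action, receives a reward or a penalty, and gradually shifts its value estimates toward whatever paid off.

```python
# Toy illustration of reinforcement learning's reward/penalty loop.
# This is a hypothetical sketch, not OpenAI's actual method.

def train_bandit(rewards, episodes=100, lr=0.1):
    """Learn a value estimate per action in a deterministic bandit."""
    values = [0.0] * len(rewards)  # estimated value of each action
    for _ in range(episodes):
        for action, reward in enumerate(rewards):
            # Nudge the estimate toward the observed reward (or penalty).
            values[action] += lr * (reward - values[action])
    return values

# Action 1 yields +1 (reward); actions 0 and 2 yield -1 (penalty).
values = train_bandit([-1.0, 1.0, -1.0])
best = max(range(len(values)), key=values.__getitem__)  # → action 1
```

After enough episodes the estimates converge on the true payoffs, and the rewarded action dominates; real RL training applies the same feedback principle to sequences of reasoning steps rather than single actions.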

OpenAI's new training methodology has led to a model that, according to the company, is more accurate. "We've noticed this model hallucinates less," says Tworek. However, the issue hasn’t been fully resolved. "We can’t claim to have eliminated hallucinations."

What distinguishes this new model from GPT-4o is its enhanced ability to solve complex problems, particularly in coding and math, while also providing explanations for its reasoning, OpenAI explains.

“The model is definitely better at solving the AP math test than I am, and I was a math minor in college,” says Bob McGrew, OpenAI’s chief research officer. OpenAI tested o1 on a qualifying exam for the International Mathematics Olympiad, where it solved 83% of the problems, compared to GPT-4o’s 13%.

In Codeforces programming contests, the model ranked in the 89th percentile of participants. OpenAI also claims the next update will perform similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology.

Despite these advancements, o1 lags behind GPT-4o in certain areas, such as factual knowledge about the world. It also lacks web-browsing capabilities and the ability to process files and images. Still, OpenAI views o1 as representing a new class of AI capabilities, naming it to symbolize "resetting the counter back to 1."

It is clear that while the new OpenAI o1 model does not yet possess fully comprehensive problem-solving ability, its significantly improved reasoning makes it far more useful in specialized fields like science, programming, and mathematics. It also raises both the floor and the ceiling of AI-agent-related technologies, substantially strengthening capabilities in scientific research and production. Its significance for the consumer sector, however, remains relatively limited.

Jim Fan, the Chief Scientist of Nvidia, noted that the new o1 model requires more computational power and data, and it can generate a data flywheel effect—correct answers and their thought processes can become valuable training data. This, in turn, continuously improves the reasoning core, much like how AlphaGo’s value network improved as more refined data was generated through MCTS (Monte Carlo Tree Search).
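The flywheel Fan describes can be sketched in a few lines: sample candidate solutions, keep only those a checker verifies as correct, and fold the verified pairs back into the training set. The code below is a hypothetical toy (the `solver` and `checker` functions are invented for illustration), not a description of OpenAI's pipeline.

```python
# Toy sketch of a "data flywheel": verified answers become new training
# data, which would in turn improve the solver on the next round.
# Hypothetical illustration only.

def flywheel_round(problems, solver, checker, training_data):
    """Run one round: solve, verify, and keep correct (problem, answer) pairs."""
    for problem in problems:
        candidate = solver(problem)
        if checker(problem, candidate):
            training_data.append((problem, candidate))
    return training_data

# Trivial domain: "solve" doubling problems; the checker verifies each answer.
solver = lambda x: x * 2
checker = lambda x, y: y == 2 * x
data = flywheel_round([1, 2, 3], solver, checker, [])  # → [(1, 2), (2, 4), (3, 6)]
```

In the analogy, the checker plays the role AlphaGo's self-play outcomes did: an automatic source of verified, ever-higher-quality training signal.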

OpenAI's o1 series models significantly enhance reasoning capabilities and introduce a new scaling paradigm: unlocking test-time compute through reinforcement learning, according to Tianfeng Securities.

However, the model has its critics. Some users have noted delays in response times due to the multi-step processing involved in generating answers. Others have pointed out that while o1 excels in certain benchmarks, it does not yet surpass GPT-4o in all metrics. OpenAI's product manager, Joanne Jang, has cautioned against unrealistic expectations, emphasizing that o1 is a significant step forward but not a miracle solution.

The AI community remains divided over the terminology used to describe o1's capabilities. Terms like 'reasoning' and 'thinking' have sparked debate, with some experts arguing that these anthropomorphic descriptions can be misleading. Nonetheless, the o1 model's ability to perform tasks that require planning and multi-step problem-solving marks a notable advancement in AI technology.

Founded in 2015, OpenAI has been at the forefront of the tech industry's rapid shift towards AI. Its chatbot product, ChatGPT, first launched in 2022, sparked a global investment frenzy in AI.

OpenAI is in discussions to raise funds at a valuation of $150 billion, Bloomberg reported. The company is aiming to secure approximately $6.5 billion from investors including Apple, Nvidia and Microsoft, and is also exploring $5 billion in debt financing from banks.

OpenAI's CFO Sarah Friar recently mentioned in an internal memo that the upcoming round of financing will support the company's needs for increased computational capacity and other operational expenses. She emphasized that the company's goal is to allow employees to sell a portion of their shares in a buyback offer later this year.

(Sources: CNN, TechCrunch, The Verge.)