A new era of AI is unveiled: OpenAI's reasoning model that can "think through problem-solving logic" is now available

The AI era has reached a new starting point: a large model capable of general complex reasoning has finally come to the fore.

On September 12, OpenAI announced on its official website that it had begun rolling out the OpenAI o1-preview model, the widely anticipated "Strawberry" model, to all subscribers. OpenAI said that for complex reasoning tasks the new model represents a new level of AI capability, which made it worth resetting the counter to 1 and giving the model a name distinct from the "GPT-4" series.

The defining characteristic of a reasoning model is that the AI spends more time thinking before it answers, much as a human works through a problem. Earlier large models, by contrast, learned patterns from vast datasets in order to predict the next word in a sequence, which is not, strictly speaking, true understanding of the question.

For the first release of the o1 series, OpenAI has launched only the o1-preview version and the smaller o1-mini. Access is being rolled out in stages to paying users, free users, and developers, and the price for developers is steep.

The cost of using the o1 model is at least 3 times that of GPT-4o

According to reports, the new o1 model, thanks to a new training method, can answer more complex programming, math, and science problems. It "thinks" before answering, much as a person would, though faster. The smaller, cheaper mini version focuses on programming use cases.

ChatGPT Plus and Team subscribers can access both models immediately by selecting them manually from the model picker drop-down in the interface. ChatGPT Enterprise and Edu users will get access later, and all free users will be given access to o1-mini at an as-yet-unspecified date. OpenAI hopes eventually to select the right model automatically based on the prompt.

Developer access to o1 is expensive, however. Through the API (application programming interface), o1-preview charges $15 per 1 million input tokens, three times the cost of GPT-4o, and $60 per 1 million output tokens, four times the cost of GPT-4o. A token is the unit of text the model parses; 1 million tokens corresponds to roughly 750,000 words.
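The pricing gap can be made concrete with a bit of arithmetic. The sketch below compares the cost of a single hypothetical request (10,000 input tokens and 2,000 output tokens, figures chosen purely for illustration) at the o1-preview rates quoted above against the GPT-4o rates implied by the "three times" and "four times" multiples ($5 and $15 per million tokens).

```python
# Per-million-token rates quoted in the article for o1-preview,
# and the GPT-4o rates implied by the 3x / 4x multiples.
O1_INPUT, O1_OUTPUT = 15.00, 60.00      # USD per 1M tokens
GPT4O_INPUT, GPT4O_OUTPUT = 5.00, 15.00  # USD per 1M tokens

def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Cost in USD of one API request at the given per-million-token rates."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A hypothetical request: 10,000 tokens in, 2,000 tokens out.
o1_cost = request_cost(10_000, 2_000, O1_INPUT, O1_OUTPUT)
gpt4o_cost = request_cost(10_000, 2_000, GPT4O_INPUT, GPT4O_OUTPUT)
print(f"o1-preview: ${o1_cost:.2f}  vs  GPT-4o: ${gpt4o_cost:.2f}")
# o1-preview: $0.27  vs  GPT-4o: $0.08
```

Because output tokens carry the larger multiple, requests that produce long answers widen the gap beyond the headline "3 times" figure.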

Jerry Tworek, head of research at OpenAI, told the media that the training method behind o1 is fundamentally different from that of previous models.

First, o1 was trained "using a new optimization algorithm and a new training dataset tailored for it", one that contains "reasoning data" and scientific literature selected for the purpose.

Second, whereas previous GPT models were trained to imitate the rules and patterns in their datasets, o1 uses "reinforcement learning", teaching the model through rewards and penalties to solve problems on its own, and then applies a "chain of thought" to handle user queries, presenting a summarized version of that chain, similar to how humans work through problems step by step.

OpenAI believes this new training method will make the o1 model more accurate and reduce "hallucinations", the problem of making up answers, though it cannot eliminate them entirely. The main difference from GPT-4o is that the new model is better at complex problems such as programming and mathematics, and it also refines its reasoning process, tries different strategies, and identifies and corrects errors in its own answers.

Cognition will leap to the “level of a doctoral student in science”

OpenAI has explained that GPT-4, released in 2023, is roughly at the intelligence level of a high school student, and that GPT-5 will complete AI's growth "from high schooler to Ph.D." The o1 model is a key step on that path.

Compared with existing large models such as GPT-4o, OpenAI o1 can solve harder reasoning problems while fixing mechanical flaws present in past models.

For example, the new model can correctly count how many "r"s there are in "strawberry".
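The "strawberry" question trips up token-based models because they process chunks of text rather than individual letters; the actual count is trivial to verify in code:

```python
# Earlier chat models famously miscounted the letters in "strawberry",
# since they see tokens rather than individual characters.
word = "strawberry"
r_count = word.count("r")
print(f'"{word}" contains {r_count} occurrences of "r"')  # 3
```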

The AI is also more methodical when answering programming questions: it thinks through the entire solution before it starts outputting code.

For example, in a poetry-writing task with preset conditions (say, the last word of the second line must end in "i"), GPT-4o, which "picks up the pen and writes" immediately, does produce an answer, but it often satisfies only some of the conditions, and it does not self-correct: the AI must hit the correct answer on its first generation or it gets it wrong. The o1 model, by contrast, keeps trialing, erring, and polishing its answer, significantly improving the accuracy and quality of the result.
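The generate-check-retry behavior described above can be sketched as a simple loop. Everything here is illustrative: `generate_line` is a hypothetical stub standing in for the language model, and the constraint is the one from the article (the line's last word must end in "i").

```python
# Minimal sketch of the generate-check-retry behavior the article
# attributes to o1. `generate_line` is a hypothetical stand-in for a
# language model sampling candidate lines of poetry.
def generate_line(attempt: int) -> str:
    candidates = [
        "the moon hangs low tonight",   # fails the constraint
        "a lantern on the quay",        # fails the constraint
        "beneath a sky of alibi",       # satisfies the constraint
    ]
    return candidates[attempt % len(candidates)]

def satisfies_constraint(line: str) -> bool:
    # Preset condition from the article: the last word must end in "i".
    return line.split()[-1].endswith("i")

def write_with_retries(max_attempts: int = 5) -> str:
    # Unlike a one-shot generator, keep sampling until the check passes.
    for attempt in range(max_attempts):
        line = generate_line(attempt)
        if satisfies_constraint(line):
            return line
    raise RuntimeError("no candidate satisfied the constraint")

print(write_with_retries())  # beneath a sky of alibi
```

A one-shot model corresponds to `max_attempts=1`; the checking loop is what lets later attempts recover from an early miss.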

Interestingly, when you click on the AI's thinking process, the AI even says things like "I'm thinking about this; could I do it this way?" and "Ah, I'm running out of time, I have to give the answer as soon as possible." OpenAI confirmed that what is shown here is not the raw chain of thought but a "summary generated by the model", and the company frankly admitted that maintaining a "competitive advantage" was a factor in that choice.

Jerry Tworek, OpenAI's head of research, reiterated that the training behind the o1 model is fundamentally different from previous products: while earlier GPT models were designed to mimic patterns in their training data, o1 was trained to solve problems on its own. During reinforcement learning, reward and penalty mechanisms "educate" the AI to use "chains of thought" when tackling problems, much as humans learn to break a problem down and analyze it.

In tests, the o1 model scored 83% on a qualifying exam for the International Mathematical Olympiad, while GPT-4o correctly solved only 13% of the problems. On Codeforces, a competitive programming platform, o1 reached the 89th percentile, while GPT-4o managed only the 11th.

OpenAI said that, according to its tests, the next updated version will perform at a Ph.D. level on challenging benchmark tests in physics, chemistry, and biology.

There are also clear drawbacks, such as being more prone to "hallucinations"

As the initial release of the o1 series, the o1-preview version also has obvious shortcomings. It is text-only for now: it cannot browse the web or accept file and image uploads, which means it lacks many of ChatGPT's features, and it is weaker than GPT-4o in many common use cases. There are also usage caps: o1-preview is limited to 30 messages per week and o1-mini to 50.

Other limitations mentioned include: the o1 model is less capable than GPT-4o in many areas and performs poorly on factual knowledge about the world; its reasoning is slower in some use cases and may take longer to produce an answer; and o1 is currently a text-only model, unable to reason over specific documents or gather real-time information from the web.

In addition, getting AI models to play tic-tac-toe well has long been considered a problem in the industry, and the new o1 model, its reasoning capabilities notwithstanding, still makes mistakes in the game; it has not fully overcome this difficulty.

OpenAI also acknowledged in a technical paper that it had received "anecdotal feedback" that o1-preview and o1-mini hallucinate more than GPT-4o and GPT-4o mini; that is, the AI still confidently makes up answers, and o1 rarely admits that it does not know the answer to a question.

The well-known technology outlet TechCrunch pointed out that in a blog post about the o1 model, OpenAI said it had decided not to show users the model's raw "chain of thought", choosing instead to include a summary of it in the answer in order to maintain a "competitive advantage". To compensate for this shortcoming, "we strive to teach the model to reproduce any useful ideas from the chain of thought in the answer."
