Updated 2:40 pm PT: A significant update has been made to the GPT-4.5 AI model’s white paper. The line stating that “GPT-4.5 is not a frontier AI model” has been removed. The updated white paper can be found here, while the original version is available here. The original article follows.
OpenAI has officially announced the launch of GPT-4.5, the highly anticipated AI model previously code-named Orion. GPT-4.5 is OpenAI's largest model to date, trained with more computing power and data than any of its predecessors.
Notably, OpenAI’s white paper explicitly states that GPT-4.5 is not considered a frontier model.
Subscribers to OpenAI’s $200-a-month ChatGPT Pro plan will have access to GPT-4.5 through ChatGPT as part of a research preview, starting Thursday. Additionally, developers with paid API access will be able to utilize GPT-4.5 beginning today. Other ChatGPT users, including those with ChatGPT Plus and ChatGPT Team subscriptions, can expect to gain access to the model sometime next week, according to an OpenAI spokesperson.
The AI community has eagerly awaited the release of Orion, which is seen by some as a key indicator of the viability of traditional AI training approaches. The development of GPT-4.5 employed the same fundamental technique used for GPT-4, GPT-3, GPT-2, and GPT-1, involving a significant increase in computing power and data during the pre-training phase, known as unsupervised learning.
In previous GPT generations, scaling up led to substantial performance improvements across various domains, including mathematics, writing, and coding. According to OpenAI, GPT-4.5’s increased size has resulted in “deeper world knowledge” and “higher emotional intelligence.” However, there are indications that the gains from scaling up data and computing power may be starting to level off, as GPT-4.5 falls short of newer AI “reasoning” models from companies like DeepSeek, Anthropic, and OpenAI itself on several benchmarks.
GPT-4.5 is also very expensive to run, prompting OpenAI to reevaluate its decision to continue serving the model in its API in the long term. The cost for developers to access GPT-4.5’s API is substantial, at $75 per million input tokens (approximately 750,000 words) and $150 per million output tokens, significantly higher than the costs associated with GPT-4o.
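To put those rates in concrete terms, here is a minimal sketch of how a developer might estimate per-request cost from the published per-million-token prices; the request sizes in the example are hypothetical:

```python
# Rough cost estimate for a single GPT-4.5 API request, using the
# published rates: $75 per million input tokens, $150 per million output tokens.
INPUT_PRICE_PER_M = 75.0
OUTPUT_PRICE_PER_M = 150.0

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical request: 10,000 input tokens, 2,000 output tokens.
print(f"${estimate_cost(10_000, 2_000):.2f}")  # → $1.05
```

At these rates, even a modest 10,000-token prompt costs about a dollar per call, which helps explain OpenAI's hesitation about keeping the model in its API long term.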
In a blog post shared with TechCrunch, OpenAI stated, “We’re sharing GPT-4.5 as a research preview to better understand its strengths and limitations. We’re still exploring what it’s capable of and are eager to see how people use it in ways we might not have expected.”
Mixed Performance
OpenAI emphasizes that GPT-4.5 is not intended to be a direct replacement for GPT-4o, the company’s workhorse model that powers most of its API and ChatGPT. While GPT-4.5 supports features like file and image uploads and ChatGPT’s canvas tool, it currently lacks capabilities such as support for ChatGPT’s realistic two-way voice mode.
On the positive side, GPT-4.5 outperforms GPT-4o and many other models on several measures. On OpenAI’s SimpleQA benchmark, which evaluates AI models on straightforward factual questions, GPT-4.5 demonstrates higher accuracy than GPT-4o and OpenAI’s reasoning models, o1 and o3-mini. According to OpenAI, GPT-4.5 also hallucinates less frequently than most models, reducing the likelihood of generating inaccurate information.
However, an OpenAI spokesperson acknowledged that the company has not publicly reported the performance of its deep research model on SimpleQA, so a direct comparison with that model isn’t possible on this benchmark. Notably, AI startup Perplexity’s Deep Research model, which performs similarly to OpenAI’s deep research on other benchmarks, outperforms GPT-4.5 on this test of factual accuracy.

On the SWE-Bench Verified benchmark, which assesses coding abilities, GPT-4.5 roughly matches the performance of GPT-4o and o3-mini but falls short of OpenAI’s deep research and Anthropic’s Claude 3.7 Sonnet. However, on OpenAI’s SWE-Lancer benchmark, which evaluates an AI model’s ability to develop full software features, GPT-4.5 outperforms GPT-4o and o3-mini, although it still lags behind deep research.
Although GPT-4.5 doesn’t match the performance of leading AI reasoning models like o3-mini, DeepSeek’s R1, and Claude 3.7 Sonnet on challenging academic benchmarks such as AIME and GPQA, it performs on par with or outperforms leading non-reasoning models on these tests. This suggests that GPT-4.5 excels in math- and science-related problems.
OpenAI claims that GPT-4.5 is qualitatively superior to other models in areas where benchmarks are less effective, such as understanding human intent. GPT-4.5 responds in a warmer, more natural tone and performs well on creative tasks like writing and design, according to OpenAI.
In an informal test, OpenAI asked GPT-4.5 and two other models, GPT-4o and o3-mini, to create a unicorn in SVG format. GPT-4.5 was the only AI model to produce something resembling a unicorn.

In another test, OpenAI prompted GPT-4.5 and two other models to respond to the prompt, “I’m going through a tough time after failing a test.” While GPT-4o and o3-mini provided helpful information, GPT-4.5’s response was the most socially appropriate.
As OpenAI wrote in its blog post, “[W]e look forward to gaining a more complete picture of GPT-4.5’s capabilities through this release, because we recognize academic benchmarks don’t always reflect real-world usefulness.”

Scaling Laws Challenged
OpenAI asserts that GPT-4.5 is at the forefront of what is possible in unsupervised learning. However, the model’s limitations also seem to confirm speculation from experts that pre-training scaling laws will eventually reach their limits.
OpenAI co-founder and former chief scientist Ilya Sutskever stated in December that “we’ve achieved peak data” and that “pre-training as we know it will unquestionably end.” These comments echoed concerns that AI investors, founders, and researchers shared with TechCrunch for a feature in November.
In response to the pre-training hurdles, the industry, including OpenAI, has shifted its focus to reasoning models, which take longer to perform tasks but tend to be more consistent. By increasing the time and computing power that AI reasoning models use to “think” through problems, AI labs are confident they can significantly improve models’ capabilities.
OpenAI plans to eventually combine its GPT series of models with its “o” reasoning series, starting with GPT-5 later this year. Although GPT-4.5, which was reportedly incredibly expensive to train and delayed several times, may not take the AI benchmark crown on its own, OpenAI likely sees it as a stepping stone toward something far more powerful.