OpenAI Claims GPT-5 Matches Human Performance Across Various Professions

OpenAI has unveiled a new benchmark, GDPval, designed to evaluate the performance of its AI models against human professionals across various industries. Released on Thursday, this benchmark aims to assess how close OpenAI’s systems are to achieving their goal of artificial general intelligence (AGI). Early results indicate that the GPT-5 model and Anthropic’s Claude Opus 4.1 are nearing the quality of work produced by industry experts, although OpenAI acknowledges that the benchmark currently covers a limited range of tasks.

Understanding GDPval and Its Scope

The GDPval benchmark focuses on nine key industries that significantly contribute to the U.S. gross domestic product, including healthcare, finance, manufacturing, and government. It evaluates AI performance across 44 different occupations, such as software engineers, nurses, and journalists. For the initial version, GDPval-v0, OpenAI enlisted experienced professionals to compare AI-generated reports with those created by their peers. Participants were tasked with selecting the best report from both sources. For instance, investment bankers were asked to analyze a competitor landscape in the last-mile delivery sector and compare their findings with AI-generated reports. The results were then averaged to determine an AI model’s “win rate” against human-generated reports across all evaluated occupations.
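To make the averaging step concrete, here is a minimal sketch of how a win rate of this kind could be computed. It is not OpenAI's actual scoring code. The verdict labels, the sample rows, and the `win_rate` helper are hypothetical. Two choices are assumptions made for illustration: a tie counts toward the AI's win rate, and each occupation is weighted equally in the average.

```python
from collections import defaultdict

# Hypothetical grader verdicts as (occupation, verdict) pairs.
# The rows below are made-up illustrations, not GDPval data.
verdicts = [
    ("investment banker", "ai_better"),
    ("investment banker", "human_better"),
    ("nurse", "tie"),
    ("nurse", "human_better"),
    ("software engineer", "human_better"),
    ("software engineer", "human_better"),
]


def win_rate(rows):
    """Return (overall, per_occupation) win rates.

    Assumption: a comparison counts for the AI when graders rate its
    report as better than or on par with the human one, and every
    occupation carries equal weight in the overall average.
    """
    outcomes = defaultdict(list)
    for occupation, verdict in rows:
        outcomes[occupation].append(1 if verdict in ("ai_better", "tie") else 0)

    # Share of comparisons won or tied within each occupation.
    per_occupation = {occ: sum(v) / len(v) for occ, v in outcomes.items()}

    # Unweighted mean across occupations.
    overall = sum(per_occupation.values()) / len(per_occupation)
    return overall, per_occupation


overall, per_occupation = win_rate(verdicts)
print(f"Overall win rate: {overall:.1%}")
for occupation, rate in per_occupation.items():
    print(f"  {occupation}: {rate:.1%}")
```

Pooling all comparisons into a single ratio instead of averaging per occupation would weight heavily sampled occupations more; the article does not say which approach OpenAI uses.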

Performance Metrics of AI Models

In the first round of testing, OpenAI’s enhanced GPT-5-high model achieved a win rate of 40.6%, meaning graders rated its output as better than or on par with that of industry experts in roughly two out of every five comparisons. Anthropic’s Claude Opus 4.1 scored higher still, with a win rate of 49%. OpenAI attributes Claude’s high score partly to its ability to produce visually appealing graphics rather than to the quality of its text-based reports alone. Despite these promising results, OpenAI cautions that the GDPval benchmark evaluates only a narrow aspect of professional work, primarily report generation.

Future Directions and Industry Implications

OpenAI recognizes that the current GDPval test does not encompass the full range of tasks performed by professionals. The company plans to develop more comprehensive assessments that will account for a wider variety of industries and interactive workflows. This evolution is crucial as the company aims to demonstrate the practical applications of its AI models in real-world scenarios. In an interview, OpenAI’s chief economist, Dr. Aaron Chatterji, expressed optimism about the benchmark’s implications, suggesting that as AI models improve, professionals can leverage these tools to focus on more meaningful and higher-value tasks.

Tejal Patwardhan, who leads OpenAI’s evaluations, highlighted the significant progress made since GPT-4o, which was released about 15 months ago and scored only 13.7% in similar evaluations. The substantial improvement in GPT-5’s performance reflects a trend that Patwardhan expects to continue as AI capabilities advance.
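As a quick check on the size of that jump, the short sketch below compares the two win rates reported in this article, 13.7% for GPT-4o and 40.6% for GPT-5-high. The variable names are illustrative only, and the comparison assumes both figures were measured in the same way.

```python
# Win rates as reported in the article, in percent.
gpt_4o_win_rate = 13.7
gpt_5_high_win_rate = 40.6

# Absolute gain in percentage points.
point_gain = gpt_5_high_win_rate - gpt_4o_win_rate

# Relative improvement as a multiple of the earlier score.
ratio = gpt_5_high_win_rate / gpt_4o_win_rate

print(f"Gain: {point_gain:.1f} percentage points")
print(f"GPT-5-high scores about {ratio:.2f}x the GPT-4o win rate")
```

Under that assumption, the newer model's win rate is close to three times the earlier one.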

The Importance of Robust Benchmarks

As the field of artificial intelligence evolves, benchmarks like GDPval are becoming increasingly vital for assessing AI models’ capabilities. Silicon Valley relies on a range of benchmarks to measure AI progress, including AIME 2025 and GPQA Diamond, which focus on competitive math problems and PhD-level science questions, respectively. However, many AI models are approaching saturation on these existing benchmarks, leaving little room to distinguish further gains and prompting researchers to call for tests that better evaluate AI proficiency in real-world applications.

OpenAI’s GDPval could play a significant role in this ongoing conversation, as the company seeks to establish its AI models as valuable tools across diverse industries. Nevertheless, to convincingly demonstrate that its AI systems can outperform human professionals, OpenAI will need to expand the scope and depth of the GDPval benchmark in future iterations.



OV News Desk
