The world’s most powerful AI model has become, well, less powerful. And that has industry insiders whispering about what may be a major redesign of the system.
In recent weeks, users of OpenAI’s GPT-4 have been complaining about degraded performance, with some calling the model “lazier” and “dumber” than it used to be at reasoning and other tasks.
Users vented their frustrations on Twitter and OpenAI’s online developer forum about issues such as weakened logic, more erroneous responses, losing track of provided information, trouble following instructions, forgetting to add brackets in basic software code, and only remembering the most recent prompt.
“The current GPT4 is disappointing,” a developer who uses GPT-4 to help him code functions for his website wrote. “It’s like driving a Ferrari for a month then suddenly it turns into a beaten up old pickup. I’m not sure I want to pay for it.”
Peter Yang, a product lead at Roblox, tweeted that the model was generating faster outputs but that the quality was worse. “Just simple questions like making writing more clear and concise and generating ideas,” he added. “The writing quality has gone down in my opinion.” He asked if anyone else had noticed this.
“I’ve found it to be lazier,” another Twitter user, Frazier MacLeod, replied.
The user Christi Kennedy wrote on OpenAI’s developer forum that GPT-4 had started looping outputs of code and other information over and over again.
“It’s braindead vs. before,” she wrote last month. “If you aren’t actually pushing it with what it could do previously, you wouldn’t notice. Yet if you are really using it fully, you see it is obviously much dumber.”
This is quite a change from earlier this year, when OpenAI was wowing the world with ChatGPT and the tech industry awaited the launch of GPT-4 with rapt anticipation. ChatGPT originally ran on GPT-3 and GPT-3.5, the giant underlying AI models that power its uncanny answers.
The larger GPT-4 launched in March and quickly became the go-to model for developers and other tech industry insiders. It’s broadly considered the most powerful AI model available, and it’s multimodal, which means it can understand both image and text inputs.
After the initial rush to try out this new model, some were shocked by their bills for using GPT-4. Sharon Zhou, the CEO of Lamini, a startup that helps developers build custom large language models, said the new model was slow but very accurate.
That was the situation until a few weeks ago. Then GPT-4 got quicker, but its performance noticeably waned. That fueled talk across the AI community, which Zhou and other experts said suggested a major change was underway.
They said OpenAI might be creating several smaller GPT-4 models that would act similarly to the large model but would be less expensive to run.
Zhou said this approach was called a Mixture of Experts, or MOE. The smaller expert models are trained on their own tasks and subject areas, meaning there could be a GPT-4 specializing in biology and one for physics, chemistry, and so on. When a GPT-4 user asks a question, the new system would know which expert model to send that query to. The new system might decide to send a query to two or more of these expert models just in case and then mash up the results.
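The routing Zhou describes can be illustrated with a toy sketch. The Python below is not OpenAI's implementation, and nothing in it comes from GPT-4: the four "experts" are hypothetical stand-in functions, the gate scores are made-up numbers (a real system would compute them from the input with a learned gate), and the names `pick_experts` and `mixture_output` are invented for illustration. It shows one common way to do what she outlines: score every expert, send the query to only the top two, and blend their answers by weight.

```python
import math

def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pick_experts(gate_scores, k=2):
    """Return the indices of the k highest-scoring experts and their renormalized weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return top, [probs[i] / kept for i in top]

def mixture_output(x, gate_scores, experts, k=2):
    """Run only the chosen experts on x and blend their outputs by weight."""
    top, weights = pick_experts(gate_scores, k)
    return sum(w * experts[i](x) for i, w in zip(top, weights))

# Hypothetical stand-ins for expert models (assumption, for illustration only).
experts = [
    lambda x: x + 1.0,
    lambda x: 2.0 * x,
    lambda x: -x,
    lambda x: x * x,
]

# Made-up gate scores; a real gate would derive these from the query itself.
gate_scores = [0.1, 2.0, -1.0, 1.0]

chosen, weights = pick_experts(gate_scores, k=2)
result = mixture_output(3.0, gate_scores, experts, k=2)
print("experts used:", chosen)
print("weights:", weights)
print("blended output:", result)
```

In this sketch the two experts that are not chosen are never called, which is where the cost saving in such a design would come from.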
“This idea has been around for a while, and it’s a natural next step,” Zhou said.
Zhou compared this situation to the “Ship of Theseus,” a thought experiment in which parts of the vessel are swapped out over time, raising the question of when it becomes a whole new ship.
“OpenAI is taking GPT-4 and turning it into a fleet of smaller ships,” she said. “From my perspective, it’s a new model. Some would say it’s the same.”
Insider asked OpenAI about this on Tuesday. The company, partly owned by Microsoft, did not respond.
This week, several AI experts posted what they said were details of GPT-4’s architecture on Twitter. Yam Peleg, a startup founder, tweeted that OpenAI was able to keep costs down by using an MOE model with 16 experts. SemiAnalysis wrote about the inner workings of GPT-4 this week.
George Hotz, a security hacker, described an “eight-way mixture model” for GPT-4 during a recent podcast. Soumith Chintala, who cofounded the PyTorch open-source AI project at Meta, weighed in on Hotz’s comments.
“I would *conjecture* that the speculations are roughly accurate but I don’t have confirmation,” Allen Institute for AI CEO Oren Etzioni wrote in an email to Insider after seeing the leaks online this week.
There are two main technical reasons to use an MOE approach: better-generated responses, and cheaper, faster responses, he said.
“The ‘right’ mixture will give you both but often there is a tradeoff between cost and quality,” Etzioni added. “In this case, it seems anecdotally that OpenAI is sacrificing some quality for reduced cost. These models are very hard to evaluate (what constitutes a better response? In what cases?) so this isn’t scientific, it’s anecdotal.”
OpenAI wrote about the MOE approach in 2022 research coauthored by Greg Brockman, OpenAI’s president and cofounder.
“With the Mixture-of-Experts (MoE) approach, only a fraction of the network is used to compute the output for any one input. One example approach is to have many sets of weights and the network can choose which set to use via a gating mechanism at inference time,” Brockman and his colleague Lilian Weng wrote. “This enables many more parameters without increased computation cost. Each set of weights is referred to as ‘experts,’ in the hope that the network will learn to assign specialized computation and skills to each expert.”
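The claim that this design "enables many more parameters without increased computation cost" comes down to simple arithmetic, sketched below in Python. The numbers are hypothetical and are not GPT-4 figures, and the function name `moe_parameter_counts` is invented for illustration. The sketch also assumes all experts are the same size and ignores the gate and any layers shared across experts.

```python
def moe_parameter_counts(num_experts, params_per_expert, experts_used_per_input):
    """Compare parameters stored by the model with parameters used for one input."""
    total = num_experts * params_per_expert               # everything the model holds
    active = experts_used_per_input * params_per_expert   # what one input actually touches
    return total, active

# Hypothetical sizes chosen only to make the ratio easy to read.
total, active = moe_parameter_counts(
    num_experts=4,
    params_per_expert=1_000_000,
    experts_used_per_input=1,
)
fraction = active / total
print(f"total parameters: {total:,}")
print(f"parameters used per input: {active:,}")
print(f"fraction of the network used: {fraction:.0%}")
```

Under these assumptions, adding more experts grows the total while the per-input cost stays fixed, as long as the number of experts consulted per input does not change.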
Zhou said GPT-4’s unnerving performance decline in recent weeks could be related to this training and to OpenAI’s rollout of this fleet of smaller expert GPT-4 models.
“When users test it, we are going to ask so many different questions. It won’t do as well, but it’s collecting data from us, and it will improve and learn,” Zhou said.
Axel Springer, Business Insider’s parent company, has a global deal to allow OpenAI to train its models on its media brands’ reporting.
