Artificial intelligence (AI) companies are hitting a scaling wall, according to a Reuters report citing experts and investors in the AI space. The report suggests that scaling up pre-training, that is, making models bigger and feeding them more data, is no longer providing proportional capability improvements. AI developers are reportedly struggling to build a model that improves on GPT-4, and are grappling with the following key challenges:
- Scaling costs
- Hardware-induced failures due to the complexity of the training process
- Lack of readily available data
- Power shortages
To cope with this, companies are turning to a technique called "test-time compute," in which the model spends additional processing effort while it is generating answers, that is, at inference time, rather than only during training. For instance, when you ask a model a specific question, it might generate and evaluate several candidate answers in real time instead of pre-emptively picking one. This lets the model allocate more of its processing power to challenging tasks like maths or coding, as the sketch below illustrates.
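Here is a minimal sketch of one simple flavour of test-time compute, best-of-N sampling, assuming a generator and a scorer are available. The `generate_candidate` and `score_candidate` functions below are hypothetical stand-ins for illustration, not any company's actual API.

```python
import random

# A minimal sketch of best-of-N sampling, one simple form of test-time
# compute. The generator and scorer here are hypothetical stand-ins:
# the point is that extra compute is spent at inference time by
# producing several candidate answers and keeping the best one.

def generate_candidate(prompt: str) -> str:
    """Stand-in for one sampled model completion."""
    return f"answer to '{prompt}' (variant {random.randint(0, 999)})"

def score_candidate(answer: str) -> float:
    """Stand-in for a verifier or reward model that rates an answer."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Larger n means more compute spent per query, and more candidates
    # considered: the test-time-compute trade-off in miniature.
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=score_candidate)

if __name__ == "__main__":
    print(best_of_n("What is 17 * 23?", n=8))
```

In a real system, the scorer might be a separate verifier model or a majority vote across candidates; the control flow stays the same.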
Scaling concerns at Google, OpenAI and Anthropic:
While Reuters published a comprehensive report on the subject, concerns around AI scaling have been under discussion for a while now. As per a recent Bloomberg report, OpenAI's new model Orion did not live up to the company's expectations, in that it wasn't as big a step up from GPT-4 as GPT-4 was from GPT-3.5. Similarly, Google and Anthropic are also struggling to make major breakthroughs.
This is a concerning development, especially given the large sums companies are spending on AI. In its latest earnings call, for the quarter ending September 2024, Meta said that its investment in AI continues to require "serious infrastructure" and that it expects to keep investing significantly in it. Similarly, as per Google's recent earnings call, the company spent $7.2 billion on sales and marketing; these expenses were driven by Google's investment in advertising and promotional efforts around the Made by Google launches, as well as by AI and Gemini.
AI scientists say they told us so:
Commenting on the Reuters report, Meta's Chief AI Scientist Yann LeCun said "I told you so". He explains that auto-regressive large language models (LLMs), models that generate text one token at a time, with each token predicted from the ones before it, are hitting a ceiling, and that he has been saying as much since before most people had heard of LLMs. "I've always said that LLMs were useful, but were an off-ramp on the road towards human-level AI. I've said that reaching human-level AI will require new architectures and new paradigms," LeCun explains.
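For readers unfamiliar with the term, here is a minimal sketch of the auto-regressive loop LeCun is referring to. The vocabulary and `next_token` function below are toy stand-ins for illustration, not a real model.

```python
import random

# A toy sketch of auto-regressive decoding. A real LLM scores a large
# vocabulary with a neural network, but the control flow is the same:
# each new token is predicted from all the tokens generated so far.

VOCAB = ["the", "cat", "sat", "on", "the", "mat", "."]

def next_token(context: list[str]) -> str:
    """Stand-in for the model's next-token prediction.
    A real model would condition on `context`; this toy just samples."""
    return random.choice(VOCAB)

def generate(prompt: list[str], max_new_tokens: int = 5) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # The growing `tokens` list is fed back in at every step;
        # this feedback loop is what "auto-regressive" means.
        tokens.append(next_token(tokens))
    return tokens

print(" ".join(generate(["the", "cat"])))
```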
Fellow computer scientist Gary Marcus notes that he wrote about deep learning models hitting a wall back in 2022. "We all know that GPT-3 was vastly better than GPT-2. And we all know that GPT-4 (released thirteen months ago) was vastly better than GPT-3. But what has happened since? I could be persuaded that on some measures there was a doubling of capabilities for some set of months in 2020-2023, but I don't see that case at all for the last 13 months. Instead, I see numerous signs that we have reached a period of diminishing returns," Marcus wrote in a blog post in April this year.
At the time, he flagged AI companies like Inflection AI and Stability AI that were struggling with financial difficulties. "If enthusiasm for GenAI dwindles and market valuations plummet, AI won't disappear, and LLMs won't disappear; they will still have their place as tools for statistical approximation," he explained.
Meanwhile, OpenAI CEO Sam Altman addressed the Reuters report with a single-line tweet saying there is no wall.
The data shortage dilemma:
Earlier this year, during the AI for Good Global Summit, an interviewer asked Altman about data shortages and whether OpenAI was now relying on synthetic data (computer-generated data) to train its models. Altman admitted that the company has generated a lot of synthetic data and experimented with training on it, while adding that it would be really strange if the best way to train models was to generate synthetic data and feed it back into them. When asked about quality concerns with synthetic data, Altman said that companies need to focus on ensuring that the data is of good quality, and also on finding ways to "get better at data efficiency and learn more from smaller amounts of data."
Another avenue companies could turn to in order to address the data shortage is non-English data. At MediaNama's annual PrivacyNama conference this year, lawyer Amlan Mohanty pointed out that localised data sets (like ones from the Global South) could make AI models more powerful and culturally richer. "These models are going to become more powerful, more capable when they have really small, localized, specialized data sets that are going to be licensed," he mentioned.