OpenAI's co-founder believes AI training has hit an impasse, forcing AI labs to train their models more intelligently, not just bigger
Ilya Sutskever, co-founder of OpenAI, believes that existing approaches to scaling up large language models have plateaued. To make significant progress, AI labs must train smarter, not just bigger, and LLMs also need to think longer.
Sutskever told Reuters that the pre-training stage of scaling up large language models, such as ChatGPT, is reaching its limits. Pre-training is the initial phase in which huge amounts of unlabeled data are processed so the model can learn the structures and patterns of language.
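To make that concrete, here is a minimal sketch of the next-token-prediction objective that pre-training optimizes, written in PyTorch. The tiny model and random token stream are illustrative stand-ins, not any lab's actual setup; real systems stack transformer blocks and train on trillions of tokens.

```python
# Toy pre-training loop: unlabeled text supervises itself, because every
# token is the prediction target for the tokens that precede it.
import torch
import torch.nn as nn

vocab_size, embed_dim, seq_len, batch = 1000, 64, 32, 8

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),  # real LLMs put transformer blocks here
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Stand-in for a batch of tokenized web text.
    tokens = torch.randint(0, vocab_size, (batch, seq_len + 1))
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one position
    logits = model(inputs)  # shape: (batch, seq_len, vocab_size)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

"Scaling up" pre-training means growing the model, the data, and the compute behind this loop, and it is that recipe which Sutskever says has stopped paying off on its own.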
Until recently, increasing the scale, that is, the amount of data used for training, was sufficient to produce a more powerful and capable model. This is no longer the case: what you train the model on, and how you do it, matter more.
The 2010s were the age of scaling; now we are back in the age of wonder and discovery. Sutskever believes that "everyone is looking for the next big thing" and that "scaling up the right thing is more important than ever."
At issue is the increasing difficulty AI labs face in making major advances on models beyond the power and performance of GPT-4.
The short version is that everyone has access to the same or similar training data via various online sources, so it's no longer possible to gain an advantage by simply throwing more raw data at the problem. In other words, AI outfits will gain an edge by training smarter, not just bigger.
Inference, the final stage of the LLM lifecycle, after the model has been fully trained and is handling user queries, is another key enabler.
The idea is to solve problems and queries in multiple steps, with the model feeding its intermediate output back to itself, leading to more human-like decision-making and reasoning.
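A rough sketch of what such a loop can look like is below. The `generate` function is a hypothetical stand-in for any LLM completion call, not a real API, and the prompt wording is purely illustrative.

```python
def generate(prompt: str) -> str:
    """Hypothetical placeholder: swap in a call to your model of choice."""
    return f"(model output for: {prompt[:40]}...)"

def answer_with_reflection(question: str, steps: int = 3) -> str:
    """Multi-step, test-time reasoning: the model revisits its own output."""
    draft = generate(f"Question: {question}\nThink step by step, then answer:")
    for _ in range(steps - 1):
        # Feed the previous attempt back so the model can critique and
        # revise it: spending more compute per query rather than more
        # compute at training time.
        draft = generate(
            f"Question: {question}\n"
            f"Previous attempt: {draft}\n"
            "Point out any mistakes, then give an improved final answer:"
        )
    return draft

print(answer_with_reflection("What is 17 * 24?"))
```

The trade-off is that each query costs several model calls instead of one, which is exactly why this approach changes the economics of inference hardware.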
Noam Brown, an OpenAI researcher who worked on the new o1 LLM, said that having a bot think for just 20 seconds in a hand of poker produced the same performance boost as scaling the model up by 100,000x and training it for 100,000 times longer.
Having bots think before answering, rather than spouting the first thing that comes into their heads, can produce better results. If this approach proves successful, the AI industry could shift away from massive training clusters and toward distributed hardware focused on inference.
Nvidia will likely be ready to accept everyone's money regardless of the outcome, and Nvidia CEO Jensen Huang has recently acknowledged the surging demand for AI GPUs.
"We have now discovered a new scaling law. This is the scaling at a time when inference occurs. "All of these factors have contributed to the demand being extremely high for Blackwell [Nvidia’s next-gen GPU technology]," Huang said.
It's not clear how long it will take before a new generation of more intelligent bots appears, but Nvidia will likely see the results of this effort in its bank account soon enough.