Is Deep Learning's Potential Reaching a Plateau?
Around 2010, advances in hardware and algorithms, together with an explosion of data, allowed deep learning to transition from decades-old theory to scalable reality. However, now ten years on, there are signs that progress may be plateauing.
To understand the headwinds deep learning faces today, it's important to first understand how it has matured over the past decade.
Deep learning's virtuous circle initially gained momentum within the walls of large internet firms and research organizations. Key milestones, such as AlexNet's breakthrough performance in the 2012 ImageNet image recognition competition, propelled a succession of algorithmic advancements.
In parallel with these advancements, a deluge of data rushed in from smartphones. This enabled Google, Facebook, Amazon, and others to unleash deep learning on features such as predictive type-ahead, hyper-personalized recommendations, and the speech transcription that powers digital assistants – among many others.
By mid-2015, deep learning had matured and it was finally time to put it in the hands of the broader developer community.
2015 - 2017 – Machine learning platforms and libraries emerge
In November of 2015, Google open-sourced the TensorFlow machine learning library, arguably the moment ML / DL became truly accessible to the general developer community. Up to that point, developers had access to AI primarily through APIs such as those available through IBM's Watson Developer Cloud, which made the technology accessible, but only in a controlled way.
Facebook followed Google a year later with its release of PyTorch, another machine learning library. So by the end of 2016, only four years ago, today's two dominant ML libraries were available.
To complement these libraries, Google and Microsoft launched cloud machine learning development environments. Combined, ML libraries and development environments allowed developers to begin experimenting with deep learning's potential.
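To give a sense of what that library-level access looks like in practice, here is a minimal, purely illustrative sketch using Keras, the high-level API that ships with TensorFlow 2. The data is random and the model is a toy; the point is only that defining and training a network takes a handful of lines.

```python
import numpy as np
import tensorflow as tf

# A few lines of Keras are enough to define, compile, and train a small
# classifier locally, rather than calling a hosted AI API.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Purely illustrative random data: 100 samples, 20 features, binary labels.
x = np.random.rand(100, 20).astype("float32")
y = np.random.randint(0, 2, size=100)
model.fit(x, y, epochs=3, verbose=0)
```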
2017 - 2018 – Advancement of machine learning tools and algorithms
As we entered 2017, scaling deep learning faced two critical obstacles. First, there weren't enough skilled data scientists in the market to train and deploy models. Second, the cost of annotating data was too high for most developers.
Google was the first to address the skills problem with its release of AutoML in the second half of 2017. AutoML lowered the skill barrier for data science, allowing more developers to train models effectively without necessarily being machine learning specialists. Today automated ML is common, with companies such as DataRobot specializing in it.
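Google's AutoML itself is a hosted service, but the underlying idea of automated model and hyperparameter selection can be sketched with an ordinary search. The example below uses scikit-learn's GridSearchCV on a toy dataset purely to illustrate the principle, not Google's product.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Declare a search space and let the tool pick the best configuration,
# instead of hand-tuning hyperparameters one experiment at a time.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)  # the configuration that scored best under cross-validation
```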
The research community took aim at the data problem with the development and advancement of new algorithms. Generative adversarial networks, transfer learning, and unsupervised learning are all examples of approaches that lower the amount of data required and/or the cost of annotating it. Today we increasingly see developers experimenting with these approaches.
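As one concrete illustration, here is a minimal transfer-learning sketch in PyTorch: a network pre-trained on ImageNet is reused, its layers are frozen, and only a small new classification head is trained on whatever labeled data the developer has. The five-class task is hypothetical.

```python
import torch
import torch.nn as nn
from torchvision import models

# Reuse a network pre-trained on ImageNet so general visual features come "for free".
backbone = models.resnet18(pretrained=True)

# Freeze the pre-trained weights; they will not be updated during training.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final layer with a new head sized for a hypothetical 5-class task.
num_classes = 5
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

# Only the new head's parameters go to the optimizer, so far less data
# (and compute) is needed than training from scratch.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
```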
2018 - Present – Advancements in Natural Language Processing
Deep learning kicked into a higher gear in 2018, building on the 2017 paper in which Google, again acting as the industry pacesetter, introduced transformer models, a novel neural network architecture for language models. The idea proved to be a significant advance over the then state-of-the-art recurrent neural network (RNN) approach.
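The core mechanism of the transformer can be sketched in a few lines. The toy self-attention below (shapes and data are illustrative) shows how every token attends to every other token in a single parallel step, rather than sequentially as an RNN does.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Pairwise similarity between queries and keys, scaled by the key dimension.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)   # how much each token attends to every other token
    return weights @ v                    # weighted sum of the value vectors

# Toy usage: one sequence of 4 tokens with 8-dimensional embeddings.
x = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(x, x, x)  # self-attention: q, k, and v all come from x
print(out.shape)  # torch.Size([1, 4, 8])
```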
Within 18 months, transformer models dominated natural language processing research, and a steady drumbeat of ever-larger transformer-based language models was released by the likes of NVIDIA, Microsoft, Google, Facebook, OpenAI, Baidu, and others. That drumbeat continues to this day.
Deep learning's headwinds
Now in late 2020, it's reasonable to look at deep learning's progress over the past decade and see 2020 / 2021 as an inflection point where AI begins to meaningfully scale. To some extent this is exactly what's happening. However, headwinds threaten each of the three critical areas currently propelling advancement.
1 Hardware
GPUs are essential to deep learning because of their superior ability to process matrix multiplications, the capability that allowed neural networks to be trained at scale. Underscoring their significance, NVIDIA, the world's leading supplier of GPUs, has seen its stock price rise ~1,600% since late 2015, a climb that tracks the emergence of deep learning almost perfectly.
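The point is easy to see in code: the forward and backward passes of a neural network reduce to large matrix multiplications, which a GPU executes in massively parallel fashion. A minimal PyTorch sketch:

```python
import torch

# Two large random matrices stand in for a layer's weights and activations.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# Move the operands to GPU memory when a GPU is available.
if torch.cuda.is_available():
    a, b = a.cuda(), b.cuda()

# A single matrix multiplication; a real training run performs millions of these.
c = a @ b
```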
GPUs have their limits, though. First, the energy cost associated with training a model is prohibitively high for many organizations; some estimates put the cost of training OpenAI's recent GPT-3 model at over $4.6 million. Second, language models are growing 10X in size year over year, outpacing gains in GPU memory. This points to a breaking point where processing power will effectively be maxed out.
Emerging to address this compute problem are neuromorphic computing chips, purpose-built for AI workloads. These chips emulate the neural structure and operation of the human brain and promise to unlock greater levels of scale. However, the technology is still in the research phase, and it will take years even after such chips are released for the AI community to optimize algorithms to run on them. So for now, we have to squeeze what we can out of GPUs.
2 Data
Deep learning requires a lot of data. Achieving a superhuman level of image recognition required 14 million human-annotated images, while achieving autonomous driving will require at least 3 billion miles driven on Autopilot (Tesla's fleet has driven roughly 3 billion such miles to date), and likely many more. Data at this scale is simply very hard to come by.
As already mentioned, generative models, unsupervised learning, and transfer learning are all examples of partial solutions to the data problem. More advancements are needed though and will almost certainly emerge over the coming years.
Until then, the popularity of services such as Hugging Face will continue to grow – services that allow developers to take large pre-trained models that would have been prohibitively expensive to train from scratch and simply fine-tune them on a smaller set of related data. This capability is already allowing developers to implement features such as natural language generation, question answering, and sentiment analysis, among others.
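For example, running sentiment analysis on top of a pre-trained model takes only a few lines with the Hugging Face transformers library; the output shown in the comment is illustrative.

```python
from transformers import pipeline

# Downloads a pre-trained model the first time it runs; no training required.
classifier = pipeline("sentiment-analysis")

result = classifier("Deep learning may be approaching a plateau.")
print(result)  # e.g. [{'label': 'NEGATIVE', 'score': 0.99...}]
```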
3 New Models
Capsule networks and transformer models are two recent examples of model advancements that have enhanced image recognition and natural language processing, respectively. As AI and deep learning move toward general intelligence, I've proposed neuro-symbolic AI as another advancement, one that allows computers to "reason" in a way closer to how humans do. Further advancements are needed on this front, though, to make the most of the data and hardware resources currently available.
Maintaining deep learning's velocity
With hardware, data, and better algorithms all requiring new innovation to advance AI, the market will be forced to optimize across all three just to maintain growth. In the near term, I predict we'll see increasing progress on reducing the data required to train models: models should become easier to train, demanding less manual effort, compute, and data. Will this be enough to maintain deep learning's momentum, though? Only time will tell.