In 2018, BERT got people talking about how machine learning models learn to read and speak. Today, large language models, or LLMs, are growing rapidly, proving dexterous in all sorts of applications.
They’re accelerating drug discovery, thanks to research from the Technical University of Munich’s Rostlab, as well as the work of a team from Harvard, Yale and New York University and others. In separate efforts, they applied LLMs to interpret the chains of amino acids that make up proteins, advancing our understanding of these building blocks of biology.
This is one of the many breakthroughs LLMs are making in healthcare, robotics, and other fields.
A brief history of LLMs
Transformer models, neural networks defined in 2017 that can learn context in sequential data, got LLMs started.
The researchers behind BERT and other transformer models made 2018 “a watershed moment” for natural language processing, according to a report on AI published at the end of that year. “Many experts have claimed that the release of BERT marks a new era in NLP,” it added.
Developed by Google, BERT (short for Bidirectional Encoder Representations from Transformers) delivered state-of-the-art scores on NLP benchmarks. In 2019, Google announced that BERT powers its search engine.
Google released BERT as open-source software, spawning a family of follow-on models and setting off a race to build ever larger and more powerful LLMs.
For example, Meta created an enhanced version called RoBERTa, released as open-source code in July 2019. For training, it used “an order of magnitude more data than BERT,” according to the paper, and leapt ahead on NLP leaderboards. A scramble followed.
Scaling parameters and markets
For convenience, score is often kept by the number of an LLM’s parameters, or weights, which measure the strength of the connection between two nodes in a neural network. BERT had 110 million, RoBERTa had 123 million, and BERT-Large then weighed in at 354 million, setting a new record, but not for long.
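To make the idea of a parameter count concrete, here is a minimal sketch in Python. It assumes a plain fully connected network rather than a transformer, and the function name and layer sizes are hypothetical, chosen only for illustration. Each connection between two nodes contributes one weight, and each receiving node contributes one bias.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of nodes in each layer, input layer first.
    This is an illustrative toy, not how any specific LLM is counted.
    """
    total = 0
    # Walk over each pair of adjacent layers.
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        weights = fan_in * fan_out  # one weight per connection between two nodes
        biases = fan_out            # one bias per node in the receiving layer
        total += weights + biases
    return total


toy_network = [4, 8, 2]  # hypothetical layer sizes
print(count_dense_parameters(toy_network))  # prints 58
```

The same bookkeeping, applied to the far larger and differently shaped layers of a transformer, is what produces headline figures in the millions or billions.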
In 2020, researchers from OpenAI and Johns Hopkins University announced GPT-3, with 175 billion parameters, trained on a dataset containing almost a trillion words. It performed well on a multitude of language tasks and even on three-digit arithmetic.
“Language models have a wide range of beneficial applications for society,” the researchers wrote.
Experts feel ‘blown away’
Within weeks people were using GPT-3 to create poems, programs, songs, websites and more. Recently, GPT-3 even wrote an academic paper about itself.
“I just remember being a bit blown away by the things it could do, for just being a language model,” said Stanford associate professor of computer science Percy Liang, speaking in a podcast.
GPT-3 helped motivate Stanford to create a center that Liang now leads, exploring the implications of what it calls foundation models, which can handle a wide variety of tasks well.
Toward trillions of parameters
Last year, NVIDIA announced the Megatron 530B LLM, which can be trained for new domains and languages. It debuted with tools and services for training language models with trillions of parameters.
“Large language models have proven themselves to be flexible and capable…capable of answering deep questions without specialized training or supervision,” said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA, at the time.
To make it even easier for users to adopt the powerful models, the NVIDIA NeMo LLM service debuted in September at GTC. It is an NVIDIA-managed cloud service for adapting pretrained LLMs to perform specific tasks.
Transformers are transforming drug discovery
The advances LLMs are making with proteins and chemical structures are also being applied to DNA.
Researchers aim to scale their work with NVIDIA BioNeMo, a software framework and cloud service for generating, predicting and understanding biomolecular data. Part of the NVIDIA Clara Discovery collection of AI frameworks, applications, and models for drug discovery, it supports work in widely used protein, DNA, and chemistry data formats.
NVIDIA BioNeMo offers several pre-trained AI models, including the MegaMolBART model, developed by NVIDIA and AstraZeneca.
LLMs improve computer vision
Transformers are also reshaping computer vision as powerful LLMs replace traditional convolutional AI models. For example, researchers at Meta AI and Dartmouth designed TimeSformer, an AI model that uses transformers to analyze video with state-of-the-art results.
Experts predict that such models could spawn all sorts of new applications in computational photography, education, and interactive experiences for mobile users.
In related work earlier this year, two companies released powerful AI models for generating images from text.
OpenAI announced DALL-E 2, a transformer model with 3.5 billion parameters, designed to create realistic images from text descriptions. And recently, London-based Stability AI launched Stable Diffusion.
Write code, control robots
LLMs also help developers write software. Tabnine, a member of NVIDIA Inception, a program that nurtures cutting-edge startups, says it automates up to 30% of the code generated by a million developers.
Taking the next step, researchers are using transformer-based models to teach robots used in manufacturing, construction, autonomous driving and personal assistants.
For example, DeepMind developed Gato, an LLM that taught a robotic arm how to stack blocks. The 1.2-billion-parameter model was trained on more than 600 distinct tasks so it can be useful in a variety of modes and environments, whether playing games or animating chatbots.
“By scaling up and iterating on this same basic approach, we can build a useful general-purpose agent,” the researchers said in a paper published in May.
It’s another example of what the Stanford center, in a July article, called a paradigm shift in AI. “Foundation models have only just begun to transform the way AI systems are built and deployed in the world,” it said.
See how companies around the world are implementing LLMs with NVIDIA Triton for multiple use cases.