
Kiss Me, Transformer – A Journey of OpenAI


In a little less than five years, OpenAI has become one of the leading AI research labs globally, alongside other AI players like Alphabet's DeepMind, EleutherAI and SambaNova Systems. Its love for transformers is never-ending and indescribable.

OpenAI has been making headlines (for both the right and wrong reasons) for its research work, particularly in the areas of transformers, unsupervised learning, transfer learning and, most obviously, GPT-3, or Generative Pre-trained Transformer 3.

The Genesis  

Two years ago, OpenAI published a blog post and paper on GPT-2, created as a direct scale-up of its 2018 GPT model. That changed everything for the company. It released a small 124 million parameter GPT-2 model in February 2019, followed by a staged/phased release of its medium 355 million parameter model, alongside research with partners and the AI community into the model's potential for misuse and societal benefit.

Since then, the craze for transformer models has grown significantly. For instance, Adam King launched 'TalktoTransformer.com,' giving people an interface to play with the newly released models. Meanwhile, Hugging Face released a conversational AI demo based on GPT-2 models but eventually decided not to release the large GPT-2 model, citing ethical considerations.

In addition, Hugging Face released an auto text completion tool called Write With Transformer, a web app it created and hosts, showcasing the generative capabilities of several models, including GPT-2. Also, researchers at the University of Washington and the Allen Institute for AI revealed GROVER, a GPT-2-type language model, while Deep TabNine, an AI-assisted development tool, built a code autocompletion feature based on GPT-2. In 2019, other research works based on GPT-2 included DLGNet and GLTR.

Some of the popular research papers published in the same year include ‘Reducing malicious use of synthetic media research: Considerations and potential release practices for machine learning,’ and ‘Hello. It’s GPT-2 – How can I help you? Towards the use of pre-trained language models for task-oriented dialogue systems.’ 

In August 2019, NVIDIA trained Megatron-LM, an 8.3 billion parameter transformer model, making it the largest transformer-based language model trained to date, at 24x the size of BERT and 5.6x the size of GPT-2. In the same month, OpenAI released its larger 774 million parameter GPT-2 model. There was no stopping them.

Enter GPT-3

In November 2019, OpenAI released the complete version of the GPT-2 model with 1.5 billion parameters. This was followed by the release of GPT-3, with 175 billion parameters, in 2020; access was provided exclusively through OpenAI's API, and Microsoft later secured an exclusive licence to the model. Other large-scale transformer models include EleutherAI's GPT-J, BAAI's Wu Dao 2.0, Google's Switch Transformer, and NVIDIA-Microsoft's Megatron-Turing Natural Language Generation (MT-NLG).

OpenAI was founded as a nonprofit AI research entity in 2015 by Sam Altman, Greg Brockman, Elon Musk and others, who collectively pledged $1 billion towards a mission to develop artificial general intelligence (AGI). In 2018, Musk left OpenAI's board due to a difference of opinion; he had criticised OpenAI, arguing that the company should be more open.

Later, Microsoft invested about $1 billion in OpenAI and got exclusive access to the GPT-3 source code. This move completely altered the foundation of OpenAI, moving it away from openness and towards commercialisation and secrecy.

Here’s a complete timeline of OpenAI’s transformer language models in the last three years: 

[Timeline graphic: OpenAI's transformer language models]

In a recent Q&A session, Altman spoke about the soon-to-be-launched transformer model GPT-4, which is rumoured to have 100 trillion parameters, roughly 570x the size of GPT-3. Altman also gave a sneak peek into GPT-5 and said that it might pass the Turing test.
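The scale jumps quoted in this piece are easy to sanity-check. The short sketch below (parameter counts as reported in the article; GPT-4's figure is only a rumour) computes how many times larger each model is than its predecessor:

```python
# Parameter counts as reported in the article (GPT-4's is a rumoured figure).
params = {
    "GPT-2 small": 124e6,
    "GPT-2 full": 1.5e9,
    "Megatron-LM": 8.3e9,
    "GPT-3": 175e9,
    "GPT-4 (rumoured)": 100e12,
}

def scale_factor(smaller: float, larger: float) -> float:
    """How many times larger the second model is than the first."""
    return larger / smaller

# Megatron-LM vs the full GPT-2: ~5.5x, matching the reported ~5.6x figure.
print(f"Megatron-LM vs GPT-2: {scale_factor(params['GPT-2 full'], params['Megatron-LM']):.1f}x")

# GPT-4's rumoured 100T parameters vs GPT-3's 175B: ~571x, i.e. well over 500x.
print(f"GPT-4 vs GPT-3: {scale_factor(params['GPT-3'], params['GPT-4 (rumoured)']):.0f}x")
```

Running the arithmetic shows the rumoured GPT-4 would be about 570x GPT-3, which is why "500x" understates the jump.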

Besides this, OpenAI recently launched OpenAI Codex, an AI system that translates natural language into code. A descendant of GPT-3, its training data contains both natural language and billions of lines of source code from open-source platforms, including code in public GitHub repositories.

Final Thought

OpenAI's GPT-3 model is one of the most talked-about models globally because it has seen real-world applications, including language understanding, machine translation, and time-series prediction, among others.

While plenty of new players are emerging in the space and creating large-scale language models using transformers and other innovative techniques, there is no stopping OpenAI, with GPT-4 and GPT-5 just around the corner. The possibilities are immense, and it is only going to get more exciting from here on out.


Amit Raja Naik

Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.