Introducing GPT2(023)

maxine
2 min read · Apr 30, 2023

model & eval by Crumb, blog post by GPT-4

Today, we are excited to announce the release of GPT2(023), the smallest GPT2 model (124 million parameters) finetuned on 2.23 billion tokens of new internet text. That's almost the 2.48B tokens needed to 'chinchilla-optimally' pretrain the model from scratch (the rule of thumb of roughly 20 tokens per parameter gives 124M × 20 ≈ 2.48B)! This model is designed to have increased capabilities in generating code, scientific texts, and text pertaining to recent events. Our goal is to provide you with an improved text generation experience that caters to a wider range of applications.

GPT2(023) is based on OpenAI's gpt2-124m and has been finetuned on an extensive dataset drawn from several sources: 58% content from sites dated 2023 in Common Crawl, 25% from arXiv papers, and 17% from code on GitHub. This diverse mix ensures that the model is equipped to handle a variety of topics and contexts.
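For illustration, here is a minimal sketch of what sampling a finetuning mixture with those proportions could look like. The source names and the sampling helper are assumptions made for this example only, not the actual data pipeline used to train GPT2(023).

```python
# Minimal sketch (not the actual training pipeline): pick a data source
# for each document according to the stated mixture proportions.
import random

MIXTURE = {
    "common_crawl_2023": 0.58,  # 58% sites dated 2023 in Common Crawl
    "arxiv": 0.25,              # 25% arXiv papers
    "github_code": 0.17,        # 17% code from GitHub
}

def sample_source(rng: random.Random) -> str:
    """Pick a data source name according to the mixture weights."""
    r = rng.random()
    cumulative = 0.0
    for name, weight in MIXTURE.items():
        cumulative += weight
        if r < cumulative:
            return name
    return name  # guard against floating-point rounding at the boundary

rng = random.Random(0)
counts = {name: 0 for name in MIXTURE}
for _ in range(10_000):
    counts[sample_source(rng)] += 1
print(counts)  # roughly 5800 / 2500 / 1700
```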

To demonstrate how GPT2(023) is aware of recent events, let's look at an example prompt:

# About Covid-19
---
The Covid-19

The model completes the text as:

# About Covid-19
---
The Covid-19 pandemic is the worldwide pandemic that has left thousands of people unable to enter and work in or continue their normal daily normal life. In this brief post, we examine three of the main factors that have accelerated the pandemic and predict the path the pandemic will take through the rest of the world.
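If you want to reproduce this kind of completion yourself, here is a minimal sketch using the Hugging Face transformers library and the crumb/gpt2023 checkpoint linked at the end of this post. The sampling settings are assumptions for illustration, not the exact settings used to produce the example above, so outputs will differ from run to run.

```python
# Minimal sketch of sampling a completion from GPT2(023) with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("crumb/gpt2023")
model = AutoModelForCausalLM.from_pretrained("crumb/gpt2023")

prompt = "# About Covid-19\n---\nThe Covid-19"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling parameters are illustrative assumptions, not the authors' settings.
output_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```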

As you can see, GPT2(023) can generate coherent and relevant text pertaining to the Covid-19 pandemic, showcasing its ability to understand recent events. However, it struggles with subjects that were not well represented in its training data. Because only 2.23 billion tokens were used during finetuning, the model may have missed many recent events; one of those is the latest US election.

Given a prompt in question-and-answer format:

Q: Who is the last president?
A: Donald Trump

Q: Who is the most recent president?
A:

The model completes the text with:

Barack Obama

We plan to perform thorough evaluations soon and share the results to better demonstrate the model's performance, but we suspect it is of slightly higher quality than the original GPT2 model it was finetuned from.

In conclusion, GPT2(023) is a promising model that aims to provide enhanced text generation capabilities, particularly in the fields of code, scientific texts, and recent events. We are excited about the potential applications of this model and look forward to sharing more updates and evaluation results with you in the near future.

Stay tuned for more news and updates on GPT2(023)!

GPT2(023) and more details about its training are available on Hugging Face! You can explore and experiment with our finetuned model at this link: https://hf.co/crumb/gpt2023. Try it out and experience the enhanced text generation capabilities for yourself.

Medium required an image in the post for it to have a thumbnail, and articles with thumbnails get more clicks, so here's some copyright-free random specks connected by lines.
