
Super Human Humor with GPT2

Technical Details

The goal of this project is to train a machine learning model to make jokes.

I found this blog post, which trained GPT2 to generate Magic: The Gathering text, and decided to adapt its approach for my project.

Next, I needed a training data set. I found a data set of all submissions to the subreddit r/jokes up to 2017. To be able to generate jokes on more recent topics, I also scraped the subreddit from 2017 to 2020 with Pushshift.

I selected all posts with more than 100 upvotes and fewer than 300 characters. This left me with 32663 jokes, on which I fine-tuned GPT2 for 4 epochs. Training took around an hour on a Tesla T4 through Google Colab.
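The selection step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes each post is a dict with Reddit's usual `title`, `selftext`, and `score` fields, and that the 300-character limit applies to title and body combined. The helper names `joke_text` and `select_jokes` are made up for this example.

```python
from typing import Dict, List

MIN_SCORE = 100  # keep posts with more than 100 upvotes
MAX_CHARS = 300  # keep posts with fewer than 300 characters


def joke_text(post: Dict) -> str:
    """Join title (setup) and body (punchline) into one string.

    Assumption: the joke is title + selftext; the original post does not
    say exactly which fields were used.
    """
    title = post.get("title", "").strip()
    body = post.get("selftext", "").strip()
    return f"{title}\n{body}".strip()


def select_jokes(posts: List[Dict]) -> List[str]:
    """Return the text of every post that passes both thresholds."""
    selected = []
    for post in posts:
        text = joke_text(post)
        # Strict comparisons mirror "more than 100" and "fewer than 300".
        if post.get("score", 0) > MIN_SCORE and 0 < len(text) < MAX_CHARS:
            selected.append(text)
    return selected
```

Calling `select_jokes` on the combined list of posts from both sources would give the list of joke strings to write out as fine-tuning data.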

Example Jokes

I selected a few generated jokes that made sense and were original (not already known jokes). Because r/jokes likes dark humor, my model readily spits out offensive jokes. So, trigger warning for the faint-hearted reader!


One could argue that my model really only produces nonsense. But I firmly believe that it has simply transcended human humor. So why should we understand its jokes?!

Code available here.
