
Super Human Humor with GPT2

Technical Details

The goal of this project is to train a machine learning model to make jokes.

I found this blog post, which trained GPT2 to generate Magic: The Gathering text, and decided to adapt it for my project.

Next, I needed a training data set. I found a data set of all submissions to the subreddit r/Jokes up to 2017. To be able to generate jokes about more recent topics, I scraped the subreddit from 2017 to 2020 with Pushshift.
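The scraping boils down to paging through Pushshift's submission search endpoint by creation time. The sketch below shows the idea; the exact parameters and the fetch_jokes helper are illustrative rather than a copy of the script I actually ran.

import requests
import time

def fetch_jokes(after, before, size=100):
    """Page through r/Jokes submissions on the Pushshift API."""
    url = "https://api.pushshift.io/reddit/search/submission/"
    jokes = []
    while True:
        params = {
            "subreddit": "Jokes",
            "after": after,            # Unix timestamp of the last post seen
            "before": before,
            "size": size,              # max posts per request
            "sort": "asc",
            "sort_type": "created_utc",
        }
        batch = requests.get(url, params=params).json()["data"]
        if not batch:
            break
        jokes.extend(batch)
        after = batch[-1]["created_utc"]  # advance the window
        time.sleep(1)                     # be polite to the API
    return jokes

# Roughly 2017-01-01 to 2020-01-01 as Unix timestamps
submissions = fetch_jokes(after=1483228800, before=1577836800)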

I selected all posts with more than 100 upvotes and fewer than 300 characters. This left me with 32,663 jokes, on which I fine-tuned GPT2 for 4 epochs. Training took around an hour on a Tesla T4 through Google Colab.
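For illustration, the filtering and fine-tuning step looks roughly like the sketch below, written with pandas and the Hugging Face transformers Trainer. The column names (score, title, selftext) and all hyperparameters other than the 4 epochs are assumptions; the actual Colab notebook differs in the details.

import pandas as pd
from transformers import (GPT2LMHeadModel, GPT2Tokenizer, TextDataset,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Keep popular, short jokes (title + body) and join them into one training file,
# separating individual jokes with GPT2's end-of-text token.
df = pd.read_csv("jokes_raw.csv")
df["joke"] = df["title"].fillna("") + "\n" + df["selftext"].fillna("")
df = df[(df["score"] > 100) & (df["joke"].str.len() < 300)]
with open("jokes.txt", "w") as f:
    f.write("<|endoftext|>".join(df["joke"]))

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Chunk the joke file into fixed-length blocks for causal language modelling.
dataset = TextDataset(tokenizer=tokenizer, file_path="jokes.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-jokes",
    num_train_epochs=4,              # the four epochs mentioned above
    per_device_train_batch_size=4,
    save_steps=1000,
)
Trainer(model=model, args=args, data_collator=collator, train_dataset=dataset).train()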

Example Jokes

I selected a few generated jokes that made sense and were original (i.e., not existing jokes). Since r/Jokes favors dark humor, my model readily spits out offensive jokes, so consider this a trigger warning for the light-hearted reader!
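For the curious, producing candidates to pick from is just sampling from the fine-tuned model. Here is a minimal sketch using the transformers text-generation pipeline; the sampling settings (top_p, temperature) are illustrative, not the exact values I used.

from transformers import pipeline

# Sample candidate jokes from the fine-tuned model with nucleus sampling.
generator = pipeline("text-generation", model="gpt2-jokes", tokenizer="gpt2")
samples = generator("<|endoftext|>", max_length=100, num_return_sequences=5,
                    do_sample=True, top_p=0.95, temperature=0.9)
for s in samples:
    print(s["generated_text"].strip(), "\n---")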

Conclusion

One could argue that my model really only produces nonsense. But I firmly believe that it simply transcended human humor. So why should we understand its jokes?!

Code available here.
