Well, the rumours were true: GPT-4 has been announced, and it’s just as impressive as we had hoped. We’ve been hearing big things about GPT-4 for months, so does the latest and greatest version of OpenAI’s model live up to the hype?
For comparison, here is ChatGPT using GPT-3.5:
And here is ChatGPT using GPT-4:
The first thing you notice is the ratings: GPT-4 scores 5/5 for reasoning, 2/5 for speed, and 4/5 for conciseness. Before you even use GPT-4, you can see it’s rated at less than half the speed of GPT-3.5. This is most likely due to the larger number of parameters the model has to work through as it processes your inputs.
First Impressions
Because I have ChatGPT Plus, I can access the GPT-4 model early. There is currently a usage cap, which OpenAI says is dynamic and temporary while the model hasn’t been publicly released yet. At the time of writing, my cap was 100 messages every 4 hours.
The first thing I noticed was that GPT-4 is quite slow. I’m used to the fast responses I get with GPT-3.5, but with GPT-4 the words seem to stream out noticeably more slowly. This might be an artificial speed constraint until the public release, or it could be system load, as the GPT-4 model might not have the resources it needs yet. I’ve seen GPT-3.5 respond this slowly on the free tier, so I imagine GPT-4 will get faster.
The second thing I noticed is that the responses are a lot longer. With GPT-3.5, it wasn’t uncommon for long responses to cut off partway through. You could type “continue” and it would finish, but that was painful if you were generating code. GPT-4 can now handle up to 25,000 words of text.
In terms of tokens, GPT-4 has a context length of 8,192 tokens, double GPT-3.5’s 4,096. Even more impressively, OpenAI also has a version of GPT-4 that supports 32,768 tokens. That variant isn’t available in ChatGPT right now, and I wonder whether it will be reserved for the API and Playground rather than ChatGPT.
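If you want a feel for what those limits mean for your own prompts, here’s a minimal sketch using OpenAI’s open-source tiktoken tokenizer to count tokens against the two context windows. The window sizes come from OpenAI’s announcement; the prompt string is just a placeholder.

```python
# Sketch: check whether a prompt fits in each GPT-4 context window,
# using OpenAI's open-source tiktoken tokenizer (pip install tiktoken).
import tiktoken

# GPT-4 uses the cl100k_base encoding; encoding_for_model resolves it.
enc = tiktoken.encoding_for_model("gpt-4")

prompt = "Write a satirical news article about a new AI model."  # placeholder
n_tokens = len(enc.encode(prompt))

# Context windows as announced: 8,192 tokens for the base GPT-4 model
# and 32,768 for the larger variant. In practice you also need to leave
# room in the window for the model's response.
for model, window in [("gpt-4", 8_192), ("gpt-4-32k", 32_768)]:
    verdict = "fits in" if n_tokens <= window else "exceeds"
    print(f"{n_tokens} tokens {verdict} {model}'s {window:,}-token window")
```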
Quality of responses
Despite being a new model, GPT-4 is trained on the same dataset with its 2021 cut-off, so it isn’t bringing any new information. Where GPT-3.5 and GPT-4 differ is in how the model interprets that data and returns it to you based on your inputs.
A simple test of GPT-4’s capabilities is to take existing prompts and re-run them through the new model in ChatGPT. Unbeknownst to some, I use ChatGPT to write satire. While the GPT-3.5 model would always produce something, the articles often sounded robotic and didn’t meet the brief; they required additional prompting and shaping to become usable.
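I ran my comparisons through the ChatGPT interface, but if you’d rather re-run a prompt against both models programmatically, a minimal sketch with OpenAI’s Python library (the pre-1.0 openai package, assuming you have API access to GPT-4, and with a placeholder prompt and key) would look something like this:

```python
# Sketch: send the same prompt to GPT-3.5 and GPT-4 and compare the output.
# Assumes the pre-1.0 `openai` package (pip install openai) and API access
# to the gpt-4 model. The prompt and API key below are placeholders.
import openai

openai.api_key = "sk-..."  # your API key

prompt = "Write a short satirical news article about self-driving bicycles."

for model in ("gpt-3.5-turbo", "gpt-4"):
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {model} ---")
    print(response["choices"][0]["message"]["content"])
```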
One problem I would always encounter is that it loved to open with “In a shocking turn of events”, which gave away that ChatGPT was being used. Even when I told it not to, it would find another way to say something similar. And it loved to end with “In conclusion”.
With GPT-4, I can tell you the end result is a night-and-day difference. GPT-3.5 wasn’t terrible, but its results weren’t fantastic. If I were a copywriter, GPT-3.5 would have been scary; GPT-4 is terrifying. GPT-4 has reached a point where it can produce not only good but long results that require minimal editing.
OpenAI also says that images can be used as input: you can ask GPT-4 to describe things in an image, or use an image to flesh out a prompt without typing everything. For example, you could give it a drawing of a website and some notes, and it would produce the code for you.
Conclusion
The GPT-4 model is a significant leap forward for OpenAI. It’s arguably the best model they have right now, which might be partly thanks to Microsoft’s involvement. The BingGPT integration was clearly using GPT-4, given how good its results were, and now ChatGPT is at the same level.
If ChatGPT 3.5 was seen as an assistant, ChatGPT 4 could be seen as a junior-level employee. It doesn’t get everything right, but it produces significantly better results, especially in writing. That should terrify authors, journalists and anyone who makes a living out of writing.