Since OpenAI released its long-awaited Code Interpreter plugin for ChatGPT, I have been playing with it extensively. I have thrown everything at it, from uploading a zip file of a large repository and asking it questions, to uploading spreadsheets and generating imagery.
It appears that most people are using Code Interpreter for its intended purpose: working with data and code, and running analysis on documents and other files.
While working on some writing tasks, I accidentally left the Code Interpreter plugin enabled and noticed that the results seemed much better than what GPT-4 usually produces. GPT-4 is great at writing, but this felt noticeably better. The Code Interpreter plugin feels like more than a plugin. It feels like a better version of GPT-4. Is Code Interpreter a beta test for GPT-4.5?
Because hallucinations can make it hard to trust your own impressions of these models, I started using Code Interpreter exclusively for all tasks, not just analysing files, to see if the output quality really was better. I came up with a prompt: "Write me an article debating the merits of teaching monkeys to program"
First, I asked GPT-4; you can find a link to the results here. And then I asked GPT-4 Code Interpreter, and you can find a link to the results here. It could have been sheer luck that the Code Interpreter version seemed better written and structured. It felt more real.
Then I did the same thing using another prompt: "Write me a detailed analysis of why Germany lost in World War II". Once again, GPT-4 went first, and the results are here. The results from GPT-4 Code Interpreter are here. While both gave some of the same reasons, you can see that Code Interpreter once again provided more detail and elaborated further.
Now, I could just be seeing what I want to see, but Code Interpreter seems to excel at writing tasks, and at most tasks across the board, compared to the base GPT-4 model. I am sure someone smarter than I am has already worked this out and determined why that might be. Still, it is interesting to see a model intended for analysis doing even better at tasks that GPT-4 was already good at.