Since its launch, OpenAI’s GPT-4 has been the talk of the town, marking yet another milestone in artificial intelligence. However, over the past few months, a suspicion has been growing within the AI community that GPT-4 has been “nerfed,” or subtly downgraded. Despite these concerns, OpenAI maintains that nothing has changed that would significantly affect GPT-4’s performance or quality. But is that really the case?
For months, many users have complained that ChatGPT and GPT-4 feel nerfed, that is, reduced in capability and reasoning ability. A recent paper has reignited the discussion by detailing how, over time, ChatGPT appears to have gotten dumber.
A Tale of Two Speeds
When GPT-4 first launched, it was relatively slow, yet despite this sluggishness the model produced nothing short of impressive results. Over time, GPT-4’s speed has drastically increased, but the boost seems to have come at the cost of response quality.
Interestingly, this degradation in quality appears more pronounced in the chat version of GPT-4, ChatGPT, than in the API version, which I primarily access through Typing Mind. This discrepancy in performance between the two versions raises some intriguing questions.
Unravelling the Mixture of Experts
Recently leaked details about GPT-4 suggest it utilizes a Mixture of Experts (MoE) architecture. In such a setup, a top-level model, or a router, dispatches prompts to various expert models. These models then return their responses, which the router evaluates and weighs to determine the final output.
Given this architecture, one can’t help but wonder whether the quality degradation stems from alterations to the router model, changes to one or more expert models, or both. That leads us to the crucial question: why would such alterations be necessary?
Cost Efficiency: A Likely Motivator?
In my opinion, the answer lies in the high operational cost of GPT-4. OpenAI may have embarked on a cost-cutting mission, tweaking the model to make decisions faster and thus reduce computational time. One piece of evidence supporting this hypothesis is the persistence of the 25-message cap on ChatGPT. This cap may be a strategic move to keep computational demands in check.
A faster decision-making process, however, might lead to less accurate results if the router rushes to select a less-than-ideal response or gives up on the evaluation too soon.
The Code Interpreter: A Clue to the Puzzle?
Intriguingly, I’ve found that using the Code Interpreter model yields results more akin to the earlier, more impressive versions of GPT-4. This observation has led me to wonder if ChatGPT’s decline in quality stems from the removal of one of the expert models, possibly to expedite decision-making.
The idea is that the Code Interpreter may primarily rely on the Code expert to handle coding tasks on its own rather than weighing responses from a group of models. This could explain why GPT-4, through the API, still produces highly accurate and runnable code.
The Role of Task-specific Models
The differences in performance between ChatGPT and the Code Interpreter may also suggest a strategic shift towards task-specific models. For instance, the Code Interpreter isn’t just excellent at handling code-related tasks; it’s also adept at self-correction and performs well in other areas, such as writing and summarizing.
In light of these observations, it’s plausible that OpenAI’s recent updates aim at optimizing cost and efficiency by routing certain tasks to designated expert models, bypassing the intermediate step of weighing responses from multiple models.
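To make this hypothesized shortcut concrete, here is a minimal sketch of top-1 routing in Python. The expert names and the keyword-based routing rule are toy assumptions made purely for illustration; nothing here describes how OpenAI actually dispatches requests.

```python
from typing import Callable, Dict

# Hypothetical task-specific experts. The names are illustrative
# stand-ins, not components of OpenAI's actual system.
def code_expert(prompt: str) -> str:
    return f"[code expert] handling: {prompt}"

def writing_expert(prompt: str) -> str:
    return f"[writing expert] handling: {prompt}"

EXPERTS: Dict[str, Callable[[str], str]] = {
    "code": code_expert,
    "writing": writing_expert,
}

def route_top1(prompt: str) -> str:
    # Top-1 routing: hand the prompt to one designated expert and
    # return its answer directly, skipping the step of weighing
    # several candidate responses against each other.
    is_code = any(k in prompt for k in ("def ", "bug", "compile"))
    return EXPERTS["code" if is_code else "writing"](prompt)

print(route_top1("please fix the bug in my parser"))
```

The appeal of this shortcut is obvious: only one expert runs per request, so compute cost drops, but so does the chance to catch a weak answer by comparing it against other experts.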
Mixture of Experts: A Deep Dive
The Mixture of Experts (MoE) architecture is a type of machine learning model that combines the results of multiple “expert” models to generate its final output. The idea behind the MoE architecture is to break down complex problems into smaller, more manageable parts that individual models (the “experts”) can specialize in.
Each expert in the MoE architecture is essentially a neural network trained to be proficient in a particular subtask. When a new input is received, a “gating” or “routing” network determines which expert or combination of experts should handle the task. This routing network, trained alongside the expert networks, learns to recognize the types of inputs that each expert is best suited to handle.
The responses from the selected experts are then combined to form the final output. The method of combining can vary. In some cases, the gating network assigns weights to the outputs of the experts, effectively deciding how much each expert’s opinion should count in the final decision.
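To ground those mechanics, here is a minimal sketch of a soft mixture in Python, with tiny linear layers standing in for both the gating network and the experts. It illustrates the general MoE pattern only; none of the sizes or weights reflect GPT-4’s actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_OUT, N_EXPERTS = 8, 4, 3

# Toy stand-ins: each expert is a single linear layer, and the
# gating network is another linear layer followed by a softmax.
expert_weights = [rng.normal(size=(D_IN, D_OUT)) for _ in range(N_EXPERTS)]
gate_weights = rng.normal(size=(D_IN, N_EXPERTS))

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def moe_forward(x: np.ndarray) -> np.ndarray:
    # 1. The gating network scores how relevant each expert is.
    gate_scores = softmax(x @ gate_weights)            # (N_EXPERTS,)
    # 2. Every expert produces its own candidate output.
    expert_outputs = [x @ w for w in expert_weights]   # each (D_OUT,)
    # 3. The final output is the gate-weighted sum of those candidates.
    return sum(g * out for g, out in zip(gate_scores, expert_outputs))

print(moe_forward(rng.normal(size=D_IN)))
```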
Here’s an analogy that might help visualize how MoE works. Imagine a team of doctors, each with their own speciality. When a patient comes in, the primary care physician (the gating network) determines which specialists (the experts) should see the patient based on their symptoms. After each specialist provides their diagnosis and recommended treatment (the outputs), the primary care physician weighs these recommendations based on their knowledge of each specialist’s expertise and the patient’s condition to form a final treatment plan.
The beauty of the MoE architecture lies in its flexibility. It can handle a wide range of tasks by leveraging the unique strengths of each expert. Moreover, it’s capable of parallel processing as different experts can simultaneously handle different parts of a task, potentially leading to improved efficiency.
In the context of GPT-4, the MoE architecture could potentially explain the changes observed in its performance. If OpenAI made modifications to the gating network or the experts, it could affect how tasks are distributed and how results are combined, leading to noticeable differences in response quality and speed.
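As a toy illustration of how such a modification could shift behavior, the sketch below drops one expert from a gate-weighted mixture and renormalizes the remaining gate weights. The numbers are random stand-ins assumed purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: three experts have produced 4-dimensional outputs,
# and the gating network has already scored them.
gate = np.array([0.5, 0.3, 0.2])
outputs = rng.normal(size=(3, 4))

full = gate @ outputs  # original gate-weighted mixture

# Hypothetical modification: expert 2 is removed, and the remaining
# gate weights are renormalized so they still sum to one.
kept = [0, 1]
pruned_gate = gate[kept] / gate[kept].sum()
pruned = pruned_gate @ outputs[kept]

# Any nonzero shift means the combined answer has changed.
print("shift in combined output:", np.linalg.norm(full - pruned))
```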
Final Thoughts
While these speculations remain unconfirmed, they offer a fascinating glimpse into the potential evolution of GPT-4. As we continue to explore and understand this powerful tool, it’s crucial to watch its performance and the implications of any changes. After all, the journey of AI is as much about understanding its developments as it is about shaping its future.