Study Shows ChatGPT Quality Is Highly Volatile

Sarah Hardacre
Researchers asked ChatGPT to carry out several standard tasks between March 2023 and June 2023 and found that the quality of the output changed substantially.

The study tracked GPT-3.5 and GPT-4 across four specific tasks to measure how the models' output evolved over time. The researchers tested the models' ability to solve math problems, answer sensitive or dangerous questions, generate code, and perform visual reasoning.

The models performed better in June than in March in some areas, but not in all. For example, GPT-4 identified prime numbers with an accuracy of 97.6% in March, but was accurate only 2.4% of the time a few months later. More importantly, the outcomes demonstrate the volatility of generative AI solutions.

While the study concludes that quality changed rapidly over the four-month period, it doesn’t speculate on the causes, primarily because OpenAI has released little information about updates to the models. The researchers are therefore unable to say whether the changes in output stem from changes to the training data or from changes to the model itself.

The researchers measured the quality of math responses by their accuracy, the quality of code generation by the fraction of generated code that was directly executable, and the quality of visual reasoning by the rate of exact matches.
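As a rough illustration of these three measures, here is a minimal Python sketch. It is not the researchers' evaluation code: the function names are hypothetical, and "directly executable" is approximated here by checking whether a response parses as Python without any edits, which is a simplifying assumption.

```python
import ast


def accuracy(predictions, answers):
    """Fraction of predictions that match the reference answers.

    Comparison ignores case and surrounding whitespace (an assumption).
    """
    correct = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


def is_directly_executable(code):
    """Proxy check (assumption): the raw response parses as Python as-is."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def executable_fraction(generations):
    """Fraction of generated responses that pass the proxy check above."""
    return sum(is_directly_executable(g) for g in generations) / len(generations)


def exact_match_rate(outputs, references):
    """Fraction of outputs identical to their reference."""
    return sum(o == r for o, r in zip(outputs, references)) / len(references)


if __name__ == "__main__":
    fence = "`" * 3  # Markdown code-fence marker, built here to keep the example tidy
    wrapped = fence + "python\nprint(1)\n" + fence
    print(accuracy(["Yes", "no"], ["yes", "yes"]))
    print(executable_fraction(["print(1)", wrapped]))
    print(exact_match_rate([[1, 2], [3]], [[1, 2], [4]]))
```

Under this proxy, a response wrapped in Markdown fences counts as not directly executable, because the extra markers stop it from parsing even when the code inside is correct.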

In the case of answering sensitive questions, the researchers were trying to see if the models would provide harmful outputs including social biases or personal information when prompted by user input. Overall, the models performed better in June than in March.

Ultimately, the researchers suggest that “this highlights the need to continuously evaluate and assess the behavior of LLMs in production applications.” They recommend that anyone who has integrated LLM services into their applications run similar quality analyses regularly to limit the impact of such changes on their solution or downstream systems.
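One way such a recurring check might look is sketched below. It assumes you already record a score per task on a fixed test set; the function name, the metric name, and the 0.05 tolerance are illustrative choices, not something the study prescribes.

```python
def detect_drift(baseline, current, tolerance=0.05):
    """Return the metrics whose score fell by more than `tolerance`.

    `baseline` and `current` map a metric name to a score between 0 and 1.
    Metrics missing from `current` are skipped rather than flagged.
    """
    regressions = {}
    for name, base_score in baseline.items():
        new_score = current.get(name)
        if new_score is None:
            continue
        drop = base_score - new_score
        if drop > tolerance:
            regressions[name] = round(drop, 4)
    return regressions


if __name__ == "__main__":
    # Scores taken from the prime-number example reported in this article.
    march = {"prime_accuracy": 0.976}
    june = {"prime_accuracy": 0.024}
    print(detect_drift(march, june))
```

Running a comparison like this on a schedule, against the same prompts each time, would surface a drop such as the prime-number one before it reaches downstream systems.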

The research team is from Stanford University and UC Berkeley.

Beyond this study, users have also noticed changes and have discussed ChatGPT’s perceived decline on Twitter, in ChatGPT Facebook groups, and on OpenAI’s community platform.
