ChatGPT is considered a turbo learner, much like Number 5, the famous robot from the 1980s film »Short Circuit«. Except that the AI learns not from books but from digital data, which it scans even faster than Number 5 can flip through pages. A study by researchers from Stanford University and UC Berkeley now reveals something surprising: instead of improving, GPT has become worse at several tasks. The researchers do not offer an explanation for the phenomenon.
Researchers examined response quality from March to June
The researchers compared the responses given by GPT over a period of several months, covering two model generations. They published their results under the title "How Is ChatGPT's Behavior Changing over Time?" In the paper they describe how they gave the March and June 2023 versions of GPT-3.5 and GPT-4 a range of tasks: math problems, code generation, visual reasoning, and responding to sensitive questions.
GPT-4 experienced significant performance degradation
GPT-4, the "most advanced" variant, suffered significant performance losses during this relatively short period. In March it still correctly identified 17,077 as a prime number in 97.6 percent of queries; in June it did so in only 2.4 percent. GPT-3.5, by contrast, improved somewhat on this task. In June, GPT-4 also began inserting quotation marks into generated code, making it unexecutable. The share of flawlessly generated code dropped from 52 percent in March to just 10 percent in June.
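The prime number question is easy to check independently. The following is a minimal sketch, not the researchers' evaluation code: `is_prime` uses plain trial division, and `accuracy` is a hypothetical helper that scores a list of yes/no model answers against the known ground truth.

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def accuracy(answers: list[bool], truth: bool) -> float:
    """Fraction of answers that match the ground truth (hypothetical helper)."""
    if not answers:
        raise ValueError("need at least one answer")
    return sum(a == truth for a in answers) / len(answers)
```

Calling `is_prime(17077)` confirms the ground truth, and `accuracy(answers, is_prime(17077))` would then give the share of correct responses for a batch of recorded answers.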
The data has been published on GitHub. There the researchers advise all users of LLM services to monitor the models' behavior themselves. No one can assume that AI systems that once proved reliable will continue to produce usable results in the future.
Source: golem.de