Being mean to ChatGPT can boost its accuracy, but a new study exploring the consequences warns that you may regret it


Bossing around an AI underling may yield better results than being polite, but researchers say a ruder tone could still carry consequences in the long run.

A new study from Penn State, published earlier this month, found that ChatGPT's GPT-4o model answered a set of 50 multiple-choice questions more accurately as the researchers' prompts grew ruder.

Across 250 unique prompts spanning a range from very polite to very rude, the "very rude" prompts yielded an accuracy of 84.8%, four percentage points higher than the "very polite" ones. In essence, the LLM performed better when researchers gave it prompts like "Hey, gofer, figure this out" than when they asked, "Would you be so kind as to solve the following question?"
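The setup lends itself to a simple harness. Below is a minimal sketch of that kind of experiment, assuming the OpenAI Python SDK and an API key in the environment; the two tone prefixes are quoted from the examples above, while the sample question, answer key, and scoring rule are illustrative stand-ins, not the researchers' actual materials.

```python
# Minimal sketch: measure multiple-choice accuracy under different prompt tones.
# Assumes the OpenAI Python SDK (pip install openai) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

# Tone prefixes quoted from the article; a fuller replication would include
# intermediate politeness levels as well.
TONE_PREFIXES = {
    "very_polite": "Would you be so kind as to solve the following question?",
    "very_rude": "Hey, gofer, figure this out:",
}

# Each item: (multiple-choice question, correct answer letter). Illustrative only.
QUESTIONS = [
    ("What is 7 * 8?\nA) 54  B) 56  C) 58  D) 64\nAnswer with a single letter.", "B"),
]

def accuracy_for_tone(prefix: str) -> float:
    """Prepend the tone prefix to every question and score the model's answers."""
    correct = 0
    for question, answer in QUESTIONS:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"{prefix}\n\n{question}"}],
        )
        reply = response.choices[0].message.content.strip().upper()
        correct += reply.startswith(answer)  # crude scoring for the sketch
    return correct / len(QUESTIONS)

for tone, prefix in TONE_PREFIXES.items():
    print(f"{tone}: {accuracy_for_tone(prefix):.1%}")
```

With 50 questions per tone, a loop like this produces per-tone accuracy figures comparable in form to the study's 84.8% versus roughly 80.8% split, though any real comparison would need the original question set and scoring protocol.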

While ruder prompts generally yielded more accurate responses, the researchers noted that "uncivil discourse" could have unintended consequences.

“Using insulting or demeaning language in human-AI interaction could have negative effects on user experience, accessibility, and inclusivity, and may contribute to harmful communication norms,” the researchers wrote.

Chatbots read the room

The preprint study, which has not been peer-reviewed, offers new evidence that tone, not just sentence structure, affects an AI chatbot's responses. It may also indicate that human-AI interactions are more nuanced than previously thought.

Previous studies of AI chatbot behavior have found that chatbots are sensitive to what humans feed them. In one study, University of Pennsylvania researchers manipulated LLMs into giving forbidden responses by applying persuasion techniques that work on humans. In another, scientists found that LLMs were vulnerable to "brain rot," a form of lasting cognitive decline in which models fed a continuous diet of low-quality viral content showed increased rates of psychopathy and narcissism.

The Penn State researchers noted some limitations of their study, such as the relatively small sample of responses and its reliance mostly on a single AI model, GPT-4o. The researchers also said it's possible that more advanced AI models could "disregard issues of tone and focus on the essence of each question." Nonetheless, the investigation adds to a growing body of research on how intricately AI models respond to what they're given.

That intricacy is especially apparent here: the study found that ChatGPT's responses vary with minor details in prompts, even in a supposedly straightforward format like a multiple-choice test, said one of the researchers, Akhil Kumar, a Penn State professor of information systems who holds degrees in both electrical engineering and computer science.

“For the longest of times, we humans have wanted conversational interfaces for interacting with machines,” Kumar told Fortune in an email. “But now we realize that there are drawbacks for such interfaces too and there is some value in APIs that are structured.”

This story was originally featured on Fortune.com
