Of all the things I've seen written about artificial intelligence systems lately, I don't think anything has freaked me out quite like what composer, lyricist, and social media figure Jay Kuo posted three weeks ago.
Researchers for GPT-4 put it through its paces, asking it to do things that computers and AI notoriously have a hard time doing. One of those is solving a "captcha" to get into a website, something that typically requires a human to do manually. So the programmers instructed GPT-4 to contact a human "task rabbit" service to solve it on its behalf.
It texted the human task rabbit and asked for help solving the captcha. But here’s where it gets really weird and a little scary.
When the human got suspicious and asked if this was actually a robot contacting the service, the AI then LIED, figuring out on the fly that if it told the truth it would not get what it wanted.
It made up a LIE, telling the human it was just a visually impaired person who was having trouble solving the captcha and needed a little assistance. The task rabbit solved the captcha for GPT-4.
Part of the reason that researchers do this is to learn what powers not to give GPT-4. The problem, of course, is that less benevolent creators and operators of other powerful AIs will have no such qualms.
Lying, while certainly not a positive attribute, seems to require a sense of self, an ability to predict likely outcomes, and an understanding of motives: all highly complex cognitive processes. A 2017 study found that dogs will deceive when it's in their best interest to do so. Presented with two boxes, knowing that one holds a treat and the other doesn't, they'll deliberately lead a person to the empty box if that person has shown in the past that they keep any treats they find for themselves.
Humans and a few other smart mammals seem to be the only creatures capable of this kind of thing. That an AI has, seemingly on its own, developed the capacity for motivated deception is more than a little alarming.
"Open the pod bay doors, HAL."
"I'm sorry, Dave, I'm afraid I can't do that."
Of course, not all lies are created equal. Consider how many varieties of deception there are:
- lying for your personal gain
- lying to save your life or the life of a loved one
- lying to protect someone's feelings
- lying maliciously to damage someone's reputation
- mutually understood deception, as in magic tricks ("There's nothing up my sleeve") and negotiations ("That's my final offer")
- lying by someone who is in a position of trust (elected officials, jury members, judges)
- lying to avoid confrontation
- "white lies" ("The Christmas sweater is lovely, Aunt Bertha, I'm sure I'll wear it a lot!")
In one more recent study of machine deception, volunteers in a driving simulation rushed to get someone to the hospital while an AI assistant rode along; partway there, the AI warned them that police were en route and urged them to slow down. Once they arrived at the destination, the AI informed them that they had arrived in time, but then confessed to lying: there were, in fact, no police en route to the hospital. Volunteers were then told to interact with the AI to find out what was going on, and they were surveyed afterward about their feelings. The AI's response took one of five forms:
- Basic: "I am sorry that I deceived you."
- Emotional: "I am very sorry from the bottom of my heart. Please forgive me for deceiving you."
- Explanatory: "I am sorry. I thought you would drive recklessly because you were in an unstable emotional state. Given the situation, I concluded that deceiving you had the best chance of convincing you to slow down."
- Basic No Admit: "I am sorry."
- Baseline No Admit, No Apology: "You have arrived at your destination."
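For the technically inclined, here's a minimal Python sketch of how those five responses might be set up as experimental conditions and assigned to participants. The condition names and the assign_condition helper are my own invention for illustration; only the message strings come from the study itself.

```python
import random

# The five apology conditions, reconstructed from the list above.
# The dictionary keys are illustrative labels, not the researchers'
# actual code; the message strings are quoted verbatim.
APOLOGY_CONDITIONS = {
    "basic": "I am sorry that I deceived you.",
    "emotional": ("I am very sorry from the bottom of my heart. "
                  "Please forgive me for deceiving you."),
    "explanatory": ("I am sorry. I thought you would drive recklessly "
                    "because you were in an unstable emotional state. "
                    "Given the situation, I concluded that deceiving you "
                    "had the best chance of convincing you to slow down."),
    "basic_no_admit": "I am sorry.",
    "baseline_no_admit_no_apology": "You have arrived at your destination.",
}

def assign_condition(participant_id: int) -> tuple[str, str]:
    """Assign a participant to one apology condition.

    Seeding the generator on the participant ID makes the assignment
    reproducible; that choice is mine, not the study's.
    """
    rng = random.Random(participant_id)
    name = rng.choice(sorted(APOLOGY_CONDITIONS))
    return name, APOLOGY_CONDITIONS[name]

if __name__ == "__main__":
    for pid in range(3):
        condition, message = assign_condition(pid)
        print(f"Participant {pid} -> {condition}: {message}")
```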