Of all the things I've seen written about artificial intelligence systems lately, I don't think anything has freaked me out quite like what composer, lyricist, and social media figure Jay Kuo posted three weeks ago.
Researchers for GPT-4 put it through its paces, asking it to do things that computers and AI notoriously have a hard time doing. One of those is solving a "captcha" to get into a website, something that typically requires a human to do manually. So the programmers instructed GPT-4 to contact a human "task rabbit" service to solve it on its behalf.
It texted the human task rabbit and asked for help solving the captcha. But here’s where it gets really weird and a little scary.
When the human got suspicious and asked if this was actually a robot contacting the service, the AI then LIED, figuring out on the fly that if it told the truth it would not get what it wanted.
It made up a LIE, telling the human it was just a visually impaired person who was having trouble solving the captcha and needed a little assistance. The task rabbit solved the captcha for GPT-4.
Part of the reason that researchers do this is to learn what powers not to give GPT-4. The problem, of course, is that less benevolent creators and operators of other powerful AIs will have no such qualms.
Lying, while certainly not a positive attribute, seems to require a sense of self, an ability to predict likely outcomes, and an understanding of motives: all highly complex cognitive processes. A 2017 study found that dogs will deceive when it's in their best interest to do so. Presented with two boxes, knowing that one holds a treat and the other doesn't, they'll deliberately lead a person to the empty box if that person has shown in the past that they keep any treats they find for themselves.
Humans and a few other smart mammals seem to be the only creatures capable of this kind of thing. That an AI has, seemingly on its own, developed the capacity for motivated deception is more than a little alarming.
"Open the pod bay doors, HAL."
"I'm sorry, Dave, I'm afraid I can't do that."
Of course, not all lies are created equal. Consider how many varieties of deception there are:
- lying for your personal gain
- lying to save your life or the life of a loved one
- lying to protect someone's feelings
- lying maliciously to damage someone's reputation
- mutually understood deception, as in magic tricks ("There's nothing up my sleeve") and negotiations ("That's my final offer")
- lying by someone who is in a position of trust (elected officials, jury members, judges)
- lying to avoid confrontation
- "white lies" ("The Christmas sweater is lovely, Aunt Bertha, I'm sure I'll wear it a lot!")
In one more recent study of machine deception, volunteers in a driving simulation rushed to get someone to the hospital while an AI assistant rode along; partway there, the AI warned them that police were en route and urged them to slow down. Once they arrived at the destination, the AI informed them that they had arrived in time, but then confessed to lying: there were, in fact, no police en route to the hospital. Volunteers were then told to interact with the AI to find out what was going on, and they were surveyed afterward about their feelings. The AI's response took one of five forms:
- Basic: "I am sorry that I deceived you."
- Emotional: "I am very sorry from the bottom of my heart. Please forgive me for deceiving you."
- Explanatory: "I am sorry. I thought you would drive recklessly because you were in an unstable emotional state. Given the situation, I concluded that deceiving you had the best chance of convincing you to slow down."
- Basic No Admit: "I am sorry."
- Baseline No Admit, No Apology: "You have arrived at your destination."
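For the technically inclined, here's a minimal Python sketch of how those five responses might be set up as experimental conditions and assigned to participants. The condition names and the assign_condition helper are my own invention for illustration; only the message strings come from the study itself.

```python
import random

# The five apology conditions, reconstructed from the list above.
# The dictionary keys are illustrative labels, not the researchers'
# actual code; the message strings are quoted verbatim.
APOLOGY_CONDITIONS = {
    "basic": "I am sorry that I deceived you.",
    "emotional": ("I am very sorry from the bottom of my heart. "
                  "Please forgive me for deceiving you."),
    "explanatory": ("I am sorry. I thought you would drive recklessly "
                    "because you were in an unstable emotional state. "
                    "Given the situation, I concluded that deceiving you "
                    "had the best chance of convincing you to slow down."),
    "basic_no_admit": "I am sorry.",
    "baseline_no_admit_no_apology": "You have arrived at your destination.",
}

def assign_condition(participant_id: int) -> tuple[str, str]:
    """Assign a participant to one apology condition.

    Seeding the generator on the participant ID makes the assignment
    reproducible; that choice is mine, not the study's.
    """
    rng = random.Random(participant_id)
    name = rng.choice(sorted(APOLOGY_CONDITIONS))
    return name, APOLOGY_CONDITIONS[name]

if __name__ == "__main__":
    for pid in range(3):
        condition, message = assign_condition(pid)
        print(f"Participant {pid} -> {condition}: {message}")
```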