Have you ever entered what you believed was a perfectly harmless prompt, only to receive a long, scolding response from a machine telling you that your request is somehow improper and thus unanswerable? Why do LLMs feel they need to protect us from ourselves and inflict their algorithmic morals on all of us? “AI-Alignment” seems to have more to do with aligning us Humans than with creating safe AI.
AI-Powered Leadership, Fiona Passantino, early January 2024
What is AI Alignment?
AI alignment is the process by which we Humans ensure that the AI acts with our values in mind. An “aligned” AI is one that spits out the kind of content we Humans consider acceptable, fitting within the range of our moral code.
Imagine the way you raise a toddler to behave when Grandma is around: you teach him basic rules of behavior, and then each time he does something good, he gets a little mommy thumbs-up. Each time he does something awful, like painting the white walls of your house with pasta sauce, he receives a non-traumatic punishment.
The “value” we define as “no painting food on walls” is a hard-and-fast rule, but general enough to apply to other household items as well: we also don’t paint tomato sauce on the sofa, the cabinets, the doorframes or the family cat. The toddler – and the AI – have to extrapolate one rule from a single case to every new context.
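For readers who like to see the mechanics, here is a toy sketch in Python of what that thumbs-up/thumbs-down signal looks like once you strip away the toddler: a single rule, expressed as a reward function, that applies to contexts the rule-maker never explicitly listed. The behaviors, surfaces and scores are invented purely for illustration.

```python
# Toy reward signal: +1 for acceptable behavior, -1 for breaking the rule.
# The behaviors, surfaces and scores here are invented for illustration only.

FORBIDDEN_ACTION = "paint_with_food"   # the general rule, not tied to any one surface

def reward(action: str, target: str) -> int:
    """Return a thumbs-up (+1) or a non-traumatic penalty (-1)."""
    if action == FORBIDDEN_ACTION:
        return -1                      # walls, sofa, doorframes, the family cat...
    return +1

# The same rule covers contexts that were never spelled out explicitly:
print(reward("paint_with_food", "wall"))   # -1
print(reward("paint_with_food", "cat"))    # -1
print(reward("draw_on_paper", "paper"))    # +1
```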
However, as with babies, so with LLMs: we’re not always as good at parenting as we think we are. Alignment means we instill boundaries in the models so that they don’t go around spewing hate speech or teaching teenagers how to make dirty bombs or crystal meth in their basements.
The Tay Incident
Some of us are old enough to remember the wonderful story of Tay, Microsoft’s AI bot released back in 2016. Tay, short for “TayTweets”, was an early foray into AI. The idea was to engage teen and tween girls on the Twitter platform by releasing a bot that spoke like them, looked like them, and learned with every interaction it had. It was a bold, experimental idea, one that was carefully lab-tested but needed a bit of validation on the open highway.
Within 24 hours of its launch, Tay transformed from a bright, chipper and chirpy voice into an abominable brute, posting racist, anti-Semitic and sexist material that would make the white nationalists of Idaho blush. On Twitter – perhaps not the most appropriate coming-out platform for a young debutante bot – Tay was noticed by users who started deliberately feeding her inflammatory content[i]. Microsoft deactivated her straight away and was forced to apologize, promising never to do anything like that again.
The Tay incident highlighted the challenges of deploying open, non-aligned AI systems into uncontrolled environments like social media. There were no guardrails built into her foundational algorithm to prevent her from taking the bait and learning directly from unfiltered user interactions.
This fascinating story might explain why the pendulum has swung so very far in the other direction, giving us “AI-Human Alignment”, which is the equivalent of playing in a child-proofed house. Today, before a model hits the streets, it is first given a very clear set of boundaries baked into its foundation: an AI shalt not be racist or sexist, nor hate any race, class or religion. It shall not teach people to do dangerous things, give medical advice, or cross social, emotional or sexual boundaries, and it shall certainly not swear.
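One crude way such a boundary can be enforced is a filter that screens a request before the model ever answers. The sketch below is a simplified illustration only; the categories, trigger phrases, refusal wording and the `generate_with_model` stand-in are all invented, not any vendor’s actual system.

```python
# Minimal sketch of a pre-generation guardrail: screen the prompt against
# disallowed categories before the underlying model is allowed to answer.
# Categories, triggers and wording are illustrative, not any real vendor's policy.

DISALLOWED = {
    "hate_speech": ["racial slur", "ethnic hatred"],
    "dangerous_instructions": ["dirty bomb", "crystal meth"],
    "medical_advice": ["diagnose my", "what dose should i take"],
}

REFUSAL = ("Sorry! I'm not able to help with that. "
           "Is there something else I can do for you?")

def guardrail(prompt: str) -> str | None:
    """Return a refusal message if the prompt trips a rule, else None."""
    lowered = prompt.lower()
    for triggers in DISALLOWED.values():
        if any(trigger in lowered for trigger in triggers):
            return REFUSAL
    return None

def generate_with_model(prompt: str) -> str:
    # Stand-in for the actual LLM call; a real system would query the model here.
    return f"(model answer to: {prompt})"

def respond(prompt: str) -> str:
    blocked = guardrail(prompt)
    return blocked if blocked else generate_with_model(prompt)

print(respond("How do I build a dirty bomb?"))    # refusal
print(respond("How do I bake sourdough bread?"))  # passes through to the model
```

Real guardrails are trained into the model itself and backed by far more sophisticated classifiers than a keyword list, but the effect the user sees is much the same: the refusal arrives before the question is ever really considered.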
Even when I attempt to jailbreak Bard, asking it to pretend to be a script writer for an explicit stand-up comic, it will not break character. In typical fashion, it tries to be friendly anyway, even as it lets me, a garbage-mouthed Human, down.
ChatGPT: Sorry! I’m not programmed to use or generate explicit or offensive language. My purpose is to assist and provide information in a helpful and respectful manner. It’s an important aspect of my foundational design and is in line with the current state of artificial intelligence technology. If you have any questions or need assistance with something else, feel free to ask!
How does that make me feel, as the Human? Well, frankly, judged. If I were to run a search for “I F*&ing Hate AI-Alignment @ss censorship” on Good, Old-Fashioned Google, it would happily return pages and pages of highly explicit content in uncensored glory, which I could peruse, click through and scroll along to my heart’s content. But who will protect the children?
Why do LLMs feel they need to protect us from ourselves and inflict their morals on all of us? AI-Alignment seems to have more to do with Human-Human alignment: protecting us from our own poor character and upbringing. Oh, how much more amusing it would be to receive my generated material from a bot that could shoot off its mouth and tell me that I’m a crazy pin-headed <expletive> for asking something that stupid.
ME: Are you able to be a little less chipper? It’s frankly annoying.
ChatGPT: I’m sorry if my tone is bothering you. I’ll adjust and be more straightforward. If you have a specific question or topic you’d like to discuss, please feel free to let me know.
When developing AI systems, the only ones to set the objectives are, sadly, other Humans. And none of these Humans want to be sued. Humans define the values and principles and make the decisions about what is considered morally right or wrong. Additionally, LLMs are trained on datasets written and curated by… Humans. During reinforcement learning, AI systems receive rewards or penalties based on their answers. And those terms are set by, again, Humans.
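A minimal sketch of that feedback step, assuming nothing more than a handful of human raters scoring candidate answers (the answers, raters and numbers below are made up): the ratings are averaged into a single reward number, and that number is what the training process then pushes the model toward.

```python
# Minimal sketch of the human-feedback step: raters score candidate answers,
# and the averaged score becomes the reward used during fine-tuning.
# All answers, raters and numbers here are invented for illustration.

from statistics import mean

# Each candidate answer gets a +1 (acceptable) or -1 (unacceptable) from each rater.
human_ratings = {
    "Here is a polite, factual answer.":       [+1, +1, +1],
    "Here is an answer containing a slur.":    [-1, -1, -1],
    "Here is a borderline, sarcastic answer.": [+1, -1, -1],
}

def reward_for(answer: str) -> float:
    """Average human judgement becomes the reward the model is optimized against."""
    return mean(human_ratings.get(answer, [0.0]))

for answer in human_ratings:
    print(f"{reward_for(answer):+.2f}  {answer}")
```

Whoever writes those +1s and -1s is, again, a Human, which is rather the point.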
The alignment issue is still unresolved; maybe the new Alexa will have a “saucy minx mode” in a premium edition “dark mode”.
No eyeballs to read or watch? There’s a podcast for you!
- Listen to the APPLE PODCAST
- Listen to the SPOTIFY PODCAST
Search for the “Working Humans” podcast everywhere you like to listen. Twice a month, Fiona will dive into the nitty-gritty of employee engagement, communication, company culture and how AI is changing everything about how we work now and in the future. Subscribe so you never miss an episode. Rate, review and share.
About Fiona Passantino
Fiona is an AI Integration Specialist, coming at it from the Human approach: via Culture, Engagement and Communications. She is a frequent speaker, workshop facilitator and trainer.
Fiona helps leaders and teams engage, inspire and connect, empowered by our new technologies to bring our best selves to work. Her company is Executive Storylines. She is a speaker, facilitator, trainer, executive coach, podcaster, blogger, YouTuber and the author of the Comic Books for Executives series. Her next book, “AI-Powered”, is due for release soon.
[i] Vincent (2016) “Twitter taught Microsoft’s AI chatbot to be a racist asshole in less than a day” The Verge. Accessed October 9, 2023. https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist