Skeptophilia (skep-to-fil-i-a) (n.) - the love of logical thought, skepticism, and thinking critically. Being an exploration of the applications of skeptical thinking to the world at large, with periodic excursions into linguistics, music, politics, cryptozoology, and why people keep seeing the face of Jesus on grilled cheese sandwiches.

Saturday, June 21, 2025

The labyrinths of meaning

A recent study found that regardless of how thoroughly AI-powered chatbots are trained on real, sensible text, they still have a hard time recognizing passages that are nonsense.

Given pairs of sentences, one of which makes semantic sense and the other of which clearly doesn't -- in the latter category, "Someone versed in circumference of high school I rambled" was one example -- a significant fraction of large language models struggled with telling the difference.

In case you needed another reason to be suspicious of what AI chatbots say to you.
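
Out of curiosity, here's one crude way to poke at this yourself -- a minimal sketch, assuming the Hugging Face transformers library and the small GPT-2 model, and nothing like the study's actual methodology: compare how "surprised" the model is (its perplexity) by a sensible sentence versus a nonsense one.  (The sensible sentence below is just an invented counterpart; only the nonsense one comes from the study.)

```python
# A minimal sketch (not the study's methodology) of probing whether a language
# model "prefers" a sensible sentence over a nonsense one: compare per-token
# perplexity under a small pretrained model (GPT-2 here).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(sentence: str) -> float:
    """Average per-token perplexity of a sentence under GPT-2."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # Feeding input_ids as labels makes the model return the mean
        # cross-entropy loss over the sequence.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return float(torch.exp(loss))

sensible = "Someone versed in geometry helped me with my high school homework."
nonsense = "Someone versed in circumference of high school I rambled."

print(f"sensible: {perplexity(sensible):.1f}")
print(f"nonsense: {perplexity(nonsense):.1f}")
# Lower means the model finds the sentence less surprising; whether that gap
# is reliable enough to *classify* sense vs. nonsense is exactly what the
# study calls into question.
```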

As a linguist, though, I can confirm how hard it is to detect and analyze semantic or syntactic weirdness.  Noam Chomsky's famous example "Colorless green ideas sleep furiously" is syntactically well-formed, but has multiple problems with semantics -- something can't be both colorless and green, ideas don't sleep, you can't "sleep furiously," and so on.  How about the sentence, "My brother opened the window the maid the janitor Uncle Bill had hired had married had closed"?  This one is both syntactically well-formed and semantically meaningful, but there's definitely something... off about it.

The problem here is called "center embedding" -- clauses nested inside other clauses -- and the result is not so much wrong as confusing and difficult to parse.  It's the kind of thing I look for when I'm editing someone's manuscript -- one of those "well, I knew what I meant at the time" moments.  (That this one actually does make sense can be demonstrated by breaking it up into two sentences -- "My brother opened the window the maid had closed.  She was the one who had married the janitor Uncle Bill had hired.")

Then there are "garden-path sentences" -- named for the expression "to lead (someone) down the garden path," to trick them or mislead them -- when you think you know where the sentence is going, then it takes a hard left turn, often based on a semantic ambiguity in one or more words.  Usually the shift leaves you with something that does make sense, but only if you re-evaluate where you thought the sentence was headed to start with.  There's the famous example, "Time flies like an arrow; fruit flies like a banana."  But I like even better "The old man the boat," because it only has five words, and still makes you pull up sharp.

The water gets even deeper than that, though.  Consider the strange sentence, "More people have been to Berlin than I have."

This sort of thing is called a comparative illusion, but I like the nickname "Escher sentences" better because it captures the sense of the problem.  You've seen the famous work by M. C. Escher, "Ascending and Descending," yes?


The issue with both Escher's staircase and the statement about Berlin is that if you look at any smaller piece, everything seems fine; the problem only comes about when you put the whole thing together.  And like Escher's trudging monks, it's hard to pinpoint exactly where the problem occurs.

I remember a student of mine indignantly telling a classmate, "I'm way smarter than you're not."  And it's easy to laugh, but even the ordinarily brilliant and articulate Dan Rather slipped into this trap when he tweeted in 2020, "I think there are more candidates on stage who speak Spanish more fluently than our president speaks English."

It seems to make sense, and then suddenly you go, "... wait, what?"

An additional problem is that words frequently have multiple meanings and nuances -- which is the basis of wordplay, but would be really difficult to program into a large language model.  Take, for example, the anecdote about the redoubtable Dorothy Parker, who was cornered at a party by an insufferable bore.  "To sum up," the man said archly at the end of a long diatribe, "I simply can't bear fools."

"Odd," Parker shot back.  "Your mother obviously could."

A great many of Parker's best quips rely on a combination of semantic ambiguity and idiom.  Her review of a stage actress -- that "she runs the gamut of emotions from A to B" -- is one example, but to me the best is her stinging jab at a writer: "His work is both good and original.  But the parts that are good are not original, and the parts that are original are not good."

Then there's the riposte from John Wilkes, a famously witty British Member of Parliament in the last half of the eighteenth century.  Another MP, John Montagu, 4th Earl of Sandwich, was infuriated by something Wilkes had said, and sputtered out, "I predict you will die either on the gallows or else of some loathsome disease!"  And Wilkes calmly responded, "Which it will be, my dear sir, depends entirely on whether I embrace your principles or your mistress."

All of this adds up to the fact that languages contain labyrinths of meaning and structure, and we have a long way to go before AI will master them.  (Given my opinion about the current use of AI -- which I've made abundantly clear in previous posts -- I'm inclined to think this is a good thing.)  It's hard enough for human native speakers to use and understand language well; capturing that capacity in software is, I think, going to be a long time coming.

It'll be interesting to see at what point a large language model can correctly parse something like "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo."  Which is both syntactically well-formed and semantically meaningful.

Have fun piecing together what exactly it does mean.

****************************************


Wednesday, April 21, 2021

Couplespeak

Like a lot of couples, my wife and I have a great many inside jokes and turns of phrase that amuse us no end but must puzzle the hell out of everyone else.

Part of the reason, of course, is that we've been together for over twenty years, and during that time shared experience has given us a rich reservoir to draw from.  Sometimes it's a combination of two or more memories that gives words their relevance, and those are even harder to explain should anyone ask.  For example, a couple of weeks ago I ended a series of texts to my wife with "Thank you, Bloopie," and she started laughing so hard she was afraid her coworkers would come in and demand to know what was so funny.  That would have required her to explain that it was a combination of bits from Seinfeld and an obscure British spoof of middle school educational videos called Look Around You -- and there was no way the explanation would have elicited anything more than puzzled head tilts and questions about why that was even funny.

Another example is why we always laugh when we hear Bill Withers's song "Ain't No Sunshine," the lyrics of which are anything but funny.  This one is at least explainable; when we were in Spain about fifteen years ago we rented a room for the night in a B&B, and the guy in the next room spent what seemed like hours practicing the trombone.  Amongst his Greatest Hits was -- I kid you not -- "Ain't No Sunshine."

He seemed to particularly enjoy the "WOMP WOMP WOMP" part at the end of each line.

The whole subject comes up because of a paper published a couple of weeks ago in the Journal of Communication, which gave the results of a longitudinal study of communication between couples as they moved deeper into -- and sometimes, subsequently, out of -- their relationships.  Instead of verbal communication, which would have required the participants to recall accurately what they'd said, the researchers used text messages, and found, perhaps unsurprisingly, that as relationships progress, the language of the texts becomes more and more similar.

The research, done by Miriam Brinberg (Pennsylvania State University) and Nilam Ram (Stanford University), looked at three aspects of electronic communication: syntactic alignment (sentence structure, use of the different parts of speech, use of punctuation), semantic alignment (word meaning, including similarity of word choice where there's more than one way of expressing the same concept), and overall alignment (including features such as shortcuts like "omwh" for "on my way home").  They found that at the beginning of a romantic relationship, all three converge fairly quickly, and that the process of becoming more similar continues -- albeit at a slower pace -- thereafter.
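
Brinberg and Ram's measures are considerably more sophisticated than this, but as a toy illustration of what "lexical alignment" might look like computationally -- a minimal sketch with invented messages, not their method -- you could compare the cosine similarity of each partner's word-frequency vectors over successive stretches of texts:

```python
# A toy sketch of lexical alignment -- not Brinberg and Ram's actual measures:
# cosine similarity between two people's word-frequency vectors, computed
# over different stretches of their (invented) text messages.
import math
import re
from collections import Counter

def word_counts(messages):
    """Lowercased word-frequency counts over a list of text messages."""
    counts = Counter()
    for msg in messages:
        counts.update(re.findall(r"[a-z']+", msg.lower()))
    return counts

def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical texts from early vs. later in a relationship.
partner_a_early = ["are you free friday", "see you at seven then"]
partner_b_early = ["perhaps, let me check my schedule"]
partner_a_late  = ["omwh, want me to grab dinner", "thank you bloopie"]
partner_b_late  = ["omwh too, dinner sounds great, thanks bloopie"]

early = cosine_similarity(word_counts(partner_a_early), word_counts(partner_b_early))
late  = cosine_similarity(word_counts(partner_a_late), word_counts(partner_b_late))
print(f"early alignment: {early:.2f}")
print(f"later alignment: {late:.2f}")
# If the texts are converging, the second number should be noticeably higher.
```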

One interesting direction for further research is whether both partners shift their speech equally, or whether one of them moves more than the other.  "There's some research in this area that looks at power dynamics," study co-author Brinberg said, in an interview with The Academic Times.  "For example, in a job interview, the interviewee might make their language more similar to the interviewer to indicate they are more similar to them, or employees may alter their language to match that of their supervisor.  As with those examples, one might wonder if, in romantic relationship formation, there is one person who is changing their language to match the other."

In my own case, it doesn't seem like either of us altered our language more than the other; it's more that we both gradually picked up phrases that then had a shared meaning.  The one exception I can think of is that there's been an unequal trade in words from our respective ethnic backgrounds.  My wife Carol, who is Jewish, has a great many incredibly expressive words and phrases from Yiddish, which explains why I now use words like bupkis and verklempt and schvitz and schmutz.  She has picked up fewer French words from me, although I know she's used words like macacries (Cajun French for "knick-knacks") even though there's a perfectly good Yiddish word for the same concept (tchotchkes).  Other than that, I think most of the French words she's learned from me have to do with cooking, which I suppose makes sense.

But it's a fascinating phenomenon.  Language is much more than flat denotative meaning; there are wide shades and gradations of connotation that can be extremely subtle, one reason why it's so hard to learn a second (or third or fourth) language fluently.  I still remember my Intro to Linguistics professor explaining the difference between denotation and connotation using the example of "Have a nice day" versus "I hope you manage to enjoy your next twenty-four hours."

If there are cultural nuances that would be difficult to explain to a non-native speaker, consider that within those there are additional personal nuances that might be incomprehensible outside of the small number of people in the in-group who "get it," making the interpretation of informal speech a lot more complex than you might have guessed.

So that's our excursion into the subtleties of linguistics for today.  Now, I gotta go get ready for work, and I need to take a shower and wash off the schvitz and schmutz.  Can't show up looking all verklempt.

************************************

This week's Skeptophilia book recommendation is pure fun: Arik Kershenbaum's The Zoologist's Guide to the Galaxy: What Animals on Earth Reveal About Aliens and Ourselves.  Kershenbaum tackles a question that has fascinated me for quite some time: is evolution constrained?  By which I mean, are the patterns you see in most animals on Earth -- aerobic cellular respiration, bilateral symmetry, a central information processing system/brain, sensory organs sensitive to light, sound, and chemicals, and sexual reproduction -- such strong evolutionary drivers that they are likely to be found in alien organisms?

Kershenbaum, who is a zoologist at the University of Cambridge, looks at how our environment (and the changes thereof over geological history) shaped our physiology, and which of those features would likely appear in species on different alien worlds.  In this fantastically entertaining book, he considers what we know about animals on Earth -- including some extremely odd ones -- and uses that to speculate about what we might find when we finally do make contact (or, at the very least, detect signs of life on an exoplanet using our earthbound telescopes).

It's a wonderfully fun read, and if you're fascinated with the idea that we might not be alone in the universe but still think of aliens as the Star Trek-style humans with body paint, rubber noses, and funny accents, this book is for you.  You'll never look at the night sky the same way again.

[Note: if you purchase this book from the image/link below, part of the proceeds goes to support Skeptophilia!]



Saturday, December 21, 2019

The meaning of love

There's no denying that as careful as we try to be, language can be ambiguous.  One of the first things I used to do with my Critical Thinking classes was to get them to think about how terms are defined, and how that can change the meaning of what someone says or writes -- sometimes causing serious misunderstandings.  They'd start with a list of words -- love, evil, truth, beauty, loyalty, jealousy, and so on -- and first try to define them on their own, then for each one come up with a synonym that carries a different emotional weight.  Then they'd compare their answers to their classmates'.

The results are eye-opening.  Not only do the definitions differ wildly, but when they try to come up with synonyms, there's a huge variety of suggestions, many of which don't carry the same connotations at all.  For evil I've had students suggest bad, wrong, hateful, destructive, wicked, immoral, sinister, and despicable -- which themselves carry drastically different meanings.

And that's for just one word.  By the time we're done with the whole exercise, they have a pretty good idea why misunderstandings are so common.

A study that came out yesterday in Science adds a new layer of complication to the situation.  Apparently the connotations of emotionally-laden words differ greatly between languages.  So if you look up how to translate the word love into Latvian, you'll certainly find a corresponding word -- but the associations that a native speaker of Latvian has with the word might differ greatly from yours.

The paper was entitled "Emotion Semantics Show Both Cultural Variation and Universal Structure," and was the work of a team of linguists, neuroscientists, psychologists, and mathematicians led by Joshua Conrad Jackson of the University of North Carolina at Chapel Hill.  They used a statistical model to build networks of associated words for no fewer than 2,474 languages from twenty different language families.

The results were fascinating.  The authors write:
[W]e take a new quantitative approach to estimate variability and structure in emotion semantics.  Our approach examines cases of colexification, instances in which multiple concepts are coexpressed by the same word form within a language.  Colexifications are useful for addressing questions about semantic structure because they often arise when two concepts are perceived as conceptually similar.  Persian, for instance, uses the word-form ænduh to express both the concepts of “grief” and “regret,” whereas the Sirkhi dialect of Dargwa uses the word-form dard to express both the concepts of “grief” and “anxiety.”  Persian speakers may therefore understand “grief” as an emotion more similar to “regret,” whereas Dargwa speakers may understand “grief” as more similar to “anxiety.”
It takes long enough to become fluent in a second language; how much longer would it take to understand all of the subtle connotations of words?  Even if you're using the "right word" -- the one a native speaker would use -- you might still misjudge the context unless you had a deep understanding of the culture.
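
For concreteness, here's what the core bookkeeping behind a colexification network looks like -- a minimal sketch using the two examples from the passage above plus an invented entry, whereas the real study worked from data spanning all 2,474 languages:

```python
# A minimal sketch of building a colexification network: whenever one word
# form in a language expresses two concepts, strengthen the edge between
# those concepts.  The toy lexicon uses the two examples quoted above plus
# an invented entry; the actual study covered 2,474 languages.
from collections import Counter
from itertools import combinations

# language -> word form -> concepts that word form expresses
lexicon = {
    "Persian": {"ænduh": ["grief", "regret"]},
    "Dargwa (Sirkhi)": {"dard": ["grief", "anxiety"]},
    "Hypothetical language": {"foo": ["grief", "anxiety"], "bar": ["love", "pity"]},
}

edges = Counter()
for words in lexicon.values():
    for concepts in words.values():
        for a, b in combinations(sorted(set(concepts)), 2):
            edges[(a, b)] += 1  # one more language colexifies this concept pair

for (a, b), count in edges.most_common():
    print(f"{a} -- {b}: colexified in {count} language(s)")
# Concept pairs that share word forms in many unrelated languages end up
# close together in the published networks; pairs that never do end up far apart.
```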

Here are four examples of their linguistic networks:


The most interesting thing I noticed about these maps was the placement of the word anger.  In Austronesian languages, anger connects most strongly to hate; in Austroasiatic languages, to envy; and in Indo-European languages, to anxiety.  I can only imagine the misunderstandings that could arise if a speaker from one of those families said to a speaker from another something as simple as, "I am angry with you."

Another curious example is the familiar Hawaiian word aloha, which is usually translated into English as love.  The researchers found that to a native speaker of Hawaiian, aloha does mean love, but it is strongly connected with a word that is surprising to English speakers: pity.  The meaning of love, which is supposed to transcend all cultural barriers somehow, is apparently not as uniform across languages as one might expect.

The authors conclude thus:
Questions about the meaning of human emotions are age-old, and debate about the nature of emotion persists in scientific literature...  Analyzing these networks sheds light on the cultural and biological evolutionary mechanisms underlying how emotions are ascribed meaning in languages around the world.  Although debates about the relationship between language and conscious experience are notoriously difficult to resolve, our findings also raise the intriguing possibility that emotion experiences vary systematically across cultural groups...  Analyzing the diverse ways that people use language promises to yield insights into human cognition on an unprecedented scale.
And considering how interlinked our societies are across the globe, anything we can do to foster deeper understanding is worth doing.

*****************************

This week's Skeptophilia book recommendation is pure fun, and a perfect holiday gift for anyone you know who (1) is a science buff, and (2) has a sense of humor.  What If?, by Randall Munroe (creator of the brilliant comic strip xkcd) gives scientifically-sound answers to some very interesting hypothetical questions.  What if everyone aimed a laser pointer simultaneously at the same spot on the Moon?  Could you make a jetpack using a bunch of downward-pointing machine guns?  What would happen if everyone on the Earth jumped simultaneously?

Munroe's answers make for fascinating, and often hilarious, reading.  His scientific acumen, which shines through in xkcd, is on full display here, as is his sharp-edged and absurd sense of humor.  It's great reading for anyone who has sat up at night wondering... "what if?"

[Note:  if you purchase this book using the image/link below, part of the proceeds goes to support Skeptophilia!]





Tuesday, October 23, 2018

Onomatopoeia FTW

Given my ongoing fascination with languages, it's a little surprising that a paper published two years ago in the Proceedings of the National Academy of Sciences escaped my notice until now.  Entitled "Sound–Meaning Association Biases Evidenced Across Thousands of Languages," this study proposes something deeply astonishing: that the connection between the sounds in a word and the meaning of the word may not be arbitrary.

It's a fundamental tenet of linguistics that language is defined as "arbitrary symbolic communication."  Arbitrary because there is no special connection between the sound of a word and its meaning, with the exception of the handful of words that are onomatopoeic (such as boom, buzz, splash, and splat).  Otherwise, the phonemes that make up the word for a concept would be expected to have nothing to do with the concept itself, and therefore would vary randomly from language to language (the word bird is no more fundamentally birdy than the French word oiseau is fundamentally oiseauesque).

That idea may have to be revised.  Damián E. Blasi (of the University of Zurich), Søren Wichmann (of the University of Leiden), Harald Hammarström and Peter F. Stadler (of the Max Planck Institute), and Morten H. Christiansen (of Cornell University) did an exhaustive statistical study of dozens of basic vocabulary words across a sample covering 62% of the world's six thousand languages and 85% of its linguistic lineages and language families.  What they found is that striking patterns emerge in which phonemes show up in the words for particular concepts -- patterns that hold even across completely unrelated languages.  Here are a few of the correspondences they found:
  • The word for ‘nose’ is likely to include the sounds ‘neh’ or the ‘oo’ sound, as in ‘ooze.’
  • The word for ‘tongue’ is likely to have ‘l’ or ‘u.’
  • ‘Leaf’ is likely to include the sounds ‘b,’ ‘p’ or ‘l.’
  • ‘Sand’ will probably use the sound ‘s.’
  • The words for ‘red’ and ‘round’ often appear with ‘r.’ 
  • The word for ‘small’ often contains the sound ‘i.’
  • The word for ‘I’ is unlikely to include sounds involving u, p, b, t, s, r and l.
  • ‘You’ is unlikely to include sounds involving u, o, p, t, d, q, s, r and l.
"These sound symbolic patterns show up again and again across the world, independent of the geographical dispersal of humans and independent of language lineage," said Morten Christiansen, who led the study.  "There does seem to be something about the human condition that leads to these patterns.  We don’t know what it is, but we know it’s there."

[Image licensed under the Creative Commons M. Adiputra, Globe of language, CC BY-SA 3.0]

One possibility is that these correspondences are actually not arbitrary at all, but are leftovers from (extremely) ancient history -- fossils of the earliest spoken language, which all of today's languages, however distantly related, descend from.  The authors write:
From a historical perspective, it has been suggested that sound–meaning associations might be evolutionarily preserved features of spoken language, potentially hindering regular sound change.  Furthermore, it has been claimed that widespread sound–meaning associations might be vestiges of one or more large-scale prehistoric protolanguages.  Tellingly, some of the signals found here feature prominently in reconstructed “global etymologies” that have been used for deep phylogeny inference.  If signals are inherited from an ancestral language spoken in remote prehistory, we might expect them to be distributed similarly to inherited, cognate words; that is, their distribution should to a large extent be congruent with the nodes defining their linguistic phylogeny.
But this point remains to be tested.  And there's an argument against it: if these similarities come from common ancestry, you'd expect not only the sounds, but also their positions in words, to have been conserved (such as in the English/German cognate pair laugh and lachen).  In fact, that is not the case.  The sounds are similar, but their positions within the words show no discernible pattern.  The authors write:
We have demonstrated that a substantial proportion of words in the basic vocabulary are biased to carry or to avoid specific sound segments, both across continents and linguistic lineages.  Given that our analyses suggest that phylogenetic persistence or areal dispersal are unlikely to explain the widespread presence of these signals, we are left with the alternative that the signals are due to factors common to our species, such as sound symbolism, iconicity, communicative pressures, or synesthesia...  [A]lthough it is possible that the presence of signals in some families are symptomatic of a particularly pervasive cognate set, this is not the usual case.  Hence, the explanation for the observed prevalence of sound–meaning associations across the world has to be found elsewhere.
Which I think is both astonishing and fascinating.  What possible reason could there be that the English word tree is composed of the three phonemes it contains?  The arbitrariness of the sound/meaning relationship seemed so obvious to me when I first learned about it that I didn't even stop to question how we know it's true.

Generally a dangerous position for a skeptic to be in.

I hope that the research on this topic is moving forward, because it certainly would be cool to find out what's actually going on here.  I'll have to keep an eye out for any follow-ups.  But now I'm going to go get a cup of coffee, which I think we can all agree is a nice, warm, comforting-sounding word.

***********************************

The Skeptophilia book recommendation of the week is a must-read for anyone interested in languages -- The Last Speakers, by linguist K. David Harrison.  Harrison set himself the task of visiting places where endangered languages are spoken, such as small communities in Siberia, the Outback of Australia, and Central America (where he met a pair of elderly gentlemen who are the last two speakers of an indigenous language -- but they have hated each other for years, and neither will say a word to the other).

It's a fascinating, and often elegiac, tribute to the world's linguistic diversity, and tells us a lot about how our mental representation of the world is connected to the language we speak.  Brilliant reading from start to finish.