Skeptophilia (skep-to-fil-i-a) (n.) - the love of logical thought, skepticism, and thinking critically. Being an exploration of the applications of skeptical thinking to the world at large, with periodic excursions into linguistics, music, politics, cryptozoology, and why people keep seeing the face of Jesus on grilled cheese sandwiches.
Showing posts with label linguistics. Show all posts

Saturday, August 30, 2025

The universal language

Sometimes I have thoughts that blindside me.

The last time that happened was a couple of days ago, while I was working in my office and our puppy, Jethro, was snoozing on the floor.  Well, as sometimes happens to dogs, he started barking and twitching in his sleep, and followed it up with sinister-sounding growls -- all the more amusing because while awake, Jethro is about as threatening as your average plush toy.

So my thought, naturally, was to wonder what he was dreaming about.  Which got me thinking about my own dreams, and recalling some recent ones.  I remembered some images, but mostly what came to mind were narratives -- first I did this, then the slimy tentacled monster did that.

That's when the blindside happened.  Because Jethro, clearly dreaming, was doing all that without language.

How would thinking occur without language?  For almost all humans, our thought processes are intimately tied to words.  In fact, the experience of having a thought that isn't describable using words is so unusual that we have a word for it -- ineffable.

Mostly, though, our lives are completely, um, effable.  So much so that trying to imagine how a dog (or any other animal) experiences the world without language is, for me at least, nearly impossible.

What's interesting is how powerful this drive toward language is.  There have been studies of pairs of "feral children" who grew up together but with virtually no interaction with adults, and in several cases those children invented spoken languages with which to communicate -- each complete with its own syntax, morphology, and phonetic structure.

A fascinating study that came out in the Proceedings of the National Academy of Sciences, detailing research by Manuel Bohn, Gregor Kachel, and Michael Tomasello of the Max Planck Institute for Evolutionary Anthropology, showed that you don't even need the extreme conditions of feral children to induce the invention of a new mode of symbolic communication.  The researchers set up Skype conversations between monolingual English-speaking children in the United States and monolingual German-speaking children in Germany, but simulated a computer malfunction where the sound didn't work.  They then instructed the children to communicate as best they could anyhow, and gave them some words/concepts to try to get across.

They started out with some easy ones.  "Eating" resulted in the child miming eating from a plate, unsurprisingly.  But they moved to harder ones -- like "white."  How do you communicate the absence of color?  One girl came up with an idea -- she was wearing a polka-dotted t-shirt, and pointed to a white dot, and got the idea across.

But here's the interesting part.  When the other child later in the game had to get the concept of "white" across to his partner, he didn't have access to anything white to point to.  He simply pointed to the same spot on his shirt that the girl had pointed to earlier -- and she got it immediately.

Language is defined as arbitrary symbolic communication.  Arbitrary because with the exception of a few cases like onomatopoeic words (bang, pow, ping, etc.) there is no logical connection between the sound of a word and its referent.  Well, here we have a beautiful case of the origin of an arbitrary symbol -- in this case, a gesture -- that gained meaning only because the recipient of the gesture understood the context.

I'd like to know if such a gesture-language could gain another characteristic of true language -- transmissibility.  "It would be very interesting to see how the newly invented communication systems change over time, for example when they are passed on to new 'generations' of users," said study lead author Manuel Bohn, in an interview with Science Daily.  "There is evidence that language becomes more systematic when passed on."

In time, might you end up with a language that was so heavily symbolic and culturally dependent that understanding it would be impossible for someone who didn't know the cultural context -- like the Tamarians' language in the brilliant, poignant, and justifiably famous Star Trek: The Next Generation episode "Darmok"?

"Sokath, his eyes uncovered!"

It's through cultural context, after all, that languages start developing some of the peculiarities (also seemingly arbitrary) that led Edward Sapir and Benjamin Whorf to develop the hypothesis that now bears their names -- that the language we speak alters our brains and changes how we understand abstract concepts.  In K. David Harrison's brilliant book The Last Speakers, he tells us about a conversation with some members of a nomadic tribe in Siberia who always described positions of objects relative to the four cardinal directions -- so at the moment my coffee cup wouldn't be on my right, it would be south of me.  When Harrison tried to explain to his Siberian friends how we describe positions, at first he was greeted with outright bafflement.

Then, they all erupted in laughter.  How arrogant, they told him, that you see everything as relative to your own body position -- as if when you turn around, suddenly the entire universe changes shape to compensate for your movement!



Another interesting example of this was the subject of a 2017 study by linguists Emanuel Bylund and Panos Athanasopoulos, and focused not on our experience of space but of time.  And they found something downright fascinating.  Some languages (like English) are "future-in-front," meaning we think of the future as lying ahead of us and the past behind us, turning time into something very much like a spatial dimension.  Other languages retain the spatial aspect, but reverse the direction -- such as the Peruvian language of Aymara.  For them, the past is in front, because you can remember it, just as you can see what's in front of you.  The future is behind you -- therefore invisible.

Mandarin takes the spatial axis and turns it on its head -- the future is down, the past is up (so the literal translation of the Mandarin expression of "next week" is "down week").  Asked to order photographs of someone in childhood, adolescence, adulthood, and old age, Mandarin speakers will place them vertically, with the youngest on top.  English and Swedish speakers tend to think of time as a line running from left (past) to right (future); Spanish and Greek speakers tend to picture time as a spatial volume, as if it were something filling a container (so emptier = past, fuller = future).

All of which underlines how fundamental to our thinking language is.  And further baffles me when I try to imagine how other animals think.  Because whatever Jethro was imagining in his dream, he was clearly understanding and interacting with it -- even if he didn't know to attach the word "squirrel" to the concept.

****************************************


Monday, August 25, 2025

Tall tales and folk etymologies

My master's degree is in historical linguistics, and one of the first things I learned was that it's tricky to tell if two words are related.

Languages are full of false cognates, pairs of words that look alike but have different etymologies -- in other words, their similarities are coincidental.  Take the words police and (insurance) policy.  Look like they should be related, right?

Nope.  Police comes from the Latin politia (meaning "civil administration"), which in turn comes from polis, "city."  (So it's a cognate to the last part of words like metropolis and cosmopolitan.)  Policy -- as it is used in the insurance business -- comes from the Old Italian poliza (a bill or receipt) and back through the Latin apodissa to the Greek ἀπόδειξις (meaning "a written proof or declaration").  To make matters worse, the other definition of policy -- a practice of governance -- comes from politia, so it's related to police but not to the insurance meaning of policy.

Speaking of government -- and another example of how you can't trust what words look like -- you might never guess that the word government and the word cybernetics are cousins.  Both of them come from the Greek κυβερνητικός -- a mechanism used to steer a ship.

My own research was about the extent of borrowing between Old Norse, Old English, and Old Gaelic, as a consequence of the Viking invasions of the British Isles that started in the eighth century C.E.  The trickiest part was that Old Norse and Old English are themselves related languages; both of them belong to the Germanic branch of the Indo-European language family.  So there are some legitimate cognates there, words that did descend in parallel in both languages.  (A simple example is the English day and Norwegian dag.)  So how do you tell if a word in English is there because it descended peacefully from its Proto-Germanic roots, or was borrowed from Old Norse-speaking invaders rather late in the game?

It isn't simple.  One group I'm fairly sure are Old Norse imports are most of our words that have a hard /g/ sound followed by an /i/ or an /e/, because some time around 700 C.E. the native Old English /gi/ and /ge/ words were palatalized to /yi/ and /ye/.  (Two examples are yield and yellow, which come from the Anglo-Saxon gieldan and geolu respectively.)  So if we have surviving words with a /gi/ or /ge/ -- gift, get, gill, gig -- they must have come into the language after 700, as they escaped getting palatalized to *yift, *yet, *yill, and *yig.  Those words -- and over a hundred more I was able to identify, using similar sorts of arguments -- came directly from Old Norse.
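The logic of that argument can be sketched as a toy sound-change rule.  This is a minimal illustration in Python, not the method used in the research described above; the function names are made up for the example, and it assumes (wrongly, in general) that spelling is a reliable stand-in for pronunciation.  It would misfire on later French borrowings like gem or giant, where the g was never hard to begin with.

```python
import re

# Toy rule: word-initial g before i or e becomes y.
# Spelling stands in for sound here, which is only an approximation.
PALATALIZE = re.compile(r"^g(?=[ie])")

def palatalize(word):
    """Apply the toy g -> y rule to one word (hypothetical helper)."""
    return PALATALIZE.sub("y", word)

def escaped_palatalization(word):
    """True if the word still shows the g+i/e the rule would have removed."""
    return palatalize(word) != word

# Native Old English forms run through the rule
for old_form in ["gieldan", "geolu"]:
    print(old_form, "->", palatalize(old_form))

# Surviving hard-g words, and the forms they would have had
# if they had been in the language when the rule applied
for word in ["gift", "get", "gill", "gig"]:
    print(word, "-> *" + palatalize(word), escaped_palatalization(word))
```

Any word that comes out changed is one the rule should have caught, so its survival with a hard /g/ is a hint (only a hint) that it arrived after the change had run its course.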

[Image licensed under the Creative Commons M. Adiputra, Globe of language, CC BY-SA 3.0]

Anyhow, the whole topic comes up because I've been seeing this thing going around on social media headed, "Did You Know...?" with a list of a bunch of words, and the curious and funny origins they supposedly have.

And almost all of them are wrong.

I've refrained from saying anything to the people who posted it, because I don't want to be the "Well, actually..." guy.  But it rankled enough that I felt impelled to write a post about it, so this is kind of a broadside "Well, actually...", which I'm not sure is any nicer.  But in any case, here are a few of the more egregious "folk etymologies," as these fables are called -- just to set the record straight.

  • History doesn't come from "his story," i.e., a deliberate way to tell men's stories and exclude women's.  The word's origins have nothing to do with men at all.  It comes from the Greek ἱστορία, "inquiry."
  • Snob is not a contraction of the Latin sine nobilitate ("without nobility").  It's only attested back to the 1780s and is of unknown origin.
  • Marmalade doesn't have its origin with Mary Queen of Scots, who supposedly asked for it when she had a headache, leading her French servants to say "Marie est malade."  The word is much older than that, and goes back to the Portuguese marmelada, meaning "quince jelly," and ultimately to the Greek μελίμηλον, "apples preserved in honey."
  • Nasty doesn't come from the biting and vitriolic nineteenth-century political cartoonist Thomas Nast.  In fact, it predates Nast by several centuries (witness Hobbes's comment about medieval life being "poor, nasty, brutish, and short," which was written in 1651).  Nasty probably comes from the Dutch nestig, meaning "dirty."
  • Pumpernickel doesn't have anything to do with Napoleon and his alleged horse Nicole who supposedly liked brown bread, leading Napoleon to say that it was "Pain pour Nicole."  Its actual etymology is just as weird, though; it comes from the medieval German words pumpern and nickel and translates, more or less, to "devil's farts."
  • Crap has very little to do with Thomas Crapper, who perfected the design of the flush toilet, although it certainly sounds like it should (and his name and accomplishment probably repopularized the word's use).  Crapper's unfortunate surname comes from cropper, a Middle English word for "farmer."  As for crap, it seems to come from Medieval Latin crappa, "chaff," but its origins before that are uncertain.
  • Last, but certainly not least, fuck is not an acronym.  For anything.  It's not from "For Unlawful Carnal Knowledge," whatever Van Halen would have you believe, and those words were not hung around adulterers' necks as they sat in the stocks.  It also doesn't stand for "Fornication Under Consent of the King," which comes from the story that in bygone years, when a couple got married, if the king liked the bride's appearance, he could claim the right of "prima nocta" (also called "droit de seigneur"), wherein he got to spend the first night of the marriage with the bride.  (Apparently this did happen, but rarely, as it was a good way for the king to seriously piss off his subjects.)  But the claim is that afterward -- and now we're in the realm of folk etymology -- the king gave his official permission for the bride and groom to go off and amuse themselves as they wished, at which point he stamped the couple's marriage documents "Fornication Under Consent of the King," meaning it was now legal for the couple to have sex with each other.  The truth is, this is pure fiction. The word fuck comes from a reconstructed Proto-Germanic root *fug, meaning "to strike."  There are cognates (same meaning, different spelling) in just about every Germanic language there is.  In English, the word is one of the most amazing examples of lexical diversification I can think of; there's still the original sexual definition, but consider -- just to name a few -- "fuck that," "fuck around," "fuck's sake," "fuck up," "fuck-all," "what the fuck?", and "fuck off."  Versatile fucking word, that one.

So anyway.  Hope that sets the record straight.  I hate coming off like a know-it-all, but in this case I actually do know what I'm talking about.  A general rule of thumb (which has nothing to do with the diameter of the stick you're allowed to beat your wife with) is, "don't fuck with a linguist."  No acronym needed to make that clear.

****************************************


Saturday, July 19, 2025

Footprints

The southern tip of mainland Italy is called Calabria.  It's a strikingly beautiful place, containing three national parks (Pollino National Park, Sila National Park, and Aspromonte National Park), and a stretch of coastline -- near Reggio, facing across the Straits of Messina to Sicily -- that poet Gabriele D'Annunzio called "the most beautiful kilometer in Italy."  It's a region blessed with more than its share of dramatic scenery.

[Image licensed under the Creative Commons Cliff at Tropea, Italy, Sep 2005, CC BY-SA 2.5]

Calabria forms the "toe of Italy's boot."  I remember noticing the country's odd shape when I was a kid and first became fascinated with maps (a fascination that remains with me today), and wondering why it looked like that; back then, when plate tectonics was still a new science, I doubt they really understood it on a level any deeper than "it's near a plate margin, and that moves stuff around."  Today, we have a much more detailed understanding of the geology of the area, and it is complex.

Tectonic map of southern Italy and Sicily [Image licensed under the Creative Commons Jpvandijk, J.P. van Dijk, Janpieter van Dijk, Johannes Petrus van Dijk, CentralMediterranean-GeotectonicMap, CC BY-SA 4.0]

On its simplest level, the entire southern half of Italy is being pushed to the southeast, and it's riding up and over the northern edge of the African Plate.  This process is responsible not only for the volcanism of the region -- Mount Etna being the most obvious example -- but the massive earthquakes that have shaped it, in part creating the gorgeous topography.  (It also has made it a dangerous place to live.  The Messina Earthquake of 1908, with an epicenter right across the straits from Calabria, had a magnitude of 7.1 and killed an estimated eighty thousand people, most of them in the first three minutes after the quake struck and the majority of the buildings collapsed.)

As interesting as the geology of the region is, that's not what spurred me to write about the topic today.  What I'd like to tell you about is Calabria's tremendous linguistic diversity, an embarrassment of riches packed into a small geographical area.  The main language, of course, is standard Italian, but a great many people there (especially in the southern parts) speak Calabrian, a Greek-influenced derivative of Latin that is mostly mutually intelligible with Italian but has some distinct vocabulary and pronunciations.

Then there's Grecanico, which is derived from an archaic dialect of Byzantine Greek, and is spoken by a group of people descended from folks who settled in the region more than a thousand years ago and have somehow maintained their ethnic identity the whole time.  It's written with the Latin, not Greek, alphabet -- but other than that has more in common with Thessalian Greek than with Italian.

Another language that has little to do with Italian is Arbëresh, a dialect of Albanian brought in with migrants during the Late Middle Ages.  From some of its idiosyncrasies, it appears to be related to Tosk Albanian, a group of dialects spoken in the southern parts of Albania, near the border of Greece.  It's astonishing that we can still identify the part of the world the ancestors of the Arbëreshë people came from centuries ago -- by the peculiarities of the language they have spoken during the more than six hundred years they've lived in isolated communities in Calabria.

Finally, there's Gardiol, which is related to Occitan (also known as Provençal or Languedoc), the Romance language widely spoken in the southern half of France.  Like with Calabrian (and also Catalan in Spain), most Occitan speakers in France speak the majority language as well, but use Occitan when speaking with family, friends, and locals.  The ancestors of the speakers of Gardiol came in with the persecution of the Waldensian "heretics" in France in the thirteenth century, who found a refuge in a thinly-populated part of northern Calabria.  Once again -- amazingly -- they've retained their ethnic identity and language through all the vagaries of time since their arrival.

All of that -- and standard Italian as well -- in an area of around fifteen thousand square kilometers, a little more than the size of the state of Connecticut.

UNESCO describes all four of these languages -- Calabrian, Grecanico, Arbëresh, and Gardiol -- as "in serious danger of disappearing."  It's sad to think of these footprints of history vanishing, and taking along with them pieces of human culture that somehow had persisted for centuries.  I understand why this happens; in modern life, speaking and writing the dominant language is not only useful, it's often essential for getting a job and making a living.  These little pockets of other languages survived better when people had little mobility and even less connectedness to others living far away.  In today's world, they seem doomed.

Change is the fate of all things, but it inevitably comes with a sense of loss.  The linguistic diversity of the beautiful region of Calabria will, very likely, soon be gone.  Like biodiversity loss, this diminishes the richness of our world.  I hope that linguists are working to catalog and study these unique languages -- before the last native speakers are gone forever.

****************************************


Tuesday, July 8, 2025

Linguistic Calvinball

I've written here before about the monumental difficulty of translating written text when you (1) don't know what the character-to-sound correspondence is (including whether the script is alphabetic, syllabic, or ideographic), (2) don't know what language the script represents, and (3) don't know whether it's read left-to-right, right-to-left, or alternating every other line (boustrophedonic script).  This was what Arthur Evans, Alice Kober, and Michael Ventris were up against with the Linear B script of Crete.  That they succeeded is a testimony not only to their skill as linguists and to their sheer dogged persistence, but to the fact that they had absolutely astonishing pattern-recognition ability.  Despite my MA in linguistics and decent background in a handful of languages, I can't imagine taking on such a task, much less succeeding at it.

The problem becomes even thornier when you consider that what appears to be a script might be asemic -- something that looks like a real written language but is actually meaningless.  (Just a couple of months ago, I wrote here about an asemic text called A Book From the Sky that the creator himself said was nonsense, but that hasn't stopped people from trying to translate it anyhow.)

Which brings us to the Rohonc Codex.

The first certain mention of the Rohonc Codex is in the nineteenth century, although a 1743 catalog of the Rohonc (now the city of Rechnitz, Austria) Library might refer to it -- it says, "Magyar imádságok, volumen I in 12" ("Hungarian prayers in one volume, size duodecimo"). 

As you'll see, that the text represents prayers, or is even in Hungarian, very much remains to be seen.  The size matches; duodecimo means "twelve sheets, approximately 127 millimeters by 187 millimeters in size," and given that some of the earliest guesses about the book's contents were that it was a prayerbook in archaic Hungarian, it's possible that the catalog entry refers to the Codex.  The paper it's printed on appears to be sixteenth-century Venetian in origin, but of course this doesn't mean that's when the book was written -- only that it's unlikely to be any older than that.

One page of the Rohonc Codex [Image is in the Public Domain]

The drawings are rather crude, and the lettering doesn't resemble any known script, although various linguists have compared it to Hungarian runes, Dacian, a dialect of early Romanian, and some variant of Hindi.  Others think it's simply a forgery -- asemic, in other words -- with a sizable number attributing it to the antiquarian Sámuel Nemes, who was known to have forged other documents.

There's no sure connection between Nemes and the Rohonc Codex, however.  He's not known ever to have handled the document, and certainly never mentioned it.  So this seems as tentative as all the other explanations.

Attempts to use the statistical distribution of clusters of symbols, invoking such patterns as Zipf's Law -- the tendency across languages for the word rank to be inversely proportional to word frequency -- have also failed.
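For the curious, here's roughly what a Zipf's Law check looks like.  This is a minimal sketch in Python, not the procedure any of the Codex researchers actually used; the function names are invented for the example, and it assumes the text has already been split into word-like units, which for an undeciphered script is most of the problem.  If the law holds, the frequency of a unit is roughly proportional to one over its rank, so rank times count should stay in the same ballpark as you move down the list.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, token, count) tuples, most frequent token first."""
    # Assumption: lowercase runs of letters and apostrophes count as "words".
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens).most_common()
    return [(rank, tok, n) for rank, (tok, n) in enumerate(counts, start=1)]

def zipf_products(text):
    """rank * count for each token; roughly constant under Zipf's Law."""
    return [rank * n for rank, _, n in rank_frequency(text)]

sample = "the the the the cat cat dog"
print(rank_frequency(sample))  # [(1, 'the', 4), (2, 'cat', 2), (3, 'dog', 1)]
print(zipf_products(sample))   # [4, 4, 3]
```

A seven-word sample proves nothing, of course; the pattern only means something over thousands of tokens, and even then a Zipf-like curve doesn't by itself show that a text is meaningful.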

Like with A Book From the Sky, this hasn't stopped hopeful scholars from claiming success.  Some of them have been eye-rollingly bad, like the solution proposed in 1996 by one Attila Nyíri of Hungary.  Nyíri combined some Sumerian symbols with chance resemblances to the Latin alphabet, and used such expedients as rearranging letters and letting the same symbol correspond to more than one sound, and still came up with gibberish like, Eljött az Istened. Száll az Úr.  Ó.  Vannak a szent angyalok.  Azok.  Ó.  ("Your God has come.  The Lord flies.  Oh.  There are the holy angels.  Them.  Oh.")

I'm perhaps to be excused for being reminded of the Dick and Jane readers.  "Oh, Jane, see Spot.  See Spot run.  Oh, Spot, don't roll in that dead squirrel.  Oh."

Another attempt, this one only marginally more plausible, was made by Romanian linguist Viorica Enăchiuc, who hypothesized that the document (1) is read right-to-left and bottom-to-top, and (2) was written in a Dacian dialect of Latin.  This one came up with lines like Solrgco zicjra naprzi olto co sesvil cas ("O Sun of the live let write what span the time"), which still isn't exactly what I'd call lucid writing.

Then there's the Indian linguist Mahesh Kumar Singh, who said the Codex is written left-to-right and top-to-bottom in Hindi, using an obscure variant of the ancient Brahmi script.  Singh translated one passage as, He bhagwan log bahoot garib yahan bimar aur bhookhe hai / inko itni sakti aur himmat do taki ye apne karmo ko pura kar sake ("Oh, my God!  Here the people is very poor, ill and starving, therefore give them sufficient potency and power that they may satisfy their needs.")  His "translation," though, was immediately excoriated by other linguists, who said that he was playing fast-and-loose with the script interpretation, and had come up with symbol-to-sound correspondences that were convenient to how he wanted the translation to come out, not what was supported in other texts.

So the whole enterprise has turned into the linguistic version of Calvinball (from Bill Watterson's brilliant Calvin and Hobbes).  If you make up the rules as you go, and never play by the same rules twice, anything can happen.

The upshot of it all is that the Rohonc Codex is still undeciphered, if there's even anything there to decipher.  Like the more famous Voynich Manuscript, it retains its aura of attractive mystery, because most of us can't resist a puzzle, even if a lot of the best linguists think the script is nonsense.  Because how do you prove decisively that something isn't sensible language?

After all, there are still people who think that Donald Trump's speeches make sense, even when he says shit like, "I saw engines about three, four years ago.  These things were coming—cylinders, no wings, no nothing—and they’re coming down very slowly, landing on a raft in the middle of the ocean someplace, with a circle, boom!  Reminded me of the Biden circles that he used to have, right?  He’d have eight circles, and he couldn’t fill ’em up.  But then I heard he beat us with the popular vote.  He couldn’t fill up the eight circles.  I always loved those circles, they were so beautiful, so beautiful to look at."

So maybe "Oh.  There are the holy angels.  Them.  Oh," isn't so bad.

In any case, I'm sure there'll be further attempts to solve it.  Which falls into the "no harm if it amuses you" department.  And who knows?  Maybe there's a team made up of this century's Evans/Kober/Ventris triumvirate who will actually succeed.

All I know is that attempting it is way above my pay grade.

****************************************


Monday, July 7, 2025

Dord, fnord, and nimrod

We were having dinner with our younger son a while back, and he asked if there was a common origin for the -naut in astronaut and the naut- in nautical.

"Yes," I said.  "Latin nauta, meaning 'sailor.'  Astronaut literally means 'star sailor.'  Also cosmonaut, but that one came from Latin to English via Russian."

"How about juggernaut?" he asked.

"Nope," I said.  "That's a false cognate.  Juggernaut comes from Hindi, from the name of a god, Jagannath.  Every year on the festival day for Jagannath, they'd bring out his huge stone statue on a wheeled cart, and the (probably apocryphal) story is that sometimes it would get away from them, and roll down the hill and crush people.  So it became a name for a destructive force that gets out of hand."

Nathan stared at me for a moment.  "How the hell do you know this stuff?" he asked.

"Two reasons.  First, M.A. in historical linguistics.  Second, it takes up lots of the brain space that otherwise would be used for less important stuff, like where I put my car keys and remembering to pay the utility bill."

I've been fascinated with words ever since I was little, which probably explains not only my degree but the fact that I'm a writer.  And it's always been intriguing to me how words not only shift in spelling and pronunciation, but shift in meaning, and can even pop into and out of existence in strange and unpredictable ways.  Take, for example, the word dord, that for eight years was in the Merriam-Webster New International Dictionary as a synonym for "density."  In 1931, Austin Patterson, the chemistry editor for Merriam-Webster, sent in a handwritten editing slip for the entry for the word density, saying, "D or d, cont./density."  He meant, of course, that in equations, the variable for density could either be a capital or a lower case letter d.  Unfortunately, the typesetter misread it -- possibly because Patterson's writing left too little space between words -- and thought that he was proposing dord as a synonym.

Well, the chemistry editor should know, right?  So into the dictionary it went.

It wasn't until 1939 that editors realized they couldn't find an etymology for dord, figured out how the mistake had come about, and the word was removed.  By then, though, it had found its way into other books.  It's thought that the error wasn't completely expunged until 1947 or so.

Then there's fnord, which is a word coined in 1965 by Kerry Thornley and Greg Hill as part of the sort-of-parody, sort-of-not Discordian religion's founding text Principia Discordia.  It refers to a stimulus -- usually a word or a picture -- that people are trained as children not to notice consciously, but that when perceived subliminally causes feelings of unease.  Government-sponsored mind-control, in other words.  It really took off when it was used in the 1975 Illuminatus! Trilogy, by Robert Shea and Robert Anton Wilson, which became popular with the counterculture of the time (for obvious reasons).

Fnord isn't the only word that came into being because of a work of fiction.  There's grok, meaning "to understand on a deep or visceral level," from Robert Heinlein's novel Stranger in a Strange Land.  A lot of you probably know that the quark, the fundamental particle that makes up protons and neutrons, was named by physicist Murray Gell-Mann after the odd line from James Joyce's Finnegans Wake, "Three quarks for Muster Mark."  Less well known is that the familiar word robot is also a neologism from fiction, from Czech writer Karel Čapek's play R.U.R. (Rossum's Universal Robots); robota in Czech means "hard labor, drudgery," so by extension, the word took on the meaning of the mechanical servant who performed such tasks.  Our current definition -- a sophisticated mechanical device capable of highly technical work -- has come a long way from the original, which was closer to slave.

Sometimes words can, more or less accidentally, migrate even farther from their original meaning than that.  Consider nimrod.  It was originally a name, referenced in Genesis 10:8-9 -- "Then Cush begat Nimrod; he began to be a mighty one in the Earth.  He was a mighty hunter before the Lord."  Well, back in 1940, the episode of Looney Tunes called "A Wild Hare" was released, the first of many surrounding the perpetual chase between hunter Elmer Fudd and the Wascally Wabbit.  In the episode, Bugs calls Elmer "a poor little Nimrod" -- poking fun at his being a hunter, and a completely inept one at that -- but the problem was that very few kids in 1940 (and probably even fewer today) understood the reference and connected it to the biblical character.  Instead, they thought it was just a humorous word meaning "buffoon."  The wild (and completely deserved) popularity of Bugs Bunny led to the original allusion to "a mighty hunter" being swamped; ask just about anyone today what nimrod means and they're likely to say something like "an idiot."


Interestingly, another of Bugs's attempted coinages meaning "a fool" -- maroon, from the hilarious 1953 episode "Bully for Bugs" -- never caught on in the same way.  When he says about the bull, "What a maroon!", just about everyone got the joke, probably because both the word he meant (moron) and the conventional definition of the word he said (a purplish-red color) are familiar enough that we realized he was mispronouncing a word, not coining a new one.


It's still funny enough, though, that I've heard people say "What a maroon!" when referring to someone who's dumb -- but as a quote from a fictional character, not because they think it's the correct word.

Languages shift and flow constantly.  Fortunately for me, since language evolution is my area of study.  It's why the whole prescriptivism vs. descriptivism battle is honestly pretty comical -- the argument over whether, respectively, linguists are prescribing the way languages should be used (forever and ever amen), or simply describing how they are used.  Despite the best efforts of the prescriptivists, languages change all the time, sometimes in entirely sudden and unpredictable ways.  Slang words are the most obvious examples -- when I was a teacher, I was amazed at how slang came and went, how some words would be en vogue one month and passé the next, while others had real staying power.  (And sometimes resurface.  I still remember being startled the first time I heard a student unironically saying "groovy.")

But that's part of the fun of it.  That our own modes of communication change over time, often in response to cultural phenomena like books, television, and movies, is itself an interesting feature of our ongoing attempt to be understood. 

And I'm sure Bugs would be proud of how he's influenced the English language, even if it was inadvertent.

****************************************


Saturday, June 21, 2025

The labyrinths of meaning

A recent study found that regardless of how thoroughly AI-powered chatbots are trained with real, sensible text, they still have a hard time recognizing passages that are nonsense.

Given pairs of sentences, one of which makes semantic sense and the other of which clearly doesn't -- in the latter category, "Someone versed in circumference of high school I rambled" was one example -- a significant fraction of large language models struggled with telling the difference.

In case you needed another reason to be suspicious of what AI chatbots say to you.

As a linguist, though, I can confirm how hard it is to detect and analyze semantic or syntactic weirdness.  Noam Chomsky's famous example "Colorless green ideas sleep furiously" is syntactically well-formed, but has multiple problems with semantics -- something can't be both colorless and green, ideas don't sleep, you can't "sleep furiously," and so on.  How about the sentence, "My brother opened the window the maid the janitor Uncle Bill had hired had married had closed"?  This one is both syntactically well-formed and semantically meaningful, but there's definitely something... off about it.

The problem here is called "center embedding," which is when there are nested clauses, and the result is not so much wrong as it is confusing and difficult to parse.  It's the kind of thing I look for when I'm editing someone's manuscript -- one of those, "Well, I knew what I meant at the time" kind of moments.  (That this one actually does make sense can be demonstrated by breaking it up into two sentences -- "My brother opened the window the maid had closed.  She was the one who had married the janitor Uncle Bill had hired.")

Then there are "garden-path sentences" -- named for the expression "to lead (someone) down the garden path," to trick them or mislead them -- when you think you know where the sentence is going, then it takes a hard left turn, often based on a semantic ambiguity in one or more words.  Usually the shift leaves you with something that does make sense, but only if you re-evaluate where you thought the sentence was headed to start with.  There's the famous example, "Time flies like an arrow; fruit flies like a banana."  But I like even better "The old man the boat," because it only has five words, and still makes you pull up sharp.

The water gets even deeper than that, though.  Consider the strange sentence, "More people have been to Berlin than I have."

This sort of thing is called a comparative illusion, but I like the nickname "Escher sentences" better because it captures the sense of the problem.  You've seen the famous work by M. C. Escher, "Ascending and Descending," yes?


The issue both with Escher's staircase and the statement about Berlin is that if you look at smaller pieces of it, everything looks fine; the problem only comes about when you put the whole thing together.  And like Escher's trudging monks, it's hard to pinpoint exactly where the problem occurs.

I remember a student of mine indignantly telling a classmate, "I'm way smarter than you're not."  And it's easy to laugh, but even the ordinarily brilliant and articulate Dan Rather slipped into this trap when he tweeted in 2020, "I think there are more candidates on stage who speak Spanish more fluently than our president speaks English."

It seems to make sense, and then suddenly you go, "... wait, what?"

An additional problem is that words frequently have multiple meanings and nuances -- which is the basis of wordplay, but would be really difficult to program into a large language model.  Take, for example, the anecdote about the redoubtable Dorothy Parker, who was cornered at a party by an insufferable bore.  "To sum up," the man said archly at the end of a long diatribe, "I simply can't bear fools."

"Odd," Parker shot back.  "Your mother obviously could."

A great many of Parker's best quips rely on a combination of semantic ambiguity and idiom.  Her review of a stage actress that "she runs the gamut of emotions from A to B" is one example, but to me, the best is her stinging jab at a writer -- "His work is both good and original.  But the parts that are good are not original, and the parts that are original are not good."

Then there's the riposte from John Wilkes, a famously witty British Member of Parliament in the last half of the eighteenth century.  Another MP, John Montagu, 4th Earl of Sandwich, was infuriated by something Wilkes had said, and sputtered out, "I predict you will die either on the gallows or else of some loathsome disease!"  And Wilkes calmly responded, "Which it will be, my dear sir, depends entirely on whether I embrace your principles or your mistress."

All of this adds up to the fact that languages contain labyrinths of meaning and structure, and we have a long way to go before AI will master them.  (Given my opinion about the current use of AI -- which I've made abundantly clear in previous posts -- I'm inclined to think this is a good thing.)  It's hard enough for human native speakers to use and understand language well; capturing that capacity in software is, I think, going to be a long time coming.

It'll be interesting to see at what point a large language model can parse correctly something like "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo."  Which is both syntactically well-formed and semantically meaningful.  

Have fun piecing together what exactly it does mean.

****************************************


Tuesday, May 20, 2025

Talking to the animals

An Introduction to Language (by Victoria Fromkin and Robert Rodman, Third Edition, 1974) defines language as "rule-governed arbitrary symbolic communication."

The "rule-governed" and "arbitrary" parts might seem contradictory, but they're not.  That language has rules is self-evident whether you are a prescriptivist (someone who believes there are correct and incorrect ways to use language) or a descriptivist (someone who believes that as long as communication is occurring, it's language; so the primary role of the linguist is not to enforce rules but to document them).  Being that my master's degree is in historical linguistics, I'm strongly of a descriptivist bent; if I thought there were an inflexible lexicon and set of grammatical rules that never ever changed, I'd kind of be out of a job.

The arbitrary part is less obvious.  It has to do with the sound-to-meaning correspondence.  Dog in English is inu in Japanese, chien in French, kare in Hausa, and hundur in Icelandic; none of those words are, in fact, especially doggy in nature.  Other than a handful of onomatopoeic words like bang, oink, meow, and hiccup, the connection between a word and its meaning is essentially accidental.

Curiously, humans are the only species on Earth that we are certain have true language, by the Fromkin and Rodman definition.  There's long been a suspicion that dolphin and whale vocalizations might be language, but as of this writing, that remains conjecture.  Recently, there have been some interesting studies of other primates indicating that certain features of language might exist outside of Homo sapiens -- a paper out of the University of Warwick last week suggests that orangutan vocalizations might exhibit recursion, the nesting structure you see in the children's rhyme "This is the House That Jack Built."  The researchers found that the sounds orangutans make are grouped into clusters, and those clusters put together in at least two additional tiers of structure, hinting that their vocalizations might have a much richer information-carrying capacity than we'd thought.

Another recent study, this one out of the University of Vienna, found that chimps might use drumming as a means of long-distance communication -- that the spacing of beats when they drum on tree roots varies but is non-random.  Like the recursion found in orangutans, the fact that the rhythm of drumming in chimps isn't just random noise opens up the possibility that it might be meaningful.  The researchers found that different chimps have different rhythmic styles, and that groups also developed their own unique patterns of drumming -- suggesting that drumming in chimps could be a cultural phenomenon.

How we developed language, and (likely) no other extant species did, is still open to question.  There are some interesting genetic pieces to the puzzle; the forkhead box protein 2 (FOX-P2) gene seems to be an important one, as the human variant of FOX-P2 isn't found in any known living species other than ourselves, and mutations in that sequence result in significant problems with learning and utilizing language.  (Genetic studies of Neanderthal remains found that Neanderthals had an identical FOX-P2 gene to that of modern humans; obviously we can't be sure that they had language, but it seems likely.)

[Image licensed under the Creative Commons Emw, Protein FOX-P2 PDB 2a07, CC BY-SA 3.0]

Actually, it was genetics that got me thinking about this topic today; yet another study, this one out of Rockefeller University and Cold Spring Harbor Laboratory, did a gene insertion on mice, replacing the murine version of the NOVA-1 gene with the human variant.  The human NOVA-1 has only a single base pair substitution as compared with that of other mammals, but -- as with FOX-P2 -- damage to this gene is known to impair language learning and production.

And when you replace a mouse embryo's NOVA-1 gene with a human's, the resulting adult mouse is capable of making strikingly more complex vocalizations than your ordinary mouse can do.

"When adult male mice were genetically altered with the human NOVA-1 variant, their squeaks during courtship didn't become higher pitched like the pups," said Robert Darnell, who was lead author on the paper.  "Instead, their vocalizations included more complex syllables.  They 'talked' differently to the female mice.  One can imagine how such changes in vocalization could have a profound impact on evolution....  NOVA-1 encodes a protein that can cut out and rearrange sections of messenger RNA when it binds to neurons.  This changes how brain cells synthesize proteins, probably creating molecular diversity in the central nervous system...  The 'humanized' mice with the NOVA-1 variant had molecular changes in the RNA splicing seen in brain cells, especially in regions associated with vocal behavior."

So we're one step closer to figuring out a uniquely human phenomenon.  That communication in the animal world exists on a spectrum of complexity is certain, but by the Fromkin/Rodman definition, we're kind of it for true language, as far as we know.  How we gained that ability is still not entirely clear, but its advantages are obvious -- and it may be that mutations in two regulatory genes are what kickstarted a capacity for chatter that in large part is responsible for our dominance of the entire biosphere.

****************************************


Monday, May 19, 2025

The loss of memory

British science historian James Burke has a way of packing a lot of meaning into a small space.

I still recall the first time I watched his amazing series The Day the Universe Changed, in which he looked at moments in history that radically altered the direction of human progress.  The final installment, titled "Worlds Without End," had several jaw-hanging-open scenes, but one that stuck with me was near the beginning, where he's recapping some of the inventions that had led to our current scientific outlook and high-tech world.  "In the fifteenth century," Burke said, "the invention of the printing press by Johannes Gutenberg took our memories away."

Being someone who has always loved the written word, it had honestly never occurred to me that writing -- and, even more, mass printing -- had a downside; the fact that we no longer have to commit information to memory, but can rely on what amount to external memory storage devices.  Burke, of course, is hardly the first person to make this observation.  Back in around 370 B.C.E., Socrates (as recorded by his disciple Plato in the dialogue Phaedrus) comments that the invention of writing is as much a curse as a blessing, a viewpoint he frames as a discussion between the Egyptian gods Thamus and Thoth, the latter of whom is credited with the creation of Egyptian hieroglyphics:

"This invention, O king," said Thoth, "will make the Egyptians wiser and will improve their memories; for it is an elixir of memory and wisdom that I have discovered."  But Thamus replied, "Most ingenious Thoth, one man has the ability to beget arts, but the ability to judge of their usefulness or harmfulness to their users belongs to another; and now you, who are the father of letters, have been led by your affection to ascribe to them a power the opposite of that which they really possess.

"For this invention will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory.  Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them.  You have invented an elixir not of memory, but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise."

Socrates also points out that once written, a text is open to anyone's interpretation; it can't say, "Hey, wait, that's not what I meant":

I cannot help feeling, Phaedrus, that writing is unfortunately like painting; for the creations of the painter have the attitude of life, and yet if you ask them a question they preserve a solemn silence.  And the same may be said of speeches.  You would imagine that they had intelligence, but if you want to know anything and put a question to one of them, the speaker always gives one unvarying answer.  And when they have been once written down they are tumbled about anywhere among those who may or may not understand them, and know not to whom they should reply, to whom not: and, if they are maltreated or abused, they have no parent to protect them; and they cannot protect or defend themselves.

And certainly he has a point.  A writer can write down nonsense just as easily as universal truth, and (as I've found out with my own writing!) two people reading the same passage can come to completely different conclusions about what it means.  Even the most careful and skillful writing can't avoid all ambiguity.

I'm not clear that we're on any surer footing with the oral tradition, though.  Not only do we have the inevitable "mutations" in lineages passed down orally (a phenomenon that was used to brilliant effect by sociolinguist Jamshid Tehrani in his delightful research into the phylogeny of "Little Red Riding Hood"), there's the problem that suppression of cultures from invasion, colonization, or conquest often wipes out (or at least drastically alters) the cultural memory.

How much of our history, mythology, and knowledge has been erased simply because the last person who had the information died without ever passing it on?

[Image licensed under the Creative Commons Planemad, Chart of world writing systems, CC BY-SA 3.0]

Swiss philosopher Jean-Jacques Rousseau seems to side with Socrates, though.  In his Essay on the Origin of Languages, he writes:

Writing, which would seem to crystallize language, is precisely what alters it.  It changes not the words but the spirit, substituting exactitude for expressiveness.  Feelings are expressed in speaking, ideas in writing.  In writing, one is forced to use all the words according to their conventional meaning.  But in speaking, one varies the meanings by varying one’s tone of voice, determining them as one pleases.  Being less constrained to clarity, one can be more forceful.  And it is not possible for a language that is written to retain its vitality as long as one that is only spoken.

I wonder about that last bit.  Chinese has been a written language for over three millennia, and I think you'd be hard-pressed to defend the opinion that it has "lost its vitality."  Seems to me that like most arguments of this ilk, the situation is complex.  Writing down our ideas may mean losing nuance and increasing the dependence on interpretation, but the gain in (semi-) permanence is pretty damn important.

And of course, this has bearing on our own century's old-school pearl-clutching; people decrying the shift toward electronic (rather than print) media, and in English, the fact that cursive isn't being taught in many elementary schools.  My guess is that like the loss of memory Socrates predicted, and Rousseau's concerns over the "crystallization" of language into something flat and dispassionate, the human mind -- and our ability to communicate meaningfully -- will survive this latest onslaught.

So I'm still in favor of the written word.  Obviously.  My own situation is a little like the exchange between the Chinese philosophers Lao Tsu and Zhuang Zhou.  Lao Tsu, in his book Tao Te Ching, famously commented, "Those who say don't know, and those who know don't say."  To which Zhuang Zhou wryly responded, "If 'those who say don't know and those who know don't say,' why is Lao Tsu's book so long?"

****************************************


Thursday, May 15, 2025

Borrowers and lenders

My master's thesis is titled, "The Linguistic Effects of the Viking Invasions on England and Scotland," which should put it in contention for winning the Scholarly Research With The Least Practical Applications Award.

Even so, I still think it's a pretty interesting topic.  My contention was that the topography of the two countries is a big part of the reason that their languages, Old English and Old Gaelic respectively, were affected so differently.  England, with its largely level countryside and a networked road system even back then, adopted hundreds of Old Norse borrow-words into every lexical category, even though the explicit rule by Scandinavia (the "Danelaw") was confined to the eastern half of the country and only lasted two centuries.  Hundreds of place names in England are Norse in origin; any town ending in "-by" owes that part of its name to the Norse word for "town."  (Similarly, places ending in -thorpe, -thwaite, -foss, -toft, or -ness reflect a Norse influence; and all the streets in the city of York that end in -gate -- well, gata is Old Norse for "street.")

The usual pattern is that languages borrow words for concepts they didn't already have covered, but Old English saw Norse supersede even perfectly good native words that were in wide use.  The result is that Modern English has way more words of Norse origin than you might expect, including many in the common, everyday vocabulary.  A few examples of the more than two hundred documented Norse borrow-words:

  • window
  • gift
  • sky
  • egg
  • scare
  • scream
  • anger
  • awkward
  • fellow

Even the pronoun "they" is Norse in origin; the Old English words for "he," "she," and "they," hé, héo, and híe, respectively, were pronounced so much alike that it could be confusing knowing who you were talking about.  The practical English fixed this by palatalizing héo to she and adopting the Norse third-person plural pronoun þeir (genitive þeira) as our modern "they" and "their."

Gaelic, though, responded differently.  Scotland was (and is) rugged terrain, and the big settlements tended to be clustered around the coast and inland waterways.  Even though Scandinavian rule in Scotland lasted much longer -- Norwegian rule of the Hebrides didn't end until 1266 -- the influence on the language was minor, and largely restricted to place names (the -ey found in the names of lots of the islands of Scotland simply means "island" in Old Norse) and terms related to living near water.  The Gaelic words for net, sail, anchor, boat, ford, delta, beach, seagull, seaweed, and skiff are all Norse in origin, but of the common vocabulary, only a few are (including the words for noise, shoe, guide, time, and scatter).

[Nota bene: The Orkneys were a different matter entirely.  Norse rule in the Orkneys continued until 1472, and the people there actually lost Gaelic altogether.  Until the eighteenth century the main language was Norn, a dialect of West Norse, at which point it was superseded by the Orcadian dialect of Scots English.  The last native speaker of Norn died in 1850.]

Of course, English is an amalgam of a great many languages; not only did the Vikings leave their thumbprint on it, but the Normans in the eleventh century brought in a great many words of French origin.  Additionally, a lot of our technical vocabulary comes from Latin and Greek.  Until the eighteenth century, English was kind of a backwater language spoken only by people in one corner of Europe, so when scientists and other academics from different countries were communicating, they usually did so in Latin.  The result is that we still have a ton of Latin and Greek borrow-words in English, including most of our scientific, legal, and scholarly vocabulary.  To demonstrate how dependent the sciences are on Latin and Greek roots, the brilliant science fiction author Poul Anderson wrote a piece on the atomic theory using only words native to Old English -- and the result ("Uncleftish Beholding") sounds like some ancient mythological tale, and gives you an idea of just how much Latin and Greek have influenced the cadence of our language.  Here's a short excerpt to give the flavor, but you really should read the whole thing, because it's just that wonderful:

For most of its being, mankind did not know what things are made of, but could only guess.  With the growth of worldken, we began to learn, and today we have a beholding of stuff and work that watching bears out, both in the workstead and in daily life.

The underlying kinds of stuff are the *firststuffs*, which link together in sundry ways to give rise to the rest.  Formerly we knew of ninety-two firststuffs, from waterstuff, the lightest and barest, to ymirstuff, the heaviest. Now we have made more, such as aegirstuff and helstuff.

The firststuffs have their being as motes called *unclefts*.  These are mighty small; one seedweight of waterstuff holds a tale of them like unto two followed by twenty-two naughts.  Most unclefts link together to make what are called *bulkbits*.  Thus, the waterstuff bulkbit bestands of two waterstuff unclefts, the sourstuff bulkbit of two sourstuff unclefts, and so on.  (Some kinds, such as sunstuff, keep alone; others, such as iron, cling together in ices when in the fast standing; and there are yet more yokeways.)  When unlike unclefts link in a bulkbit, they make *bindings*.  Thus, water is a binding of two waterstuff unclefts with one sourstuff uncleft, while a bulkbit of one of the forestuffs making up flesh may have a thousand thousand or more unclefts of these two firststuffs together with coalstuff and chokestuff.

Everywhere English speakers went -- which, for better or worse, was kind of everywhere -- we picked up and adopted new words.  The result is a rich, often confusing patchwork quilt of a language, with strange sound-to-spelling correspondences, remnants of grammar and morphology from a dozen different places, and weird attempts to blend it all together.  (I don't know how many times I told students that the plurals of hippopotamus and rhinoceros were not hippopotami and rhinoceri.  That'd be trying to pluralize them like Latin words, and they're actually Greek -- hippopotamus is Greek for "river horse," and rhinoceros for "nose horn" -- so if you want to be fancy about it, it'd be hippoipotamou and rhinoucerates.  But that sounds pretentious as hell, so let's stick with hippopotamuses and rhinoceroses.)

Anyhow, that's our excursion into our peculiar hodgepodge of a language.  Hodgepodge, by the way, is French in origin, from hochepot, meaning "a stew."  The hoche part comes from the Old French verb hocher, meaning "to shake," which is itself Germanic in origin.

Okay, I'd better stop here.  I could do this all day.

****************************************