Skeptophilia (skep-to-fil-i-a) (n.) - the love of logical thought, skepticism, and thinking critically. Being an exploration of the applications of skeptical thinking to the world at large, with periodic excursions into linguistics, music, politics, cryptozoology, and why people keep seeing the face of Jesus on grilled cheese sandwiches.
Showing posts with label historical linguistics. Show all posts

Monday, August 25, 2025

Tall tales and folk etymologies

My master's degree is in historical linguistics, and one of the first things I learned was that it's tricky to tell if two words are related.

Languages are full of false cognates, pairs of words that look alike but have different etymologies -- in other words, their similarities are coincidental.  Take the words police and (insurance) policy.  Look like they should be related, right?

Nope.  Police comes from the Latin politia (meaning "civil administration"), which in turn comes from polis, "city."  (So it's a cognate to the last part of words like metropolis and cosmopolitan.)  Policy -- as it is used in the insurance business -- comes from the Old Italian poliza (a bill or receipt) and back through the Latin apodissa to the Greek ἀπόδειξις (meaning "a written proof or declaration").  To make matters worse, the other definition of policy -- a practice of governance -- comes from politia, so it's related to police but not to the insurance meaning of policy.

Speaking of government -- and another example of how you can't trust what words look like -- you might never guess that the word government and the word cybernetics are cousins.  Both of them come from the Greek κυβερνητικός -- a mechanism used to steer a ship.

My own research was about the extent of borrowing between Old Norse, Old English, and Old Gaelic, as a consequence of the Viking invasions of the British Isles that started in the eighth century C.E.  The trickiest part was that Old Norse and Old English are themselves related languages; both of them belong to the Germanic branch of the Indo-European language family.  So there are some legitimate cognates there, words that did descend in parallel in both languages.  (A simple example is the English day and Norwegian dag.)  So how do you tell if a word in English is there because it descended peacefully from its Proto-Germanic roots, or was borrowed from Old Norse-speaking invaders rather late in the game?

It isn't simple.  One group I'm fairly sure are Old Norse imports are most of our words that have a hard /g/ sound followed by an /i/ or an /e/, because some time around 700 C.E. the native Old English /gi/ and /ge/ words were palatalized to /yi/ and /ye/.  (Two examples are yield and yellow, which come from the Anglo-Saxon gieldan and geolu respectively.)  So if we have surviving words with a /gi/ or /ge/ -- gift, get, gill, gig -- they must have come into the language after 700, as they escaped getting palatalized to *yift, *yet, *yill, and *yig.  Those words -- and over a hundred more I was able to identify, using similar sorts of arguments -- came directly from Old Norse.
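For the programmers in the audience, the dating argument above can be sketched as a toy rule.  This is only an illustration, not a real etymological tool: the function names are made up for the example, and it assumes the words are written in a rough phonemic spelling where "g" always stands for a hard /g/ (so a French import like gem, with its soft g, wouldn't qualify).

```python
# Toy illustration of the palatalization argument; not a real etymological tool.
# Assumption: "g" in the input always represents a hard /g/ sound.

FRONT_VOWELS = ("i", "e")


def palatalize(word: str) -> str:
    """Apply the Old English change of word-initial /gi/, /ge/ to /yi/, /ye/."""
    if word.startswith("g") and word[1:2] in FRONT_VOWELS:
        return "y" + word[1:]
    return word


def escaped_palatalization(word: str) -> bool:
    """True if the word still has /g/ before /i/ or /e/, so the change never touched it."""
    return palatalize(word) != word


if __name__ == "__main__":
    for w in ["gift", "get", "gill", "gig", "day"]:
        print(w, "->", palatalize(w), "| post-700 candidate:", escaped_palatalization(w))
```

A word flagged as a "post-700 candidate" is only a candidate; it takes the other sorts of arguments mentioned above to show that it actually came from Old Norse rather than some other later source.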

[Image licensed under the Creative Commons M. Adiputra, Globe of language, CC BY-SA 3.0]

Anyhow, the whole topic comes up because I've been seeing this thing going around on social media headed, "Did You Know...?" with a list of a bunch of words, and the curious and funny origins they supposedly have.

And almost all of them are wrong.

I've refrained from saying anything to the people who posted it, because I don't want to be the "Well, actually..." guy.  But it rankled enough that I felt impelled to write a post about it, so this is kind of a broadside "Well, actually...", which I'm not sure is any nicer.  But in any case, here are a few of the more egregious "folk etymologies," as these fables are called -- just to set the record straight.

  • History doesn't come from "his story," i.e., a deliberate way to tell men's stories and exclude women's.  The word's origins have nothing to do with men at all.  It comes from the Greek ἱστορία, "inquiry."
  • Snob is not a contraction of the Latin sine nobilitate ("without nobility").  It's only attested back to the 1780s and is of unknown origin.
  • Marmalade doesn't have its origin with Mary Queen of Scots, who supposedly asked for it when she had a headache, leading her French servants to say "Marie est malade."  The word is much older than that, and goes back to the Portuguese marmelada, meaning "quince jelly," and ultimately to the Greek μελίμηλον, "apples preserved in honey."
  • Nasty doesn't come from the biting and vitriolic nineteenth-century political cartoonist Thomas Nast.  In fact, it predates Nast by several centuries (witness Hobbes's comment about medieval life being "poor, nasty, brutish, and short," which was written in 1651).  Nasty probably comes from the Dutch nestig, meaning "dirty."
  • Pumpernickel doesn't have anything to do with Napoleon and his alleged horse Nicole who supposedly liked brown bread, leading Napoleon to say that it was "Pain pour Nicole."  Its actual etymology is just as weird, though; it comes from the medieval German words pumpern and nickel and translates, more or less, to "devil's farts."
  • Crap has very little to do with Thomas Crapper, who perfected the design of the flush toilet, although it certainly sounds like it should (and his name and accomplishment probably repopularized the word's use).  Crapper's unfortunate surname comes from cropper, a Middle English word for "farmer."  As for crap, it seems to come from Medieval Latin crappa, "chaff," but its origins before that are uncertain.
  • Last, but certainly not least, fuck is not an acronym.  For anything.  It's not from "For Unlawful Carnal Knowledge," whatever Van Halen would have you believe, and those words were not hung around adulterers' necks as they sat in the stocks.  It also doesn't stand for "Fornication Under Consent of the King," which comes from the story that in bygone years, when a couple got married, if the king liked the bride's appearance, he could claim the right of "prima nocta" (also called "droit de seigneur"), wherein he got to spend the first night of the marriage with the bride.  (Apparently this did happen, but rarely, as it was a good way for the king to seriously piss off his subjects.)  But the claim is that afterward -- and now we're in the realm of folk etymology -- the king gave his official permission for the bride and groom to go off and amuse themselves as they wished, at which point he stamped the couple's marriage documents "Fornication Under Consent of the King," meaning it was now legal for the couple to have sex with each other.  The truth is, this is pure fiction. The word fuck comes from a reconstructed Proto-Germanic root *fug, meaning "to strike."  There are cognates (same meaning, different spelling) in just about every Germanic language there is.  In English, the word is one of the most amazing examples of lexical diversification I can think of; there's still the original sexual definition, but consider -- just to name a few -- "fuck that," "fuck around," "fuck's sake," "fuck up," "fuck-all," "what the fuck?", and "fuck off."  Versatile fucking word, that one.

So anyway.  Hope that sets the record straight.  I hate coming off like a know-it-all, but in this case I actually do know what I'm talking about.  A general rule of thumb (which has nothing to do with the diameter of the stick you're allowed to beat your wife with) is, "don't fuck with a linguist."  No acronym needed to make that clear.

****************************************


Thursday, May 15, 2025

Borrowers and lenders

My master's thesis is titled, "The Linguistic Effects of the Viking Invasions on England and Scotland," which should put it in contention for winning the Scholarly Research With The Least Practical Applications Award.

Even so, I still think it's a pretty interesting topic.  My contention was that the topography of the two countries is a big part of the reason that their languages, Old English and Old Gaelic respectively, were affected so differently.  England, with its largely level countryside and a networked road system even back then, adopted hundreds of Old Norse borrow-words into every lexical category, even though the explicit rule by Scandinavia (the "Danelaw") was confined to the eastern half of the country and only lasted two centuries.  Hundreds of place names in England are Norse in origin; any town ending in "-by" owes that part of its name to the Norse word for "town."  (Similarly, places ending in -thorpe, -thwaite, -foss, -toft, or -ness reflect a Norse influence; and all the streets in the city of York that end in -gate -- well, gata is Old Norse for "street.")

The usual pattern is that languages borrow words for concepts they didn't already have covered, but Old English saw Norse supersede even perfectly good native words that were in wide use.  The result is that Modern English has way more words of Norse origin than you might expect, including many in the common, everyday vocabulary.  A few examples of the more than two hundred documented Norse borrow-words:

  • window
  • gift
  • sky
  • egg
  • scare
  • scream
  • anger
  • awkward
  • fellow

Even the pronoun "they" is Norse in origin; the Old English words for "he," "she," and "they," hé, híe, and héo, respectively, were pronounced so much alike that it could be confusing knowing who you were talking about.  The practical English fixed this by palatalizing híe to she and adopting the Norse third-person plural pronoun þeira as our modern "they" and "their."

Gaelic, though, responded differently.  Scotland was (and is) rugged terrain, and the big settlements tended to be clustered around the coast and inland waterways.  Even though Scandinavian rule in Scotland lasted much longer -- Norwegian rule of the Hebrides didn't end until 1266 -- the influence on the language was minor, and largely restricted to place names (the -ey found in the names of lots of the islands of Scotland simply means "island" in Old Norse) and terms related to living near water.  The Gaelic words for net, sail, anchor, boat, ford, delta, beach, seagull, seaweed, and skiff are all Norse in origin, but of the common vocabulary, only a few are (including the words for noise, shoe, guide, time, and scatter).

[Nota bene: The Orkneys were a different matter entirely.  Norse rule in the Orkneys continued until 1472, and the people there actually lost Gaelic altogether.  Until the eighteenth century the main language was Norn, a dialect of West Norse, at which point it was superseded by the Orcadian dialect of Scots English.  The last native speaker of Norn died in 1850.]

Of course, English is an amalgam of a great many languages; not only did the Vikings leave their thumbprint on it, but the Normans in the eleventh century brought in a great many words of French origin.  Additionally, a lot of our technical vocabulary comes from Latin and Greek.  Until the eighteenth century, English was kind of a backwater language spoken only by people in one corner of Europe, so when scientists and other academics from different countries were communicating, they usually did so in Latin.  The result is that we still have a ton of Latin and Greek borrow-words in English, including most of our scientific, legal, and scholarly vocabulary.  To demonstrate how dependent the sciences are on Latin and Greek roots, the brilliant science fiction author Poul Anderson wrote a piece on the atomic theory using only words native to Old English -- and the result ("Uncleftish Beholding") sounds like some ancient mythological tale, and gives you an idea of just how much Latin and Greek have influenced the cadence of our language.  Here's a short excerpt to give the flavor, but you really should read the whole thing, because it's just that wonderful:

For most of its being, mankind did not know what things are made of, but could only guess.  With the growth of worldken, we began to learn, and today we have a beholding of stuff and work that watching bears out, both in the workstead and in daily life.

The underlying kinds of stuff are the *firststuffs*, which link together in sundry ways to give rise to the rest.  Formerly we knew of ninety-two firststuffs, from waterstuff, the lightest and barest, to ymirstuff, the heaviest. Now we have made more, such as aegirstuff and helstuff.

The firststuffs have their being as motes called *unclefts*.  These are mighty small; one seedweight of waterstuff holds a tale of them like unto two followed by twenty-two naughts.  Most unclefts link together to make what are called *bulkbits*.  Thus, the waterstuff bulkbit bestands of two waterstuff unclefts, the sourstuff bulkbit of two sourstuff unclefts, and so on.  (Some kinds, such as sunstuff, keep alone; others, such as iron, cling together in ices when in the fast standing; and there are yet more yokeways.)  When unlike clefts link in a bulkbit, they make *bindings*.  Thus, water is a binding of two waterstuff unclefts with one sourstuff uncleft, while a bulkbit of one of the forestuffs making up flesh may have a thousand thousand or more unclefts of these two firststuffs together with coalstuff and chokestuff.

Everywhere English speakers went -- which, for better or worse, was kind of everywhere -- we picked up and adopted new words.  The result is a rich, often confusing patchwork quilt of a language, with strange sound-to-spelling correspondences, remnants of grammar and morphology from a dozen different places, and weird attempts to blend it all together.  (I don't know how many times I told students that the plurals of hippopotamus and rhinoceros were not hippopotami and rhinoceri.  That'd be trying to pluralize them like Latin words, and they're actually Greek -- hippopotamus is Greek for "river horse," and rhinoceros for "nose horn" -- so if you want to be fancy about it, it'd be hippoipotamou and rhinoucerates.  But that sounds pretentious as hell, so let's stick with hippopotamuses and rhinoceroses.)

Anyhow, that's our excursion into our peculiar hodgepodge of a language.  Hodgepodge, by the way, is French in origin, from hochepot, meaning "a stew."  The hoche part comes from the Old Germanic word hocher, meaning "to shake."

Okay, I'd better stop here.  I could do this all day.

****************************************


Monday, March 24, 2025

Walkabout

There's an ongoing war of words between people who consider themselves generalists and those who consider themselves specialists.

I recall being in the Graduate School of Oceanography at the University of Washington -- a placement that only lasted a semester, for a variety of reasons -- and my advisor sneeringly referring to generalists as "people who lack the focus, drive, and brains to stay with something long enough to learn it thoroughly."  Countering this is the quip that specialists are "learning more and more about less and less, until finally they'll know everything about nothing."

Although I am squarely in the generalist camp, I'm strongly of the opinion that we need both.  The specialists' depth and the generalists' breadth should be complementary, not in contention.  The focus of specialists has given us most of our detailed knowledge of science and technology; the wide-ranging interest of generalists -- who, in a kinder time, were called polymaths rather than dilettantes or dabblers -- allows them to draw connections between disparate fields, and bring that curiosity and wonder to others.

I'm hoping this doesn't come across as self-defensive, given my B.S. in physics, attempted/abortive M.S. in oceanography, final M.A. in historical linguistics, and teaching certification in biology.  Perhaps my long-ago advisor wasn't entirely incorrect; my "oh look something shiny!" approach to learning would likely have made a Ph.D. in anything unattainable.  But it does have the distinct advantage that I'm still unendingly curious about the world, and almost on a daily basis stumble on cool things in a vast array of disciplines that I didn't know about.

Take, for example, the fact that yesterday I learned about a language I'd never heard of before, belonging to an entire language family I'd never heard of before.  Illustrating, perhaps, that even at the master's degree level, my study of linguistics had already narrowed to the point of excluding all but a tiny fraction of what's out there (my study focused primarily on Scandinavian and Celtic languages; my only real work in a non-Indo-European language has been my recent attempts to learn some Japanese).  But this odd language I found out about has a curious history -- and a possible connection to another language family, on the opposite side of the world.

The language is called Ket, and is spoken by a small number of people -- estimates are between fifty and two hundred -- in the remote region of Krasnoyarsk Krai in central Siberia.  It is the sole surviving member of the Yeniseian language family; the last speaker of the related language called Yugh died in 1970, and other members of the Yeniseian family, Kott, Arin, Assan, and Pumpokol, were all extinct by the mid-nineteenth century.

A Ket family, circa 1900 [Image is in the Public Domain]

Here's where it gets interesting, though.  There's some evidence that Ket and the other Yeniseian languages are related to the language spoken by the Xiongnu Confederation, a group of interrelated nomadic peoples who dominated the east Eurasian steppes -- what are now parts of Siberia, Mongolia, and northern China -- from the third century B.C.E. to the first century C.E.  And one hypothesis is that when the Xiongnu Confederation fell to pieces, in part because of a climatic shift that led to severe drought, they upped stakes and moved west, where they became known to history as...

... the Huns.

So an obscure language currently spoken by under two hundred people may be the closest surviving cousin of the language spoken by one of the most feared warrior people ever, who made it all the way to what is now eastern France before finally being defeated.

But it gets weirder still.  Because linguistic analysis has suggested one other possible relative of Ket -- the Na Dene languages of western North America, including Athabaskan, Tlingit, Eyak, and Navajo.  Linguist Bernard Comrie calls it "the first demonstration of a genealogical link between Old World and New World language families that meets the standards of traditional comparative historical linguistics."  Supporting this is a study by Edward Vajda of Western Washington University finding that the Q1 Y-chromosome haplogroup is extremely common in Na Dene speakers, and close to universal amongst the Ket -- but is found almost nowhere else in Eurasia.

How the Ket (and the other Yeniseian speakers) got where they are is a matter of conjecture.  One possibility is that the ancestors of the Yeniseians (including, possibly, the Xiongnu and the Huns) were left behind when the ancestors of today's North American Na Dene speakers crossed Beringia into Alaska during the last Ice Age.  Other anthropologists believe that the split occurred later, as some of the North American migrants crossed back into what is now Siberia, and got stranded there when the seas rose.  It's hard to imagine what evidence could settle this conclusively; but the relationship between the Yeniseian languages and the Na Dene languages, along with the highly suggestive DNA connection, seems to support a relationship between those two now-widely-separated groups.  However the walkabout happened, it's left its fingerprint on three different continents.

So there you have it.  A link between the Huns, the Navajo, and a tiny and declining group of Siberians.  That's our excursion into linguistics for today.  Tomorrow it might be astronomy or geology or archaeology or meteorology or, perhaps, ghosts and Bigfoots or whatnot.  You never know.  I presume you must on some level enjoy my random musings, or you wouldn't be here.  Even if I might well "lack focus, drive, and brains," I still have more fun jumping from topic to topic than I would if I'd buckled down and focused on one cubic centimeter of the universe.

Here's to being a generalist!

****************************************


Monday, December 30, 2024

Root and branch

Linguists estimate that there are a little over seven thousand languages spoken in the world, sorted into around four hundred language families (including linguistic isolates, languages or language clusters that appear to be related to no other known languages).

As a historical linguist, one of the most common questions I've been asked is if, ultimately, all of those languages trace back to a common origin.  Or, perhaps, did disparate groups develop spoken language independently, so there is no single "pre-Tower-of-Babel" language (if I can swipe a metaphor from the Bible)?  The honest answer is "we don't know."  Determining the relationships between languages -- their common ancestry, as it were -- is tricky business, and relies on more than chance similarity between a few words.  My own area of research was borrow words in Old English and Old Gaelic (mostly from Old Norse), a phenomenon that significantly complicates matters.  English has an unfortunate habit of appropriating words from other languages -- a selective list of English vocabulary could easily lead the incautious to the incorrect conclusion that it originated from Latin, for example.  (In the preceding sentence, the words unfortunate, habit, appropriating, language, selective, vocabulary, incautious, incorrect, conclusion, originated, and example all come directly from Latin.  As do preceding, sentence, and directly.  So none of those are original to English -- they were adopted by scholars and clerics between the thirteenth and sixteenth centuries C.E.)


As you might expect, the longer two languages have been separate, the further they diverge, not only because they borrow words from (different) neighboring languages but because of random changes in pronunciation and syntax.  There's a good analogy here to biological evolution; the process is much like the effect that mutations have in evolution.  Closely-related species have very similar DNA; extremely distantly-related ones, like humans and apple trees, have very few common genes, and it's taken a great deal of detailed analysis to show that all life forms do have a single common ancestor.

That feat has not yet been accomplished with language evolution.  Finnish and Swahili may have a common ancestor, but if so, they've been separate for so long that all traces of that relationship have been erased over time.

Even with groups of languages with a more recent common ancestor, it can be remarkably difficult to piece together what their relationship is.  For Indo-European languages, surely the most studied group of languages in the world, we're still trying to figure out their family tree, and aligning it with what is known from history and archaeology.  This was the subject of a study out of the University of Copenhagen that was published last week, and looked at trying to reconcile the language groups in southern and western Europe with what we now know from genetic studies of ancient bones and teeth.

[Nota bene: the Germanic and Slavic peoples were not part of this study; the current model suggests that Germanic groups are allied to the neolithic northern Corded Ware and Funnelbeaker Cultures, which appear to have originated in the steppes of what are now western Russia and Ukraine; the Slavs came in much later, probably from the region between the Danube River and the Black Sea.]

The study found a genetic correlation between speakers of the Italo-Celtic language cluster (Italian, Spanish, French, Portuguese, Catalan, Occitan, and Romanian; Irish, Scottish Gaelic, Manx, Cornish, Breton, and Welsh) and one between speakers of the Greco-Armenian cluster (Greek, Cypriot, Albanian, and Armenian).  The southern branch of the Corded Ware culture seems to have undergone two influxes from the east -- one from the Bell Beaker Culture (so called because of the characteristically bell-shaped ceramic drinking vessels found at their settlement sites), which started around 2800 B.C.E. and ended up migrating all the way to the Iberian Peninsula, and the other from the Yamnaya, which came from the Pontic steppe but never got past what is now Switzerland and eastern Italy (most of them didn't even get that far).

It's tempting to overconclude from this; just like my earlier example of Latin borrow words in English, the genetic correlation between the Italo-Celtic and Greco-Armenian regions doesn't mean that the differences we see in those two branches of the Indo-European language family come from the Bell Beaker people and the Yamnaya, respectively.  The lack of early written records for most of these languages means that we don't have a good "fossil record" of how and when they evolved.

But the current study provides some tantalizing clues about how migration of speakers of (presumably) two different dialects of Proto-Indo-European may have influenced the evolution of the western and eastern branches of today's Indo-European languages.

So it's one step toward finding the common roots of (most) European languages.  Even if we may never settle the question of how they're related for certain, it's cool that they're using the techniques of modern genetics to find out about where our distant ancestors came from -- and what languages they may have spoken.

****************************************

Monday, April 8, 2024

The relic

The first thing I learned in my studies of linguistics is that languages aren't static.

It's a good thing, because my field is historical linguistics, and if languages didn't change over time I kind of wouldn't have anything to study.  There's an ongoing battle, of course, as to how much languages should change, and what kinds of changes are acceptable; this is the whole descriptivism vs. prescriptivism debate about which I wrote only last month.  My own view on this is that languages are gonna change whether you want them to or not, so being a prescriptivist is deliberately choosing the losing side -- but if lost causes are your thing, then knock yourself out.

Where it gets interesting is that the rates of language change can vary tremendously.  Some cultures are inherently protective of their language, and resist things like borrow words -- a great example is Icelandic, which has changed so little in a thousand years that modern Icelanders can still read the Old Norse sagas with little more difficulty than we read Shakespeare.

Speaking of Shakespeare, it bears mention that the language of Shakespeare and his contemporaries isn't (as I heard some students call it) "Old English."  Old English is an entirely different language, not mutually intelligible with Modern English, and by Shakespeare's time had been an extinct language for about four hundred years.  Here's a sample of Old English:

Fæder ure þu þe eart on heofonum, si þin nama gehalgod.  To becume þin rice, gewurþe ðin willa, on eorðan swa swa on heofonum.

I wonder how many of you recognized this as the first two lines of the Lord's Prayer:

Our Father, who art in heaven, hallowed be thy name.  Thy kingdom come, thy will be done on Earth as it is in heaven.

There's been a discussion going on in linguistic circles for years about which dialect of English has changed the least -- not since the time of Old English, but at least since Elizabethan English, the dialect of Shakespeare's time.  We have a tendency, largely because of some of the famous performances of Hamlet and Macbeth and Richard III, to imagine Shakespeare's contemporaries as speaking something like the modern upper-class in southeastern England, but that's pretty clearly not the case.  Analyses of the rhyme and rhythm schemes of Shakespeare's sonnets, for example, suggest that Shakespearean English was rhotic -- the /r/ in words like far and park was pronounced -- while the speech of southern England today is almost all non-rhotic.  Vowels, too, were probably different; today a typical English person pronounces words like path with an open back unrounded vowel /ɑ/ (a bit like the vowel in the word cop); in Shakespeare's time, it was probably closer to the modern American pronunciation, with a front unrounded vowel /æ/ (the vowel sound in cat).

Analysis of spoken English from dozens of different regions has led some linguists to conclude -- although the point is still controversial -- that certain Appalachian dialects, and some of the isolated island dialects of coastal North and South Carolina, are the closest to the speech of Shakespeare's day, at least in terms of pronunciation.  Vocabulary changes according to the demands of the culture -- as I said, there's no such thing as a static language.

[Image licensed under the Creative Commons Alumnum, Primary Human Languages Improved Version, CC BY-SA 4.0]

The reason all this comes up is that linguists have come upon another example of a relic dialect -- this one, from a great deal longer ago than Elizabethan English.  In the region of Trabzon in northern Turkey, there is a group of people who speak Romeyka -- a dialect of Pontic Greek that is thought to have changed little since the region was settled from classical-era Greece over two thousand years ago.

Since that time, Romeyka has been passed down orally, and its status as a cultural marker meant that like Icelandic, it has been maintained with little change.  Modern Greek, however, has changed a great deal in that same time span; in terms of syntax (and probably pronunciation as well), Romeyka is closer to what would have been spoken in Athens in Socrates's time than Modern Greek is.  "Conversion to Islam across Asia Minor was usually accompanied by a linguistic shift to Turkish, but communities in the valleys retained Romeyka," said Ioanna Sitaridou, of the University of Cambridge, who is heading the study.  "And because of Islamization, they retained some archaic features, while the Greek-speaking communities who remained Christian grew closer to Modern Greek, especially because of extensive schooling in Greek in the nineteenth and early twentieth centuries...  Romeyka is a sister, rather than a daughter, of Modern Greek.  Essentially this analysis unsettles the claim that Modern Greek is an isolate language."

The problem facing the researchers is that like many minority languages, Romeyka is vanishing rapidly.  Most native speakers of Romeyka are over 65; fewer and fewer young people are learning it as their first language.  It's understandable, of course.  People want their children to succeed in the world, and it's critical that they be able to communicate in the majority language in schools, communities, and jobs.

But the loss of any language, especially one that has persisted virtually unchanged for so long, still strikes me as sad.

It's a consolation, though, that linguists like Ioanna Sitaridou are working to record, study, and preserve these dwindling languages before it's too late.  Especially in the case of a language like Romeyka, where there is no written form; without recordings and scholarly studies, once it's gone, it's gone.  How many other languages have vanished like that, without a trace -- when no more children are being raised to speak it, when the last native speaker dies?  It's the way of things, I suppose, but it's still a tragedy, a loss of the way of communication of an entire culture.

At least with Romeyka, we have people working on its behalf -- trying to find out what we can of a two-thousand-year-old linguistic relic from the time of Alexander the Great.

****************************************



Friday, February 23, 2024

The language of Sark

The title of my master's thesis was The Linguistic and Cultural Effects of the Viking Invasions on England and Scotland.  I don't think many people read it other than me and my committee, but it did win the 1996 International Prize For Research With Absolutely No Practical Applications Whatsoever.  And it allowed me to learn valuable information such as the fact that there were two words in eleventh-century England for window -- one from Old English (eagþyrl, literally "eye-hole") and one from Old Norse (vindauga, literally "wind-eye") -- and for some reason the Old Norse one won and our word window comes from it rather than from Old English.

Which is a handy "fun fact" for me to bring out at cocktail parties, especially if I want everyone to back away slowly and then find other people to talk to for the rest of the evening.

In any case, I spent a good bit of my time in graduate school learning assorted random facts about western European linguistics, which was why I was a bit gobsmacked when I found out that there's a language in western Europe that I had never even heard of.  It's called Sarkese, and is only found on the tiny (1.5 by 3.5 kilometers) island of Sark, east of Guernsey in the Channel Islands.

The Channel Islands [Image licensed under the Creative Commons Aotearoa, Wyspy Normandzkie, CC BY-SA 3.0]

Sark is currently home to five hundred people, of whom only three learned Sarkese (known colloquially as patois) as their first language.  It's a Romance language -- the closest relative is French, but it's not mutually intelligible.  It came originally from medieval Norman French via the isle of Jersey; the ancestors of the people of Sark came over from Jersey in 1565 and it's been relatively isolated ever since.

The samples of Sarkese in the article I linked above illustrate how far Sarkese and French have diverged in the close to a thousand years since Sarkese split from mainland French.  "Thank you very much," for example -- merci beaucoup in French -- is mérsî ben dê fê in Sarkese.  French has seventeen different vowel phonemes; Sarkese has over fifty.  Add to that the complication that the island is shaped like an hourglass, with a narrow isthmus (La Coupée) that is all but impassable during storms, and the two pieces (Big Sark and Little Sark) have different dialects.

Fortunately, a Czech linguist, Martin Neudörfl, is trying to document Sarkese, and has worked with the three remaining fluent speakers -- who are all over eighty years old -- and about fifteen semi-fluent individuals to produce a huge library of recordings, and reams of documents describing the morphology and syntax of Sarkese.  "We have hundreds of hours [of recordings] and our audio archive is outstanding," Neudörfl said.  "Even if I were to disappear, someone could revive the language just using the recordings.  We've only achieved this through years of exhaustive research.  It's all thanks to [the speakers] for sharing their knowledge."

It's always sad when a language goes extinct, and so many have done so without anyone ever recording them or writing them down.  In large part it's due to competition with more widely spoken languages; it's eye-opening to know that half of the world's individuals are native speakers of only fifteen different languages.  The other half speak one of the other seven-thousand-odd languages that currently exist in the world.  Sarkese is one of many languages that have fallen prey to the prevalence, convenience, and ubiquity of English.

On the one hand, I get why it happens.  If you want to be understood, you have to speak a language that the people around you can understand, and if you only spoke Sarkese you could communicate with eighteen other people on the island (and one Czech linguist).  But still, each language represents a trove of knowledge about the culture and history of a people, and it's a tragedy when that is lost.

So kudos to Martin Neudörfl, and the Sarkese speakers who are working with him to record this language before it's too late.  Makes me wish I'd tackled a project like this for my master's research.  I could be wrong, but I don't think Old Norse is coming back any time soon.

****************************************



Monday, September 11, 2023

Escapees from Siberia

As you might expect from someone who is passionately interested in genealogy, linguistics, and evolutionary genetics, when there's a study that combines all three, it's a source of great joy to me.

This was my reaction to a study in Nature on the evolutionary history of humans in northern Europe, specifically the Finns.  Entitled "Ancient Fennoscandian Genomes Reveal Origin and Spread of Siberian Ancestry in Europe," it was authored by no fewer than seventeen researchers (including Svante Pääbo, the Nobel Prize-winning Swedish biologist who is widely credited with founding the entire science of paleogenetics) from the Max Planck Institute, the University of Helsinki, the Russian Academy of Sciences, the Vavilov Institute for General Genetics, and the University of Turku.

Quite a collaborative effort.

It's been known for a while that Europe was populated in three broad waves of settlement.  First, there were hunter-gatherers who came in as early as forty thousand years ago, and proceeded not only to hunt and gather but to have lots of hot caveperson-on-caveperson sex with the pre-existing Neanderthals, whose genetic traces can be discerned in their descendants unto this very day.  Then, there was an agricultural society that came into Europe from what is now Turkey starting around eight thousand years ago.  Finally, some nomadic groups -- believed to be the ancestors of both the Scythians and the Celts -- swept across Europe around 4,500 years ago.

Anyone with European ancestry has all three.  Despite the genetic distinctness of different ethnic groups -- without which 23andMe genetic analysis wouldn't work at all -- there's been enough time, mixture, and cross-breeding between the groups that no one has ancestry purely from one population or another.

Which, as an aside, is one of the many reasons that the whole "racial purity" crowd is so ridiculous.  We're all mixtures, however uniform you think your ethnic heritage is.  Besides, racial purity wouldn't be a good thing even if it were possible.  That's called inbreeding, and causes a high rate of homozygosity (put simply, you're likely to inherit the same alleles from both your mother and father).  This causes lethal recessives to rear their ugly heads; heterozygous individuals are protected from these because the presence of the recessive allele is masked by the other, dominant (working) copy.  It's why genetic disorders can be localized to different groups -- cystic fibrosis in northern Europeans, Huntington's disease in people whose ancestry comes from eastern England, sickle-cell anemia in people from sub-Saharan Africa, Tay-Sachs disease in Ashkenazic Jews, and so on.

So mixed-ethnic relationships are more likely to produce genetically healthy children.  Take that, neo-Nazis.

Map of ethnic groups in Europe, ca. 1899  [Image is in the Public Domain]

In any case, the current paper looks at the subset of Europeans who have a fourth ancestral population -- people in northeastern Europe, including Finns, the Saami, Russians, the Chuvash, Estonians, and Hungarians.  And they found that the origin of this additional group of ancestors is all the way from Siberia!

The authors write:
[T]he genetic makeup of northern Europe was shaped by migrations from Siberia that began at least 3500 years ago.  This Siberian ancestry was subsequently admixed into many modern populations in the region, particularly into populations speaking Uralic languages today.  Additionally... [the] ancestors of modern Saami inhabited a larger territory during the Iron Age.
The coolest part is that this lines up brilliantly with what we know about languages spoken in the area:
The Finno-Ugric branch of the Uralic language family, to which both Saami and Finnish languages belong, has diverged from other Uralic languages no earlier than 4000–5000 years ago, when Finland was already inhabited by speakers of a language today unknown.  Linguistic evidence shows that Saami languages were spoken in Finland prior to the arrival of the early Finnish language and have dominated the whole of the Finnish region before 1000 CE. Particularly, southern Ostrobothnia, where Levänluhta is located, has been suggested through place names to harbour a southern Saami dialect until the late first millennium, when early Finnish took over as the dominant language.  Historical sources note Lapps living in the parishes of central Finland still in the 1500s.  It is, however, unclear whether all of them spoke Saami, or if some of them were Finns who had changed their subsistence strategy from agriculture to hunting and fishing.  There are also documents of intermarriage, although many of the indigenous people retreated to the north...  Ancestors of present-day Finnish speakers possibly migrated from northern Estonia, to which Finns still remain linguistically close, and displaced but also admixed with the local population of Finland, the likely ancestors of today’s Saami speakers.
Which I think is pretty damn cool.  The idea that we can use the genetics and linguistics of people today to infer migratory patterns going back forty thousand years is nothing short of stunning.

Unfortunately, however, I have zero ancestry in Finland or any of the other areas the researchers were studying.  According to 23andMe, my presumed French, Scottish, Dutch, German, and English ancestry was shown to be... French, Scottish, Dutch, German, and English.  No surprise admixtures of genetic information from some infidelity by my great-great-grandmother with a guy from Japan, or anything.

On the other hand, I did have 284 markers associated with Neanderthal ancestry.  Probably explaining why I like my steaks medium-rare and run around more or less naked when the weather's warm.  Which I suppose makes up for my lack of unexpected ethnic heritage.

****************************************



Monday, December 12, 2022

The origins of Thule

There's a logical fallacy called appeal to authority, and it's trickier than it sounds at first.

Appeal to authority occurs when you state that a claim is correct solely because it was made by someone who has credentials, prestige, or fame.  Authorities are, of course, only human, and make mistakes just like the rest of us, so the difficulty lies in part with the word "solely."  If someone with "M.S., Ph.D." after their name makes a declaration, those letters alone aren't any kind of argument that what they've said is correct, unless they have some hard evidence to back them up.

There's a subtler piece of this, though, and it comes in two parts.  The first is that because scientific research has become increasingly technical, jargon-dense, and specialized, laypeople sometimes are simply unqualified to evaluate whether a claim within a field is justified.  If Kip Thorne, Lee Smolin, or Steven Weinberg were to tell me about some new discovery in theoretical physics, I would be in well over my head (despite my B.S. in physics) and ridiculously out of line to say, "No, that's not right."  At that point, I don't have much of a choice but to accept what they say for the time being -- and hope that if it is incorrect, further research and the peer-review process will demonstrate that.  This isn't so much avoiding appeal to authority as it is accepting that bias as an inevitable outcome of my own incomplete knowledge.

The second problem is that sometimes, people who are experts in one field will make statements in another, cashing in on their fame and name recognition to give unwarranted credence to a claim they are unqualified to make.  A good, if disquieting, example of this is the famous molecular geneticist James Watson.  Given that he co-discovered both the double-helical structure of the DNA molecule and the genetic code, anything he had to say about genetic biochemistry should carry considerable gravitas.  On the other hand, he's moved on to making pronouncements about (for example) race that are nothing short of repellent -- including, "I am inherently gloomy about the prospect of Africa [because] all our social policies are based on the fact that their intelligence is the same as ours, whereas all the testing says not really."  Believing this statement "because James Watson said it, and he's a famous scientist" is appeal to authority at its worst.  In fact, he is wildly unqualified to make any such assessment, and the statement reveals little more than the fact that he's an asshole.  (In 2019, that statement and others like it, including ones reflecting blatant sexism, resulted in Watson being stripped of all his honorary titles by Cold Spring Harbor Laboratory.)

My point here is that appeal to authority is sometimes difficult to pin down, which is why we have to rely on knowledgeable people policing each other.  Which brings us to philologist Andrew Charles Breeze.

Breeze has been a professor of philology at the University of Navarra for thirty-five years, and is a noted scholar of the classics.  His knowledge of Celtic languages, especially as used in ancient Celtic literature, is superb.  But he's also, unfortunately, known for his adherence to hypotheses based on evidence that is slim at best.  One example is his claim that the beautiful Welsh legend cycle The Mabinogion was written by a woman, Gwenllian ferch Gruffydd, daughter of Gruffydd ap Cynan, Prince of Gwynedd.  This claim has proven controversial to say the least.  He also has championed the idea that King Arthur et al. lived, fought, and died in Strathclyde rather than in southwestern England, a claim that has been roundly scoffed at.  Even Arthur's existence is questionable, given that his earliest mention in extant literature is Nennius's Historia Brittonum, which was written in 830 C.E., four hundred years after Arthur was allegedly King of the Britons.  As far as where he lived -- well, it seems to me that establishing if he lived is the first order of business.  

But even making the rather hefty assumption that the accounts of Nennius are true, we still have a problem with Breeze's claim.  Arthur's enemies the Saxons didn't really make any serious incursions into Strathclyde until the early seventh century, so an Arthur in Strathclyde would be in the position of fighting the Battle of Badon Hill against an enemy who wasn't there at the time. 

Awkward.

Anyhow, my point is that Breeze kind of has a reputation for putting himself out on the edge.  Nothing wrong with that; that's why we have peer review.  But I also have to wonder about people who keep making claims with flimsy evidence.  You'd think they'd become at least a little more cautious.

Why this comes up is that Breeze just made yet another claim, and this one is on a topic about which I'm honestly qualified to comment in more detail.  It has to do with the origin of the word "Thule."  You probably know that Thule is the name given in classical Greek and Roman literature to the "most northern place."  It was written in Greek as Θούλη, and has been identified variously as the Faeroe Islands, the Shetland Islands, northern Scotland, Greenland, Iceland, Norway, Finnish Lapland, an "area north of Scythia," the island of Saaremaa (off the coast of Estonia), and about a dozen other places.  The problem is -- well, one of many problems is -- there's no archaeological or linguistic evidence that the Greeks ever went to any of those places.  In the absence of hard evidence, you could claim that Thule was on Mars and your statement would carry equivalent weight.

Another difficulty is that even in classical times, the first source material mentioning Thule, written by Pytheas of Massalia, was looked at with a dubious eye.  The historian Polybius, writing only a century and a half after Pytheas's time, scathingly commented, "Pytheas... has led many people into error by saying that he traversed the whole of Britain on foot, giving the island a circumference of forty thousand stadia, and telling us also about Thule, those regions in which there was no longer any proper land nor sea nor air, but a sort of mixture of all three of the consistency of a jellyfish in which one can neither walk nor sail, holding everything together, so to speak."

Well, Breeze begs to differ.  In a recent paper, he said that (1) Thule is for sure Iceland, and (2) the Greeks (specifically Pytheas and his pals) got to Iceland first, preceding the Vikings by a thousand years.

[Image is in the Public Domain]

Bold claim, but there are a number of problems with it.

First, he seems to be making this claim based on one thing -- that the Greek word for Thule (Θούλη) is similar to the Greek word for altar (θῠμέλη), and that the whole thing was a transcription error in which the vowel was changed (ού substituted for ῠ) and the middle syllable (μέ) dropped.  Well, this is exactly the kind of thing I specialized in during my graduate studies, and I can say unequivocally that's not how historical linguistics works.  You can't just jigger around syllables in a couple of words and say "now they're the same, q.e.d."

He says his idea is supported by the fact that from the sea, the southern coast of Iceland looks kind of like an altar:

The term Thymele may have arisen from the orographic features of the south of the island, with high cliffs of volcanic rock, similar to that of Greek temple altars.  Probably, when Pytheas and his men sighted Iceland, with abundant fog, and perhaps with columns of smoke and ashes from volcanoes like Hekla, he thought of the altar of a temple.

This is what one of my professors used to call "waving your hands around in the hopes of distracting the audience into thinking you have evidence."  Also, the geologists have found evidence of only one major eruption in Iceland during Pytheas's lifetime -- the Mývatn eruption in around 300 B.C.E. -- and it occurred in the north part of Iceland, over three hundred kilometers from the southern coast of the island.

Oops.

A second thing that makes me raise an eyebrow is where the paper is published -- the Housman Society Journal, which is devoted to the study of the works of British classicist and poet A. E. Housman.  If Breeze's claim was all that and a bag of crisps, why hasn't it been published in a peer-reviewed journal devoted to historical linguistics?

Third, there's another classical reference to Thule that puts Breeze's claim on even thinner ice, which is from Strabo's Geographica, and states that when Pytheas got to Thule, he found it already thickly inhabited.  There is zero evidence that Iceland had any inhabitants prior to the Vikings -- it may be that the Inuit had summer camps in coastal western Iceland, but that is pure speculation without any hard evidential support.  The earliest Norse writings about Iceland describe it as "a barren and empty land, devoid of people."  Despite all this, Strabo writes:

The people [of Thule] live on millet and other herbs, and on fruits and roots; and where there are grain and honey, the people get their beverage, also, from them.  As for the grain, he says, since they have no pure sunshine, they pound it out in large storehouses, after first gathering in the ears thither; for the threshing floors become useless because of this lack of sunshine and because of the rains.

Oops again.

I can say from experience that establishing linguistic evidence for contact between two cultures is difficult, requires rigorous evidence, and can easily be confounded by chance similarities between words.  My own work, which involved trying to figure out the extent to which Old Norse infiltrated regional dialects of Old English and Archaic Gaelic, was no easy task (and was made even more difficult by the fact that two of the languages, Old Norse and Old English, share a relatively recent common root language -- Proto-Germanic -- so if you see similarities, are they due to borrowing or parallel descent?  Sometimes it's mighty hard to tell).

I'm not in academia and I'm in no position to write a formal refutation of Breeze's claim, but I sure as hell hope someone does.  Historical linguistics is not some kind of bastard child of free association and the game of Telephone.  I've no doubt that Breeze's expertise in the realm of ancient Celtic literature is far greater than mine -- but maybe he should stick to that subject.

****************************************


Thursday, September 9, 2021

The voices of the Aztecs

When a region is conquered, one of the first things the conquerors usually do is to suppress (or explicitly outlaw) indigenous languages.

One reason is purely practical -- to eliminate the possibility that members of the subjugated group can communicate with each other without being understood.  The other, however, is more insidious.  Language is a huge part of culture, and if you want to destroy the native society (or, more accurately, replace it with your own, something euphemistically called "assimilation"), you must eliminate the most vital part of that culture -- how its members communicate with each other, how they express poetry and ethnic history and local knowledge.

Destroy the language, and you've struck at the heart of the culture itself.

An excellent (if tragic) case in point is Australia.  It is the home of over three hundred languages, 170 of which are indigenous.  (This is one of the reasons why indigenous Australians dislike the word "Aborigine" about as much as Native Americans do "Indian": it implies the wildly incorrect assessment that the entire indigenous population is a single culture.)  What is appalling, though, is that even if you exclude English -- the most widely spoken language in Australia -- none of the top-ten-most-spoken languages in Australia are indigenous.  (In order, they are: Mandarin, Arabic, Cantonese, Vietnamese, Greek, Italian, Tagalog, Hindi, Spanish, and Korean.)  Only a quarter of a percent of Australian citizens speak an indigenous language at home.  Of the 170 indigenous languages that still survive (i.e. with at least some native speakers), all but fifteen are classified as severely endangered, with virtually no one learning them as children.  All of the speakers of those remaining 155 unique languages are elderly, and with the passing of that generation, they'll be gone forever except as a curiosity amongst linguists.

Not all indigenous languages are in quite that bad a shape.  One somewhat more hopeful case is Nahuatl, the language of the pre-Spanish-conquest Aztecs in central Mexico.  The clash of the Spanish and native cultures in the Americas is rightly depicted as the worst of the worst -- between the conquering armies and the self-righteous (and often just as violent) Christian missionaries, only a few decades after conquest there usually wasn't much left of the original language, art, music, and religion.  In the case of central Mexico, however, the conquerors took a more nuanced approach, introducing the Latin alphabet but allowing native speakers to continue using their own language.  In fact, during the sixteenth and seventeenth centuries, the missionaries did a decent job writing Nahuatl grammars and dictionaries, and during that time there were hundreds of works written in the language, including administrative documents as well as poetry, stories, histories, and religious codices.  Most striking of all -- and, as far as I know, unique in the history of contact between conquerors and the conquered -- in 1536, only twenty years after the arrival of the Spanish, the Colegio de Santa Cruz de Tlatelolco was founded, where bilingual classes were offered to teach Nahuatl to the missionaries and Spanish to the natives.  It wasn't until 1696 that King Charles II of Spain outlawed Nahuatl, but by that time enough of the Mexican Spanish upper-crust spoke Nahuatl themselves that it was pretty much too late to do anything about it.

As a result, there are still 1.5 million speakers of Nahuatl in Mexico.  Not bad, considering the moribund nature of most of the indigenous languages in the world.

The reason this comes up is a discovery, the subject of a paper in Seismological Research Letters a couple of weeks ago, that sits at the intersection between historical linguistics and another fascination of mine -- geology.  A recently deciphered fifty-page codex in Nahuatl turns out to describe a series of massive earthquakes that hit central Mexico between 1460 and 1542, including one that triggered a flood resulting in the drowning of eighteen hundred warriors.

The codex itself was created by Aztec tlacuilos ("those who write with painting") and is made up of pictograms that predate the adoption of the Latin alphabet by speakers of Nahuatl.  One of the most striking is a combination of four projections like the vanes of a windmill around a central circle, followed by a rectangle filled with dots.  The windmill-like symbol is the pictogram for the word ollin, meaning "movement;" the rectangle is tlalli, meaning "earth."  Taken together, it means "earthquake."  Further, if the central circle is open, it indicates that the quake happened during the daytime, and if it's closed, it happened at night.

You can see the composite pictogram for "earthquake" in the lower right; all the way at the bottom is a depiction of the unfortunate warriors who drowned in the resulting flood.

As far as the timekeeping goes, the Aztecs -- like many Central American cultures -- were obsessive about the calendar, and had a 52-year calendrical cycle represented by four symbols -- tecpatl (knife), calli (house), tochtli (rabbit) and acatl (reed) -- each combined with the numbers one through thirteen.  Decoding that system allowed researchers to figure out that the earthquake that killed the warriors took place in 1507.
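For the curious, here's where the 52 comes from, at least as I understand the system: each year's name pairs one of the four symbols with a number from one to thirteen, and both tick forward by one every year.  Because four and thirteen share no common factor, it takes 4 × 13 = 52 years before a name repeats.  What follows is a minimal Python sketch under that assumption; the function name and the choice of starting year name are mine for illustration, and are not taken from the codex or the paper.

```python
# The four year symbols, in the order listed above.
SIGNS = ["tecpatl", "calli", "tochtli", "acatl"]


def year_names(count):
    """Return `count` consecutive year names as (number, sign) pairs.

    Assumption: the number (1-13) and the sign both advance by one
    each year, and the list arbitrarily starts at 1 tecpatl.
    """
    return [(n % 13 + 1, SIGNS[n % 4]) for n in range(count)]


cycle = year_names(52)
print(cycle[:3])        # [(1, 'tecpatl'), (2, 'calli'), (3, 'tochtli')]
print(len(set(cycle)))  # 52 -- every name in one full cycle is distinct
```

The fifty-third name is the same as the first, which is why a date in this system is only pinned down to within a 52-year window unless other evidence narrows it.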

At night.

It's simultaneously fascinating and sad how few of the world's cultures have left significant traces for us to study, and of course that's largely humanity's own fault.  For example, the campaign of suppression by the Romans two-and-a-half millennia ago eliminated virtually every last trace of Etruscan -- there are over thirteen thousand inscriptions in Etruscan known to archaeologists, and they've been able to decipher only a fraction of them.  I can only hope that the endangered languages of our own time are treated more kindly.  What a pity it would be if in three thousand years, of the estimated 6,500 languages currently spoken, the only ones our descendants will be able to read are Mandarin, English, Hindi, Spanish, French, and Arabic.

*********************************

My friends know, as do regular readers of Skeptophilia, that I have a tendency toward swearing.

My prim and proper mom tried for years -- decades, really -- to break me of the habit.  "Bad language indicates you don't have the vocabulary to express yourself properly," she used to tell me.  But after many years, I finally came to the conclusion that there was nothing amiss with my vocabulary.  I simply found that in the right context, a pungent turn of phrase was entirely called for.

It can get away from you, of course, just like any habit.  I recall when I was in graduate school at the University of Washington in the 1980s that my fellow students were some of the hardest-drinking, hardest-partying, hardest-swearing people I've ever known.  (There was nothing wrong with their vocabularies, either.)  I came to find, though, that if every sentence is punctuated by a swear word, the words lose their power, becoming no more than a less-appropriate version of "umm" and "uhh" and "like."

Anyhow, for those of you who are also fond of peppering your speech with spicy words, I have a book for you.  Science writer Emma Byrne has written a book called Swearing Is Good for You: The Amazing Science of Bad Language.  In it, you'll read about honest scientific studies that have shown that swearing decreases stress and improves pain tolerance -- and about fall-out-of-your-chair hilarious anecdotes like the chimpanzee who uses American Sign Language to swear at her keeper.

I guess our penchant for the ribald goes back a ways.

It's funny, thought-provoking, and will provide you with good ammunition the next time someone throws "swearing is an indication of low intelligence" at you.  
