Skeptophilia (skep-to-fil-i-a) (n.) - the love of logical thought, skepticism, and thinking critically. Being an exploration of the applications of skeptical thinking to the world at large, with periodic excursions into linguistics, music, politics, cryptozoology, and why people keep seeing the face of Jesus on grilled cheese sandwiches.

Tuesday, December 20, 2022

Language machines

If you've ever used Google Translate, you've probably noticed that it can be a little wonky.

Take, for example, the anecdote about the French guy who was wooing an American woman long-distance, and texted her, "Prends une photo coquine pour moi."  ("Take a naughty picture for me.")  The woman wasn't certain what that meant, so she popped it into Google Translate, and was told it meant, "Take a photo for me, slut."

I think my favorite, though, is some feedback that a company called Koyu Matcha Green Tea received via their website, from a customer in Finland.  When they ran what the customer wrote through a Finnish-to-English Google Translate, it came out as the following:
If it resonated with cold to the bone?  Matcha Latté is guaranteed fireman, green tea with hot steamed milk.  Behold, thou hast already tasted.
Um... thanks?  We think?

The difficulty is that languages are complex entities, full of idioms and peculiarities and exceptions, so trying to find a mechanistic, totally rule-based way to characterize them is somewhere beyond tricky.  But because of the work of a Ph.D. student at the University of Cambridge, we have come one step closer to doing exactly that -- at least for Sanskrit.

About 2,500 years ago, a man named Dakṣiputra Pāṇini living in what is now northwestern Pakistan wrote a work called Aṣṭādhyāyī, which created a set of rules for the morphology -- the way words, prefixes, suffixes, and so on combine -- of the Sanskrit language.  An example of linking together these fragments, called morphemes, in English is the word incomprehensibly -- made up of in- (prefix meaning "not"), comprehend (stem of the word, altered to replace /d/ with /s/), -ible (suffix meaning "capable of"), and -ly (adverbial marker), in that order.

Imagine trying to come up with a list of rules for all the ways morphemes can combine in English, such that the rules only produced well-formed words and not garbled messes like iblecomprehendlyin.

That's what Pāṇini tried to do for Sanskrit.

The problem is that Pāṇini's rules seemed sometimes to lead to self-contradictions.  Given a particular combination of morphemes, there are often two or more rules that apply, so which should you use?  Linguists analyzing the rule-set discovered that Pāṇini had written a "metarule" -- a rule determining how other rules should be applied -- which said that if two rules seem to conflict, the "later rule should take precedence."  Everyone had interpreted this to mean that the one mentioned later in the book was the more important.

But that sometimes led to ungrammatical words.  So something was off, but what?

Enter Cambridge student Rishi Rajpopat, who had been toiling over Pāṇini's rules for months.  Then he had a brainstorm: what if the problem was that the metarule itself had been mistranslated?  He reinterpreted the metarule to mean that if two rules are in conflict, the one that applies to the latter part of the word (the suffix) takes precedence over the one that applies to the first part of the word (the stem).

With that one change in interpretation, Pāṇini's rule system works to combine morphemes and produces grammatically correct words almost one hundred percent of the time.
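To make the two readings of the metarule concrete, here is a minimal Python sketch.  Everything in it is a toy assumption for illustration: the `Rule` class, the rule names, and the numbers are hypothetical and are not taken from the Aṣṭādhyāyī or from Rajpopat's work.  It only shows how the same conflict gets resolved differently depending on what "later" is taken to mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str          # hypothetical label, not a real rule from the text
    book_order: int    # where the rule appears in the rule list (bigger = later)
    target_index: int  # which morpheme it acts on (bigger = further right)


def pick_by_book_order(conflicting):
    """Traditional reading: the rule listed later in the book wins."""
    return max(conflicting, key=lambda r: r.book_order)


def pick_by_word_position(conflicting):
    """Reading described above: the rule acting on the later part of the word wins."""
    return max(conflicting, key=lambda r: r.target_index)


# Two made-up rules that both apply to the same stem + suffix combination.
stem_rule = Rule("stem-rule", book_order=7, target_index=0)
suffix_rule = Rule("suffix-rule", book_order=2, target_index=1)
conflict = [stem_rule, suffix_rule]

print(pick_by_book_order(conflict).name)
print(pick_by_word_position(conflict).name)
```

In this toy case the stem rule happens to come later in the list, so the traditional reading picks it, while the reinterpreted reading picks the suffix rule because it acts on the right-hand part of the word.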

Which, of course, is a cause for much rejoicing amongst both linguists and people who are attempting to create high-quality translation software.

I wonder, though, how any such attempt would fare for English.  English is an amalgam of a Germanic root language, with heavy borrowing from French, Latin, and Spanish, and less-frequent (but still significant) borrowing from Old Norse, Italian, Greek, Dutch, Gaelic, and several Indigenous American languages.  This has introduced spellings, pronunciations, and morphologies that defy easy characterization.


Even some of the simple rules you learned in elementary school can't be applied with anything like real consistency.  "I before e except after c" -- unless your weird foreign neighbor Keith forfeits eight beige sleighs to a feisty caffeinated weightlifter.

You see the difficulty.
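If you want to see just how badly the rule fares, here is a small Python sketch that tests it against the sentence above.  The function name `breaks_i_before_e` and the way the rule is encoded (flag "ei" when it doesn't follow a "c," and flag "ie" when it does) are my own simplification for illustration, not a serious model of English spelling.

```python
import re


def breaks_i_before_e(word: str) -> bool:
    """Return True if the word violates a naive 'i before e except after c' rule."""
    w = word.lower()
    ei_without_c = re.search(r"(?<!c)ei", w)  # "ei" not directly after a "c"
    ie_after_c = re.search(r"cie", w)         # "ie" directly after a "c"
    return bool(ei_without_c or ie_after_c)


sentence = (
    "your weird foreign neighbor Keith forfeits eight beige sleighs "
    "to a feisty caffeinated weightlifter"
)
offenders = [w for w in sentence.split() if breaks_i_before_e(w)]
print(offenders)
```

Every content word in the sentence ends up on the offenders list, while well-behaved words like "believe" and "receive" pass the check.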

So as much as I'm impressed by Rajpopat's accomplishment, I don't think it's going to go very far toward fixing Google Translate's problem.

No matter.  The delight of being told the tea is so good it's "guaranteed fireman" makes up for any potential awkwardness incurred because you accidentally called your girlfriend an unpleasant name while attempting to initiate sexytimes.  You gotta take the good with the bad.

****************************************


Thursday, September 12, 2019

A new view of the Indus Valley

It's always fun when I stumble across some research that ties together three of my fascinations -- linguistics, genetics, and unsolved mysteries.

The research in question was published this week in Science, and gives us a new lens into the mysterious Indus Valley (or Harappan) Civilization.  This civilization, which started some time around 3,300 B.C.E. and lasted for a good two thousand years, flourished in what is now the western part of India and eastern part of Pakistan, producing massive cities, temples, a distinctive form of pottery, work in tin, bronze, lead, and copper, and a mysterious script that no one has been able to decipher (in fact, some linguists don't even believe it's a written language -- possibly just a set of non-linguistic symbols).

A Harappan seal with an example of the Indus Valley script [Image licensed under the Creative Commons PHGCOM IndusValleySeals.JPG, Indus seal impression, CC BY-SA 3.0]

Considering the extent of the artifacts and archaeological sites the civilization left behind, it's amazing how little we know for sure about them.  Their affiliations to other groups who were around at the same time (especially in the Middle East), what language they spoke, what religion they practiced -- all are inferences based on relatively scanty evidence.

This latest research adds a significant piece to what we know for sure about the mysterious Harappans.  The researchers who conducted it, a team made up of scientists from Washington University, Harvard Medical School, and the University of Vienna, looked at DNA extracted from 523 skeletons in the region dating all the way back to twelve thousand years ago.  The scientists were trying to shed light on two questions.  First, where did the Indus Valley Civilization's agricultural knowhow come from -- was it a local invention/innovation, or was it brought into a previous hunter-gatherer society by an influx of migrants?  And second, what is the origin of the languages spoken in the region then and now?

The answer to the first question seems to be that Harappans' agriculture was an innovation of their own.  The researchers found traces of DNA from contemporaneous farming cultures no nearer than Iran, and no evidence that those farmers got any farther than that.  So it seems like the Indus Valley transition from hunters to farmers was something they figured out for themselves.

What the research uncovered vis-à-vis the second question was that there was a DNA signature from European hunter-gatherers, but a smaller one than expected.  The usual linguistic model is that a major language shift is caused by a large influx of migrants (consider the shifts from Native languages to English in Australia and the Americas).  Here, there was not nearly enough European and Middle Eastern DNA to explain the shift to an Indo-European language; the Eurasians who showed up there, the Yamnaya people, were apparently present in fairly small numbers.  What's fascinating, though, is that Yamnaya DNA is disproportionately present in modern-day Indians of the highest social classes -- since social class has traditionally been hereditary in Indian culture, the surmise is that the Indo-European-speaking Yamnaya were in charge, and their language ended up superseding (or at least strongly influencing) the language(s) spoken at the time.

It's kind of analogous to the influence Norman French had on Old English in the years after the Norman Invasion in 1066 C.E.  Most of our terms that have to do with governance come from Latin via French, while a lot of the basic vocabulary (pronouns, prepositions, and so on) is from the original Germanic language.  Even more interesting is that the Norman Invasion left pairs of parallel words associated with food -- the one used for the animal as it's found on the farm is from the Old English peasants who raised it, and the one for the meat as it's seen on the table is from the Norman French aristocracy who only came in contact with the animal after it was cooked.  (Thus cow/beef, sheep/mutton, pig/pork, chicken/poultry, and so on.)  As the Indo-European influx into India happened millennia earlier, you have to wonder if those kinds of word pairs existed for a while there, too, eventually being swamped by the higher-prestige Indo-European verbiage.

So this research gives us one more piece of the puzzle regarding a group of people about whom we've known relatively little, despite their being ancestral to the vast majority of the population on the Indian Subcontinent.  And, of course, this is nowhere near the last word on the subject.  We'll continue to uncover more, and refine our understanding of the Harappans -- a civilization that has been gone for more than three thousand years.

********************************************

This week's Skeptophilia book recommendation is pure fun: science historian James Burke's Circles: Fifty Round Trips Through History, Technology, Science, and Culture.  Burke made a name for himself with his brilliant show Connections, where he showed how one thing leads to another in discoveries, and sometimes two seemingly unconnected events can have a causal link (my favorite one is his episode about how the invention of the loom led to the invention of the computer).

In Circles, he takes us through fifty examples of connections that run in a loop -- jumping from one person or event to the next in his signature whimsical fashion, and somehow ending up right back where he started.  His writing (and his films) always has an air of magic to me.  They're like watching a master conjuror create an illusion, and seeing what he's done with only the vaguest sense of how he pulled it off.

So if you're an aficionado of curiosities of the history of science, get Circles.  You won't be disappointed.
