I've long been fascinated by the phenomenon of priming, where our interpretation of a sensory stimulus is altered by what we expect to see or hear. An excellent example of priming is this famous image:
It works in the auditory realm, too. My wife and I are absolutely addicted to the wonderful British series The Great Pottery Throwdown, where a group of twelve amateur potters participate in a series of challenges and ultimately are whittled down to three finalists and a single winner. Carol and I are both potters -- I won't speak for her, but I can say with confidence that if I were on Throwdown I would be eliminated in the first round -- and it's astonishing what these artists can create given the demands and time constraints. (I also really enjoy how kind they are to each other. Although it's a competition, they help each other, and everyone seems genuinely heartbroken every time one of them gets sent home.) Well, we're re-watching one of the early seasons, and there's a young woman on the show with a pronounced Welsh accent. Even though I'm usually pretty good at understanding people from the UK, I'm baffled by something like half of what she says...
... until we turn on captioning. Then I have no problem. And it's not just that I'm reading along (although I certainly am) -- it really seems like her voice is much more understandable with that little bit of help.
The reason this comes up is a recent study by Cambridge University engineer Václav Volhejn, who is working with sine-wave speech, a voice simulation using a mixture of pure tones (sine waves). The result sounds like someone trying to imitate human speech using a slide whistle. (You can read how he creates the audio here.) If I close my eyes, I can barely get anything from it -- maybe a word here or there. But once I'm given cues about what I'm supposed to hear, suddenly it seems obvious. The effect lasts, too. If I turn off captioning and go back and listen to the audio again, I can still understand it nearly perfectly.
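For the curious, the basic trick behind sine-wave speech can be sketched in a few lines of code. This is my own minimal illustration, not Volhejn's actual method: real sine-wave speech replaces each formant (resonant frequency band) of a recorded voice with a pure tone that follows the formant's track through time; here the "formant" tracks are simply made up for demonstration.

```python
import numpy as np

def sine_wave_speech(formant_tracks, amplitudes, sr=16000):
    """Sum one gliding sine tone per formant track (frequencies in Hz)."""
    out = np.zeros(len(formant_tracks[0]))
    for freqs, amps in zip(formant_tracks, amplitudes):
        # Integrate frequency to phase so each tone glides smoothly
        # instead of clicking when its frequency changes.
        phase = 2 * np.pi * np.cumsum(freqs) / sr
        out += amps * np.sin(phase)
    return out / len(formant_tracks)  # keep the waveform within [-1, 1]

# Hypothetical tracks: three formant-like tones over half a second.
sr, dur = 16000, 0.5
n = int(sr * dur)
tracks = [np.linspace(300, 800, n),    # F1-like upward glide
          np.linspace(1200, 1800, n),  # F2-like upward glide
          np.full(n, 2500.0)]          # steady F3-like tone
amps = [np.ones(n), 0.5 * np.ones(n), 0.25 * np.ones(n)]
audio = sine_wave_speech(tracks, amps, sr)
```

Played back, a signal like this sounds like whistling glides rather than a voice -- which is exactly why the priming effect is so startling when the same audio snaps into intelligibility.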
How this all works is not understood, but probably has something to do with how our brain accomplishes recall. A 1994 study found that we're primed to recognize words faster if we have prior exposure to semantically related words; shown the word dog, for example, we recognize the word wolf more quickly than if we encounter it without the prime. We're also primed to anticipate -- and therefore more quickly recognize -- words that are commonly found in association (lot would be primed by parking), or words that have similar sounds even if they're semantically unrelated (ground would be primed by round). That it has something to do with the brain's recall network is supported by research suggesting that priming effects vanish very early in the development of dementia; apparently even before significant cognitive impairment occurs, dementia patients lose their ability to make these kinds of efficient associations.
What's strangest, though, is that you can be primed two different ways with equal strength. This article from Stranger Dimensions contains an audio clip of sine-wave speech that can be primed to sound like either green needle or brainstorm -- which have almost nothing in common phonetically, and don't even have the same number of syllables. Which you hear depends on which text you're looking at, and if you're like me you can go back and forth indefinitely, from exactly the same audio input.
Then there's the McGurk effect, where what we see actually overrides what we hear so completely that it can cause us not to understand what's coming in through our ears. The two syllables ba and va sound a great deal alike, but they differ in how the initial consonant is produced; /b/ is a voiced bilabial stop, /v/ a voiced labiodental fricative. But when we see someone's mouth moving in an audio/video clip that's been altered to make it look like he's saying va when he's actually saying ba, we hear va. It's absolutely convincing. Somehow, we're primed by seeing his mouth move -- which may help explain why it's usually easier to understand someone face to face than on the telephone.
All of this is further evidence of a point I've made many times here at Skeptophilia: what you perceive is incomplete, inaccurate, and dependent on a great many external and internal conditions that can change from one moment to the next. "I know it happened that way, I saw it with my own eyes!" is fairly close to nonsense. Oh, sure; for most of us, our sensory-perceptual systems work well enough to get by on. But the idea that what we seem to perceive is some kind of perfect transcription of reality is simply wrong.
It's humbling and a little frightening how easily fooled we are, but the implications for how our brain retrieves stored information are absolutely fascinating. So even if we should be a little more cautious about trusting the accuracy of our own perceptions and memories, it does open a window onto how our brains make sense of the world we live in.
