Discussion about this post

Daniel Reeves

Musk vs McGurk

I'm thinking more about my final Random Roundup item about Waymo vs Tesla and sensor fusion. There's a theoretical sense in which adding a sensor like lidar can't make a car more dangerous. At worst the car can ignore input from the lidar, if it can't figure out how to reconcile what the lidar vs the cameras are telling it. But of course it can and does reconcile it. Just like, presumably, Tesla's FSD reconciles the inputs from the many different cameras it has.

Consider how the human brain handles conflicting sensors, as in the mind-melting McGurk effect: https://www.youtube.com/watch?v=2k8fHR9jKVM

It turns out that your brain puts a lot of weight on its camera inputs aka your eyes. If your eyes see one thing while your ears are telling you something incompatible with that, your subconscious brain just overrides that auditory signal and feeds your conscious brain *different sounds* — sounds that are compatible with the visual signal. If that strikes you as blatantly, idiotically false then you'll definitely want to follow that link to a demonstration of the McGurk effect!

Could the McGurk effect be the key to steelmanning Musk's claim that multiple conflicting sensors can reduce safety? (Thanks again to GPT-5.1-Thinking for pointing this out.) After all, in the demonstration, your lying eyes (or the lying video, technically) cause you to hear a blatantly incorrect sound!

I think the answer is no: your brain is brilliantly, optimally fusing different inputs. Say you're chatting with someone in a loud room. Your brain combines what you hear with what you see the person's lips doing. You don't have to be able to read lips for the visual signal to help you disambiguate what you're hearing. Multiple inputs can give a big accuracy boost.
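To make "optimal fusion" concrete, here's a minimal sketch of precision-weighted averaging, the textbook way to combine two independent Gaussian estimates of the same quantity. The numbers are invented for illustration, not anything the brain literally computes:

```python
def fuse_gaussian(mu_a, var_a, mu_b, var_b):
    """Fuse two independent noisy estimates of the same quantity.

    Each estimate is weighted by its precision (1/variance); the fused
    estimate is more certain than either input alone.
    """
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    mu = (w_a * mu_a + w_b * mu_b) / (w_a + w_b)
    return mu, 1.0 / (w_a + w_b)

# Ears alone: a noisy guess on some 1-D phoneme-cue axis (variance 1.0).
# Eyes alone: a sharper guess from the lip movements (variance 0.1).
mu, var = fuse_gaussian(mu_a=0.8, var_a=1.0, mu_b=0.2, var_b=0.1)
print(mu, var)  # ~0.25, ~0.09: near the visual cue, sharper than either
```

Note how the fused estimate leans toward the lower-variance (visual) input. That's exactly the weighting the McGurk effect exploits.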

So then the McGurk effect is basically an adversarial exploit of that system. It works by contriving a scenario that's pretty much impossible in the real world and forcing your brain to make a choice about what you're hearing. The trick is to superimpose the audio for sound A onto the video for sound B. The prior probability on that is near zero. Your subconscious brain concludes, implicitly, that either your eyes or your ears are just wrong. Which to believe? Your eyes are giving you a high-fidelity, less ambiguous signal. The sound is more likely to be distorted. So that's what you, your conscious brain, think you hear: the "corrected" sound.
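Here's a toy version of that forced choice. In the classic demo the dubbed audio is "ba," the lips articulate "ga," and most people report hearing "da." With made-up likelihoods and the brain's single-cause assumption, the math lands in the same place:

```python
# The brain assumes ONE phoneme caused both the audio and the video,
# so it multiplies the likelihoods. All numbers invented for illustration.
phonemes = ["ba", "da", "ga"]
prior = {p: 1 / 3 for p in phonemes}

# P(audio cue | phoneme): the dubbed audio really is "ba", but audio is
# noisy, so "da" is a plausible distortion of it.
like_audio = {"ba": 0.6, "da": 0.3, "ga": 0.1}
# P(visual cue | phoneme): "ga" fits the lips best, "da" looks similar,
# and "ba" (lips closing) is nearly impossible given what the eyes see.
like_visual = {"ba": 0.01, "da": 0.4, "ga": 0.59}

unnorm = {p: prior[p] * like_audio[p] * like_visual[p] for p in phonemes}
z = sum(unnorm.values())
print({p: round(v / z, 2) for p, v in unnorm.items()})
# {'ba': 0.03, 'da': 0.65, 'ga': 0.32} -- "da" wins: a percept that
# matches neither sensor alone, just like the illusion.
```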

(Similar mind-meltingness happens strictly within your vision system too. Like the gaping hole near the middle of your field of vision where your optic nerve goes through your retina. Two eyes can cover for each other, each filling in the other's gap accurately. But even with one eye open, your brain just infers what you *ought* to be seeing in that hole and makes your conscious brain think you're seeing it, the same way an image diffusion model hallucinates plausible details.)

Back to Musk's claim that when sensors disagree it hurts accuracy: I'm claiming the McGurk effect is the exception that proves the rule. Literally: the McGurk failure shows what powerful sensor fusion the brain is capable of. It's making the optimal Bayesian update and causing you to perceive what is most likely to be the truth. Only with the additional evidence of learning about video editing and the McGurk effect can you do any better.

(Except, no, just kidding, your subconscious brain refuses to make that update and the McGurk effect keeps right on working despite yourself. Oh well. It's still an amazing tradeoff, improving your hearing accuracy in all real-world situations at the tiny cost of being wrong in that one McGurk video.)

In conclusion, more sensors more better. Imagine you're a self-driving car getting conflicting information from cameras vs lidar about how far away a grand piano in the middle of the road is. Lidar's great at measuring distance, so disregarding the cameras might be correct. Better yet, conservatively take the distance-to-piano number to be the min of the two. Or, in full generality, start with a prior probability distribution and update it repeatedly based on the evidence of all your input streams.
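That last idea, a prior plus repeated updates, is the standard Gaussian measurement update at the heart of the Kalman filter linked below. A minimal sketch, with invented sensor numbers:

```python
def bayes_update(mu, var, z, z_var):
    """Fold one noisy measurement z (variance z_var) into a Gaussian belief."""
    k = var / (var + z_var)          # gain: how much to trust the measurement
    return mu + k * (z - mu), (1 - k) * var

# Vague prior over distance-to-piano (meters), then one update per sensor.
mu, var = 50.0, 400.0
mu, var = bayes_update(mu, var, z=38.0, z_var=25.0)  # camera: noisy range
mu, var = bayes_update(mu, var, z=30.0, z_var=0.04)  # lidar: precise range
print(mu, var)          # ~30.0, ~0.04: the posterior hugs the lidar reading
print(min(38.0, 30.0))  # or conservatively plan for the nearer piano
```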

Related reading:

* https://www.lesswrong.com/w/predictive-processing

* https://en.wikipedia.org/wiki/Kalman_filter

* https://pubmed.ncbi.nlm.nih.gov/40569419/ (Apparently not everyone is on board with the Bayesian brain idea)

PS: Oh look, I ended up writing a 767-word comment. Now to decide if it's too cheap to count this for my Inkhaven post today. Or maybe I'll take a poll on how superfluous of an elaboration on the original bullet item this is. If enough people say "not superfluous" maybe it's worth repeating as next week's AGI Friday?

Neural Foundry

Using LLMs as an uber thesaurus is such a smart framing. The line you draw between words and phrases feels right; it's about keeping your voice intact while using AI as a thinking partner. That bit about Christopher Moravec's "don't be a secret cyborg" nails it. The cringe factor when reading AI-generated prose is real: even when it's technically correct, it just feels hollow.
