How the cochlea computes (2024)
The thesis about human speech occupying a less crowded part of the spectrum aligns well with the book "The Great Animal Orchestra" (https://www.amazon.com/Great-Animal-Orchestra-Finding-Origin...).
The author details how the "dawn chorus" is composed of a vast number of species making noise at once, each still able to pick out mating calls and other signals because their vocalizations have evolved into unique sonic niches.
It's quite interesting but also a bit depressing as he documents the decline in intensity of this phenomenon with habitat destruction etc.
> A Fourier transform has no explicit temporal precision, and resembles something closer to the waveforms on the right; this is not what the filters in the cochlea look like.
Perhaps the ear does something more vaguely analogous to a discrete Fourier transform on samples of data, which is what we do in a lot of signal processing.
In signal processing we take windowed samples and do discrete transforms on them. These do give us some temporal precision.
There is a trade-off there between frequency and temporal precision, analogous to the Heisenberg uncertainty principle in quantum mechanics. The better we know a frequency, the less precisely we know the timing. Only an infinite, periodic signal has a single precise frequency (or a precise set of harmonics), appearing as infinitely narrow blips in the frequency domain.
The continuous Fourier transform integrates over the entire time axis. We transform an entire function like sin(x) over the whole domain. If that domain is interpreted as time, we are including all of eternity, so to speak, from negative infinite time to positive.
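A minimal sketch of that window-length trade-off (my own illustration, not from the article; assumes NumPy and SciPy): two tones 12 Hz apart are resolved by a long analysis window but blur into one peak with a short one, while the short window confines the analysis to a much smaller time span.

    import numpy as np
    from scipy.signal import find_peaks

    fs = 8000
    t = np.arange(0, 2.0, 1 / fs)
    x = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 452 * t)  # tones 12 Hz apart

    for n_win in (512, 8192):                        # short vs. long analysis window
        seg = x[:n_win] * np.hanning(n_win)          # windowed sample, as described above
        mag = np.abs(np.fft.rfft(seg))
        freqs = np.fft.rfftfreq(n_win, 1 / fs)
        band = (freqs > 400) & (freqs < 500)
        peaks, _ = find_peaks(mag[band], height=0.5 * mag.max())
        print(f"{n_win} samples ({1000 * n_win / fs:.0f} ms window): "
              f"bin spacing {fs / n_win:.1f} Hz, peaks resolved near 440-452 Hz: {len(peaks)}")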
Nit: It’s an unfortunate confusion of naming conventions, but Fourier Transform in the strictest sense implies an infinite “sampling” period, while the finite “sample” period counterpart would correspond to Fourier Series even though we colloquially refer to them interchangeably.
(I put "sampling" in quotes because it's really an "integration period" in this context of continuous-time integration, though that term is less immediately evocative of the concept people are colloquially familiar with. If we further impose a constraint of finite temporal resolution, so that it is honest-to-god "sampling", then it becomes the Discrete Fourier Transform, of which the Fast Fourier Transform is one implementation.)
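For concreteness, here are the three objects being distinguished, written in one common sign/normalization convention (a reference sketch, not from the article):

    % Fourier transform: integration over all time (the infinite "sampling" period)
    X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-i 2\pi f t}\, dt

    % Fourier series: finite integration period T, giving discrete harmonics k/T
    c_k = \frac{1}{T} \int_{0}^{T} x(t)\, e^{-i 2\pi k t / T}\, dt

    % Discrete Fourier transform: N time samples (this is what the FFT computes)
    X_k = \sum_{n=0}^{N-1} x_n\, e^{-i 2\pi k n / N}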
It is this strict definition that the article title is rebuking, but it's not quite what colloquial usage loosely evokes in most people's minds when we say Fourier Transform as an analysis tool.
So this article should really have been comparing against Fourier Series analysis rather than the Fourier Transform in the pedantic sense, albeit that would be a bit less provocative.
Regardless, it doesn't at all take away from the salient points of this excellent article, which are a really interesting reframing of the concepts: what the ear does mechanistically is apply a temporal "weighting function" (filter), so it's somewhere between a Fourier series and a Fourier transform. The article hits the nail on the head in presenting the sliding scale of conjugate-domain trade-offs (think: Heisenberg).
To summarize: the ear does not do a Fourier transform, but it does do a time-localized frequency-domain transform akin to wavelets (specifically, intermediate between wavelet and Gabor transforms). It does this because the sounds processed by the ear are often localized in time.
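A toy way to see that distinction (my own sketch, assuming NumPy; the atom definitions are illustrative, not the article's model): a Gabor analysis keeps the same envelope duration at every frequency, a wavelet analysis scales the envelope with the period, and the article's point is that cochlear filters sit somewhere in between.

    import numpy as np

    fs = 16000
    t = np.arange(-0.05, 0.05, 1 / fs)                # 100 ms time axis centred on zero

    def gabor_atom(f, sigma=0.005):
        # Gabor: complex sinusoid under a Gaussian envelope of FIXED width sigma
        return np.exp(-t**2 / (2 * sigma**2)) * np.exp(2j * np.pi * f * t)

    def wavelet_atom(f, cycles=6):
        # Morlet-style wavelet: envelope width ~ cycles / f, so it shrinks at high frequency
        sigma = cycles / (2 * np.pi * f)
        return np.exp(-t**2 / (2 * sigma**2)) * np.exp(2j * np.pi * f * t)

    def envelope_ms(atom):
        env = np.abs(atom)                             # the Gaussian envelope
        return 1000 * np.sum(env > 0.5 * env.max()) / fs   # half-maximum duration in ms

    for f in (250, 4000):
        print(f"{f:4d} Hz: Gabor envelope ~{envelope_ms(gabor_atom(f)):5.1f} ms, "
              f"wavelet envelope ~{envelope_ms(wavelet_atom(f)):5.2f} ms")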
The article also describes a theory that human speech evolved to occupy an unoccupied space in frequency vs. envelope duration space. It makes no explicit connection between that fact and the type of transform the ear does—but one would suspect that the specific characteristics of the human cochlea might be tuned to human speech while still being able to process environmental and animal sounds sufficiently well.
A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.
Wow, this discussion about how our ears work is mind-blowing! It's amazing how complex sound processing is, and the comparison to signal processing concepts is really illuminating.
Nice to see a video for the tip links and ion channels.
I spent a while reading up on that stuff because I was trying to figure out what causes my tinnitus. My best guess is that if the hairs bend too far, that machinery can break and an ion channel can get stuck open, causing the cell to fire continually.
Another fun ear fact is that they incorporate active amplification. You can hook an electrical signal up to the loudspeaker-type cell (an outer hair cell) to make it vibrate: https://youtu.be/pij8a8aNpWQ
Just a warning that the video ends with a loud, high pitched tone that will make you want to rip your headphones off.
Ironic for a video about hearing.
This subject has bothered me for a long time. My question to people into acoustics was always: if the cochlea performs some kind of Fourier transform, what are the chances that it uses sine waves as the basis of the vector space? If it did anything like that, it could just as well use slightly different waveforms as the basis for the transformation. Stiffness and non-linearity will surely ensure that any idealized rubber model from physics differs in reality from a perfect sinusoid.
I've always thought the basilar membrane was a fascinating piece of biological engineering. Whether the difference between its behavior and an FT really matters depends on the context. For audio processing on a computer, the FFT is often great. For trying to understand or model human sound perception, particularly in relation to time, the FFT has weaknesses.
As the auditory association cortex in the temporal lobe discriminates frequencies, there must be some time-frequency transform between the ear and the brain. It must be discrete (neurons fire in bursts, and there is a finite frequency-resolution capacity) and operate over finite time.
The poor man's conversion of finite to equivalent infinite time is to assume an infinite signal in which the initial finite one is repeated infinitely into the past and the future.
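A small illustration of that implicit periodic extension (my sketch, assuming NumPy): the DFT treats the N-sample record as one period of an infinite repetition, so a tone that doesn't complete a whole number of cycles in the record has a discontinuity at the splice point, and its energy leaks across bins.

    import numpy as np

    fs = 1000
    n = 1000                                   # 1 s record, 1 Hz bin spacing
    t = np.arange(n) / fs

    for f0 in (10.0, 10.5):                    # whole vs. half-integer cycles per record
        x = np.sin(2 * np.pi * f0 * t)
        mag = np.abs(np.fft.rfft(x))
        k = int(round(f0))
        # fraction of spectral energy falling outside the bins nearest f0:
        # nonzero only when the implicit periodic extension has a discontinuity
        leak = 1 - np.sum(mag[k - 1 : k + 2] ** 2) / np.sum(mag ** 2)
        print(f"f0 = {f0:4.1f} Hz -> energy outside bins {k - 1}..{k + 1}: {leak:.1%}")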
Somewhere here must lie the cure to tinnitus.
This is fascinating.
I know of vocoders in military hardware that encode voices to resemble something simpler for compression (a low-tone male voice), yielding smaller packets that take less bandwidth. This evolution of the ear must also have gone hand in hand with our vocal cords and mouth, occupying the available frequencies for transmission and reception for optimal communication.
The parallels with waveforms don't end there. Waveforms are also optimized for different terrains (urban, jungle).
Are languages organic waveforms optimized to ethnicity and terrain?
Cool article indeed.
man I need to finally learn what a Fourier transform is
"It appears that human speech occupies a distinct time-frequency space. Some speculate that speech evolved to fill a time-frequency space that wasn’t yet occupied by other existing sounds."
I found this quite interesting, as I have noticed that I can detect voices in high-noise environments, e.g. HF radio, where noise is almost constant if you don't use a digital mode.
supplemental:
Neuroanatomy, Auditory Pathway
https://www.ncbi.nlm.nih.gov/books/NBK532311/
Cochlear nerve and central auditory pathways
https://www.britannica.com/science/ear/Cochlear-nerve-and-ce...
Molecular Aspects of the Development and Function of Auditory Neurons
An FT is a frequency-domain representation.
Neural signaling by action potentials is also a representation of intensity by frequency.
The cochlea is where you can begin to talk about a bio-FT phenomenon.
However, the format "changes" along the signal path whenever a synapse occurs.
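As a toy illustration of "intensity represented by firing frequency" (my sketch, not a cochlear model; assumes NumPy): a Poisson-ish spike train whose rate grows with stimulus intensity.

    import numpy as np

    rng = np.random.default_rng(0)
    dt, duration = 0.001, 1.0                  # 1 ms steps, 1 s per stimulus

    def spikes_per_second(intensity, max_rate=200.0):
        rate = max_rate * intensity            # target firing rate in spikes/s
        fired = rng.random(int(duration / dt)) < rate * dt
        return int(np.sum(fired))

    for intensity in (0.1, 0.5, 1.0):
        print(f"intensity {intensity:.1f} -> ~{spikes_per_second(intensity)} spikes/s")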
Tbh I used to think that it does. For example, when playing higher notes, it's harder to hear the out-of-tune frequencies than on the lower notes.
What does the continuous tingling of a hair cell sound like to the subject?
Many versions of this article could be written:
The computer does not do a Fourier transform (the FFT computes the discrete Fourier transform).
Spectroscopes don't do a Fourier transform (it's actually the short-time FT).
The only thing that actually does a Fourier transform is a mathematician, with a pen and some paper.
Why is there no box diagram for the cochlea, "between wavelet and Gabor"?
Fourear transform
Spoiler: yes it does, but the author isn't familiar with how the term Fourier Transform is used in signal processing.
Man, I've been spreading disinformation for years.
OT: Does anyone here believe in Intelligent Design?
The title seems a little click-baity and basically wrong. Gabor transforms, wavelet transforms, etc. are all generalizations of the Fourier transform, which give you a spectrum analysis at each point in time.
The content is generally good but I'd argue that the ear is indeed doing very Fourier-y things.
If you want to get really deep into this, Richard Lyon has spent decades developing the CARFAC model of human hearing: Cascade of Asymmetric Resonators with Fast-Acting Compression. As far as I know it's the most accurate digital model of human hearing.
He has a PDF of his book about human hearing on his website: https://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018_smaller...
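Not the actual CARFAC code, but a rough sketch of the basic "cascade of resonators" idea (assumes NumPy/SciPy; it omits the asymmetry and the fast-acting compression that give CARFAC its name): each stage is a second-order resonator, the signal passes through the stages in series, and each stage's output is tapped as one channel, loosely mimicking the traveling wave losing high frequencies as it moves from base to apex.

    import numpy as np
    from scipy.signal import lfilter

    fs = 16000
    cfs = np.geomspace(4000, 200, 16)          # per-stage resonant frequencies, base -> apex

    def resonator(cf, q=4.0):
        # simple symmetric second-order resonator (a stand-in for CARFAC's asymmetric ones)
        w0 = 2 * np.pi * cf / fs
        r = np.exp(-w0 / (2 * q))              # pole radius sets the bandwidth
        return [1 - r], [1, -2 * r * np.cos(w0), r * r]

    x = np.random.default_rng(0).standard_normal(fs // 10)   # 100 ms white-noise probe

    signal, channels = x, []
    for cf in cfs:
        b, a = resonator(cf)
        signal = lfilter(b, a, signal)         # each stage's output feeds the next stage...
        channels.append(signal)                # ...and is also tapped as an output channel

    for cf, ch in zip(cfs[::5], channels[::5]):
        print(f"stage at {cf:6.0f} Hz: output RMS {np.sqrt(np.mean(ch ** 2)):.4f}")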