The world can be a noisy place. If you have ever walked down a busy city street, you have experienced being bombarded by sound from every angle. A car whizzes by, and you hear it approach on your left and then disappear behind you. To your right and three stories up, a construction crew works on a new building, and you hear the clang of metal beams. Live music drifts out of a cafe up ahead and to the left. If you were to close your eyes, you could probably do a pretty good job of describing where all of these sounds were coming from. But how?

As it turns out, we have a number of mechanisms working in tandem to help us localize the sounds we hear. It also helps that we learn by sonic experience as we go through our lives: the places we've been and the things we've heard form a basis of comparison for the sounds we hear later on.

Binaural Hearing Cues

We use what are called binaural hearing cues to localize the sounds happening around us. Without even thinking about it, our brain compares the signals coming from our left and right ears and helps us make split-second decisions about the location of sounds.

A binaural dummy head microphone records stereo sounds just like our ears hear them.

Interaural Time Difference

The first of these cues is called interaural time difference, or ITD. Imagine a sound originating from directly in front of you (point A in the image below). Given a clear path to your head, we would expect that sound to arrive at both of your ears at the same time. Our brain picks up on this similarity in time of arrival and we are able to reason that the sound came from directly in front of us (or behind, above, or below us). However, if sound comes from one side or the other, it is a different story. Let's say the sound is coming from your right (point B in the image below). In this case, sound will arrive at your right ear slightly sooner than at your left ear. This difference in time of arrival can be very small (about 0.6 ms for an average-sized head), but it is still significant enough for our brain to pick up on it and reason that the sound came from that side.
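To put a rough number on this cue, Woodworth's classic spherical-head approximation estimates ITD from the source azimuth, an assumed head radius, and the speed of sound. This is a sketch under those assumptions; the 8.75 cm radius and 343 m/s are typical textbook values, not measurements of any particular head:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly room temperature
HEAD_RADIUS = 0.0875    # m, a common "average head" assumption

def itd_seconds(azimuth_deg: float) -> float:
    """Woodworth's spherical-head approximation of interaural time
    difference for a source at the given azimuth (0 deg = straight
    ahead, 90 deg = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

print(f"ITD at  0 deg: {itd_seconds(0) * 1000:.3f} ms")   # 0.000 ms
print(f"ITD at 90 deg: {itd_seconds(90) * 1000:.3f} ms")  # ~0.66 ms
```

A source directly to one side yields roughly the 0.6 ms figure mentioned above; a centered source yields zero, which is exactly why front/back and up/down are ambiguous from timing alone.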

Interaural Amplitude Difference

The next cue is called interaural amplitude difference, or IAD. IAD takes into consideration the amplitude, and more specifically the timbre, of sounds arriving at our ears. Again, imagine a sound coming from directly in front of you (point A in the image below). With a clear path to your head, the sound should arrive at your two ears at the same amplitude and with the same timbre. But what if the sound comes from one side or the other? If the sound comes from our right (point B in the image below), it will arrive at the right ear unimpeded and relatively full spectrum. However, the sound that makes it to our left ear is partially absorbed by our head (especially the high frequencies, whose wavelengths are shorter than the width of our head). This is called the acoustic shadow effect, and our brain picks up on it and tells us that the sound came from the right.
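One way to get a feel for the shadow effect is to model the far ear's signal with a simple low-pass filter: highs are absorbed by the head while lows mostly diffract around it. This is only a toy sketch, not a measured head model, and the 1.5 kHz cutoff is an arbitrary illustrative choice:

```python
import math

def head_shadow(samples, sample_rate=44100, cutoff_hz=1500.0):
    """Crude stand-in for the acoustic shadow at the far ear: a
    one-pole low-pass filter. Short-wavelength highs are attenuated;
    long-wavelength lows pass nearly unchanged."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # one-pole smoothing step
        out.append(y)
    return out

# Compare how much of a low vs. a high sine survives the "shadow":
fs = 44100
rms = lambda xs: math.sqrt(sum(x * x for x in xs) / len(xs))
low = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
high = [math.sin(2 * math.pi * 8000 * n / fs) for n in range(fs)]
print(rms(head_shadow(low)) / rms(low))    # near 1: barely shadowed
print(rms(head_shadow(high)) / rms(high))  # well below 1: heavily shadowed
```

The frequency-dependent attenuation is the point: the far ear hears a quieter and duller copy, and that timbral difference is the cue.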

ITD tends to be most useful for low frequencies, whereas we rely on IAD for higher frequencies. Above about 500 Hz, the human head starts to pose a significant obstacle to sound.

These first two cues also form the basis for some classic stereo microphone techniques. For instance, near-coincident techniques such as ORTF and NOS rely on differences in time of arrival and amplitude between the two microphones to create a stereo image. We have a similar experience when we localize instruments within a mix played over loudspeakers.
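As a sketch of how a near-coincident pair encodes both cues at once, the following estimates the time and level differences an idealized ORTF pair (17 cm capsule spacing, capsules angled 110 degrees apart, perfect cardioid patterns) would capture for a distant source. `ortf_cues` and `cardioid_gain` are hypothetical helpers written for this illustration, not part of any library:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
SPACING = 0.17          # ORTF capsule spacing in metres
AXIS_DEG = 55.0         # each cardioid 55 deg off centre (110 deg between them)

def cardioid_gain(source_deg, mic_axis_deg):
    """Ideal cardioid polar pattern: 0.5 * (1 + cos(angle off-axis))."""
    off_axis = math.radians(source_deg - mic_axis_deg)
    return 0.5 * (1.0 + math.cos(off_axis))

def ortf_cues(source_deg):
    """Time-of-arrival difference (seconds) and right/left level ratio
    for a distant source at the given azimuth. A simplified plane-wave
    sketch, not a full acoustic model."""
    dt = SPACING * math.sin(math.radians(source_deg)) / SPEED_OF_SOUND
    ratio = cardioid_gain(source_deg, AXIS_DEG) / cardioid_gain(source_deg, -AXIS_DEG)
    return dt, ratio

print(ortf_cues(0))   # centred source: zero delay, equal levels
print(ortf_cues(30))  # positive delay and ratio > 1: image pulls right
```

The spacing supplies the time-of-arrival cue and the angled cardioids supply the amplitude cue, mirroring ITD and IAD respectively.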

Head Related Transfer Functions

The other mechanism by which we determine the location of the sounds around us is called Head-Related Transfer Functions, or HRTFs. HRTFs take into account the filtering effect created by our ears, head, and torso, and they are particularly helpful for vertical localization. Much like the acoustic shadow effect, our bodies themselves absorb the sounds moving around us. Furthermore, the many folds of the outer ear (or pinna) cause both reflection and absorption of sound as it makes its way to our eardrum. Even the resonance of the ear canal itself plays a part. All of these factors come together, and the net effect is as if, for every angle a sound could come from, a slightly different EQ setting were imposed on the sound. As we go through life experiencing sounds (many of them moving sounds), we learn our own personalized HRTFs.
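In software, applying an HRTF usually comes down to convolving a mono signal with a pair of head-related impulse responses (HRIRs), one per ear. The impulse responses below are invented toy values purely for illustration; real HRIRs are measured per listener and per direction and run to hundreds of samples:

```python
def convolve(signal, ir):
    """Direct-form convolution of a mono signal with an impulse response."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

# Hypothetical toy HRIRs for a source off to the right:
hrir_right = [0.9, 0.2, 0.05]           # near ear: early and bright
hrir_left = [0.0, 0.0, 0.3, 0.5, 0.1]   # far ear: delayed and duller

click = [1.0, 0.0, 0.0, 0.0]            # mono test signal
left = convolve(click, hrir_left)
right = convolve(click, hrir_right)
print(right[:3], left[:3])  # right ear leads; left ear hears it late and soft
```

Feeding each ear its own filtered copy is what lets headphone renderers place a sound at a virtual angle: the ITD, IAD, and spectral cues are all baked into the two impulse responses.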

In Practice

These binaural hearing cues are used to great effect in a variety of music technologies. For instance, a binaural dummy head microphone takes advantage of all of these hearing cues by virtue of being shaped like a human head. If you listen back to a signal recorded with one of these microphones over headphones, it should sound like you are experiencing the sounds firsthand, with sound sources localizing outside of your head in the same positions they occupied relative to the dummy head. This technology is also being utilized in many 3D sound applications for 'surround' sound over headphones.