![](https://static.wixstatic.com/media/33c71b_250629970aed46dab28e39ac6a3e35eb~mv2.jpg/v1/fill/w_800,h_800,al_c,q_85,enc_auto/33c71b_250629970aed46dab28e39ac6a3e35eb~mv2.jpg)
The magic behind spatialized audio is the use of head-related transfer functions (HRTFs). An HRTF captures the audible cues (small changes in a sound) that allow our brains to place sounds in our 3-dimensional world. Whether we’re aware of it or not, our brains are very good at noticing subtle changes in a sound as it travels from the source (for example, a dog barking) to our ears. The shape of our ears, the width of our heads, and our torsos all play a role in these cues. These subtle changes tell our brains where the sound is coming from. There are three cues that arise as a sound travels to our ears:
Interaural Time Difference (ITD): This is the slight delay between a sound reaching one ear and reaching the other. If a dog barks to your right, the sound of the bark will reach your right ear a fraction of a second before it reaches your left ear. This cue is fundamental for your brain to know the direction of the sound source. However, this cue is not always reliable because of an area of ambiguity called the Cone of Confusion. This cone-shaped area is located to the left and right of our ears. Imagine an ice cream cone stuck in each ear with the cones extending infinitely left and right of your head. Sound sources anywhere on the surface of this cone produce the same ITD, so our brains have a hard time telling them apart. Since ITDs aren’t always helpful, it is important for our brains to get different cues in order to determine a sound’s location.
Interaural Level Difference (ILD): This is the difference in volume between the sound that reaches one ear and the sound that reaches the other. The ear that is farther from the sound will receive a quieter signal because of the sonic shadow our heads cast. For example, if the dog is to your right, its bark will be quieter when it reaches your left ear. Just like ITDs, ILDs are not always helpful alone, and require the help of other cues for key information on a sound’s location. ILDs are most useful when we recognize the sound we are hearing, because then we know how loud it should be when it’s close to us and when it’s far away. However, if it’s an unknown sound, we have no way of knowing how it should sound close or far, which causes confusion and a reliance on other cues for more sonic information.
Interaural Phase Difference (IPD): This is the difference in the phase of the sound wave as it reaches each ear. It is closely related to the ITD, but it is more applicable to continuous sounds that are heard for a few seconds or longer. Though this is a very subtle change that is barely audible, it has been observed to be effective for sound spatialization.
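To put rough numbers on the timing and phase cues described above, here is a minimal Python sketch. It is not taken from any spatializer plugin. It assumes the Woodworth spherical-head approximation, an assumed head radius of 8.75 cm, a distant source in the horizontal plane, and a speed of sound of 343 m/s. The function names `itd_seconds` and `ipd_radians` are made up for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumption: a commonly used average head radius
SPEED_OF_SOUND = 343.0   # assumption: metres per second in air at about 20 °C


def itd_seconds(azimuth_deg):
    """Estimate the ITD with the Woodworth spherical-head approximation.

    azimuth_deg: 0 is straight ahead, +90 is directly right, 180 is behind.
    A positive result means the sound reaches the right ear first.
    """
    sign = 1.0 if azimuth_deg >= 0 else -1.0
    az = abs(azimuth_deg)
    if az > 90.0:
        # A sphere cannot tell front from back: fold rear angles onto the front.
        az = 180.0 - az
    theta = math.radians(az)
    return sign * (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


def ipd_radians(azimuth_deg, frequency_hz):
    """Phase difference of a steady tone, wrapped to the range -pi..pi."""
    phase = 2.0 * math.pi * frequency_hz * itd_seconds(azimuth_deg)
    return math.atan2(math.sin(phase), math.cos(phase))


if __name__ == "__main__":
    for az in (0, 30, 90, 150):
        print(f"azimuth {az:>3} deg: ITD = {itd_seconds(az) * 1e6:.0f} microseconds")
    print(f"IPD of a 500 Hz tone at 90 deg: {ipd_radians(90, 500):.2f} rad")
```

Under these assumptions, a source directly to the right gives an ITD of roughly 656 microseconds. The values for 30 and 150 degrees come out identical, which is a simplified, horizontal-plane picture of the Cone of Confusion: the ITD alone cannot separate a sound in front from its mirror image behind.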
In the workflow of game audio, HRTFs can be used to apply these cues to your game sounds. As a result, the game sounds will mimic the way we hear sounds in the real world, and the listener can more accurately pinpoint where sounds are coming from. At the time of writing, there are many third-party plugins available for Wwise and other middleware that provide these HRTFs: Oculus Spatializer, Steam Audio, and MS HRTF. Unfortunately, these HRTFs don’t work for everyone because everyone has a different physical shape. Our brains learn to hear cues from our unique ears, heads, and torsos, so the basic library of HRTFs that most of these plugins pull from is generalized and doesn’t work for all listeners. A hope for the future is for everyone to have personalized HRTFs that match their physical makeup, but the process of measuring them is time-consuming and clunky, and not viable for most people. Gan Woon-Seng and his co-authors voice the importance of personalized HRTFs in their scientific paper: “due to our idiosyncratic anthropometric characteristics, HRTFs are unique to each person and thus, individualized HRTFs should be used for natural 3D audio rendering. As a result, individualized HRTFs are required to be measured for every individual from acoustical measurements.”
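For readers curious what "applying an HRTF" to a game sound can look like under the hood, here is a minimal Python sketch. It assumes the common approach of convolving a mono signal with a pair of head-related impulse responses (the time-domain form of an HRTF), one per ear. It is not the implementation of any plugin named above. The impulse responses are toy placeholders invented for illustration, not measured data, and the helper names `convolve` and `spatialize` are hypothetical.

```python
def convolve(signal, impulse_response):
    """Plain discrete convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Filter one mono signal through a left/right impulse-response pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy placeholder responses for a source on the listener's right
# (assumption: real HRIRs are measured and hundreds of samples long).
hrir_right = [1.0, 0.0, 0.0, 0.0]  # near ear: immediate, full level
hrir_left = [0.0, 0.0, 0.0, 0.5]   # far ear: 3 samples later, half level

mono = [1.0, 0.5, -0.25]
left, right = spatialize(mono, hrir_left, hrir_right)

print("left :", left)
print("right:", right)
```

Even this toy pair reproduces two of the cues discussed earlier: the left channel arrives later (ITD) and quieter (ILD) than the right. A measured response would also carry the frequency-dependent filtering from the outer ear, head, and torso, which is the part that differs from person to person.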