Despite the name, television is not all about vision. An often underestimated part of the transmission, yet highly important for appreciating the quality of content, is sound. In some TV formats it is absolutely essential. Does immersive video require special sound? One may argue about definitions, but in some sense playing sound is always an immersive experience; the question is only one of intensity.
Sound fills your room and surrounds you; bouncing off the walls, it reaches your ears from many directions. It brings “there” into your home, and the strength of the feeling depends on how well we can recreate the original acoustic field. Watching a football match muted makes it feel far away, behind thick glass. Adding the sounds of the stadium makes it seem just outside your window. Simple stereo gives you a basic sensation of where the sound sources are. With binaural recording (stereo that takes psychoacoustics into account) and headphones, this feeling becomes truly compelling.
What about home theatre surround audio systems (5.1, 7.1, etc.)? Watch a good horror movie with such a setup and you will know the thrill of something scary creeping up just behind your back. Improvements in this area, usually aimed at professional setups such as 22.2, add new channels to increase sound field resolution and to represent height above the hitherto horizontal plane of speaker positions.
These commonly used technologies can extend immersive video, but there is one situation where they are still not enough. For an HMD used with headphones, the sound would not match what you see when you rotate your head. For this purpose, ambisonic sound recording is much better suited. Instead of providing one channel per fixed speaker position, this format describes the sound field at the point of listening. To actually play it, the position of each speaker must be included in the calculation, which is exactly what we need when the headphones rotate with the listener's head.
In Pilot 3 of ImmersiaTV we are going to use first-order ambisonic sound. This means there are four channels: one for an omnidirectional mix and three for the directional differences along each of the three spatial dimensions. Note that the reference listening point is virtual and lies inside our heads, but first order allows us to calculate a fair approximation at the real locations of our ears. For a better approximation we would need a higher order, which means more channels and a much more complicated production side.
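To make the idea concrete, here is a minimal sketch of first-order ambisonics: encoding a mono source into the four channels, rotating the sound field to compensate for head yaw, and decoding toward a virtual speaker direction. The function names and the channel convention (W omnidirectional plus X, Y, Z directional components, SN3D-style weighting) are assumptions for illustration, not the actual ImmersiaTV pipeline.

```python
import math

def encode_fo(sample, azimuth, elevation=0.0):
    """Encode one mono sample into four first-order ambisonic channels.

    W is the omnidirectional component; X, Y, Z capture the
    directional differences along the three axes (angles in radians).
    """
    w = sample  # omnidirectional mix
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return [w, x, y, z]

def rotate_yaw(bformat, yaw):
    """Rotate the sound field around the vertical axis to compensate
    for head rotation; W and Z are unaffected by yaw."""
    w, x, y, z = bformat
    xr = x * math.cos(yaw) + y * math.sin(yaw)
    yr = -x * math.sin(yaw) + y * math.cos(yaw)
    return [w, xr, yr, z]

def decode_to_direction(bformat, azimuth, elevation=0.0):
    """Basic projection decode: the signal for a virtual speaker at a
    given direction -- the speaker position entering the calculation."""
    w, x, y, z = bformat
    return (w
            + x * math.cos(azimuth) * math.cos(elevation)
            + y * math.sin(azimuth) * math.cos(elevation)
            + z * math.sin(elevation))
```

The point for head tracking: a source encoded on the left (azimuth π/2) and a head turned left by π/2 yields, after `rotate_yaw`, the same decoded signal as a source straight ahead, so the sound stays fixed in the world while the head moves.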
Another option of great interest to us is providing sound objects: descriptions of audio sources that include their positions in 3D space. This approach would greatly support interactivity, allowing individual sources to be adjusted in response to user actions. All these sound types could be consistently encoded and delivered together, as defined in the MPEG-H 3D Audio standard.
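A hypothetical sketch of the object-based idea: each source carries position metadata and a user-adjustable gain, and the renderer mixes them per listener. The class and function names are illustrative only, not the MPEG-H API; the renderer here uses a simple constant-power stereo pan derived from each object's azimuth.

```python
import math
from dataclasses import dataclass

@dataclass
class AudioObject:
    sample: float      # one mono sample of the source
    azimuth: float     # radians; 0 = front, +pi/2 = left, -pi/2 = right
    gain: float = 1.0  # user-adjustable per object

def render_stereo(objects):
    """Mix objects to stereo with a constant-power pan law
    computed from each object's azimuth metadata."""
    left = right = 0.0
    for obj in objects:
        # map azimuth [-pi/2, pi/2] to pan angle [0, pi/2]
        pan = (obj.azimuth + math.pi / 2) / 2
        left += obj.sample * obj.gain * math.sin(pan)
        right += obj.sample * obj.gain * math.cos(pan)
    return left, right
```

The interactivity benefit shows up directly: muting the commentator while keeping the crowd ambience is just setting one object's gain to zero, something impossible once sources are baked into fixed channels.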
Producing a highly immersive soundtrack is not easy: recording requires special equipment, postproduction adjustments, and so on. But spatially located sound that matches the visual input can intensify the overall immersive experience, which gives us strong motivation to include it in our solutions.
Author: Szymon Malewski, PSNC
Photo: Eryk Skotarczak, PSNC; credits: Magdalena Wilk, PSNC