Immersive audio production

In the previous article about immersive audio, we explained the importance of deploying a spatial representation of sound in immersive environments. Now, it is time to look into the production side of this topic.

A straightforward way to produce ambisonic sound is to use a soundfield microphone. It consists of high-quality microphone capsules arranged, with their polar patterns taken into account, at angles and in directions that cover the whole 360° sound sphere. Their frequency response and sensitivity should also be carefully matched. What is more, for good directionality at high frequencies, where wavelengths become comparable to the spacing between capsules, the capsules should be packed very close to each other; ideally they would all sit at the same point.
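
To get a feeling for why capsule spacing matters at high frequencies, it helps to look at the wavelengths involved. The snippet below is only a rough illustration: it assumes a speed of sound of about 343 m/s (air at roughly 20 °C), and the function name is ours, not part of any tool mentioned in this article.

```python
# Assumption: speed of sound in air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_S = 343.0


def wavelength_m(frequency_hz):
    """Return the wavelength in metres of a sound wave at the given frequency."""
    return SPEED_OF_SOUND_M_S / frequency_hz


# Wavelengths in centimetres for a few frequencies across the audible range.
wavelengths_cm = {f: wavelength_m(f) * 100.0 for f in (1000.0, 10000.0, 20000.0)}
```

At 1 kHz the wavelength is around 34 cm, but at 10 kHz it shrinks to about 3.4 cm, which is already in the range of the physical size of a capsule array.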

Most ambisonic microphones on the market are made of four small capsules with cardioid pickup patterns, arranged in the form of a tetrahedron. The four channels on the direct output represent ambisonic sound in the so-called A-format. Manufacturers usually provide software tools dedicated to their microphones to convert it to a standard B-format ambisonic track. Some popular ambisonic microphones are the Sennheiser AMBEO, the Soundfield SPS200 and ST450, and the Rode Videomic Soundfield.
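
The core of an A-format to B-format conversion is a simple sum-and-difference matrix. The sketch below shows the idea only. It assumes the common capsule layout front-left-up, front-right-down, back-left-down, back-right-up, and an overall scaling of 0.5, which is one of several conventions in use. Real manufacturer tools additionally apply calibration and frequency-dependent correction filters that are not modelled here.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert four A-format capsule signals to first-order B-format (W, X, Y, Z).

    Assumed capsule order: front-left-up, front-right-down,
    back-left-down, back-right-up. Inputs are equal-length sample sequences.
    The 0.5 scaling is an assumption; conventions differ between tools.
    """
    w, x, y, z = [], [], [], []
    for a, b, c, d in zip(flu, frd, bld, bru):
        w.append(0.5 * (a + b + c + d))  # omnidirectional component
        x.append(0.5 * (a + b - c - d))  # front minus back
        y.append(0.5 * (a - b + c - d))  # left minus right
        z.append(0.5 * (a - b - c + d))  # up minus down
    return w, x, y, z


# A signal picked up equally by the two front capsules only
# ends up in W and X, with nothing in Y and Z.
w, x, y, z = a_to_b_format([1.0], [1.0], [0.0], [0.0])
```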

Soundfield microphones are a perfect solution for recording ambience: sounds of the street, nature, etc. However, when recording speech, we can’t really expect the highest quality, mostly due to the distance to the microphone. For certain sound sources, like speaking actors, it is much better to add traditional microphones (e.g. lavalier microphones) and then adjust levels and mix them with the ambisonic background. The first issue here is that we need to control delays. Otherwise, an unwanted short echo or phase shift could appear, as the soundfield microphone still records the same source, but from a different distance.
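
As a rough illustration of the delay problem, the sketch below estimates how many samples later the sound of an actor reaches the soundfield microphone than the lavalier, and delays the lavalier track accordingly. It assumes a speed of sound of 343 m/s and a known difference in distance; the function names are hypothetical. In practice the delay is often fine-tuned by ear or estimated from the recordings themselves.

```python
# Assumption: speed of sound in air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_S = 343.0


def delay_in_samples(distance_difference_m, sample_rate_hz):
    """Number of samples sound needs to travel the extra distance."""
    return round(distance_difference_m / SPEED_OF_SOUND_M_S * sample_rate_hz)


def delay_signal(samples, delay):
    """Delay a track by prepending zeros, keeping the original length."""
    if delay <= 0:
        return list(samples)
    return ([0.0] * delay + list(samples))[:len(samples)]


# Actor stands 3.43 m further from the soundfield microphone than from the lavalier.
delay = delay_in_samples(3.43, 48000)
aligned_lavalier = delay_signal([1.0, 2.0, 3.0], 1)
```

With these assumed numbers the offset is about 10 ms (480 samples at 48 kHz), which is enough to cause audible comb filtering when both signals are mixed.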

To mix a traditional recording into an ambisonic format, we need to transform it: simulate what a soundfield microphone would ‘hear’ from the sound source. In this conversion we don’t lose quality, but we need the right tools. The basic equipment for such processing is generally called a DAW (Digital Audio Workstation). It is a common name for software or devices for recording, editing and manipulating sound (filters, effects, loops, etc.) to produce digital audio content. The DAW market provides a whole range of solutions, from cheap or free applications running on a commodity PC to complex hardware stations. Modern DAWs are easily extended with software plugins that provide additional functionality. For ambisonic sound production, plugins worth checking are e.g. ambiX, the Ambisonic Toolkit (ATK) or O3A Core.
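
What an encoder plugin does with a mono track can be sketched in a few lines. The example below is not the implementation of any of the plugins named above. It is a minimal first-order encoder that assumes the traditional B-format (FuMa) convention, with W scaled by 1/√2, azimuth measured counter-clockwise from the front and elevation measured upwards. Other tools use different channel orders and normalisations.

```python
import math


def encode_mono_to_b_format(samples, azimuth_deg, elevation_deg=0.0):
    """Encode a mono track into first-order B-format (W, X, Y, Z).

    Assumed convention: traditional B-format (FuMa) weighting,
    azimuth counter-clockwise from front, elevation upwards.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gain_w = 1.0 / math.sqrt(2.0)          # omnidirectional component
    gain_x = math.cos(az) * math.cos(el)   # front/back
    gain_y = math.sin(az) * math.cos(el)   # left/right
    gain_z = math.sin(el)                  # up/down
    return (
        [s * gain_w for s in samples],
        [s * gain_x for s in samples],
        [s * gain_y for s in samples],
        [s * gain_z for s in samples],
    )


# A source directly to the left (azimuth 90 degrees) appears in W and Y only.
w, x, y, z = encode_mono_to_b_format([1.0, -1.0], azimuth_deg=90.0)
```

The encoded channels can then simply be summed, channel by channel, with the B-format background recorded by the soundfield microphone.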

Encoding a traditional sound track into an ambisonic format requires the relative position of the sound source and the virtual microphone. Additionally, the result can be adjusted for the attenuation caused by objects between the camera and the actor. This is not a problem for static sound sources, but it becomes an issue with moving actors. Usually, it requires an experienced operator who ‘animates’ the sound (with 6 degrees of freedom: rotation also makes a difference!). With more actors or in live productions, it’s usually better and more convenient to use an automatic tracking system.
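
The geometry behind this step can be sketched as follows. Given the positions of the actor and the virtual microphone, we compute the direction and distance that an encoder needs. This is a simplified illustration under our own assumptions: coordinates are in metres with x pointing forward, y to the left and z up, only the yaw rotation of the microphone is handled (a full solution would also include pitch and roll), and the gain follows a plain inverse-distance law. Occlusion by objects is not modelled.

```python
import math


def source_direction(source_pos, mic_pos, mic_yaw_deg=0.0):
    """Return (azimuth_deg, elevation_deg, distance_m) of a source
    as seen from a microphone rotated by mic_yaw_deg around the vertical axis."""
    dx = source_pos[0] - mic_pos[0]
    dy = source_pos[1] - mic_pos[1]
    dz = source_pos[2] - mic_pos[2]
    yaw = math.radians(mic_yaw_deg)
    # Rotate the offset into the microphone's own frame (inverse yaw rotation).
    local_x = dx * math.cos(yaw) + dy * math.sin(yaw)
    local_y = -dx * math.sin(yaw) + dy * math.cos(yaw)
    azimuth = math.degrees(math.atan2(local_y, local_x))
    elevation = math.degrees(math.atan2(dz, math.hypot(local_x, local_y)))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    return azimuth, elevation, distance


def distance_gain(distance_m, reference_m=1.0):
    """Inverse-distance gain, limited to 1.0 inside the reference distance."""
    return min(1.0, reference_m / max(distance_m, 1e-9))


# Actor 2 m to the left of the microphone: azimuth 90 degrees, half the level.
azimuth, elevation, distance = source_direction((0.0, 2.0, 0.0), (0.0, 0.0, 0.0))
gain = distance_gain(distance)
```

For a moving actor these values would be recomputed continuously, either from keyframes set by the operator or from the output of a tracking system.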

In fact, with this approach we create, on the production side, a virtual sound environment consisting of audio objects. The final format depends on the intended use. Everything can be mixed down to ambisonic sound, but for highly interactive scenarios the final mixing can be left to the user application.

Authors: Mikołaj Węgrzynowski (PSNC), Eryk Skotarczak (PSNC)
Photo: Eryk Skotarczak (PSNC)
