Sound Propagation With Bidirectional Path Tracing | Two Minute Papers #111


Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Imagine if we had an accurate algorithm to simulate how different sound effects would propagate in a virtual world. Computer games featuring gunfire in open areas, or a pianist inside a castle courtyard, would be way more immersive, and we’ve been waiting for efficient techniques for this for quite a while now.

This is a research field where convolutions enjoy quite a bit of attention, because they are a reliable and efficient way to approximate how a given signal would sound in a room with a given geometry and material properties. However, the keyword is approximate – the work covered in this episode is one of those path sampling techniques that gives us the real deal, so I am quite excited about that!
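To make the convolution idea a bit more tangible, here is a minimal sketch in Python (my own illustration, not code from the paper): given an anechoic, "dry" recording and an impulse response of a room, convolving the two approximates how the recording would sound when played back in that room. The file names are placeholders.

    # Minimal sketch of convolution-based auralization (illustration only).
    # "dry.wav" (an anechoic recording) and "room_ir.wav" (a room impulse
    # response) are placeholder file names; both are assumed to be mono.
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import fftconvolve

    rate, dry = wavfile.read("dry.wav")
    _, impulse_response = wavfile.read("room_ir.wav")

    # Convolving the dry signal with the room's impulse response approximates
    # how the signal would sound when played back in that room.
    wet = fftconvolve(dry.astype(np.float64), impulse_response.astype(np.float64))
    wet /= np.max(np.abs(wet)) + 1e-12   # normalize to avoid clipping

    wavfile.write("wet.wav", rate, (wet * 32767).astype(np.int16))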
So what about this path sampling thing? It means an actual simulation of sound waves. We have a vast literature and decades of experience in simulating how rays of light bounce and reflect around in a scene, and leaning on this knowledge, we can create beautiful photorealistic images. The first idea is to adapt the mathematical framework of light simulations so that it can do the very same with sound waves.

Path tracing is a technique where we build light paths starting from the camera, bounce them around in the scene, and hope that these rays hit a light source. If that happens, we compute the amount of energy that is transferred from the light source to the camera. Note that energy is the more popular, journalistic term here; what researchers actually measure is a quantity called radiance.
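To give a feel for how such a path sampling estimator is structured, here is a toy sketch (an illustration under made-up assumptions, not the authors' implementation): each path starts at the listener, loses some energy at every bounce, and contributes whenever it happens to reach the sound source. The absorption value and the per-bounce chance of reaching the source are invented stand-ins for real ray-scene intersections.

    # Toy, listener-centric path tracer for sound energy (illustration only).
    # A real tracer would intersect rays with the scene geometry; here the
    # absorption and the chance of reaching the source are made-up constants.
    import random

    ABSORPTION = 0.3        # assumed fraction of energy lost per bounce
    HIT_SOURCE_PROB = 0.1   # toy stand-in for "does this bounce see the source?"
    MAX_BOUNCES = 8
    NUM_PATHS = 100_000

    def trace_path():
        """Follow one path from the listener and return the energy it delivers."""
        throughput = 1.0
        for _ in range(MAX_BOUNCES):
            if random.random() < HIT_SOURCE_PROB:
                return throughput              # the path reached the sound source
            throughput *= 1.0 - ABSORPTION     # energy lost at this bounce
        return 0.0                             # path died before reaching the source

    estimate = sum(trace_path() for _ in range(NUM_PATHS)) / NUM_PATHS
    print(f"Estimated energy arriving at the listener: {estimate:.4f}")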
The main contribution of this work is adapting bidirectional path tracing to sound. This is a technique, originally designed for light simulations, that builds light paths from both the light source and the camera at the same time, and it is significantly more efficient than the classical path tracer in difficult indoor scenes.
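The bidirectional idea can be sketched in the same toy fashion (again, just an illustration with made-up geometry and attenuation): one subpath is grown from the sound source, one from the listener, and every pair of vertices is connected, so a single pair of subpaths yields many complete paths.

    # Toy sketch of the bidirectional idea (illustration only): grow one
    # subpath from the source and one from the listener, then connect every
    # pair of vertices. The 2D random walk and the falloff are made up.
    import math
    import random

    def random_walk(start, bounces):
        """Return a list of 2D vertices produced by randomly scattering around."""
        path, pos = [start], start
        for _ in range(bounces):
            pos = (pos[0] + random.uniform(-1.0, 1.0),
                   pos[1] + random.uniform(-1.0, 1.0))
            path.append(pos)
        return path

    def connect(a, b, absorption=0.3):
        """Toy contribution of joining vertex a to vertex b (distance falloff only)."""
        d = math.dist(a, b)
        return (1.0 - absorption) ** 2 / (1.0 + d * d)

    source_path   = random_walk(start=(0.0, 0.0), bounces=3)   # from the sound source
    listener_path = random_walk(start=(5.0, 2.0), bounces=3)   # from the listener

    # Connecting every source vertex to every listener vertex turns one pair of
    # subpaths into many complete paths, which is where the efficiency comes from.
    total = sum(connect(s, l) for s in source_path for l in listener_path)
    print(f"Toy bidirectional estimate: {total:.4f}")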
And of course, the main issue with these methods is that they have to simulate a large number of rays to obtain a satisfactory result, and many of these rays don’t really contribute anything to the final output; only a small subset of them is responsible for most of the image we see or the sound we hear. It is a bit like the Pareto principle, or the 80/20 rule, on steroids. This is ice cream for my ears. Love it!

This work also introduces a metric that not only makes it possible to compare similar sound synthesis techniques in the future, but the proposed technique is also built around minimizing it, which gives us an idea of which rays carry important information and which ones we are better off discarding.
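To see why discarding unimportant paths pays off, here is a tiny numerical illustration with made-up, heavy-tailed contribution data (not measurements from the paper): a small fraction of the paths ends up carrying most of the total energy.

    # Made-up contributions drawn from a heavy-tailed distribution, to mimic the
    # "few paths carry most of the energy" behaviour described above.
    import numpy as np

    rng = np.random.default_rng(0)
    contributions = rng.pareto(a=1.5, size=100_000)

    sorted_contrib = np.sort(contributions)[::-1]          # largest first
    top_20_percent = sorted_contrib[: len(sorted_contrib) // 5]

    share = top_20_percent.sum() / sorted_contrib.sum()
    print(f"The top 20% of paths carry {share:.0%} of the total energy")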
I also like the minimap in the upper left that shows exactly what we are hearing in this footage: where the sound sources are and how their positions change. Looking forward to seeing and listening to similar presentations in future papers in this area!

A typical execution time for the algorithm is between 15 and 20 milliseconds per frame on a consumer-grade processor, which is about 50-65 frames per second.
The position of the sound sources makes a great deal of difference for the classical path tracer. The bidirectional path tracer, however, is not only more effective, but also offers significantly more consistent results; the new method is especially useful in these cases.

There are many more details explained in the paper: for instance, it also supports path caching and borrows the all-powerful multiple importance sampling from photorealistic rendering research. Have a look! Thanks for watching and for your generous support, and I’ll see you next time!

  1. Nice! Although that first test scene was a bit too busy for my ears. It's hard to judge the quality with all of that going on at the same time. I thought I might have heard some background noise in it, but that may just have been the source playing back water sounds.
    Really, the main problem with adding such techniques to games is that while these might seem like rather quick times, that neglects the potentially many other things that have to be computed at the same time for a game to work. Still, I'd love games to have proper full 3D sound like this.

  2. Eventually this technology will be used in video gaming. We have the theoretical knowledge; we just need the hardware capabilities.

  3. Which games use all this real-time sound propagation tech? I feel audio is too neglected in games… come to think of it, this tech should be fully implementable in walking simulators.

  4. Vision is said to be the primary human sense, or so I have heard. I personally find the sense of hearing to be very important in the kind of reality simulation many games aspire to be, and I am sure there are many others who feel the same way. Yet sound has often been a second-class citizen, at least in games (which arguably drive most of the evolution of rendering solutions). Probably because if we start dedicating as much effort to calculating sound correctly, development teams will have to spend a far larger share of their time and money on it, meaning that someone will have to do the R&D up front and then give it away or sell it. Then there is also the factor of having enough computational resources: strangely, and in contrast with what has happened to PC graphics rendering hardware since the VGA adapter, the modern sound card is a rather cheap and simple piece of silicon that has not seen much love since the days of the Sound Blaster, Gravis Ultrasound and Aureal Vortex cards, which aspired to do for sound what cards like the Nvidia GeForce and AMD Radeon do for graphics. And it's a shame, because the market ought to be there, or at least get there soon. Which brings me to another interesting point, which of course has to do with this video: sound "ray" tracing (because sound isn't really rays).

    Now, what is typically referred to as "ray tracing" is a special application of wave propagation calculations to light "wave-particles" (because light behaves as both, as we know from school). Sound behaves in a similar way: waves of sound originate from sources (unsurprisingly called light sources in the case of light), carry multiple frequencies (color, for light rays/waves), refract, reflect, diffuse and do pretty much all the things that waves, including light waves, do. The point is that much of what we have learned about ray tracing, and about ways to accelerate it with cards like the Nvidia RTX line, can be applied to sound calculations with equally impressive results, I believe. We should totally be looking into it. Sound is good 🙂

  5. My guess is that the audio samples themselves should be as raw as possible for this to achieve the best effect. Recordings usually don't isolate the effects of the environment from the sound itself. If there were a way to record the sounds in a room whose walls simply absorb the sound (essentially an anechoic chamber), I bet they would fit very well into these engines, even if the raw recordings sounded very unnatural.
    I wonder how much the microphones themselves affect the recording, though.
