Google Makes Its Lyra Low Bitrate Speech Codec Public

Google has released the beta source code of its Lyra audio codec on GitHub, making high-quality, low-bitrate speech compression available to all developers. The codec is most useful in embedded and bandwidth-restricted situations where as much data as possible needs to be saved.

Lyra: Almost Nothing Never Sounded So Good

The audio codec works on the principle of providing the most natural-sounding speech at the lowest possible data rate. It succeeds in reaching almost eerie levels of audio reproduction at bitrates as low as 3 kbps. Google already uses real-time Lyra compression in its Duo app, and you could be forgiven for not noticing any difference from full-bandwidth audio.

To demonstrate how much better Lyra is than other codecs, Google provides examples via a blog post comparing its machine-learning-driven codec against alternatives at 3 kbps and 6 kbps.

It’s a night-and-day difference, and giving developers the world over these tools will be a significant driver in improving communication quality where bandwidth is scarce. It’s also an excellent motivator for developers looking to create new apps in emerging markets, something that Google is sure to cover at this year’s free, online Google I/O conference.

The beta source code is currently designed with 64-bit Arm devices in mind, though the examples will also run on 64-bit x86 Linux systems. The source code is fully documented, though still in beta, and the GitHub page provides installation instructions along with steps for building Lyra on Linux for 64-bit Arm targets.

To get the Lyra beta source code, head to the Lyra GitHub page.

How Does Lyra Work?

While the actual process Lyra uses is an incredibly complex combination of machine learning models trained on thousands of hours of speech data and optimizations of existing audio codec technology, the theory is quite simple.

Every 40 ms, features are extracted from the speech and compressed down to 3 kbps. These features represent speech energy points across the frequency spectrum closest to the human auditory response – the things we need to recognize and understand when someone speaks.
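To make the numbers concrete, here is a minimal sketch of that extraction step. It is not Lyra's actual feature pipeline – the sample rate (16 kHz) and the band count (32) are assumptions for illustration, and the coarse log band energies stand in for the richer spectral features a real codec would compute. It also works out the bit budget: at 3 kbps with one frame every 40 ms, the encoder has just 120 bits per frame.

```python
import numpy as np

SAMPLE_RATE = 16_000                          # assumption: 16 kHz wideband speech
FRAME_MS = 40                                 # Lyra extracts features every 40 ms
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000    # 640 samples per frame
N_BANDS = 32                                  # hypothetical band count, illustration only

def extract_features(signal):
    """Split speech into 40 ms frames and summarize each frame as coarse
    log spectral-band energies -- a simplified stand-in for Lyra's features."""
    n_frames = len(signal) // FRAME_LEN
    feats = np.empty((n_frames, N_BANDS))
    for i in range(n_frames):
        frame = signal[i * FRAME_LEN:(i + 1) * FRAME_LEN]
        # Windowed magnitude spectrum of this frame
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(FRAME_LEN)))
        # Collapse the spectrum into a handful of log band energies
        bands = np.array_split(spectrum, N_BANDS)
        feats[i] = [np.log(b.sum() + 1e-9) for b in bands]
    return feats

# Bit budget: 3000 bits/s * 0.040 s = 120 bits for each frame's features
bits_per_frame = int(3000 * FRAME_MS / 1000)  # 120
```

That 120-bit budget is the whole point: the decoder never receives the waveform, only this tiny per-frame summary.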

The key part of what makes Lyra special is how it uses this information:

Traditional parametric codecs, which simply extract critical parameters from speech and use them to recreate the signal at the receiving end, achieve low bitrates but often sound robotic and unnatural. These shortcomings have led to a new generation of high-quality generative audio models that have revolutionized the field by being able not only to differentiate between signals, but also to generate completely new ones.

After transmission, Lyra uses a generative model to rebuild the waveform from those features, filling in everything that was left out, while somehow not being too computationally complex to run in real time.
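The shape of that decoding step can be sketched as a conditional autoregressive loop: predict each output sample from the samples before it, steered by the received frame's features. The sketch below is a toy with random, untrained weights – Lyra's real decoder is a trained neural vocoder, and every constant here (640 samples per 40 ms frame at an assumed 16 kHz, a 16-sample history) is an illustrative assumption, not Lyra's architecture.

```python
import numpy as np

def toy_generative_decode(features, samples_per_frame=640, context=16, seed=0):
    """Toy stand-in for Lyra's generative decoder: an autoregressive loop
    that predicts each sample from recent samples, conditioned on the
    current frame's features. Untrained weights -- structure only."""
    rng = np.random.default_rng(seed)
    ar_weights = rng.normal(scale=0.05, size=context)           # sample-history weights
    cond_weights = rng.normal(scale=0.01, size=features.shape[1])  # feature conditioning
    out = np.zeros(len(features) * samples_per_frame)
    history = np.zeros(context)
    for f, feat in enumerate(features):
        cond = cond_weights @ feat           # one conditioning value per frame
        for n in range(samples_per_frame):
            sample = np.tanh(ar_weights @ history + cond)
            out[f * samples_per_frame + n] = sample
            history = np.roll(history, 1)    # shift newest sample into history
            history[0] = sample
    return out
```

The point of the structure, not the weights: the decoder synthesizes a full-rate waveform from nothing but a compact feature stream, which is why it can sound natural at bitrates where parametric codecs sound robotic.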

On the one hand, it’s a technological marvel that will run almost anywhere. On the other, I’m still not 100% convinced it isn’t witchcraft.
