When you look at the spec sheet of the IP phone you just bought, have you ever wondered what all those codecs mean? What all those things starting with G are? What Opus is? What iLBC is?
Or for that matter, what a codec is?
This blog is for you.
We’re going to run through the most widely used audio codecs in VoIP telephony. After reading this, you’ll be able to look at a spec sheet and know what all those codecs refer to. You’ll know which codecs are good for which use case.
We’re not going to get too technical here. Rather, we’re going to focus on what each codec means for you.
Introduction to Audio Codecs
Before diving in, we need to explain what an audio codec actually is.
Codec is short for both “coder-decoder” and “compression-decompression.”
Here’s a very brief, lay person’s explanation of what happens when you speak into a microphone:
As you speak, your voice is turned into digital information. This information is encoded for transmission or storage according to specific algorithms. When you listen to this encoded data, it’s decoded for playback according to algorithms. That’s how you get coder-decoder, or codec. It’s a two-way street.
As a general rule, codecs try to fit as much data as they can into as small of a package as they can. Just like with coding and decoding, they compress and decompress the data according to algorithms. That’s how you also get compression-decompression.
When you talk about audio codecs, you’re talking about standards that dictate the use of all those algorithms: sets of instructions that specify how much information is kept in how large of a container.
(As a side note, you can see why it’s actually more accurate to refer to them as “audio coding formats” or “audio compression formats” or “audio coding standards” or “audio coding specifications,” depending on the example. It’s common practice, however, to refer to them all as “audio codecs.” That’s how you’ll see them referred to most often in spec sheets, so that’s what we’re going to call them. Hope you’re OK with that!)
How do the people who write the standards determine what should be kept? They try to balance sound quality and resource use, while keeping in mind technical limitations.
Many, many audio codecs have been developed over the years to find the perfect balance. The problem is, what’s perfect in one case isn’t necessarily perfect for all cases. What do we mean by that? To explain, let’s dig a little deeper into the types of audio codecs.
Uncompressed vs Lossless vs Lossy
Audio codecs come in three flavors: uncompressed, lossless, and lossy.
Uncompressed codecs, like those that produce .wav files, create massive audio files, because they include everything. They also give you the absolute best sound quality, if the original data is high quality. Codecs can never make up for poor production values.
Lossless codecs, like FLAC, create very large files that are still compressed. The original, uncompressed file can be remade from a lossless file: that’s why it’s called lossless. They give you excellent audio quality, which is why audiophiles are willing to pay a premium for lossless files.
Lossy codecs, like MP3, create files of a manageable size. The original, uncompressed file cannot be remade from lossy files. Information is irretrievably lost. Sound quality can still be excellent, but it’ll always be lagging behind lossless or uncompressed.
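If you’d like to see the difference between lossless and lossy for yourself, here’s a tiny Python sketch. It isn’t how FLAC or MP3 actually work: it uses zlib (a general-purpose lossless compressor) as a stand-in for the lossless case, and simply throws away the low bits of each sample as a crude stand-in for the lossy case. The sample values are made up for illustration.

```python
import zlib

def lossless_round_trip(samples: bytes) -> bytes:
    """Compress, then decompress, with zlib (a lossless compressor)."""
    return zlib.decompress(zlib.compress(samples))

def lossy_round_trip(samples: bytes, keep_bits: int = 4) -> bytes:
    """Keep only the top `keep_bits` bits of each 8-bit sample.

    This is a deliberately crude lossy step, not a real audio codec.
    """
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(s & mask for s in samples)

# Made-up 8-bit "audio" samples, purely for illustration.
original = bytes([12, 200, 37, 129, 255, 64, 90, 3])

print(lossless_round_trip(original) == original)  # True: nothing was lost
print(lossy_round_trip(original) == original)     # False: detail is gone for good
```

The lossless round trip hands back exactly what went in. The lossy one can’t, because the discarded bits were never stored.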
For VoIP, lossy codecs are standard.
Audio in the Office
You might think that uncompressed or lossless codecs are the way to go. But they’re actually not used in business communications. Why not? For two main reasons: 1) the realities of resource use, and 2) the situations when you use business audio technology.
Maybe one day technology will progress to the point that we don’t have to worry about bandwidth or storage. Maybe the whole telephony system will be lightning-fast fiber. Maybe hard drives will be limitless. On that day we’ll all migrate to uncompressed audio communication. But for now the battle between higher quality audio and limited resources favors the resources.
Here’s an example. If you ever download a music file in the .flac format, you’ll notice that the file is many times larger than an .mp3. It’ll also sound richer, if your audio equipment is good enough for the format to make a difference. Now imagine a song that’s as long as a conference call. If it were in .flac, that would be an enormous file, potentially running into the gigabytes. Making sure all that information gets to its destination in the proper order in real time (in other words, that you can hold a conversation) is really tricky. That takes a whole lot of dedicated bandwidth.
Conference calls are important, but is better sound quality actually meaningful in this example? As long as you can understand the conversation, both words and tone of voice, you’ll be happy.
The audio can be compressed much more while still being perfectly usable.
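To put rough numbers on that, here’s a small Python sketch that turns a bit rate into a file size for a one-hour call. The bit rates are the standard figures for CD-quality uncompressed audio (44.1 kHz, 16-bit, stereo) and for two codecs covered later in this guide, G.711 and G.729. A lossless file would land somewhere below the uncompressed figure, depending on the material, so it isn’t computed here.

```python
def size_megabytes(bit_rate_kbps: float, duration_seconds: float) -> float:
    """Size of an audio stream in megabytes (1 MB = 1,000,000 bytes)."""
    bits = bit_rate_kbps * 1000 * duration_seconds
    return bits / 8 / 1_000_000

ONE_HOUR = 3600  # seconds

for name, kbps in [("CD-quality PCM", 1411.2), ("G.711", 64), ("G.729", 8)]:
    print(f"{name}: {size_megabytes(kbps, ONE_HOUR):.1f} MB per hour")
```

That works out to roughly 635 MB, 28.8 MB, and 3.6 MB respectively, for one direction of the call.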
The second reason lossless codecs haven’t taken over is more practical: the situations when you use audio technology in the office aren’t conducive to high-quality sound. Think of the difference between the audio from a professionally recorded and produced movie and the chaos of a conference call. One deserves to be in lossless, while the other doesn’t. Lossless or uncompressed files can’t make up for poor production values.
Business audio equipment has improved greatly over the years, and sound quality has improved with it. But there are always going to be limitations based solely on the realities of the workplace.
To sum up: The ideal audio codec provides sensible resource use for the specific scenario and the specific equipment being used.
Audio Resolution: Bit Rates
Generally speaking, larger files or higher resource use means better sound quality, because they increase audio resolution. This is not always the case, as we’ll get to in a second, but first, let’s explain the basics.
You’ve probably seen music files referred to by bit rate: 64 kilobits per second (kbit/s), 128 kbit/s, 192, 256, 320. You may have seen the term “variable bit rate” or VBR. These are simply measurements of how much information is processed each second. That’s why it’s helpful to think of bit rates as the resolution of an audio file. The higher the bit rate, the more information is present.
Variable bit rate refers to files that have optimized how much information is needed at any given moment. When the orchestra is swelling to a crescendo, you’ll want all the information you can get. When the singer pauses and there’s a moment of silence, there’s almost no audio information. VBR accounts for these differences, and compresses files according to need.
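Here’s a toy Python sketch of the idea. It is not any real codec’s algorithm: the frame size, the energy threshold, and the two bit budgets (`low_bits`, `high_bits`) are all made-up values, chosen only to show how spending fewer bits on quiet frames pulls the average bit rate down.

```python
def frame_energy(frame):
    """Average squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def allocate_bits(frames, low_bits=40, high_bits=240, threshold=0.01):
    """Give loud frames a big bit budget and quiet frames a small one."""
    return [high_bits if frame_energy(f) >= threshold else low_bits
            for f in frames]

FRAMES_PER_SECOND = 50  # assumes 20 ms frames

silence = [0.0] * 160
loud = [0.5, -0.5] * 80
frames = [loud, silence, silence, loud]

bits = allocate_bits(frames)
average_kbps = sum(bits) / len(bits) * FRAMES_PER_SECOND / 1000
constant_kbps = 240 * FRAMES_PER_SECOND / 1000

print(bits)           # [240, 40, 40, 240]
print(average_kbps)   # 7.0
print(constant_kbps)  # 12.0
```

With half the frames silent, the variable scheme averages 7 kbit/s where a constant one would spend 12 kbit/s the whole time.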
We said above that it’s not always the case that a larger file means better sound. Let’s explain a bit more.
A good comparison is a digital image. If you start out with a lovely, large image in high resolution, you’re going to have a big file, and it’ll be worth it. If you start with a small image in low resolution and blow it up, you can make a file that’s just as big, but it’s going to be a pixellated mess. No matter how hard your software tries to fill in the blanks, the fact of the matter is that the information necessary for the large image isn’t present in the small image. Resolution matters.
The same goes for audio. If you start with bad sound, a higher bit rate won’t help it sound any better.
The other reason is that some codecs simply produce better audio per bit rate than others. Engineers have been working on these problems for a long time by now, and have improved on the codecs of previous generations.
Before we get into all the different codecs you’ll encounter, there is one more distinction you need to be aware of: wideband vs narrowband audio.
Frequency Range: Wideband vs Narrowband
The “band” in wideband or narrowband refers to frequency range: the range between low-pitch and high-pitch sounds.
One easy way to reduce resource use with audio is to eliminate unnecessary information. Human voices fall within a particular frequency range, so if you’re building a system for vocal communication, you don’t need to incorporate the full spectrum of audible sound, which for humans is roughly 20 Hz to 20 kHz. You just need the human vocal range.
Traditional telephony uses narrowband audio, which includes enough of the human vocal range for general use (300 Hz to 3.4 kHz). We don’t want to get into the technical details, but just know that narrowband audio is the reason that your old telephone makes voices sound tinny. Roughly, the tops and bottoms of voices have been eliminated.
Wideband audio, which you’ll often see referred to as HD audio, simply expands the frequency range that’s included (50 Hz to 7 kHz). That’s why wideband audio sounds more lifelike: more of the vocal information you expect to hear is actually being transmitted.
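Frequency range also decides how often the audio has to be sampled. As a rule of thumb, you need to sample at least twice as fast as the highest frequency you want to keep. That’s why narrowband telephony samples at 8 kHz and wideband at 16 kHz. A minimal Python sketch, where the list of “standard” rates is just a handful of common values picked for illustration:

```python
STANDARD_RATES_HZ = [8000, 16000, 32000, 48000]

def standard_rate_for(highest_frequency_hz: float) -> int:
    """Smallest listed sampling rate that is at least twice the top frequency."""
    needed = 2 * highest_frequency_hz
    for rate in STANDARD_RATES_HZ:
        if rate >= needed:
            return rate
    raise ValueError("no listed rate is high enough")

print(standard_rate_for(3400))   # narrowband -> 8000
print(standard_rate_for(7000))   # wideband   -> 16000
print(standard_rate_for(20000))  # full range -> 48000
```

Doubling the sampling rate means twice as many samples to encode, which is part of why wideband audio costs more resources than narrowband.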
With VoIP, both wideband and narrowband codecs have their place.
Guide to VoIP Audio Codecs
So then, we come to the main question: What are the codecs that are used in VoIP? What do all those letters and numbers mean?
We’re going to start with the standards from the International Telecommunication Union (ITU). The ITU is a UN agency that tries to develop universal standards for telecommunications. They’re responsible for maintaining many of the most common codecs used in VoIP.
ITU standards also happen to be easy to recognize, because they all start with G.
G.711 (G.711 µ-law, G.711 A-law, G.711.0, G.711.1)
G.711 is a narrowband codec, also known as Pulse Code Modulation (PCM). It’s been around since 1972 and is still used today.
G.711 provides audio at 64 kbit/s with a frequency range of 300 Hz to 3.4 kHz.
There are two similar versions of G.711: µ-law and A-law. These refer to the algorithms used to encode audio data. G.711 µ-law is the commonly used version in North America. Everywhere else commonly uses G.711 A-law. In terms of your experience, they’re going to be identical.
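For the curious, here’s roughly what the µ-law algorithm does, sketched in Python. This uses the textbook continuous µ-law formula with µ = 255; the real G.711 standard uses a piecewise approximation of this curve with 8-bit output, so treat this as an illustration rather than an implementation of the standard. The function names are our own.

```python
import math

MU = 255.0  # the mu value used for mu-law in telephony

def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign((math.pow(1.0 + MU, abs(y)) - 1.0) / MU, y)

quiet = 0.01
compressed = mu_law_compress(quiet)
print(round(compressed, 2))                 # about 0.23
print(round(mu_law_expand(compressed), 4))  # back to 0.01
```

A quiet sample at 1% of full scale gets stretched to about 23% of the scale before it’s stored in 8 bits. Quiet sounds, where your ear is most sensitive, get more of the available precision. A-law does the same job with a slightly different curve.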
You’ll often see something like G.711u/a or G.711µ/a written in spec sheets. That means both versions are supported.
G.711.0 is a newer enhancement to G.711, approved in 2009, that adds lossless compression on top of G.711 for even more resource efficiency. G.711.1 is also newer, approved in 2008. It’s essentially a wideband expansion of G.711, providing audio at 64, 80, or 96 kbit/s. Both of these have yet to penetrate the VoIP market deeply.
G.719
G.719 is a full-band codec, approved in 2008, based on technology from Polycom and Ericsson.
What is a full-band codec? G.719 covers the entire range of human hearing, 20 Hz to 20 kHz. This means you can use the codec for music as well as speech, since music uses a much broader swath of the audio spectrum than speech. It has a bit rate of 32 up to 128 kbit/s, so it can be more resource-intensive.
G.722 (G.722.1, G.722.1C, G.722.2)
G.722 was the first wideband codec to be approved by the ITU. It was approved in 1988. G.722 provides audio at rates of 48, 56, or 64 kbit/s with a frequency range of 50 Hz to 7 kHz. Voices sound more lifelike, yet it’s still efficient.
G.722.1 and its variant G.722.1C are patented versions of wideband audio developed by Polycom, and are essentially implementations of Polycom Siren 7 and Siren 14 formats. They offer wideband audio at lower bit rates for greater efficiency. G.722.1C expands the range to 14 kHz.
G.722.2 is also known as Adaptive Multi-Rate Wideband or AMR-WB. It was developed by Nokia and VoiceAge. It is slightly more efficient than G.722.1, and is found especially in the world of mobile devices.
All of these provide wideband audio, meaning voices will sound fuller and more natural than with G.711.
G.726
G.726 is a narrowband codec. It’s a combination and extension of the older G.721 and G.723 standards.
G.726 transmits at bit rates of 16, 24, 32, and 40 kbit/s, yet the sound quality is very similar to G.711, which transmits at 64 kbit/s. With G.726, you’re getting old-school sound, but efficient resource use.
In the spec sheets, you might see G.726-32, which refers to G.726 at 32 kbit/s.
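Those bit rates aren’t arbitrary. Narrowband telephony takes 8,000 samples per second, and the bit rate is just that number times the bits spent on each sample: 8 bits for G.711, and 2, 3, 4, or 5 bits for G.726. A quick Python check (the function name is our own):

```python
SAMPLE_RATE_HZ = 8000  # narrowband telephony sampling rate

def bit_rate_kbps(bits_per_sample: int) -> float:
    """Bit rate for a codec that spends a fixed number of bits per sample."""
    return SAMPLE_RATE_HZ * bits_per_sample / 1000

print("G.711:", bit_rate_kbps(8), "kbit/s")
for bits in (2, 3, 4, 5):
    print(f"G.726 at {bits} bits per sample:", bit_rate_kbps(bits), "kbit/s")
```

So G.726-32 is the 4-bit flavor: half the bits per sample of G.711, and half the bit rate.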
G.729 (G.729A, G.729B, G.729AB)
G.729 operates at a bit rate of 8 kbit/s, making it the lightest on your resources of all the common G codecs you’ll encounter. It’s particularly good for bandwidth-intensive activities, like conference calls, where every bit counts.
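One caveat when you budget bandwidth: the codec’s bit rate isn’t the whole story on the network, because every packet also carries addressing headers. Here’s a rough Python sketch, assuming audio is sent in 20 ms packets over RTP/UDP/IPv4 (40 bytes of headers per packet) and ignoring any Ethernet or Wi-Fi overhead. Your phone system’s actual settings may differ.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers (assumed, no options)

def ip_bandwidth_kbps(codec_kbps: float, packet_ms: float = 20) -> float:
    """Approximate one-way IP bandwidth for a call, headers included."""
    packets_per_second = 1000 / packet_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_second
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000

print(ip_bandwidth_kbps(64))  # G.711 -> 80.0
print(ip_bandwidth_kbps(8))   # G.729 -> 24.0
```

Under these assumptions, G.729 still comes in at well under a third of G.711’s bandwidth, even though the headers now outweigh the audio itself.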
There are numerous varieties of G.729. The most common are A and B (and AB). G.729A is a lower-complexity version of G.729 that needs less processing power at the same 8 kbit/s, with a slight reduction in sound quality. G.729B uses voice activity detection to reduce resource use during silence. G.729AB means both are supported.
AMR
AMR stands for Adaptive Multi-Rate. It’s a narrowband codec with a frequency range of 200 Hz to 3.4 kHz. Because it can change its bit rate on the fly, and sends very little data during silence, it can be extremely efficient. Its bit rates range from 4.75 to 12.2 kbit/s.
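AMR doesn’t slide smoothly between those two figures. It has eight fixed modes and hops between them. Here’s a minimal Python sketch that picks the best mode for a given bandwidth budget. The mode list is the standard set of AMR rates, but the selection rule (`pick_mode`) is a hypothetical simplification: real networks choose the mode based on measured link quality, not a simple budget.

```python
AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]

def pick_mode(budget_kbps: float) -> float:
    """Highest AMR mode that fits the budget, or the lowest mode if none fit."""
    candidates = [mode for mode in AMR_MODES_KBPS if mode <= budget_kbps]
    return max(candidates) if candidates else AMR_MODES_KBPS[0]

print(pick_mode(8.0))   # 7.95
print(pick_mode(20.0))  # 12.2
print(pick_mode(3.0))   # 4.75
```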
AMR was adopted as a standard speech codec for GSM mobile networks. The original GSM codec is discussed below.
AMR-WB is a wideband adaptation of AMR, and is codified as G.722.2, which we discussed above.
GSM (Full Rate, GSM-FR, GSM 06.10)
GSM stands for Groupe spécial mobile, now Global System for Mobile Communications. It was developed in the early 1990s, as you might guess, for mobile devices. The section of the original specifications that dealt with this codec is 06.10, so you’ll often see it written GSM 06.10.
It has a bit rate of 13 kbit/s, so while it doesn’t produce good sound quality, it’s very efficient, which is particularly necessary for cell phones.
One interesting thing about GSM is that by solving for the limitations of mobile devices, engineers unintentionally produced a codec that’s good for VoIP.
iLBC
iLBC stands for Internet Low Bit rate Codec. It’s an open source narrowband audio codec with fixed bit rates of 13.33 or 15.2 kbit/s. It produces audio similar to G.711, but comes with packet loss concealment.
When data are transmitted, they’re broken down into discrete packets, which are sent out into the network. Packet loss happens when the data don’t reach their destination or are delayed. The effect is patchy sound, where whole sections of the conversation go missing.
iLBC reduces the effect of packet loss, yet retains the efficiency of other low bit rate codecs.
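What does concealing a lost packet look like? One of the simplest tricks is to replay the last frame that did arrive, a bit quieter each time, instead of playing a gap. The Python sketch below shows only that basic trick. It is not iLBC’s actual concealment method, which is more sophisticated, and the frame values and the `attenuation` setting are made up.

```python
def conceal(frames, frame_size, attenuation=0.5):
    """Fill lost frames (None) with a fading copy of the last good frame."""
    output = []
    last_good = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last_good = frame
            gain = 1.0  # reset the fade once real audio arrives again
            output.append(list(frame))
        elif last_good is not None:
            gain *= attenuation  # each repeat gets quieter
            output.append([s * gain for s in last_good])
        else:
            output.append([0.0] * frame_size)  # nothing to repeat yet
    return output

received = [[0.4, -0.4], None, None, [0.2, 0.1]]  # None marks a lost packet
for frame in conceal(received, frame_size=2):
    print(frame)
```

The two lost frames are filled with fading copies of the first frame rather than silence, so the listener hears a brief smear instead of a hole in the conversation.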
Opus
Opus is a very dynamic, open audio format standardized by the Internet Engineering Task Force (IETF). It can support constant or variable bit rates from 6 kbit/s to 510 kbit/s, which means it works for a wide variety of tasks.
It’s the successor of the Speex codec, and is, roughly speaking, a combination of the CELT and SILK codecs.
Opus is gaining in popularity, even though it was only released in 2012.
RTAudio
RTAudio is a codec developed by Microsoft for real-time VoIP communication over Microsoft platforms, so you’ll only encounter it when using Microsoft Skype for Business, Microsoft Lync, or a similar program.
It can be either narrowband or wideband, depending on the use case, and produces sound similar to the more commonly used, open codecs.
Siren (Siren 7, Siren 14, Siren 22)
The Siren codecs are wideband codecs developed and patented by Polycom. The numbers (7, 14, 22) refer to the upper end of the frequency range in kHz.
All of the Siren codecs have been incorporated into ITU standards, which is where you’ll see them most often. Siren 7 became G.722.1, Siren 14 became G.722.1C, and Siren 22 was combined with Ericsson technology to become G.719.
Are there any other formats that you would like to know about? Let us know in the comments, and we’ll add the information for you!