Russ Hepworth-Sawyer enters the world of data compression formats, exploring their past, widespread acceptance in the present, and what the future holds…
DAT (top), despite being initially introduced as a consumer format, soon found favour with studio engineers worldwide as the two-track master format, while the Digital Compact Cassette held much promise at a time when analogue cassette still ruled as it was backward-compatible.
While the debate over whether digital audio can ever equal or surpass the quality of analogue rages on, the portable audio file reigns supreme. The prevalence of mp3 and AAC, thanks to delivery portals such as iTunes – plus the accompanying attractive players, of course – ensures that data-compressed audio is more the norm these days than the exception.
Rise of the machine
At the dawn of the consumer digital audio era, the Compact Disc (CD) first delivered a full Long Play record (LP to you and me) at a sample rate of 44.1kHz and a bit depth of 16 bits – as it remains today. The CD standard is uncompressed, and its encoding is often referred to as PCM (Pulse Code Modulation).
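To put the CD figures into perspective, the arithmetic below works out how much data uncompressed PCM generates. It is a minimal sketch: the sample rate and bit depth come from the text above, while the two-channel (stereo) assumption and the function name `pcm_bit_rate` are ours for illustration.

```python
def pcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second produced by uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


CD_SAMPLE_RATE = 44_100  # Hz, from the text
CD_BIT_DEPTH = 16        # bits per sample, from the text
CD_CHANNELS = 2          # stereo (assumed here)

cd_rate = pcm_bit_rate(CD_SAMPLE_RATE, CD_BIT_DEPTH, CD_CHANNELS)
print(cd_rate)  # 1411200 bits per second

# Convert to megabytes per minute (1 MB taken as 1,000,000 bytes).
megabytes_per_minute = cd_rate * 60 / 8 / 1_000_000
print(megabytes_per_minute)
```

At roughly 10MB for every minute of stereo audio, it is easy to see why the formats that followed looked for ways to trim the data.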
The new-fangled CD was quickly adopted by the consumer and in a short space of time outstripped vinyl as the main delivery medium. CD was invented through a successful partnership between Sony and Philips; this partnership widened as the search heated up to replace the humble analogue cassette, with a larger consortium of companies inventing a format called Digital Audio Tape (DAT).
DAT’s benefits were portability and the ability to record full-quality digital audio, but it failed in its quest to become cassette’s replacement for too many reasons to list here. Studios, however, immediately adopted it as a replacement for 2-track tape. For many smaller studios, DAT represented an improvement over the cheaper two-track open-reel machines.
As DAT failed, new consumer formats were sought to meet the need for a recordable digital format. Sony and Philips – independently, this time – developed new portable systems: Philips’ Digital Compact Cassette (DCC) and Sony’s MiniDisc (MD).
Philips’ product was backwards-compatible with its analogue compact cassette (still a very popular format for the car and Walkmans at the time), and to enable the company to squeeze the necessary digital audio data onto the small and slow-running cassette, a lossy data-reduction algorithm called Precision Adaptive Sub-band Coding (PASC) was invented, reducing the amount of data recorded by around 75%.
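As a rough illustration of what a reduction of around 75% means in practice, the sketch below applies it to a CD-style stereo PCM stream. The choice of 44.1kHz/16-bit stereo as the starting point is an assumption made for the example, as is the helper name `reduced_bit_rate`; the 75% figure is the approximate one quoted above.

```python
# Assumed source: CD-style stereo PCM (44.1kHz, 16-bit, two channels).
SOURCE_BIT_RATE = 44_100 * 16 * 2  # bits per second
REDUCTION = 0.75                   # "around 75%", as quoted in the text


def reduced_bit_rate(source_bits_per_second: float, reduction: float) -> float:
    """Bit rate left after discarding the given fraction of the data."""
    return source_bits_per_second * (1.0 - reduction)


estimate = reduced_bit_rate(SOURCE_BIT_RATE, REDUCTION)
compression_ratio = SOURCE_BIT_RATE / estimate

print(round(estimate))     # 352800 bits per second
print(compression_ratio)   # 4.0, i.e. roughly a 4:1 reduction
```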
Sony’s MiniDisc was an entirely new format based on magneto-optical drives using a data-compression method called Adaptive Transform Acoustic Coding (ATRAC), with data being read in an identical way to that of CD. While both formats made an impact in differing markets – with MiniDisc becoming reasonably popular in radio for a spell – neither was particularly successful in replacing the cassette.
Meanwhile, the Moving Picture Experts Group (MPEG), a joint working group of the International Organisation for Standardisation (ISO) and the International Electrotechnical Commission (IEC), was working on data compression for video, and in 1993 launched MPEG-1. Within MPEG-1 is the Layer III audio codec, more commonly referred to as mp3.
The format was soon adopted by the audio world just as the internet became more widespread, and thanks to the format’s relatively small file sizes it gave rise to the instant digital delivery of audio material. mp3 and its successor, AAC (usually delivered in mp4 files), have improved in sonic quality through the use of Variable Bit Rates (VBR) and have become the most listened-to form of audio globally.
Feel the squeeze
Many codecs use data-reduction algorithms based around Perceptual Coding. These are realised through a number of clever techniques understood from studying the way in which we hear. The most notable is sub-band coding that exploits auditory masking – a natural phenomenon in the ear as we hear sound. The concept states that a quiet frequency band can be masked by a louder frequency band adjacent to it.
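The masking idea can be sketched in a few lines of code. What follows is a toy model rather than any real codec's psychoacoustic model: each band casts a masking threshold over its neighbours that falls away with distance, and a band sitting below a neighbour's threshold is flagged as inaudible. The function name and the `offset_db` and `spread_db_per_band` values are hypothetical, chosen only to make the behaviour visible.

```python
def masked_bands(levels_db, offset_db=10.0, spread_db_per_band=15.0):
    """Return the indices of bands hidden beneath a louder band's threshold.

    Toy model: a masker's threshold starts offset_db below its own level
    and drops by spread_db_per_band for every band of distance.
    """
    masked = []
    for i, level in enumerate(levels_db):
        for j, masker in enumerate(levels_db):
            if i == j:
                continue
            threshold = masker - offset_db - spread_db_per_band * abs(i - j)
            if level < threshold:
                masked.append(i)
                break  # one masker is enough to hide this band
    return masked


# Five adjacent bands with levels in dB (made-up values).
levels = [30.0, 80.0, 45.0, 20.0, 60.0]
print(masked_bands(levels))  # [0, 2, 3]
```

Here the loud 80dB band hides its quieter neighbours, while the 60dB band is far enough away to survive. A coder following this principle would spend few or no bits on the masked bands.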
The ear splits up the frequency range into clusters of frequencies known as Critical Bands. These are considered the discernible frequencies that the ear can detect. Visualise a 31-band, third-octave graphic EQ and you’re thinking in a similar fashion to Critical Bands, although many data compression algorithms comprise many more than 31 bands.
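For anyone who wants to see the graphic EQ analogy in numbers, the sketch below generates the centre frequencies of a 31-band, third-octave layout. It assumes a 1kHz reference and exact base-2 spacing, so the values differ slightly from the rounded figures printed on EQ front panels (20Hz, 25Hz, 31.5Hz and so on). Critical Bands themselves are not spaced exactly like this; the code only illustrates the analogy.

```python
def third_octave_centres(lowest_step=-17, highest_step=13, reference_hz=1000.0):
    """Centre frequencies spaced a third of an octave apart around reference_hz."""
    return [reference_hz * 2 ** (step / 3)
            for step in range(lowest_step, highest_step + 1)]


centres = third_octave_centres()
print(len(centres))  # 31

# Show the lowest, reference and highest bands.
for index in (0, 17, 30):
    print(f"band {index + 1}: {centres[index]:.1f} Hz")
```

The bands run from just under 20Hz to just over 20kHz, with each band getting wider in Hz as the frequency rises, which is broadly how the ear's own resolution behaves.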
Understanding Critical Bands and manipulating their capture over time offers something for the algorithms to chop out of the data without the ear technically ‘missing’ the information. Expanding on this principle, MiniDisc and mp3 subscribe to a form of data reduction called Transform Coding, whereby audio data is ‘transformed’ so it can be analysed both against time as well as frequency, as if plotting the sound on a spectrograph.
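The idea of looking at audio against both time and frequency can be sketched as follows. This example chops a signal into short frames and runs a plain discrete Fourier transform on each one, which is a simplification: real codecs typically use overlapping, windowed transforms such as the MDCT. The sample rate, frame size and test tones are hypothetical values chosen so that the result is easy to read.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz (hypothetical)
FRAME_SIZE = 64     # samples per analysis frame (hypothetical)


def dft_magnitudes(frame):
    """Magnitude of each frequency bin below Nyquist for one frame."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        magnitudes.append(abs(total))
    return magnitudes


def spectrogram(signal, frame_size):
    """One list of bin magnitudes per non-overlapping frame."""
    return [dft_magnitudes(signal[start:start + frame_size])
            for start in range(0, len(signal) - frame_size + 1, frame_size)]


def tone(freq_hz, first_sample, count):
    """A sine tone sampled at SAMPLE_RATE."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(first_sample, first_sample + count)]


# A 500Hz tone followed by a 2kHz tone: the content changes over time.
signal = tone(500, 0, 128) + tone(2000, 128, 128)
frames = spectrogram(signal, FRAME_SIZE)

peak_bins = [max(range(len(m)), key=m.__getitem__) for m in frames]
peak_hz = [b * SAMPLE_RATE / FRAME_SIZE for b in peak_bins]
print(peak_hz)  # [500.0, 500.0, 2000.0, 2000.0]
```

Each frame reports which frequency dominates at that moment, giving the coder the time-and-frequency picture it needs before deciding what can be discarded.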
Transform Coding takes not just a static analysis of the frequency content at any one fixed point (such as above), but against time, too – the concept being that auditory masking also works in time. For example, our hearing seemingly ignores quiet sounds immediately prior to a loud transient. The ability to remove the immediate quiet sound before a transient and the quiet bands on either side results in effective data reduction.
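Masking in time can be pictured with an equally simple sketch. Given the level of each successive analysis frame, the code flags any frame that sits immediately before a much louder one. The `margin_db` figure and the function name are hypothetical; in real hearing the effect only works over a very short span before the transient, and real codecs model it in far more detail.

```python
def premasked_frames(frame_levels_db, margin_db=30.0):
    """Indices of frames immediately followed by a frame louder by more than margin_db."""
    return [i for i in range(len(frame_levels_db) - 1)
            if frame_levels_db[i + 1] - frame_levels_db[i] > margin_db]


# Frame levels in dB (made-up values); the jump to 90dB is the transient.
levels = [40.0, 42.0, 35.0, 90.0, 85.0, 50.0]
print(premasked_frames(levels))  # [2]
```

Only the quiet frame directly before the transient is flagged, so it becomes a candidate for coarser coding.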
To reduce the data yet further, redundant 1s and 0s are stripped out, leaving a more economical data stream. This varies the word length of the data depending on its content, with redundant bits left off the end of each word for louder sounds. The louder the sound, the lower the word length (and vice versa).
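One way to picture word-length reduction is to drop bits from the end of each sample, keeping fewer bits for louder blocks as described above. The sketch below is a toy illustration, not the scheme used by any particular codec: the 8-bit and 12-bit word lengths, the loudness threshold and the function names are all hypothetical, and real coders combine this kind of requantisation with scale factors and a full masking model.

```python
def requantise(sample, keep_bits, source_bits=16):
    """Keep only the top keep_bits of a signed source_bits sample."""
    drop = source_bits - keep_bits
    return (sample >> drop) << drop  # the dropped low bits become zeros


def bits_for_block(block, loud_bits=8, quiet_bits=12, loud_threshold=8192):
    """Pick a word length: fewer bits when the block's peak is loud (toy rule)."""
    peak = max(abs(s) for s in block)
    return loud_bits if peak >= loud_threshold else quiet_bits


def encode_block(block):
    """Requantise every sample in the block to the chosen word length."""
    keep = bits_for_block(block)
    return [requantise(s, keep) for s in block]


loud_block = [12345, -20000, 15000]
quiet_block = [100, -57, 300]

print(bits_for_block(loud_block), encode_block(loud_block))
print(bits_for_block(quiet_block), encode_block(quiet_block))
```

The loud block is stored with shorter words and therefore larger rounding errors, on the assumption that the loud signal itself masks them.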
Another format increasing in popularity is Advanced Audio Coding (AAC), usually delivered in mp4 files. AAC, like mp3, employs Transform Coding to apply a psychoacoustic model to the data. However, AAC’s jewel in the crown is that it employs a number of additional coding modules that are arranged for the type of audio it is processing. These include modules to do with auditory masking and Transform Coding, plus some additional processes. The modules are arranged into a number of profiles – the Main Profile (Main) is the most involved (using all of the modules available), while Low Complexity (LC) mode is the leanest option. The final option is Scalable Sample Rate (SSR), which will change sample rate depending on the frequency content of the material.
A notable alternative to the MPEG standards is Ogg Vorbis, a free, open-source audio codec currently used in Spotify. Many champion this as it is said to sound better than the mp3/AAC varieties.
As you work it is worth taking a listen to how each of the varieties influences how you hear PCM audio. In this respect, one of the most novel and exciting tools to emerge in recent years is Sonnox’s Fraunhofer Pro-Codec, which enables you to compare, in real time, the difference between four different MPEG compressions. If you work with Pro-Codec you can tailor your mixes and masters to react better in data-compressed form.
Nothing to lose
Looking forward, there are three immediately obvious paths that data-compressed audio could follow. Firstly, given all of the past attempts to bring improvements to CD quality with SACD and DVD-A, it may go nowhere at all. These formats have failed to take hold. The consumer is voting for convenience over quality.
Secondly, we could move iteratively through newer lossy file formats that improve on the AAC standard, providing us with yet smaller file sizes and improved audio quality.
Thirdly, perhaps one day internet connectivity will become sufficiently fast, and portable devices capable of storing such huge quantities of data, that we will finally be able to use lossless audio without flinching. Only time will tell…