Surround Sound Formats

Introduction

Surround sound uses multiple audio channels, or tracks, to give the audience the impression of being inside the action. The technology is used both for movie soundtracks and for recorded music such as concerts.

The idea is to place the audience inside an arrangement of speakers. Normally there are 6 speakers with fixed positions. The main positions are designated center, front left, front right, and, at the two sides, surround left and surround right. The center, left and right speakers carry speech and music, while the surround speakers carry sound effects.

Additionally, a speaker for low frequencies, the subwoofer, reproduces the low-frequency effects (LFE). The subwoofer is usually placed at front center; its exact position matters little, because the human ear cannot localize low frequencies very well.

The whole arrangement is called 5.1 Surround Sound: 5 speakers for the main channels and 1 speaker for the low-frequency effects. More sophisticated methods support 6.1 or even 7.1 speaker arrangements.

Besides these real surround technologies, there are virtual surround algorithms such as the Sound Retrieval System (SRS). They create a surround effect from only two channels, using psychoacoustic models to produce the effect directly in the listener's ear.

This article focuses on real surround methods only.

In the marketplace there are several proprietary methods for creating surround sound. In most cases the developer provides reference code that has to be licensed before use and then adapted to run on the desired hardware platform.

The most prevalent methods are:

Dolby Digital (formerly known as AC3)
Dolby Digital EX
DTS Digital Surround
DTS Extended Surround (DTS-ES)
THX Surround EX

The most established methods are:

Dolby Digital (formerly known as AC3)
DTS Digital Surround

The following information is extracted from publications available at www.dolby.com.

DTS / AC3 in general

In detail the two algorithms are quite different, but in general they share a number of similar operations. The input signal is always a digital PCM-coded multi-channel signal. The fundamental idea is to reduce the data size by recoding the signal with a sophisticated coding technique.

The primary aim of this data reduction is not to lower storage cost, but to carry more accurate audio signals than linear PCM at the same bit rate. The quality of the audio signal therefore rises while the bit rate stays constant.

A PCM-coded signal with a sample resolution of 16 bits and a sampling rate of 44.1 kHz has a bit rate of 705.6 kbit/s per channel. With the new coding algorithm the bit rate can be kept at this value even when the input signal has a sample resolution of 24 bits and a sampling rate of 192 kHz.
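The figures above follow from multiplying sample resolution by sampling rate. The short sketch below redoes that arithmetic; the function name pcm_bit_rate is only an illustrative choice, and the ratio it prints is simply what the two PCM rates imply, not a figure published for either codec.

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz):
    """Bit rate of one linear PCM channel in bits per second."""
    return bits_per_sample * sample_rate_hz

cd_rate = pcm_bit_rate(16, 44_100)       # 16-bit / 44.1 kHz input
hires_rate = pcm_bit_rate(24, 192_000)   # 24-bit / 192 kHz input

# Factor by which the high-resolution input must be reduced
# to fit into the bit rate of the 16-bit / 44.1 kHz signal.
required_reduction = hires_rate / cd_rate

print(f"16 bit / 44.1 kHz: {cd_rate / 1000:.1f} kbit/s per channel")
print(f"24 bit / 192 kHz:  {hires_rate / 1000:.1f} kbit/s per channel")
print(f"required reduction: about {required_reduction:.2f} to 1")
```

In other words, keeping the per-channel bit rate constant with the higher-resolution input means the coder has to remove roughly 85 percent of the raw PCM data.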

The reduction in data represents the removal of objective and perceptual redundancies that are present in the original PCM signal. The recoding algorithm consists of a lossless coding part, which removes the objective redundancies, and a lossy coding part, which removes the perceptual redundancies by using psychoacoustic models. Psychoacoustic models identify data that is irrelevant to the listener, so that removing it does not reduce the perceived audio quality. This part of the coding is lossy because the removed data cannot be reconstructed by the decoder.

In a first step, a filterbank splits the signal into a number of frequency sub-bands. The DTS system uses a polyphase filterbank, while AC3 uses a TDAC (time domain aliasing cancellation) filterbank.
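To illustrate what time domain aliasing cancellation means, the following minimal sketch uses a modified discrete cosine transform (MDCT) with 50 percent overlapping frames. It is only a toy: the sine window, the tiny block size and all function names are assumptions made for this example and do not reproduce the block sizes or window of AC3. Each frame of 2N samples yields only N coefficients, so a single decoded frame contains aliasing; the aliasing cancels when neighbouring frames are overlapped and added.

```python
import math

def sine_window(half):
    """Sine window of length 2*half; it satisfies w[n]^2 + w[n+half]^2 = 1."""
    size = 2 * half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]

def _basis(n, k, half):
    # Cosine basis shared by the forward and the inverse transform.
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))

def mdct(frame, half):
    """Transform 2*half windowed samples into half coefficients."""
    return [sum(frame[n] * _basis(n, k, half) for n in range(2 * half))
            for k in range(half)]

def imdct(coeffs, half):
    """Transform half coefficients into 2*half samples that still contain aliasing."""
    return [2.0 / half * sum(coeffs[k] * _basis(n, k, half) for k in range(half))
            for n in range(2 * half)]

def analyze_and_rebuild(signal, half):
    """Run MDCT and inverse MDCT on overlapping frames, then overlap-add."""
    window = sine_window(half)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        frame = [signal[start + n] * window[n] for n in range(2 * half)]
        rebuilt = imdct(mdct(frame, half), half)
        for n in range(2 * half):
            # Window again on synthesis and add to the overlapping output.
            output[start + n] += rebuilt[n] * window[n]
    return output

half = 8
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(10 * half)]
rebuilt = analyze_and_rebuild(signal, half)

# The first and last half-block are covered by one frame only and stay aliased;
# every sample in between is covered by two frames and is reconstructed.
max_error = max(abs(a - b) for a, b in zip(signal[half:-half], rebuilt[half:-half]))
print(max_error)
```

Without quantization the interior of the signal is recovered up to floating-point rounding. In a real coder the coefficients are quantized between the two transforms, which is where the data reduction takes place.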

Both systems use quantizers with global bit management units. These units calculate the ideal bit rate for each channel. All channels are monitored to keep the global bit rate constant, so the individual bit rate of each channel and each sub-band is controlled and tuned. The calculation uses the output coefficients of the filterbank to determine the best bit rate for each sub-band and each channel.
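The idea of distributing a fixed bit budget over sub-bands can be sketched with a simple greedy loop. This is not the allocation routine of DTS or AC3; it is a generic illustration under stated assumptions: each sub-band is described by a hypothetical signal-to-mask ratio in dB, every additional bit is assumed to lower the quantization noise by about 6.02 dB, and the names allocate_bits, smr_db and bit_pool are invented for this example.

```python
DB_PER_BIT = 6.02  # assumed noise reduction per additional quantizer bit

def allocate_bits(smr_db, bit_pool, max_bits=16):
    """Greedily hand out bit_pool bits, one at a time, to the sub-band
    whose quantization noise is currently highest relative to its mask."""
    bits = [0] * len(smr_db)
    for _ in range(bit_pool):
        # Noise-to-mask ratio of each band with its current bit count.
        nmr = [smr - DB_PER_BIT * b if b < max_bits else float("-inf")
               for smr, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] == float("-inf"):
            break  # every band already has the maximum number of bits
        bits[worst] += 1
    return bits

# Four hypothetical sub-bands; the third one lies below its masked threshold.
smr_db = [30.0, 12.0, -5.0, 20.0]
bits = allocate_bits(smr_db, bit_pool=10)
print(bits)
```

The total number of bits always equals the pool, which mirrors the constant global bit rate described above, while bands that are already masked receive bits last.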

The psychoacoustic analysis exploits an effect called auditory masking and, in addition, adapts the input signal to the spectral sensitivity curve of the human ear.

Auditory masking is a phenomenon in which a frequency component with a higher amplitude covers a component with a lower amplitude at a short spectral distance. The component with the lower amplitude is masked and cannot be heard, so coding it is redundant. Frequencies above a masking frequency are masked more easily than frequencies below it.
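The asymmetry described above can be made concrete with a rough model. The sketch below converts frequencies to the Bark scale with Zwicker's approximation and lets the masked threshold fall off linearly on both sides of the masker. The slopes (27 dB per Bark below, 10 dB per Bark above) and the 10 dB offset are illustrative assumptions, not values taken from DTS or AC3, and the function names are invented for this example.

```python
import math

def hz_to_bark(freq_hz):
    """Zwicker's approximation of the Bark (critical band) scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def masked_threshold_db(masker_hz, masker_db, probe_hz,
                        lower_slope=27.0, upper_slope=10.0, offset_db=10.0):
    """Level below which a probe tone is assumed to be inaudible
    next to a single masker (all parameters are illustrative)."""
    distance = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    # The threshold falls slowly above the masker and steeply below it.
    slope = upper_slope if distance >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(distance)

def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    return probe_db < masked_threshold_db(masker_hz, masker_db, probe_hz)

# An 80 dB masker at 1 kHz and two 40 dB probe tones, 200 Hz above and below.
print(is_masked(1000, 80, 1200, 40))  # probe above the masker
print(is_masked(1000, 80, 800, 40))   # probe below the masker
```

Under these assumptions the probe 200 Hz above the masker is masked, while the equally loud probe 200 Hz below it remains audible.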

The sensitivity of the human ear is frequency-dependent. Towards higher frequencies the sensitivity of the ear decreases, so higher frequencies need a larger amplitude to be perceived. If the amplitude of a frequency component falls below the perception threshold, the component is redundant and does not need to be encoded.
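A commonly used closed-form approximation of this perception threshold (the threshold in quiet) is Terhardt's formula. The sketch below uses it only as an illustration; neither codec is claimed to use exactly this curve, and the function names are chosen for this example.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Terhardt's approximation of the absolute hearing threshold in dB SPL."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_audible(freq_hz, level_db):
    """A component below the threshold in quiet need not be encoded."""
    return level_db >= threshold_in_quiet_db(freq_hz)

for freq in (1000, 3000, 8000, 16000):
    print(freq, round(threshold_in_quiet_db(freq), 1))

print(is_audible(3000, 10.0))    # near the ear's most sensitive region
print(is_audible(16000, 10.0))   # same level at a very high frequency
```

With this curve a 10 dB tone at 3 kHz lies above the threshold, whereas the same level at 16 kHz lies far below it and could be dropped by the coder.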

The quantizer then eliminates all irrelevant information from the signal and reduces the data rate.