Dynamic range compression (DRC) is the narrowing (or, in the case of an expander, the widening) of the dynamic range of a recording. Dynamic range is the difference between the quietest and loudest sounds. Sometimes the quietest sound in a recording is only slightly louder than the noise floor, and sometimes only slightly quieter than the loudest. Hardware devices and software that perform dynamic compression are called compressors; four main groups are distinguished: compressors proper, limiters, expanders, and gates.

Tube analog compressor DBX 566

Downward and upward compression

Downward compression reduces the level of a sound once it exceeds a certain threshold, leaving quieter sounds unchanged. An extreme form of downward compression is the limiter. Upward compression, on the contrary, raises the level of a sound that falls below the threshold, without affecting louder sounds. Both types of compression narrow the dynamic range of the audio signal.

Downward compression

Upward compression

Expander and Gate

If a compressor reduces dynamic range, an expander increases it. When the signal level rises above the threshold level, the expander increases it further, thereby increasing the difference between loud and soft sounds. Devices like this are often used when recording a drum kit to separate the sounds of one drum from another.

A type of expander that is used not to amplify loud sounds but to attenuate quiet sounds that do not exceed a threshold level (background noise, for example) is called a noise gate. In such a device, as soon as the level falls below the threshold, the signal stops passing. Typically a gate is used to suppress noise during pauses. On some models you can make the sound fade out gradually rather than stop abruptly when it crosses the threshold; in this case the fade rate is set by the Decay control.

A gate, like other types of compressors, can be frequency-dependent (i.e., treat certain frequency bands differently) and can operate with a side-chain (see below).

Compressor operating principle

The signal entering the compressor is split into two copies. One copy is sent to a variable-gain amplifier whose gain is controlled by an external signal; the other copy is used to generate that control signal. It goes to a circuit called the side-chain, where the signal level is measured and, based on this measurement, an envelope describing the change in its volume is produced.
This is how most modern compressors are built; this is the so-called feed-forward design. In older devices (the feedback design), the signal level is measured after the amplifier.
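As a rough illustration of the feed-forward structure described above, here is a minimal sketch in Python (NumPy assumed): the side-chain derives an envelope from one copy of the signal, a gain is computed from it, and that gain is applied to the other copy. The parameter names and the simple one-pole envelope follower are illustrative choices, not a description of any particular device.

```python
import numpy as np

def feed_forward_compress(x, sr, threshold_db=-20.0, ratio=4.0,
                          attack_ms=10.0, release_ms=100.0):
    """Very simplified feed-forward compressor: side-chain envelope -> gain -> amplifier."""
    # One-pole smoothing coefficients for the envelope follower
    a_att = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (sr * release_ms / 1000.0))

    env = 0.0
    gain = np.ones_like(x)
    for n, sample in enumerate(np.abs(x)):          # side-chain copy
        coeff = a_att if sample > env else a_rel    # attack when rising, release when falling
        env = coeff * env + (1.0 - coeff) * sample  # envelope of the control signal

        level_db = 20.0 * np.log10(max(env, 1e-9))
        over_db = level_db - threshold_db
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0  # hard knee
        gain[n] = 10.0 ** (gain_db / 20.0)

    return x * gain                                  # apply the gain to the main copy
```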

There are various analog variable-gain amplification technologies, each with its own advantages and disadvantages: tube, optical (using photoresistors), and transistor. When working with digital audio (in an audio editor or DAW), a compressor can use its own mathematical algorithms or emulate the behaviour of analog technologies.

Main parameters of compressors

Threshold

A compressor reduces the level of an audio signal if its amplitude exceeds a certain threshold value. The threshold is usually specified in decibels; a lower threshold (e.g. −60 dB) means that more of the signal will be processed than with a higher threshold (e.g. −5 dB).

Ratio

The amount of level reduction is determined by the Ratio parameter: a ratio of 4:1 means that if the input level is 4 dB above the threshold, the output level will be 1 dB above the threshold.
For example:
Threshold = −10 dB
Input = −6 dB (4 dB above threshold)
Output = −9 dB (1 dB above threshold)

It is important to keep in mind that gain reduction continues for some time after the signal falls below the threshold; this time is determined by the Release parameter.

Compression with a maximum ratio of ∞:1 is called limiting. This means that any signal above the threshold level is attenuated to the threshold level (except for a short period after a sudden increase in input volume). See “Limiter” below for more details.
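The relationship between threshold, ratio and output level can be captured in a few lines. The sketch below (with illustrative parameter names) also treats a ratio of infinity as limiting, matching the description above.

```python
def output_level_db(input_db, threshold_db, ratio):
    """Static compression curve: the level above the threshold is divided by the ratio."""
    if input_db <= threshold_db:
        return input_db                      # below threshold: unchanged
    if ratio == float("inf"):
        return threshold_db                  # limiting: clamp to the threshold
    return threshold_db + (input_db - threshold_db) / ratio

# The worked example above: threshold -10 dB, ratio 4:1, input -6 dB -> -9 dB
print(output_level_db(-6, -10, 4))           # -9.0
```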

Examples of different Ratio values

Attack and Release

A compressor provides some control over how quickly it responds to changes in signal dynamics. The Attack parameter determines how long it takes the compressor to reduce the gain to the level determined by the Ratio parameter. Release determines how long it takes the compressor to raise the gain again, or to return to unity once the input level falls below the threshold.

Attack and Release phases

These parameters specify the time (usually in milliseconds) needed to change the gain by a certain number of decibels, usually 10 dB. For example, if Attack is set to 1 ms, it will take 1 ms to reduce the gain by 10 dB and 2 ms to reduce it by 20 dB.
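A quick sanity check of that convention (illustrative only; real devices differ in how exactly they specify these times):

```python
def gain_change_time_ms(delta_db, time_per_10db_ms):
    """Time needed for a gain change if the Attack/Release setting is defined per 10 dB."""
    return abs(delta_db) / 10.0 * time_per_10db_ms

print(gain_change_time_ms(10, 1.0))  # 1.0 ms for a 10 dB change
print(gain_change_time_ms(20, 1.0))  # 2.0 ms for a 20 dB change
```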

On many compressors the Attack and Release parameters are adjustable, but on some they are preset and cannot be changed. Sometimes they are labelled "automatic" or "program-dependent", i.e. they change depending on the input signal.

Knee

Another compressor parameter is the hard/soft knee. It determines whether the onset of compression is abrupt (hard) or gradual (soft). A soft knee makes the transition from the dry to the compressed signal less noticeable, especially at high Ratio values and with sudden jumps in volume.

Hard Knee and Soft Knee compression

Peak and RMS

A compressor can respond to peak (short-term maximum) values or to the average level of the input signal. Using peak values can lead to sharp fluctuations in the amount of gain reduction, and even to distortion. Therefore compressors usually apply an averaging function (typically RMS) to the input signal when comparing it with the threshold. This gives more comfortable compression, closer to the human perception of loudness.

RMS is a measure of the average loudness of a recording. Mathematically, RMS (Root Mean Square) is the root-mean-square value of the amplitude over a certain number of samples:

RMS = √((x₁² + x₂² + … + xN²) / N),

where N is the number of samples and xᵢ are their amplitudes.
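As an illustration (NumPy assumed), peak and RMS levels of a block of samples can be computed like this:

```python
import numpy as np

def peak_db(x):
    return 20 * np.log10(np.max(np.abs(x)) + 1e-12)

def rms_db(x):
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

x = np.sin(2 * np.pi * 440 * np.arange(4410) / 44100)  # 0.1 s of a 440 Hz sine
print(peak_db(x), rms_db(x))  # peak ~0 dBFS, RMS ~-3 dBFS for a sine wave
```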

Stereo linking

A compressor in stereo-linked mode applies the same gain to both stereo channels. This avoids image shifts that can result from processing the left and right channels independently; such a shift occurs if, for example, a loud element is panned off-centre.

Makeup gain

Since a compressor reduces the overall signal level, it usually provides a fixed output gain (makeup gain) control to bring the signal back up to the optimal level.

Look-ahead

The look-ahead function is designed to solve problems associated with both too slow and too fast Attack and Release settings. An attack time that is too long does not catch transients effectively, while one that is too short can sound uncomfortable to the listener. With look-ahead, the main signal is delayed relative to the control signal, which makes it possible to begin compression in advance, even before the signal reaches the threshold.
The only drawback of this method is the time delay of the signal, which in some cases is undesirable.

Using Dynamic Compression

Compression is used everywhere: not only in music recordings, but wherever the overall loudness needs to be raised without increasing peak levels, and wherever inexpensive playback equipment or a limited transmission channel is used (public address and intercom systems, amateur radio, etc.).

Compression is used when playing background music (in shops, restaurants, etc.), where any noticeable changes in volume are undesirable.

But the most important areas of application of dynamic compression are music production and broadcasting. Compression is used to give the sound "thickness" and "drive", to blend instruments better with one another, and especially when processing vocals.

Vocals in rock and pop music are often compressed to make them stand out from the accompaniment and add clarity. A special type of compressor tuned only to certain frequencies - a de-esser - is used to suppress sibilant phonemes.

In instrumental parts, compression is also used for effects that are not directly related to volume, for example, quickly decaying drum sounds can be made longer lasting.

Electronic dance music (EDM) often uses side-chaining (see below) - for example, the bass line may be driven by a kick drum or similar to prevent bass and drums from clashing and create a dynamic pulsation.

Compression is widely used in broadcast (radio, television, internet broadcasting) to increase perceived loudness while reducing the dynamic range of the source audio (usually CD). Most countries have legal restrictions on the maximum instantaneous volume that can be broadcast. Typically these limitations are implemented by permanent hardware compressors in the air chain. Additionally, increasing perceived loudness improves the "quality" of the sound from the perspective of most listeners.

see also Loudness war.

The steadily increasing loudness of the same song in successive CD remasters from 1983 to 2000.

Side-chaining

Another commonly encountered compressor mode is the "side chain". In this mode, the signal is compressed not according to its own level, but according to the level of a signal fed into a separate input, which is usually called the side chain.

There are several uses for this. Suppose, for example, that a vocalist has pronounced sibilance and all the "s" sounds stand out from the overall picture. You pass the voice through a compressor, and feed the same signal, passed through an equalizer, into the side-chain input. With the equalizer you cut out all frequencies except those produced when the vocalist pronounces "s": typically around 5 kHz, though anywhere from 3 kHz to 8 kHz. If you then put the compressor into side-chain mode, the voice is compressed only at the moments when an "s" is pronounced. This is the device known as a de-esser. This way of working is called "frequency-dependent".

Another use of this function is called a "ducker". For example, at a radio station the music passes through the compressor, and the DJ's voice is fed into the side chain. When the DJ starts talking, the music volume automatically decreases. This effect can also be used to good effect in recording, for example to lower the level of keyboard parts while the vocalist is singing.
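A minimal sketch of such a ducker, reusing the envelope-and-gain idea from earlier (NumPy assumed; the threshold and ratio values are arbitrary illustrations): the gain applied to the music is computed from the envelope of the voice signal.

```python
import numpy as np

def duck(music, voice, threshold_db=-30.0, ratio=8.0, smooth=0.999):
    """Reduce the music level whenever the side-chain (voice) signal exceeds the threshold."""
    env = 0.0
    out = np.empty_like(music)
    for n in range(len(music)):
        env = smooth * env + (1.0 - smooth) * abs(voice[n])   # envelope of the side-chain
        level_db = 20 * np.log10(env + 1e-9)
        over = max(0.0, level_db - threshold_db)
        gain_db = -over * (1.0 - 1.0 / ratio)                  # same static curve as before
        out[n] = music[n] * 10 ** (gain_db / 20.0)
    return out
```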

Brick wall limiting

The compressor and the limiter work in roughly the same way; a limiter can be described as a compressor with a high ratio (from about 10:1) and, usually, a short attack time.

There is also the concept of brick-wall limiting: limiting with a very high ratio (20:1 and above) and a very fast attack. Ideally, it does not let the signal exceed the threshold at all. The result can be unpleasant to the ear, but it prevents damage to playback equipment and keeps the signal from exceeding the capacity of the transmission channel. Many manufacturers build limiters into their devices for exactly this purpose.

Clipper vs. Limiter, soft and hard clipping


Dynamic range, or the photographic latitude of a photographic material, is the ratio between the maximum and minimum exposure values that can be correctly captured in the image. Applied to digital photography, dynamic range is effectively the ratio of the maximum and minimum possible values of the useful electrical signal generated by the photosensor during exposure.

Dynamic range is measured in exposure stops (EV). Each stop corresponds to a doubling of the amount of light. So, for example, if a camera has a dynamic range of 8 EV, the maximum possible value of its sensor's useful signal relates to the minimum as 2⁸:1, which means the camera can capture objects within one frame that differ in brightness by no more than 256 times. More precisely, it can capture objects of any brightness, but objects brighter than the maximum permissible value will come out dazzling white in the image, and objects darker than the minimum value will come out pitch black. Detail and texture will be visible only in objects whose brightness falls within the camera's dynamic range.

To describe the relationship between the brightness of the lightest and darkest objects being photographed, the not entirely correct term “scene dynamic range” is often used. It would be more correct to talk about the brightness range or the contrast level, since dynamic range is usually a characteristic of the measuring device (in this case, the matrix of a digital camera).

Unfortunately, the brightness range of many beautiful scenes we encounter in real life can significantly exceed the dynamic range of a digital camera. In such cases, the photographer is forced to decide which objects should be worked out in full detail, and which can be left outside the dynamic range without compromising the creative intent. In order to make the most of your camera's dynamic range, you may sometimes need not so much a thorough understanding of how the photosensor works, but rather a developed artistic sense.

Factors limiting dynamic range

The lower limit of the dynamic range is set by the self-noise level of the photosensor. Even an unlit sensor generates a background electrical signal called dark noise. Interference also occurs when the charge is transferred to the analog-to-digital converter, and the ADC itself introduces a certain error into the digitized signal, the so-called sampling noise.

If you take a photo in complete darkness or with a lens cap on, the camera will only record this meaningless noise. If you allow a minimal amount of light to reach the sensor, the photodiodes will begin to accumulate electric charge. The magnitude of the charge, and hence the intensity of the useful signal, will be proportional to the number of captured photons. In order for any meaningful details to appear in the image, it is necessary that the level of the useful signal exceeds the level of background noise.

Thus, the lower limit of the dynamic range or, in other words, the sensor sensitivity threshold can be formally defined as the level of the output signal at which the signal-to-noise ratio is greater than unity.

The upper limit of the dynamic range is determined by the capacitance of an individual photodiode. If during exposure any photodiode accumulates an electric charge of its maximum value, then the image pixel corresponding to the overloaded photodiode will turn out completely white, and further irradiation will not affect its brightness in any way. This phenomenon is called clipping. The higher the overload capacity of a photodiode, the greater the output signal it can produce before it reaches saturation.

For greater clarity, let us turn to the characteristic curve, a graph of the output signal versus exposure. The horizontal axis represents the binary logarithm of the radiation received by the sensor, and the vertical axis the binary logarithm of the electrical signal generated by the sensor in response. My drawing is largely schematic and serves purely illustrative purposes; the characteristic curve of a real photosensor has a somewhat more complex shape, and the noise level is rarely so high.

The graph clearly shows two critical points: at the first, the level of the useful signal crosses the noise threshold, and at the second, the photodiodes reach saturation. The exposure values lying between these two points make up the dynamic range. In this abstract example it equals, as is easy to see, 5 EV, i.e. the camera can handle five doublings of exposure, which is equivalent to a 32-fold (2⁵ = 32) difference in brightness.

The exposure zones that make up the dynamic range are not equal. The upper zones have a higher signal-to-noise ratio and therefore look cleaner and more detailed than the lower ones. As a result, the upper boundary of the dynamic range is very distinct and noticeable: clipping cuts off the highlights at the slightest overexposure, while the lower boundary drowns inconspicuously in noise, and the transition to black is nowhere near as sharp as the transition to white.

The linear dependence of the signal on exposure, as well as the sharp transition to a plateau, are distinctive features of the digital photographic process. For comparison, take a look at the characteristic curve of traditional photographic film.

The shape of the curve and especially the angle of inclination strongly depend on the type of film and on the procedure for its development, but the main, striking difference between the film graph and the digital one remains unchanged - the nonlinear nature of the dependence of the optical density of the film on the exposure value.

The lower limit of the photographic latitude of negative film is determined by the density of the fog, and the upper limit by the maximum achievable optical density of the photographic layer; for reversal films it is the other way around. Both in the shadows and in the highlights, smooth bends of the characteristic curve are observed, indicating a drop in contrast when approaching the boundaries of the dynamic range, because the slope of the curve is proportional to the contrast of the image. Thus, the exposure zones lying in the middle part of the graph have maximum contrast, while in the highlights and shadows the contrast is reduced. In practice, the difference between film and a digital sensor is especially noticeable in the highlights: where a digital image has its highlights burned out by clipping, film still shows detail, although with low contrast, and the transition to pure white looks smooth and natural.

In sensitometry, two separate terms are even used: photographic latitude proper, limited to the relatively linear portion of the characteristic curve, and useful photographic latitude, which in addition to the linear section also includes the toe and shoulder of the curve.

It is noteworthy that when digital photographs are processed, a more or less pronounced S-shaped curve is usually applied to them, increasing contrast in the midtones at the cost of reducing it in the shadows and highlights, which gives the digital image a more natural and pleasing look.

Bit depth

Unlike a digital camera's sensor, human vision has, so to speak, a logarithmic view of the world. Successive doublings of the amount of light are perceived by us as equal changes in brightness. Exposure stops can even be compared to musical octaves, since doublings of sound frequency are perceived by the ear as equal musical intervals. Our other senses work on the same principle. The nonlinearity of perception greatly expands the range of human sensitivity to stimuli of varying intensity.

When a RAW file containing linear data is converted (whether in the camera or in a RAW converter), a so-called gamma curve is applied to it. Its purpose is to raise the brightness of the digital image non-linearly, bringing it into line with the characteristics of human vision.

With linear conversion, the image is too dark.

After gamma correction, the brightness returns to normal.

The gamma curve stretches the dark tones and compresses the light ones, making the distribution of gradations more uniform. The result is a natural-looking image, but noise and quantization artifacts in the shadows inevitably become more visible, which is only exacerbated by the small number of brightness levels in the lower zones.

Linear distribution of brightness gradations.
Uniform distribution after applying the gamma curve.
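A minimal sketch of such a gamma correction (NumPy assumed; the exponent 1/2.2 is the common sRGB-like approximation, used here purely as an illustration):

```python
import numpy as np

def apply_gamma(linear, gamma=2.2):
    """Map linear sensor values in [0, 1] to display values; dark tones are stretched most."""
    return np.clip(linear, 0.0, 1.0) ** (1.0 / gamma)

linear = np.array([0.01, 0.1, 0.25, 0.5, 1.0])
print(apply_gamma(linear))  # [0.123 0.351 0.533 0.730 1.000] -> shadows are lifted the most
```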

ISO and dynamic range

Although digital photography uses the same concept of light sensitivity as film photography, it should be understood that this is purely a matter of tradition, since the approaches to changing sensitivity in digital and film photography differ fundamentally.

Increasing ISO sensitivity in traditional photography means replacing one film with another with coarser grain, i.e. there is an objective change in the properties of the photographic material itself. In a digital camera, the light sensitivity of the sensor is strictly determined by its physical characteristics and cannot be changed in the literal sense. When the ISO is raised, the camera does not change the actual sensitivity of the sensor; it only amplifies the electrical signal generated by the sensor in response to irradiation and adjusts the digitization algorithm for this signal accordingly.

An important consequence of this is that the effective dynamic range decreases in proportion to the increase in ISO, because noise is amplified along with the useful signal. If at ISO 100 the entire range of signal values is digitized, from zero to the saturation point, then at ISO 200 only half of the photodiodes' capacity is taken as the maximum. With each doubling of ISO sensitivity, the top stop of the dynamic range is cut off, and the remaining stops are pulled up into its place. This is why using ultra-high ISO values makes little practical sense: you could just as well brighten the photo in a RAW converter and get a comparable noise level. The difference between raising the ISO and artificially brightening the image is that with higher ISO the signal is amplified before it reaches the ADC, so quantization noise is not amplified, unlike the sensor's own noise, whereas in a RAW converter the amplification affects everything, including ADC errors. In addition, reducing the digitized range means finer quantization of the remaining input signal values.

Incidentally, lowering the ISO below the base value (for example, to ISO 50), available on some cameras, does not expand the dynamic range at all; it simply attenuates the signal by half, which is equivalent to darkening the image in a RAW converter. This function can even be considered harmful, since using a below-base ISO value pushes the camera to increase the exposure, which, with the sensor's saturation threshold unchanged, increases the risk of clipping in the highlights.

True Dynamic Range

There are a number of programs (DxO Analyzer, Imatest, RawDigger, etc.) that allow you to measure the dynamic range of a digital camera at home. This is not strictly necessary, since data for most cameras can be freely found on the Internet, for example on DxOMark.com.

Should we believe the results of such tests? On the whole, yes, with the caveat that all these tests determine the effective or, so to speak, technical dynamic range, i.e. the ratio between the saturation level and the noise level of the sensor. What matters most for a photographer is the useful dynamic range, i.e. the number of exposure stops that really allow you to capture useful information.

As you may remember, the lower limit of the dynamic range is set by the sensor's noise level. The problem is that in practice the lower zones, although technically already inside the dynamic range, still contain too much noise to be of much use. A lot here depends on personal tolerance: everyone determines an acceptable noise level for themselves.

My subjective opinion is that details in the shadows begin to look more or less decent when the signal-to-noise ratio is at least eight. On this basis, I define useful dynamic range as technical dynamic range minus about three stops.

For example, if a DSLR, according to reliable tests, has a dynamic range of 13 EV, which is very good by today's standards, its useful dynamic range will be about 10 EV, which is, on the whole, also quite good. Of course, we are talking about shooting in RAW, at minimum ISO and maximum bit depth. When shooting JPEG, dynamic range depends heavily on the contrast settings, but on average you should give up another two or three stops.

For comparison: color reversal films have a useful photographic latitude of 5-6 stops; black and white negative films give 9-10 stops with standard developing and printing procedures, and with certain manipulations - up to 16-18 stops.

To summarize the above, let's try to formulate a few simple rules, the observance of which will help you squeeze maximum performance out of your camera's sensor:

  • The dynamic range of a digital camera is only fully accessible when shooting in RAW.
  • Dynamic range decreases as light sensitivity increases, so avoid high ISO settings unless absolutely necessary.
  • Using a higher bit depth for RAW files does not increase true dynamic range, but it does improve tonal separation in the shadows due to more brightness levels.
  • Expose to the right. The upper exposure zones always contain the maximum useful information with the minimum noise and should be used as fully as possible. At the same time, do not forget the danger of clipping: pixels that have reached saturation are completely useless.

And most importantly: don't worry too much about the dynamic range of your camera. Its dynamic range is fine. Your ability to see light and manage exposure correctly is much more important. A good photographer will not complain about the lack of photographic latitude, but will try to wait for more comfortable lighting, or change the angle, or use the flash, in a word, will act in accordance with the circumstances. I'll tell you more: some scenes only benefit from the fact that they do not fit into the dynamic range of the camera. Often an unnecessary abundance of details simply needs to be hidden in a semi-abstract black silhouette, which makes the photo both more laconic and richer.

High contrast is not always a bad thing – you just need to know how to work with it. Learn to exploit the shortcomings of the equipment as well as its advantages, and you will be surprised how much your creative possibilities will expand.

Thank you for your attention!

Vasily A.


Compression is one of the most myth-ridden topics in sound production. They say Beethoven even used it to scare the neighbours' children :(

Okay, in fact, using compression is no more difficult than using distortion; the main thing is to understand how it works and to have good monitoring. And that is what we will do together now.

What is audio compression

The first thing to understand is that compression is work with the dynamic range of a sound. Dynamic range, in turn, is nothing more than the difference between the loudest and quietest signal levels:

So, compression is the compression of dynamic range. Yes, just dynamic range compression, in other words lowering the level of the loud parts of the signal and raising the level of the quiet parts. Nothing more.

You may quite reasonably ask: why all the hype, then? Why does everyone talk about recipes for correct compressor settings, but nobody shares them? Why, despite the huge number of excellent plugins, do many studios still use expensive, rare compressor models? Why do some producers use compressors at extreme settings, while others do not use them at all? And which of them is right in the end?

Problems solved by compression

The answers to such questions lie in understanding the role of compression in working with sound. Compression allows you to:

  1. Emphasize the attack of a sound, making it more pronounced;
  2. "Seat" individual instrument parts in the mix, adding power and "weight" to them;
  3. Make groups of instruments, or an entire mix, more cohesive, a single monolith;
  4. Resolve conflicts between instruments using a side-chain;
  5. Correct the mistakes of a vocalist or musicians by evening out their dynamics;
  6. With certain settings, serve as an artistic effect.

As you can see, this is no less important a creative process than, say, coming up with melodies or designing interesting timbres. Moreover, any of the above problems can be solved using four main parameters.

Basic parameters of the compressor

Despite the huge number of software and hardware compressor models, all the "magic" of compression happens with the correct setting of the main parameters: Threshold, Ratio, Attack and Release. Let's look at them in more detail:

Threshold or response threshold, dB

This parameter sets the level from which the compressor starts to work (that is, to compress the audio signal). If we set the threshold to −12 dB, the compressor will act only on those parts of the signal that exceed this value. If the entire signal is quieter than −12 dB, the compressor will simply pass it through without affecting it in any way.

Ratio or compression ratio

The Ratio parameter determines how strongly a signal exceeding the threshold will be compressed. A little math to complete the picture: suppose we set up a compressor with a threshold of −12 dB and a ratio of 2:1, and feed it a drum loop in which the kick drum peaks at −4 dB. What will the compressor do in this case?

In our case the kick level exceeds the threshold by 8 dB. According to the ratio, this excess will be compressed to 4 dB (8 dB / 2). Combined with the unprocessed part of the signal, this means that after the compressor the kick drum will peak at −8 dB (threshold of −12 dB plus a compressed excess of 4 dB).
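A quick check of this arithmetic (a hypothetical helper using the same static curve described in the first part of the article):

```python
def compress_level(input_db, threshold_db, ratio):
    over = max(0.0, input_db - threshold_db)
    return threshold_db + over / ratio if over > 0 else input_db

print(compress_level(-4, -12, 2))  # -8.0 dB, as in the example above
```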

Attack, ms

This is the time after which the compressor reacts once the threshold is exceeded. That is, if the attack time is greater than 0 ms, the compressor starts compressing the signal that exceeds the threshold not immediately, but after the specified time.

Release or recovery, ms

The opposite of attack: this parameter specifies how long after the signal level falls back below the threshold the compressor keeps compressing before it stops.

Before we move on, I strongly recommend taking a familiar sample, putting any compressor on its channel, and experimenting with the above parameters for 5-10 minutes to really cement the material.

All other parameters are optional. They can differ between compressor models, which is partly why producers use different models for specific purposes (for example, one compressor for vocals, another for the drum group, a third for the master channel). I will not dwell on these parameters in detail, but will only give enough general information to understand what they are about:

  • Knee (Hard/Soft Knee). This parameter determines how the compression ratio is applied: abruptly (hard) or smoothly along a curve (soft). Note that in Soft Knee mode the compressor does not operate linearly but begins to compress the sound smoothly (as far as that is appropriate when we are talking about milliseconds) even before the threshold value is reached. For processing groups of channels and the overall mix, a soft knee is often used (because it works unobtrusively), while a hard knee is used to emphasize the attack and other features of individual instruments;
  • Response mode: Peak/RMS. Peak mode is justified when you need to strictly limit amplitude spikes, and also on signals with a complex shape whose dynamics and legibility need to be conveyed in full. RMS mode is much gentler on the sound, letting you thicken it while preserving the attack;
  • Lookahead. This is the time by which the compressor looks ahead at the incoming signal; a kind of preliminary analysis of what is about to arrive;
  • Makeup or Gain. A parameter that lets you compensate for the drop in level caused by compression.

The first and most important piece of advice, which removes all further questions about compression: if you a) understand the principle of compression, b) know firmly how each parameter affects the sound, and c) have managed to try several different models in practice, you no longer need any advice.

I'm absolutely serious. If you have read this post carefully, experimented with your DAW's stock compressor and one or two plug-ins, and still have not understood in which cases you need long attack values, what ratio to use and in which mode to process the source signal, then you will keep searching the Internet for ready-made recipes and applying them thoughtlessly everywhere.

Recipes for fine-tuning a compressor are like recipes for fine-tuning a reverb or chorus: they make no sense and have nothing to do with creativity. So I will persistently repeat the only correct recipe: arm yourself with this article, good monitor headphones and a plug-in for visual monitoring of the waveform, and spend an evening in the company of a couple of compressors.

Take action!

This group of methods is based on the idea that the transmitted signals undergo nonlinear amplitude transformations, and the nonlinearities in the transmitting and receiving parts are mutually inverse. For example, if the transmitter applies the nonlinear function √u, the receiver applies u². Applying mutually inverse functions in sequence keeps the overall transformation linear.

The idea of nonlinear dynamic-range compression is that, for the same amplitude range of its output signals, the transmitter can convey a larger range of variation of the transmitted parameter (i.e., a greater dynamic range). Dynamic range is the ratio of the largest permissible signal amplitude to the smallest, expressed in relative units or in decibels:

D = Umax / Umin;    (2.17)
D [dB] = 20·lg(Umax / Umin).    (2.18)

The natural desire to increase the dynamic range by decreasing Umin is limited by the sensitivity of the equipment and by the growing influence of interference and the equipment's own noise.

Most often, dynamic range compression is performed using a pair of mutually inverse functions: the logarithm and the exponential. The first operation, which changes the amplitude, is called compression (compaction), the second expansion (stretching). These particular functions are chosen because they offer the greatest compression capability.

At the same time, these methods also have drawbacks. The first is that the logarithm of a small number is negative, and in the limit

lim (u→0) log u = −∞,

that is, the sensitivity in the region of small signals is highly nonlinear.

To mitigate these shortcomings, both functions are modified by offsetting and piecewise approximation. For example, for telephone channels the approximated function (the A-law) has the form:

F(x) = A·|x| / (1 + ln A)            for |x| ≤ 1/A;
F(x) = (1 + ln(A·|x|)) / (1 + ln A)  for 1/A < |x| ≤ 1,

with A = 87.6. The gain from compression is 24 dB.
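A minimal sketch of this compression/expansion pair (the standard A-law formulas, implemented here purely as an illustration; sign handling and A = 87.6 as in the text):

```python
import math

A = 87.6

def a_law_compress(x):
    """A-law compression of a sample x in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)

def a_law_expand(y):
    """Inverse (expansion) of the A-law."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(A)):
        x = ay * (1.0 + math.log(A)) / A
    else:
        x = math.exp(ay * (1.0 + math.log(A)) - 1.0) / A
    return math.copysign(x, y)

sample = 0.01
print(a_law_expand(a_law_compress(sample)))  # ~0.01: the two functions are mutually inverse
```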

Data compression using nonlinear procedures implemented by analog means suffers from large errors. Digital hardware can significantly improve the accuracy or speed of the conversion, although directly using computing hardware (that is, directly calculating logarithms and exponentials) does not give the best result, because of low performance and accumulating computational errors.

Because of these accuracy limitations, dynamic-range compression by companding is used in non-critical cases, for example for transmitting speech over telephone and radio channels.

Efficient Coding

Efficient codes were proposed by Shannon, Fano and Huffman. Their essence is that the codes are non-uniform, i.e. use an unequal number of bits per symbol, and the length of a symbol's code is inversely related to its probability of occurrence. Another great feature of efficient codes is that they need no delimiters, i.e. special characters separating adjacent code combinations. This is achieved by following a simple rule: shorter codes are never the beginning of longer ones. In this case the continuous bit stream is decoded unambiguously, because the decoder detects the shorter code words first. For a long time efficient codes were of purely academic interest, but recently they have been successfully used in building databases, as well as for compressing data in modern modems and software archivers.

Because the codes are non-uniform, the notion of average code length is introduced. The average length is the mathematical expectation of the code length:

l_av = p₁·l₁ + p₂·l₂ + … + pN·lN,

and l_av tends to H(x) from above (that is, l_av > H(x)).

Condition (2.23) is fulfilled ever more closely as N increases.

There are two kinds of efficient codes: Shannon-Fano and Huffman. Let's see how they are obtained using an example. Suppose the symbols of the sequence occur with the probabilities given in Table 2.1.

Table 2.1.

Symbol probabilities

N     1     2     3     4     5      6      7      8      9
p_i   0.1   0.2   0.1   0.3   0.05   0.15   0.03   0.02   0.05

The symbols are first ranked, i.e. arranged in a row in descending order of probability. After that, in the Shannon-Fano method, the following procedure is repeated: the current group of symbols is divided into two subgroups with equal (or approximately equal) total probabilities, and the division is then repeated within each subgroup until every subgroup contains exactly one symbol. Let's continue with our example, which is summarized in Table 2.2.

Table 2.2.

Shannon-Fano coding

N   p_i    Subgroup at successive partitions
4   0.3    I
2   0.2    I  II
6   0.15   I  I
3   0.1    II
1   0.1    I  I
9   0.05   II II
5   0.05   II I
7   0.03   II II I
8   0.02   II

As can be seen from Table 2.2, the first symbol, with probability p₄ = 0.3, took part in two partitioning steps and both times ended up in group I. Accordingly, it is encoded with the two-digit code 11. The second symbol belonged to group I at the first partitioning step and to group II at the second; therefore its code is 10. The codes of the remaining symbols need no additional comment.
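A compact sketch of this partitioning procedure (an illustrative recursive implementation; ties in the split point can be resolved differently, so the codes it prints may differ slightly from Table 2.2 while the idea is the same):

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability) pairs, sorted by descending probability."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total, running, split, best = sum(p for _, p in symbols), 0.0, 1, float("inf")
    # find the split point that best balances the total probabilities of the two subgroups
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        if abs(total - 2 * running) < best:
            best, split = abs(total - 2 * running), i
    codes = {}
    for prefix, part in (("1", symbols[:split]), ("0", symbols[split:])):
        for name, code in shannon_fano(part).items():
            codes[name] = prefix + code
    return codes

probs = {4: 0.3, 2: 0.2, 6: 0.15, 1: 0.1, 3: 0.1, 5: 0.05, 9: 0.05, 7: 0.03, 8: 0.02}
print(shannon_fano(sorted(probs.items(), key=lambda kv: -kv[1])))
```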

Non-uniform codes are usually depicted as code trees. A code tree is a graph showing the allowed code combinations. The directions of the edges of this graph are fixed in advance, as shown in Fig. 2.11 (the choice of directions is arbitrary).

The graph is navigated as follows: a route is traced to the chosen symbol; the number of bits in its code equals the number of edges in the route, and the value of each bit equals the direction of the corresponding edge. The route is traced from the starting point (marked with the letter A in the figure). For example, the route to vertex 5 consists of five edges, all but the last of which have direction 0; this gives the code 00001.

Let's calculate the entropy and the average word length for this example.

H(x) = −(0.3·log₂0.3 + 0.2·log₂0.2 + 0.15·log₂0.15 + 2·0.1·log₂0.1 + 2·0.05·log₂0.05 + 0.03·log₂0.03 + 0.02·log₂0.02) ≈ 2.76 bits;

l_av = 0.3·2 + 0.2·2 + 0.15·3 + 0.1·3 + 0.1·4 + 0.05·5 + 0.05·4 + 0.03·6 + 0.02·6 = 2.9.

As you can see, the average word length is close to entropy.
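These two figures can be checked with a few lines of Python (using the probabilities from Table 2.1 and the code lengths listed above):

```python
import math

p = [0.3, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05, 0.03, 0.02]
lengths = [2, 2, 3, 3, 4, 5, 4, 6, 6]           # Shannon-Fano code lengths from the example

H = -sum(pi * math.log2(pi) for pi in p)
l_av = sum(pi * li for pi, li in zip(p, lengths))
print(round(H, 2), round(l_av, 2))              # 2.76 and 2.9
```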

Huffman codes are constructed by a different algorithm. The coding procedure consists of two stages. At the first stage the alphabet is repeatedly reduced: each reduction replaces the last two symbols (those with the lowest probabilities) by one symbol with their combined probability. Reductions continue until two symbols remain. Along the way a coding table is filled in, into which the resulting probabilities are entered, and the routes along which the new symbols move at each stage are drawn.

At the second stage the actual encoding takes place, starting from the last stage: the first of the two remaining symbols is assigned the code 1, the second the code 0. Then we move back to the previous stage. Symbols that did not take part in the merge at that stage keep the codes assigned to them at the later stage; the code of the merged symbol is given to both of the last two symbols, with 1 appended for the upper symbol and 0 for the lower one. If a symbol takes no further part in merging, its code remains unchanged. The procedure continues to the very beginning (that is, to the first stage).
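For comparison with the hand construction, here is a compact Huffman implementation (illustrative; it uses a priority queue rather than the table, and tie-breaking may yield codes that differ from Table 2.3 while having the same lengths):

```python
import heapq
from itertools import count

def huffman(probs):
    """probs: dict symbol -> probability; returns dict symbol -> code string."""
    tie = count()                                   # tie-breaker so heapq never compares dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)             # the two least probable entries
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

probs = {4: 0.3, 2: 0.2, 6: 0.15, 1: 0.1, 3: 0.1, 5: 0.05, 9: 0.05, 7: 0.03, 8: 0.02}
codes = huffman(probs)
print(codes)
print(sum(probs[s] * len(c) for s, c in codes.items()))  # average length, about 2.8
```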

Table 2.3 shows the Huffman coding. As can be seen from the table, the coding was carried out in 7 stages. On the left are the symbol probabilities, on the right the intermediate codes. The arrows show the movements of the newly formed symbols. At each stage the last two symbols differ only in the least significant bit, which corresponds to the encoding technique. Let's calculate the average word length:

l_av = 0.3·2 + 0.2·2 + 0.15·3 + 2·0.1·3 + 0.05·4 + 0.05·5 + 0.03·6 + 0.02·6 = 2.8.

This is even closer to the entropy: the code is even more efficient. Fig. 2.12 shows the Huffman code tree.

Table 2.3.

Huffman coding

N  p_i   code    I           II          III         IV          V          VI        VII
1  0.3   11      0.3  11     0.3  11     0.3  11     0.3  11     0.3  11    0.4  0    0.6  1
2  0.2   01      0.2  01     0.2  01     0.2  01     0.2  01     0.3  10    0.3  11   0.4  0
3  0.15  101     0.15 101    0.15 101    0.15 101    0.2  00     0.2  01    0.3  10
4  0.1   001     0.1  001    0.1  001    0.15 100    0.15 101    0.2  00
5  0.1   000     0.1  000    0.1  000    0.1  001    0.15 100
6  0.05  1000    0.05 1000   0.1  1001   0.1  000
7  0.05  10011   0.05 10011  0.05 1000
8  0.03  100101  0.05 10010
9  0.02  100100

Both codes satisfy the requirement of unambiguous decoding: as can be seen from the tables, shorter combinations are not the beginning of longer codes.

As the size of the encoded blocks grows, the efficiency of the codes increases, so in some cases larger blocks are encoded (for example, when dealing with texts, some of the most frequently occurring syllables, words and even phrases can be encoded).

The effect of introducing such codes is assessed by comparing them with a uniform code:

γ = n / l_av,    (2.24)

where n is the number of bits of the uniform code that is replaced by the efficient one.

Modifications of Huffman codes

The classic Huffman algorithm is a two-pass algorithm: statistics on the symbols of the message must first be collected, and only then are the procedures described above carried out. In practice this is inconvenient, because it increases the time needed to process messages and to accumulate the dictionary. More often, one-pass methods are used, in which the accumulation and encoding procedures are combined. Such methods are also called adaptive Huffman compression [46].

The essence of adaptive Huffman compression comes down to building an initial code tree and modifying it successively after each incoming symbol. As before, the trees here are binary, i.e. at most two edges emanate from each vertex of the tree graph. It is customary to call the original vertex the parent, and the two subsequent vertices connected to it the children. Let us introduce the concept of vertex weight: this is the number of characters (words) corresponding to a given vertex, obtained while feeding in the input sequence. Obviously, the sum of the children's weights equals the weight of the parent.

After the next symbol of the input sequence arrives, the code tree is revised: the weights of the vertices are recalculated and, if necessary, the vertices are rearranged. The rule for rearranging vertices is as follows: the lower vertices have the smallest weights, and among the vertices of one level those located to the left of the graph have the smallest weights.

The vertices are numbered at the same time. Numbering starts from the lower (hanging, i.e. childless) vertices, from left to right, then moves to the next level up, and so on, until the last, root vertex is numbered. This achieves the following: the smaller the weight of a vertex, the smaller its number.

Rearrangement mainly concerns the hanging vertices. The rule formulated above must be observed when rearranging: vertices with greater weight receive higher numbers.

After the sequence (also called the control or test sequence) has been passed, all hanging vertices are assigned code combinations. The rule for assigning codes is similar to the one above: the number of bits in the code equals the number of vertices through which the route passes from the root to the given hanging vertex, and the value of each bit corresponds to the direction from the parent to the child (say, going to the left of the parent corresponds to the value 1, to the right to 0).

The resulting code combinations are stored in the memory of the compression device together with their originals and form a dictionary. The algorithm is used as follows. The sequence of characters to be compressed is divided into fragments according to the existing dictionary, after which each fragment is replaced by its code from the dictionary. Fragments not found in the dictionary form new hanging vertices, acquire a weight, and are also entered into the dictionary. In this way an adaptive algorithm for replenishing the dictionary is formed.

To increase the efficiency of the method, it is desirable to increase the size of the dictionary; the compression ratio then increases. In practice the dictionary size is 4-16 KB of memory.


Let us illustrate the algorithm with an example. Fig. 2.13 shows the initial diagram (also called a Huffman tree). Each vertex of the tree is shown as a rectangle containing two numbers separated by a slash: the first is the number of the vertex, the second its weight. As you can see, the correspondence between the weights of the vertices and their numbers holds.

Now suppose that the symbol corresponding to vertex 1 appears a second time in the test sequence. The weight of the vertex changes as shown in Fig. 2.14, as a result of which the vertex-numbering rule is violated. At the next step we change the positions of the hanging vertices: we swap vertices 1 and 4 and renumber all the vertices of the tree. The resulting graph is shown in Fig. 2.15. The procedure then continues in the same way.

It should be remembered that each hanging vertex in the Huffman tree corresponds to a specific symbol or group of symbols. The parent differs from its children in that the group of symbols corresponding to it is one symbol shorter than that of its children, and the children differ from each other in their last symbol. For example, the parent may correspond to the symbols "kar"; its children may then correspond to the sequences "kara" and "karp".

This algorithm is by no means purely academic; it is actively used in archiving programs, including for compressing graphic data (these will be discussed below).

Lempel–Ziv algorithms

These are the most commonly used compression algorithms today. They are used in most archiving programs (for example, PKZIP, ARJ, LHA). The essence of the algorithms is that a certain set of characters is replaced, during archiving, by its number in a specially generated dictionary. For example, the phrase "The outgoing number for your letter...", often found in business correspondence, may occupy position 121 in the dictionary; then, instead of transmitting or storing that phrase (30 bytes), you can store its number (1.5 bytes in binary-coded decimal or 1 byte in binary).

The algorithms are named after the authors who first proposed them in 1977. The first of them is LZ77. For archiving, a so-called sliding message window is created, consisting of two parts. The first, larger part serves to form the dictionary and has a size of several kilobytes. The second, smaller part (usually up to 100 bytes) receives the current characters of the text being examined. The algorithm tries to find in the dictionary a set of characters matching those in the look-ahead window. If it succeeds, a code is generated consisting of three parts: the offset of the matching substring in the dictionary, the length of that substring, and the character that follows it. For example, suppose the matched substring consists of the characters "app…" (6 characters in total) and the next character is "e". Then, if the substring's address (place in the dictionary) is 45, the entry looks like "45, 6, e". After this the window contents are shifted by one position and the search continues. This is how the dictionary is formed.
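A toy version of this search-and-emit loop (illustrative only: real LZ77 implementations use efficient search structures and byte-level windows; here the window sizes are tiny so the output is easy to trace):

```python
def lz77_compress(data, dict_size=32, lookahead_size=8):
    """Emit (offset, length, next_char) triples over a sliding window."""
    out, pos = [], 0
    while pos < len(data):
        start = max(0, pos - dict_size)
        best_off, best_len = 0, 0
        # search the dictionary part of the window for the longest match
        for off in range(start, pos):
            length = 0
            while (length < lookahead_size and pos + length < len(data)
                   and data[off + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_off, best_len = pos - off, length
        next_char = data[pos + best_len] if pos + best_len < len(data) else ""
        out.append((best_off, best_len, next_char))
        pos += best_len + 1
    return out

print(lz77_compress("abracadabra abracadabra"))
```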

The advantage of the algorithm is the easily formalized procedure for building the dictionary. In addition, decompression is possible without the original dictionary (though it is desirable to have a test sequence): the dictionary is formed again during decompression.

The disadvantages of the algorithm appear as the dictionary grows: the search time increases. In addition, if a string of characters that is not in the dictionary appears in the current window, every character is written with a three-element code, i.e. the result is not compression but expansion.

The LZSS algorithm, proposed in 1982, has better characteristics. It differs in how the sliding window is maintained and in the compressor's output codes. In addition to the window, the algorithm builds a binary tree, similar to a Huffman tree, to speed up the search for matches: every substring that leaves the current window is added to the tree as one of the children. This algorithm makes it possible to further increase the size of the current window (it is desirable that its size be a power of two: 128, 256, etc. bytes). Sequence codes are also formed differently: an additional 1-bit prefix is introduced to distinguish unencoded characters from "offset, length" pairs.

An even greater degree of compression is achieved by algorithms of the LZW type. The algorithms described so far have a fixed window size, which makes it impossible to enter phrases longer than the window into the dictionary. In the LZW algorithms (and their predecessor LZ78) the viewing window has an unlimited size, and the dictionary accumulates phrases (rather than collections of characters, as before). The dictionary has unlimited length, and the encoder (decoder) works in phrase-waiting mode. When a phrase matching the dictionary has been formed, the code of the match (i.e. the code of that phrase in the dictionary) is emitted together with the code of the character that follows it. If, as symbols accumulate, a new phrase is formed, it is also entered into the dictionary, like the shorter one. The result is a recursive procedure that provides fast encoding and decoding.
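A minimal LZW-style sketch (illustrative; the dictionary is seeded with single characters, and phrases grow without any window limit):

```python
def lzw_compress(data):
    """Replace growing phrases by their dictionary numbers."""
    dictionary = {chr(i): i for i in range(256)}       # seed with single characters
    phrase, out = "", []
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                               # keep growing the current phrase
        else:
            out.append(dictionary[phrase])             # emit the code of the known phrase
            dictionary[phrase + ch] = len(dictionary)  # add the new, longer phrase
            phrase = ch
    if phrase:
        out.append(dictionary[phrase])
    return out

print(lzw_compress("abababababab"))  # repeated phrases get single codes
```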

An additional compression opportunity is provided by run-length encoding of repeated characters. If some characters in a sequence appear in a row (for example, "space" characters in a text, or consecutive zeros in a numerical sequence), it makes sense to replace them with a pair "character, length" or "flag, length". In the first case the code contains a flag indicating that a run is being encoded (usually 1 bit), then the code of the repeated character and the length of the run. In the second case (used for the most frequently repeated characters), the prefix simply contains a repetition flag.
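A sketch of the first variant, "flag, character, length", in its simplest form (illustrative):

```python
def rle_compress(data, min_run=3):
    """Replace runs of identical characters by ('RUN', character, run length) entries."""
    out, i = [], 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i]:
            run += 1
        if run >= min_run:
            out.append(("RUN", data[i], run))   # encoded run
        else:
            out.extend(data[i:i + run])         # short runs are left as literals
        i += run
    return out

print(rle_compress("AAAAABCCCC0000000"))  # [('RUN','A',5), 'B', ('RUN','C',4), ('RUN','0',7)]
```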