Data compression methods have a long history of development that began well before the first computer appeared. This article gives a brief overview of the main theories, ideas and their implementations, without claiming to be complete. More detailed information can be found, for example, in the works of R.E. Krichevsky, B.Ya. Ryabko, I.H. Witten, J. Rissanen, D.A. Huffman, R.G. Gallager, D.E. Knuth, J.S. Vitter and others.
Information compression is a problem whose history is much longer than that of computer technology and has usually run parallel to the history of encoding and encrypting information. All compression algorithms operate on an input stream of information whose minimum unit is a bit and whose maximum unit is several bits, a byte or several bytes. The purpose of compression, as a rule, is to obtain a more compact output stream of information units from an initially non-compact input stream by means of some transformation. The main technical characteristics of compression processes and of their results are:
- The degree of compression (compression ratio): the ratio of the volumes of the original and resulting streams;
- Compression speed: the time spent compressing a given amount of information from the input stream into an equivalent output stream;
- Compression quality: a value that shows how tightly the output stream is packed, measured by applying repeated compression to it with the same or another algorithm.
There are several different approaches to the problem of information compression. Some have a very complex mathematical basis; others rely on properties of the information stream and are algorithmically quite simple. Any approach or algorithm that implements data compression is designed to reduce the volume of the output stream in bits by means of a reversible or irreversible transformation. Therefore, according to a criterion related to the nature or format of the data, all compression methods can first of all be divided into two categories: reversible and irreversible compression.
Irreversible compression is a transformation of the input data stream in which the output stream, interpreted according to a particular data format, represents an object that is quite similar to the input in its external characteristics but differs from it in volume. The degree of similarity between the input and output streams is determined by how closely certain properties of the object represented by the stream (that is, the compressed and uncompressed information, interpreted in some specific data format) correspond. Such approaches and algorithms are used, for example, to compress raster graphics files with a low degree of byte repetition in the stream. They exploit the structure of the graphic file format and the fact that an image of approximately the same display quality (as perceived by the human eye) can be represented in several (more precisely, n) ways. Therefore, in addition to the degree of compression, such algorithms involve the concept of quality: since the original image changes during compression, quality can be understood as the degree of correspondence between the original and the resulting image, assessed subjectively on the basis of the data format. For graphic files this correspondence is judged visually, although there are also intelligent algorithms and programs for the purpose. Irreversible compression cannot be used where the information structure of the input and output streams must match exactly. This approach is implemented in popular formats for photo and video information, known as the JPEG and JFIF algorithms and the JPG and JIF file formats.
Reversible compression always reduces the volume of the output stream without changing its information content, i.e. without loss of information structure. The input stream can be recovered from the output stream by a reconstruction algorithm; this recovery process is called decompression, and only after decompression is the data suitable for processing in accordance with its internal format.
In reversible algorithms, encoding can be viewed from a statistical point of view, which is useful not only for constructing compression algorithms but also for assessing their effectiveness. For all reversible algorithms there is a concept of coding cost: the average length of a codeword in bits. Coding redundancy is the difference between the coding cost and the entropy of the source, and a good compression algorithm should always minimize redundancy (recall that the entropy of information is a measure of its disorder). Shannon's fundamental theorem on information encoding says that “the cost of encoding is always no less than the entropy of the source, although it can be arbitrarily close to it.” Therefore, for any algorithm there is always a limit on the degree of compression, determined by the entropy of the input stream.
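The definitions of cost, entropy and redundancy can be illustrated with a small sketch. The sample message and the prefix code (a=0, b=10, c=110, d=111) are made up for the example, and the entropy is estimated from the byte frequencies of the message itself, which is only one possible model of the source.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data: bytes) -> float:
    """Shannon entropy of the empirical byte distribution, in bits per symbol."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def coding_cost(data: bytes, code_lengths: dict) -> float:
    """Average codeword length in bits per symbol for the given code lengths."""
    counts = Counter(data)
    total = len(data)
    return sum(c * code_lengths[s] for s, c in counts.items()) / total

# Illustrative message: a occurs 4 times, b twice, c and d once each.
message = b"aaaabbcd"

# Hypothetical prefix code a=0, b=10, c=110, d=111 (only the lengths matter here).
prefix_lengths = {ord("a"): 1, ord("b"): 2, ord("c"): 3, ord("d"): 3}
# A fixed-length code for a four-symbol alphabet needs 2 bits per symbol.
fixed_lengths = {symbol: 2 for symbol in prefix_lengths}

entropy = entropy_bits_per_symbol(message)
prefix_cost = coding_cost(message, prefix_lengths)
fixed_cost = coding_cost(message, fixed_lengths)

print(f"entropy      = {entropy:.2f} bits/symbol")
print(f"prefix code  : cost {prefix_cost:.2f}, redundancy {prefix_cost - entropy:.2f}")
print(f"fixed 2 bits : cost {fixed_cost:.2f}, redundancy {fixed_cost - entropy:.2f}")
```

For this message the variable-length code reaches the entropy exactly, while the fixed-length code carries a quarter of a bit of redundancy per symbol; neither cost falls below the entropy, in line with the theorem quoted above.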
Let us now move directly to the algorithmic features of reversible algorithms and consider the most important theoretical approaches to data compression associated with the implementation of encoding systems and methods of information compression.
Compression by series encoding method
The best-known simple approach to reversible compression is Run Length Encoding (RLE). Its essence is to replace chains, or series, of repeating bytes or byte sequences with a single coding byte and a counter of the number of repetitions. The only problem with all such methods is how the decompression algorithm can distinguish an encoded series from other, unencoded byte sequences in the resulting stream. This is usually solved by placing marks at the beginning of encoded chains. Such marks can be, for example, characteristic bit values in the first byte of an encoded series, particular values of the first byte of an encoded series, and so on. These methods are, as a rule, quite effective for compressing raster graphics (BMP, PCX, TIF, GIF), because such images contain many long series of repeating byte sequences. The disadvantage of RLE is its rather low compression ratio, that is, its high coding cost, on files with few series and, even worse, with few repeating bytes per series.
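A minimal sketch of the marker idea follows. The marker value 0x90, the minimum run length of 4 and the three-byte layout (marker, count, value) are assumptions chosen for the example and are not the scheme of any particular file format.

```python
MARKER = 0x90   # hypothetical mark byte that introduces an encoded series
MIN_RUN = 4     # shorter runs are cheaper to store as plain bytes
MAX_RUN = 255   # the counter occupies one byte

def rle_encode(data: bytes) -> bytes:
    """Replace long runs with (MARKER, count, value); copy other bytes as-is."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == value and run < MAX_RUN:
            run += 1
        # A literal MARKER byte must always be escaped as a series,
        # otherwise the decoder could not tell it from a real mark.
        if run >= MIN_RUN or value == MARKER:
            out += bytes((MARKER, run, value))
        else:
            out += bytes([value]) * run
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Invert rle_encode: expand every (MARKER, count, value) triple."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == MARKER:
            count, value = data[i + 1], data[i + 2]
            out += bytes([value]) * count
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

sample = b"A" * 10 + b"BC" + bytes([MARKER]) + b"D" * 300
packed = rle_encode(sample)
print(len(sample), "->", len(packed), "bytes")
```

Data with few runs gains nothing from this scheme, and every literal occurrence of the marker byte costs three bytes instead of one, which is the weakness of RLE noted above.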
Compression without using the RLE method
The process of data compression without using the RLE method can be divided into two stages: modeling and encoding proper. These processes and the algorithms that implement them are quite independent and diverse.
Coding process and its methods
Coding usually means processing a stream of symbols (in our case, bytes or nibbles) over some alphabet, where the symbols appear in the stream with different frequencies. The purpose of encoding is to convert this stream into a bit stream of minimum length, which is achieved by reducing the redundancy of the input stream by taking symbol frequencies into account. The length of the code assigned to each symbol of the alphabet must be proportional to the amount of information that symbol carries, and the length of the stream symbols in bits need not be a multiple of 8 and may even be variable. If the probability distribution of the symbols of the input alphabet is known, an optimal coding model can be constructed. However, because of the huge number of different file formats, the task becomes much more complicated: the frequency distribution of the data symbols is not known in advance. In this case, two approaches are generally used.
The first is to scan the input stream and construct an encoding based on the collected statistics. This requires two passes through the file, one to scan it and collect statistics and a second to encode it, which somewhat limits the scope of such algorithms: it rules out the single-pass, on-the-fly encoding used in telecommunication systems, where the volume of data is sometimes unknown and retransmitting or re-scanning it can take an unreasonably long time. In this case the statistical scheme of the encoding used is written to the output stream. This method is known as static Huffman coding.
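The two passes can be sketched as follows. This is a minimal illustration, not a production coder: the bit stream is kept as a string of "0" and "1" characters, and the code table that a real archiver would write to the output stream is simply returned as a dictionary.

```python
import heapq
from collections import Counter

def build_huffman_code(data: bytes) -> dict:
    """First pass: count symbol frequencies and derive a prefix code from them."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def huffman_encode(data: bytes, code: dict) -> str:
    """Second pass: replace every symbol with its codeword."""
    return "".join(code[b] for b in data)

def huffman_decode(bits: str, code: dict) -> bytes:
    """Decode using the code table stored alongside the data."""
    inverse = {word: sym for sym, word in code.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)

text = b"abracadabra"
table = build_huffman_code(text)      # pass 1: statistics and code table
bits = huffman_encode(text, table)    # pass 2: encoding
print(len(text) * 8, "bits ->", len(bits), "bits plus the code table")
```

Because the table depends on the whole input, encoding cannot start until the first pass has finished, which is exactly the restriction on single-pass use described above.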
Introduction
We use archivers all the time. Our website has a detailed (albeit long-standing) description of the most popular archiver programs (Archivers: A look from the outside); we will not repeat it here and will deal only with the compression algorithms used in these programs. What is the problem? Modern archivers let us choose from several compression algorithms. Here, for example, are the characteristics of some programs...
Formats supported by archivers
Archiver | Packing and unpacking | Unpacking only |
---|---|---|
WinZip | ZIP | TAR, GZIP, BH, ARJ, LZH, ARC |
WinRar | RAR, ZIP | CAB, ARJ, LZH, TAR, GZ, ACE, UUE, BZ2, JAR, ISO |
WinAce | ACE, ZIP, LHA, MSCAB | RAR, ARC, ARJ, GZIP, TAR, ZOO |
7-Zip | 7Z, ZIP, GZIP, TAR, BZIP2 | RAR, CAB, ARJ, CPIO, RPM, DEB, SPLIT |
Power Archiver | TAR, BH, CAB, LHA, ZIP | RAR, ACE, ARJ, GZIP, BZIP2, ARC, ZOO |
Depending on the circumstances, we use an archiver as a compressor, when information must be compressed for faster transmission over communication channels (mail and the Internet). In other cases the archiving function itself matters more, that is, converting information into a compact form (a single file) so that it does not have to be handled piece by piece and, in addition, takes less disk space because of file-table overhead. Accordingly, both the compression achieved on the original information and the speed of processing it are of great interest. The purpose of our study is to determine the absolute and relative compression and performance figures of the algorithms (formats) offered by the archivers listed in the table...
The content of the study is planned as follows:
1. Creation of a combined set and of separate per-file-type sets of information (folders) for testing.
2. Conducting preliminary tests on the combined set and refining, based on the results, the plan for further per-type tests.
3. Processing and analysis of results with substantiation of recommendations for the practical application of different archiving algorithms (formats).
The degree of compression is expressed as the size of the compressed folder as a percentage of its original size, and performance is expressed as processing speed: the original size in kilobytes divided by the processing time in seconds. In practice, only time is measured (with a stopwatch). A time measurement error can distort the performance figure when that figure is very large (more than 1000 KB/s). In other cases the error can be ignored.
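The two indicators can be computed as in the sketch below. The function names are illustrative; the sample figures (208893 KB original size, 146408 KB packed in 2-00 for WinZip in normal mode, and 0-58 for WinRar in Store mode) are taken from the result tables later in the article.

```python
def parse_min_sec(text: str) -> int:
    """Convert a 'min-sec' stopwatch reading such as '2-45' to seconds."""
    minutes, seconds = text.split("-")
    return int(minutes) * 60 + int(seconds)

def compression_percent(packed_kb: float, original_kb: float) -> float:
    """Packed size as a percentage of the original size (smaller is better)."""
    return 100.0 * packed_kb / original_kb

def speed_kb_per_s(original_kb: float, seconds: float) -> float:
    """Original size in kilobytes divided by processing time in seconds."""
    return original_kb / seconds

ORIGINAL_KB = 208893

# WinZip, normal mode: 146408 KB in 2-00.
percent = compression_percent(146408, ORIGINAL_KB)
speed = speed_kb_per_s(ORIGINAL_KB, parse_min_sec("2-00"))
print(f"compression {percent:.1f}%, speed {speed:.0f} KB/s")

# Effect of a one-second stopwatch error on a fast run (Store mode, 0-58).
store_seconds = parse_min_sec("0-58")
low = speed_kb_per_s(ORIGINAL_KB, store_seconds + 1)
high = speed_kb_per_s(ORIGINAL_KB, store_seconds - 1)
print(f"Store speed lies between {low:.0f} and {high:.0f} KB/s")
```

A one-second error on a run of about a minute shifts the speed by more than a hundred KB/s, which is why the fastest results are the least reliable.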
Definition of general characteristics of the main archival formats
For testing, we used material simulating a “user basket” made up of files in DOC, HTM, JPG, MP3, PDF and TXT formats. In total, the basket contains 359 folders and 3337 files and has a total size of 208893 KB (about 204 MB). The composition of this set is shown in the following table:
Composition of a set of files for testing
Type | Number of folders | Number of files | Size, KB | On disk, KB |
---|---|---|---|---|
TXT | 0 | 2 | 34781 | 34783 |
HTM | 329 | 2869 | 30913 | 36962 |
DOC | 3 | 24 | 31443 | 31474 |
PDF | 0 | 1 | 33691 | 33694 |
JPG | 26 | 430 | 40493 | 41382 |
MP3 | 1 | 11 | 37571 | 37589 |
Total | 359 | 3337 | 208893 | 215884 |
Each test consisted of one archiving cycle, timed from the moment the Add button was pressed until the window showing the contents of the resulting archive file opened.
Tested programs:
WinZip 8.1 SR-1
WinRar 3.30
WinAce 2.5
7-Zip 3.13
Power Archiver 8.70 07b
System Configuration Information
Processor: Intel Celeron 1700 MHz
RAM: 256 MB (DDR SDRAM)
HDD: ST360015A (60 GB, 7200 RPM)
OS: Windows 2000 Pro, SP3
The test results are shown in the following tables:
Test results for ZIP format
Archiver | Mode | Size, KB | Time, min.-sec. | Compression | Speed, KB/s |
---|---|---|---|---|---|
- | Without compression | 208893 | - | - | - |
WinZip | Normal | 146408 | 2-00 | 70.0% | 1740 |
WinZip | Maximum | 145884 | 2-45 | 69.8% | 1266 |
WinZip | Fast | 147690 | 1-58 | 70.7% | 1770 |
WinZip | Very fast | 149450 | 1-50 | 71.5% | 1899 |
WinRar | Normal | 146078 | 2-22 | 69.9% | 1471 |
WinRar | Maximum | 145881 | 3-07 | 69.8% | 1117 |
WinAce | Normal | 146418 | 2-28 | 70.1% | 1411 |
WinAce | Maximum | 145844 | 2-40 | 69.8% | 1305 |
7-Zip | Normal/Deflate | 145480 | 3-22 | 69.6% | 1034 |
7-Zip | Ultra/Deflate | 145341 | 5-55 | 69.6% | 588 |
7-Zip | Ultra/Deflate64 | 144924 | 6-10 | 69.4% | 565 |
Power Archiver | Normal | 146074 | 3-40 | 69.9% | 950 |
Power Archiver | Maximum | 145948 | 3-42 | 69.9% | 941 |
In general, the compression obtained with the ZIP format is approximately the same and depends little on the archiver, with the exception of 7-Zip, where changing the compression method can slightly improve the figure for the ZIP format. The dictionary size (in WinRar and 7-Zip) was deliberately left unchanged in this series of tests and was set automatically (by default).
RAR format testing results
Mode | Size, KB | Time, min.-sec. | Compression | Speed, KB/s |
---|---|---|---|---|
Without compression | 208893 | - | - | - |
Store | 209129 | 0-58 | 100.1% | 3601 |
Fastest | 144017 | 6-00 | 68.9% | 580 |
Fast | 143281 | 6-22 | 68.6% | 547 |
Normal | 142830 | 6-40 | 68.4% | 522 |
Good | 139826 | 6-58 | 66.9% | 499 |
Best | 140023 | 7-25 | 67.0% | 469 |
Best (64kb) | 140685 | 5-40 | 67.3% | 614 |
In the mode settings, the dictionary size can be changed within the range of 64 - 4096 kilobytes. By default the maximum size (4096 KB) is set, and the results in this table were obtained with it. Only in the Best (64kb) row was the minimum size of 64 kilobytes set. Presumably, the resulting change in compression and performance is indicative of what would happen in all other rows of this table.
The Good and Best rows were retested and their values fully confirmed, so the illogical transition between them cannot be attributed to testing errors.
ACE format testing results
Mode | Size, KB | Time, min.-sec. | Compression | Speed, KB/s |
---|---|---|---|---|
Without compression | 208893 | - | - | - |
Normal | 132978 | 8-30 | 63.7% | 410 |
Maximum | 132918 | 8-42 | 63.6% | 400 |
Good | 132925 | 9-50 | 63.6% | 354 |
Fast | 133216 | 8-53 | 63.8% | 397 |
Super Fast | 133273 | 8-46 | 63.8% | 397 |
Store | 209136 | 1-48 | 100.1% | 1934 |
Changes in the operating mode of the WinAce archiver in our case have little effect on the compression performance - the spread is within tenths of a percent.
7z format testing results
Mode | Size, KB | Time, min.-sec. | Compression | Speed, KB/s |
---|---|---|---|---|
Without compression | 208893 | - | - | - |
Normal | 130964 | 9-24 | 64.2% | 362 |
Maximum | 130000 | 13-51 | 63.7% | 246 |
Fast | 141922 | 4-16 | 69.6% | 797 |
Ultra (1 MB) | 131392 | 8-47 | 64.4% | 387 |
Ultra (6 MB) | 130101 | 11-40 | 63.8% | 291 |
Ultra (12 MB) | 129871 | 12-47 | 63.7% | 266 |
Ultra (24 MB) | - | - | - | - |
Ultra (Deflate) | 141171 | 3-15 | 69.2% | 1046 |
Ultra (PPMd) | 140171 | 8-45 | 68.7% | 389 |
Ultra (Bzip2) | 135342 | 7-32 | 66.4% | 451 |
Note:
For the 7z format, the archiver allows you to set:
- Level (Fast, Normal, Maximum, Ultra),
- Method (LZMA, PPMd, Bzip2, Deflate),
- Dictionary size (32 KB - 192 MB),
- Word size (8 - 255).
As you can see, a very large number of combinations of archiver settings is possible, which can confuse the user. The following rules of thumb may help:
- The larger the dictionary, the greater both the compression and the packing time. Compression improves slowly, while packing time grows very quickly.
- The same applies to word size.
- The default settings are already optimal, and there is no need to change them unless necessary.
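The role of the dictionary size can be illustrated with Python's standard `lzma` module, which implements the LZMA2 filter of the xz format. This is not the 7-Zip archiver itself, and the test data (a 256 KB pseudo-random block stored twice) is deliberately artificial, so the sketch only shows the mechanism: a repeat can be found only if it lies within the dictionary.

```python
import lzma
import random

def xz_size(data: bytes, dict_size: int) -> int:
    """Compressed size with an explicit LZMA2 dictionary size (preset 6 otherwise)."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))

# Artificial input: an incompressible block followed by an exact copy of itself.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(256 * 1024))
data = block + block

small = xz_size(data, 64 * 1024)      # the copy lies outside a 64 KB dictionary
large = xz_size(data, 1024 * 1024)    # the copy lies inside a 1 MB dictionary

print(f"input           : {len(data)} bytes")
print(f"64 KB dictionary: {small} bytes")
print(f"1 MB dictionary : {large} bytes")
```

With the small dictionary the second copy cannot be matched against the first and the output stays close to the input size; with the large one it shrinks to roughly half. Real file sets contain far fewer such long-range repeats, which is consistent with the slow gain in compression noted in the list above.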
CAB format test results
Archiver | Mode | Size, KB | Time, min.-sec. | Compression | Speed, KB/s |
---|---|---|---|---|---|
- | Without compression | 208893 | - | - | - |
Power Archiver | Medium | 140444 | 9-55 | 67.2% | 351 |
Power Archiver | Maximum | 137152 | 15-55 | 65.6% | 219 |
WinAce | Normal | 144374 | 3-24 | 69.1% | 1024 |
WinAce | Maximum | 138538 | 12-54 | 66.3% | 270 |
The CAB (cabinet file) format is based on the MS-Zip and LZX algorithms and is supported and used by Microsoft. Unpackers for the format are included in Windows 98 and later. The algorithm is open and can be freely used by all programmers.
Test results for BH and LHA formats (Power Archiver)
Format | Mode | Size, KB | Time, min.-sec. | Compression | Speed, KB/s |
---|---|---|---|---|---|
- | Without compression | 208893 | - | - | - |
LHA | Normal | 147518 | 4-40 | 70.6% | 746 |
LHA | Maximum | 147518 | 4-47 | 70.6% | 728 |
BH | Normal | 145912 | 2-16 | 69.8% | 1536 |
BH | Maximum | 145718 | 2-34 | 69.8% | 1356 |
The figures for the LHA and BH archive formats are at the level of those for the ZIP format, and no advantages are visible.
In general, as you can see, the best compression is provided by the ACE and 7Z formats, and the best speed by the ZIP and BH formats. Further tests are planned on the same principle, but with “baskets” of homogeneous composition, one for each file format: TXT, HTML, DOC, JPG, MP3, PDF.
Determining the compressibility of files of different formats
To support this series of tests, sets of files of completely uniform format were compiled, with duplicate files excluded. EXE and DLL files were taken from the Windows system folder without any selection. The fact is that EXE files are already compressed and further compression does not make sense. The characteristics of the sets are given in the following table:
File formats in test sets
Format | Number of folders | Number of files | Total size, KB |
---|---|---|---|
TXT | 0 | 27 | 35096 |
HTM | 7 | 1371 | 25076 |
DOC | 1 | 33 | 37211 |
PDF | 0 | 1 | 33691 |
JPG | 26 | 430 | 40493 |
MP3 | 2 | 11 | 37571 |
EXE | 0 | 316 | 32446 |
DLL | 0 | 184 | 40323 |
XLS | 6 | 15 | 17228 |
CHM | 0 | 69 | 33940 |
MPEG | 0 | 24 | 46606 |
WAV | 0 | 1 | 30804 |
BMP | 0 | 15 | 31713 |
AVI | 0 | 89 | 9261 |
During testing, only the normal (usual) mode of each archiver was used. Each archive format was created by its own archiver (WinZip, WinRar, WinAce, 7-Zip); Power Archiver, which has no proprietary format of its own, was used to pack into the CAB format.
File compressibility depending on archive format
Format | ZIP | RAR | ACE | 7Z | CAB |
---|---|---|---|---|---|
TXT | 43.7% | 37.8% | 37.4% | 34.3% | 36.3% |
HTM | 29.2% | 28.3% | 9.09% | 7.75% | 15.0% |
DOC | 8.76% | 6.39% | 5.47% | 5.21% | 6.49% |
PDF | 97.7% | 97.4% | 97.8% | 97.5% | 97.3% |
JPG | 98.5% | 98.5% | 85.0% | 85.1% | 97.9% |
MP3 | 98.1% | 97.9% | 98.1% | 97.9% | 97.7% |
EXE | 46.9% | 42.1% | 37.8% | 32.7% | 39.3% |
DLL | 45.6% | 39.6% | 37.6% | 34.3% | 39.6% |
XLS | 11.8% | 8.27% | 7.44% | 5.97% | 8.49% |
CHM | 98.6% | 98.8% | 99.0% | 99.6% | 98.6% |
MPEG | 95.3% | 94.7% | 94.8% | 94.5% | 94.4% |
AVI | 86.1% | 84.1% | 84.5% | 82.7% | 83.4% |
WAV | 92.2% | 62.8% | 62.6% | 87.0% | 92.1% |
BMP | 63.5% | 31.9% | 30.6% | 51.5% | 56.2% |
Average | 65.5% | 59.2% | 56.2% | 58.3% | 61.6% |
As a comment to the table, the following can be noted:
- The best compression for the main source file formats is provided by the 7z archive format.
- The best average figure is for the ACE archive format due to record compression of the WAV and BMP formats.
As for the compressibility of the source files, the compression ratio depends on the source file format, which sometimes implies internal data compression. If a file has already been compressed by its own algorithms, the archiver can compress it very little. For example, a CHM file is a compressed version of HTML files, and accordingly their compressibility differs. We see the same with WAV and MP3, BMP and JPG, and so on.
Archiver operating speed, KB/s
Format | ZIP | RAR | ACE | 7Z | CAB |
---|---|---|---|---|---|
TXT | 2064 | 408 | 386 | 217 | 226 |
HTM | 2507 | 836 | 627 | 643 | 411 |
DOC | 7400 | 2862 | 1550 | 1378 | 886 |
PDF | 2246 | 293 | 370 | 387 | 370 |
JPG | 2670 | 587 | 337 | 368 | 287 |
MP3 | 2348 | 458 | 368 | 335 | 332 |
EXE | 2318 | 773 | 601 | 416 | 433 |
DLL | 2016 | 858 | 672 | 474 | 434 |
XLS | 4300 | 1436 | 1148 | 507 | 224 |
CHM | 1886 | 556 | 365 | 357 | 323 |
MPEG | 2453 | 583 | 416 | 370 | 338 |
AVI | 1852 | 617 | 463 | 370 | 356 |
WAV | 2370 | 1711 | 1184 | 354 | 288 |
BMP | 2883 | 1269 | 933 | 401 | 373 |
Average | 2838 | 856 | 609 | 485 | 385 |
This table demonstrates an obvious rule - better compression almost always comes at the cost of packing speed.
Compressibility of different file formats. Addition
Format | ZIP | RAR | ACE | 7Z |
---|---|---|---|---|
VXD | 55.1% | 52.5% | 43.3% | 40.8% |
INF | 14.9% | 13.3% | 13.2% | 12.3% |
VBP | 78.3% | 72.6% | 26.0% | 18.5% |
GIF | 90.0% | 94.3% | 87.2% | 86.1% |
SCR | 88.8% | 88.0% | 88.1% | 87.9% |
DAT | 23.1% | 20.1% | 20.5% | 18.0% |
INI | 35.6% | 33.2% | 32.5% | 30.2% |
Average | 55.1% | 53.4% | 44.4% | 42.0% |
This table provides additional data on the compressibility of file formats. Here testing was carried out without timing, on small sets (100-200 KB). As you can see, for all formats the best compression is provided by the 7z archive format.
Next, as an example, I will give the results of packing a real distribution kit of the Norton Antivirus program. Packing was done in normal mode; in addition, self-extracting versions of the same archives were produced. The results of this test are shown in the following table (the last column is the approximate time to download the packed distribution over the network using an ordinary modem connection at 2.7 KB per second):
Archive format | Size, KB | Time | Compression | Loading time, hours-min. |
---|---|---|---|---|
Without compression | 47410 | - | - | 4-53 |
ZIP | 29045 | 0-21 | 61.3% | 2-59 |
RAR | 26619 | 1-15 | 56.1% | 2-44 |
ACE | 23838 | 1-30 | 50.3% | 2-27 |
7Z | 22871 | 1-50 | 48.2% | 2-21 |
CAB | 26804 | 2-22 | 56.5% | 2-45 |
EXE (RAR) | 26671 | 1-15 | 56.3% | 2-45 |
EXE (ACE) | 23903 | 1-30 | 50.4% | 2-28 |
EXE (7Z) | 22941 | 1-52 | 48.4% | 2-22 |
The results in the table clearly demonstrate that:
- When transferring files over the network, packing is almost mandatory.
- Packing with good compression can reduce file transfer time, in our case by about half an hour.
- Using the promising ACE and 7Z formats is already quite justified in the form of self-extracting archives. Distributors of software products on the Internet would do well to take this into account.
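The loading times in the Norton Antivirus table follow directly from the archive size and the assumed modem speed of 2.7 KB per second. A short sketch of the arithmetic (the function name is illustrative; rounding to the nearest minute is an assumption that happens to reproduce the published column):

```python
def loading_time(size_kb: float, rate_kb_per_s: float = 2.7) -> str:
    """Transfer time in the article's 'hours-min' notation, rounded to a minute."""
    total_minutes = round(size_kb / rate_kb_per_s / 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}-{minutes:02d}"

# Archive sizes in KB, taken from the table above.
sizes_kb = {
    "Without compression": 47410,
    "ZIP": 29045,
    "RAR": 26619,
    "ACE": 23838,
    "7Z": 22871,
}

for name, size in sizes_kb.items():
    print(f"{name:20s} {size:6d} KB  {loading_time(size)}")
```

The difference between the ZIP and 7Z rows works out to 38 minutes at this line speed.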
The 7-Zip archiver is a good program with a high compression ratio and the necessary minimum of user conveniences. In particular, individual files can be viewed and deleted without unpacking the archive; files are opened by the applications associated with them in the system. Individual files can also be added to an existing archive.
Conclusion
Archiver programs remain an indispensable tool for packing and compressing digital information. Packed information saves a significant amount of storage space and of transmission time over network communication channels. The most popular and widely used packing formats today are ZIP and RAR. Other formats, for example ARJ, ICE, PAC, ARC and some others, have gradually been displaced and forgotten. But packing technology does not stand still. Archivers are in demand, so programmers are constantly searching for more efficient compression methods. The results of our experiment are evidence of this. In reality, there are at least two archive formats (ACE and 7z) that are significantly superior in compression to the usual ZIP and RAR. Using these formats will significantly reduce the time it takes to transfer files over the Internet, which is in the interests of many users...
Update dated May 24, 2004
In this section we will look at the impact of the Solid option on the performance of archivers. Recall that packing with the Solid option means that a file cannot be added to the archive and a single file cannot be extracted from it; the archive is packed and unpacked only as a whole. In general, this can cause some inconvenience when using such archives, but sometimes the inconvenience is of secondary importance compared to the advantages.
Additional testing was done exactly as described in the main section, on the same sets of material. With the additional tests taken into account, the table "RAR format testing results" from the main text now looks like this...
RAR format testing results
Mode | Size, KB | Time, min.-sec. | Compression | Speed, KB/s |
---|---|---|---|---|
Without compression | 208893 | - | - | - |
Store | 209129 | 0-58 | 100.1% | 3601 |
Fastest | 144017 | 6-00 | 68.9% | 580 |
Fast | 143281 | 6-22 | 68.6% | 547 |
Normal | 142830 | 6-40 | 68.4% | 522 |
Normal (Solid) | 131664 | 9-14 | 63.0% | 377 |
Good | 139826 | 6-58 | 66.9% | 499 |
Good (Solid) | 129314 | 8-24 | 61.9% | 414 |
Best | 140023 | 7-25 | 67.0% | 469 |
Best (Solid) | 129527 | 8-36 | 62.0% | 405 |
Best (64kb) | 140685 | 5-40 | 67.3% | 614 |
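The gain in the Solid rows comes from compressing all files as one stream, so that repeats shared between files are found. The sketch below imitates this with Python's standard `zlib` module rather than with RAR itself; the fifty small HTML-like "files" are invented for the example, and the sizes it prints say nothing about RAR's actual ratios.

```python
import zlib

# Hypothetical set of small, similar files (for instance, pages of one site).
files = [
    (
        f"<html><head><title>Page {i}</title></head>"
        f"<body><p>Shared navigation and footer text.</p>"
        f"<p>Item number {i}</p></body></html>"
    ).encode()
    for i in range(50)
]

# Ordinary archive: every file is compressed independently.
separate = [zlib.compress(f, 9) for f in files]
separate_total = sum(len(c) for c in separate)

# Solid archive: all files are compressed as a single data stream.
solid = zlib.compress(b"".join(files), 9)

print("original total :", sum(len(f) for f in files), "bytes")
print("separate total :", separate_total, "bytes")
print("solid stream   :", len(solid), "bytes")

# Any single file can be restored from the ordinary archive on its own...
assert zlib.decompress(separate[10]) == files[10]
# ...but reaching a file in the solid stream means decompressing what precedes it.
offset = sum(len(f) for f in files[:10])
unpacked = zlib.decompress(solid)
assert unpacked[offset:offset + len(files[10])] == files[10]
```

The same property explains both sides of the trade-off: the solid stream is much smaller for many similar files, while access to one file requires processing the data stored before it.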
Setting up the WinRar archiver includes:
1. Selecting a compression method (Normal, Store, Fastest, Fast, Good, Best).
2. Selecting an update mode:
- Add and replace files,
- Add and update files,
- Fresh existing files only,
- Synchronize archive contents.
3. Selecting options:
- Delete files after archiving,
- Create SFX archive,
- Create solid archive,
- Put authenticity verification,
- Put recovery record,
- Test archived files,
- Lock archive.
It is easy to see that there are more than a hundred combinations of settings that determine the operating mode of the archiver. Accordingly, the range of results for this format and this archiver turned out to be quite large - compression ratio: 61.9 - 68.9%, speed: 377 - 614 KB/sec.
The WinAce archiver also has a Solid option, but there the option (Make solid archive) is enabled by default and was therefore included in the test results. Thus only the RAR format and the WinRar archiver were treated unfairly in the original tests.
Taking into account the new circumstances, the leaderboard for compression ratio looks like this:
1. RAR (Good, Solid) - 61.9%.
2. 7-Zip (Maximum) - 62.2%.
3. ACE (Good) - 63.6%.
The updated table of packaging results for a real Norton Antivirus distribution package ("Example of Norton Antivirus distribution package packaging") now looks like this...
Norton Antivirus distribution packaging example
Archive format | Size, KB | Time | Compression | Loading time, hours-min. |
---|---|---|---|---|
Without compression | 47410 | - | - | 4-53 |
ZIP | 29045 | 0-21 | 61.3% | 2-59 |
RAR | 26619 | 1-15 | 56.1% | 2-44 |
RAR (Normal, Solid) | 22745 | 1-21 | 48.0% | 2-20 |
RAR (Good, Solid) | 22680 | 1-28 | 47.8% | 2-20 |
ACE | 23838 | 1-30 | 50.3% | 2-27 |
7Z | 22871 | 1-50 | 48.2% | 2-21 |
CAB | 26804 | 2-22 | 56.5% | 2-45 |
EXE (RAR) | 26671 | 1-15 | 56.3% | 2-45 |
EXE (RAR, Normal, Solid) | 22797 | 1-29 | 48.1% | 2-21 |
EXE (ACE) | 23903 | 1-30 | 50.4% | 2-28 |
EXE (7Z) | 22941 | 1-52 | 48.4% | 2-22 |
The results in this table also confirm that the WinRar archiver can provide maximum compression and leads on this indicator. Compared to the ZIP format, the same distribution in RAR format downloads 39 minutes faster...
In the table with the results of testing the 7z format, our reader Alexander Rykhlov discovered an error in the calculation of the compression ratio. Many thanks to Alexander; the corrected table “7z format testing results” now looks like this...
Note: in Ultra mode (LZMA) when setting the Dictionary size to 24 megabytes, the speed decreased so much that the test became impossible.
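The corrected table itself is not reproduced here. As a sketch only, assuming the compression column is defined as in the methodology (packed size as a percentage of the 208893 KB original), the percentages can be recomputed from the Size column of the 7z table in the main text. The values below are derived figures, not the author's published corrections.

```python
ORIGINAL_KB = 208893

# Archive sizes in KB from the "7z format testing results" table.
sizes_7z = {
    "Normal": 130964,
    "Maximum": 130000,
    "Fast": 141922,
    "Ultra (1 MB)": 131392,
    "Ultra (6 MB)": 130101,
    "Ultra (12 MB)": 129871,
    "Ultra (Deflate)": 141171,
    "Ultra (PPMd)": 140171,
    "Ultra (Bzip2)": 135342,
}

# Packed size as a percentage of the original, rounded to one decimal place.
recomputed = {
    mode: round(100.0 * size / ORIGINAL_KB, 1) for mode, size in sizes_7z.items()
}

for mode, percent in recomputed.items():
    print(f"{mode:16s} {sizes_7z[mode]:6d} KB  {percent:.1f}%")
```

The recomputed figure for the Maximum mode, 62.2%, agrees with the 7-Zip entry in the leaderboard above.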
Conclusion
The brewing sensation that the WinRar archiver was not as good as many users believed did not materialize. Our testing has confirmed that the technical characteristics of this archiver are indeed the highest today. The 7-Zip archiver has very similar figures, but in terms of maturity and usability it is still somewhat inferior to the leader. To obtain maximum compression in WinRar, you must enable the Solid option (it is disabled by default); the other settings (Normal, Good, etc.) matter less.
General information about archiving files
The concept of the file archiving process
One of the most widely used types of utility programs is the archiver, a program intended for archiving and packing files by compressing the information stored in them.
Information compression is the process of converting the information stored in a file into a form in which the redundancy of its representation is reduced and, accordingly, less memory is required for storage. Information in files is compressed by eliminating redundancy in various ways, for example by simplifying codes, removing constant bits from them, or representing repeated symbols or repeated sequences of symbols as a repetition count plus the corresponding symbols. Various algorithms are used for such compression. Either one file or several files can be compressed and placed in compressed form into a so-called archive file, or archive.
An archive file is a specially organized file that contains one or more files in compressed or uncompressed form together with service information about file names, the date and time of their creation or modification, sizes and so on.
The purpose of packing files is usually to place information on disk more compactly and to reduce the time and, accordingly, the cost of transmitting information over communication channels in computer networks. In addition, packing a group of files into one archive file significantly simplifies their transfer from one computer to another, reduces the time needed to copy files to disks, makes it possible to protect information from unauthorized access, and helps protect against infection by computer viruses.
The degree of file compression is characterized by the coefficient Kc, defined as the ratio of the compressed file volume Vc to the volume of the source file Vo, expressed as a percentage: Kc = (Vc / Vo) * 100%. The degree of compression depends on the program used, the compression method and the type of the source file.
The files that compress best are graphic images, text files and data files, for which the compression ratio can reach 5 - 40%; files of executable programs and load modules compress less well, to 60 - 90%. Archive files are hardly compressed at all. Archiving programs differ in the compression methods they use, which in turn affects the compression ratio.
Archiving (packing) is placing (loading) source files into an archive file in compressed or uncompressed form. Unarchiving (unpacking) is the process of restoring files from an archive exactly as they were before they were loaded into it. When unpacking, files are extracted from the archive and placed on disk or in RAM. Programs that pack and unpack files are called archiving programs.
Large archive files can be placed on several disks (volumes). Such archives are called multi-volume. A volume is a component part of a multi-volume archive. When an archive is created in several parts, the parts can be written to several floppy disks.
Main types of archiver programs
Several dozen archiver programs are currently in use. They differ in their lists of functions and operating parameters, but the best of them have approximately the same characteristics. Among the most popular programs are ARJ, PKPAK, LHA, ICE, HYPER, ZIP, PAK, ZOO and EXPAND, developed abroad, as well as AIN and RAR, developed in Russia. Typically, packing and unpacking are performed by the same program, but in some cases by different programs; for example, PKZIP packs files and PKUNZIP unpacks them.
Archiving programs also make it possible to create archives whose contents can be extracted without any additional programs, since the archive files themselves can contain an unpacking program. Such archive files are called self-extracting.
A self-extracting archive file is an executable module that can unpack the files it contains without using an archiver program. A self-extracting archive is called an SFX archive (SelF-eXtracting). In MS DOS, archives of this type are usually created as .EXE files.
Most archiver programs unpack files by writing them to disk, but some are designed to create a packed executable module (program). Such packing produces a program file with the same name and extension which, when loaded into RAM, unpacks itself and runs immediately. The program file can also be converted back to the unpacked format. Such archivers include PKLITE, LZEXE and UNP.
The EXPAND program, which is part of the utilities of the MS DOS operating system and the Windows shell, is used to unpack the files of software products supplied by Microsoft.
The RAR and AIN archivers, in addition to the usual compression mode, have a solid mode, which creates archives with a higher degree of compression and a special internal structure. In such archives all files are compressed as one data stream, i.e. the search area for repeating character sequences is the entire set of files loaded into the archive. As a result, unpacking any file other than the first involves processing other files as well. Archives of this type are best used for archiving a large number of files of the same type.
Ways to manage the archiver program
The archiver program is controlled in one of two ways:
- using the MS DOS command line, in which a launch command is formed containing the name of the archiver program, the control command and its configuration keys, as well as the names of the archive and the source files; this kind of control is typical for the archivers ARJ, AIN, ZIP, PAK, LHA, etc.;
- using a built-in shell and dialog panels that appear after starting the program and allow control using menus and function keys, which creates a more comfortable working environment for the user. The RAR archiver program has this control.
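The effect of the solid mode described above (all files compressed as one data stream) can be imitated with a general-purpose compressor. The following is a rough sketch that assumes Python's zlib as a stand-in; RAR and AIN use their own methods, and the file names and contents here are invented for illustration.

```python
import zlib


def separate_size(files: dict) -> int:
    """Total size when every file is compressed on its own."""
    return sum(len(zlib.compress(data)) for data in files.values())


def solid_size(files: dict) -> int:
    """Size when all files are concatenated and compressed as one stream."""
    return len(zlib.compress(b"".join(files.values())))


# Hypothetical set of similar files: each report shares most of its text.
files = {
    f"report_{i}.txt": (b"Quarterly report, department %d. " % i)
    + b"Common boilerplate paragraph shared by every report. " * 5
    for i in range(20)
}

print("compressed separately:", separate_size(files), "bytes")
print("compressed as one stream:", solid_size(files), "bytes")
```

The single stream is smaller because repetitions between files are found as well as repetitions inside each file. The cost is the one named in the text: a file in the middle of the stream cannot be restored without decoding the data that precedes it.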
- create archive files from individual or all files of the current directory and its subdirectories, loading up to 32,000 files into one archive;
- add and replace files in the archive;
- extract and delete files from the archive;
- protect each archived file with a 32-bit cyclic code, test the archive, checking the safety of information in it;
- receive help with work in 3 international languages;
- enter comments to files into the archive;
- remember file paths in the archive;
- save several generations (versions) of the same file in an archive;
- reorder the archive file by file size, name, extension, date and time of modification, compression ratio, etc.;
- search for strings in archived files;
- restore files from destroyed archives;
- create self-extracting archives both on one volume and on several volumes;
- view the contents of text files contained in the archive;
- ensure the protection of information in the archive and access to files placed in the archive using a password.
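The 32-bit cyclic code mentioned in the list above is a checksum stored with each file and recomputed when the archive is tested. The sketch below assumes the common CRC-32 variant provided by Python's zlib module; the source does not say which polynomial the archiver uses, and the helper names are hypothetical.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """Return the CRC-32 of data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def is_intact(data: bytes, expected_crc: int) -> bool:
    """Testing a file: recompute the checksum and compare it with the stored one."""
    return crc32_of(data) == expected_crc


stored = b"contents of an archived file"
stored_crc = crc32_of(stored)          # saved in the archive next to the file

damaged = b"contents of an archived f1le"  # one byte changed
print(f"stored CRC-32: {stored_crc:08X}")
print("original passes test:", is_intact(stored, stored_crc))
print("damaged copy passes test:", is_intact(damaged, stored_crc))
```

A checksum of this kind detects accidental damage; it does not protect against deliberate modification, which is what the password features in the list are for.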
| Group number | Command group | Command | Archive function |
| | Archiving | | add files to the archive |
| | | | replace files in the archive with new versions |
| | | | add only new files to the archive |
| | | | move files to the archive |
| | Extracting from the archive | | extract files from the archive to the current directory |
| | | | extract files from the archive and place them in directories according to the specified access paths |
| | Deleting from the archive | | delete files from the archive |
| | Service functions | | full testing of the archive |
| | | | output the contents of the archive without the paths to the files |
| | | | output the contents of the archive with the paths to the files |
| | | | copy the archive with new parameters |
| | | | find a text string in the archive |
| | Purpose |
| | Adding files from the current directory and all its subdirectories, with the paths to the files |
| | Creating a multi-volume archive file |
| | Password protection of the created archive: g<password> - the password is entered on the command line; g? - the password is entered invisibly at run time |
| | Adding/replacing files, except for the files whose names are given after the key |
| | Request confirmation of the operation for each file: enter "Y" to confirm or "N" to refuse |
| | Creating a self-extracting archive |
| | Specifying the archiving method: m0 - no compression; m1 - normal compression (default); m2 - highest compression; m3 - fast compression with a lower degree of compression; m4 - fastest compression with the lowest degree of compression |
| | Answer "Yes" to all questions from the archiver |
| | Pause when viewing the archive contents after the screen is full |
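The archiving methods in the table above trade speed for degree of compression. The same trade-off can be observed with the compression levels of Python's zlib module, used here only as an analogy: zlib levels 0 to 9 are not the archiver's m0 - m4 methods, and the sample data is invented.

```python
import zlib


def sizes_by_level(data: bytes, levels=(0, 1, 6, 9)) -> dict:
    """Compress data at several zlib levels and return {level: compressed size}."""
    return {level: len(zlib.compress(data, level)) for level in levels}


# Illustrative input: numbered records, repetitive but not trivial.
sample = "".join(f"record {i}: value {i * i % 97}\n" for i in range(2000)).encode()

sizes = sizes_by_level(sample)
for level, size in sizes.items():
    print(f"level {level}: {size} bytes ({size / len(sample) * 100:.1f}% of original)")
```

Level 0 stores the data without compression, so its output is slightly larger than the input, much like an archive created with the "no compression" method.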
| Modifier | Purpose of the modifier |
| | Indicates that the files of a multi-volume archive will occupy all free space on the disks (volumes) |
| | Allows any number of DOS commands to be executed before a new volume is created, for example viewing, cleaning or formatting the floppy disk on which the next archive file will be written; after the commands have been executed, the EXIT command must be entered to continue archiving |
| | Prohibits splitting archived files between volumes |
| | Gives a sound signal before the next volume is installed |
| | Reserves free space on the first volume; the number written after the symbol r specifies the size of this space |
| 360, 720, 1200 | Modifier options for specifying archive volume sizes |
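The idea behind a multi-volume archive, cutting one archive into parts no larger than a given volume size, can be sketched as follows. The function names are hypothetical, and treating the sizes 360, 720 and 1200 as kilobytes (floppy disk capacities) is an assumption; real archivers also write service information into every volume, which this sketch omits.

```python
from typing import List


def split_into_volumes(archive: bytes, volume_size: int) -> List[bytes]:
    """Cut archive data into consecutive parts of at most volume_size bytes."""
    if volume_size <= 0:
        raise ValueError("volume_size must be positive")
    return [archive[i:i + volume_size] for i in range(0, len(archive), volume_size)]


def join_volumes(volumes: List[bytes]) -> bytes:
    """Restore the archive by reading the volumes back in order."""
    return b"".join(volumes)


archive = bytes(1000 * 1024)                 # placeholder for 1000 KB of archive data
volumes = split_into_volumes(archive, 360 * 1024)
print("number of volumes:", len(volumes))
print("sizes in bytes:", [len(v) for v in volumes])
```

With these numbers the archive occupies three volumes: two full ones and a shorter last one.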
- possibility of working in two modes - full screen interactive interface and regular command line interface;
- support for other types of archives; in full-screen mode, RAR can work with archives of other types (.ZIP, .ARJ, .LZH), view their contents, modify and convert them;
- use of the highly efficient solid compression method to obtain a high compression ratio (10 - 50% higher than usual);
- the ability to create self-extracting and multi-volume archives;
- password protection of archives.
- password encryption;
- adding file and archive comments;
- the possibility of partial or complete recovery of damaged archives;
- protecting the archive from changes;
- the ability to add information to the archive about the creator of the archive, the time and date of the last changes made to the archive.
- in command line mode;
- in full screen interface mode.
| Function name | Purpose |
| | Add a file to the archive; if the archive does not exist, it is created |
| | View file |
| | Update files in the archive: only changed files whose old copies are in the archive are added |
| | Create archive volumes |
| | Move files to the archive |
| | Add files that are not in the archive and update those whose old copies are already in the archive |
| | Recover a damaged archive |
| | Exit RAR |
| Key | |
| | Create a continuous (solid) archive |
| | View file |
| | Create an archive divided into SFX volumes |
| | Create a solid archive divided into volumes |
| | Create a solid archive divided into SFX volumes |
| Function name | Purpose |
| | Display help information |
| | Test the archive |
| | View file |
| | Extract files from the archive with full paths |
| | Add a comment to the archive |
| | Extract files to the current directory |
| | Convert to an SFX archive |
| | Delete files from the archive |
| | Configuration / save configuration |
| | Exit from the archive |
| | View a file with a built-in program if an external one is available |
| | Extract files to a specified directory |
| | Add comments to files |
| | Lock the archive against changes |
| | Work directory |
Information compression is the process of converting information stored in a file into a form that reduces redundancy in its representation and, accordingly, requires less storage space.
Information in files is compressed by eliminating redundancy in various ways, for example by simplifying codes, excluding constant bits from them, or replacing repeated symbols or a repeated sequence of symbols with a repetition factor and the symbols themselves. Various algorithms are used for such compression. One or several files can be compressed and placed in compressed form into an archive file, or archive.
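Replacing repeated symbols with a repetition factor, as mentioned above, is known as run-length encoding. The following is a minimal sketch of the idea, not the method of any particular archiver; the function names are chosen for this example.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(text: str) -> List[Tuple[int, str]]:
    """Replace each run of identical characters with a (count, character) pair."""
    return [(len(list(run)), char) for char, run in groupby(text)]


def rle_decode(pairs: List[Tuple[int, str]]) -> str:
    """Rebuild the original text by repeating every character count times."""
    return "".join(char * count for count, char in pairs)


encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [(4, 'A'), (3, 'B'), (2, 'C'), (1, 'D')]
print(rle_decode(encoded))  # AAAABBBCCD
```

The scheme pays off only when the data contains long runs; on text without repetitions every character becomes a pair and the output grows.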
Archive file
(archive) - a specially organized file containing one or more files in compressed or uncompressed form, together with service information about file names, the date and time of their creation or modification, sizes, etc.
The purpose of packing files
is usually to place information on disk more compactly and to reduce the time and, accordingly, the cost of transmitting information over communication channels in computer networks. In addition, packing a group of files into one archive file significantly simplifies their transfer from one computer to another, reduces the time needed to copy files to disks, allows information to be protected from unauthorized access, and helps protect against infection by computer viruses. The degree of compression depends on the archiving program used, the compression method and the type of the source file. Text files and data files compress best: their size can be reduced by 80-90%. Files of executable programs and load modules compress less well, by 5-40%. Archive files are hardly compressed at all. Archiving programs differ in the compression methods they use, which in turn affects the degree of compression.
Unarchiving
(unpacking) - the process of restoring files from the archive exactly in the form they had before they were loaded into the archive. When unpacking, files are extracted from the archive and placed on disk or in RAM. Large archive files can be placed in several volumes. Such archives are called multi-volume.
Volume
- a component part of a multi-volume archive. When creating an archive in several parts, the parts can be written to several floppy disks.
The first menu item, Configuration, opens a dialog for configuring the basic RAR parameters (Fig. 11.3). The window contains five groups of parameters:
- Interface options - interface settings;
- Sort names - the file sorting order;
- Include file mask - the file inclusion mask;
- Compression - compression method settings;
- Other options - other parameters.
Fig. 11.3. The window for setting the configuration parameters of the RAR archiver
A parameter marked with a cross means that the corresponding function is enabled. Move from one parameter to another with the arrow keys. To change the parameter value in the current field, press .
Technology of working with the archiver
Let's look at the sequence of actions for the most frequently performed archiving procedures after the RAR program has been loaded in full-screen mode.
Creating a new archive from several files
1. Select the drive by pressing the key combination .
Archive programs are designed to archive (pack) files by compressing the information stored in them in order to save disk space.
MAIN TYPES OF ARCHIVE PROGRAMS.
ZIP has been one of the most popular and widespread archive formats since the days of DOS. It is based on compression algorithms proposed in the late 1970s by the Israeli researchers Abraham Lempel and Jacob Ziv. It offers an acceptable degree of compression together with fairly high speed. Today it is a de facto standard on the Internet, and almost every archiving program is expected to support it.
RAR was developed by the Russian programmer Evgeny Roshal and typically produces noticeably smaller compressed files than ZIP, at the cost of longer archive processing. In general, the RAR format is better optimized than others for complex tasks involving large numbers of files and gigabytes of disk space.
ARJ is a somewhat outdated format, which is still perhaps distinguished by the widest customization options.
CAB is used in Microsoft products as a standard for packing files. The format supports several compression methods (MSZIP, Quantum and LZX) and can achieve a high degree of compression.
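GZIP and TAR are most widespread in systems based on Unix and its most popular variety, Linux. TAR only combines files into a single archive without compressing them, and GZIP compresses a single data stream, so the two are usually used together.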
ACE is a fairly new format with a high compression ratio that is gaining increasing popularity.
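Two of the formats listed above, ZIP and TAR combined with GZIP, can be read and written with Python's standard zipfile and tarfile modules. The sketch below builds both kinds of archive in memory and reads them back; the helper names and file contents are invented for illustration. The other formats listed have no standard-library support and are not shown.

```python
import io
import tarfile
import zipfile


def make_zip(files: dict) -> bytes:
    """Pack {name: bytes} into a ZIP archive using the Deflate method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_zip(archive: bytes) -> dict:
    """Unpack a ZIP archive back into {name: bytes}."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def make_tar_gz(files: dict) -> bytes:
    """Bundle files with TAR and compress the whole bundle with GZIP."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tar_gz(archive: bytes) -> dict:
    """Unpack a .tar.gz archive back into {name: bytes}."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tf:
        return {m.name: tf.extractfile(m).read() for m in tf.getmembers() if m.isfile()}


files = {"notes.txt": b"hello archive " * 500, "data.csv": b"id,value\n1,10\n2,20\n" * 200}
zip_bytes = make_zip(files)
tgz_bytes = make_tar_gz(files)
print("original:", sum(len(d) for d in files.values()), "bytes")
print("zip:", len(zip_bytes), "bytes; tar.gz:", len(tgz_bytes), "bytes")
```

ZIP compresses every file separately, while a .tar.gz compresses the whole bundle as one stream, which resembles the solid mode of RAR.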
Many popular archivers are built around one format or another and are named after it. For example, the most popular archivers for Windows are WinRAR, WinZip and WinACE. All of them also have tools for working with other archive formats. Even so, archive formats are not always compatible between programs. In many cases the compatibility problem is solved by creating archives as self-extracting programs (EXE files), which contain everything needed to extract the information from the archive and so remove the need for a separate unpacking program on the computer.