[A86] Re: compression


[Prev][Next][Index][Thread]

[A86] Re: compression




Compression can be very simplistic, or it can be very complicated.  I
suggest reading a book on it, and/or checking out some websites.  There are
essentially three parts to compression (perhaps more): encoding, compression
and transformation.

Encoding is a way to represent data in the smallest possible number of bits,
in other words, at the actual entropy of the data.  Huffman encoding, which
is technically not compression, is probably the most common encoding method.
An easy example of this is if you were only store lowercase letters of the
alphabet.  There are 26 different letters, thus using very simplistic
encoding it would take at most 5 bits for each character.  However, you are
most likely going to have more of one letter than another.  If you have ten
E's for every Z, then you would be saving space by encoding E's with 3 bits
even if that meant encoding Z's with 9 bits.  Huffman encoding works very
similiar to this.  Huffman uses a whole number of bits per character, say 2
bits, whereas Arithmetic Coding uses a fractional number, say 1.8.

There are many, many methods to actually compress data, and can be vastly
different from each other.  The zlib (gzip) method and jpeg method are very
different, for example.

Transformations are methods of rearranging data before it is compressed, so
that it can be compressed more efficiently.  An example of this is the
Burrows-Wheeler Transform, which is used by bzip2.  bzip2 on the average
produces better results than gzip.  For example, the 2.4.4 kernel source is
2.1M gzip'd, and 1.7M bzip2'd.

Normally, data will be transformed, compressed, and then encoded.

The following site has links to algorithms, formats and libraries for many
different compression types and schemes:

http://dogma.net/DataCompression/

I own the following book, and highly recommend it to anyone who wants to
learn about data compression.  It has easy to understand explanations of
many algorithms, and the included disk has the source code to many working
example programs.  I think that Kirk Meyer wrote Lite86 after reading this
book.

http://www.amazon.com/exec/obidos/ASIN/1558514341

Btw, just out of curiousity, does any in the Phoenix area?

> Can someone explain some of the algorithms used for compression of data?





References: