[A83] Re: Groups


[Prev][Next][Index][Thread]

[A83] Re: Groups



I've thought of compression myself too. The best algorithm would probably be 
LZ77. It's simple and effective, with fast decompression but slow 
compression. Here's a description (by one "Joe D", describing the HPI archive 
format used by Cavedog Entertainment's Total Annihilation):

The compression algorithm is a very basic sliding window compression scheme
from the LZ77 family using a 4095 byte history and matches from 2 to 17
bytes long.

The first byte is kind of a "tag" byte which determines if the next eight
pieces of data are literal bytes or history matches.  Starting with the
least-significant bit, this tag byte is scanned to figure out what to do.

When the current bit is a zero, the next byte of the input is transfered
directly to the output and added to end of the history buffer.

When the current bit is a one, the next two bytes taken from the input are
used as a offset/length pair.  The upper 12 bits are the offset into the
history buffer and the lower 4 bits are the length. [NOTE: Different numbers 
of bits may be more effective.] If the offset is zero, the end of the input 
data has been reached and the decompressor simply exits.

Since we can assume that there will be no matches with a length of zero
or only one byte, any match is a mimimum of two bytes so we just add two
to the length which extends our range from 0-15 to 2-17 bytes.

[NOTE: 2 byte matches make little sense either, because the match code is 
itself 2 bytes. Therefore, it would work better to add 3, getting 3-18 byte 
matches.]

The match is then copied from the history buffer to the output and also added
onto the end of the history buffer to keep it in sync with the output.

[NOTE: The history number can't go over 4095 bytes. When it reaches that 
number, things are dumped from the beginning. It makes sense for the history 
buffer to actually be a pointer to the output, 4095 bytes back from the 
current position, or at the beginning if less than 4095 bytes have been 
decompressed.]

When all eight bits of the tag byte have been used, the mask is reset and
the next tag byte is loaded.