[A83] Re: Is compression possible?


Patrick Davidson wrote:

> You can look at the source code to see what it is doing.  I don't know the
> precise name for the algorithm, but it uses simple offset-length
> encoding.  The decompression code certainly does look very well optimized
> in this case.  As for the compressor, that certainly would be larger, but
> not much larger since the compression is not terribly complex.  Probably a
> larger concern should be that the compression might be very slow on-calc,
> since you have to compare data at each byte boundary with the data at each
> previous boundary back 2048 bytes.  So for a 10K program, 20 million
> comparisons, or 400 million clock cycles (about a minute) if each takes 20
> clock cycles (which would seem fairly good).

A window size of 256 bytes would probably suffice for calc programs, though. With
an average size of probably around 10k, I doubt a 2k window would yield that
much better compression. A 256-byte window also allows the offset to fit neatly
into a byte (if you let 0 represent 256), and it reduces the size of the phrase
tokens... say, 1 bit for the token type, 8 bits for the window offset and 5 bits
for the length.

So... for a 10k program with a 256-byte window, that's still ~51 million cycles :P

Also... memory need not be as big an issue as first thought. Under LZSS, the
maximum bloat would be 112.5% (ie: no matching phrases, therefore 9 bits to store
each byte: a 1-bit literal flag plus the byte itself). Phrase tokens aren't output
unless the data they represent is longer than the token itself.

So, for a 10k program, that's 10240 * 1.125 = 11520 bytes maximum (not including
header info). During compression, that means we'd need an extra 1280 bytes
for a 10k program in order to be 'safe'. This is assuming it is possible to
directly modify a program variable (which I've never tried before).

It should be possible... slow, but possible ;)

At least the decompression will be fast.
