A85: Re: Assembly-85 Digest V1 #218


[Prev][Next][Index][Thread]

A85: Re: Assembly-85 Digest V1 #218




>Date: Sat, 21 Mar 1998 12:26:18 -0500 (EST)
>From: Jon Olson <morph@jmss.com>
>Subject: Re: A85: Compress levels

>In case the guy doesn't release the source to hufflib - here's how
>huffman compression works.

>Basically, first it finds out what the most common bit combination >is. 
For
>example, let's say we're working with an alphanumeric string.

>ABCDEFABCDEABCDABCABA

>Huffman would go through and find the most common letter, second, 
>third,
>etc.

>In this string, A is the most common letter, so it would be assigned 
>the
>bit pattern: 0
>that way it only takes up 1 bit.

>Next most comon is B, B would be assigned 10
>C 110
>D 1110
>E 11110
>F 111110

>This way, the most commonly used letters (or in your case, bytes) >will
>only take up 1 or 2 bits instead of the full 8. It's fairly efficient
>compression. (This is part of what pkZip uses).

>- -- Jon Olson


actually, that is not the way huffman works:

using the above ABCDEFABCDEABCDABCABA we get the following:
A was used 6 times
B:5
C:4
D:3
E:2
F:1

put them in order:

F:1
E:2
D:3
C:4
B:5
A:6

now start equating numbers (my method...huffman has it's own, but mine 
is much faster and compresses better):

letter:times used:encoder bit pattern:tree code

F:1::1   FE:3:0,1:011  C:4::1
E:2::1   D:3::1        B:5::1
D:3::1   C:4::1        A:6::1
C:4::1   B:5::1        FED:6:00,01,1:00111
B:5::1   A:6::1
A:6::1

A:6::1               CB:9:0,1:011
FED:6:00,01,1:00111  AFED:12:0,100,101,11:0100111
CB:9:0,1:011

CBAFED:21:00,01,10,1100,1101,111:00110100111
          C  B  A  F    E    D 
A=10
B=01
C=00
D=111
E=1101
F=1100

ok...what happens is huffman placed both the lowest numbers together, 
added the times used, placed a 0 in front of all the encoded bit 
patterns of the left equate and a 1 in front of all the ones on the 
right equate, and then he put a 0+left tree equate+right tree equate.

this maybe difficult and people usually use the method the guy above 
stated, but this is the way huffman got his coding.  so, here's the bits 
using the encoding process:

ABCDEFABCDEABCDABCABA
10 01 00 111 1101 1100 10 01 00 111 1101 10 01 00 111 10 01 00 10 01 10

place the tree in front of the code stripping the first 0, place the 
bytes used afterward in order of the tree, and place the file size after 
that:

tree: 0110100111
bytes:CBAFED
size:21
code:10010011 11101110 01001001 11110110 01001111 00100100 110

huffman needs these to do the decoding...i'll post the decoding later 
unless someone else does it first (or you figure it out yourself).

-Rob

______________________________________________________
Get Your Private, Free Email at http://www.hotmail.com