public class LZSS extends AbstractCompressor
Compression Info, 10-11-95
Jeff Wheeler
The compression algorithms used here are based upon the algorithms developed and published by Haruhiko Okumura in a paper entitled "Data Compression Algorithms of LARC and LHarc." This paper discusses three compression algorithms, LSZZ, LZARI, and LZHUF. LZSS is described as the "first" of these, and is described as providing moderate compression with good speed. LZARI is described as an improved LZSS, a combination of the LZSS algorithm with adaptive arithmetic compression. It is described as being slower than LZSS but with better compression. LZHUF (the basis of the common LHA compression program) was included in the paper, however, a free usage license was not included.
The following are copies of the statements included at the beginning of each source code listing that was supplied in the working paper.
LZSS, dated 4/6/89, marked as "Use, distribute and modify this program freely."
LZARI, dated 4/7/89, marked as "Use, distribute and modify this program freely."
LZHUF, dated 11/20/88, written by Haruyasu Yoshizaki, translated by Haruhiko Okumura on 4/7/89. Not expressly marked as redistributable or modifiable.
Since both LZSS and LZARI are marked as "use, distribute and modify freely" we have felt at liberty basing our compression algorithm on either of these.
Working samples of three possible compression algorithms are supplied in Okumura's paper. Which should be used?
LZSS is the fastest at decompression, but does not generated as small a compressed file as the other methods. The other two methods provided, perhaps, a 15% improvement in compression. Or, put another way, on a 100K file, LZSS might compress it to 50K while the others might approach 40-45K. For STEP purposes, it was decided that decoding speed was of more importance than tighter compression. For these reasons, the first compression algorithm implemented is the LZSS algorithm.
(adapted from Haruhiko Okumura's paper)
This scheme was proposed by Ziv and Lempel [1]. A slightly modified version is described by Storer and Szymanski [2]. An implementation using a binary tree has been proposed by Bell [3].
The algorithm is quite simple.If the ring buffer is 4096 bytes, the position can be stored in 12 bits. If the length is represented in 4 bits, the <position, length> pair is two bytes long. If the longest match is no more than two characters, then just one character is sent without encoding. The process starts again with the next character. An extra bit is sent each time to tell the decoder whether the next item is a character of a <position, length> pair.
[1] J. Ziv and A. Lempel, IEEE Transactions IT-23, 337-343 (1977).
[2] J. A. Storer and T. G. Szymanski, J. ACM, 29, 928-951 (1982).
[3] T.C. Gell, IEEE Transactions COM-34, 1176-1182 (1986).
The GNU Lesser General Public License for details.
Modifier and Type | Field and Description |
---|---|
private short[] |
dad
leftSon, rightSon, and dad are the Japanese way of referring to a tree
structure.
|
private short[] |
leftSon |
private short |
matchLength
The number of characters in the ring buffer at matchPosition that match a
given string.
|
private short |
matchPosition
The position in the ring buffer.
|
private static int |
MAX_STORE_LENGTH
This is the maximum length of a character sequence that can be taken from
the ring buffer.
|
private static short |
NOT_USED
Used to mark nodes as not used.
|
private ByteArrayOutputStream |
out
The output stream containing the result.
|
private short[] |
rightSon |
private static short |
RING_SIZE
This is the size of the ring buffer.
|
private static short |
RING_WRAP
This is used to determine the next position in the ring buffer, from 0 to
RING_SIZE - 1.
|
private byte[] |
ringBuffer
A text buffer.
|
private static int |
THRESHOLD
It takes 2 bytes to store an offset and a length.
|
input
BUF_SIZE
Constructor and Description |
---|
LZSS(InputStream input)
Create an LZSS that is capable of transforming the input.
|
Modifier and Type | Method and Description |
---|---|
ByteArrayOutputStream |
compress()
Compresses the input and provides the result.
|
private void |
deleteNode(short node)
Remove a node from the tree.
|
private void |
initTree()
Initializes the tree nodes to "empty" states.
|
private void |
insertNode(short pos)
Inserts a string from the ring buffer into one of the trees.
|
ByteArrayOutputStream |
uncompress()
Uncompresses the input and provides the result.
|
ByteArrayOutputStream |
uncompress(int expectedSize)
Uncompresses the input and provides the result.
|
private static final short RING_SIZE
private static final short RING_WRAP
private static final int MAX_STORE_LENGTH
Note that the 12 bits used to store the position and the 4 bits used to store the length equal a total of 16 bits, or 2 bytes.
private static final int THRESHOLD
private static final short NOT_USED
private byte[] ringBuffer
This ring buffer is not maintained as part of the compressed text. Instead, it is reconstructed dynamically. That is, it starts out empty and gets built as the text is decompressed.
The ring buffer contain RING_SIZE bytes, with an additional MAX_STORE_LENGTH - 1 bytes to facilitate string comparison.
private short matchPosition
private short matchLength
private short[] dad
For i = 0 to RING_SIZE-1, rightSon[i] and leftSon[i] will be the right and left children of node i.
For i = 0 to RING_SIZE-1, dad[i] is the parent of node i.
For i = 0 to 255, rightSon[RING_SIZE + i + 1] is the root of the tree for strings that begin with the character i. Note that this requires one byte characters.
These nodes store values of 0...(RING_SIZE-1). Memory requirements can be reduces by using 2-byte integers instead of full 4-byte integers (for 32-bit applications). Therefore, these are defined as "shorts."
private short[] leftSon
private short[] rightSon
private ByteArrayOutputStream out
public LZSS(InputStream input)
input
- to compress or uncompress.public ByteArrayOutputStream compress() throws IOException
Compressor
IOException
- if an exception is encounteredpublic ByteArrayOutputStream uncompress() throws IOException
Compressor
IOException
- if an exception is encounteredpublic ByteArrayOutputStream uncompress(int expectedSize) throws IOException
Compressor
expectedSize
- the size of the result bufferIOException
- if an exception is encounteredprivate void initTree()
private void insertNode(short pos)
The string to be inserted is identified by the parameter pos, A full MAX_STORE_LENGTH bytes are inserted. So, ringBuffer[pos ... pos+MAX_STORE_LENGTH-1] are inserted.
If the matched length is exactly MAX_STORE_LENGTH, then an old node is removed in favor of the new one (because the old one will be deleted sooner).
pos
- plays a dual role. It is used as both a position in the ring
buffer and also as a tree node. ringBuffer[pos] defines a
character that is used to identify a tree node.private void deleteNode(short node)
node
- the node to remove