LZSS

java.lang.Object
- org.crosswire.common.compress.AbstractCompressor
- - org.crosswire.common.compress.LZSS

All Implemented Interfaces:

Compressor
```
public class LZSS
extends AbstractCompressor
```
The LZSS compression is a port of code as implemented for STEP. The following information gives the history of this implementation.
Compression Info, 10-11-95
Jeff Wheeler

Source of Algorithm

The compression algorithms used here are based upon the algorithms developed and published by Haruhiko Okumura in a paper entitled "Data Compression Algorithms of LARC and LHarc." This paper discusses three compression algorithms: LZSS, LZARI, and LZHUF. LZSS is described as the "first" of these, providing moderate compression with good speed. LZARI is described as an improved LZSS, a combination of the LZSS algorithm with adaptive arithmetic compression. It is described as being slower than LZSS but with better compression. LZHUF (the basis of the common LHA compression program) was included in the paper; however, a free usage license was not included.

The following are copies of the statements included at the beginning of each source code listing that was supplied in the working paper.

LZSS, dated 4/6/89, marked as "Use, distribute and modify this program freely."

LZARI, dated 4/7/89, marked as "Use, distribute and modify this program freely."

LZHUF, dated 11/20/88, written by Haruyasu Yoshizaki, translated by Haruhiko Okumura on 4/7/89. Not expressly marked as redistributable or modifiable.

Since both LZSS and LZARI are marked as "use, distribute and modify freely," we have felt at liberty to base our compression algorithm on either of these.

Selection of Algorithm

Working samples of three possible compression algorithms are supplied in Okumura's paper. Which should be used?

LZSS is the fastest at decompression, but does not generate as small a compressed file as the other methods. The other two methods provided, perhaps, a 15% improvement in compression. Or, put another way, on a 100K file, LZSS might compress it to 50K while the others might approach 40-45K. For STEP purposes, it was decided that decoding speed was of more importance than tighter compression. For these reasons, the first compression algorithm implemented is the LZSS algorithm.

About LZSS Encoding

(adapted from Haruhiko Okumura's paper)

This scheme was proposed by Ziv and Lempel [1]. A slightly modified version is described by Storer and Szymanski [2]. An implementation using a binary tree has been proposed by Bell [3].
The algorithm is quite simple.
1. Keep a ring buffer which initially contains all space characters.
2. Read several letters from the file to the buffer.
3. Search the buffer for the longest string that matches the letters just read, and send its length and position in the buffer.

If the ring buffer is 4096 bytes, the position can be stored in 12 bits. If the length is represented in 4 bits, the <position, length> pair is two bytes long. If the longest match is no more than two characters, then just one character is sent without encoding. The process starts again with the next character. An extra bit is sent each time to tell the decoder whether the next item is a character or a <position, length> pair.
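
The steps above can be sketched as a toy encoder and decoder. This is a simplified illustration, not the code of this class: it assumes a brute-force search instead of a binary tree, emits a list of tokens instead of packed bits, and uses a plain index into the text seen so far instead of a space-filled ring buffer. The names ToyLzss, Token, WINDOW, MAX_LENGTH and MIN_LENGTH are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyLzss {
    static final int WINDOW = 4096;   // how far back a match may start
    static final int MAX_LENGTH = 18; // longest match that is encoded
    static final int MIN_LENGTH = 3;  // shorter matches are sent as literals

    /** Either a literal byte (length == 0) or a (position, length) pair. */
    static final class Token {
        final int position; // start of the match in the text seen so far
        final int length;   // 0 marks a literal
        final byte literal;

        Token(int position, int length, byte literal) {
            this.position = position;
            this.length = length;
            this.literal = literal;
        }
    }

    static List<Token> encode(byte[] in) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < in.length) {
            int bestPos = 0;
            int bestLen = 0;
            // Brute-force search of the window for the longest match.
            for (int p = Math.max(0, i - WINDOW); p < i; p++) {
                int len = 0;
                while (len < MAX_LENGTH && i + len < in.length && in[p + len] == in[i + len]) {
                    len++;
                }
                if (len > bestLen) {
                    bestLen = len;
                    bestPos = p;
                }
            }
            if (bestLen >= MIN_LENGTH) {
                tokens.add(new Token(bestPos, bestLen, (byte) 0));
                i += bestLen;
            } else {
                tokens.add(new Token(0, 0, in[i]));
                i++;
            }
        }
        return tokens;
    }

    static byte[] decode(List<Token> tokens) {
        int size = 0;
        for (Token t : tokens) {
            size += (t.length == 0) ? 1 : t.length;
        }
        byte[] out = new byte[size];
        int n = 0;
        for (Token t : tokens) {
            if (t.length == 0) {
                out[n++] = t.literal;
            } else {
                // Copy byte by byte so a match may overlap the bytes it produces.
                for (int k = 0; k < t.length; k++) {
                    out[n++] = out[t.position + k];
                }
            }
        }
        return out;
    }
}
```

For the input "abcabcabcabc" this sketch emits three literals followed by one pair that covers the remaining nine bytes.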

[1] J. Ziv and A. Lempel, IEEE Transactions IT-23, 337-343 (1977).
[2] J. A. Storer and T. G. Szymanski, J. ACM, 29, 928-951 (1982).
[3] T. C. Bell, IEEE Transactions COM-34, 1176-1182 (1986).
Regarding this port to Java and not the original code, the following license applies:
Author:

DM Smith

See Also:
The GNU Lesser General Public License for details.

Field Summary

Fields
Modifier and Type	Field and Description
`private short[]`	`dad` leftSon, rightSon, and dad are the Japanese way of referring to a tree structure.
`private short[]`	`leftSon`
`private short`	`matchLength` The number of characters in the ring buffer at matchPosition that match a given string.
`private short`	`matchPosition` The position in the ring buffer.
`private static int`	`MAX_STORE_LENGTH` This is the maximum length of a character sequence that can be taken from the ring buffer.
`private static short`	`NOT_USED` Used to mark nodes as not used.
`private ByteArrayOutputStream`	`out` The output stream containing the result.
`private short[]`	`rightSon`
`private static short`	`RING_SIZE` This is the size of the ring buffer.
`private static short`	`RING_WRAP` This is used to determine the next position in the ring buffer, from 0 to RING_SIZE - 1.
`private byte[]`	`ringBuffer` A text buffer.
`private static int`	`THRESHOLD` It takes 2 bytes to store an offset and a length.

Fields inherited from class org.crosswire.common.compress.AbstractCompressor
input

Fields inherited from interface org.crosswire.common.compress.Compressor
BUF_SIZE

Constructor Summary

Constructors
Constructor and Description

`LZSS(InputStream input)` Create an LZSS that is capable of transforming the input.

Method Summary

Methods
Modifier and Type	Method and Description
`ByteArrayOutputStream`	`compress()` Compresses the input and provides the result.
`private void`	`deleteNode(short node)` Remove a node from the tree.
`private void`	`initTree()` Initializes the tree nodes to "empty" states.
`private void`	`insertNode(short pos)` Inserts a string from the ring buffer into one of the trees.
`ByteArrayOutputStream`	`uncompress()` Uncompresses the input and provides the result.
`ByteArrayOutputStream`	`uncompress(int expectedSize)` Uncompresses the input and provides the result.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - RING_SIZE
```
private static final short RING_SIZE
```
    This is the size of the ring buffer. It is set to 4K. It is important to note that a position within the ring buffer requires 12 bits.
    
    See Also:
    Constant Field Values
  - RING_WRAP
```
private static final short RING_WRAP
```
    This is used to determine the next position in the ring buffer, from 0 to RING_SIZE - 1. The idiom s = (s + 1) & RING_WRAP; will ensure this. This only works if RING_SIZE is a power of 2. Note this is slightly faster than the equivalent: s = (s + 1) % RING_SIZE;
    
    See Also:
    Constant Field Values
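
    The wrap idiom can be checked with a small sketch. The class name RingWrapDemo and the constant values (a ring of 4096 entries and a mask of RING_SIZE - 1) are assumptions for illustration; the description above only states that the ring is 4K and that RING_SIZE must be a power of 2.

```java
final class RingWrapDemo {
    // Assumed values: a 4K ring and a mask of RING_SIZE - 1.
    static final short RING_SIZE = 4096;
    static final short RING_WRAP = RING_SIZE - 1;

    /** Advances a ring position by one, wrapping with a bit mask. */
    static short next(short s) {
        return (short) ((s + 1) & RING_WRAP);
    }

    /** The equivalent step using the remainder operator. */
    static short nextModulo(short s) {
        return (short) ((s + 1) % RING_SIZE);
    }

    public static void main(String[] args) {
        // Both forms agree for every valid position because RING_SIZE is a power of 2.
        for (int i = 0; i < RING_SIZE; i++) {
            short s = (short) i;
            if (next(s) != nextModulo(s)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        System.out.println("mask and modulo agree for all positions");
    }
}
```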
  - MAX_STORE_LENGTH
```
private static final int MAX_STORE_LENGTH
```
    This is the maximum length of a character sequence that can be taken from the ring buffer. It is set to 18. Note that a length must be at least 3 before it is worthwhile to store a position/length pair, so the length can be encoded in only 4 bits. Or, put another way, it is not necessary to encode a length of 0-18; only lengths of 3-18 occur, which is 16 distinct values and therefore requires 4 bits.
    Note that the 12 bits used to store the position and the 4 bits used to store the length equal a total of 16 bits, or 2 bytes.
    
    See Also:
    Constant Field Values
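
    A minimal sketch of how a 12-bit position and a 4-bit length can share two bytes. The byte layout (low 8 bits of the position first, then the high 4 position bits in the upper nibble and the length minus 3 in the lower nibble) is an assumption modeled on Okumura's published C code and may differ from what this class writes. PairCodec, MIN_MATCH and the method names are hypothetical.

```java
final class PairCodec {
    static final int MIN_MATCH = 3;         // assumed shortest encoded length
    static final int MAX_STORE_LENGTH = 18; // longest encoded length

    /** Packs a position of 0..4095 and a length of 3..18 into two bytes. */
    static byte[] pack(int position, int length) {
        if (position < 0 || position > 0xFFF) {
            throw new IllegalArgumentException("position must fit in 12 bits");
        }
        if (length < MIN_MATCH || length > MAX_STORE_LENGTH) {
            throw new IllegalArgumentException("length must be 3..18");
        }
        int storedLength = length - MIN_MATCH; // 0..15, fits in 4 bits
        byte low = (byte) (position & 0xFF);
        byte high = (byte) (((position >> 4) & 0xF0) | storedLength);
        return new byte[] {low, high};
    }

    static int unpackPosition(byte[] pair) {
        return (pair[0] & 0xFF) | ((pair[1] & 0xF0) << 4);
    }

    static int unpackLength(byte[] pair) {
        return (pair[1] & 0x0F) + MIN_MATCH;
    }
}
```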
  - THRESHOLD
```
private static final int THRESHOLD
```
    It takes 2 bytes to store an offset and a length. If a character sequence only requires 1 or 2 characters to store uncompressed, then it is better to store it uncompressed than as an offset into the ring buffer.
    
    See Also:
    Constant Field Values
  - NOT_USED
```
private static final short NOT_USED
```
    Used to mark nodes as not used.
    
    See Also:
    Constant Field Values
  - ringBuffer
```
private byte[] ringBuffer
```
    A text buffer. It contains "nodes" of uncompressed text that can be indexed by position. That is, a substring of the ring buffer can be indexed by a position and a length. When decoding, the compressed text may contain a position in the ring buffer and a count of the number of bytes from the ring buffer that are to be moved into the uncompressed buffer.
    This ring buffer is not maintained as part of the compressed text. Instead, it is reconstructed dynamically. That is, it starts out empty and gets built as the text is decompressed.
    
    The ring buffer contains RING_SIZE bytes, with an additional MAX_STORE_LENGTH - 1 bytes to facilitate string comparison.
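
    The way a decoder rebuilds the ring buffer while expanding the compressed text can be sketched as follows. This is an illustration under stated assumptions, not this class's code: the ring is filled with spaces as in the algorithm description, writing starts at position 0 (a real implementation may start elsewhere), and RingDecoderSketch with its method names is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

final class RingDecoderSketch {
    static final int RING_SIZE = 4096;
    static final int RING_WRAP = RING_SIZE - 1;

    private final byte[] ring = new byte[RING_SIZE];
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int next = 0; // assumed start position for writing into the ring

    RingDecoderSketch() {
        Arrays.fill(ring, (byte) ' ');
    }

    /** Emits one uncompressed byte and records it in the ring. */
    void literal(byte b) {
        out.write(b);
        ring[next] = b;
        next = (next + 1) & RING_WRAP;
    }

    /** Expands a (position, length) pair by copying bytes out of the ring. */
    void copy(int position, int length) {
        for (int k = 0; k < length; k++) {
            // Each copied byte is also written back, so the ring keeps growing.
            literal(ring[(position + k) & RING_WRAP]);
        }
    }

    byte[] result() {
        return out.toByteArray();
    }
}
```

    Because every copied byte is written back into the ring, a pair may refer to bytes that the same pair has just produced.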
  - matchPosition
```
private short matchPosition
```
    The position in the ring buffer. Used by insertNode.
  - matchLength
```
private short matchLength
```
    The number of characters in the ring buffer at matchPosition that match a given string. Used by insertNode.
  - dad
```
private short[] dad
```
    leftSon, rightSon, and dad are the Japanese way of referring to a tree structure. The dad is the parent and it has a right and left son (child).
    For i = 0 to RING_SIZE-1, rightSon[i] and leftSon[i] will be the right and left children of node i.
    
    For i = 0 to RING_SIZE-1, dad[i] is the parent of node i.
    
    For i = 0 to 255, rightSon[RING_SIZE + i + 1] is the root of the tree for strings that begin with the character i. Note that this requires one byte characters.
    
    These nodes store values of 0...(RING_SIZE-1). Memory requirements can be reduced by using 2-byte integers instead of full 4-byte integers (for 32-bit applications). Therefore, these are defined as "shorts."
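
    The index layout described above can be sketched as follows. The array sizes and the sentinel value (NOT_USED equal to RING_SIZE) are assumptions modeled on Okumura's C code, not taken from this class; TreeArraysSketch, rootIndex and markAllUnused are hypothetical names.

```java
import java.util.Arrays;

final class TreeArraysSketch {
    static final short RING_SIZE = 4096;
    static final short NOT_USED = RING_SIZE; // assumed sentinel, one past the last node

    // Nodes 0..RING_SIZE-1 correspond to ring buffer positions.
    final short[] dad = new short[RING_SIZE + 1];
    final short[] leftSon = new short[RING_SIZE + 1];
    // The extra 256 entries hold one tree root per possible first byte.
    final short[] rightSon = new short[RING_SIZE + 257];

    /** Index in rightSon of the root for strings starting with firstByte (0..255). */
    static int rootIndex(int firstByte) {
        return RING_SIZE + firstByte + 1;
    }

    /** Marks every root as empty and every node as detached from any tree. */
    void markAllUnused() {
        Arrays.fill(rightSon, RING_SIZE + 1, RING_SIZE + 257, NOT_USED);
        Arrays.fill(dad, 0, RING_SIZE, NOT_USED);
    }
}
```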
  - leftSon
```
private short[] leftSon
```
  - rightSon
```
private short[] rightSon
```
  - out
```
private ByteArrayOutputStream out
```
    The output stream containing the result.
- Constructor Detail
  - LZSS
```
public LZSS(InputStream input)
```
    Create an LZSS that is capable of transforming the input.
    
    Parameters:
    input - to compress or uncompress.
- Method Detail
  - compress
```
public ByteArrayOutputStream compress()
                               throws IOException
```
    Description copied from interface: Compressor
    
    Compresses the input and provides the result.
    
    Returns:
    the compressed result
    
    Throws:
    
    IOException - if an exception is encountered
  - uncompress
```
public ByteArrayOutputStream uncompress()
                                 throws IOException
```
    Description copied from interface: Compressor
    
    Uncompresses the input and provides the result.
    
    Returns:
    the uncompressed result
    
    Throws:
    
    IOException - if an exception is encountered
  - uncompress
```
public ByteArrayOutputStream uncompress(int expectedSize)
                                 throws IOException
```
    Description copied from interface: Compressor
    
    Uncompresses the input and provides the result.
    
    Parameters:
    expectedSize - the size of the result buffer
    
    Returns:
    the uncompressed result
    
    Throws:
    
    IOException - if an exception is encountered
  - initTree
```
private void initTree()
```
    Initializes the tree nodes to "empty" states.
  - insertNode
```
private void insertNode(short pos)
```
    Inserts a string from the ring buffer into one of the trees. It loads the match position and length member variables for the longest match.
    The string to be inserted is identified by the parameter pos. A full MAX_STORE_LENGTH bytes are inserted, that is, ringBuffer[pos ... pos+MAX_STORE_LENGTH-1].
    
    If the matched length is exactly MAX_STORE_LENGTH, then an old node is removed in favor of the new one (because the old one will be deleted sooner).
    
    Parameters:
    pos - plays a dual role. It is used as both a position in the ring buffer and also as a tree node. ringBuffer[pos] defines a character that is used to identify a tree node.
  - deleteNode
```
private void deleteNode(short node)
```
    Remove a node from the tree.
    
    Parameters:
    node - the node to remove
