Match finders. More...
#include "lz_encoder.h"
#include "lz_encoder_hash.h"
Defines | |
#define | EMPTY_HASH_VALUE 0 |
#define | MUST_NORMALIZE_POS UINT32_MAX |
#define | header(is_bt, len_min, ret_op) |
#define | header_find(is_bt, len_min) |
#define | header_skip(is_bt, len_min) header(is_bt, len_min, continue) |
#define | call_find(func, len_best) |
Functions | |
uint32_t | lzma_mf_find (lzma_mf *mf, uint32_t *count_ptr, lzma_match *matches) |
Find matches starting from the current byte. | |
static void | normalize (lzma_mf *mf) |
Normalizes hash values. | |
static void | move_pos (lzma_mf *mf) |
Mark the current byte as processed from point of view of the match finder. | |
static void | move_pending (lzma_mf *mf) |
Match finders.
#define EMPTY_HASH_VALUE 0 |
Hash value to indicate unused element in the hash. Since we start the positions from dict_size + 1, zero is always too far to qualify as usable match position.
#define MUST_NORMALIZE_POS UINT32_MAX |
Normalization must be done when lzma_mf.offset + lzma_mf.read_pos reaches MUST_NORMALIZE_POS.
Referenced by normalize().
#define header | ( | is_bt, | ||
len_min, | ||||
ret_op | ||||
) |
uint32_t len_limit = mf_avail(mf); \ if (mf->nice_len <= len_limit) { \ len_limit = mf->nice_len; \ } else if (len_limit < (len_min) \ || (is_bt && mf->action == LZMA_SYNC_FLUSH)) { \ assert(mf->action != LZMA_RUN); \ move_pending(mf); \ ret_op; \ } \ const uint8_t *cur = mf_ptr(mf); \ const uint32_t pos = mf->read_pos + mf->offset
Calculate len_limit and determine if there is enough input to run the actual match finder code. Sets up "cur" and "pos". This macro is used by all find functions and binary tree skip functions. Hash chain skip function doesn't need len_limit so a simpler code is used in them.
#define header_find | ( | is_bt, | ||
len_min | ||||
) |
header(is_bt, len_min, return 0); \ uint32_t matches_count = 0
Header for find functions. "return 0" indicates that zero matches were found.
#define header_skip | ( | is_bt, | ||
len_min | ||||
) | header(is_bt, len_min, continue) |
Header for a loop in a skip function. "continue" tells to skip the rest of the code in the loop.
#define call_find | ( | func, | ||
len_best | ||||
) |
do { \ matches_count = func(len_limit, pos, cur, cur_match, mf->depth, \ mf->son, mf->cyclic_pos, mf->cyclic_size, \ matches + matches_count, len_best) \ - matches; \ move_pos(mf); \ return matches_count; \ } while (0)
Calls hc_find_func() or bt_find_func() and calculates the total number of matches found. Updates the dictionary position and returns the number of matches found.
uint32_t lzma_mf_find | ( | lzma_mf * | mf, | |
uint32_t * | count_ptr, | |||
lzma_match * | matches | |||
) |
Find matches starting from the current byte.
References lzma_mf_s::find, lzma_mf_s::match_len_max, mf_avail(), mf_ptr(), lzma_mf_s::nice_len, and lzma_mf_s::read_ahead.
static void normalize | ( | lzma_mf * | mf | ) | [static] |
Normalizes hash values.
The hash arrays store positions of match candidates. The positions are relative to an arbitrary offset that is not the same as the absolute offset in the input stream. The relative position of the current byte is lzma_mf.offset + lzma_mf.read_pos. The distances of the matches are the differences of the current read position and the position found from the hash.
To prevent integer overflows of the offsets stored in the hash arrays, we need to "normalize" the stored values now and then. During the normalization, we drop values that indicate distance greater than the dictionary size, thus making space for new values.
References lzma_mf_s::hash_size_sum, MUST_NORMALIZE_POS, lzma_mf_s::offset, lzma_mf_s::read_pos, and lzma_mf_s::sons_count.
Referenced by move_pos().
static void move_pos | ( | lzma_mf * | mf | ) | [static] |
Mark the current byte as processed from point of view of the match finder.
References normalize(), lzma_mf_s::offset, lzma_mf_s::read_pos, and lzma_mf_s::write_pos.
static void move_pending | ( | lzma_mf * | mf | ) | [static] |
When flushing, we cannot run the match finder unless there is nice_len bytes available in the dictionary. Instead, we skip running the match finder (indicating that no match was found), and count how many bytes we have ignored this way.
When new data is given after the flushing was completed, the match finder is restarted by rewinding mf->read_pos backwards by mf->pending. Then the missed bytes are added to the hash using the match finder's skip function (with small amount of input, it may start using mf->pending again if flushing).
Due to this rewinding, we don't touch cyclic_pos or test for normalization. It will be done when the match finder's skip function catches up after a flush.
References lzma_mf_s::pending, lzma_mf_s::read_pos, and lzma_mf_s::write_pos.