| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| `kernels/` | 03-May-2024 | - | 1,552 | 941 |
| `ops/` | 03-May-2024 | - | 120 | 84 |
| `python/ops/` | 03-May-2024 | - | 85 | 39 |
| `BUILD` | 03-May-2024 | 3.5 KiB | 168 | 152 |
| `README.md` | 03-May-2024 | 2.6 KiB | 74 | 55 |
| `__init__.py` | 03-May-2024 | 1 KiB | 27 | 7 |

# Entropy coder

This module contains a range encoder and a range decoder that encode integer data into a string using cumulative distribution functions (CDFs).

## Data and CDF values

The data to be encoded should be non-negative integers in the half-open interval `[0, m)`. A CDF is then represented as an integer vector of length `m + 1`, where `CDF(i) = f(Pr(X < i) * 2^precision)` for `i = 0, 1, ..., m`. Here `precision` is an attribute in the range `0 < precision <= 16`. The function `f` maps real values to integers, e.g., round or floor. To encode a number `i`, `CDF(i + 1) - CDF(i)` must be nonzero.

Note that we use `Pr(X < i)`, not `Pr(X <= i)`, and therefore `CDF(0) = 0` always. Likewise `CDF(m) = 2^precision`, since `Pr(X < m) = 1`.

## RangeEncode: data shapes and CDF shapes

A CDF must be provided for each data element. Therefore the shape of the CDF tensor should be `data.shape + (m + 1,)` in NumPy-like notation. For example, if `data` is a 2-D tensor of shape (10, 10) and its elements are in `[0, 64)`, then the CDF tensor should have shape (10, 10, 65).

This can make the CDF tensor very large, and in many applications all data elements have the same probability distribution. To handle this, `RangeEncode` supports limited broadcasting of the CDF into the data. Broadcasting is limited in the following sense:

- All CDF axes except the last one are broadcast into the data, but not the other way around.
- The number of CDF axes is not extended, i.e., `CDF.ndim == data.ndim + 1`.

In the previous example, where data has shape (10, 10), the following CDF shapes are acceptable:

- (10, 10, 65)
- (1, 10, 65)
- (10, 1, 65)
- (1, 1, 65)

## RangeDecode

`RangeEncode` encodes neither the data shape nor a termination character. Therefore the decoder must know how many elements are encoded in the string, and `RangeDecode` takes the encoded data shape as its second argument. The same shape restrictions as for the `RangeEncode` inputs apply here.
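The quantization rule in "Data and CDF values" can be sketched in plain Python. `pmf_to_cdf` and `can_encode` are hypothetical helpers written for illustration, not functions provided by this module. The sketch assumes `f` is floor and that the probabilities sum to 1. It shows how a symbol with a very small probability can end up with `CDF(i + 1) - CDF(i) == 0`, in which case that symbol cannot be encoded.

```python
import math


def pmf_to_cdf(probs, precision):
    """Hypothetical helper: quantize probabilities into an integer CDF.

    Uses floor as f, so CDF(i) = floor(Pr(X < i) * 2**precision), and the
    result has length m + 1 for m probabilities.
    """
    if not 0 < precision <= 16:
        raise ValueError("precision must satisfy 0 < precision <= 16")
    if not probs:
        raise ValueError("need at least one probability")
    total = 1 << precision
    cdf = [0]  # CDF(0) = Pr(X < 0) * 2**precision = 0
    running = 0.0
    for p in probs:
        running += p  # running now equals Pr(X < i) for the next index i
        cdf.append(min(total, math.floor(running * total)))
    # Pr(X < m) == 1, so CDF(m) must be exactly 2**precision; this also
    # absorbs floating-point rounding in the running sum.
    cdf[-1] = total
    return cdf


def can_encode(cdf, i):
    """Symbol i is encodable only if its quantized mass is nonzero."""
    return cdf[i + 1] - cdf[i] > 0


cdf = pmf_to_cdf([0.5, 0.0001, 0.4999], precision=8)
print(cdf)  # [0, 128, 128, 256]
print([can_encode(cdf, i) for i in range(3)])  # [True, False, True]
```

Floor was chosen here only because it makes the zero-width case easy to see. With round, the small symbol in this example would also get zero width. An encoder that must handle every symbol would need to reserve at least one unit of mass per symbol.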
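The limited broadcasting rules from "RangeEncode: data shapes and CDF shapes" amount to a simple shape check. In this NumPy sketch, `is_valid_cdf_shape` and `expand_cdf` are hypothetical helpers, not part of this module. `expand_cdf` uses `np.broadcast_to` to show the per-element CDFs that a broadcast CDF stands for.

```python
import numpy as np


def is_valid_cdf_shape(data_shape, cdf_shape):
    """Check the limited CDF broadcasting rules (a sketch, not the op's code)."""
    data_shape, cdf_shape = tuple(data_shape), tuple(cdf_shape)
    # The number of CDF axes is not extended: CDF.ndim == data.ndim + 1.
    if len(cdf_shape) != len(data_shape) + 1:
        return False
    # Every CDF axis except the last either matches the data axis or is 1,
    # so the CDF broadcasts into the data but never the other way around.
    return all(c == d or c == 1 for c, d in zip(cdf_shape[:-1], data_shape))


def expand_cdf(data, cdf):
    """Return a read-only view holding one CDF of length m + 1 per data element."""
    if not is_valid_cdf_shape(data.shape, cdf.shape):
        raise ValueError(f"CDF shape {cdf.shape} does not fit data shape {data.shape}")
    return np.broadcast_to(cdf, data.shape + cdf.shape[-1:])


data = np.zeros((10, 10), dtype=np.int16)
for shape in [(10, 10, 65), (1, 10, 65), (10, 1, 65), (1, 1, 65)]:
    assert is_valid_cdf_shape(data.shape, shape)
for shape in [(10, 65), (1, 1, 1, 65), (2, 10, 65)]:
    assert not is_valid_cdf_shape(data.shape, shape)

# Uniform CDF over [0, 64) with precision 6: CDF(i) = i.
shared = np.arange(65).reshape(1, 1, 65)
print(expand_cdf(data, shared).shape)  # (10, 10, 65)
```

A data axis of size 1 does not stretch to match a larger CDF axis. For example, data of shape (1, 10) with a CDF of shape (10, 10, 65) is rejected.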
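A toy coder shows why the decoder needs the element count. The sketch below is a simplified exact-arithmetic coder built on Python integers. It is not this module's range coder, and `encode` and `decode` are hypothetical names. It narrows an interval using the same integer CDF convention, emits the shortest binary fraction inside the final interval, and decodes a requested number of symbols. Nothing marks the end of the message, so asking for more symbols than were encoded still returns a valid-looking sequence.

```python
def encode(symbols, cdf, precision):
    """Toy coder: return (code, nbits) for the binary fraction code / 2**nbits."""
    total = 1 << precision
    assert cdf[0] == 0 and cdf[-1] == total
    low, width = 0, 1  # current interval [low, low + width), in units of total**k
    for s in symbols:
        if cdf[s + 1] == cdf[s]:
            raise ValueError(f"symbol {s} has zero quantized mass")
        low = low * total + width * cdf[s]
        width *= cdf[s + 1] - cdf[s]
    # Shortest binary fraction that lies inside the final interval.
    max_bits = precision * len(symbols)
    for nbits in range(max_bits + 1):
        shift = max_bits - nbits
        code = -(-low >> shift)  # ceil(low / 2**shift)
        if (code << shift) < low + width:
            return code, nbits
    raise AssertionError("unreachable: nbits == max_bits always fits")


def decode(code, nbits, count, cdf, precision):
    """Decode `count` symbols; the code itself says nothing about the length."""
    total = 1 << precision
    low, width = 0, 1
    out = []
    for k in range(1, count + 1):
        # floor((code / 2**nbits) * total**k): the code's position at scale k.
        q = (code << (precision * k)) >> nbits
        target = q - low * total
        s = 0
        while width * cdf[s + 1] <= target:
            s += 1
        out.append(s)
        low = low * total + width * cdf[s]
        width *= cdf[s + 1] - cdf[s]
    return out


cdf = [0, 8, 12, 16]  # precision 4: probabilities 1/2, 1/4, 1/4
msg = [0, 1, 2, 0, 0, 2, 1]
code, nbits = encode(msg, cdf, precision=4)
assert decode(code, nbits, len(msg), cdf, precision=4) == msg
# Asking for more symbols silently yields extra ones: there is no terminator.
longer = decode(code, nbits, len(msg) + 3, cdf, precision=4)
assert longer[:len(msg)] == msg and len(longer) == len(msg) + 3
```

Exact integer arithmetic keeps the sketch short but lets the state grow with the message. It is meant only to illustrate the contract that the caller supplies the decoded shape, not how `RangeEncode` is implemented or how it performs.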
## Example

```python
import numpy as np
import tensorflow as tf
# Assumes TensorFlow 1.x, where this module's ops are importable from tf.contrib.
from tensorflow.contrib import coder

data = tf.random_uniform((128, 128), 0, 10, dtype=tf.int32)

histogram = tf.bincount(data, minlength=10, maxlength=10)
cdf = tf.cumsum(histogram, exclusive=False)
# CDF should have length m + 1.
cdf = tf.pad(cdf, [[1, 0]])
# CDF axis count must be one more than data.
cdf = tf.reshape(cdf, [1, 1, -1])

# data has 128 * 128 = 2^14 elements, so the last CDF entry is 2^14 and
# precision=14 is used. Every value that occurs in data has a nonzero count,
# so CDF(i + 1) - CDF(i) > 0 for every symbol that is actually encoded.
data = tf.cast(data, tf.int16)
encoded = coder.range_encode(data, cdf, precision=14)
decoded = coder.range_decode(encoded, tf.shape(data), cdf, precision=14)

# data and decoded should be the same.
sess = tf.Session()
x, y = sess.run((data, decoded))
assert np.all(x == y)
```

## Authors

Sung Jin Hwang (github: [ssjhv](https://github.com/ssjhv)) and Nick Johnston
(github: [nmjohn](https://github.com/nmjohn))