The Wayback Machine - https://web.archive.org/web/20030228094950/http://planetmath.org:80/encyclopedia/ArithmeticEncoding.html
PlanetMath
 Math for the people, by the people. Encyclopedia | Books | Papers | Expositions | Requests | Docs | Random
Login
Main Menu
the math
Encyclopædia
Papers
Books
Expositions

meta
Requests (49)
Orphanage (3)
Unclass'd (219)
Unproven (148)
Corrections (72)

talkback
Polls
Forums
Feedback
Bug Reports

information
Docs
Classification
News
Legalese
History
ChangeLog
TODO List
arithmetic encoding (Definition)

Arithmetic coding is a technique for achieving near-optimal entropy encoding.

An arithmetic encoder takes a string of symbols as input and produces a rational number in the interval $ [0, 1)$ as output. As each symbol is processed, the encoder will restrict the output to a smaller interval.

Let $ N$ be the number of distinct symbols in the input; let $ x_1, x_2 \dots x_N$ represent the symbols, and let $ P_1, P_2 \dots P_N$ represent the probability of each symbol appearing. At each step in the process, the output is restricted to the current interval $ [y, y + R)$. Partition this interval into $ N$ disjoint subintervals:

$\displaystyle I_1 = [y, y + P_1R)$
$\displaystyle I_2 = [y+P_1R, y + P_1R + P_2R)$
$\displaystyle \vdots$
$\displaystyle I_N = \left[y+\sum_{i=1}^{N-1}P_iR, y + R\right)$

Therefore the size of $ I_i$ is $ P_iR$. If the next symbol is $ x_i$, then restrict the output to the new interval $ I_i$.

Note that at each stage, all the possible intervals are pairwise disjoint. Therefore a specific sequence of symbols produces exactly one unique output range, and the process can be reversed.

Since arithmetic encoders are typically implemented on binary computers, the actual output of the encoder is generally the shortest sequence of bits representing the fractional part of a rational number in the final interval.

Suppose our entire input string contains $ M$ symbols: then $ x_i$ appears exactly $ P_iM$ times in the input. Therefore, the size of the final interval will be

$\displaystyle R_f = \prod_{i=1}^N P_i^{P_iM}$
The number of bits necessary to write a binary fraction in this range is
$\displaystyle -\log_2 R_f$ $\displaystyle =$ $\displaystyle -\log_2 \prod_{i=1}^N P_i^{P_iM}$  
  $\displaystyle =$ $\displaystyle \sum_{i=1}^N -\log_2 P_i^{P_iM}$  
  $\displaystyle =$ $\displaystyle \sum_{i=1}^N -P_iM\log_2 P_i$  

By Shannon's theorem, this is the total entropy of the original message. Therefore arithmetic encoding is near-optimal entropy encoding.


"arithmetic encoding" is owned by vampyr.
(view preamble)


See Also: Huffman coding

Other names:  arithmetic coding, arithmetic encoder, arithmetic coder


Cross-references: Shannon's theorem, fraction, size, contains, entire, binary, range, sequence, disjoint, partition, restricted, rational number, string, entropy encoding
There are 2 references to this object.

This is version 1 of arithmetic encoding, born on 2002-03-08
Object id is 2781, canonical name is ArithmeticEncoding.
Accessed 1060 times total.

Classification:
AMS MSC68P30 (Computer science :: Theory of data :: Coding and information theory)
 94A24 (Information and communication, circuits :: Communication, information :: Coding theorems)

Pending Errata and Addenda
None.
Discussion

No messages.

Interact
rate | post | correct | update request | add derivation | add example | add (any)