# Compressing arrays with a variable number of terms

## Main Question or Discussion Point

Behold this array of numbers:
3, 5, 11, 4

Sometimes the platform limits what can be recorded in one location to a single number, such as 93240, but sometimes I also need to record an array of numbers such as the one shown above. An example of such a platform is the TI-84 calculator.

You may already know how to do this, but here is how I did it (with limited success) in case you are interested:
-----------------------------------------------------------------------------------------
I got around storing a combination of any number of numbers by using this trick:
1. raise 2 to the power of the input number
2. assign the exponentiation's output to k
3. raise 2 to the power of the second input
4. add the second power to k and reassign
5. rinse and repeat until the user stops inputting numbers; now we have a number k.

The example shown above would turn into:
2^3 + 2^5 + 2^11 + 2^4 = 8 + 32 + 2048 + 16 = 2104

When we want to get back what the user put in, we just need to repeatedly find the biggest power of 2 in k:
1. biggest power in k is 2048 (=2^11)

2. subtract this power from k and reassign:
k-2048 = 56 sto>k

3. look for the next biggest power in k:
biggest power in k is 32 (=2^5)

so on and so forth until k goes to 0.

By now we have all of the initial inputs extracted from k. The problems with this method include:
- only combinations (not permutations) of positive whole numbers,
- inputs must not repeat.
--------------------------------------------------------------------------------------------
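The steps above can be sketched as follows (Python for concreteness; on the TI-84 the same arithmetic works with powers of 2 and the store command):

```python
def encode_set(values):
    """Pack distinct non-negative integers into one number by
    adding 2^v for each value v (order is lost, repeats would break it)."""
    k = 0
    for v in values:
        k += 2 ** v          # same as setting bit v of k
    return k

def decode_set(k):
    """Recover the values by peeling off the largest power of 2 each time."""
    values = []
    while k > 0:
        p = k.bit_length() - 1   # exponent of the largest power of 2 in k
        values.append(p)
        k -= 2 ** p
    return values

print(encode_set([3, 5, 11, 4]))   # 2104 = 8 + 32 + 2048 + 16
print(decode_set(2104))            # [11, 5, 4, 3] -- order not preserved
```

Note that the decoder returns the values from largest to smallest, which demonstrates the two limitations listed above.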

Now, would there be a way to do something similar (storing an array of numbers in one number) and include the order of the array and allow repetition, and maybe allow non-whole numbers?

What you have written - take an n-bit number and set the k bits corresponding to the k appearing values to 1 - has two restrictions:
- it kills the order of the elements (worth lg(k!) bits of information),
- it cannot store more than one appearance of an element.
As you have written, it is indeed a combination.

In practice you often split this set by density k = n*p, where p is the density/probability of a symbol:
2^n = sum_k binomial(n, k)

The number of combinations is asymptotically
binomial(n, pn) ~ 2^(n*h(p)),
where h(p) = -p lg(p) - (1-p) lg(1-p) is the Shannon entropy (just insert the Stirling approximation n! ~ (n/e)^n to get it).
The approach you have written uses 1 bit/position; it is optimal as long as p = 1/2.
For different (general) probabilities, think about using a more sophisticated entropy coder - benchmarks and links: https://sites.google.com/site/powturbo/entropy-coder

Svein
The one-to-one mapping (used in Gödel's theorem)

• Use a sequence of prime numbers: 2, 3, 5, 7, 11, ... as bases
• Assume you want to store the numbers n1, n2, n3, ...
• Then create θ = 2^n1 ⋅ 3^n2 ⋅ 5^n3 ⋅ ...
• Now θ is a unique coding of the sequence n1, n2, n3, ...
• To decode, repeatedly divide by the prime numbers in sequence: Divide by two until no factor of two is left. The number of times you could divide by 2 gives you the first number (n1). Then do the same thing with 3, 5, ...
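A minimal sketch of this prime-exponent coding in Python (the `primes` helper and the `count` argument of the decoder are my additions; the decoder has to know how many numbers were stored, since trailing zero exponents leave no trace in θ):

```python
def primes(count):
    """First `count` primes by trial division (fine for small counts)."""
    found = []
    n = 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found

def godel_encode(numbers):
    """theta = 2^n1 * 3^n2 * 5^n3 * ... -- keeps order and allows repeats."""
    theta = 1
    for p, n in zip(primes(len(numbers)), numbers):
        theta *= p ** n
    return theta

def godel_decode(theta, count):
    """Divide out each prime in turn; the division count is the stored number."""
    result = []
    for p in primes(count):
        n = 0
        while theta % p == 0:
            theta //= p
            n += 1
        result.append(n)
    return result

print(godel_decode(godel_encode([3, 5, 11, 4]), 4))   # [3, 5, 11, 4]
```

The price is that θ grows very quickly, which is why this is more a theoretical construction than a practical compressor.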

Svein, this is a theoretical way (and you forgot the assumption of finite nonzero numbers).
Practical ways are just data compressors.
Assuming the numbers are independent values from some probability distribution, an entropy coder is what allows you to approach the minimal average number of bits/value: the Shannon entropy of the probability distribution. It uses on average lg(1/p) bits to encode a symbol of probability p.

If you want to encode a sequence of [0 ... n-1] values into a single natural number, you can use the base-n numeral system: for example, iterate
x -> n*x + s
where s is the current [0 ... n-1] symbol. This way the final x contains the information from all symbols.
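A short sketch of this base-n iteration in Python (the decoder takes the symbol count as an argument, since leading zeros are otherwise lost):

```python
def encode_base_n(symbols, n):
    """Fold symbols in [0, n-1] into one natural number via x -> n*x + s."""
    x = 0
    for s in symbols:
        x = n * x + s
    return x

def decode_base_n(x, n, count):
    """Invert the fold: repeatedly take x mod n; the symbols come out
    in reverse order, so flip them at the end."""
    symbols = []
    for _ in range(count):
        symbols.append(x % n)
        x //= n
    return symbols[::-1]

print(encode_base_n([3, 5, 11, 4], 12))        # 6040 (base 12, as values < 12)
print(decode_base_n(6040, 12, 4))              # [3, 5, 11, 4]
```

Unlike the bitmask trick, this keeps both the order and any repeats.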

However, this assumes uniform probability distribution among these n symbols - otherwise it is not optimal.
It can be optimized/asymmetrized for other general probability distribution by using asymmetric numeral systems - still encoding into a single natural number (or more practically: into a stream of bits), but approaching the Shannon entropy for a general probability distribution.
As a number x contains ~lg(x) bits of information and a symbol of probability p contains lg(1/p) bits of information, the approximate rule for adding this information to the information already stored in x is
x -> ~ x/p
We had p = 1/n in the standard numeral system.

Entropy coders are limited to a finite alphabet, but still allow encoding as large a number as we want - for example with the unary code: write as many 1s as the number, then write a 0.
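As a tiny illustration of the unary code (a hypothetical helper, using strings of '0'/'1' for readability):

```python
def unary_encode(numbers):
    """Each number n becomes n ones followed by a terminating zero."""
    return ''.join('1' * n + '0' for n in numbers)

def unary_decode(bits):
    """Split on the zero terminators and count the ones in each run."""
    return [len(run) for run in bits.split('0')[:-1]]

print(unary_encode([3, 0, 2]))      # '11100110'
print(unary_decode('11100110'))     # [3, 0, 2]
```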

Svein
Svein, this is a theoretical way (and you forgot the assumption of finite nonzero numbers).
Finite numbers, yes. Non-zero, no (as 7^0 = 1, you get no factor of 7 in the composite number). And yes - I know about statistical encoding, but I sort of followed up the direction given in the OP.

Oh, I meant a finite amount of nonzero numbers - otherwise the representing number would be infinite.

Now, would there be a way to do something similar (storing an array of numbers in one number) and include the order of the array and allow repetition, and maybe allow non-whole numbers?
So we can do it optimally using an entropy coder - e.g. asymmetric numeral systems allow encoding them into a single natural number in a nearly optimal way (asymptotically the Shannon entropy).
If the order is not important, indeed (bucket?) sorting is a good choice.
Then you can go through successive values and use entropy coder for the numbers of repetitions (including zero) - assuming some expected probability distribution of the numbers of repetitions.
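A sketch of just this counting step in Python (the entropy coding of the counts is omitted; `max_value` is an assumed known bound on the values):

```python
from collections import Counter

def repetition_counts(values, max_value):
    """For each value 0..max_value, the number of times it appears
    (including zero). These counts are what an entropy coder would then
    encode against an assumed distribution of repetition counts."""
    c = Counter(values)
    return [c[v] for v in range(max_value + 1)]

print(repetition_counts([3, 5, 5, 11, 4], 11))
# [0, 0, 0, 1, 1, 2, 0, 0, 0, 0, 0, 1]
```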

The one-to-one mapping (used in Gödel's theorem)

• Use a sequence of prime numbers: 2, 3, 5, 7, 11, ... as bases
• Assume you want to store the numbers n1, n2, n3, ...
• Then create θ = 2^n1 ⋅ 3^n2 ⋅ 5^n3 ⋅ ...
• Now θ is a unique coding of the sequence n1, n2, n3, ...
• To decode, repeatedly divide by the prime numbers in sequence: Divide by two until no factor of two is left. The number of times you could divide by 2 gives you the first number (n1). Then do the same thing with 3, 5, ...
sounds like a pretty good idea, although there must be a pre-set list of prime numbers, and the maximum number of primes determines the maximum number of terms - still better than what I have, though.

Like the unary coding, the Gödel representation is optimal for some probability distribution - let's find it.
A number x contains lg(x) bits of information (usually with some floor, but it works very well without it too).
n_i appearances of symbol i (encoded with the i-th prime p_i) multiply x by (p_i)^(n_i), so they correspond to n_i * lg(p_i) bits of information.

An event of probability P corresponds to lg(1/P) bits of information, so n_i appearances of i correspond to probability (p_i)^(-n_i).
This coding is optimal for that probability distribution, and so not optimal for different ones - where the optimum can be easily approached by an entropy coder.
Using a coding optimal for probability distribution Q_i to encode probability distribution P_i costs the Kullback-Leibler divergence sum_i P_i lg(P_i / Q_i) bits/symbol more than the optimum.

I really appreciate the effort you put into your response. Unfortunately, your level of understanding is far too advanced for me to conceptualize; your explanation rests on a lot of background I don't have, so much so that I don't believe I can catch up with you without intensive studying. I am just an amateur programmer - maybe even that is too much. I would understand it if you could explain the step-by-step process through which you encode and decode the array.

Ok, let me be more clear. Here is an asymmetric binary system - for a binary alphabet, optimized for frequency p = Pr(y=1).

Encoding step: absorb a binary symbol y of probability p = Pr(y=1) into the number (state) x:

if y = 0 then x := ceil((x+1)/(1-p)) - 1
if y = 1 then x := floor(x/p)

Decoding step - it takes such a larger x, decodes the symbol y, and returns to the smaller x:

y = ceil((x+1)*p) - ceil(x*p) // in other words: 0 if fract(x*p) < 1-p, else 1
if y = 0 then x := x - ceil(x*p)
if y = 1 then x := ceil(x*p)

One can check that they are inverse of each other.
Now, to encode a combination, let's say 01000110000010 with four 1s and ten 0s (k=4, n=14), you start with x=1 and apply the encoding step to the successive values 0, 1, 0, ..., with p being the frequency (probability) of 1s: p = 4/14 here.
Then to decode from the final large x you use decoding steps for the same p, getting the symbols in reverse order and finally returning to x=1.
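The encoding and decoding steps above can be checked with a direct transcription into Python. This is a sketch: `Fraction` is used so every step is exact (floating-point rounding could flip a ceil/floor), and decoding runs for exactly as many steps as were encoded:

```python
from fractions import Fraction
from math import ceil, floor

def abs_encode(x, y, p):
    """One uABS encoding step: absorb bit y into state x (p = Pr(y = 1))."""
    if y == 0:
        return ceil((x + 1) / (1 - p)) - 1
    return floor(x / p)

def abs_decode(x, p):
    """One uABS decoding step: recover the last bit y and the previous x."""
    y = ceil((x + 1) * p) - ceil(x * p)
    if y == 0:
        return 0, x - ceil(x * p)
    return 1, ceil(x * p)

bits = [0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0]   # the combination above
p = Fraction(4, 14)        # frequency of 1s; Fraction keeps every step exact

x = 1
for y in bits:             # encode the whole sequence into one number x
    x = abs_encode(x, y, p)

decoded = []
for _ in bits:             # decoding yields the bits back in reverse order
    y, x = abs_decode(x, p)
    decoded.append(y)
decoded.reverse()

print(decoded == bits, x == 1)   # True True
```

Each decoding step exactly inverts the matching encoding step, so after decoding all fourteen symbols the state returns to the initial x = 1.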
This way you asymptotically (for large n) need the Shannon entropy h(p) = -p lg(p) - (1-p) lg(1-p) bits/symbol, which is the minimum required, as
binomial(n, k) ~ 2^(n*h(k/n)),
which you can derive from the Stirling formula n! ~ (n/e)^n.

For p = 1/2 it is nearly what you have written (the standard binary numeral system) - optimal when 1s occupy approximately half of the positions.
encoding step: x = 2x + y
decoding step: y = mod(x,2); x = floor(x/2)

Generally, you can use it by dividing your information into binary choices, estimating a probability for each of them, and using the above encoding steps with the assumed probabilities.

Huffman coding is optimized for probabilities that are natural powers of 1/2.
Approximating probabilities means a suboptimal compression rate - we use Kullback-Leibler divergence more bits/symbol than required.
It is really terrible for so-called skewed distributions, with one dominating probability, e.g. p = 0.99.
While Huffman has to use 1 bit for such a symbol, its real informational content is lg(1/p) bits, which can be close to zero.
Asymmetric numeral systems allow using nearly accurate probabilities at comparable speed (large alphabet, no multiplication), e.g. https://github.com/Cyan4973/FiniteStateEntropy, currently used for example in all iPhones and Macs (the default Apple compressor: LZFSE = Lempel-Ziv + Finite State Entropy).