# Number compression for prime number storage

therealbnut — Joined: 17 Apr 2013

I realise this is a fairly old post, but I stumbled on this looking for something similar and thought you might be interested in my findings:

Assuming you want really large prime numbers, *and* you want random access to them:

I would store every b-th prime in full: the number of "digits" base 256, the number of bytes required to store that number, and a pointer to those bytes (arbitrary-precision arithmetic style). These anchors go into an array of bins, one per ⌈n/b⌉ primes. Then each (i·b + k)-th prime (0 ≤ k < b) can be stored compactly and linked from the i-th bin.
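A minimal sketch of that bin layout in Python (the `(anchor, gaps)` list-of-tuples representation and the names are my own; the post itself implies a packed byte encoding):

```python
# Minimal sketch of the binning scheme: every b-th prime is kept in full
# as a bin "anchor"; the primes that follow it inside the bin are kept
# only as gaps from their predecessor.

def build_bins(primes, b):
    """Split a sorted list of primes into bins of b primes each."""
    bins = []
    for i in range(0, len(primes), b):
        chunk = primes[i:i + b]
        anchor = chunk[0]
        gaps = [q - p for p, q in zip(chunk, chunk[1:])]
        bins.append((anchor, gaps))
    return bins

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
bins = build_bins(primes, 4)
print(bins)
```

For n primes this stores n/b full anchors plus n − n/b small gaps, which is where the space saving comes from.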

Each of these k values can be stored as an arbitrary-precision integer, a delta-encoded prime gap, or a prime gap with entropy encoding. The entropy coding would be based on the probabilities of the difference between the actual gap and a log(n) approximation of it. If the last method is used, the encoding should be at least as good as the log(n) approximation, and in theory much smaller. These probabilities can likely be approximated well using prime-gap lower and upper bounds (see Wikipedia's "Prime gap" article for good bounds) together with the log(n) prime-gap approximation.
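To illustrate the residual idea, a quick sketch; using `round(log(p))` as the predicted gap is my own stand-in for "a log(n) approximation", not something the post specifies:

```python
import math

# Instead of storing the raw gap q - p, store the residual against a
# log-based prediction of the gap; residuals cluster near zero, so they
# should entropy-code well (e.g. with a Huffman or arithmetic coder).

def gap_residuals(primes):
    return [(q - p) - round(math.log(p)) for p, q in zip(primes, primes[1:])]

print(gap_residuals([2, 3, 5, 7, 11, 13]))
```

The residuals stay small even as the primes (and raw gaps) grow, which is what makes the entropy-coded variant win over storing gaps directly.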

The nth prime number can be found by computing (in constant time) the bin index i = ⌊n/b⌋ corresponding to that n (one division), then decoding each delta and adding it to the prime stored in the bin until i·b + k = n. Overall I think this should be O(b), so b shouldn't be too large, but it is a constant.
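Sketched in Python, using a simple `(anchor, gaps)` pair per bin (a representation of my own; n is 0-indexed here):

```python
# Sketch of the n-th prime lookup: one division picks the bin, then at
# most b - 1 gap additions walk forward to the target prime.

def nth_prime(bins, b, n):
    i, k = divmod(n, b)      # bin index and offset within the bin
    p, gaps = bins[i]        # start from the bin's anchor prime
    for g in gaps[:k]:       # sum k deltas: O(b) worst case
        p += g
    return p

# Bins of b = 4 primes covering 2..29.
bins = [(2, [1, 2, 2]), (11, [2, 4, 2]), (23, [6])]
print(nth_prime(bins, 4, 6))
```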

A lookup to see whether a number P is prime can be done by making upper and lower bound estimates of its index using a prime-counting approximation of your choice, then binary searching within that interval, first on the number of digits, then on the bytes. This should be very fast, probably O(log(log(n))). Once the bin with the last anchor before P is found, you sequentially decode the deltas until you reach a prime past P (P is not prime) or P itself (P is prime). Overall O(b + log(log(n))).
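A simplified sketch of the lookup: rather than the digit-then-byte binary search the post describes, this version binary-searches the bin anchors directly, then decodes gaps sequentially inside one bin (it only answers correctly for P within the table's range):

```python
import bisect

# Membership test: find the bin whose anchor is the last one <= P,
# then walk that bin's gaps until we pass or hit P.

def is_prime_in_table(bins, P):
    anchors = [a for a, _ in bins]
    i = bisect.bisect_right(anchors, P) - 1   # last anchor <= P
    if i < 0:
        return False                          # P is below the table
    p, gaps = bins[i]
    if p == P:
        return True
    for g in gaps:
        p += g
        if p >= P:
            return p == P
    return False

bins = [(2, [1, 2, 2]), (11, [2, 4, 2]), (23, [6])]
print(is_prime_in_table(bins, 17), is_prime_in_table(bins, 15))
```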

The math involved for the probabilities etc. might be time-consuming for large values, but that depends on the approximations chosen, and should be at worst (I guess) around O(log(n) log(log(n))). Overall the storage should be (if I'm right) roughly O((n/b) log(n) + n log(log(n))) for n primes, compared to O(n log(n)) otherwise.