I What is the optimal ID key size for minimizing data usage and avoiding overflow?

  • I
  • Thread starter Thread starter volican
  • Start date Start date
volican
Messages
41
Reaction score
0
Hi, say I have billions of things that I would like to individually name (give a unique identifying number), what is the best way to go about that? I would like to minimise the amount of data that is needed to uniquely identify each object, as it then takes less memory to store and transmit. So far what I have been thinking along the lines of concatenating the date and time of creation. This is fine but if things are made concurrently this would not work. Any ideas?
 
Physics news on Phys.org
Just use numbers from 1 to N?

More context would help.
 
If you have a central store that knows the ID already issued (or how many) then, as mfb says, it is hard to do it more compactly than simply increase a counter. With a 4 byte (32 bit) count you can identify up to 4 billion unique IDs. Using 8 bytes (64 bit) this increases to 18 * 1018.

However, if by concurrent mean you will have distributed registration of ID's (on a computer), then other schemes like GUIDs that uses 16 bytes may be more appropriate. While these are a practical solution for use by computer systems you should be able to extract the "mathematical" methodology behind this if that is what you want.

Since you are talking about storing and transmitting IDs in a forum for mathematical set theory it is really not clear what you are after. More context, as mfb said, would definitely increase the chances for others to give you relevant help.
 
Some of this discussion of IDs came up in the early years of computing where we started with Baudot code (5-bits) then ASCII (7-bit) then EBCDIC (8-bit) and extended ASCII (8-bit) then WideChar and Unicode (16-bit) ... The complaint often was too much storage is needed when we want to save documents written in language X. At the time, the byte was popular as a unit of storage and so 8-bit extended ASCII was a natural encoding (actually there were many coding standards that dictated the sorting of characters in the locale) to cover all the European alphabets.

The notion of indexing comes into play when you assign a number to a thing like a letter in an alphabet. Chinese, Japanese and Korean challenged the dominance of ASCII since they needed 2 bytes to represent a character. This means that a text document must use 2-byte characters to represent Chinese and English (0-byte + 8-bit ASCII) in the same document in memory and use a variable length coding like UTF-8 (1 byte for ASCII and 2, 3 ... bytes for Chinese, nulls and certain control characters not allowed) when saving to a file. Unicode further challenged the 2-byte notion when all the alphabets of the world are added into the mix.

The point here is that the byte size of your ID will determine how long it will be useful before it "overflows" ie runs out of number space. We have seen this happen with seconds, milliseconds and microsecond timestamps of 32-bit, 64-bit and 128-bit size causing the Y2K and related overflow events. You'll need to consider the maximal number of concurrent events that can occur to then determine the best ID key size that won't overflow too soon.

seconds + upto 2, 4, 8 ... 256 events
or
millisecs + upto 2, 4, 8 ... 256 events
or
microseconds + upto 2, 4, 8 ... 256 events
 
Namaste & G'day Postulate: A strongly-knit team wins on average over a less knit one Fundamentals: - Two teams face off with 4 players each - A polo team consists of players that each have assigned to them a measure of their ability (called a "Handicap" - 10 is highest, -2 lowest) I attempted to measure close-knitness of a team in terms of standard deviation (SD) of handicaps of the players. Failure: It turns out that, more often than, a team with a higher SD wins. In my language, that...
Hi all, I've been a roulette player for more than 10 years (although I took time off here and there) and it's only now that I'm trying to understand the physics of the game. Basically my strategy in roulette is to divide the wheel roughly into two halves (let's call them A and B). My theory is that in roulette there will invariably be variance. In other words, if A comes up 5 times in a row, B will be due to come up soon. However I have been proven wrong many times, and I have seen some...
Back
Top