
Unique identifier problem

  1. Jul 31, 2017 #1
    Hi, say I have billions of things that I would like to individually name (give a unique identifying number). What is the best way to go about that? I would like to minimise the amount of data needed to uniquely identify each object, as it then takes less memory to store and transmit. So far I have been thinking along the lines of concatenating the date and time of creation. This is fine, but if things are made concurrently it would not work. Any ideas?
  3. Jul 31, 2017 #2



    Staff: Mentor

    Just use numbers from 1 to N?

    More context would help.
  4. Jul 31, 2017 #3

    Filip Larsen

    Gold Member

    If you have a central store that knows the IDs already issued (or at least how many), then, as mfb says, it is hard to do better than simply incrementing a counter. With a 4-byte (32-bit) counter you can issue up to about 4 billion unique IDs; using 8 bytes (64 bits) this increases to about 18 × 10^18.
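    The central-counter scheme above can be sketched in a few lines of Python. The class and method names here are illustrative, not from the thread; this assumes a single process holds the counter.

    ```python
    import itertools

    # Minimal sketch of central ID issuance: a single incrementing counter.
    # This is as compact as IDs get, since N objects need at least
    # ceil(log2(N)) bits to distinguish.
    class IdRegistry:
        def __init__(self):
            self._counter = itertools.count(1)  # first ID issued is 1

        def new_id(self) -> int:
            return next(self._counter)

    registry = IdRegistry()
    a = registry.new_id()  # -> 1
    b = registry.new_id()  # -> 2
    # A 32-bit counter covers ~4.29 billion IDs; 64 bits covers ~1.8e19.
    print(a, b)
    ```

    The catch, as noted above, is that every issuer must share (or partition) this one counter, which is exactly what concurrent or distributed creation makes hard.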

    However, if by concurrent you mean that you will have distributed registration of IDs (across computers), then other schemes like GUIDs, which use 16 bytes, may be more appropriate. While these are a practical solution for use by computer systems, you should be able to extract the "mathematical" methodology behind them if that is what you want.
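    As a hedged illustration of the GUID approach mentioned above: Python's standard `uuid` module generates 16-byte RFC 4122 UUIDs, so independent machines can mint IDs concurrently with negligible collision probability and no central coordination.

    ```python
    import uuid

    # A version-4 UUID is 128 random bits (16 bytes); no registry needed.
    u = uuid.uuid4()
    print(u)             # e.g. '3f1c9b2e-....' (value varies per run)
    print(len(u.bytes))  # 16 bytes -- four times the 32-bit counter above
    ```

    The trade-off is size: you pay 16 bytes per ID instead of 4 or 8, in exchange for not needing a central store.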

    Since you are asking about storing and transmitting IDs in a forum for mathematical set theory, it is really not clear what you are after. More context, as mfb said, would definitely increase the chances that others can give you relevant help.
  5. Aug 1, 2017 #4


    Staff: Mentor

    Some of this discussion of IDs came up in the early years of computing, where we started with Baudot code (5 bits), then ASCII (7 bits), then EBCDIC (8 bits) and extended ASCII (8 bits), then wide characters and Unicode (16 bits). The complaint was often that too much storage was needed to save documents written in language X. At the time the byte was the popular unit of storage, so 8-bit extended ASCII was a natural encoding to cover the European alphabets (in practice there were many coding standards, each dictating the ordering of characters for its locale).

    The notion of indexing comes into play when you assign a number to a thing, like a letter in an alphabet. Chinese, Japanese and Korean challenged the dominance of ASCII since they needed 2 bytes per character. This means a document mixing Chinese and English must either use 2-byte characters throughout in memory, or use a variable-length encoding like UTF-8 (1 byte for ASCII; 2, 3 or more bytes for other scripts, with nulls and certain control bytes disallowed) when saving to a file. Unicode further challenged the 2-byte notion once all the alphabets of the world were added into the mix.
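    The variable-length behaviour described above is easy to see directly. In UTF-8, ASCII characters encode to 1 byte while CJK characters encode to 3 bytes each:

    ```python
    # ASCII stays 1 byte in UTF-8; a CJK character takes 3 bytes.
    print(len("A".encode("utf-8")))   # 1
    print(len("中".encode("utf-8")))  # 3

    # A mixed string: 5 characters, but 9 bytes on disk.
    mixed = "ID-中文"
    print(len(mixed), len(mixed.encode("utf-8")))  # 5 9
    ```

    This is the same size-versus-coverage trade-off the post describes: a fixed-width encoding wastes space on ASCII-heavy text, while a variable-width one complicates indexing by character position.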

    The point here is that the byte size of your ID determines how long it will be useful before it "overflows", i.e. runs out of number space. We have seen this happen with second, millisecond and microsecond timestamps of various widths: 32-bit second counters run out in 2038 (the Y2038 problem), and the related Y2K event came from storing years in too few digits. You'll need to consider the maximum number of concurrent events that can occur in order to choose an ID size that won't overflow too soon.

    seconds + up to 2, 4, 8, ..., 256 concurrent events
    milliseconds + up to 2, 4, 8, ..., 256 concurrent events
    microseconds + up to 2, 4, 8, ..., 256 concurrent events
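    The timestamp-plus-counter schemes listed above can be sketched as follows. This is an illustrative sketch, not a production design: a millisecond timestamp is concatenated with an 8-bit sequence number, allowing up to 256 IDs per millisecond, and all names are made up for the example.

    ```python
    import threading
    import time

    _lock = threading.Lock()
    _last_ms = 0
    _seq = 0

    def new_id() -> int:
        """Return (millisecond timestamp << 8) | 8-bit sequence number."""
        global _last_ms, _seq
        with _lock:
            ms = int(time.time() * 1000)
            if ms == _last_ms:
                _seq += 1          # another event in the same millisecond
            else:
                _last_ms, _seq = ms, 0
            # Low 8 bits hold the sequence: up to 256 IDs per millisecond.
            return (ms << 8) | (_seq & 0xFF)

    ids = {new_id() for _ in range(100)}
    print(len(ids))  # 100 distinct IDs
    ```

    Note the limitation the thread's list implies: if more than 256 events land in one millisecond, the 8-bit counter wraps and IDs collide, so the counter width must be sized to the worst-case concurrency.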