Integer types (short, long, long long, etc) or more

ChrisVer · Dec 14, 2016

I have one pretty basic question... What are the types short, long etc used when creating for example integers used for? Whenever I read the description, it says "it's able to store bigger numbers", but to be honest I don't see any practical use of/visualize such a description...
Why would I care if the output number is 0.000012424242512312421524512512521414242412 or 0.0000124?

S.G. Janssens · Dec 14, 2016

ChrisVer said:

Why would I care if the output number is 0.000012424242512312421524512512521414242412 or 0.0000124?

These are not integers. You would probably represent these as floating point numbers, most likely in double precision. Double precision floats carry about 15 to 17 significant decimal digits. In all but special cases they are an approximation.

By comparison, integers (large or small) are exact representations. They occur, for example, in number theoretical computations where exact representation does matter. It happens that long integers are not even "long" enough. In those cases one sometimes makes use of special "big integer" classes.

phinds · Dec 14, 2016

ChrisVer said:

I have one pretty basic question... What are the types short, long etc used when creating for example integers used for? Whenever I read the description, it says "it's able to store bigger numbers", but to be honest I don't see any practical use of/visualize such a description...
Why would I care if the output number is 0.000012424242512312421524512512521414242412 or 0.0000124?

I think you are missing the point. Your example of it being OK with you that a number shows up as .0000124 actually says that you only care if you get an answer that is accurate to 3 significant digits. Think about where we would be if science were restricted to computations that were only accurate to 3 significant digits.

Mark44 · Dec 14, 2016

ChrisVer said:

I have one pretty basic question... What are the types short, long etc used when creating for example integers used for?

These types are used to store integral values (no fractional part). The different types are used for different ranges of values. If storage is at a premium (as for example, in embedded systems), a programmer might want to use variables of type short or even char, as these typically are stored in two bytes and one byte, respectively. On many systems these days, the int and long types are four bytes. The long long type, if supported, is typically eight bytes.

The numbers in your examples are floating point values, not integer values. For this one, 0.000012424242512312421524512512521414242412, none of the standard floating point types (float, double, long double) would be able to store that number exactly.

ChrisVer · Dec 14, 2016

Yes, the numbers were just typed hastily without any meaning (I wasn't even thinking about integers), but I was just trying to make my point clear that the first (large) number would be "irrelevant" compared to the second (smaller) number. As if I was told that the mass of a planet is 5*10^24 kg or 5.0000424*10^24 kg.

S.G. Janssens · Dec 14, 2016

ChrisVer said:

I was just trying to make my point clear that the first (large) number would be "irrelevant" compared to the second (smaller) number. As if I was told that the mass of a planet is 5*10^24 kg or 5.0000424*10^24 kg.

It really depends on your particular application whether the difference is irrelevant or not. If it is, you can get away with using a less precise representation (like single precision floats). However, in some applications the far decimals in standard form do matter, even if it were just for the reason that they may become more significant in a calculation that consists of multiple steps.

phinds · Dec 14, 2016

ChrisVer said:

Yes, the numbers were just typed hastily without any meaning (I wasn't even thinking about integers), but I was just trying to make my point clear that the first (large) number would be "irrelevant" compared to the second (smaller) number. As if I was told that the mass of a planet is 5*10^24 kg or 5.0000424*10^24 kg.

So now you're down to one significant digit. I realize you are making the point that for some numbers it IS OK to have even just a single digit of accuracy, but I ask you again: where do you think we would be if we could only get a few significant digits in all of our computations? If that were all we ever needed, we might still have 8-bit computers.

Mark44 · Dec 14, 2016

ChrisVer said:

Yes, the numbers were just typed hastily without any meaning (I wasn't even thinking about integers),

But you asked about data types that are integral.

ChrisVer said:

but I was just trying to make my point clear that the first (large) number would be "irrelevant" compared to the second (smaller) number. As if I was told that the mass of a planet is 5*10^24 kg or 5.0000424*10^24 kg.

The first number you gave is only slightly larger than the second one.

Your question seems to be about precision. In your later examples, the mass is given with higher precision in the second example. Depending on what you intend to do with these values, that higher precision in the second example might not be irrelevant. BTW, both of these numbers could be represented using either float or double variables. If floats were used, you would lose some precision in the second number.

jedishrfu · Dec 14, 2016

Another aspect from the computing point of view is that programs that work with small numbers that fit within a short format can use shorts rather than ints or longs to save space in memory and on disk and also save on processing cycles. In the early days of programming, saving space and time was of prime concern. Similarly for ints vs floats (fixed point vs floating point arithmetic) and floats vs doubles (less vs more precision).

The Y2K crisis is an example of early business programmers using too small a format for years and then having to figure out what to do when the year 2000 rolled around and they would not be able to tell the difference between 1900 and 2000 with only two digits saved.

https://en.wikipedia.org/wiki/Year_2000_problem

Another one is the Unix seconds crisis, where a 32-bit int was used to store the seconds since January 1st, 1970, without realizing that a rollover would occur (in January 2038 for a signed 32-bit count) and that satellite systems and other clocks relied on the seconds value. People feared that GPS satellites around the world would fail and that planes using GPS would come crashing down.

https://en.wikipedia.org/wiki/Unix_time

PC systems also had a similar seconds issue in that they based their seconds on January 1st, 1980.

There was also a great debate over character sets and whether 8-bit or 16-bit or 32-bit should be used. Companies in the Americas and Europe didn't have a need for anything beyond 8-bit, didn't want to waste valuable memory, and resisted adoption of Unicode, which helped Asian countries like Japan, China and Korea represent their languages on the computer. UTF-8 encoding came out of that tension, for saving data to file and squeezing out unnecessary 0 bytes. The actual history is a bit more convoluted than I've described here, with countries adopting different encodings for locale-specific character data and maps to translate from one charset to another.

As you can see, the choices we make today can come back to bite us tomorrow, so choose wisely.

newjerseyrunner · Dec 14, 2016

Computers represent data using binary. The size of the structure changes the number of digits that your binary can have in it.

0b10100110 (166) obviously fits inside of 8 bits, so it can be an "unsigned char"
0b110100100 (420) requires 9 bits to be represented, so it can't be. The minimum is a short.

When deciding what type to use, think about the range of values that you need to store. You have no guarantee that this is constant, but on MOST compilers, these are the limits:

-128 to 127 - char
0 to 255 - unsigned char
-32,768 to 32,767 - short
0 to 65,535 - unsigned short
-2,147,483,648 to 2,147,483,647 - int
0 to 4,294,967,295 - unsigned int
-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 - long
0 to 18,446,744,073,709,551,615 - unsigned long

long long is USUALLY the same thing as long. Shorts are rarely used; I only ever use them in bitmasks with 20-year-old legacy systems. If you want something specific, there are usually integer types that are defined as a specific number of bytes (__int64 or int64_t will ALWAYS be 64 bits).

Mark44 · Dec 14, 2016

newjerseyrunner said:

-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 - long
0 to 18,446,744,073,709,551,615 - unsigned long

For my compiler (VS 2015), long is the same size as int.

newjerseyrunner · Dec 14, 2016

Mark44 said:

For my compiler (VS 2015), long is the same size as int.

Not uncommon, MinGW does the same, and the longs are often where I see people get into trouble.

I see this a lot:

Code:

long uniqueId = static_cast<long>(mypointer);

That code might work, it might not. The problem is that if it works, people leave it there, then when they try to port their code, they get really frustrated trying to debug the crash.

Mark44 · Dec 14, 2016

newjerseyrunner said:
I see this a lot:
Code:
long uniqueId = static_cast<long>(mypointer);
That code might work, it might not. The problem is that if it works, people leave it there, then when they try to port their code, they get really frustrated trying to debug the crash.

Yeah, definitely not portable to assume that a pointer is a certain size.

rcgldr · Dec 14, 2016

jedishrfu said:

PC systems also had a similar seconds issue in that they based their seconds on January 1st, 1980.

The original PC didn't have a clock, just a timer that ran at 1.19318 MHz, normally programmed to generate an interrupt every 65536 cycles to provide ~18.206482 ticks per second, or ~54.9255 ms per tick. The AT had a real-time clock where the date was stored and retrieved as year / month / day, and the time was stored and retrieved as hour / minute / second. The real-time clock could also return the day of the week. It also provided a 1 kHz ticker via IRQ 8.

MSDOS (FAT-16 partition) file time from directory entry:

Code:

    offset 0x0016 - bits  0 -  4 = second_of_minute / 2  (2 second increment)
                  - bits  5 - 10 = minutes
                  - bits 11 - 15 = hours
    offset 0x0018 - bits  0 -  4 = day of month
                  - bits  5 -  8 = month
                  - bits  9 - 15 = year - 1980, so good until year 2108

- - - - -

Windows file time contains a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601 (UTC). (Good until well beyond year 30000).

Getting back to the original question, assume you use an integer to index an array. 16 bits is good enough for 65,536 elements, 32 bits is good enough for about 4 billion, but modern systems can have more than 4 GB of RAM, so a larger index would be needed, and the next logical size would be 64 bits. In the case of the IDE / SATA interface for hard drives, the logical block address was increased from 28 bits to 48 bits, while a PC BIOS interface optionally uses 64 bits for the logical block address. Some systems support 128-bit integers.

jedishrfu · Dec 14, 2016

Here's more on the epoch time used by various operating systems:

http://www.computerhope.com/jargon/e/epoch.htm

PC-DOS used January 1, 1980 vs MS Windows which used the earlier January 1, 1601 date.

It's true the first IBM PCs didn't have clock hardware, but the OS did maintain a software-based clock that had to be set at boot time.

I'll stop here lest we hijack the thread from the OP.

jedishrfu · Dec 14, 2016

Mark44 said:

For my compiler (VS 2015), long is the same size as int.

I think the C compiler standard specified that an int can't be any shorter than a short and can't be any longer than a long. This allowed the compiler developer some leeway in porting the compiler to a new architecture to optimize around the natural word size of the hardware.

jack action · Dec 14, 2016

ChrisVer said:

I was just trying to make my point clear that the first (large) number would be "irrelevant" compared to the second (smaller) number. As if I was told that the mass of a planet is 5*10^24 kg or 5.0000424*10^24 kg.

What you are talking about is accuracy vs precision, and you are saying that you are not fussy about precision.

Your needs will depend on what you do with the data. Imagine the following equation:

##y = x - 5 \times 10^{24}##

If you have ##x = 5.0000424 \times 10^{24}## or ##x = 5.0000212 \times 10^{24}##, then ##y = 42.4 \times 10^{18}## or ##y = 21.2 \times 10^{18}##. One answer is twice the other. Such a big error might be relevant in your results.

It all depends on what you have to do with the numbers.

nsaspook · Dec 15, 2016

ChrisVer said:

I have one pretty basic question... What are the types short, long etc used when creating for example integers used for? Whenever I read the description, it says "it's able to store bigger numbers", but to be honest I don't see any practical use of/visualize such a description...
Why would I care if the output number is 0.000012424242512312421524512512521414242412 or 0.0000124?

Integer types in embedded programming, or operating-system kernels:
With languages like C there is often a requirement that it be used to directly interface with hardware with operations on specific memory locations on a series of memory registers. Fixed-width integers of varying byte sized memory locations with defined bit-fields allow us to efficiently access memory-mapped I/O and other structures with the precise field alignment, grouping and bit ordering needed.
https://www.ethz.ch/content/dam/ethz/special-interest/mavt/dynamic-systems-n-control/idsc-dam/Lectures/Embedded-Control-Systems/OtherNotes/Special_Topics_for_Embedded_Programming.pdf

Different types are also sometimes needed for alignment requirements and padding in data structures that are used to communicate information between machines or hardware.
https://en.wikibooks.org/wiki/C_Programming/C_Reference/stdint.h
http://www.catb.org/esr/structure-packing/

jbunniii · Dec 15, 2016

jedishrfu said:

I think the C compiler standard specified that an int can't be any shorter than a short and can't be any longer than a long. This allowed the compiler developer some leeway in porting the compiler to a new architecture to optimize around the natural word size of the hardware.

This is correct. Also, there are at least some minimum size guarantees for short and long. Short is at least 16 bits, long is at least 32 bits, and short <= int <= long. Note, this means that they could even all be the same size, as long as that size is at least 32 bits.

As for why we need to be able to hold longer numbers (addressing the OP's question), 32 bits just isn't that much. The maximum value you can store in 32 bits is just over 4 billion (half that if you need a sign bit). In addition to the obvious fact that computers have more storage than that these days, this limitation can be restrictive even in small embedded applications. For a simple example, consider a 32-bit timer which counts the number of milliseconds since the last time the device was reset. This will roll over after about 4.3 billion milliseconds, which sounds like a lot, but actually it's only about 49.7 days. Many devices aren't reset very often, if at all, so this timer will roll over, and the software that uses the timer needs to accommodate this.

For an example where double precision floating point (64 bits) is borderline not enough, consider a GPS receiver, where a frequently occurring value is a timestamp, which is often represented as the number of seconds (including fractional seconds) since the start of the current week. A week contains 604800 seconds, and GPS timestamps need precision at least down to the nanosecond (one nanosecond is 0.3 meters at the speed of light), so by the end of the week a timestamp may look like 587492.382749382, and every one of those 15 digits is needed. A double precision floating point value can only reliably represent about 15 significant decimal digits, so it's a good thing the speed of light isn't a bit faster.
