
Both signed and unsigned int storing signs?

  1. Sep 5, 2009 #1
    In this code -

    Code (Text):

    #include <stdio.h>

    int main(void)
    {
        signed int a = -9485;
        unsigned int b = -9485;
        printf("%d", a);
        printf("%d", b);
        return 0;
    }
     
    I get the output as -
    Code (Text):

    -9485-9485
     
    And I was wondering about this, since I expected that b wouldn't store the negative sign.
     
  3. Sep 5, 2009 #2
    It's not storing the sign in an unsigned integer.

    "%d" is the format specifier for a signed integer, so whatever argument you give it will be interpreted as a signed integer when it is printed. "%u" is the format specifier for an unsigned integer.

    If you are using C++ then you can print out the value without converting it to a different type by using std::cout.
     
  4. Sep 5, 2009 #3
    I applied %u on b and this is what I get -

    -9485
    4294957811

    This is the new code -
    Code (Text):

    #include <stdio.h>

    int main(void)
    {
        signed int a = -9485;
        unsigned int b = -9485;
        printf("%d \n", a);
        printf("%u", b);
        return 0;
    }
     
     
  5. Sep 5, 2009 #4
    Actually, it's more complex than that. Consider the following:

    Code (Text):
    #include <iostream>
    using namespace std;

    int main()
    {
        int a = -9485;
        unsigned int b = -9485;

        cout << "a = " << a << endl;
        cout << "b = " << b << endl;

        return 0;
    }
     
    Your compiler should hit you with a warning (but not typically an error!) when you attempt to compile this because of the implicit conversion in this line:

    Code (Text):
    unsigned int b = -9485;
    Ordinarily, there's no difficulty with implicit conversions between signed and unsigned ints if the signed int is [itex]\ge[/itex] zero. Converting a negative int to an unsigned int is actually well-defined by the standard -- the value is reduced modulo 2^N, where N is the number of bits in the unsigned type -- but the result is almost never what a naive reading of the code suggests, which is why the compiler warns about it. (It's the reverse direction, converting an out-of-range unsigned value to a signed type, that is implementation-defined.)

    To understand what's really going on in the C case, look at what the C99 spec (section 6.3.1.3) has to say on the subject:

        Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

    For what it's worth, C++ has much, much stricter type enforcement than C. GCC will happily compile something like

    Code (Text):
    unsigned int b = -9485;
    without giving you any warning that it's dangerous unless you pass the '-Wconversion' flag at compile time. Implicit type conversions like this can be a nightmare in C because compilers will often just let you hunt for the bug on your own. Note also that you can lead yourself to a whole world of pain by casting the conversion explicitly; something like

    Code (Text):

    int a = -9485;
    unsigned int b = (unsigned int)(a);
    will quite happily compile in C without giving you any warnings at all even if you pass the '-Wconversion' flag to the compiler. The reason is that the compiler will believe you know what you're doing if you've gone to the trouble of explicitly casting from signed to unsigned int. (Something similar happens in C++ if you go to the trouble of static_cast-ing from signed to unsigned.)
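    To put numbers on it, here is a quick check of what that cast actually produces (a sketch assuming a 32-bit unsigned int, where -9485 wraps to 2^32 - 9485):

    ```c
    #include <assert.h>
    #include <stdio.h>

    int main(void)
    {
        int a = -9485;
        unsigned int b = (unsigned int)a;  /* compiles silently, as described */

        printf("%u\n", b);
        /* With a 32-bit unsigned int this is 4294967296 - 9485 = 4294957811,
           matching the %u output shown earlier in the thread. */
        if (sizeof(unsigned int) == 4)
            assert(b == 4294957811u);
        return 0;
    }
    ```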
     
  6. Sep 5, 2009 #5
    Oook... so that's what made the output chaotic.

    Yep it did!


    So, as I thought before, the sign won't simply be removed... so we can't use an unsigned datatype as a shortcut to convert a negative number into a positive one.

    The only advantage we have with an unsigned type is that it has a larger range of positive values.
     
  7. Sep 5, 2009 #6
    There are two advantages to using an unsigned type. First, you can represent numbers that are twice as large. Second, you no longer have to worry about handling negative values. For example, if you want to check that an integer coordinate is within some range [0, width): if the coordinate is signed you need to check (val >= 0 && val < width), but if it's unsigned you can just check (val < width). Of course, there are disadvantages to using unsigned types as well, such as making your code more error-prone and making loops that count downwards trickier.
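    The coordinate check can be sketched like this (the helper names are made up for illustration):

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Signed coordinate: two comparisons are needed. */
    static bool in_range_signed(int val, int width)
    {
        return val >= 0 && val < width;
    }

    /* Unsigned coordinate: a negative input wraps to a huge value,
       so a single comparison rejects it as well. */
    static bool in_range_unsigned(unsigned int val, unsigned int width)
    {
        return val < width;
    }

    int main(void)
    {
        assert(in_range_signed(5, 10));
        assert(!in_range_signed(-3, 10));
        /* -3 converted to unsigned becomes a huge number,
           so the single test still rejects it. */
        assert(!in_range_unsigned((unsigned int)-3, 10u));
        return 0;
    }
    ```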

    In C++ the conversion just reinterprets the bits. Imagine you have 4 bits and you want to store a signed integer. One bit position is given over to the sign, leaving only 3 bits for the magnitude, so in two's complement your number must be in the range -8 to 7. With an unsigned int, all 4 bits store the number, so it can range from 0 to 15. If you reinterpret the unsigned value 15 (bit pattern 1111) as signed, the top bit contributes -8 instead of +8, giving a result of -8 + 7 = -1.
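    You can check the 4-bit picture directly by masking; this sketch assumes the usual two's-complement rule, where the top of the 4 bits carries weight -8:

    ```c
    #include <assert.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int u = 15u;   /* 4-bit pattern 1111 */

        /* Reinterpret the 4-bit pattern as two's complement:
           the top bit has weight -8, the lower three +4, +2, +1. */
        int s = (u & 0x8u) ? (int)u - 16 : (int)u;

        printf("%u reinterpreted as a signed 4-bit value is %d\n", u, s);
        assert(s == -1);        /* -8 + 4 + 2 + 1 = -1 */
        return 0;
    }
    ```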
     
  8. Sep 5, 2009 #7

    Hurkyl


    Actually, that's not strictly true -- the C standard allows a small variety of possibilities. One's complement, for example.

    As a general rule, I would advise against writing code that relies on bit patterns of signed numbers or on the behavior of signed overflow or other similar things.

    (A brief internet search suggests that converting an unsigned value that is too big to a signed type is actually implementation-defined behavior -- and so is unreliable across platforms, except for the fact that most desktop computers implement integer arithmetic the same way and C implementations cater to that)
     
    Last edited: Sep 5, 2009
  9. Sep 5, 2009 #8

    D H


    The C99 list of undefined behaviors does contain "conversion to or from an integer type produces a value outside the range that can be represented" -- but that item is about conversions between floating and integer types (6.3.1.4). Plain signed-to-unsigned conversion is defined to wrap modulo 2^N, while converting an out-of-range value to a signed type is implementation-defined (6.3.1.3). Even so, relying on the murkier corners is asking for trouble. Long ago, I used to use

    unsigned int some_variable_name = -1;

    as an easy way to set some_variable_name to the largest possible unsigned integer. This worked on many different computers and with many different compilers -- until someone ported my code to some strange machine. My lesson learned: never invoke undefined or implementation-defined behavior, intentionally or unintentionally. Doing so not only opens the doorway to Murphy's law, it opens the doorway and begs Murphy to please come in.
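    For comparison, here are the common spellings of the largest unsigned int side by side (a sketch; UINT_MAX from <limits.h> states the intent most plainly):

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int a = -1;        /* wraps to UINT_MAX under the modulo rule */
        unsigned int b = 0u - 1u;   /* unsigned arithmetic also wraps */
        unsigned int c = UINT_MAX;  /* the explicit, self-documenting spelling */

        assert(a == c && b == c);
        printf("UINT_MAX = %u\n", c);
        return 0;
    }
    ```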

    Regarding
    unsigned int foo; ... printf ("%d\n", foo);

    printf does not know the types of what was passed to it, as it takes a variable number of arguments; it uses the format string to interpret what it receives. For example,

    int foo; double bar; ... printf ("foo=%f, bar=%d\n", foo, bar);

    will compile and will yield some rather inscrutable output.
     
  10. Sep 5, 2009 #9

    Hurkyl


    For the record, 0u-1u would be a portable way to do that. I've nagged myself into doing that somewhat pedantically, and I feel a little better knowing it really is worthwhile!
     
  11. Sep 5, 2009 #10
    We are now talking about two different languages. Anyway, it can be a measurable reduction in computation time to use unsigned wraparound in this manner -- which is why I use it. When you have to wait around for hours for a job to complete, it starts to matter.

    Note that integer conversion between signed and unsigned types is not undefined behavior in general.

    Section 4.7:

        If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type).

    Also in C++, you can always use a reinterpret_cast.
     
  12. Sep 5, 2009 #11
    You can, but using reinterpret_cast for anything - particularly something as dumb as casting where the result of the cast is explicitly undefined in the standard - is liable to get you fired more quickly than punching the boss.

    If you really insist on using casts, the superior type-safety of C++ coupled with boost::*cast is the only sane way to go, in my opinion.
     
  13. Sep 5, 2009 #12
    Width?... the width of the data type?... how exactly do we get this? Looks pretty useful.

    After a warning (in gcc)... I tried that.

    If foo was 5, it would be printed as 4.999999998... or something like that.
     
  14. Sep 5, 2009 #13
    But still the main question sort of persists...

    "unsigned int b = -9485;"

    Why was this able to store the number -9485 (when printed with %d using printf), when I would normally expect some chaotic value... or actually the maximum possible value stored in b?

    I bet this has to do with the bit pattern, which I don't know much about... but I gather it's how a data type's values are represented in memory.
     
  15. Sep 6, 2009 #14

    D H


    Also in our forum we have icons: :surprised

    Many places have programming standards that explicitly forbid reinterpret_cast. OK, so just do some_type x = *(some_type*)(&y). Problem solved.
    That won't work if another programming standard is to compile with no warnings -- and with -Wold-style-cast (or its equivalent) enabled.


    sizeof()


    No, it would print as something completely bizarre. printf is a variable arguments function. This means arguments are passed to it in the most primitive form, the way C worked thirty years ago. The only things that were passed between functions in the original implementation of C were ints, doubles, and pointers. Printf interprets what was put on the stack according to the format list. If the format doesn't match up with what is actually placed on the stack things can get quite bizarre. Example:
    Code (Text):

    #include <stdio.h>

    int main ()
    {
       printf ("%g\n%d\n", 5, 1.0/3.0);
       return 0;
    }
     
    Gives me
    1.11391e-313
    1431655765

    That assignment worked because of some rather arcane rules of C, and because the compiler was being "nice" to you. -9485 as an int has a certain bit pattern. So, just reinterpret that bit pattern as an unsigned int. This reinterpreted value will not be -9485.

    Why printf printed -9485 with format "%d"? When you call any function all that is put on the stack is a value of some sort, a bunch of bits. In most cases, the compiler knows what should be put on the stack because of the function prototype. For example, sqrt expects a double as an argument. If you call sqrt(4), the compiler kindly converts that 4 (an int) into 4.0 (a double). printf is a special kind of function. It's prototype is int printf(char*,...). Those ellipses mean it is a variadic function. The compiler doesn't do any conversion. It just puts the arguments on the stack in their native type. The format string tells printf how to interpret the things on the stack. printf works great if the things you put on the stack agree with the types implied by the format string. It doesn't work so great if the format string and the arguments don't quite match.
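    The mechanism shows up in miniature with any variadic function; sum_ints below is a made-up example, not anything from the standard library:

    ```c
    #include <assert.h>
    #include <stdarg.h>

    /* A tiny variadic function: reads 'count' ints off the argument list.
       Like printf, it has no way to verify that the caller really passed
       ints -- it simply trusts its "format" (here, the count). */
    static int sum_ints(int count, ...)
    {
        va_list ap;
        int total = 0;

        va_start(ap, count);
        for (int i = 0; i < count; ++i)
            total += va_arg(ap, int);  /* wrong type here = garbage,
                                          just like %d on a double */
        va_end(ap);
        return total;
    }

    int main(void)
    {
        assert(sum_ints(3, 1, 2, 3) == 6);
        assert(sum_ints(0) == 0);
        return 0;
    }
    ```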
     
  16. Sep 6, 2009 #15
    It appears gcc is generally nice to human beings.

    So I presume the bit pattern is stored in memory in a certain format... e.g. the first bit of an int is taken as the sign.

    I sort of didn't get the behavior of printf... can you please explain the behavior directly, without going into the actual printf function?

    From what I understood, the main question is still a mystery to me.
     
  17. Sep 6, 2009 #16

    D H


    I'll try again, with sqrt and printf. You can call sqrt(4), rather than sqrt(4.0), because the compiler knows that the argument to sqrt has to be a double. It knows this because you typed #include <math.h> in your source code, and that file in turn has the statement double sqrt(double);. Because the argument to sqrt has to be a double, the compiler inserts code to convert that 4 (an int) into a double (4.0) before calling sqrt.

    What about printf? Suppose i and x are ints and doubles, respectively. This is perfectly valid code, and it does just what you expect:
    printf ("i=%d\n", i);
    printf ("x=%g\n", x);

    The above wouldn't work if the compiler converted arguments the way it does with sqrt. The compiler doesn't do any such conversion because the prototype for printf is int printf(char *, ...);. The first argument is the format string. It has to be a string; the prototype says so. Those ellipses ("...") tell the compiler that the first argument is the only thing it needs to worry about. printf can take anywhere from zero extra arguments (printf("Hello world");) to dozens, maybe hundreds of arguments (there's some machine limit) after the format string -- and the compiler had better not do any conversion at all. printf is a special kind of function.
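    The two calling conventions side by side (just a sketch):

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Prototyped function: the compiler converts the int 4
           to the double 4.0 before the call. */
        double r = sqrt(4);
        assert(r == 2.0);

        /* Variadic function: nothing after the format string is
           converted, so the caller must spell out the types correctly. */
        printf("%d %g\n", 4, 2.0);
        return 0;
    }
    ```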
     
  18. Sep 6, 2009 #17
    So if I ask printf to print a float with %d, it will assume the bit pattern of float to be that of an integer.

    Since printf does not ask the compiler to convert the type, it does not, as a result we get chaos.

    So for e.g -

    Code (Text):
    short t = -1;
    The first bit of the short type will contain information about the sign.

    Here also -

    Code (Text):
    unsigned short k = -1;
    The first bit will hold the sign; since %u does not expect a sign bit, it will misinterpret that bit as part of the number, so we'll get awkward output... right?

    If we print k as an integer, the first bit is taken as the sign and so we get the right output.

    Is that right?
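    A quick experiment to check this (assuming two's complement; USHRT_MAX is 65535 when shorts are 16 bits):

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned short k = -1;   /* wraps around to USHRT_MAX */

        /* In a varargs call k is promoted to int before printf sees it,
           so the value printed is already non-negative -- there is no
           "sign bit" left for %u to misread. */
        printf("%u\n", (unsigned int)k);
        assert(k == USHRT_MAX);
        return 0;
    }
    ```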
     