Signed and unsigned integer expressions?

AI Thread Summary
The discussion centers on a warning encountered in C++ regarding comparisons between signed and unsigned integer expressions, specifically in a string manipulation program. The warning arises from the comparison of a signed integer index to the unsigned return value of the `size()` method, which can lead to unexpected behavior if the signed integer is negative. It is suggested that using `std::size_t` for the index variable would be a safer approach to avoid potential crashes when the string size exceeds the limits of a signed integer. Additionally, alternative coding practices, such as using iterators or range-based for loops, are recommended for better code safety and readability. Understanding these warnings is crucial for preventing future issues in programming.
Lord Anoobis
So I just completed a simple exercise that converts all letters of a string to upper case letters. The program works but it comes with the warning: comparison between signed and unsigned integer expressions [-Wsign-compare].
What does this mean? While whatever is causing the warning does not seem to have any effect on the operation of the program, I would like to know what is happening here in case it shows up at some point in the future and DOES cause problems. Here's the prog...
Code:
#include <iostream>
#include <string>
using namespace std;

//Changes letters to upper case
void converter(string& sent_par);

int main ()
{
  string sentence;
  cout << "Enter a sentence: ";
  getline (cin, sentence, '\n');
  converter (sentence);
  cout << endl << sentence << endl;

  return 0;
}

void converter (string& sent_par)
{
  for (int i = 0; i <= sent_par.size(); i++)
  {
  sent_par[i] = toupper(sent_par[i]);
  }
}
 
I ran the program in code::blocks and it worked perfectly (I saved it as TXT.cpp).

[Attachment: Capture.PNG]
 
Titan97 said:
I ran the program in code::blocks and it worked perfectly (I saved it as TXT.cpp).
Interesting. Is that Code::blocks 13.12?
 
Yes.
 
Titan97 said:
Yes.
Ghost in the machine of some variety or other then.
 
Should I post a video of it?
 
Titan97 said:
Should I post a video of it?
Nah, not necessary. Thanks for the input.
 
Because of how two's-complement signed numbers are designed, you will probably be OK when comparing signed and unsigned numbers as long as both values lie in the range the two types share (that is, the signed value is non-negative). If they do not, you will get stuff like -1 comparing equal to 4294967295 (for 32 bit numbers).
 
I would guess the warning comes from the expression i <= sent_par.size(), where the two sides of the comparison are signed and unsigned, respectively? If so, the warning is to make you aware that the compiler cannot in general make this comparison without converting the signed int to an unsigned type, which is bound to go wrong when the number is negative. For a general comment on how to handle such comparisons, see [1].

In your case the code will work when sent_par has fewer elements than can be expressed in a signed int, which is around half the theoretical maximum size that the size() method [2] can return (assuming size_t is as wide as int). If it has more than that, your code will probably crash as it tries to overwrite memory outside the sent_par string. A safer approach would be to declare i of type size_t.

[1] http://jwwalker.com/pages/safe-compare.html
[2] http://www.cplusplus.com/reference/string/string/size/
 
  • #10
As @Filip_Larsen noted, the warning results from comparing i to sent_par.size(). The first is a signed int, typically a 32 bit signed integer; the latter is of type std::size_t, typically a 64 bit unsigned integer. The results from comparing a small negative signed integer to a not-so-small unsigned integer can be rather surprising. For example, for any reasonably sized string, -1 < some_string.length() is false.

That said, there are better ways to write your function converter(string&). With modern c++, you should rethink what you are doing whenever you find yourself writing a loop of the form for (int ii = 0; ii < some_limit; ++ii). Some alternative formulations follow.

1. Iterators.
Using iterators instead of indices makes your code much more generic (and also, much more amenable to standard algorithms; see alternative #3). An implementation of your function converter(string&) using iterators:
Code:
void converter (std::string& sent_par)
{
  for (std::string::iterator it = sent_par.begin(); it != sent_par.end(); ++it)
  {
    *it = std::toupper(*it);
  }
}
2. Range-based for loop (C++11 and higher).
This new feature pretty much eliminates the ugliness/verbosity of c++ iterators. An implementation of your function converter(string&) using a range-based for loop:
Code:
void converter (std::string& sent_par)
{
  for (auto& c : sent_par)
  {
    c = std::toupper(c);
  }
}
3. The std::transform function.
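You'll need to use #include <algorithm> to use this function. There's a lot of good stuff in the c++ algorithm library. An implementation of your function converter(string&) using std::transform (the call to std::toupper is wrapped in a lambda because passing std::toupper by name can fail to compile when the <locale> overload is also visible):
Code:
void converter (std::string& sent_par)
{
  //The lambda (C++11) picks the <cctype> toupper and passes it an unsigned char, as it expects
  std::transform (sent_par.begin(), sent_par.end(), sent_par.begin(),
                  [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}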
 
  • #11
It is really good that you followed up on such a compiler warning. A lot of times, those warnings are not important--but they are always there for a reason. Understanding that reason can save your butt, someday.
 
  • #12
harborsparrow said:
It is really good that you followed up on such a compiler warning. A lot of times, those warnings are not important--but they are always there for a reason. Understanding that reason can save your butt, someday.


As you say, sometimes the warnings which show up are in a way spurious. In this case though, while things ran as required, the information provided by Filip Larsen and D H show that the issue is actually rather insidious.

Being a beginner I've never encountered size_t before now. Thanks a lot for this info, folks.
 
Last edited by a moderator:
  • #13
Whoops. Messed that up a bit.
 
  • #14
Lord Anoobis said:
Whoops. Messed that up a bit.
That's okay. I fixed it for you -- or so I think. Let me know if I didn't get it right.

Lord Anoobis said:
Being a beginner I've never encountered size_t before now. Thanks a lot for this info, folks.
You'll run into std::size_t again and again and again. It is pervasive throughout the C++ standard library. Ditto @harborsparrow, it truly is a good thing you followed up on this warning. It is also a good thing that you even saw this warning. That means you have your compilation options set at a reasonably high level.

Note well: I used std::size_t rather than size_t. I'm pedantic, and I never, ever use using namespace std. There would be a good deal of unhappiness (and worse) if I saw that construct (using namespace std) in code that I ask someone to write for me. This construct is very widely considered to be extremely bad style amongst professional c++ programmers. Unfortunately, many introductory c++ texts use this construct everywhere.

Quoting from the Zen of Python, "Namespaces are one honking great idea -- let's do more of those!" What this means is (a) it's a great idea to get in the habit of creating namespaces, and (b) it's a bad idea to use constructs such as using namespace whatever that explicitly subvert the very concept of separate namespaces.
 
  • #15
Makes sense for when programs get more intricate I suppose. On the other hand, it makes sense to get into the habit of such practices early, yes? Bad habits being hard to break and so on. As for the fixing part, close enough. "As you say, sometimes the warnings which show up are in a way spurious. In this case though, while things ran as required, the information provided by Filip Larsen and D H show that the issue is actually rather insidious." That bit should be plain text as well but unless you feel it really should be altered, so be it.
 
  • #16
@Titan97, his code is indeed wrong, check your compiler settings. Compilers default to being fairly flexible, but good code should pass the strictest settings it has.

Forget size_t, that's just going to confuse you. size_t is unsigned long (usually.)

Think of how your data is represented: bits. So here is an 8 bit value
00100101
That's 37 in decimal. How do negative numbers work? I suggest learning that (look up two's complement); here is what -37 looks like:
11011011
The first digit of an 8 bit int is how you know if it's positive or negative. This actually only gives you 7 bits that you can use for storing value, since you need one for the sign. If you know your value is always positive and won't go negative, you can then use that sign bit to store data.

So to summarize, for typical 32 bit ints:
unsigned int - 32 bits of data (0 to 4294967295)
int - 31 bits of data + 1 sign bit (-2147483648 to 2147483647)

That's why the compiler warns about such comparisons and why you should cast explicitly: the two types have different ranges.
 
  • #17
newjerseyrunner said:
@Titan97, his code is indeed wrong, check your compiler settings. Compilers default to being fairly flexible, but good code should pass the strictest settings it has.
There's nothing wrong per se with using a loop of the form for (int i = 0; i < object.size(); i++) for an object of a reasonable size. Problems arise if the object is huge and its size is greater than the maximum value an int can attain. In the code at hand, this problem would arise if the user enters a line that is over two billion characters long. This is rather unlikely.

A bigger problem with comparing signed to unsigned values occurs when the signed value is negative. The signed integer is widened to a signed integer with the same width as the unsigned integer and then converted to an unsigned integer by the standard promotion and conversion rules. Those promotion and conversion rules mean that -1 is equivalent to 18446744073709551615 when compared to a std::size_t value on a modern computer where std::size_t is 64 bits wide. This in turn means that -1 is greater than or equal to any unsigned integer. This is massively confusing and massively counterintuitive. This rather than overflow is the primary reason this warning exists.

Compilers are rather dumb. The compiler warned about i <= object.size() because it has a rule to always warn about comparing signed and unsigned integers. This warning is almost certainly superfluous in this case. However, the compiler did not warn that the code used <= instead of the idiomatic <. Whether some_string[some_string.size()] is valid depends on the compiler and on the version of C++. (Compare with std::vector some_vector. In that case, some_vector[some_vector.size()] most definitely is invoking undefined behavior.) He's a bit lucky that his code worked. It could have erased his hard drive, which is the canonical response to invoking undefined behavior.

Forget size_t, that's just going to confuse you. size_t is unsigned long (usually.)
Not on my computer, and not on several other computers I use. There, std::size_t is unsigned long long.
 
  • #18
D H said:
There's nothing wrong per se with using a loop of the form for (int i = 0; i < object.size(); i++) for an object of a reasonable size. Problems arise if the object is huge and its size is greater than the maximum value an int can attain. In the code at hand, this problem would arise if the user enters a line that is over two billion characters long. This is rather unlikely.


Expanding on that, you have issues with unsigned numbers when you are iterating backwards:

Code:
for(unsigned int i = 0; i < 200; ++i) {}   //Fine
for(unsigned int i = 200; i >= 0; --i) {}  //Infinite loop, i can never be less than zero

D H said:
Not on my computer, and not on several other computers I use. There, std::size_t is unsigned long long.

Is unsigned long long different than unsigned long? I've seen them different on weird machines, but usually not.

The standard specifies this:
unsigned shorts and ints must be at least 16 bits
unsigned longs must be at least 32 bits AND contain all valid pointers
unsigned long longs must be at least 64 bits AND contain all valid pointers

try this program if you want to see what's what
Code:
#include <climits>
#include <cstddef>
#include <iostream>
//CHAR_BIT is the number of bits in a byte (8 on mainstream platforms)
#define PRINT_SIZE(x) std::cout << #x << ": " << sizeof(x) * CHAR_BIT << " bits" << std::endl
int main(){
     PRINT_SIZE(unsigned char);
     PRINT_SIZE(unsigned short);
     PRINT_SIZE(unsigned int);
     PRINT_SIZE(unsigned long);
     PRINT_SIZE(unsigned long long);
     PRINT_SIZE(char);
     PRINT_SIZE(short);
     PRINT_SIZE(int);
     PRINT_SIZE(long);
     PRINT_SIZE(long long);
     PRINT_SIZE(std::size_t);
     PRINT_SIZE(void *);
     return 0;
}

My output on my 64 bit mac
Code:
unsigned char: 8 bits
unsigned short: 16 bits
unsigned int: 32 bits
unsigned long: 64 bits
unsigned long long: 64 bits
char: 8 bits
short: 16 bits
int: 32 bits
long: 64 bits
long long: 64 bits
std::size_t: 64 bits
void *: 64 bits
 
  • #19
newjerseyrunner said:
Is unsigned long long different than unsigned long?
Almost always. Try this on your mac, a windows machine, and a linux machine:
Code:
#include <iostream>
#include <typeinfo>
#include <cstddef>

int main ()
{
    std::cout << typeid(unsigned long).name() << '\n';
    std::cout << typeid(unsigned long long).name() << '\n';
    std::cout << typeid(std::size_t).name() << '\n';
}

unsigned long and unsigned long long can be different types, and they are on your mac. (This killed me with some templates.) The underlying type of std::size_t is very system/compiler dependent.
 
  • #20
Their typeids will always be different for long and long long; I was referring to their layout in memory. For templates the type is important; for holding numbers, the bit size is important. C++ can't convert between two template types (the issue isn't the type, it's how templates work, they have different function pointers), but it can easily implicitly convert between number types (you shouldn't, but you can).

My output of your program showed std::size_t as the same type as unsigned long, not long long on my mac.
 
  • #21
newjerseyrunner said:
The standard specifies this:
unsigned shorts and ints must be at least 16 bits
unsigned longs must be at least 32 bits AND contain all valid pointers
unsigned long longs must be at least 64 bits AND contain all valid pointers
Neither the C nor the C++ standard specifies that (the "AND" part). There is no requirement that any of the integer types be able to represent a pointer. If in some implementation, some integer type is capable of containing a pointer, the implementation should define the type uintptr_t (or std::uintptr_t in c++), but even that is optional.

The restrictions on size_t are amazingly few. The only requirement is that it is the unsigned integer type of the result of the sizeof operator, and the standards allow implementations to limit arrays and allocated memory to very small sizes.
 
  • #22
You are right, my mistake, I'm not sure where I read that, but it's not in the standard text.

If anyone is curious where I'm looking: http://open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4527.pdf
That's a working draft from 2015, so it is actually slightly newer than the published C++14 standard. I can't find the C++11 one.

It's worth noting though, the C++ Standards committee doesn't write any compilers. The standard is usually released long before compilers catch up to it. Clang, for example, still doesn't support a select set of features from C++11. Compilers are also allowed to implement their own features: long long was just put in the standard, but most C++ compilers have had it for quite some time.
 