Converting unsigned integers to signed integers can lead to hidden bugs


It’s no big surprise that strange things can happen during conversions of any kind, and converting unsigned integers to signed integers is no exception. But these conversions are especially tricky because they can lead to bugs that aren’t always noticeable and can silently corrupt the results your C++ code produces. To better understand what we’re dealing with, let’s first take a look at the bug in action, and then at a possible solution for fixing the problem.

What can go wrong

In C++, integer types come in two flavors: signed and unsigned. As briefly mentioned, when you convert large unsigned integers to their signed counterparts, weird things can (and often do) happen. With the fundamental type int, if you don’t add any additional keywords, you get a signed integer by default. On the other hand, adding the unsigned modifier (unsigned int) gets you an unsigned integer.

Unsigned integers are common when your C++ code interoperates with the Standard Library (STL). For example, if you want to get the item count of a std::vector, the vector’s size method returns a value of type size_type, which is typically an alias for size_t, an unsigned integer.

Other portions of your C++ code may require quantities expressed as signed integers. For example, many Win32 APIs (and the ATL CString class) expect length and size parameters in the form of a (signed) int. So, at some point in your code, you’ll end up with an integer quantity expressed as an unsigned integer (e.g. a size_t value from an STL container) that you have to pass to APIs or class methods expecting the same quantity as a signed integer. This means you have to convert from an unsigned integer to a signed integer. And that’s when things get tricky.

To get a better idea, consider the following compilable C++ code snippet:

#include <cstddef>
#include <iostream>
using namespace std;

int main()
{
    constexpr int Giga = 1024 * 1024 * 1024;
    size_t x = 3;
    size_t u = x * Giga;

    cout << "u = " << u << " (" << (u / Giga) << " GB)\n";

    int i = u; // (*)

    cout << "i = " << i << " (" << (i / Giga) << " GB)\n";
}

In this code, at the line marked with an asterisk, the value of the variable ‘u’, of type size_t (an unsigned integer), is converted to a signed integer and stored in the variable ‘i’. If you compile this code with Microsoft’s Visual C++ compiler in 32-bit mode, at a reasonably high warning level (/W4), you’ll get a warning-free (and error-free) build. Seems like everything’s fine, right? Not exactly…

If you run that code, here’s what you’ll get:

u = 3221225472 (3 GB)
i = -1073741824 (-1 GB)

As you can see, we started with a positive unsigned value (3 GB), and after the conversion from unsigned to signed we ended up with a negative value! This is due to the fact that while 3,221,225,472 (3 GB) can be safely stored in a 32-bit unsigned integer, the same value exceeds the maximum positive value that can be stored in a signed 32-bit integer (that value being 2,147,483,647). Of course, getting 3 GB converted to a negative value of -1 GB doesn’t make sense in this context.

Now, if you imagine such unsigned-to-signed integer conversions immersed in more complex code bases, with several computations that add up, you can figure out what kind of subtle bugs can hide in your C++ code.

Checking integer limits before converting

To avoid these bugs, we can define a custom function for the conversion from unsigned to signed integers. Inside the body of this function, we can check if the input unsigned value is too big for a variable of type int. In this case, instead of continuing with a bogus conversion, we’ll throw a C++ exception to signal the conversion error.

The prototype of this function can look like this:

int SizeToInt(size_t u);

Inside the function, the first thing we need to do is to check if the value of the unsigned variable ‘u’ can be safely stored in a signed integer, without being converted to a negative bogus number.

In pseudo code, the logic looks like this:

If (u > maximum value that can be stored in an int)
  Throw an exception
Else
  Safely convert from unsigned to signed

So, how can you get the maximum value for a variable of type int?

Well, there’s a standard component available in the C++ standard library called std::numeric_limits. This is a class template that can be used to query various properties of arithmetic types, including int. Basically, you pass the name of the type as the T template parameter of numeric_limits<T>, and then you can call the class’ methods to query the desired properties. For example, “numeric_limits<T>::max()” will return the maximum value of the type T. Since we want the maximum value that can be stored in a variable of type int, we can simply invoke “numeric_limits<int>::max()”.

So, we can use this C++ code to prevent bogus unsigned to signed conversions:

#include <limits>    // for std::numeric_limits
#include <stdexcept> // for std::overflow_error

int SizeToInt(size_t u)
{
    if (u > std::numeric_limits<int>::max())
    {
        throw std::overflow_error(
            "size_t value cannot be stored in a variable of type int.");
    }

    return static_cast<int>(u);
}

However, note that the Microsoft Visual C++ compiler emits a warning about a signed/unsigned mismatch in the use of “operator >” in the ‘if’ condition. In fact, ‘u’ is a variable of type size_t, which is unsigned, while the value returned by the numeric_limits<int>::max call is of type int, which is signed.

We can easily fix that warning with the addition of a simple static_cast:

if (u > static_cast<size_t>(std::numeric_limits<int>::max()))

Now, if you replace the “int i = u;” statement with a call to the previously defined SizeToInt function, an exception will be thrown instead of a bogus negative value being stored in the destination int variable. It’s also worth noting that if that line is written using C++11’s brace-initialization syntax, “int i{u};”, the MSVC compiler emits an explicit error:

error C2397: conversion from 'size_t' to 'int' requires a narrowing conversion

Again, the use of our custom SizeToInt conversion function can fix that compiler error.

Neutralizing Windows’s max preprocessor macro

If we want this code to be usable in Windows C++ applications, we need to add a final touch to our SizeToInt function. If you simply try to #include the main Windows SDK header <Windows.h>, the function’s code will generate several weird-looking compile-time errors. These weren’t present before; they popped up only after including <Windows.h>. So, what’s the problem?

The problem is that the Windows SDK headers define a max preprocessor macro (a matching min macro is defined as well), which conflicts with the std::numeric_limits<T>::max call, generating compiler errors. To prevent this conflict, you can wrap the numeric_limits::max call in an additional pair of parentheses, like this: “(std::numeric_limits<int>::max)()”. Because the macro is function-like, the preprocessor expands it only when the name is immediately followed by an opening parenthesis; written this way, the Windows max macro won’t interfere with the numeric_limits::max method call.

So, the final implementation of our SizeToInt function looks like this:

#include <limits>     // for std::numeric_limits
#include <stdexcept>  // for std::overflow_error

inline int SizeToInt(size_t u)
{
    if (u > static_cast<size_t>((std::numeric_limits<int>::max)()))
    {
        throw std::overflow_error(
            "size_t value cannot be stored in a variable of type int.");
    }

    return static_cast<int>(u);
}

Working with 64-bit builds

I developed the previous scenario assuming 32-bit builds with the Microsoft Visual C++ compiler. But interesting things can happen in 64-bit builds, too. In this context, the size_t type is 64 bits wide, but the int type is still 32 bits. If you have a 5 GB value stored in a variable of type size_t and convert it to an int, you end up with a 1 GB value: the truncation keeps only the low 32 bits of the original number. When this happens, a large 64-bit unsigned integer (5 GB) is converted to a positive, but smaller, signed 32-bit integer (1 GB). Again, this can lead to subtle bugs, with “magic” numbers popping up in a meaningless way, and more complex calculations producing bogus results.

If an unsigned integer can’t be meaningfully converted to a signed one, it’s better to throw an exception than to let subtle bugs hide in your code. Note that the previously developed SizeToInt function works fine in 64-bit builds as well.

Takeaway

As you can see, even with the same number of bits used to store an integer, signed and unsigned integers have different positive value limits. Converting a large positive unsigned number to a signed one can lead to bogus results (like negative numbers!) and hard-to-find bugs. The key point of our custom conversion function is to check the value of the input size_t unsigned variable, and throw an exception if this value is too big to be stored in a variable of type (signed) int. This way, the conversion is performed only when it’s meaningful. Having an exception thrown is certainly better than hard-to-spot bugs popping up from big unsigned integers converted to negative (or smaller) signed integers.


Ready to learn more? Check out my courses here.

Contributor

Giovanni Dicanio

Giovanni Dicanio is a computer programmer specializing in C, C++ and the Windows OS. He is a Pluralsight author and Visual C++ MVP, with computer programming experience dating back to the glorious Commodore 64 and Commodore Amiga 500 golden days. Giovanni enjoys several aspects of development: from design to actual code writing, to debugging and code reviews. Feel free to contact him at giovanni.dicanio AT gmail.com or visit his website. Besides programming and course authoring, he enjoys helping others on forums and communities devoted to C++.