Simplifying lexicographical comparisons with C++
- select the contributor at the end of the page -
Old-school C++ code can be a serious pain--thankfully, things have gotten easier in C++11. Let's take a look at some new standard library features that can be used to write simpler, clearer and more maintainable C++ code.
We’ll focus on a sample implementation of operator< to sort instances of a hypothetical Time class using standard lexicographical ordering (more on that later). And we’ll begin with some of that old-school C++ code with a boilerplate operator< overload implementation. We’ll then take a look at how this code can be simplified using some new C++11 library features like std::tie, std::tuple and std::make_tuple. Ready? Let’s dive in.
Sorting with lexicographical order
Let’s suppose we have a Time class with data members storing hours, minutes and seconds, and corresponding public getters (i.e. read-only accessors).
Here’s a possible implementation:
class Time { |
We want to make instances of this Time class sortable. For example, we may want to call std::sort on a container of Time instances. In order to do that we could introduce an ordering relation implementing a proper overload for operator<, like this:
bool operator<(const Time& a, const Time& b) |
This overload takes two const references to the two Time instances we want to compare, and returns true if Time a is less than Time b, and returns false otherwise. In this way, we’ve introduced an ordering relation for instances of Time, making them sortable by STL algorithms.
Ordering time values
Now, how can we implement the actual ordering relation? In other words, according to which rules does Time a precede (or follow) another Time b? The intuitive approach for ordering time values is to first order by hours, then by minutes, and then seconds (for example, 09:20:12 < 10:41:03). This approach is also known as lexicographical order: It’s basically a generalization of the alphabetic order of words, based on the alphabetic order of its component letters.
To get a better handle on this concept, consider what we do when we sort words. Take, for example, cat versus dog: Which comes first? We look at the first letters of the two words: Since c precedes d in alphabetical order, cat obviously comes before dog. And what about crocodile versus cat, which both start with the letter c? In these instances, we must look at the following letters in each words; since r follows a, crocodile will follow cat in our ordering.
Of course, when working with instances of the Time class, the components are not single alphabetic letters, but hours, minutes and seconds.
When implementing this Time lexicographical comparison, your first thought might be to write a series of if statements to compare the two input Time instances first by hours, then by minutes and, finally, by seconds. Conceptually, there’s nothing wrong with this kind of boilerplate code—except that it’s boilerplate, and potentially bug-prone.
Here’s a sample implementation:
friend bool operator<(const Time& lhs, const Time& rhs) { |
Simplifying the overloading of operator< using std::tie
Fortunately, there’s a better approach you can take. C++11 introduced a convenient function named std::tie, which can be used to create a std::tuple object based on some input values. In this case, the n-tuple is a 3-tuple or a triple (or a triplet), made by the particular Time’s hours, minutes and seconds, which are the three components we use for the comparison process. And since the STL already defines overloads for operator< (and other comparison operators, as well) to lexicographically compare tuples, we can just call operator< with the tuples returned by std::tie.
So, the idea is:
1. Given two instances of the Time class, for each instance use std::tie to create a tuple based on the Time’s hours, minutes and seconds.
2. Call operator< for the returned tuples; this operator< overload is already defined by the STL, so we don’t have any boilerplate or bug-prone code to write.
The first thing we need is to #include the <tuple> STL header for being able to use std::tuple and std::tie in our C++ code:
#include <tuple> // For std::tie, std::tuple |
And then our Time overload for operator< can be written as simply as:
// Inside the Time class definition: |
Wow—it’s just a simple single return statement, instead of a long series of if statements! As you can see, this code is much easier to write, read and understand than a bug-prone boilerplate sequence of ifs. Moreover, we raised the semantic level over a sequence of if statements, since we’re clearly expressing the intent of a lexicographical comparison of tuples (triples) instead of explicitly writing a series of if statements to define the “internal mechanics” of the comparison itself.
Note: If you’re unfamiliar with the lhs and rhs notation, these are acronyms for “left-hand side” and “right-hand side.” Left and right here refer to their respective positions in the “a < b” notation: so, in this case, ‘a’ is the LHS, and ‘b’ is the RHS.
Also, take note of how the operator< is declared as friend of the Time class, making it possible to access Time’s data members directly inside the operator< implementation (without calling public getters).
std::make_tuple vs. std::tie
We’ve now seen how it’s possible to use the std::tie function to write simple code to lexicographically compare data structures like time values made by hours, minutes and seconds. Since the operator< overload was declared as friend of the Time class, we could access Time’s private data members directly from the overloaded operator< implementation.
Now, let’s focus on a slightly different approach: What if we want to use public getters instead of accessing Time’s private data members?
We might be tempted to write something like this:
// Outside of the Time class definition |
But if we try to compile this code, we get several error messages. For example, the GCC C++ compiler complains with error messages like this:
std_tie.cpp: In function 'bool operator<(const Time&, const Time&)': std_tie.cpp:37:29: error: invalid initialization of non-const reference of type |
Let’s try to decode this error message. First, focus on the hat character ^ in the error message, which is pointing to the hours accessor call. This accessor returns a value of type int, since we used integers to represent hours (along with minutes and seconds). This return value is a temporary of type int (which is the pragmatic meaning of the “rvalue of type int” part in the error message).
The problem is that std::tie takes as input a non-const reference, in our case of type int: “int&”. And it’s a violation of a C++ rule to attempt to bind an rvalue (i.e. a temporary) to a non-const reference. This is because a temporary can evaporate, so in that case the non-const reference points to garbage, which is a bug.
In our previous code, when we used Time’s private data members, there was no problem, since std::tie’s int& parameters were bound to non-temporary data members. So the C++ compiler was happy with our code.
So, if our operator< overload can’t access private data members of the class instances to be compared, are we doomed to boilerplate ifs? Of course not. A possible solution in this case is to use another helper function made available by the STL library: instead of std::tie, we can use std::make_tuple.
std::make_tuple forwards its input arguments to build a std::tuple using values, not references, to the input arguments. In this way, the compiler doesn’t complain when we call public accessors returning temporaries. In fact, before the temporaries evaporate, their values are used to build a std::tuple. Unlike the tuples built with std::tie, the tuples built in this case with make_tuple contain values, not references, of the Time’s hours, minutes, and seconds components.
// |
Here's a demo of C++ compilable code. It can be used to experiment with the techniques discussed above.
Takeaway
We’ve looked at the problem of lexicographical comparison, introduced in the sample case of a Time class. We’ve seen that Time instances are compared using hours, minutes and seconds. We first wrote boilerplate and bug-prone code to implement the comparison mechanics by hand in an operator< overload, with a long series of if statements.
Then, we saw how this code can be enormously simplified by using a tool already available in the STL: std::tie, which can be used to build tuples. Since the STL already defines a lexicographical comparison for std::tuple, we can simply invoke operator< on tuples, without writing any boilerplate bug-prone code.
Finally, we saw how std::tie doesn’t work with accessors returning temporaries, and how in this case we can use std::make_tuple instead, to build tuples containing values (not references), and still simply invoke operator< on the returned tuples.
Bottom line? Stop trying to reinvent the wheel. If there’s a tool already available in the STL, you should use it instead of writing boilerplate bug-prone C++ code. By doing this, you’ll not only spare yourself a major headache, but you’ll almost always see better results.
Check out my courses here.
Thanks to Stephan T. Lavavej (Senior Library Developer at Microsoft Visual C++ Team) for technically reviewing this article.