Author avatar

Luke Lee

Asserts and Assert Downsides to Defensive Programming in Python

Luke Lee

  • Feb 27, 2019
  • 12 Min read
  • 99 Views
  • Feb 27, 2019
  • 12 Min read
  • 99 Views
Data Science
Python

Introduction

For an introduction and dive into the theory around Defensive Programming, check out the first guide in this series.

Asserts

Asserts are very common in unit tests. In fact, Python has a large collection of customized asserts for unit tests. However, there is no reason this useful tool should be used simply in the testing world.

Assert statements within normal code are very useful as well. These statements take an expression and raise an AssertionError, along with an optional message if the expression is False.

Assume we have the following function which takes values from a user and will normalize the specified range of data into something between 0 and 1, which can be used by a new widget later down the road.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
def normalize_ranges(colname):
    """
    Normalize given data range to values in [0 - 1]

    Return dictionary new 'min' and 'max' keys in range [0 - 1]
    """

    # 1-D numpy array of data we loaded application with
    original_range = get_base_range(colname)
    colspan = original_range['datamax'] - original_range['datamin']

    # User filtered data from GUI
    live_data = get_column_data(colname)
    live_min = numpy.min(live_data)
    live_max = numpy.max(live_data)

    ratio = {}
    try:
        ratio['min'] = (live_min - original_range['datamin']) / colspan
        ratio['max'] = (live_max - original_range['datamin']) / colspan
    except ZeroDivisionError:
        ratio['min'] = 0.0
        ratio['max'] = 0.0

    return ratio
python

If our above function claimed to always return a value between 0 - 1. Unfortunately, more stressing of our assumptions shows this isn't true:

1
2
3
>>> age = numpy.array([-10.0, 20.0, 30.0, 40.0, 50.0])
>>> normalize_ranges('age')
{'max': 1.0, 'min': -0.5}
python

As you can imagine, this scenario could easily go unnoticed for a long time and this return value could be propagated all over the code base. This is precisely the type of bug that's impossible to find and leads to the sad story that started this series of guides.

We could try to think of every possible value our users could pass and handle it properly. In fact, this is the right thing to do, but there's no guarantee that we won't miss something. We've already conceded the fact that programmers are fallible.

Luckily, we can use assert statements to code against our future selves now that we've accepted that we make mistakes.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
def normalize_ranges(colname):
    """
    Normalize given data range to values in [0 - 1]

    Return dictionary new 'min' and 'max' keys in range [0 - 1]
    """

    # 1-D numpy array of data we loaded application with
    original_range = get_base_range(colname)
    colspan = original_range['datamax'] - original_range['datamin']

    # User filtered data from GUI
    live_data = get_column_data(colname)
    live_min = numpy.min(live_data)
    live_max = numpy.max(live_data)

    ratio = {}

    try:
        ratio['min'] = (live_min - original_range['datamin']) / colspan
        ratio['max'] = (live_max - original_range['datamin']) / colspan
    except ZeroDivisionError:
        ratio['min'] = 0.0
        ratio['max'] = 0.0

    assert 0.0 <= ratio['min'] <= 1.0, (
            '"%s" min (%f) not in [0-1] given (%f) colspan (%f)' % (
            colname, ratio['min'], original_range['datamin'], colspan))

    assert 0.0 <= ratio['max'] <= 1.0, (
            '"%s" max (%f) not in [0-1] given (%f) colspan (%f)' % (
            colname, ratio['max'], original_range['datamax'], colspan))

    return ratio
python

We added a few assert statements that will alert us if we don't return values within the expected range. Let's see how these assertions change our small test case:

1
2
3
>>> age = numpy.array([-10.0, 20.0, 30.0, 40.0, 50.0])
>>> normalize_ranges('age')
AssertionError: "age" min (-0.500000) not in [0-1] given (10.000000) colspan(40.000000)
python

This small change has several benefits:

  • Serves as a form of executable documentation
  • Places warnings closer to the root problem
  • Includes valuable debugging information about the "invalid" parameters

1. Serves as a Form of executable documentation

Typically documentation comes in a few different flavors such as in-line or block comments, docstrings, and sphinx. Each of these serves a specific purpose and are almost essential to software development. Unfortunately, they all suffer from the same issue: They can quickly get out of sync with fast changing code and requirements. This leads to documentation that the developer cannot trust.

Asserts act as documentation with a different purpose. They clearly and concisely describe the expected state of the application at run time. In addition, the application will complain if we change our assumptions without modifying the asserts to match the new behavior.

Assert statements are much more likely to be updated alongside other changes. Therefore, asserts are more trustworthy than non-executable documentation. In addition, asserts still provide many of the benefits of comments, docstrings, etc.

It's worth pointing out there's another form of executable documentation that is quite common in the Python ecosystem known as doctests. These tests/documentation can be somewhat ugly to look at, but their key feature is they are close to the code, just like asserts.

2. Places Warnings Closer to the Root Problem

We've all been there, you debug a problem for hours and realize the real bug wasn't even close to where you started (see 5 whys). Maybe the root cause of the bug was logically far away from where you first saw the symptoms.

For example, you find a byte string deep in your system, but you assumed everything internally was Unicode strings. It could take a long time to find where the conversion was first broken. This is a frustrating situation to be in. It would be nice to have found the bug much sooner or at least have more debug information.

Asserts aren't going to prevent this situation, but they do offer a chance to improve it. The above asserts will alert us the moment this function doesn't obey its contract to return a value between 0 - 1. This could give us a valuable clue later, if we find other code that has an invalid range. We'll know this function didn't live up to its end of the contract. This one clue could literally save hours since it can avoid tracing back from the symptom all the way to the cause.

3. Includes Valuable Debugging Information About the "Invalid" Parameters

Notice that our assert statements also included information about the input parameters. This information will be invaluable when a user encounters the bug using data we don't have access to. Also, the debug information will be especially useful when the user has trouble explaining the error scenario. So, these few assert statements could prevent you being the guy who disgracefully marks the bug with an "unreproducible" status.

The input parameter information also has a few other subtle benefits:

  • Displays an invalid assumption about what type of data users are running.
  • Explains oversight in our documentation about what type of data is expected.
  • Exposes potential new use cases that cannot be executed.

Assert Downsides

We've established that asserts can provide a ton of benefits, but it's not all fun and games. As usual, there are downsides.

1. Debug Mode

Typically, for both technical and practical reasons, assert statements aren't meant for production code. Asserts are only enabled when the hidden debug constant is True. However, the default value for this constant is True, which means your code is most likely currently shipping in debug mode.

This is something to consider if your application is in an environment where small amounts of additional logic is noticeable. The only way to turn off debug mode is to run the Python interpreter with the -O option.

2. Increased Code Noise

It's very easy to overuse asserts and quickly make your code difficult to read. This can make your code very noisy and bury the real functionality in a series of error checks and conditions. The following code is an example of overuse and how it's difficult to see what the code is meant to do.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
def normalize_ranges(colname):
    """
    Normalize given data range to values in [0 - 1]

    Return dictionary new 'min' and 'max' keys in range [0 - 1]
    """

    assert isinstance(colname, str)

    original_range = get_base_range(colname)
    assert original_range['datamin'] >= 0
    assert original_range['datamax'] >= 0
    assert original_range['datamin'] <= original_range['datamax']

    colspan = original_range['datamax'] - original_range['datamin']
    assert colspan >= 0, 'Colspan (%f) is negative' % (colspan)

    live_data = get_column_data(colname)
    assert len(live_data), 'Empty live data'

    live_min = numpy.min(live_data)
    live_max = numpy.max(live_data)

    ratio = {}

    try:
        ratio['min'] = (live_min - original_range['datamin']) / colspan
        ratio['max'] = (live_max - original_range['datamin']) / colspan
    except ZeroDivisionError:
        ratio['min'] = 0.0
        ratio['max'] = 0.0

    assert 0.0 <= ratio['min'] <= 1.0
    assert 0.0 <= ratio['max'] <= 1.0

    return ratio
python

Proper Assert Usage

Use asserts sparingly and for things you assume are never supposed to occur. Don't go overboard using assertions to check for invalid input.

There are no hard and fast rules for this and each developer might have a different tolerance for assert usage. Try to adopt some standards of your own and include them in your developer style guide.

You do have a style guide right?

Also, remember Python embraces duck-typing so don't ruin this by going overboard and using asserts to verify all your types.

One technique I've found useful is to catch all AssertionError exceptions at the top-level of my application and combine them with another useful technique.

Conclusion

This style of development is tough to categorize and unfortunately there aren't any solid rules to say when to use what. So, I encourage you to keep the guidelines in mind. Continue on to the next guide in this series to learn more about logging for defensive programming.

The guidelines will lead to a subtle change in mindset. The mindset change is important, not the tools and mechanisms themselves. Eventually you'll make some mistakes by overusing asserts or logging and start to form your own style. Also, the requirements for every project differ so it's important to learn all the tools and combine them in ways that make sense for your situation.

0