Introduction to Defensive Programming in Python

Luke Lee

  • Feb 27, 2019
  • 7 Min read
  • 37 Views
Data Science
Python

Introduction

It's Friday afternoon, and your new release has been out for a few days. Your week began with a feeling of pride and relief, but your pride has slowly diminished as the week marched forward. It took a lot of effort and dedication to put out such a bug-free release. In fact, on the release date, you were confident that the next few weeks would be quiet because users would have nothing left to need or want.

Of course, it was too good to be true and, not long after the release, the first bug report came in. It was something innocuous, a minor misspelling in a new dialog box. Then, a few more small bug tickets trickled in, which you quickly fixed and pushed to the repository.

Then it happened, every developer's worst nightmare: a bug reported in your most prized portion of the system. You frantically look through the code, even though you know it from memory. How is it possible that branch of code was even executed in this scenario? The code must be lying to you.

Fast-forward a few days into the bug hunt and you still have no clue how this happened. You cannot even reproduce the scenario in your testing environment. If only you had more debugging information about the failure...

The Truth Will Set You Free

You'll recognize this scenario if you've been writing software for any non-trivial amount of time. It's upsetting that, despite your best efforts, you've shipped broken software - again. Don't worry, it happens.

This is the part of the story where I reveal the magic bullet to solve this for you once and for all. Unfortunately, I can't and I don't think such a thing exists.

The hidden truth is all software has bugs. However, that doesn't mean we should give up and not strive for perfection. It just means we would be better served by slightly altering our perception of this reality. We should be writing software as if we are planning for defects. We should be writing software defensively, i.e., calmly setting traps for the inevitable bugs.

Defensive Programming

The best term to describe this style is Defensive Programming. The Wikipedia description doesn't quite capture what I have in mind, but it's a good starting point:

A form of defensive design intended to ensure the continuing function of a piece of software in spite of unforeseeable usage of said software. The idea can be viewed as reducing or eliminating the prospect of Murphy's Law having an effect. Defensive programming techniques are used especially when a piece of software could be misused mischievously or inadvertently to catastrophic effect.

What I'm really talking about is a combination of the following guidelines:

  • Every line of code is a liability
  • Codify your assumptions
  • Executable documentation is preferable

"Executable documentation" is a term sometimes used to describe doctests; "literate testing" is another name for the same concept.
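
To make that concrete, here is a minimal doctest sketch: the usage example lives in the docstring, and the doctest module verifies it actually runs and produces the shown output. The function `to_ratio` is an illustrative helper, not code from the application discussed below.

```python
def to_ratio(value, lo, hi):
    """
    Scale value into the [0 - 1] range defined by lo and hi.

    The examples below are executable documentation; doctest runs them:

    >>> to_ratio(30.0, 10.0, 50.0)
    0.5
    >>> to_ratio(10.0, 10.0, 50.0)
    0.0
    """
    return (value - lo) / (hi - lo)

if __name__ == '__main__':
    import doctest
    doctest.testmod()  # re-runs the docstring examples as tests
```

Running the module executes the docstring examples, so the documentation can never silently drift out of date.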

These guidelines are crucial to ensuring that we can protect our code and sanity from inevitable bugs. Remember, we're operating from the standpoint that we aren't going to write bug-free code.

We need to keep the guidelines in mind to help us find bugs quickly. Often, finding bugs is the hard part. So, let's optimize for finding them, not for the impossible task of preventing them entirely.
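
One way to optimize for finding is to leave a breadcrumb trail of log messages so that, when a bug does surface, the failure comes with context. A minimal sketch, where the logger name, function, and messages are all illustrative:

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s %(name)s: %(message)s')
log = logging.getLogger('normalize')  # illustrative logger name

def clamp(value, lo=0.0, hi=1.0):
    """Clamp value into [lo, hi], logging whenever we had to intervene."""
    if value < lo or value > hi:
        # This is the trap: when an "impossible" value shows up, we
        # get a record of it instead of a silent correction.
        log.warning('value %r outside [%r, %r]; clamping', value, lo, hi)
    return max(lo, min(hi, value))

print(clamp(1.2))   # emits a warning, returns 1.0
print(clamp(0.5))   # in range, no log noise
```

The point isn't this particular helper; it's that the log line turns a mystery ("how did we get a bad value?") into a timestamped fact.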

Python Tools


Let's take a deeper look at some of the tools available to help with following the guidelines. We'll use Python as our language for demonstration purposes, but most languages have very similar tools.

  • Asserts
  • Logging
  • Unit tests

Assume we have the following function, which takes a column name and normalizes the user-specified range of that data to values between 0 and 1, for use by a new widget later down the road.

import numpy

def normalize_ranges(colname):
    """
    Normalize the given data range to values in [0 - 1]

    Return a dictionary with 'min' and 'max' keys in the range [0 - 1]
    """

    # 1-D numpy array of data we loaded application with
    original_range = get_base_range(colname)
    colspan = original_range['datamax'] - original_range['datamin']

    # User filtered data from GUI
    live_data = get_column_data(colname)
    live_min = numpy.min(live_data)
    live_max = numpy.max(live_data)

    ratio = {}
    try:
        ratio['min'] = (live_min - original_range['datamin']) / colspan
        ratio['max'] = (live_max - original_range['datamin']) / colspan
    except ZeroDivisionError:
        ratio['min'] = 0.0
        ratio['max'] = 0.0

    return ratio

Now, assume we have the following 'columns' that are returned by the get_column_data() function:

age = numpy.array([10.0, 20.0, 30.0, 40.0, 50.0])
height = numpy.array([60.0, 66.0, 72.0, 63.0, 66.0])

Let's verify that it does indeed map our given range to values between 0 and 1:

>>> normalize_ranges('age')
{'max': 1.0, 'min': 0.0}

OK, that's a pretty short test, but it seems to work. We passed in a range of 'real' numbers and normalized it to something in the space of 0 - 1.
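
A one-off check at the interactive prompt is easy to lose, though; the same verification can live in a unit test and run forever after. The sketch below stubs out get_base_range() and get_column_data() with the sample 'age' data (using plain lists instead of numpy arrays, so the sketch stays self-contained) rather than importing the real application code:

```python
import unittest

# Stand-in versions of the application's data functions.
def get_base_range(colname):
    return {'datamin': 10.0, 'datamax': 50.0}

def get_column_data(colname):
    return [10.0, 20.0, 30.0, 40.0, 50.0]

def normalize_ranges(colname):
    original_range = get_base_range(colname)
    colspan = original_range['datamax'] - original_range['datamin']
    live_data = get_column_data(colname)
    live_min, live_max = min(live_data), max(live_data)
    ratio = {}
    try:
        ratio['min'] = (live_min - original_range['datamin']) / colspan
        ratio['max'] = (live_max - original_range['datamin']) / colspan
    except ZeroDivisionError:
        ratio['min'] = ratio['max'] = 0.0
    return ratio

class NormalizeRangesTest(unittest.TestCase):
    def test_full_range_maps_to_unit_interval(self):
        self.assertEqual(normalize_ranges('age'),
                         {'min': 0.0, 'max': 1.0})
```

Save it as, say, test_normalize.py and run it with `python -m unittest test_normalize` so the check is repeated on every change.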

Remember the guidelines mentioned above? Let's find the bugs that exist in this function. What assumptions are we making implicitly in the code?

I can see a few assumptions:

  1. original_range contains only positive numbers
  2. Ratio returned is between 0.0 and 1.0

After careful examination, there are quite a few assumptions in the above code. Unfortunately, these aren't immediately obvious when glancing at it. If only we could make these assumptions more clear... To dive into the different ways to clear up our assumptions, continue on to the other guides in this series.
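
As a quick taste of what codifying the two assumptions above with asserts could look like, here is a sketch. The helper normalize_ratio is illustrative, distilling just the ratio math from normalize_ranges(); the assert messages include the offending values so a violation fails loudly with context:

```python
def normalize_ratio(live_min, live_max, datamin, datamax):
    """Normalize a live [min, max] pair against an original data range."""
    # Assumption 1: the original range contains only positive numbers.
    assert datamin >= 0 and datamax >= 0, \
        'original range should be positive: [%r, %r]' % (datamin, datamax)

    colspan = datamax - datamin
    ratio = {'min': (live_min - datamin) / colspan,
             'max': (live_max - datamin) / colspan}

    # Assumption 2: the returned ratios are between 0.0 and 1.0.
    assert 0.0 <= ratio['min'] <= 1.0, ratio
    assert 0.0 <= ratio['max'] <= 1.0, ratio
    return ratio

print(normalize_ratio(10.0, 50.0, 10.0, 50.0))  # the happy path
```

Now, if live data ever falls outside the original range, the code traps the bug at the moment it happens instead of letting a bad ratio wander off into the widget.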

Conclusion

This style of development is tough to categorize and, unfortunately, there aren't any solid rules for when to use which tool. So, I encourage you to keep the guidelines in mind.

The guidelines will lead to a subtle change in mindset. The mindset change is important, not the tools and mechanisms themselves. Eventually you'll make some mistakes by overusing asserts or logging and start to form your own style. Also, the requirements for every project differ, so it's important to learn all the tools and combine them in ways that make sense for your situation.
