
Test Driven Development (TDD) Research

Jun 17, 2019 • 12 Minute Read

Introduction

Test-Driven Development (TDD) is a practice that has gained more traction in recent years. However, despite many having heard about TDD, its use is still not widespread. There are two reasons this might be:

  1. Nobody wants to put the effort in.

I get it, learning can be scary and difficult.

  2. We don’t think TDD is worth it.

TDD clearly doubles development time, right? And there’s little-to-no gain for the end-product, right? It just produces better code, but we don’t have time for that.

Let’s follow the thread on number 2. What would have to be true in order for TDD to make sense? What proof about TDD would make more of us embrace it? If the answer is that it makes developers faster and the code better, then you’re in for a treat because it (mostly) does.

I’m not going to dive into great detail about the results of TDD research (though I do summarize findings for the Industry category). The purpose of this guide is to encourage thoughtful questioning and judicious review when examining TDD research. After diving into the research, you will be equipped to determine whether you want to embrace TDD or steer clear. And with that clarity, hopefully number 1 becomes a non-issue - if TDD is worth the effort, it becomes much easier to put in that effort.

What Is TDD?

So first, what is TDD? There are a myriad of resources out there that explain and demonstrate TDD. Here is a five-sentence over-simplification:

Test-driven development is a process formed by repetition of a short development cycle, which is often referred to as the red, green, refactor cycle. You start by writing one test that describes the behavior the code should have, but does not yet have, so it fails - we call it a red test. Then you write minimal code to quickly make that test pass, committing whatever sins necessary in the process - now your test will pass and we call it a green test. Then you refactor the code as necessary to eliminate duplication or unwanted code created in merely getting the test to work - this is the refactor step. After refactoring, re-run the tests to verify nothing has been broken.
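The cycle above can be sketched in a few lines of Python. This is a minimal illustration, not a prescription: the `fizzbuzz` function and pytest-style test functions are hypothetical examples chosen for brevity.

```python
# A minimal sketch of one red-green-refactor pass.

# RED: these tests were written first. With no fizzbuzz() defined yet,
# running them raised a NameError -- failing (red) tests that describe
# the behavior the code should have but does not yet have.
def test_multiple_of_three():
    assert fizzbuzz(9) == "Fizz"

def test_multiple_of_both():
    assert fizzbuzz(15) == "FizzBuzz"

def test_plain_number():
    assert fizzbuzz(7) == "7"

# GREEN, then REFACTOR: the first passing version was a chain of
# hard-coded if/return branches (whatever sins necessary); this version
# removes that duplication while the tests keep proving the behavior
# is unchanged.
def fizzbuzz(n):
    result = ("Fizz" if n % 3 == 0 else "") + ("Buzz" if n % 5 == 0 else "")
    return result or str(n)

# Re-running the tests after the refactor verifies nothing has broken.
for test in (test_multiple_of_three, test_multiple_of_both, test_plain_number):
    test()
```

In real use you would run each test with a test runner and watch it fail before writing any production code; the comments here stand in for that back-and-forth.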

The TDD process allows tests to guide your implementation, resulting in more maintainable and higher-quality code. Maintainable and high-quality code aren’t benefits I just made up. The research backs it up.

The Research: Measures and Categories

Here is an overview of what factors most TDD research focuses on, and then we can dig a little deeper.

Don’t let me lose you here. If you aren’t a details person, skip the next few sub-sections and check out the Industry Conclusions.

Measurements Used

Most studies focus on four factors: internal quality, external quality, test quality, and productivity. Specifically, does TDD improve or hinder these?

  • Internal quality is about the design of the system. Does TDD make the code easier to work with and understand? Is the code more maintainable?
  • External quality is about defects. Does TDD reduce the number of bugs in the application?
  • Test quality is about, well, test quality. If engineers use TDD instead of other testing strategies, is there a material difference in the effectiveness of those tests?
  • Productivity is about developer effectiveness. Does TDD make software developers deliver software more quickly?

Categories of Studies

There are also different categories of studies, depending on the scale and subjects of the study: controlled experiment, pilot study, semi-industry, and industry.

  • Controlled experiments are academic laboratories or controlled industry environments with a defined research protocol.
  • Pilot studies are less-structured experimental tasks.
  • Semi-industry is where professional developers work on an experimental project or student developers work on a real project.
  • Industry studies are most similar to the real-world - they focus on industry teams using TDD as part of their everyday work.

Industry Conclusions

I’m not going to go over the findings for all measurements and categories. Here is a useful grid from my course, What Is TDD and Why It Is Not Unit Testing: Executive Briefing.

Notice the results for the Industry category:

  1. Internal quality is better with TDD.

  2. External quality is better with TDD.

  3. Test quality is inconclusive. One study noted that TDD resulted in higher test coverage without impacting productivity (VTT Case Study). In the end, however, there isn’t sufficient evidence to draw a conclusion from measures like test density and test productivity.

  4. Productivity takes a hit in the short term, but likely improves in the long-term (numbers 1 & 2 lead to productivity wins in the long run). More research is needed to fully evaluate the effect on long-term productivity.

How to Evaluate Research

Since this guide is not an exhaustive dive into research (there are dozens of studies available), the important thing to learn is how to evaluate TDD research. After looking extensively for research on TDD, it became evident that there is a lot of variation in how TDD research is conducted and measured. So, while we can see common threads in the studies and draw some fairly consistent conclusions like the previous section, the variation may be significant to you. I’ll give you some tools here so that you can dig in on your own and decide if the variations are relevant to your situation and important to you.

Disclaimer: I’m not a researcher by trade or education, nor am I authorized to qualify or disqualify any research. These are specifically things that are valuable to consider when examining research about TDD.

The primary factors to consider come down to metrics, TDD process, control group, adherence, subject team and codebase size, and tech stack.

  • What metrics are used? How are they measured? For example, one study focuses on only test coverage when evaluating test quality. There is so much more to quality tests than just test coverage, so those studies may not be significant to you. Another example: How is bug frequency calculated? Is it based on reported bugs? What if there is a very heavy process in the way of reporting bugs? What about people who see bugs that don’t report them (most of us)? This could skew the results.
  • How is TDD practiced? Is it true TDD, or is it test-first? Some studies do not precisely explain the TDD process used, which means it may have been more of a test-first approach (one to many tests first, then production code) than TDD (one test, then simplest code possible).
  • What is the control that is used, if any? It’s usually a non-TDD team, which doesn’t allow us to understand the benefits of TDD vs. test-after. Or sometimes the control group is supposed to write automated tests after, but fails to do so consistently, which results in an inaccurate picture of the effect on developer productivity (John Deere study referenced in the Microsoft study). Pay close attention to the control used, because a poorly designed study can completely skew the results.
  • Is there any attention given to adherence? Is it practiced consistently? Did the subjects have prior experience with TDD? Did they have someone with experience teaching them, or were they learning on their own?
  • How big are the teams and projects studied? If you work in the industry, a study on an internal product with only 5800 lines of code (which I’ve seen claimed as an industry-level study: https://arxiv.org/pdf/1711.05082.pdf) may not have significance for you.
  • Which languages and frameworks are used? Did their language of choice allow for access to mature testing libraries/frameworks? Or did it force them into a world where TDD is rare and fraught with roadblocks? Further, do you only care about studies where your language of choice was used?
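To make the coverage point above concrete, here is a hypothetical Python sketch: both tests exercise every line of `discount()`, so a coverage tool reports 100% for each, yet only the second test can ever catch a bug.

```python
# Hypothetical example: line coverage alone cannot distinguish a
# useful test from a useless one.

def discount(price, percent):
    """Return price reduced by the given percentage."""
    return price - price * percent / 100

def test_discount_runs():
    # Executes every line of discount(), so coverage reports 100%...
    discount(100, 10)
    # ...but asserts nothing, so it passes even if discount() is wrong.

def test_discount_checks_result():
    # Same 100% coverage, but this test actually verifies the behavior.
    assert discount(100, 10) == 90.0

test_discount_runs()
test_discount_checks_result()
```

A study that measures test quality purely by coverage would score both of these tests identically, which is exactly why coverage-only metrics deserve scrutiny.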

Other considerations:

  • Is there any mention of how difficult it was for the subjects to learn and become proficient with TDD? If the study lasted for two months, but it takes two months to become proficient with TDD, the results may not be accurate.

Unanswered Questions

There are claims about TDD that are difficult to measure, and that will therefore likely remain unvalidated. What are they?

  • TDD creates tests that serve as documentation. That documentation shortens developer onboarding and hand-offs of codebases. It also increases resiliency to losing knowledgeable people.
  • TDD increases developer retention, because it makes their jobs easier and more satisfying.
  • TDD increases developer ownership and concern for quality.
  • TDD helps developers focus on solving the right problems, which can make them more efficient and help ship software that better satisfies a customer need.

Are these factors important to you? How much weight do they hold in determining whether you use TDD? If they are primary factors, you must decide if you will rely on claims of TDD proponents and put it to the test (pun intended) or if you need strong proof before adopting TDD.

Some Resources

As mentioned, there are dozens of studies and relevant books out there. You’ll find these resources a great starting place, and if you want to dig in more you can look into their cited sources and find your own as well.

[Microsoft and IBM joint study](https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Realizing-Quality-Improvement-Through-Test-Driven-Development-Results-and-Experiences-of-Four-Industrial-Teams-nagappan_tdd.pdf)

They compared multiple teams working under the same manager. They examined the results from teams that used TDD, and those that didn’t. The big takeaways were 60-90% decrease in defects, and a 15-35% increase in time to complete the projects (but the teams agreed that it was offset by reduced maintenance costs).

[Test Driven Development: By Example by Kent Beck](https://www.oreilly.com/library/view/test-driven-development/0321146530/)

This has more of a focus on how to practice TDD, but also includes fantastic explanations and evidence to support TDD.

[Making Software](https://www.oreilly.com/library/view/making-software/9780596808310/)

Chapter 12, "How Effective Is Test-Driven Development?", pulls in 22 clinical trial references and four general references.

Guest Editors' Introduction: TDD--The Art of Fearless Programming

Outlines some of the difficult-to-measure benefits of TDD, and a synopsis of the state of TDD research (published in 2007).

Conclusion

There are many studies out there that put TDD to the test. Most of them focus on quality and productivity, but there is quite a bit of variation in how TDD studies are designed. There are some resources that provide high-level overviews of TDD research, and that may be sufficient information for you. If you want to dive into the research, it’s important to look closely at the different factors that influence a study’s outcomes. In the end, evidence from research is generally favorable for TDD: it improves code quality, decreases bug count, and likely improves long-term productivity. I encourage you to take a closer look for yourself, and make a conscious decision to embrace or reject TDD.