Current state: Engineering organizations often use the single word “productivity” when they are actually describing three different concepts: Production, Productivity, and Performance.
A better way: Asking ourselves which of these three concepts we are learning about better reflects the different types of information we have about engineering work. Disaggregating these important, interrelated, but distinct concepts can help us understand why engineering teams struggle to define, and often strongly disagree about, what a signal of productivity is. Understanding productivity as an ecosystem can help engineering leaders understand what to measure, depending on which element they decide to change. And importantly, thinking through these three concepts can help engineering leaders understand what information they really have about engineering work.
Despite a longstanding emphasis on developer productivity, engineering teams continue to struggle to define and measure it. This is in large part due to two things:
- We tend to conflate related constructs like production and performance, which often leads to drawing conclusions from the wrong metrics.
- It is difficult for us to maintain awareness that developers’ high performance is shaped by the constraints of their environment and by whether they work in sustainable cycles of problem-solving. As a result, we often fail to see how definitions of success might change depending on the constraints at hand, or when we take a longer perspective over time.
Compounding the inherent difficulty of thoughtfully defining productivity, software organizations have largely failed to come to a standardized understanding of productivity. Inconsistent practices also mean that many software teams essentially reinvent their own definitions of productivity, much of which remains unspoken and unshared between team members, and between individual contributors and managers (Storey, Houck, & Zimmermann, 2022). It is no surprise, then, that despite the importance of developer productivity, engineering leaders struggle to define, measure, and set “productivity” targets within their organizations.
But there is a growing body of knowledge that engineering leaders can draw on to guide their decision-making. Research across the social and clinical sciences and software engineering has produced better definitions of productivity by breaking down “what we mean” when we use the term productivity.
Understanding and contextualizing these different definitions provides a guiding pathway for engineering managers and leaders who may wish to introduce new metrics, adapt existing measures, or improve the shared understanding of organizational goals within and between software teams. In this report from the Developer Success Lab, we provide a conceptual framework for understanding three different “layers” we might be talking about when we talk about productivity. Drawing from robust research, we unpack the different concepts that underlie common software metrics.
Productivity may be difficult to define, but it is not impossible to use evidence to help our teams understand whether we are making progress and working on the right things. By bringing this “three layer” lens to thinking about developers’ productivity and how it is being measured and compared within their organizations, engineering managers and leaders can take advantage of what we already know about defining high performance.
Breaking down the layers underneath “Productivity”
Across social, behavioral, and management science models of productivity, productivity refers to the quantity of output given the resources provided (see Tangen, 2005 for a review). This is in stark contrast to production, which refers to the amount of output regardless of the resources provided and used (Stainer, 1997; Bernolak, 1997). Because productivity depends entirely on the resources used and provided, production could be identical across two teams while productivity vastly differs.
Finally, high performance refers to work that is not only productive in its context, but also achieves high goals of quality, and moves beyond individual project success to create sustainable, resilient patterns of innovation (Spreitzer, Porath, & Gibson, 2012). In a true high performance cycle, iteration and virtuous cycles of learning propel quality forward, helping teams achieve consistent “productivity” with less effort.
These different definitions can be useful for different goals, and different types of engineering metrics can be better understood if we think of them as signals for production, productivity, and performance.
In the following sections, we step through an example with two hypothetical engineering teams. The measures in these examples are not intended to be an ideal way to measure engineering work–rather, these are simplified examples to help us think about the three levels. In fact, no single software metric will ever provide a complete picture of engineering work! The power of layering in many types of observations becomes clear as soon as we start breaking down “productivity.”
Level 1. Production: output without context
Let’s say we are looking at two engineering teams: Team Unicorn and Team Dodo Bird. This week, Team Unicorn closed 100 pull requests, while Team Dodo Bird closed 25. Simply counting the total number of closed pull requests is a production metric: it assesses an output without any context. By this measure, Team Unicorn had higher production than Team Dodo Bird.
If an engineering leader had only this information about Teams Unicorn and Dodo Bird, they would know something about the work that is getting done, but they would not truly have insight into productivity. Relying solely on production measures is dangerous, but production measures can still tell us something, particularly if we use them to look for large trends over time within the same team–if Team Dodo Bird had merged 0 pull requests in the past six weeks, this could be a red flag about a serious issue. However, our engineering leader should be far less certain about using this number to compare between teams. For that, we need more information.
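To make the production layer concrete, here is a minimal sketch, using hypothetical weekly counts in the spirit of the example above. Counting closed pull requests says nothing about context, but a within-team trend check (such as flagging a long run of zero merged PRs) can still surface red flags:

```python
# Illustrative sketch only: "closed PRs" is a stand-in production metric,
# and the weekly counts below are hypothetical.
weekly_closed_prs = {
    "Team Unicorn": [90, 110, 95, 100, 105, 100],
    "Team Dodo Bird": [20, 30, 25, 25, 20, 25],
}

for team, counts in weekly_closed_prs.items():
    total = sum(counts)
    print(f"{team}: {total} PRs closed over {len(counts)} weeks")
    # A production metric can still surface red flags within a team:
    # six straight weeks with zero merged PRs warrants a closer look.
    if all(count == 0 for count in counts[-6:]):
        print(f"  Red flag: {team} merged no PRs in the past six weeks")
```

Note that nothing in this sketch lets us fairly compare the two teams; for that, we need the next layer.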
Level 2. Productivity: output given context
Now let’s say we’ve learned that Team Unicorn is a large, well-funded team of 8 engineers with access to many people, tools, services, and resources. In contrast, Team Dodo Bird is a small, scrappy team of 2 engineers, doing their best with the limited resources they have.
Since productivity examines production in the context of resource use and availability, we can say that, despite having different levels of production, both teams are high on productivity.
As shown by this example, because productivity measures inherently depend on the resources available and used by engineering teams, measures of production (e.g. raw number of pull requests) fail to capture the nuances of productivity.
If an engineering leader now takes in this information about Teams Unicorn and Dodo Bird, they might think about something like the rate of pull requests contextualized across the number of engineers. They might conclude that the two teams are equally productive. They might even think it is remarkable that Team Dodo Bird achieves this much production! Already, we can see that layering in more information, especially context relevant to the effort, has helped this leader form a more accurate picture of the two teams. Even though counting “merged pull requests” is still an imperfect and incomplete measure of work, we already have more confidence in the conclusions that this leader is able to reach.
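The leader’s reasoning above amounts to a simple calculation. A sketch, using the numbers from the running example (“closed PRs per engineer” is just one simplified way to add context, not a recommended metric):

```python
# Minimal sketch: productivity as production contextualized by resources.
# Numbers come from the running example; "closed PRs per engineer" is just
# one simplified way to add context, not a recommended metric.
teams = {
    "Team Unicorn": {"closed_prs": 100, "engineers": 8},
    "Team Dodo Bird": {"closed_prs": 25, "engineers": 2},
}

for name, team in teams.items():
    rate = team["closed_prs"] / team["engineers"]
    print(f"{name}: {rate:.1f} closed PRs per engineer this week")

# Both teams land at 12.5 closed PRs per engineer: very different
# production, identical productivity by this simplified measure.
```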
Level 3. Performance: add in quality & higher order dependencies
However, still missing from the picture is the teams’ performance. Performance, which is often conflated with “productivity,” refers to factors such as the flexibility, adaptability, dependability, sustainability, and quality of what is produced over time, all of which are influenced by resources, working conditions, developer experience, and the sociocognitive factors that drive good problem-solving, such as those in the Developer Thriving framework (Al-Darrab, 2000; Hicks, Lee, & Ramsey, 2023; Slack, Chambers, & Johnston, 2001; Stainer, 1997; van den Heuvel et al., 2010). Quality is hard to define, probably because it’s so important. But from robust research in human achievement, we do know that there are important signals for the high performance cycles that reliably result in quality work.
In the context of engineering teams, we might calculate DORA metrics (Peters et al., 2022) and try to find signals for the factors outlined in the Developer Thriving Framework, such as teams that prioritize learning and correcting maladaptive patterns over non-stop production, or teams that collaboratively define and measure success (Hicks, Lee, & Ramsey, 2023).
We might also consider risk signals: for example, high code churn, long lead times, and low longevity of code written can all reduce performance over time. Risk signals can be particularly important to pay attention to, because we know that focusing on unsustainable production can look highly “productive” for a short time, but will ultimately lead to burnout, brittle or fragile systems, and work that falls apart under stress or duress (Trinkenreich et al., 2023). Spaghetti code, friction that slows down or blocks code review, and accumulating tech debt can all be signs of an unsustainable production culture.
Going back to our example, let’s say that Team Unicorn is experiencing high code churn and long lead times, and has low-longevity code. That is, their code quality is low. In addition, Team Unicorn has low levels of the important sociocognitive factors in Developer Thriving, such as a poor learning culture on their team. In contrast, Team Dodo Bird has low code churn, short lead times, and high-longevity code. That is, their code quality is high. Team Dodo Bird is also cultivating a positive learning culture, and its developers are frequently helping each other upskill and learn. Based on our definition of performance, we could say that Team Dodo Bird is higher performing than Team Unicorn.
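One way to picture the performance layer is to keep the productivity figure and layer risk signals on top of it. In this sketch, the signal values and thresholds are invented for illustration; real teams would choose signals and cutoffs appropriate to their own context, not a hard-coded rule like this:

```python
# Hedged sketch: layering quality/risk signals on top of productivity.
# All signal values and thresholds below are hypothetical illustrations.
teams = {
    "Team Unicorn": {"prs_per_engineer": 12.5, "code_churn": 0.40,
                     "lead_time_days": 14, "learning_culture": "low"},
    "Team Dodo Bird": {"prs_per_engineer": 12.5, "code_churn": 0.08,
                       "lead_time_days": 3, "learning_culture": "high"},
}

def risk_signals(team):
    """Collect simple illustrative risk flags; not a validated model."""
    flags = []
    if team["code_churn"] > 0.25:        # hypothetical churn threshold
        flags.append("high code churn")
    if team["lead_time_days"] > 7:       # hypothetical lead-time threshold
        flags.append("long lead times")
    if team["learning_culture"] == "low":
        flags.append("weak learning culture")
    return flags

for name, team in teams.items():
    flags = risk_signals(team)
    verdict = "sustainable performance at risk" if flags else "healthy signals"
    print(f"{name}: {verdict} {flags}")
```

Even though both teams show identical productivity by the simplified per-engineer measure, only Team Unicorn accumulates risk flags, which is exactly the distinction the performance layer is meant to surface.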
Our engineering leader is now taking a different stance with this information: they’re looking at the activity of these two teams, they’re wrapping in appropriate and helpful context, and they’re also thinking about long-term impacts, and cross-team impacts. This is an example of a more mature consideration of evidence, and a deeper approach to “productivity.” A focus on sustainable performance might lead our engineering leader to ask how Team Dodo Bird has systems in place to create an environment conducive to high quality code.
For example, rather than creating a culture of anxious production by accumulating tech debt and focusing on rapidly producing code, Team Dodo Bird prioritizes a culture of support and learning that leads to more confident problem solving and effective collaborative decision making on how they work toward and measure success. Our engineering leader may now decide that Team Dodo Bird should share some best practices with Team Unicorn, or wonder just how much Team Dodo Bird would be getting done if they had 5 engineers instead of 2!
Given the distinctly different definitions of production, productivity, and performance, it is essential for managers and leaders to think about what they are aiming to improve. When we begin to measure things, the easiest thing to access and measure is usually activity. Activity is a useful building block, but it’s not the only type of information we might need.
Think about the ways that your own organization measures engineering work. It can be a useful and important exercise to ask what information leaders are using to make critical decisions about developers and their teams.
- If you are measuring higher outputs, without information about or accounting for quality and the resources provided, you are likely focusing on production.
- If you are measuring production given changes in resources available, you are likely focusing on productivity.
- If you are measuring the lasting quality of your product & codebase, you are likely focusing on performance.
Moving from “productivity” to “sustainable performance” is not a small step for any leader or engineering organization. For example, there is no single measure of “quality” that will work for all types of software work and contexts. However, one vital reason that engineering leaders should shift their understanding of productivity to include performance is that sustainable performance can unlock a virtuous cycle for engineering teams, one that helps to mitigate the impact of productivity downturns. Teams that are high on sustainable performance, not just high production, are more likely to maintain success in the long term despite variations in the individual productivity of their engineers.
- Imagine Team Unicorn and Team Dodo Bird both hit unexpected external friction that gets in the way of developers’ output. For example, because of an unexpected dependency, developers experience new friction in their internal tooling, making the code production in the entire engineering organization slower across the board.
- In the face of this friction, Team Unicorn responds to the pressure by having its developers maximize code production. But because Team Unicorn failed to invest in understanding sustainable performance, it experiences further degradation of code quality, which ultimately creates developer burnout and more software failures.
- In contrast, because Team Dodo Bird has invested in quality processes such as code reviews, despite slower code production, Team Dodo Bird compensates for production slowness with the greater efficiency created by stable code. Developers’ effort is protected by Team Dodo Bird’s understanding of sustainable performance, and ultimately Team Dodo Bird experiences low burnout and sees no long-term impact on their velocity.
Clinical Research Scientist
Carol Lee is a senior research scientist in the Developer Success Lab, where she leverages her expertise in mental health and thoughtful measurement to study how developers cope and thrive through stressful circumstances. Carol has over a decade of experience leading academic and industry research in clinical health, measurement, and human behavior. Carol serves as a research fellow at the Integrated Behavioral Health Research Institute and as a clinical science advisor for Bravely Mental Health. She holds a Ph.D. in clinical psychology from UMass Boston.
Social & Data Scientist
Cat Hicks is the Director of the Developer Success Lab and a social science leader in tech with expertise leading applied research teams to explore complex areas of human behavior, empirical intervention, and evidence science. Cat is a research affiliate in STEM education research at UC San Diego and an advocate for increasing education access. She holds a Ph.D. in Quantitative Experimental Psychology from UC San Diego, was an inaugural Fellow in the UC San Diego Design Lab, and has led research at organizations such as Google and Khan Academy.
- Al-Darrab, I. (2000). Relationships between productivity, efficiency, utilisation, and quality. Work Study, 49(3), 97-103. https://doi.org/10.1108/00438020010318073
- Bernolak, I. (1997). Effective measurement and successful elements of company productivity: the basis of competitiveness and world prosperity. International Journal of Production Economics, 52, 203-213. https://doi.org/10.1016/S0925-5273(97)00026-1
- Hicks, C., Lee, C. S., & Ramsey, M. (2023). Developer Thriving: The four factors that drive software productivity across industries [research report]. The Developer Success Lab at Flow. https://www.pluralsight.com/developer-success-lab
- Peters, C., Farley, D., Villalba, D., Stanke, D., DeBellis, D., Maxwell, E., Meyers, J. S., Xu, K., Harvey, N., & Kulesza, T. (2022). Accelerate: State of DevOps 2022. https://cloud.google.com/devops/state-of-devops/
- Slack, N., Chambers, S., & Johnston, R. (2001). Operations Management, 3rd ed. Pearson Education Limited, Harlow.
- Spreitzer, G., Porath, C. L., & Gibson, C. B. (2012). Toward human sustainability: How to enable more thriving at work. Organizational Dynamics, 41(2), 155-162.
- Stainer, A. (1997). Capital input and total productivity management. Management Decision, 35(3), 224-232. https://doi.org/10.1108/00251749710169431
- Storey, M. A., Houck, B., & Zimmermann, T. (2022). How developers and managers define and trade productivity for quality. CHASE ’22: Proceedings of the 15th International Conference on Cooperative and Human Aspects of Software Engineering, 26-35. https://doi.org/10.1145/3528579.3529177
- Tangen, S. (2005). Demystifying productivity and performance. International Journal of Productivity and Performance Management, 54(1), 34-46. https://doi.org/10.1108/17410400510571437
- Trinkenreich, B., Stol, K. J., Steinmacher, I., Gerosa, M., Sarma, A., Lara, M., ... & Bishop, K. (2023). A Model for Understanding and Reducing Developer Burnout. arXiv preprint arXiv:2301.09103.
- van den Heuvel, S. G., Geuskens, G. A., Hooftman, W. E., Koppes, L. L., & van den Bossche, S. N. (2010). Productivity loss at work; health-related and work-related factors. Journal of Occupational Rehabilitation, 20(3), 331-339. https://doi.org/10.1007/s10926-009-9219-7