The first rule of code reuse is: don’t.
OK, that’s (mostly) a joke.
But let’s explore some of the tradeoffs you make when deciding how and when to reuse code.
Along the way, we’ll see some interesting parallels to branches in version control.
Consider the following project structure:
Two applications, including PS.Listener, are deployed from the same codebase.
PS.DataAccess represents code that is used by both applications for accessing the database.
In our zeal for maximum modularity, we might be tempted to package up
PS.DataAccess and reference it through our package manager (such as npm).
Doing so introduces a layer of indirection that we should be aware of.
Before, when we had a direct code dependency in the same repository, any time we made an update to
PS.DataAccess, both of our applications would immediately get the new update (pending a deploy, of course).
If we’re referencing a package in our package manager, on the other hand, we now have the option to delay upgrading.
The longer we wait, the more changes pile up between our version and the latest, and the more painful the upgrade will be.
Further, it’s also possible for each of our applications to reference different versions of the package.
That might seem like nice flexibility that lets us do things a little more piecemeal, but left unchecked, it can leave us with divergent versions (and behavior!) in production.
That might be fine for generic utility libraries, but it’s pretty risky for something as domain-specific and critical as our data access code.
(If we’re not comfortable with this level of coupling — or is it cohesion? — between our applications, then perhaps they shouldn’t be sharing a database!)
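One way to keep that drift in check is to compare the versions each application pins. Here is a minimal sketch, assuming each application's dependencies are available as a simple mapping. The package name `ps-data-access`, the version numbers, and the second application name `PS.OtherApp` are placeholders made up for illustration.

```python
from collections import defaultdict


def group_apps_by_version(manifests, package):
    """Map each pinned version of `package` to the apps that pin it."""
    apps_by_version = defaultdict(list)
    for app, dependencies in manifests.items():
        if package in dependencies:
            apps_by_version[dependencies[package]].append(app)
    return dict(apps_by_version)


# Hypothetical manifests: the package name, version numbers, and
# "PS.OtherApp" are placeholders, not taken from the project above.
manifests = {
    "PS.Listener": {"ps-data-access": "1.4.0"},
    "PS.OtherApp": {"ps-data-access": "1.2.0"},
}

pinned = group_apps_by_version(manifests, "ps-data-access")
has_drift = len(pinned) > 1  # more than one version would reach production
```

A check like this could run in CI and fail the build when `has_drift` is true. With a direct code reference in the same repository, there is nothing to check, because both applications always build against the same code.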
Responding to Change
Let’s consider a concrete change that we might make in our data access code.
Say we want to remove an old column from our database.
To do so safely and without downtime, first we must stop reading from the column and deploy both applications.
Then, we can stop writing to the column and deploy both applications.
Finally, we can drop the column from the database.
If we’re referencing the data access code directly, we’ll be encouraged to do the right thing — both from a code cleanliness and safety standpoint — and deploy all the affected applications at each step of the way.
If we’re referencing a package, we might make a mistake, or simply leave that old column around longer than we’d like because it was the easy thing to do.
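The ordering of those steps can be sketched as a small check over what is currently deployed. This is only an illustration: the `DeployedApp` type, its fields, and the `PS.OtherApp` name are assumptions made up for the example, not part of the project above.

```python
from dataclasses import dataclass

STOP_READING = "stop reading from the column and deploy every application"
STOP_WRITING = "stop writing to the column and deploy every application"
DROP_COLUMN = "drop the column from the database"


@dataclass(frozen=True)
class DeployedApp:
    """What one deployed application currently does with the old column."""
    name: str
    reads_column: bool
    writes_column: bool


def next_safe_step(apps):
    """Return the next step that is safe given what is deployed right now."""
    # Readers must go first: if writes stopped while something still read
    # the column, that reader would see stale or missing values.
    if any(app.reads_column for app in apps):
        return STOP_READING
    if any(app.writes_column for app in apps):
        return STOP_WRITING
    return DROP_COLUMN


# Hypothetical state: one application was upgraded, the other is lagging
# behind on an older version of the data access package.
apps = [
    DeployedApp("PS.Listener", reads_column=False, writes_column=False),
    DeployedApp("PS.OtherApp", reads_column=True, writes_column=True),
]
step = next_safe_step(apps)
```

Here `step` is still `STOP_READING`, even though one application is fully migrated. A single lagging application holds back the whole change, which is exactly the situation a delayed package upgrade makes easy to fall into.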
This Sounds Familiar
Do you see the similarity between this discussion and branches in version control? Think about it for a minute. I’ll wait.
A common phrase you’ll hear in the Continuous Integration world is, “Integrate early and often.”
You may have also heard about trunk-based development.
The basic idea is that unmerged code is a liability, and any branches that you create from the mainline (“trunk”) should be as short-lived as possible. This helps ensure there are minimal conflicts and that things get tested together early.
Using packages for internal code reuse discourages this early integration of code, just like branches left unmerged in version control.
Just like with branching, this doesn’t mean you should avoid all package dependencies entirely.
You may have a valid reason to move the code to another repository, such as when it’s shared between multiple teams. Just be aware of the extra indirection and potential maintenance cost.
As with most things in engineering, it’s a tradeoff.
There’s a spectrum that goes something like:
- Just copy the code
- Extract to a shared module in the same codebase; reference the code directly
- Share code through a package manager
- Extract a new application, in its own codebase, for the common functionality; interact through network calls or asynchronous communication
I like to start at the second option as a good default and move up or down as I feel it’s warranted.
Dependencies can be a liability.
You likely have enough third-party packages to deal with.
Think twice about making your own internal packages when a direct code reference (or copy!) will do.