Developers: This is why you need to ditch your morning checklist
Trashing my checklist sounds like a terrible idea. Are you sure about this?
Yes and no. Let me explain. To be clear, I'm not about to dispute the fact that a checklist is helpful. I'm sure it has saved you before. Monitoring systems is vital; the checklist's purpose is important. It's not the checklist itself that I'm suggesting you kill, but rather the way you're going about it. Because if you're like the rest of us, you've been doing it wrong. And I can assure you that once you adopt a new way of doing it, you'll have successfully eliminated the checklist altogether.
To start, you'll need to ask yourself why you have the morning checklist. The most likely answer here is that systems have a tendency to break, right? We're in IT and it's our job to keep the lights on; to ensure all of the company's IT services are running in tip-top shape. The morning checklist is meant to serve as a reminder for someone to check in on all of these systems to ensure everything is in proper working order. So here's my big question: Why does it take someone to do that? Why can't something fill that role without any of us humans having to intervene?
So, what you're saying is that I should just let my computer handle it?
Yes, that's exactly what I'm saying. Humans need to eat and sleep. We're messy, error-prone beings that get tired and make mistakes and simply can't watch over IT systems all the time. But computers can – and they have for a long time now. By adhering to an old-school method of checking your systems, you're doing yourself (and your IT department) a huge disservice. Think of how much time you spend on your checklist each morning and consider other important tasks you could be handling instead.
And let's not forget the fact that while we may only spend an hour or so each day monitoring these systems, computers can take over the task on a 24/7 schedule. On top of that, once a problem is detected, it can often be remediated automatically before anyone notices. What's more, there are already dozens of free and paid monitoring solutions available that can run circles around you and your morning checklist.
Products like Microsoft's System Center Operations Manager, Spiceworks, WhatsUp Gold and Nagios, or even your own PowerShell scripts, to name just a few options, can easily be set up to monitor and remediate issues in near real time. Now, tell me, why are you still spending an hour each day doing something that a piece of software can already do better than you?
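To make "monitor and remediate" a little more concrete, here's a rough sketch of the pattern most of these tools follow: run a check, try a fix if it fails, and only bother a human if the fix didn't work. It's written in Python purely for illustration; the Check class and run_checks function are names I made up, not part of any of the products above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    """One checklist item, expressed as code (hypothetical structure)."""
    name: str
    is_healthy: Callable[[], bool]                   # returns True when all is well
    remediate: Optional[Callable[[], None]] = None   # optional automatic fix


def run_checks(checks: List[Check]) -> List[str]:
    """Run every check once and return the names that still need a human."""
    still_failing = []
    for check in checks:
        if check.is_healthy():
            continue
        # The check failed: attempt the automatic fix, then test again.
        if check.remediate is not None:
            check.remediate()
            if check.is_healthy():
                continue
        still_failing.append(check.name)
    return still_failing
```

Hand a function like this to a scheduler that runs it every few minutes and you have the bare bones of a checklist that works around the clock. A real monitoring product adds the parts that matter in production, such as alerting, history and dashboards.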
I'm still not convinced. What if my checklist has never failed me?
Consider this: You've got a file server running relatively low on space. During your morning checklist you notice this, but there's plenty of space for now, so you just make a mental note. During the night, a nefarious user decides to back up his entire computer to his home folder, which immediately takes up 100GB of the server's already scarce disk space. A few short hours later, that temporary backup process (the one that needs some “swap space” to do a VSS snapshot) takes up another 300GB and, just like that, the server is brought to its knees and blue screens. All of this happened during the night, before you even got to work. Suddenly, your outdated morning checklist is rendered useless. This is just a simple example, but you get the idea.
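For the disk space scenario above, the automated version of that "mental note" could look something like the sketch below. It uses Python's standard shutil.disk_usage call; the function names and the 20 percent and 10 percent thresholds are my own assumptions, so pick numbers that make sense for your servers.

```python
import shutil


def classify_free_space(free_bytes: int, total_bytes: int,
                        warn_pct: float = 20.0, crit_pct: float = 10.0) -> str:
    """Turn raw byte counts into 'ok', 'warning' or 'critical'.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct < crit_pct:
        return "critical"
    if free_pct < warn_pct:
        return "warning"
    return "ok"


def check_volume(path: str, warn_pct: float = 20.0, crit_pct: float = 10.0) -> str:
    """Check the volume that contains `path` and classify its free space."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return classify_free_space(usage.free, usage.total, warn_pct, crit_pct)


if __name__ == "__main__":
    # Checks the volume holding the current directory; in practice you would
    # point this at each data volume and send the result to your alerting tool.
    print(check_volume("."))
```

Run on a schedule through the night, a check like this would have flagged the server when the first backup landed, hours before the second one finished it off.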
Takeaway
We need to accept that we simply can't keep an eye on all of our servers all the time. Things break out of our control, no matter how dedicated we might be to ensuring they don't. It's time to ditch your morning checklist, implement a monitoring solution and go on about your day knowing your servers are in good hands.