The most important principle of managing a software organization

During my years of consulting, I’ve run into many managers (in enterprises and startup companies alike) who just didn’t get this whole “technical debt” thing. You’ve run into those managers too, I’m sure - the kind of manager who issues pressing deadlines, defers bug reports to leftover time (which never comes), relies on vMotion or a fancy SAN for high availability, refuses upgrades because of “risks” and urges you to “stop wasting your time listening to tech talks and go write features”. I’ve even heard managers tell engineers “don’t waste your time learning this framework, just write your system with it” (!!!)

I’ve often wondered - why is it so hard for them to get something that is so painfully obvious to the majority of software engineers?

It seems that many people who haven’t endured the horror of modern coding have a mental picture of programming that is reminiscent of hardware engineering. In this model, specifications are well known, but more importantly, the properties of the construction material, the environment and the workload are well known and fairly static. This doesn’t mean hardware engineering is easy; in fact, it’s quite complex. But we have long operating experience with silicon-based transistors and we understand how they work pretty well; most defects are caught by QA and QC, and degradation in quality is slow and predictable. Once the hardware has been manufactured, engineers are free to move on to the next product.

This is not the case with software engineering. First, we have Agile, which tells us specifications can (and do) change during development. The workload and environment also change frequently, sometimes quite violently. But the biggest problem is that we do not understand the dynamics of the system we are building on top of. This statement may seem preposterous, but many software engineers agree with it. We are building our systems on top of numerous layers of other software - all of them complicated, adaptive and flexible. The ground is shifting beneath our feet in ways we do not understand. And if that’s not bad enough, we continuously adapt and evolve our own code in response to the environment, the workload and our new understanding of the world. Our laws of physics are in constant flux.

Software development quickly transitions from developing a system on a solid foundation to researching how an existing system works and exploiting its laws of behavior to build more machinery on top of it. In other words, it becomes scientific research into the current system, followed by the engineering of new constructs based on that science.

It often surprises me to see R&D departments investing heavily in Development and almost nothing in Research. Without Research, you can never build a good system because you do not understand the properties of your building blocks and environment - and this research can never stop.

There are several consequences to this. The first is that we need to invest time in researching the system. The second is that we need to build visible systems: metrics and monitoring are a core feature because they allow us to research our systems (building visible systems merits a blog post of its own). The third consequence is that we should try to make our systems understandable: we should strive to build software that can fit in our head - although we know this will eventually fail.

Even a simple and understandable system becomes strange and obscure once it becomes a sub-system of a larger system. It interacts with other sub-systems in surprising and unexpected ways, and the number of possible states rises sharply. As the system matures, its behavior becomes more stable and better known, and we can choose to forget about its internal structure, treat it as a black box and build new systems on top of it. In much the same way that a chemist does not need to master Quantum Physics in order to be productive, we can use the known laws of the system without understanding it completely - but this doesn’t mean that we won’t need the Quantum Physicist around, because the Law of Leaky Abstractions is lurking in the shadows…