One of the things that struck me most when observing managers at work, and newly instated managers in particular, is how they become more and more out of touch with the realities of work. There is actually a lot of research on this from quite a few different perspectives; safety research, for example, has interesting things to say about “work as imagined” versus “work as done”. This doesn’t happen overnight, of course. It is a slow process, and I found it has a lot to do with the shift from doing and experiencing to planning and monitoring. In many ways, this is a shift from intuition-based thinking to analytic, type II “slow thinking”, which is very different and requires very different ways of working. Unfortunately, most managers don’t get this and continue using their intuition instead of formal models when planning and monitoring, with disastrous results. This isn’t an argument against either intuition or analysis, but you should be aware of which method you are employing and act accordingly. As an example, it is interesting to explore the stark difference between estimates at the individual level, where intuition works very well, and at the team/group level, where our intuition fails.
Although engineers are fairly good at estimating the time of individual tasks, we are particularly bad at estimating projects. This is partly due to erroneous methods used to aggregate time estimates, but there is also ample evidence that although people are able to give pretty good estimates for projects when knowing just a little about them, their estimates get worse as they obtain more knowledge. How is this possible? The problem with knowing too much about the inner workings of something is that it shifts your perception from a blackbox view to a detailed analysis of the thing. This isn’t bad per se, in fact it is sometimes necessary; but detailed analysis requires fairly accurate modeling, and assumptions about which parts and processes of the system under inspection are important and which can be neglected. This requires a much better understanding of the internal structure and behavior of the system than is available when you are just starting to learn about it. In contrast, when using our intuition we mostly ignore the internal workings of the system and infer only from its external behavior, which we are usually fairly experienced with.
A (faulty) model of teamwork
Take a project that is split between 5 engineers. A naive approach would have every engineer estimate the amount of work for their part of the job and feed that into our project estimation model - we are smart enough not to simply sum it, of course. Suppose for a minute that our engineers are calibrated estimators and give pretty good estimates with 90% certainty. However, since all 5 tasks need to be completed for the project to succeed, the combined probability (treating the tasks as independent) is \( 0.9^5 \approx 0.59 \), or about a 59% chance of completing all 5 tasks within the estimates - not very helpful. Of course, Monte Carlo simulators can give us projections for a wider range of certainties, but this requires selecting a distribution for the randomly generated timelines of the simulation, forcing us to make our first uncorroborated assumption about how things work in this system. In addition, these 5 tasks are at least somewhat dependent (or they wouldn’t be part of a single project), which means that some coordination is required. How shall we estimate the coordination cost? Remembering Fred Brooks’ lessons, we know that coordination costs grow as \( O(n^2) \) with the number of people involved, but that still doesn’t tell us how much coordination this task needs. We could ask our engineers to estimate the coordination cost, ending up with five 90% estimates for the coordination work imposed on the group. Each of these numbers needs to be multiplied by 4 (the work inflicted on peers), and again we will feed this to our simulator. But coordination isn’t immediate, as there is no guarantee our peers will be free when we need them; in other words, we are going to suffer queueing effects! If that’s not bad enough, coordination can fail, as anyone who has sat in post mortem meetings can attest. Coordination failures may require a rollback of previously done work, which means negative progress, something managers often do not expect.
A need for agreement might cause the group to deadlock or livelock in endless arguments, as is common in many organizations, and this too should be estimated and added to our simulator. At this point we have about 5 independent dimensions, each individually estimated by 5 engineers - about 25 degrees of freedom; from a purely computational perspective it is obvious that a model like this needs a lot of data to be even fairly accurate.
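To make the arithmetic above concrete, here is a minimal Python sketch. It is not a recommended estimation method: the function names are made up for illustration, the tasks are treated as independent and running in parallel, and the lognormal noise is exactly the kind of uncorroborated distributional assumption the text warns about.

```python
import random


def joint_success_probability(p_each, n_tasks):
    """Chance that n independent tasks ALL finish within their estimates."""
    return p_each ** n_tasks


def coordination_pairs(n_people):
    """Number of distinct pairs that may need to coordinate: n(n-1)/2."""
    return n_people * (n_people - 1) // 2


def simulate_project(estimates, sigma, runs=10_000, seed=0):
    """Monte Carlo sketch. ASSUMPTIONS (not from the text): tasks run in
    parallel, so the project ends with the slowest one, and each actual
    duration is the estimate times lognormal noise with spread `sigma`."""
    rng = random.Random(seed)
    return sorted(
        max(e * rng.lognormvariate(0.0, sigma) for e in estimates)
        for _ in range(runs)
    )


print(round(joint_success_probability(0.9, 5), 2))  # 0.59
print(coordination_pairs(5))                        # 10

durations = simulate_project([10, 10, 10, 10, 10], sigma=0.3)
print(durations[len(durations) // 2])  # median simulated project duration
```

Changing `sigma`, or swapping the lognormal for another distribution, changes the projected timeline. That is the point: the simulator's output is only as good as that choice, and this sketch does not even attempt to model coordination, queueing or rollbacks.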
High dimensionality and the failure of analysis
Every measurement or estimate we make is going to have some error, and these errors compound. In a model with 25 degrees of freedom, or dimensions, the region of possible error grows roughly with the 25th power of the per-dimension error. Intuitively, if we think of these dimensions as spatial dimensions and the measurements as points in space, then the error margins are spheres or clouds around those points. Remember that length rises with the radius, area with the radius squared, and volume with the radius to the power of 3. In math speak you would say an N-dimensional volume rises with the power of N. This means the more dimensions, the faster these “error spheres” grow. Similarly, if you are trying to infer something from data points, the space described by the cloud of data points becomes very large as the number of dimensions grows - data points become very sparse in that space, forcing us to collect many more data points to reach the same level of confidence. This is known as the curse of dimensionality, and it makes the usual analytical methods useless pretty quickly as dimensionality rises. Unfortunately, this is not nearly as well known as should be expected given the magnitude of the problem; engineers, managers and even scientists often fall prey to the curse of dimensionality. A large portion of the training of physicists consists of methods and techniques for reducing dimensionality to cope with those problems, by identifying couplings (dimensions whose behaviors correlate and can therefore be considered degenerate) or by assuming some dimensions can be neglected (as they have little effect). All of those methods require making assumptions about the internals of the system and a deep understanding of it, and these assumptions also make the model local (it applies only where those assumptions hold). Analysis, despite being a powerful tool, is quite limited to an (often overly) simplified version of reality.
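A rough numeric illustration of the scaling argument above, as a sketch only. The 10% per-dimension error and the 10 samples per axis are arbitrary numbers picked for the example, and the function names are hypothetical.

```python
def error_volume_growth(relative_error, dims):
    """Factor by which an N-dimensional region grows when every axis is
    stretched by (1 + relative_error). Volume scales with the Nth power."""
    return (1.0 + relative_error) ** dims


def samples_for_same_density(samples_per_axis, dims):
    """Data points needed to keep the same sampling density along every
    axis as the number of dimensions grows."""
    return samples_per_axis ** dims


for dims in (1, 2, 3, 25):
    growth = error_volume_growth(0.10, dims)
    needed = samples_for_same_density(10, dims)
    print(f"{dims:>2} dims: error volume x{growth:.2f}, samples needed {needed:.0e}")
```

With these made-up numbers, a 10% error per dimension inflates the 25-dimensional error region by a factor of about 10.8, and keeping the density of 10 samples per axis would take 10^25 data points.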
Blackboxes and whiteboxes
It should be clear at this point that the transition from making blackbox predictions based on experience to whitebox analysis of a system is a non-trivial one. It requires a lot of time and effort and you are bound to get it wrong, especially if you are not trained in building formal models. “Slow thinking” is unfortunately very slow. Knowing this, we would be tempted to restrict ourselves to blackbox intuitive predictions, and the human brain is still the most advanced blackbox learning device in existence; our brains are capable of making useful predictions from as few as tens or hundreds of samples, whereas the most advanced ML models require millions of samples. Unfortunately, blackbox predictions have a big downside: they discount rare events (outliers), even more so when dealing with a small number of samples. If you have only seen 500 samples, you have next to no information about the 99.9th percentile. Thus, intuition is particularly bad at assessing the potential of things. We shouldn’t be too surprised; we all know lots of stories of how people have missed the true potential of people, companies and inventions. This is where analysis comes in handy, as it can hint at the potential of things even if it is useless as a tool for immediate predictions. Each method has its own pros and cons - there are no free lunches. Understanding this, we can combine both methods: using analysis to predict where best to apply our intuition, or intuitively guessing an analytic model to start with.
When it comes to estimating project time, this can be very powerful. For example, queueing effects are non-linear and increase very quickly as the utilization of a resource approaches 100%; thus, if you know a system is (over)loaded, there is little point in estimating how long individual tasks will take - instead use your intuition to estimate queueing, as it is likely to dominate. Where coherence is necessary to make progress, estimate the time to reach consensus, as it is likely to dominate. And when a slow feedback loop is in place, you can expect failure demand to eat up significant resources. Even in the case where no behavior can be reasonably expected to dominate, analysis can still reveal where the greatest uncertainty and errors may come from; e.g. few things are as dangerous to a project as a protocol that needs to be agreed upon mid-way.
Most people (and of course, managers) will have a single preferred strategy, either analysis or intuition. They will apply it almost automatically even where it clearly does not apply: analysis needs considerable time and data, and intuition is ill-equipped to deal with outliers. In our case of project time estimation, breaking group work down into individual work is non-trivial; I have yet to encounter a manager who took the time to do it properly. Conversely, intuitive estimates that give you a 90% certainty figure tell you little about what can go wrong, and how wrong things can go, in that risky 10%.
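The 500-sample remark can be checked with a couple of lines of arithmetic. This is a back-of-the-envelope sketch that assumes independent samples; the function names are made up for the example.

```python
def expected_tail_events(n_samples, tail_probability):
    """Average number of samples that land in the tail."""
    return n_samples * tail_probability


def chance_of_seeing_tail(n_samples, tail_probability):
    """Probability that at least one of n independent samples is in the tail."""
    return 1.0 - (1.0 - tail_probability) ** n_samples


# Events beyond the 99.9th percentile have probability 0.001 each.
print(expected_tail_events(500, 0.001))             # 0.5
print(round(chance_of_seeing_tail(500, 0.001), 2))  # 0.39
```

So after 500 observations you would expect half an event from that tail on average, and more often than not you will have seen none at all. That is far too little for intuition to form any sense of how bad (or good) things can get out there.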
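To get a feel for how quickly queueing takes over, here is a sketch using the textbook M/M/1 queue (a single server with random arrivals and random service times). Choosing M/M/1 is my assumption for illustration; real teams are messier, but the shape of the curve is the interesting part.

```python
def mm1_time_in_system(service_time, utilization):
    """Mean time a task spends waiting plus being served in an M/M/1 queue:
    W = service_time / (1 - utilization)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1); at 1.0 the queue never drains")
    return service_time / (1.0 - utilization)


# A task that takes 1 day of actual work, at increasing load on the resource.
for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
    days = mm1_time_in_system(1.0, utilization)
    print(f"{utilization:.0%} utilized: about {days:.0f} days from request to done")
```

Under this model a one-day task takes about 2 days at 50% utilization, 10 days at 90% and 100 days at 99%. Once a resource is that loaded, the accuracy of the one-day estimate hardly matters.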
A two-headed system for improving estimates
A systemic way of combining the two methods is based on recognizing the limits of each and iterating on them in different time frames. The idea is to use intuitive estimates tactically, based on a model that is evolved strategically over a long time. Joel Spolsky’s Evidence-Based Scheduling is one such method. You can build any kind of model, perhaps based on the Universal Scalability Law or Queueing Networks, and feed your estimates into it - iterating on the model until it provides meaningful results. You could even program this model if you wanted and run a game of SimCompany. This approach presupposes that the estimates are a priori correct, but that we need to discover what it is they describe. This is a radical shift from the conventional managerial approach, which assumes the model is correct and the estimates are wrong and need to be fixed - which quickly leads to these metrics being gamed and rendered meaningless. Most companies I’ve worked with consistently gamed the estimates and metrics to agree with their (implicit) model of work. They aren’t aware of it, but it’s fairly obvious they do it. To clarify, I’m not referring to good old fiddlin’-with-the-numbers-till-they-look-good, expressionist graphs or “torturing the data until they confess”. I’m talking about the subtle gaming of how and what you measure or estimate to make honest measurements fit the model, lest it be proven wrong. You know, things like changing the definition of “done”, or “healthy system”, or “reasonable service”. At the end of the day, any measurement is based on a subjective decision, and the value of any metric depends on how well it drives a model. Linux load average is a great example: it measures something, but it’s not obvious at all how this thing relates to “load” or what “load” actually is - leading to generations of sysadmins on wild goose chases.
The bottom line is that in order to strategically evolve your model of work, you must first allocate ample time for it, and second accept a priori that all data from the field is both correct and partial. Your model needs to be fixed to interpret the data in a better way and to suggest new measurements that need to be taken - but data rejection is not allowed, or at the very least needs to be thoroughly justified. Saying things like “engineers suck at time estimates” is exactly the opposite of that - the estimates are data which you are rejecting instead of accepting that your interpretation is wrong. The prevalence of this belief is a testament to how widespread data rejection and metric gaming are in the industry, so entrenched I dare say they are cultural. The key here is awareness - without being aware that you have a model, and that the model is inherently incorrect and incomplete and needs constant evolution, rejection of data is likely to be the default.
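As a taste of what “feeding estimates into a model” can look like, here is a rough sketch in the spirit of evidence-based scheduling. It is my own simplification, not Spolsky’s implementation: the function names and the sample history are invented, and it assumes one person working through the tasks one after another. The estimates are never “corrected”; they are treated as data and interpreted through past velocities (estimate divided by actual time).

```python
import random


def simulate_completion(estimates, past_velocities, runs=10_000, seed=0):
    """For each run, divide every estimate by a velocity drawn at random
    from history and sum the results (ASSUMPTION: tasks are sequential)."""
    rng = random.Random(seed)
    return sorted(
        sum(e / rng.choice(past_velocities) for e in estimates)
        for _ in range(runs)
    )


def percentile(sorted_values, q):
    """Value below which roughly a fraction q of the simulated runs fall."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


# Hypothetical history: velocity 1.0 means spot on, 0.5 means it took twice as long.
history = [1.0, 0.9, 0.8, 1.1, 0.5, 0.7, 1.0, 0.6]
totals = simulate_completion([3, 5, 2, 8], history)

print("50% confidence:", round(percentile(totals, 0.5), 1), "days")
print("90% confidence:", round(percentile(totals, 0.9), 1), "days")
```

If the simulated dates keep missing reality, the thing to change is the model (add queueing, coordination, rework), not the estimates that went into it.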