The Math of Reliability


We often hear talk of “scale” and “reliability”, mostly based on personal experience and lessons learned. What can mathematics tell us about reliability and scale? Can math help us scale our systems and companies? It turns out that failure models, probability, statistics and other fields can sharpen our analysis and provide insights.

Mathematics forces us to construct and analyse our models rigorously, often exposing subtle issues and misconceptions. Moreover, it lets us expand our understanding and explore the consequences of scale and stress without actually building a system.

This talk presents simple failure models, explains the math behind common practices, highlights common misconceptions, and works through mathematical examples of why systems behave differently at scale and how approaches that work well in small systems can fail badly in large ones.
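As a small illustration of the kind of model meant here (a sketch under stated assumptions, not necessarily one of the talk's own examples): suppose each of n components fails independently with the same probability p. The chance that at least one fails is 1 - (1 - p)^n, which grows quickly with n. The function name `p_any_failure` and the value p = 0.001 are hypothetical, chosen only for illustration.

```python
def p_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n components fails.

    Assumes (for illustration only) that failures are independent
    and that every component fails with the same probability p.
    """
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.001  # hypothetical per-component failure probability
    for n in (1, 10, 100, 1000, 10000):
        print(f"n={n:>5}  P(at least one failure) = {p_any_failure(p, n):.4f}")
```

A failure rate that is negligible for a single component becomes close to a certainty once thousands of components are involved, which is one way that small-system intuition can mislead at scale.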