3:17 AM: an SMS just woke you up. Half asleep, you stare at Nagios and panic: production is down!!! You've got 20 minutes to do… something.
While we often discuss resilient architectures, high availability and backups, we do not give proper attention to the human elements of handling disaster. Things like incident management procedures, on-call rotations and even the mindset you should have when handling a disaster are left obscure. This talk aims to rectify this situation by explaining how to prepare for disaster, how to handle it, and how to extract maximum value (yes, extract value) from calamity.