Presentations

Monitoring Done Wrong

Published July 25, 2019 (Last modified on April 25, 2023)

We've all heard numerous "awesome monitoring @ X" talks. Boring! Join me in exploring monitoring design principles through various fails, because we can learn sooo much more by analyzing cases where monitoring was done wrong :-)

The Pets the Cattle and the Germs

Published July 25, 2019 (Last modified on April 25, 2023)

Ten years ago, we promoted the move from pet systems to faceless hordes of electronic cattle grazing on commodity infrastructure. But as the cloud evolves, we find that the cattle methodology is no longer sufficient and that cloud native systems resemble some other biological entity…

You don't have a Production Environment

Published July 25, 2019 (Last modified on April 25, 2023)

You don't have a production environment, at least not in the way we're used to thinking about it.

Can I tell you a secret? I see dead systems

Published September 1, 2018 (Last modified on April 25, 2023)

We live in a world of shiny new tech introduced all the time. Heck, we even made cars that drive themselves. Yet all around us, unseen and hidden, lurk ancient, forgotten systems. They're in our kernels, our terminals and our CPUs... They are everywhere.

software-engineering

Total Cloud Immersion

Published June 6, 2018 (Last modified on April 25, 2023)

Serverless: Immerse yourself in the Cloud

software-engineering serverless cloud

The Missing User Stories

Published June 6, 2018 (Last modified on April 25, 2023)

software-engineering ux product-management

To err is human: Introduction to modern safety thinking

Published December 19, 2017 (Last modified on April 25, 2023)

In the last 40 years, the philosophy of safety and reliability has changed dramatically in the world of high-risk industries. This has prompted many organizations in various risk-prone fields to adopt new methods and processes, and sometimes even to undergo radical cultural and managerial change. However, the software industry has remained largely oblivious to these advancements despite the similarities in failures and systems. After all, most systems today are software-managed, whether they run a nuclear reactor or a website builder. This talk introduces the major concepts of new-era safety thinking, such as Safety II, work as done vs. work as imagined, and normal accidents theory.

reliability

Data: You keep using that word...

Published November 29, 2017 (Last modified on April 25, 2023)

Structured data, dynamic data, big data, data driven... we hear about data all the time. But what is "data" exactly? The term is used frequently, yet it is rarely defined or even thought about, and it turns out the answer to "what is data" is not simple at all.

software-engineering data

Linux System Metrics

Published November 1, 2017 (Last modified on April 25, 2023)

While you can learn a lot by emitting metrics from your application, some insights can only be gained by looking at OS metrics. This hands-on workshop covers the basics of Linux metric collection for monitoring, performance tuning and capacity planning. (Co-Author: @nocoot)

system-engineering linux

Actionable Exceptions

Published November 1, 2017 (Last modified on April 25, 2023)

So that exception I see in the logs 3000 times is a "normal exception"? Sounds legit. Repeat after me: A Normal Exception is Not. Exception raising and handling is a popular and ingrained mechanism for dealing with faults. Unfortunately, it's also one of the most abused...

software-engineering java