When something unexpected or unplanned occurs that has an adverse effect on the system, I define that event as an incident. Some companies reserve the term incident for large, catastrophic events, but this broader definition increases the learning opportunities for your team whenever an incident occurs. As mentioned previously, at the center of DevOps is the idea of continuous improvement. Incremental change is a win in a DevOps organization, but the fuel that powers continuous improvement is continual learning: learning about new technologies, existing technologies, how teams operate, how teams communicate, and how all these things interrelate to form the human-technical systems that are engineering departments.

One of the best opportunities for learning comes not when things go right, but when they go wrong. When things are working, what you think you know about the system and what’s true in the system aren’t necessarily in conflict. Imagine you have a car with a fifteen-gallon gas tank. For some reason, you think the tank holds thirty gallons, but you have a habit of filling it after you burn through about ten gallons. If you do this religiously, your understanding of the size of the gas tank never comes into conflict with the reality that it holds only fifteen gallons. You might make hundreds of trips in your car without ever learning a thing, but the minute you decide to take that long drive, you run into problems: the tank is empty at fifteen gallons, well short of the thirty you were counting on. Before long you realize the folly of your ways and start taking the appropriate precautions now that you have this newfound information.

