Imagine that you have a good idea and decide to build a digital solution. The service is innovative and you have no competitors. After a few months, your user base is growing exponentially, and every release adds more features.

Success knocks at your door. The numbers are mind-blowing and your customers are satisfied.

Time passes…

The once-dreamy numbers start to drop. Your solution was never planned to support so many users, and with the new features came bugs and technical debt. Your customers are now upset with the recurring crashes and slowdowns at peak hours. Competitors enter the market with simpler but more stable solutions. Your customers slowly start to leave.

Eventually, an outage takes you out of business for an entire day. It is a disaster!

Your dev and ops teams fight and blame each other for the problem. For the operations team, everything is caused by bad, untested code. In turn, the developers blame the operations team for being lazy and too slow to respond to incidents.

