Site Reliability Engineering seeks to balance the risk of unavailability with the goals of rapid innovation and efficient service operations, so that users’ overall happiness—with features, service, and performance—is optimized.
SREs engineer services, instead of binaries. This is a shift in perspective that exploits unusual skills and creativity. SREs are specialists in making changes safely.
John T. Reese
Here’s what you do when someone breaks something or finds something very difficult to debug: You say thank you. Thank you for finding this edge case. Thank you for highlighting this overcomplicated part of our system. Thank you for pointing out this gap in our docs. And then you go make it so nobody can break it the same way again.
Fundamentally, it's what happens when you ask a software engineer to design an operations function.