Today's summary is about a paper written by four Google employees in 2016.
In the long run, customers value a reliable, predictable interface offered by a healthy team more than a request queue that processes any and every request, whether standard or an unconventional one-off, in an indeterminate amount of time.
Using the “pets vs. cattle” analogy discussed in a [2013 UK Register article](http://www.theregister.co.uk/2013/03/18/servers_pets_or_cattle_cern/), your systems should be automated, easily interchangeable, replaceable, and low maintenance (cattle); they should not have unique requirements for human care and attention (pets).
Any team tasked with operational work will necessarily be burdened with some degree of toil.
While toil can never be completely eliminated, it can and should be thoughtfully mitigated to ensure the long-term health of the team responsible for this work.
When operational work is left unchecked, it naturally grows over time to consume 100% of a team’s resources.
Engineers and teams performing an SRE or DevOps role owe it to themselves to focus relentlessly on reducing toil, not as a luxury but as a necessity for survival.