The SRE resource library

SRE represents a mindset, engineering practices, and a job function. Here you will find articles, videos, and guides to help you implement SRE principles and run reliable production systems.

Explore All Resources

Machine learning

Machine Learning in Production

Start your journey by exploring

Machine Learning in Production

Machine Learning Inference

Continue your journey by reading

Efficient Machine Learning Inference

Machine Learning at Scale

Extend your journey by watching

Machine Learning at Scale

Service level objectives

Implementing SLOs

Begin by reading

Implementing SLOs

Alerting on SLOs

Dig deeper by exploring

Alerting on SLOs

measures service reliability

Build your skills with

Art of SLOs

Systems engineering

Non-Abstract Large System Design

Learn the basics by reading

Introducing Non-Abstract Large System Design

Distributed ImageServer

Develop fundamentals by exploring

SRE Classroom: Distributed ImageServer

SRE best practices

Build advanced skills with this video workshop

How to Design a Distributed System
