Site Reliability Engineering (SRE): The Big Picture

Site Reliability Engineering (SRE) is how Google runs production systems, promoting high availability with high velocity and removing operational toil. It achieves the same goals as DevOps without the culture shift.
Course info
Rating
(270)
Level
Beginner
Updated
Mar 5, 2020
Duration
1h 41m
Description

Site Reliability Engineering (SRE) is a set of principles and practices that supports software delivery - keeping production systems stable and still delivering new features at speed. In this course, Site Reliability Engineering (SRE): The Big Picture, you'll get a thorough overview of how SRE works and why it's a good choice for many organisations. First, you'll learn the differences between SRE, DevOps, and traditional operations. Next, you'll discover how engineering practices help to reduce toil and provide more time to focus on high value tasks. Finally, you'll learn how SRE approaches monitoring and alerting, and about the SRE approach to managing incidents. When you're finished with this course, you'll be able to evaluate SRE and see if it's a good fit for your organisation.

About the author

Elton is an independent consultant specializing in systems integration with the Microsoft stack. He is a Microsoft MVP, blogger, and practicing Technical Architect.

More courses by Elton Stoneman
Section Introduction Transcripts

Course Overview
[Autogenerated] Hey, how you doing? My name's Elton, and this is Site Reliability Engineering: The Big Picture, here on Pluralsight. I'm a Microsoft Azure MVP and a Docker Captain, and I've been working with teams using DevOps for many years. DevOps is great for building highly productive teams, but it needs a huge cultural shift, which many organizations just find too hard. That's where SRE comes in, because it shares many of the same goals and approaches as DevOps without the big culture change. SRE keeps a distinction between the product development team, which builds the system, and the SRE team, which manages the system. That makes for an easier transition from the traditional Dev and Ops teams. But SRE is not like traditional ops. It brings a software engineering approach to operations. You'll learn what that means in this course, which introduces you to all the practices and principles of SRE with lots of real-world examples. First, you'll learn how SRE is different from traditional operations and from DevOps. You'll see why SRE is a better approach for lots of organizations. Next, you'll learn about toil, which is mundane, repetitive, low-value operations work, and how SRE gives you tools to help manage it. You'll see that limiting how much toil the SRE team has to work through frees up more time for high-value project work, with some useful guidance on automation. Then you'll learn about service levels: how SRE uses service level objectives (SLOs) to clearly state the availability targets for a system, and service level indicators (SLIs) to track those objectives. You'll see how you can define SLOs and implement SLIs, and the best approaches for monitoring systems and generating alerts when the SLOs are in danger. Last, you'll learn about being on call with SRE: how incident management works, how you can effectively investigate issues, and why you should produce a postmortem for every major production incident.
This is an introductory course, which gives you all the foundational understanding of SRE that you need to help you evaluate if SRE will work for you. So join me and learn how Google runs production systems, right here on Pluralsight, with Site Reliability Engineering: The Big Picture.