Site Reliability Engineering (SRE): The Big Picture

Site Reliability Engineering (SRE) is how Google runs production systems, promoting high availability with high velocity and removing operational toil. It achieves the same goals as DevOps without the culture shift.
Course info
Rating: (44)
Level: Beginner
Updated: Mar 5, 2020
Duration: 1h 41m
Description

Site Reliability Engineering (SRE) is a set of principles and practices that supports software delivery - keeping production systems stable while still delivering new features at speed. In this course, Site Reliability Engineering (SRE): The Big Picture, you'll get a thorough overview of how SRE works and why it's a good choice for many organisations. First, you'll learn the differences between SRE, DevOps, and traditional operations. Next, you'll discover how engineering practices help to reduce toil and provide more time to focus on high-value tasks. Finally, you'll learn how SRE approaches monitoring, alerting, and incident management. When you're finished with this course, you'll be able to evaluate SRE and see if it's a good fit for your organisation.

About the author

Elton is an independent consultant specializing in systems integration with the Microsoft stack. He is a Microsoft MVP, blogger, and practicing Technical Architect.

More from the author
Getting Started with Prometheus (Beginner, 1h 49m, Jun 24, 2020)
Using Declarative Jenkins Pipelines (Beginner, 2h 5m, May 1, 2020)
Using and Managing Jenkins Plugins (Intermediate, 2h 23m, Apr 3, 2020)
More courses by Elton Stoneman
Section Introduction Transcripts

Course Overview
Hey, how you doing? My name's Elton, and this is Site Reliability Engineering: The Big Picture, here on Pluralsight. I'm a Microsoft Azure MVP and a Docker Captain, and I've been working with teams using DevOps for many years. DevOps is great for building highly productive teams, but it needs a huge cultural shift, which many organizations just find too hard. That's where SRE comes in, because it shares many of the same goals and approaches as DevOps without the big culture change. SRE keeps a distinction between the product development team, which builds the system, and the SRE team, which manages the system. That makes for an easier transition from traditional dev and ops teams. But SRE is not like traditional ops. It brings a software engineering approach to operations. You'll learn what that means in this course, which introduces you to all the practices and principles of SRE, with lots of real-world examples.

First, you'll learn how SRE is different from traditional operations and from DevOps, and you'll see why SRE is a better approach for lots of organizations. Next, you'll learn about toil, which is mundane, repetitive, low-value operations work, and how SRE gives you tools to help manage it. You'll see that limiting how much toil the SRE team has to work through frees up more time for high-value project work, with some useful guidance on automation. Then you'll learn about service levels: how SRE uses service level objectives (SLOs) to clearly state the availability targets for a system, and service level indicators (SLIs) to track those objectives. You'll see how you can define SLOs and implement SLIs, and the best approaches for monitoring systems and generating alerts when the SLOs are in danger. Lastly, you'll learn about being on call with SRE: how incident management works, how you can effectively investigate issues, and why you should produce a postmortem for every major production incident.

This is an introductory course which gives you all the foundational understanding of SRE that you need to help you evaluate if SRE will work for you. So join me and learn how Google runs production systems, right here on Pluralsight, with Site Reliability Engineering: The Big Picture.