- Learning Path Libraries: This path is only available in the libraries listed. To access this path, purchase a license for the corresponding library.
  - Core Tech
Site Reliability Engineering (SRE)
This path introduces the core of **Site Reliability Engineering**: the key principles, practices, and tools that help teams build and maintain reliable systems. You’ll learn how to define and measure reliability, automate for resilience, and use observability to detect issues before they affect users. By the end, you’ll be equipped to build systems that are scalable, fault-tolerant, and aligned with **real business needs**.
Content in this path
Site Reliability Engineering
These video courses cover the core concepts and practices of Site Reliability Engineering. You’ll get clear explanations, real-world examples, and practical guidance on everything from defining reliability metrics and automating recovery to building resilient systems and aligning reliability work with business goals. The videos are designed to give you both the “why” and the “how” behind modern reliability work.
- By the end of this path, you’ll be able to:
  - Explain what SRE is and how it fits into today’s IT and DevOps landscape
  - Define and use key reliability metrics such as SLIs, SLOs, and error budgets
  - Design systems that are resilient, scalable, and self-healing
  - Implement monitoring and observability strategies that provide real insight into system performance
  - Use automation to manage reliability, respond to incidents, and scale operations
- You don’t need to be an SRE already, but you should have:
  - A solid understanding of how modern applications and infrastructure work
  - Some experience with DevOps practices or systems administration
  - Familiarity with monitoring tools, automation, or scripting (even at a basic level)
  - General comfort working from the command line
- This path is great whether you’re stepping into SRE or looking to level up your reliability skills.
- Monitoring and Observability Tools
- CI/CD Pipelines and Deployment Automation
- Security and Compliance Automation
- Capacity Planning and Load Testing