Awesome Site Reliability Engineering – Massive Collection of Resources

A curated list of awesome Site Reliability and Production Engineering resources.

What is Site Reliability Engineering?

“Fundamentally, it’s what happens when you ask a software engineer to design an operations function.” – Ben Treynor Sloss, VP Google Engineering, founder of Google SRE

Contents

Culture

Education

Books

Hiring

Reliability

Monitoring & Observability & Alerting

On-Call

Post-Mortem

Capacity Planning

Service Level Agreement

Performance

Programming

Misc Articles

Real-time Messaging

Blogs

  • Brendan Gregg’s Blog – Highly Technical Blog Posts About Systems Internals, Performance and SRE.
  • Everything Sysadmin – Blog Posts About SysAdmin/DevOps/SRE by Tom Limoncelli.
  • High Scalability – Technical Blog Posts About Systems Architecture.
  • rachelbythebay – Technical Blog Posts.
  • Susan J. Fowler – Various blog posts about SRE, Software Engineering and Microservices.
  • SysAdvent – One article for each day of December, ending on the 25th.
  • Operations for Developers – A collection of resources for developers to strengthen their Ops skills.
  • Stephen Thorne’s Blog – Blog Posts About SRE.
  • Increment – A digital magazine about how teams build and operate software systems at scale.
  • GopherSRE – Blog Posts about Go and SRE.
  • Cindy Sridharan – Blog posts about distributed systems and their management.
  • Blameless Blog – Blog posts about SRE culture and practices.
  • Resilience Roundup – Weekly analysis of Resilience Engineering and Human Factors research designed for software systems.
  • Squadcast Blog – Blog posts about SRE best practices, reliability, on-call and incident management.
  • FireHydrant Blog – Posts about complex systems, incident response, and SRE best practices.

Newsletters

  • DevOpsLinks – A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.
  • KubeWeekly – The weekly newsletter for all things Kubernetes, curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.
  • SRE Weekly – Weekly Site Reliability Newsletter.
  • O’Reilly Systems Engineering and Operations Newsletter – Weekly systems engineering and operations news and insights from industry insiders.
  • ChaosEngineering.news – Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox!

Conferences & Meetups

Twitter
