Revision 130
Articles and updates:
Too Soon or Too Late: The Incident Escalation Dilemma (link)
AI Reliability Engineering: Welcome to the Third Age of SRE (link)
Insights from the OpenTelemetry Contributor Experience Survey (link)
Beyond High Availability: Disaster Recovery Architectures That Keep Running When HA Fails (link)
How to simulate network latency in local containers (link)
Engineering Resilience Through Data: A Comprehensive Approach to Change Failure Rate Monitoring (link)
Kubernetes at LinkedIn, with Ahmet Alp Balkan and Ronak Nathani (link)
Projects:
OpenOps is a No-Code FinOps automation platform that helps organizations reduce cloud costs and streamline financial operations. (link)