Introduction

In May, we experienced two incidents resulting in significant impact and degraded state of availability for API requests, GitHub Pages, GitHub Actions and the GitHub Packages service, specifically the GitHub Packages Container registry service.

May 8 06:46 UTC (46 minutes)

This  incident was caused by failures in an underlying MySQL database, which caused some operations to time out for the GitHub Container registry service. During this incident, some customers viewing packages in the UI or interacting with the registry through “docker push” and “docker pull” may have experienced failures as the engineering team investigated the incident. After performing a failover to one of our database replicas, the affected systems were properly restored.

Our internal engineering team is now prioritizing work that will help ensure reduced impact to customers should such underlying outages happen again. This work includes creating internal documentation, dashboards, and enhanced alerts to quickly triage the cause of operation failures. We will also continue to actively maintain and increase replicas in different regions and availability zones that serve as a line of defense against unexpected region outages.

#engineering #github

GitHub Availability Report: May 2021
1.20 GEEK