During the procurement of a new production region (APAC), an incorrect DNS change was applied to one of our DNS providers (AWS Route53). This caused intermittent DNS resolution failures that affected customers' ability to reach IdP services. Because DNS changes propagate gradually and the failures were intermittent, our monitors were slow to detect the problem. Once the problem was identified, the DNS change was reverted and the fix began propagating. Propagation completed at approximately 10:30 AM PST.
Timeline of Events
Mar 26 7:30 PST - Incorrect DNS change applied to Route53
Mar 26 8:30 PST - DevOps team began diagnosing the issue
Mar 26 8:55 PST - DevOps team removed duplicate record
Mar 26 10:30 PST - DNS propagation appears complete
Corrective Actions
Require additional approvals for any DNS change.
Review the opportunity to expand endpoint monitoring to cover multiple regions and DNS name servers.
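As an illustration of the second corrective action, the sketch below compares the answers returned by several name servers for one hostname and flags any server that disagrees with the majority. It is a minimal sketch under stated assumptions, not our monitoring implementation: the function names, the `lookup` callable, and the example hostnames and addresses are all hypothetical, and the actual per-server DNS query is left to the caller (for example, a DNS client library).

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, Iterable, List

# Hypothetical lookup signature: (name_server, hostname) -> record values.
# A real check would plug in a function that queries that specific server.
Lookup = Callable[[str, str], Iterable[str]]


def collect_answers(
    hostname: str, name_servers: List[str], lookup: Lookup
) -> Dict[str, FrozenSet[str]]:
    """Ask every name server for the hostname and record the answer set."""
    answers: Dict[str, FrozenSet[str]] = {}
    for ns in name_servers:
        try:
            answers[ns] = frozenset(lookup(ns, hostname))
        except Exception:
            # Treat a failed lookup as an empty answer so it is flagged too.
            answers[ns] = frozenset()
    return answers


def find_disagreements(answers: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return the name servers whose answer differs from the most common one."""
    if not answers:
        return []
    majority, _ = Counter(answers.values()).most_common(1)[0]
    return sorted(ns for ns, found in answers.items() if found != majority)


# Example with a fake lookup (assumed data): one server returns an extra record.
def fake_lookup(name_server: str, hostname: str) -> List[str]:
    table = {
        "ns-a.example": ["192.0.2.10"],
        "ns-b.example": ["192.0.2.10"],
        "ns-c.example": ["192.0.2.10", "192.0.2.99"],
    }
    return table[name_server]


servers = ["ns-a.example", "ns-b.example", "ns-c.example"]
suspect = find_disagreements(collect_answers("idp.example.com", servers, fake_lookup))
print(suspect)
```

Running a check of this kind from several regions, against each provider's name servers directly rather than through a caching resolver, is one way an intermittent mismatch could be noticed before cached answers expire.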
Posted Mar 26, 2024 - 11:18 PDT
Monitoring
We are aware of ongoing intermittent DNS issues affecting various services. We have found the root cause and implemented a fix. Please allow time for the fix to propagate through DNS. We will continue to monitor.