US2 issue
Incident Report for SecureAuth Service
Postmortem

Impacted Customers: All on US2 Cloud Datacenter

Impacted Services: Push, Link2Accept, SMS, TTS, IP threat and geo, Certificate.

Incident Date: January 26th, 2020

Incident Description:

Starting at approximately 2:22AM UTC on 01/26/2020, the SecureAuth services hosted at the US2 Cloud Datacenter were substantially degraded. Operations staff received alerts at 2:24AM UTC and began remediation procedures, but access rights issues limited their ability to take corrective action. At 4:31AM UTC, DNS failover was initiated. Operations staff observed that DNS failover was only marginally effective because, for some impacted customers, the DNS TTL was not being respected or relayed to client servers. At 5:58AM UTC, load balancer failover was initiated, routing all traffic to the US1 Cloud Datacenter. This rerouted the remaining customers whose traffic had not been moved by the DNS failover.
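For reference, the elapsed time between the events above can be worked out directly from the stated timestamps. The following is a minimal sketch; the event labels and the helper name `minutes_since_start` are illustrative and not part of any SecureAuth tooling.

```python
from datetime import datetime, timezone

# Timestamps taken from the incident description (all UTC, 01/26/2020).
EVENTS = [
    ("degradation began", "02:22"),
    ("alerts received", "02:24"),
    ("DNS failover initiated", "04:31"),
    ("load balancer failover initiated", "05:58"),
]


def minutes_since_start(events, day="2020-01-26"):
    """Return {event label: whole minutes elapsed since the first event}."""
    parsed = [
        (
            name,
            datetime.strptime(f"{day} {hhmm}", "%Y-%m-%d %H:%M").replace(
                tzinfo=timezone.utc
            ),
        )
        for name, hhmm in events
    ]
    start = parsed[0][1]
    return {name: int((ts - start).total_seconds() // 60) for name, ts in parsed}


elapsed = minutes_since_start(EVENTS)
for name, minutes in elapsed.items():
    print(f"{name}: +{minutes} min")
```

By these timestamps, alerts arrived 2 minutes after degradation began, DNS failover started at +129 minutes, and load balancer failover started at +216 minutes.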

Root Cause:

Impaired network and performance at the US2 Cloud Datacenter hypervisor infrastructure. The provider is still investigating, and its RCA is pending. The impairment caused CPU and network anomalies on hosted services and an increase in process queues, which impacted the services' ability to respond to requests.

Corrective Actions:

Failover processes will be executed more aggressively, bypassing DNS failover. Access rights limitations will be resolved for 24x7 operations staff.
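One way to picture a more aggressive failover that skips the DNS step is a simple threshold on consecutive failed health checks. This is only a sketch under stated assumptions: the class name `FailoverPolicy`, the threshold of three failures, and the action strings are hypothetical and do not describe SecureAuth's actual procedure.

```python
from dataclasses import dataclass


@dataclass
class FailoverPolicy:
    # Hypothetical threshold: consecutive failed health checks before failing over.
    max_consecutive_failures: int = 3
    consecutive_failures: int = 0
    failed_over: bool = False

    def record_check(self, healthy: bool) -> str:
        """Record one health-check result and return the action to take."""
        if self.failed_over:
            # Traffic has already been moved; nothing further to do.
            return "none"
        if healthy:
            # A healthy check resets the failure streak.
            self.consecutive_failures = 0
            return "none"
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_consecutive_failures:
            self.failed_over = True
            # Go straight to load balancer failover; no DNS failover step.
            return "load_balancer_failover"
        return "none"


policy = FailoverPolicy()
actions = [policy.record_check(h) for h in [True, False, False, False, False]]
print(actions)
```

In this sketch the third consecutive failed check triggers load balancer failover directly, so rerouting does not depend on clients honoring DNS TTLs.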

Additional monitoring has been implemented to detect infrastructure service degradation, based on current findings. After the RCA from the provider is received, additional remediation may be taken.

Posted Jan 31, 2020 - 01:05 UTC

Resolved
Resolved – The incident has been resolved, but our team is continuing to monitor the situation. An RCA will follow once the investigation is complete.
Posted Jan 26, 2020 - 19:41 UTC
Monitoring
We have cut over all traffic from our US2 Datacenter to our US1 Datacenter, and a workaround is in place. Services are now restored and operational. We are continuing to monitor the situation and are still investigating the root cause and resolution. An RCA will be provided as soon as the investigation is complete.
Posted Jan 26, 2020 - 04:30 UTC
Investigating
Intermittent Issues with Cloud Services for US-west SecureAuth cloud datacenter
Monitoring - We are currently investigating intermittent issues with cloud services impacting SMS, phone OTP, Push, Geo, Threat, Phone Fraud, and certificate issuance. We will provide updates as we receive more information.
Posted Jan 26, 2020 - 02:22 UTC
This incident affected: SecureAuth Cloud Services (Enhanced Geolocation Resolution Service - US2, Geolocation Resolution Service - US2, Push-to-Accept Service - US2, SMS Service - US2, Telephony Extension/DTMF Service - US2, Telephony Service - US2, Threat Service - US2).