Intermittent Issues with US East Cloud Services
Incident Report for SecureAuth Service
Postmortem

Incident Description

At 12:15 UTC on September 8, 2018, intermittent failures in the Push Notifications, Risk, and Geolocation services were reported.  Investigation showed that the main database server in the database cluster at our primary data center was at 100% CPU, which caused the failures.  Impact was very limited, affecting only a few customers.

After the problem was identified, impacted customers were redirected to our secondary data center, which was unaffected by the issue.  CPU usage returned to normal at 13:00 UTC.  After internal monitoring determined that the systems were stable again, impacted customers were failed back to our primary data center.

Root Cause

The primary database server in the database cluster at our primary data center suffered an extended CPU spike caused by an unexpected internal database maintenance job.  As a result, many requests reached internal service timeouts, which impacted the Push Notifications, Risk, and Geolocation services.

Corrective Actions

Enhanced monitoring and alerting have been implemented.  Providing additional resources to the database cluster has been prioritized.
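
This report does not describe how the enhanced alerting works.  As an illustration only, the kind of check that catches an extended CPU spike (rather than a brief one) can be sketched as a sliding-window rule that fires only when every recent sample is above a threshold.  The class name, the 90% threshold, and the window size below are hypothetical and are not taken from the SecureAuth implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SustainedCpuAlert:
    """Fires when every sample in a full window is at or above the threshold."""

    threshold_pct: float = 90.0  # hypothetical alert threshold
    window_size: int = 5         # hypothetical number of consecutive samples
    samples: deque = field(default_factory=deque)

    def observe(self, cpu_pct: float) -> bool:
        """Record one CPU reading and return True if the alert should fire."""
        self.samples.append(cpu_pct)
        # Keep only the most recent window_size readings.
        if len(self.samples) > self.window_size:
            self.samples.popleft()
        window_full = len(self.samples) == self.window_size
        return window_full and all(s >= self.threshold_pct for s in self.samples)


if __name__ == "__main__":
    alert = SustainedCpuAlert(threshold_pct=90.0, window_size=3)
    for reading in [100, 100, 50, 100, 100, 100]:
        print(reading, alert.observe(reading))
```

Requiring a full window of high readings trades a short detection delay for fewer false alarms from momentary spikes; a real deployment would tune both values against observed load.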

Posted Oct 01, 2018 - 12:53 PDT

Resolved
This incident has been resolved. RCA will be posted in the next few days.
Posted Sep 08, 2018 - 19:51 PDT
Monitoring
All endpoints are currently working. Service was restored at ~08:20 AM Pacific. We will continue to monitor and will update this with an RCA once the investigation is complete.
Posted Sep 08, 2018 - 09:45 PDT
Investigating
We are currently investigating intermittent issues with cloud services. Current services that may be impacted are SMS, Telephony, Push, Location Services, and Certificate Enrollment. We will provide updates as we receive more information.
Posted Sep 08, 2018 - 07:00 PDT
This incident affected: SecureAuth Cloud Services (Enhanced Geolocation Resolution Service - US1, Enhanced Geolocation Resolution Service - US2, Geolocation Resolution Service - US1, Geolocation Resolution Service - US2, Nexmo Voice API, Push-to-Accept Service - US1, Push-to-Accept Service - US2, SMS Service - US1, SMS Service - US2, Telephony Extension/DTMF Service - US1, Telephony Extension/DTMF Service - US2, Telephony Provider SMS API, Telephony Service - US1, Telephony Service - US2, Threat Service - US1, Threat Service - US2, X.509 Certificate Service (SHA2) - US1, X.509 Certificate Service (SHA2) - US2).