Incident Description
On the evening of 2018-04-01, SecureAuth customers began experiencing issues with MFA. Full availability of the impacted services was restored the next morning (2018-04-02) at approximately 10:25 AM EDT.
Root Cause
After a thorough analysis and review of our systems, we have determined that the following issues were the primary contributors to the failure:
- SACloud runs multiple backup processes against the backend database, including full backups, differentials, and transaction logs. These backups are performed by a combination of tools. At the time of the outage, the process for one of these backup tools was consuming abnormally high CPU, starving the database service of the CPU cycles it needed to answer queries from the front-end SACloud web server.
- The immediate fix was to stop and disable that backup process. After consulting with our backup vendors, we learned that conflicts between that tool and the other backup processes can cause it to consume high CPU. This is the most likely cause of its sudden increase in CPU utilization.
Corrective Actions
The following measures are being taken to prevent an incident of this type from happening in the future: