SecureAuth Authentication Services Down
Incident Report for SecureAuth Service
Postmortem

Incident Description

A routine automatic database scaling operation failed around 9:30 AM ET on 4/19/2023. This caused the database cluster to become unresponsive and stop accepting connections. Data integrity was unaffected, but data availability was compromised. The SecureAuth team worked to restore service and forced the scaling operation to finish. Once it finished, the database cluster began accepting connections again and service was restored around 12:30 PM ET on 4/19/2023.

Root Cause

It appears that the automatic scaling operation failed to complete due to an issue in the database platform itself. The scaling algorithm incorrectly marked one cluster node as unhealthy, which prevented the remaining cluster nodes from scaling properly. The database cluster began rejecting connections soon thereafter.

Corrective Actions

Typically, such automatic scaling operations complete seamlessly and without incident. However, given the conditions that led to the auto-scale failure on 4/19/2023, we acknowledge that this activity still carries risk. The SecureAuth team is following up with the database platform vendor for more information about the scaling operation failure. In the meantime, all auto-scale operations for this database cluster have been suspended; the SecureAuth team will monitor capacity and perform any scaling for this database cluster during planned maintenance windows. Finally, the SecureAuth team is confirming that vendor best practices are being followed for all connections to this database cluster and will make adjustments as needed.

Note: times in this postmortem are in the Eastern time zone.
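
The postmortem gives times in Eastern time, while the status updates below are stamped in PDT. As a convenience, the following minimal sketch converts the update stamps to Eastern time so the two can be compared. It assumes fixed offsets of UTC-7 (PDT) and UTC-4 (EDT), which applied on 4/19/2023; the function name `pdt_to_edt` and the `updates` table are illustrative only and not part of any SecureAuth tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets valid on 2023-04-19 (daylight saving time in effect).
PDT = timezone(timedelta(hours=-7), "PDT")
EDT = timezone(timedelta(hours=-4), "EDT")


def pdt_to_edt(hour, minute):
    """Convert a PDT wall-clock time on 2023-04-19 to Eastern time."""
    stamp = datetime(2023, 4, 19, hour, minute, tzinfo=PDT)
    return stamp.astimezone(EDT)


# Status update stamps as posted (PDT), copied from the timeline.
updates = {
    "Investigating": (7, 48),
    "Identified": (8, 41),
    "Update": (9, 23),
    "Monitoring": (9, 42),
    "Resolved": (10, 39),
}

for label, (hour, minute) in updates.items():
    print(f"{label}: {pdt_to_edt(hour, minute):%H:%M %Z}")
```

Under these assumptions, the Monitoring update at 09:42 PDT corresponds to 12:42 Eastern, shortly after the approximate 12:30 PM ET restoration time given in the Incident Description.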

Posted Apr 19, 2023 - 20:55 PDT

Resolved
Service should have returned to normal for all customers. Please contact SecureAuth Support (support@secureauth.com) if you are still having issues.
Posted Apr 19, 2023 - 10:39 PDT
Monitoring
Services have been restored and we will continue to monitor the situation to ensure no further issues.
Posted Apr 19, 2023 - 09:42 PDT
Update
Updated estimated resolution time: 1:00pm Eastern Time. We are continuing to recover the underlying database infrastructure components.
Posted Apr 19, 2023 - 09:23 PDT
Identified
Identified an issue with the underlying database infrastructure which should be resolved by 12:15pm Eastern Time (9:15am Pacific/16:15 UTC).
Posted Apr 19, 2023 - 08:41 PDT
Investigating
The SecureAuth Authentication services have been impacted by an issue within the backend services. We are working with the appropriate teams and third parties to isolate and resolve it.
Posted Apr 19, 2023 - 07:48 PDT
This incident affected: SecureAuth Polaris Services (FIDO Service, Mobile Services, SaaS IdP Broker) and SaaS/Full Cloud Components (SaaS/Full Cloud Identity Platform, SecureAuth Connector).