Cloud Service Degradation

Incident Report for SecureAuth Service

Postmortem

RCA – Cloud Services Outage/RDS Failure - October 23, 2025

Problem Description: On October 23, 2025, at 12:30 PM PDT, SecureAuth’s Cloud infrastructure experienced degradation of the IP risk evaluation service. By around 2:00 PM PDT, this had escalated into widespread database connection issues that affected multiple cloud services and resulted in authentication failures for impacted customers. A gradual, batched rolling restart of all single tenant services restored service, with full recovery at 4:00 PM PDT.

Cause: The incident originated with an unbounded increase in traffic from the IP risk evaluation service (ipintelsvc) to the production database. The resulting proliferation of concurrent, heavy-query sessions overwhelmed the shared Aurora RDS cluster, including the segment serving the Vault secret-storage service. The loss of database connectivity caused Vault to crash, which in turn caused failures in dependent services.
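To illustrate what "unbounded" means here, the following is a minimal Python sketch of one common way to cap the number of concurrent database sessions a single service can open. It is not SecureAuth's implementation or fix; the class name BoundedSessionGate, the max_sessions parameter, and the shed-on-overload behavior are all assumptions made for the example.

```python
import threading


class BoundedSessionGate:
    """Hypothetical guard that caps concurrent database sessions for one service."""

    def __init__(self, max_sessions: int):
        # One semaphore slot per allowed concurrent session.
        self._slots = threading.BoundedSemaphore(max_sessions)

    def run(self, query_fn, timeout: float = 0.0):
        """Run query_fn if a slot frees up within `timeout` seconds; otherwise shed the request."""
        if timeout > 0:
            acquired = self._slots.acquire(timeout=timeout)
        else:
            acquired = self._slots.acquire(blocking=False)
        if not acquired:
            # Fail fast instead of opening yet another session against a saturated database.
            raise RuntimeError("session limit reached; request shed")
        try:
            return query_fn()
        finally:
            self._slots.release()
```

With a cap like this, a traffic spike in one service results in rejected or delayed requests for that service rather than an ever-growing number of sessions on a shared cluster.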

Recovery: The DevOps and Engineering teams began resolution efforts by scaling down IP risk evaluation service (ipintelsvc) traffic. RDS CPU and connection metrics then gradually decreased, allowing auxiliary Vault services to come back up. A rolling restart of all customer service deployments was then performed in gradual batches.
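The batched rolling restart described above can be sketched roughly as follows. This is an illustrative Python outline only, not the tooling the teams used; the function names and the restart and is_healthy callbacks are hypothetical placeholders for whatever the orchestration layer provides.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rolling_restart(
    deployments: Sequence[str],
    batch_size: int,
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    pause_seconds: float = 0.0,
) -> List[str]:
    """Restart deployments batch by batch; return those that did not report healthy."""
    needs_manual: List[str] = []
    for batch in batches(deployments, batch_size):
        for name in batch:
            restart(name)
        for name in batch:
            if not is_healthy(name):
                # Left for an operator to handle by hand.
                needs_manual.append(name)
        # Pause so the shared database is not hit by every tenant reconnecting at once.
        time.sleep(pause_seconds)
    return needs_manual
```

Restarting in small batches limits how many services reconnect to the database at the same moment, and the returned list corresponds to the kind of small remainder that needs manual intervention.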

Timeline: October 23, 2025

  • 12:45 PM PDT – Alerts triggered for spikes affecting two customers
  • 1:08 PM PDT – DevOps and Engineering teams begin investigation
  • 1:13 PM PDT – First Support ticket received regarding outage
  • 1:30 PM PDT – ipintelsvc logs confirm spike
  • 1:45 PM PDT – ipintelsvc scaled down
  • 2:00 PM PDT – Cleanup of lingering sessions in progress
  • 2:05 PM PDT – Vault Polaris service recovered
  • 2:15 PM PDT – Rolling restart of all customer services begins
  • 3:15 PM PDT – Near full recovery confirmed, with a small batch requiring manual intervention
  • 4:00 PM PDT – Full recovery confirmed

Corrective Actions:

  • Software updates to the infrastructure and backend services to improve performance and security will be scheduled during a change maintenance window for customers on a weekly cadence.
  • Further enhancements will be made to separate single tenant services and improve their resilience, eliminating the risk of a required cascading restart and decoupling these services from the shared services RDS. This infrastructure has already been implemented and applied in the Dev and Test SecureAuth Cloud environments, and we are starting to roll it out in Production.

    If you would like your tenant placed on a priority list for these updates, please reach out to your CSM or Support, and we will work on prioritizing a change window for your organization.

Posted Oct 27, 2025 - 13:06 PDT

Resolved

All services remain fully recovered.

If you have any questions regarding this issue, please log a ticket at https://support.secureauth.com. Our teams are on standby and are ready to assist.
Posted Oct 23, 2025 - 19:35 PDT

Monitoring

All services are operational again and we are continuing to monitor.

Posted Oct 23, 2025 - 16:07 PDT

Identified

We are beginning to roll out a fix. The rollout should take up to 30 minutes to complete.

Posted Oct 23, 2025 - 14:38 PDT

Update

We are seeing heavy database utilization that is leading to service outages. We are in the process of failing over to a secondary database to alleviate pressure.

Posted Oct 23, 2025 - 13:51 PDT

Investigating

We are investigating an issue with our backend cloud services that may impact customer workflows.

Posted Oct 23, 2025 - 13:30 PDT
This incident affected: Workforce (Cloud IdP).