Investigating Intermittent SMS Delivery Issues

Incident Report for SecureAuth Service

Postmortem

Incident Description

At approximately 21:25 UTC on May 22, 2019, internal monitoring alerted us to a problem with SMS message delivery. Logs showed delayed messages and a very small percentage of undelivered messages. We took corrective action, and normal operation resumed at about 22:25 UTC.

Root Cause

After a detailed investigation with our network providers and other parties, we determined that an extreme spike in volume had exhausted resources on one of our servers, which caused SMS deliveries to fail. The load turned out to be test traffic that had been incorrectly directed to a production server.

Corrective Actions

Extensive discussions have taken place with the relevant parties. In addition, we have trained our staff to identify single-source-IP traffic spikes and to properly activate rate-limiting tools. We have also adjusted the parameters of our alerting system so that it reacts more quickly to excess volume as well as to delivery delays.
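
To illustrate the kind of control described above, here is a minimal sketch of per-source-IP rate limiting using a sliding window. It is not the tooling used in production. The class name, the limits, and the example addresses (taken from documentation-only IP ranges) are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class SourceRateLimiter:
    """Sliding-window limiter keyed by source IP (hypothetical sketch)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each source IP to the timestamps of its accepted requests.
        self._accepted = defaultdict(deque)

    def allow(self, source_ip, now):
        """Return True if a request from source_ip at time `now` (in seconds) is within the limit."""
        timestamps = self._accepted[source_ip]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True


# Assumed limit for illustration: 3 requests per 10 seconds per source IP.
limiter = SourceRateLimiter(max_requests=3, window_seconds=10)

# Four requests from one address at the same instant: the fourth is rejected.
burst = [limiter.allow("203.0.113.7", now=0.0) for _ in range(4)]

# A different address is unaffected, and the first address recovers
# once its earlier requests fall outside the window.
other = limiter.allow("198.51.100.2", now=0.0)
later = limiter.allow("203.0.113.7", now=10.0)
```

Keying the window on the source address means a single misdirected sender can be throttled without affecting delivery for other traffic. A real deployment would typically also count rejected requests per address so that alerting can flag the spike.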

Posted Jun 04, 2019 - 11:57 PDT

Resolved

This incident has been resolved. We are continuing to investigate the root cause. The post-mortem will be provided once the investigation is complete.
Posted May 22, 2019 - 16:18 PDT

Investigating

We are currently investigating intermittent issues with SMS delivery. As a result, we have temporarily failed over to a secondary service provider. We will provide updates as we receive more information.
Posted May 22, 2019 - 14:47 PDT