SMS delivery delays
Incident Report for SecureAuth Service
Root Cause Analysis
Description of events:

On 2/22/2017 at 6:45am PST / 14:45 UTC, SecureAuth Support received a customer report of SMS delays or delivery failures.

Upon initial investigation with our SMS provider, it was determined that the issue impacted more than one customer. Our SMS provider told us that there were no service outages and that the SMS requests in question were being rejected because an API request rate threshold was being exceeded.

Based on this information, SecureAuth requested an increase of the API rate threshold. This change appeared to resolve the issue. However, while investigating why monitoring had not alerted on the failure, we began to see additional request rejections, indicating that the threshold was again being reached. We immediately requested that the threshold be set to the maximum allowable value, which again appeared to resolve the issue. When further rejections appeared later, we migrated SMS services to our secondary SMS provider.
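The fallback described above can be illustrated with a minimal sketch. The migration during this incident was an operational change, and the code below is not SecureAuth's implementation. It assumes each provider can be modeled as a function returning an accepted/rejected result. The names `SendResult`, `send_with_failover`, `primary`, and `secondary` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical result type; real provider APIs return their own structures.
@dataclass
class SendResult:
    accepted: bool
    reason: str = ""


# A provider is modeled as any callable taking (phone_number, message).
Provider = Callable[[str, str], SendResult]


def send_with_failover(providers: List[Provider], number: str, message: str) -> SendResult:
    """Try each provider in order and return the first accepted result.

    If every provider rejects the request, the last rejection is returned
    so the caller can log or alert on its reason.
    """
    last = SendResult(accepted=False, reason="no providers configured")
    for provider in providers:
        last = provider(number, message)
        if last.accepted:
            return last
    return last


# Stand-in providers for illustration only (assumptions, not real APIs).
def primary(number: str, message: str) -> SendResult:
    return SendResult(accepted=False, reason="rate threshold exceeded")


def secondary(number: str, message: str) -> SendResult:
    return SendResult(accepted=True)


result = send_with_failover([primary, secondary], "+15555550100", "Your code is 123456")
```

Returning the last rejection rather than raising keeps the rejection reason available to the caller, which matters when the provider's own logs are not visible to the sender.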

What was the issue?

The information provided indicates that the failures, reported as SMS session rejections due to an API threshold limit, were caused by a combination of an account setup configuration error and a programmatic change to the SMS provider’s back-end functionality that had been released recently.

What was done to fix the issue?

1. The SMS provider reverted the programmatic change to their back-end functionality and corrected the SecureAuth account misconfiguration.
2. The SMS provider released a fix on 2/24/2017 at 17:56 UTC.

Why did SecureAuth’s monitoring not alert us to the issue?

Together, the account configuration error and the back-end changes made by the SMS provider caused the error condition to be logged to a “rejection” log available only to the SMS provider.

What is being done to ensure this type of failure is avoided going forward?

1. Following their investigation of the issue, the SMS provider has added API threshold and account capability tests to their QA environment.
2. The SMS provider will investigate and implement additional monitoring checks to alert on similar error conditions.
3. SecureAuth is adding API response handling to our back-end service, the need for which we identified during the investigation phase.
4. SecureAuth will investigate, identify and implement more automation for the status page to help inform customers more quickly.
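As one illustration of the kind of client-side response handling mentioned in item 3, the sketch below records the outcome of each send request locally and flags a high rejection rate. This means detection does not depend on logs held by the provider. It is an assumption-laden example, not the handling SecureAuth implemented. The class name `RejectionMonitor`, the window size, and the ratio threshold are all hypothetical.

```python
from collections import deque


class RejectionMonitor:
    """Track the most recent send outcomes and flag a high rejection rate."""

    def __init__(self, window: int = 100, max_rejection_ratio: float = 0.2):
        # Only the latest `window` outcomes are kept; True means "rejected".
        self.outcomes = deque(maxlen=window)
        self.max_rejection_ratio = max_rejection_ratio

    def record(self, rejected: bool) -> None:
        """Store the outcome of one send request."""
        self.outcomes.append(rejected)

    def should_alert(self) -> bool:
        """Return True when the share of rejections exceeds the threshold."""
        if not self.outcomes:
            return False
        ratio = sum(self.outcomes) / len(self.outcomes)
        return ratio > self.max_rejection_ratio


# Illustrative values only: 3 of the 5 recorded requests were rejected.
monitor = RejectionMonitor(window=10, max_rejection_ratio=0.2)
for rejected in [False, False, True, True, True]:
    monitor.record(rejected)
```

In practice, the `rejected` flag would be derived from whatever status the provider's API returns for each request, and the alert would feed an existing monitoring system.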
Posted Feb 27, 2017 - 05:56 UTC
All production SMS services are fully operational and stable. We will keep this incident open until we have received the vendor RCA and posted this information.
Posted Feb 23, 2017 - 18:41 UTC
We have routed all SMS traffic to a secondary provider. We continue to work with the primary provider to determine the root cause and will update this incident as more information comes in. We apologize for the inconvenience and thank you for your patience.
Posted Feb 22, 2017 - 20:55 UTC
We have identified the issue and we are working with our SMS Vendor to resolve the SMS delays. We'll update this incident with more information as it becomes available.
Posted Feb 22, 2017 - 19:00 UTC
We are investigating reports of SMS delivery delays. More information on this issue will be provided as soon as possible.
Posted Feb 22, 2017 - 16:27 UTC