Certificate creation issue
Incident Report for SecureAuth Service
Postmortem

Incident Description 

  • On November 14, multiple customers reported issues with the renewal of PFX personal certificates. 

Root Cause 

  • SecureAuth uses cloud-based Hardware Security Modules (HSMs) provided by Thales. 
  • Thales performed maintenance on the Cloud HSM infrastructure over the weekend, which left the SecureAuth Certificate Authority (CA) systems unable to connect to the Cloud HSM for key validation. 

  • The SecureAuth CAs regularly renew the Certificate Revocation Lists (CRLs) for multiple CAs. The delta CRLs expire after approximately 48 hours, which is why the Thales maintenance had no impact until Sunday evening, and customers were not affected until Monday morning. 

  • Compounding the problem, the alerts generated by the monitoring systems were not routed to the location that the L1 team monitors. 
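
The delayed impact described above follows from the CRL validity window: a CRL that cannot be renewed keeps working until its next-update time passes. The sketch below illustrates that timing and is not SecureAuth's monitoring code. The function name `crl_status`, the 12-hour warning threshold, and the example timestamps are all hypothetical; only the roughly 48-hour delta CRL lifetime comes from this report.

```python
from datetime import datetime, timedelta, timezone


def crl_status(next_update, now, warn_before=timedelta(hours=12)):
    """Classify a CRL by how close it is to its next-update time.

    next_update and now must both be timezone-aware datetimes.
    warn_before is a hypothetical alerting threshold.
    """
    remaining = next_update - now
    if remaining <= timedelta(0):
        return "expired"    # clients can no longer rely on this CRL
    if remaining <= warn_before:
        return "expiring"   # renewal should already have happened
    return "ok"


# Hypothetical timestamps: a delta CRL with a 48-hour validity window.
issued = datetime(2022, 1, 1, 0, 0, tzinfo=timezone.utc)
next_update = issued + timedelta(hours=48)

for hours in (1, 40, 49):
    now = issued + timedelta(hours=hours)
    print(hours, crl_status(next_update, now))
```

A check like this, run on a schedule and routed to an actively monitored location, would flag a stalled renewal during the "expiring" window, before customers are affected.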

Corrective Actions 

  • Restarting the CAs on all of the “NGE” servers corrected the issue. 
  • Documentation on the configuration of the multi-region deployment of the NGE certificates was not fully up to date. The DevOps team will review the documentation and update it as necessary. 

  • All DevOps alerts have been reviewed to ensure that they go to the location that is actively monitored, rather than to the Slack channel, which also receives the alerts but is not routinely monitored. 

  • The DevOps team has subscribed to status updates from the Thales Status Page and will review any changes or maintenance posted there, so that internal testing can confirm that SecureAuth operations are not impacted.

Posted Nov 14, 2022 - 18:35 PST

Resolved
RCA to be posted soon
Posted Nov 14, 2022 - 15:06 PST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Nov 14, 2022 - 12:07 PST
Identified
The issue has been identified and a fix is being implemented.
Posted Nov 14, 2022 - 09:57 PST
Update
We are continuing to investigate this issue.
Posted Nov 14, 2022 - 08:53 PST
Investigating
We are currently investigating reports of our PFX/Certificate creation not working for some customers.
Posted Nov 14, 2022 - 08:53 PST
This incident affected: SecureAuth Cloud Services (X.509 Certificate Service (SHA2) - US1, X.509 Certificate Service (SHA2) - US2).