From 6:11AM to 8:05AM, users could not log in to the platform due to authentication issues.
A quick fix was implemented, but we ran into problems when trying to deploy it. These problems were caused by:
- The cluster losing one of its nodes and thus not having enough compute power to run the deployment pipeline.
- Integrates’s previous deployment being stuck due to hard limit policies.
What we have done
- Re-deployed Integrates in our cluster with the proper fix to the specific issue.
- Increased the number of nodes in our Kubernetes cluster to improve performance.
- Improved Integrates’s deployment rules to avoid future deployment errors.
What is the impact
Login attempts to Integrates failed from 6:11AM to 8:05AM, and users received an error message saying that they were not authorized to access the platform. At most 38 users were affected.
What we are doing to help
We are improving our cluster’s ability to recover from undesired states.