Regarding Integrates's login issues on 08/10/2019

What happened

From 6:11 AM to 8:05 AM, users could not log in to the platform due to authentication issues.
A fix was implemented quickly, but we ran into problems when trying to deploy it. These problems were related to:

  • The cluster losing one of its nodes and therefore not having enough compute capacity to run the deployment pipeline.
  • Integrates’s previous deployment being stuck due to hard limit policies.

What we have done

We have:

  • Re-deployed Integrates in our cluster with a proper fix for the authentication issue.
  • Increased the number of nodes in our Kubernetes cluster to improve performance.
  • Improved Integrates’s deployment rules to avoid future deployment errors.

What is the impact

Login attempts to Integrates failed from 6:11 AM to 8:05 AM (1 hour and 54 minutes). Affected users saw an error message saying that they were not authorized to access the platform. At most 38 users were affected.

What we are doing to help

We are improving our cluster’s ability to recover from undesired states.