- On the afternoons of last week’s Monday and Tuesday (February 17th and 18th) and of this week’s Monday (February 24th), we released new versions of Integrates that caused failures in its continuous deployment pipeline.
- To protect our customers, a daily deployment rotates security keys that are valid for only 24 hours.
- On those days the rotation jobs failed, so the keys used by the application were no longer valid and access to all projects was denied.
What we’ve done
- On all three days, we manually triggered the rotation job early in the morning and then fixed the code that caused the pipeline failures.
What’s the impact
- The issue lasted approximately 5 hours (4:10-9:21) on February 18th, 3 hours (4:10-7:04) on February 19th, and 3 hours (4:10-7:02) on February 25th. However, users only attempted to access the platform from 6:50 to 9:20 on the first day, from 6:44 to 7:02 on the second, and from 6:26 to 6:58 on the third, so the actual impact on users totaled about 3 hours and 20 minutes.
- Approximately 45 of our users were unable to access their projects on the mornings of February 18th, 19th, and 25th.
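The real impact comes from adding up only the windows in which users actually tried to access their projects: 150 + 18 + 32 = 200 minutes. The short Python check below reproduces that sum. It is a sketch that assumes all times are same-day HH:MM values in a single time zone; the report does not say which one.

```python
from datetime import datetime, timedelta

# Access-attempt windows (start, end) for each affected morning, taken from the report.
# Assumption: all times share one (unstated) time zone and fall on the same day.
ATTEMPT_WINDOWS = {
    "February 18th": ("6:50", "9:20"),
    "February 19th": ("6:44", "7:02"),
    "February 25th": ("6:26", "6:58"),
}


def window_length(start: str, end: str) -> timedelta:
    """Return the duration between two same-day HH:MM times."""
    fmt = "%H:%M"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


def total_impact(windows: dict[str, tuple[str, str]]) -> timedelta:
    """Sum the durations of all access-attempt windows."""
    return sum((window_length(s, e) for s, e in windows.values()), timedelta())


if __name__ == "__main__":
    for day, (start, end) in ATTEMPT_WINDOWS.items():
        print(day, window_length(start, end))
    print("Total:", total_impact(ATTEMPT_WINDOWS))
```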
What we are doing to help
- We are developing an automated test that checks access to the projects and notifies us by SMS and email whenever it fails.
- We are extending the validity window of the old keys to two days. This way, if a nightly pipeline fails, we have 24 hours to fix the error before it affects availability for our customers.
- With this announcement, we are notifying customers that this was an internal error, caused by two failures in our deployment pipelines, and explaining what we did to fix it.
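Here is why a two-day window leaves 24 hours to react. The simplified Python model below assumes that each daily rotation issues a fresh key, and that a key is accepted from the moment it is issued until its lifetime runs out. The function names and the example timestamp are hypothetical and do not come from the Integrates code. With a 24-hour lifetime, the last good key expires at exactly the moment the missed rotation should have replaced it. With a 48-hour lifetime, that same key remains valid for one more day.

```python
from datetime import datetime, timedelta

# Hypothetical model: the rotation job is assumed to run once every 24 hours.
ROTATION_INTERVAL = timedelta(hours=24)


def key_is_valid(issued_at: datetime, now: datetime, lifetime: timedelta) -> bool:
    """A key is accepted while now is in [issued_at, issued_at + lifetime)."""
    return issued_at <= now < issued_at + lifetime


def grace_after_failed_rotation(lifetime: timedelta) -> timedelta:
    """Time left to fix the pipeline once a scheduled rotation is missed.

    The newest key was issued one rotation interval before the missed run,
    so it stays valid only for whatever remains of its lifetime.
    """
    return max(lifetime - ROTATION_INTERVAL, timedelta())


if __name__ == "__main__":
    issued = datetime(2000, 1, 1, 4, 10)  # arbitrary example timestamp
    missed_run = issued + ROTATION_INTERVAL  # when the failed rotation should have run
    for hours in (24, 48):
        lifetime = timedelta(hours=hours)
        print(
            f"lifetime={hours}h",
            "valid at missed run:", key_is_valid(issued, missed_run, lifetime),
            "grace:", grace_after_failed_rotation(lifetime),
        )
```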