On Friday, December 8th at 11:00pm UTC, we noticed that one of our visualization servers had become unresponsive. On further investigation, we found that although the VM was still running, web requests to that server were failing. The cause was a hardware failure that forced the machine to reboot. After the reboot, the Windows OS became unresponsive because of a partially installed update.
Over the past few months, Unified Logic has been working toward SOC2 certification to provide our customers with better security and compliance. This involves many steps and processes, and it also means we have been adjusting our policies. One of those policies concerns patching our VMs. As a result, we are updating our Windows and Linux servers more frequently and regularly. The policy also requires that all of our VMs download and install updates without forcing a reboot. When a reboot is required, we wait for an allowable maintenance window and then proceed.
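To illustrate the reboot rule described above, here is a minimal sketch of a maintenance-window check. The window itself (Sundays, 02:00 to 04:00 UTC) and the function names are assumptions chosen for the example; they are not our actual schedule or tooling.

```python
from datetime import datetime, time, timezone

# Hypothetical maintenance window: Sundays 02:00-04:00 UTC.
# These values are assumptions for illustration only.
WINDOW_WEEKDAY = 6          # datetime.weekday(): Monday is 0, Sunday is 6
WINDOW_START = time(2, 0)
WINDOW_END = time(4, 0)


def in_maintenance_window(now: datetime) -> bool:
    """Return True if `now` (timezone-aware) falls inside the weekly window."""
    now_utc = now.astimezone(timezone.utc)
    return (
        now_utc.weekday() == WINDOW_WEEKDAY
        and WINDOW_START <= now_utc.time() < WINDOW_END
    )


def should_reboot_now(reboot_pending: bool, now: datetime) -> bool:
    """Reboot only when an installed update needs it and the window is open."""
    return reboot_pending and in_maintenance_window(now)
```

Under this kind of rule, an update can be installed and waiting for a reboot for some time. An unplanned reboot in that interval, such as one caused by a hardware failure, happens outside the window.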
How We Addressed the Issue
After identifying the problem, we immediately contacted Azure support and continued to troubleshoot the system in the meantime. Because the problem was discovered on a Friday, supported troubleshooting was not available until the next business day, after the weekend. After one business day of unsuccessful attempts to bring the Windows OS back to its original state, we decided to rebuild the system and restore the affected customers from backups. Full restoration was completed on Tuesday, December 12th.
We had clearly underestimated the impact of this issue, especially since this particular VM hosts our Movere Demo account. We now understand the effect it had on our partners and our customers, and we offer our sincerest apologies for the interruption to daily business that it caused.
What We’ve Changed
As a result, we have implemented some new precautions to ensure this issue is not repeated:
- Service-level monitoring for all of our services, including web-facing visualization services
- More robust VM-level monitoring
- A change in our auto-install policy for updates
These precautions will allow us to detect anomalies or disruptions in service from the moment they occur.
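The difference between VM-level and service-level monitoring mattered in this incident: the VM was still on, but web requests to it failed. A service-level check can be sketched roughly as follows. This is a simplified illustration, not our production monitoring; the function name, the `opener` parameter, and the example URL are assumptions.

```python
import urllib.request


def check_service(url: str, timeout: float = 5.0,
                  opener=urllib.request.urlopen) -> bool:
    """Return True only if the service answers with an HTTP 2xx status.

    A VM can be powered on while its web service is down, so this probes
    the service itself rather than the machine. `opener` is injectable
    so the check can be exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # urllib's URLError and HTTPError, connection failures, and socket
        # timeouts are all OSError subclasses.
        return False
```

A scheduler could call `check_service("https://viz.example.com/")` (a placeholder URL) every minute and raise an alert after a few consecutive failures.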
Future outage communications will include:
- Notification of outage
- If possible, source of the outage
- Scope and impact of outage
- Status of customer data and privacy
- If possible, time to resolution
We will also be hosting a second Movere Demo account on a separate VM instance so that if one has an issue, the other remains available.
We sincerely appreciate the patience of our partners and our customers as we grow in our cloud journey. At Unified Logic, transparency and sharing knowledge are core values. When we have an issue, we want our partners and customers to have the opportunity to understand what happened, as well as to learn from our mistakes.
We value our partners and customers tremendously and are incredibly excited and optimistic about what 2018 will bring. We wish you and your families the Happiest of Holidays and send you our sincerest appreciation for your continued support of Unified Logic and Movere.