We had one client whose experience, I believe, makes an interesting case study. They had budgeted for a server upgrade in 2018, after that server had been in operation for only four years. However, the server seemed to have a critical flaw, so we all agreed to replace it early. Somewhere between three and five hard drives had already been replaced under warranty when yet another one failed in March. Lesson #1: if we had moved faster to implement the budgeted upgrade, this hardware failure would not have affected operations at all.
Lesson #2 (if you’re not technically inclined, please read this one anyway): RAID systems are supposed to survive a hardware failure, but that isn’t always the case. The real world is not always so clean and pretty. The client was effectively down for an entire day because a hard drive had failed, yet it didn’t look as though it had. The manufacturer’s log showed only a potential failure on one drive, while the actual situation was far worse. Almost nothing on that server worked for the whole day, all because a single drive within the RAID array had an error.
That brings me to Lesson #3, which is incredibly valuable: a great team of people working together can get a lot done when called upon. Have you ever had someone on your team who gets a lot done without drawing attention to their work? We are really fortunate to have Henry in Customer Care, who does just that. The volume of work and troubleshooting he took on that Monday went well above and beyond. In the end, Henry made the judgement call that the disk system should be checked for errors after hours. That step had operations restored by the next morning, without anyone having to go onsite and without costing the client further hours of wasted troubleshooting and diagnosis. It was simply a great call under pressure. There is no instruction manual for this kind of response, and Henry did a fantastic job.