Data center crashes can cause huge problems for IT departments, but those issues often have very low-tech causes. What’s most often to blame when a data center goes down?
In a 2010 Ponemon Institute study, 95% of data center managers said they’d suffered an unplanned outage at least once in the two years prior. Those outages likely caused big problems, as 51% of respondents agreed that every single application in their organization’s data center is mission-critical.
Many of those crashes could have been prevented, according to the study. The most common causes of data center outages cited by the respondents:
- Failure of an uninterruptible power supply (UPS) battery (65%)
- Exceeding UPS capacity (53%)
- Human error, including an accidental emergency power shut-off (51%), and
- UPS equipment failure (49%).
However, data center downtime can also be caused by factors outside the organization’s control, including some bizarre and unexpected ones chronicled in a recent Wired article.
Some of the strangest factors behind data center failures include:
- Squirrels – Networking vendor Level 3 Communications estimates that 17% of its problems result from squirrels chewing through fiber-optic cable. In addition, a Google search for “squirrel outage” will turn up many incidents involving other companies.
- Hunters – Another common problem affecting data centers: bored hunters and gunmen using cables and other components for target practice. Apparently, that’s been enough of a problem for Google that the company has been burying cables leading to its Oregon data center.
- Thieves – Data centers often contain valuable materials, including copper wire and other metals. Therefore, criminals have been known to break into those buildings and take whatever they can.
Lessening the impact of data center outages
Given the number of different ways a data center can lose power or otherwise go down, it’s important for IT to limit the risks where it can and to have ways to minimize the damage from events it can’t control.
Experts recommend that IT departments:
- Make sure all critical data is backed up and recoverable — that means having a disaster recovery plan in place and testing it regularly
- Add uninterruptible power supplies for critical pieces of hardware and run regular power outage tests to make sure that equipment is working properly, and
- Work with department heads to figure out which operations are most critical and prioritize backup and recovery plans accordingly.
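
Since exceeding UPS capacity was one of the most commonly cited causes in the study, a simple load check can help flag trouble before an outage does. The Python sketch below is only an illustration: the equipment names, wattages, UPS rating and the 80% utilization ceiling are all made-up assumptions, not figures from the study or from any vendor.

```python
from dataclasses import dataclass


@dataclass
class Load:
    """One piece of hardware drawing power from the UPS (hypothetical)."""
    name: str
    watts: float


def check_ups_headroom(loads, ups_capacity_watts, max_utilization=0.8):
    """Compare the total connected load against a UPS power budget.

    max_utilization is an assumed safety margin (80% here), not a standard.
    Returns (total_watts, allowed_watts, within_budget).
    """
    total_watts = sum(load.watts for load in loads)
    allowed_watts = ups_capacity_watts * max_utilization
    return total_watts, allowed_watts, total_watts <= allowed_watts


# Hypothetical equipment list and UPS rating, for illustration only.
loads = [
    Load("database server", 400),
    Load("storage array", 350),
    Load("core switch", 300),
]

total, allowed, ok = check_ups_headroom(loads, ups_capacity_watts=1500)
if ok:
    print(f"Load of {total:.0f} W fits within the {allowed:.0f} W budget")
else:
    print(f"Load of {total:.0f} W exceeds the {allowed:.0f} W budget")
```

Rerunning a check like this whenever hardware is added to a UPS is one way to catch a capacity problem early. It complements the regular power outage tests recommended above and does not replace them.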