To Lie or Not to Lie – Transparency and Trust in Data Center Crisis Management
eco DCSA Auditor & independent consultant Gerd Simon, on vital processes and communication for managing data center crises, and the impact on business value.
dotmagazine: How can a data center operator manage their risks and can this have an impact on the value of the business?
Gerd Simon: It is the attitude of top management that clearly differentiates between a data center leader and the rest of the crowd. This attitude affects the entire value of a data center business, and brands may go bust if they do not acknowledge this.
There are many technologies that enable better tracking and data collection to improve stability and quickly identify problems, including AI tools. There are many routines and management tools that help data center teams to structure processes. But there's no way out if the data center management doesn't listen to the subject matter experts and to the data. The question is not whether an outage will occur; the question is when it will happen. So you need to have a team spirit that is not afraid to talk about and act in crisis mode. And you need subject matter experts who know what they are doing and who can analyze and understand the data.
It is important to improve the availability rating of your data center infrastructure. The well-known "five 9s" rule may help: the closer you get to 100% availability, the more reliable your infrastructure is. Typical designs deliver anywhere from two 9s (99%) up to five 9s (99.999%); in other words, the theoretical downtime varies between 87.6 hours a year and about 5 minutes a year (53 minutes a year corresponds to four 9s). The more 9s, the lower the potential risk. The better the availability, the better the perceived maturity of your infrastructure. This does not translate into direct business value unless you calculate the cost of downtime in order to justify risk mitigation. Redundant infrastructures help to cope with operational aspects and keep availability up, but redundancy can be too costly. So a redundant design helps, but only if the operator does not manage a good design poorly.
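The arithmetic behind the 9s can be sketched in a few lines of Python. This is a minimal illustration, not a method described in the interview: the function names are hypothetical, a year is assumed to have 365 days (8,760 hours), and the hourly cost of downtime is a placeholder figure that each operator would have to estimate for their own business.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8,760 hours


def availability_from_nines(nines: int) -> float:
    """Convert a number of 9s into an availability fraction (2 -> 0.99)."""
    return 1 - 10 ** (-nines)


def annual_downtime_hours(availability: float) -> float:
    """Theoretical downtime per year for a given availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR


def expected_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Theoretical yearly downtime cost; cost_per_hour is an assumed input."""
    return annual_downtime_hours(availability) * cost_per_hour


# Print the theoretical downtime for two to five 9s.
for n in range(2, 6):
    a = availability_from_nines(n)
    hours = annual_downtime_hours(a)
    print(f"{n} nines ({a:.5%}): {hours:.3f} h/year = {hours * 60:.1f} min/year")

# Placeholder cost figure, purely for illustration.
ASSUMED_COST_PER_HOUR = 10_000.0
saving = (expected_downtime_cost(availability_from_nines(3), ASSUMED_COST_PER_HOUR)
          - expected_downtime_cost(availability_from_nines(4), ASSUMED_COST_PER_HOUR))
print(f"Theoretical yearly saving from three to four 9s: {saving:.0f}")
```

Comparing the theoretical saving between two availability levels with the cost of the extra redundancy is one simple way to decide whether a risk mitigation measure is justified.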
Meta certificates may help to assess how the robustness of the infrastructure develops over time. The eco Datacenter Star Audit (DCSA) is one of these; it is based upon EN 50600 and focuses more on the operational aspects of a data center than on the initial technical design. This helps data center teams to cross-check their own perceptions.
dot: What are the most important areas that a data center operator should focus on in dealing with risk minimization?
Simon: Take care of your maintenance, and perform it on time and to a high standard together with your suppliers. Best of all, though, are regular drills under realistic conditions, for example at 3 am on a Saturday or Sunday, targeting power, cooling, security, or other areas like connectivity and data safety. You have to involve marketing, sales, and management – and sometimes your suppliers and your customers, too. What you want to achieve is a winning team whose members take great care and have great respect for each other, and who work with a low heart rate and high concentration. And you need to earn trust.
Have a clear line of command for when a data center is not in regular operations mode, and create trust. If this does not happen, problems emerging from a crisis may not be solved in the early stages, and the situation may become chaotic, without any strategic view of the crisis situation or any crisis management plan.
Heed and respect local norms and regulations. As far as you can, use local construction companies that know the local laws and regulations. And have your data center manuals and security documentation ready, online and offline – in the local language. Sufficient training of staff is essential; it helps the data center teams to learn how to analyze a situation and quickly search for alternative solutions when needed. This action-based approach not only helps top management to assess the dynamics of a crisis situation, but also helps them to become familiar with fast decision making and to not be afraid to make changes.
dot: What is best practice in the event of an emergency in a data center?
Simon: Keep cool and check all the parameters in your area that provide you with facts and figures. If your area is not affected, fine. If it is affected, start remedial action. Report along the chain of command. As I said, perform your duty with a low heart rate and a lot of concentration. Tell others what you are doing, and do what you say.
dot: What are the most important assets of a data center operator?
Simon: Happy customers! Only these will buy again and maybe more. So, the real internal assets are the experienced team members of a data center operator, the spirit of learning within the organization, and the leadership attitude of the management team.
dot: How important is a geo-redundant strategy for keeping data safe? And how can smaller operators achieve this?
Simon: Geo-redundant strategies help to improve data safety and at the same time keep costs down, because you increase your safety by replicating the same data, which can be cheaper than investing in infrastructures that represent sleeping investments. Regional natural catastrophes like earthquakes and storms will then not have an impact on your access to data. And you get more time to take care of your remedial action. But if you work with various parallel data hubs, you should make sure that your data is mirrored in three locations and is accurate in almost real time, regardless of where you access it from. Smaller operators can achieve this by forming partnerships with data center players in other regions or by adding hosting or private cloud capacities.
dot: Is there anything else you would like to add?
Simon: Let me finish by saying: The data center business is a trust business. Therefore, data center operations is a people business. As a result, perception is everything. Data center operations may fail once in a while. Anyone who says that will never happen is a liar.
So, the biggest nightmare of a data center operator is experiencing downtime. Always on, 100% uptime: These are the marketing buzzwords that data center operators put on their websites, because customers want to hear this sweet talk. Reliability and availability are essential each and every second. Transparent communication at all times matters, too. And the talents within a data center team make the difference.
Gerd Simon is a trusted adviser for digital infrastructure investment, an auditor, a senior analyst, and a business leader. For more than 25 years, he has been creating and enabling digital infrastructures, mainly in Europe but also further afield. His focus was and is on creating GTM models and on implementing business strategies, developing business concepts from scratch through strategic and operational business development. He has been working in TMT markets since the mid-nineties and has a broad network in the Internet, cloud, and data center area.
Please note: The opinions expressed in Industry Insights published by dotmagazine are the author’s own and do not reflect the view of the publisher, eco – Association of the Internet Industry.