Data Centers need responsibility
Germany's leading news site is offline, and a public service radio station in the south west of the country cannot publish any up-to-date weather news on its website because the partner who supplies this data is also offline. The reason? A power failure in a major German data center.
Structured risk management concept for critical technical environments
Banks, hospitals, airlines, administrative bodies, universities, online services – the list of companies that rely on smoothly functioning data centers is never-ending. In particular, companies whose day-to-day business relies heavily on the technical infrastructure cannot afford any downtime, not even of just a few seconds. Functioning data centers are urgently required as part of the basic infrastructure, especially in the financial services, manufacturing, logistics, trade, administration, education, R&D and service provision industries.
This is a challenge not only for servers and data networks, but also for the infrastructure that houses and supplies this highly sensitive technology. What is needed, therefore, is a structured approach to responsible facility management in data centers, a concept that takes all these factors into account and assesses them in an integrated manner. The top priority of facility management (FM) at the heart of the data infrastructure is to ensure availability, fail-safe operation and efficiency, 24 hours a day, 365 days a year. With a management concept that is unique in the industry, Apleona HSG has set itself the goal of identifying and eliminating potential causes of downtime both in the power supply infrastructure of its customers and in the operating processes. This increases the reliability of the customer's critical environments and minimises the risk of downtime.
Developed by Apleona HSG, the “4P Critical Engineering Framework” is a structured risk management concept that has been designed for the operation and management of critical technical environments to guarantee maximum possible system reliability. This is vital in a society where risk management in critical environments is no longer just an option but an indispensable part of safeguarding processes as well as a company’s integrity and earning power.
Right from the outset, the 4P Framework was based on the customers’ desire for a holistic concept for operating critical environments that can be flexibly adapted to the conditions in different systems and locations. This laid the foundation for the 4P concept, which has since become part of Apleona HSG’s corporate culture. This special approach to supporting critical environments is based on multiple, independent focal points and elements, all of which help to protect the customers’ business operations.
These four focal points – “people”, “performance”, “plant” and “processes” – together form the basic framework of the 4P concept, whose core ideas and main goals ensure that
- operational risk is significantly lowered;
- risk awareness and awareness of risk minimisation within the operator team are increased;
- the risk of operational downtime due to incorrectly performed work on critical technical environments can be properly evaluated;
- effective monitoring tools designed to reduce the risk of downtime are used;
- systems are maintained and monitored according to their criticality;
- employees who work on critical technical environments are aware of the significance of downtime and its potential impact on business operations;
- effective, comprehensive and tried-and-tested best practices are employed in the operation of critical technical environments and handling of unexpected events;
- replacement parts for critical technical environments are managed effectively to exclude the risk of downtime due to missing spares;
- risk transparency and the visibility of potential risks are improved;
- a company-wide, uniform approach is established through specific, clearly defined processes;
- continuous learning supported by review and feedback processes takes place.
Focal point: People (Employee management)
According to various studies, up to 75% of all critical events and system downtime in data centers are attributable to human error or procedures that rely on human action and interaction. Faults attributable to human error can occur in any phase of a critical environment’s lifecycle – during planning, construction or operation – and are the greatest source of system downtime in the infrastructure. One of many examples is an employee who has not been properly trained or briefed, who works in a critical environment and, with the best of intentions, presses the wrong button at the wrong time, accidentally shutting down the entire system.
This is why Apleona HSG’s 4P concept attaches particular importance to employee management. After all, only a well-thought-out approach can ensure that employees have the necessary skills to identify, report and actively minimise risks. Importance is also attached to detecting skills deficits and identifying the further training needed to rectify them, so that employees can deploy new technologies and methods more quickly and accurately assess the risk of operational downtime.
All the people employed in Apleona HSG’s data centers require a sound knowledge of the critical environments and the associated infrastructure in their different areas of responsibility. They must also be aware of not only the operational requirements but also the potential impact of their actions on the customer's core business. After all, operating highly technical and complex critical environments requires highly qualified employees.
To maintain performance at the very highest technical level, employees need to receive ongoing training. This can be done through individual and team training programmes in which soft skills like communication skills and hard skills like technical skills are taught and tested.
Training is particularly effective if it is performed on the systems to be supported. Knowledge is conveyed not in isolation but in a real-life situation and within the context of the overall system. This object-specific knowledge helps to reduce the risk level for the customer's critical technical environments.
All these training and further training measures then form the basis for the company’s own employee certification as a “Data Center Operator”. Certification takes place in Apleona HSG’s dedicated competence center, a training center set up specifically for data center technicians.
Focal point: Process (Process control)
The management process for process control in critical environments ensures clear and uniform process standards throughout the entire company, strict compliance with these standards and high risk transparency thanks to a robust process framework and continuous feedback and learning.
The operational procedures and processes defined in 4P are based on best-practice solutions for operating critical environments. Of course, the processes will sometimes have to be adapted to local requirements. In such cases, Apleona HSG uses the 4P sample processes as a guide and adapts them to integrate local working practices and operating regulations.
The operational processes are continuously tested and optimised to ensure that they meet the current standard for handling critical environments. Implementation of these operational processes is also regularly tested in the field to ensure that they are standardised and observed throughout the company. Since a lack of knowledge concerning the operation of critical infrastructures can have critical consequences for business, particular importance is attached here to areas such as documentation, processes and employee training. To minimise the risk of errors, all available opportunities are leveraged to ensure that employees receive proper training in all key processes during system launch phases or during induction.
Focal point: Performance (Performance regulation)
Many companies today rely on their critical infrastructure to transact their core business. These systems therefore have to be continuously monitored and carefully operated and maintained, so that their integrity and durability safeguard the central business activities rather than compromise them. Or to put it more simply: once the electrical supply system or cooling system has reached its performance limit, the core business is at risk. Beyond this point, the critical environments to which power must be supplied cannot be upgraded unless the supply infrastructure is developed accordingly. As an operator of these sensitive, high-tech environments, Apleona HSG is responsible for performing effective capacity analyses of the power supply systems, clearly evaluating the supply infrastructure's technical status, and continuously analysing and improving its own performance.
In addition to availability, the energy consumption of critical environments is a focal point because critical environments are in continuous operation. Calculating, visualising and optimising the energy performance of critical environments has therefore been firmly anchored in the “4P Critical Engineering Framework”, as has the concept of maximum possible availability.
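To make "calculating energy performance" concrete, the short sketch below computes power usage effectiveness (PUE), a widely used data center metric defined as total facility energy divided by IT equipment energy. The article does not say which metric the 4P framework uses, so the choice of PUE, the function name and the sample meter readings are all assumptions made purely for illustration.

```python
def power_usage_effectiveness(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Return PUE = total facility energy / IT equipment energy.

    Hypothetical helper: the metric choice is an assumption, not taken from 4P.
    """
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be below IT equipment energy")
    return total_facility_kwh / it_equipment_kwh


# Illustrative (made-up) monthly meter readings in kWh.
pue = power_usage_effectiveness(total_facility_kwh=150_000, it_equipment_kwh=100_000)
print(f"PUE: {pue:.2f}")  # PUE: 1.50
```

A value closer to 1.0 means that less energy is spent on cooling, power conversion and other overheads relative to the IT load itself.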
Focal point: Plant (Plant operation)
Planned preventive maintenance work is essential for maintaining the value of critical environments. Specific critical environments call for advanced maintenance strategies that are developed specially for the system and aligned with its risk profile.
Just like preventive maintenance, the approach to repairing critical technical environments plays an important role in reducing business-critical downtime. The processes underlying these activities are crucial in ensuring safe and efficient repair or re-commissioning work. If replacement parts cannot be made available immediately, this can have a major impact on the customer's core business. Suitable processes for managing critical replacement parts are required in order to ensure that replacement parts for critical environments are available at all times.
This immediate provision of replacement parts at the place of use, transparent use of these parts as well as real-time inventory data require precise and customised inventory management processes.
The availability of critical replacement parts is a key element of system operation. Proactive control and continuous monitoring of capacity utilisation is essential for ensuring the constant availability and reliability of the power supply infrastructure – a fact that is often overlooked. It is therefore important to have the capability to evaluate the potential risk of capacity overshoot, which could severely compromise reliability and availability.
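The capacity monitoring described above could be sketched roughly as follows. This is a minimal illustration, not Apleona HSG's actual tooling: the `SupplySystem` class, the `overshoot_risks` function, the 80% warning threshold and the sample loads are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SupplySystem:
    """A power or cooling system with a rated capacity (hypothetical model)."""
    name: str
    rated_capacity_kw: float
    current_load_kw: float

    @property
    def utilisation(self) -> float:
        # Fraction of rated capacity currently in use.
        return self.current_load_kw / self.rated_capacity_kw


def overshoot_risks(systems, warn_at=0.8):
    """Return the names of systems whose utilisation is at or above warn_at.

    The default threshold of 0.8 is an assumed value for illustration.
    """
    return [s.name for s in systems if s.utilisation >= warn_at]


# Made-up example figures.
systems = [
    SupplySystem("UPS A", rated_capacity_kw=500, current_load_kw=300),
    SupplySystem("Chiller 1", rated_capacity_kw=400, current_load_kw=360),
]
print(overshoot_risks(systems))  # ['Chiller 1']
```

Flagging systems before they reach their rated limit leaves time to plan an upgrade of the supply infrastructure instead of reacting to an outage.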
However, we must be honest and admit that operational downtime cannot be completely prevented, no matter how well maintained the system. Even if many risks can be reduced thanks to a well-thought-out facility management concept and the adoption of the 4P approach, downtime attributable to human error cannot be completely eliminated. Training employees in these specific processes of system management, and thus anchoring these processes in their minds, is a decisive factor in reducing the risk of downtime and an important part of training that all technicians at Apleona HSG regularly undergo.
Responsible facility management in critical environments such as data centers thus does not rest on one single component. It is the result of the interaction of intelligent concepts encompassing skilled and motivated employees, standardised processes and effective system operation. And with the right facility manager, that news site, that radio station and that meteorological service might never have suffered the downtime they did on 28 March 2017.
4P Critical Engineering Framework is the name of our service concept for data centers and critical environments