If Rodney Dangerfield ran a data center, he might have said, “Preventive maintenance gets no respect.” Of course, everyone except Rodney the data center manager knows that maintenance is essential to keeping mission-critical systems in top condition.

Or do they? I recently started following an engineering blog, and some of the posts I’ve seen are shocking. For example, one contributor cautioned against testing automatic transfer switches (ATSs) because they have many moving parts that can be knocked out of adjustment during testing.

I would submit that if testing an ATS might knock its parts out of adjustment, your mission-critical facility depends on an inferior product. The best time to find a problem is during a controlled activity, such as maintenance or testing, rather than during a true emergency, when these systems must perform. As for units having many mechanical parts, what piece of equipment doesn’t?

Every day we hear bleak economic news about layoffs, production cutbacks, reorganizations, and so on. In this climate, if you maintain or care for mission-critical infrastructure, your stock could be on the rise.

How often should you perform maintenance? There is no standard answer. Most electrical gear manufacturers recommend annual maintenance and testing. However, I believe testing frequency depends on two key factors:

  • Statutory considerations may mandate more frequent testing. Hospitals, for example, must test monthly to comply with accreditation authorities.

  • High usage may drive more frequent maintenance or testing.


Another Word About Testing

Testing by itself is good, but it may provide a false sense of security. After all, testing really only proves that the system worked when it was tested. On the other hand, in my opinion, testing after maintenance is mandatory. Performing pre-planned audits after maintenance ensures that controls, circuit breakers, and system configurations are returned to their pre-maintenance condition. At the end of the day, it’s your data center. Take, for example, the UPS vendor who completed maintenance on the battery plant and forgot to close the battery disconnect switch. To borrow the phrase probably coined by Mr. Otis, “Going down?”
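A pre-planned post-maintenance audit amounts to comparing the as-left state of each control point against an as-found snapshot recorded before work began. The Python sketch below assumes those states are captured as simple dictionaries; the point names and states are hypothetical, not taken from any particular UPS or ATS control system.

```python
def audit_return_to_service(as_found: dict[str, str], as_left: dict[str, str]) -> list[str]:
    """Compare the as-left state of each control point against the as-found
    snapshot taken before maintenance; return a list of discrepancies."""
    issues = []
    for point, expected in as_found.items():
        actual = as_left.get(point)
        if actual is None:
            # The point was never checked after the work was finished.
            issues.append(f"{point}: not verified after maintenance (expected {expected!r})")
        elif actual != expected:
            issues.append(f"{point}: expected {expected!r}, found {actual!r}")
    return issues


# Hypothetical point names and states, for illustration only.
as_found = {
    "battery_disconnect": "closed",
    "ups_mode": "normal",
    "ats_mode": "automatic",
}
as_left = {
    "battery_disconnect": "open",  # the switch the vendor forgot to close
    "ups_mode": "normal",
    "ats_mode": "automatic",
}

for issue in audit_return_to_service(as_found, as_left):
    print(issue)  # battery_disconnect: expected 'closed', found 'open'
```

Keeping the as-found snapshot as the reference, rather than a generic "correct" configuration, matches the goal of returning the system to its pre-maintenance condition.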

Preventive Maintenance Tasks

Common preventive maintenance tasks include thermographic scans, visual inspection, mechanical operation, and checking setpoints. These do not have to be intricate procedures, but a few tricks make them very effective.
  • The thermographic scan is simply an infrared picture or video of an electrical device under load. A properly trained thermographer equipped with a calibrated, properly adjusted scanner can interpret heat signatures that indicate problems. For example, take an ATS feeding a transformer with a balanced load. The heat signature should show the calculated temperature rise and be consistent across the phases within a small variation. A scan showing one phase materially hotter than the others, with the heat migrating up the conductor, probably indicates a loose connection.

  • In a visual inspection, understanding what to look for is key; the trained eye may find something not readily visible that could eventually cause a failure. For example, are there signs of corrosion or moisture? Are the mechanism and the enclosure clean? On new installations, if construction debris can be seen on the bottom of the enclosure, it’s also in the mechanism.

  • To check mechanical operation, de-energize the ATS and confirm that the mechanism operates freely. Sluggish operation can indicate a number of problems, ranging from lack of lubrication to deteriorated assemblies. Your service professional should have the experience to distinguish among these causes and apply the right fix so that the ATS will operate when called upon.

  • Checking setpoints matters because many systems depend on the correct dynamic interaction of multiple subsystems. The emergency power system of a data center consists of ATSs, engine generator sets and their supporting systems (fuel, coolant, intake louvers, batteries, exhaust, etc.), UPS, and power distribution. Accordingly, the setpoint adjustments of the individual pieces of this system are critical to seamless operation. For example, consider an ATS whose voltage dropout setpoint (the point at which the ATS controls consider the incoming, or normal, source unacceptable) is set to tolerate a deeper sag than the corresponding setting on the UPS. A sag that falls between the two thresholds causes the UPS to transfer to battery, but the ATS does not recognize the condition and never starts the engine generators; the UPS then runs out of battery and the load fails. Another example is an engine generator governor that does not respond properly to the UPS block load. The resulting dip in output frequency may be unacceptable to the UPS, which reverts to its battery. Once unloaded, the engine generator sets recover, the UPS transfers back, and the cycle begins again. This continues until the battery or some other system fails.
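The dropout example above reduces to comparing two thresholds. This Python sketch assumes both setpoints are expressed as a percentage of nominal voltage and that a sag is sustained; the 85% and 90% values in the usage are hypothetical, not recommended settings.

```python
from typing import Optional, Tuple


def uncovered_sag_band(ats_dropout_pct: float, ups_transfer_pct: float) -> Optional[Tuple[float, float]]:
    """Voltage band (% of nominal) where the UPS goes to battery but the ATS
    still treats the normal source as acceptable; None if no such band exists."""
    # The UPS transfers to battery when voltage falls below ups_transfer_pct.
    # The ATS drops out (and signals the engines to start) below ats_dropout_pct.
    if ats_dropout_pct < ups_transfer_pct:
        return (ats_dropout_pct, ups_transfer_pct)
    return None


def sag_outcome(voltage_pct: float, ats_dropout_pct: float, ups_transfer_pct: float) -> str:
    """Classify what happens during a sustained sag to voltage_pct of nominal."""
    ups_on_battery = voltage_pct < ups_transfer_pct
    ats_drops_out = voltage_pct < ats_dropout_pct
    if ups_on_battery and not ats_drops_out:
        return "UPS on battery, generators not started: load lost when battery is exhausted"
    if ats_drops_out:
        return "ATS starts generators and transfers; UPS bridges the transfer"
    return "No action: both systems accept the utility voltage"


# Hypothetical settings: ATS drops out at 85%, UPS goes to battery at 90%.
print(uncovered_sag_band(85.0, 90.0))  # (85.0, 90.0)
print(sag_outcome(88.0, 85.0, 90.0))   # a sag to 88% lands in the uncovered band
```

A real coordination review would also compare time-delay settings, such as engine start and transfer delays, against UPS battery runtime; this sketch covers only the voltage thresholds described in the text.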


Comprehensive Maintenance

Comprehensive maintenance monitors system setpoints and compares them against historical values and against any adjustments made during maintenance. Checking these setpoints means tracking a number of power parameters and monitoring controls.

It is necessary to measure and track the normal and emergency load voltage and current. These parameters can change over time, so recording them routinely and comparing the readings can reveal changes in either source or in the load.

Some very important measurements must also be taken with the power off. A digital low-resistance ohmmeter passes a known current through a connection or contact point and measures the resulting voltage drop to report the joint or contact resistance in micro-ohms. Although industry standards define acceptable levels, these readings are not absolute indicators on their own. However, when the thermographic scan, current measurements, and temperature measurements correlate with the micro-ohmmeter readings, together they can reinforce conclusions about the condition of current-carrying components.
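The phase-to-phase and over-time comparisons described above can be sketched as two small checks: one flags a phase whose reading stands out from the others (micro-ohm values, or temperature rise from a thermographic scan), and one flags readings that drift from a recorded baseline (voltage, current, or resistance). The 50% and 20% thresholds and the sample readings are placeholders; use the acceptance criteria from the applicable industry standard or the equipment manufacturer.

```python
from statistics import median


def outlier_phases(readings: dict[str, float], max_excess: float = 0.5) -> list[str]:
    """Return phases whose reading exceeds the median of all phases by more
    than max_excess (a fraction). Suits micro-ohm values or temperature rise."""
    typical = median(readings.values())
    return sorted(p for p, v in readings.items() if v > typical * (1 + max_excess))


def drift_from_baseline(
    baseline: dict[str, float], current: dict[str, float], max_change: float = 0.2
) -> dict[str, float]:
    """Return the fractional change for each point that moved more than
    max_change (in either direction) relative to its baseline reading."""
    drifted = {}
    for point, base in baseline.items():
        if point in current and base != 0:
            change = (current[point] - base) / base
            if abs(change) > max_change:
                drifted[point] = change
    return drifted


# Placeholder micro-ohm readings for the three poles of one switch.
baseline = {"A": 40.0, "B": 41.0, "C": 43.0}
latest = {"A": 42.0, "B": 40.0, "C": 95.0}

print(outlier_phases(latest))                 # ['C']
print(drift_from_baseline(baseline, latest))  # only pole C, up about 121%
```

Comparing against the median rather than the mean keeps a single bad pole from dragging the reference value toward itself.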

We live in a world driven by software, and most control platforms incorporate software or firmware. As with the software on your PC or laptop, the revision level of control software may be significant, and your service professional must understand what the various revision levels represent. Not having the latest revision does not necessarily mean you need to update. In fact, given the dynamic interaction of system components, indiscriminately updating software may introduce a problem. The fundamental point is that when you change a control function, whether it is implemented in relay logic, discrete components, or a programmable logic controller, you have to consider every path the change may affect. The road to hell is paved with good intentions and unintended consequences.

Upgrades

With CapEx funds tighter than ever, refurbishing or upgrading legacy systems makes good sense. Your OEM is your best source of information on available programs. The dwindling availability of legacy control platforms is another driver.

As markets change, competition and price pressure drive the development of a better mousetrap. As manufacturers seek better solutions and value-engineer their products to reduce cost and maintain necessary margins, new components become state of the art (at least according to the literature and advertisements). Regardless, demand increases once the engineering, consulting, and manufacturing communities accept these components and subsystems. As demand for new platforms increases, support for older platforms decreases. Manufacturers plan for end of life, and eventually older control platforms become hard or impossible to obtain. In a word, they become obsolete, which is not a situation you want to discover at o’dark thirty.

Understand where your mission-critical systems are in their productive life and how available their spare parts are. If you find that you’re on precarious ground, consult your service professional. Some manufacturers provide non-invasive solutions. Replicating obsolete controls with today’s standard components may be an attractive alternative to replacement, especially in light of current economic conditions.

Maintaining your company’s mission-critical (or should we say business-critical) infrastructure is essential to survival. Do your research, become informed, and drive improvement.