Case Studies

Data Center Over-Temperature Event

The Situation

A major data center experienced an HVAC failure, resulting in temperatures exceeding 120°F. The over-temperature event could have impacted more than $40 million in servers, storage arrays, network equipment, and other IT equipment. As a result of the HVAC failure, all systems either automatically powered off when internal temperatures reached relevant limits or were eventually powered off manually by staff.

Some original equipment manufacturers (OEMs) condemned the IT systems, recommending replacements and voiding warranties due to the event. As a result, the facility’s Insurer retained experts from J.S. Held’s Equipment Consulting Practice to assess the damage and provide recommendations based on J.S. Held’s inspections, analysis, and discussions with the Insured and OEMs.

How We Advised

J.S. Held data center equipment experts conducted thorough assessments of the impacted equipment, including analysis of error log data. With a few exceptions, the equipment showed no visible damage, and the error log data demonstrated that the systems either shut down automatically and entered safe mode or never reached internal temperatures outside their specified limits.

However, the analysis did identify several systems that were subjected to excessive temperatures and suffered internal failures. The error log data showed that temperatures rose quickly, especially on the GPUs, indicating thermal stress or inadequate cooling. A simplified summary of error log data for one damaged system is shown below:

  • CPU Activity: The processor speed jumped significantly (from ~1625 MHz to ~3471 MHz), indicating a shift from an idle state to an active state.
  • CPU Temperature: Core temperatures rose from 117°F to as high as 145°F, showing increased thermal output as workloads intensified.
  • CPU Load: The load percentage spiked from 13.7% to 84.8%, then stabilized around 50%, suggesting a burst of activity followed by sustained moderate usage.
  • GPU Temperature: The GPU temperature climbed steadily from 117°F to 183°F, reflecting increased graphics processing demand or poor cooling efficiency.
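
To illustrate how a log review of this kind might be automated, the short Python sketch below checks logged temperatures against per-component limits and flags any excursions. It is a simplified illustration only, not the method used in this engagement: the `Reading` type, the `flag_excursions` helper, and the limits in `ASSUMED_LIMITS_F` are all hypothetical, and real thresholds would come from each OEM's published specifications. The sample readings are the start and peak temperatures from the summary above.

```python
from dataclasses import dataclass


def f_to_c(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


@dataclass
class Reading:
    component: str  # e.g. "cpu" or "gpu"
    temp_f: float   # logged temperature in degrees Fahrenheit


# Hypothetical limits in °F, for illustration only.
# Actual limits must be taken from the OEM specification for each component.
ASSUMED_LIMITS_F = {"cpu": 160.0, "gpu": 175.0}


def flag_excursions(readings, limits=ASSUMED_LIMITS_F):
    """Return the readings that exceed the assumed limit for their component."""
    return [
        r for r in readings
        if r.component in limits and r.temp_f > limits[r.component]
    ]


# Start and peak temperatures taken from the log summary above.
readings = [
    Reading("cpu", 117.0),
    Reading("cpu", 145.0),
    Reading("gpu", 117.0),
    Reading("gpu", 183.0),
]

for r in flag_excursions(readings):
    print(
        f"{r.component}: {r.temp_f:.0f}°F "
        f"({f_to_c(r.temp_f):.1f}°C) exceeds assumed limit"
    )
```

Under these assumed limits, only the GPU peak reading is flagged. A different set of limits would change which readings are flagged, which is why the OEM specifications matter to the outcome.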

Based on our experts’ review and analysis, multiple systems were determined to be viable for continued usage, with other systems needing component or full replacement based on error log data. This analysis not only saved the facility the cost of replacing unaffected equipment but also enabled the Insured to return to operation expeditiously.

Related Practice Areas

> Information Technology
Our Information Technology (IT) experts evaluate systems ranging from desktop environments to expansive, multi-million-dollar data centers. We deliver objective and independent analyses of IT claims based on years of industry experience, technical expertise, and market knowledge.

> Equipment Consulting
J.S. Held’s equipment experts deliver support on matters ranging from daily desktop reviews to multi-million-dollar, complex technology claims. Our team leverages years of experience handling a variety of specialized equipment and systems.

Key Contacts

For additional information about the engagement or to learn more about our services, contact:

Scott Armstrong
Executive Vice President | Equipment Practice Lead