Card Thermal and Electrical Protections - DS1013

Alveo V80 Data Center Accelerator Cards Data Sheet (DS1013)

Document ID: DS1013
Release Date: 2024-05-08
Revision: 1.0 English

Built-in shutdown logic protects the card from damage by removing power from the device when an electrical or thermal reading reaches or exceeds its shutdown threshold. The voltage regulator module (VRM) monitors the VCCINT current and temperature, and removes card power when either threshold is reached or exceeded. After a shutdown, a cold reboot of the server hosting the card is necessary to reload the device configuration and re-enumerate the card in the server. The following table lists the thermal and electrical shutdown thresholds of the VRM. The temperature threshold applies equally with and without AUX power connected.

Table 1. Thermal and Electrical Protection Thresholds
Sensor Description     Card Shutdown Threshold
VCCINT Current         • 60 A (no PCIe AUX power)
                       • 180 A (one 2x4 PCIe AUX power connector)
                       • 360 A (two 2x4 PCIe AUX power connectors)
VCCINT Temperature     125°C
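
The thresholds in Table 1 can be expressed as a small lookup for use in host-side tooling or test scripts. The following is a minimal sketch in Python, not part of any AMD software: the names `VCCINT_CURRENT_LIMIT_A`, `VCCINT_TEMP_LIMIT_C`, and `shutdown_reasons` are hypothetical, and the comparison assumes the "reach or exceed" wording above means greater than or equal to.

```python
# Shutdown thresholds copied from Table 1. The dictionary layout and the
# function name are illustrative assumptions, not an AMD API.
VCCINT_CURRENT_LIMIT_A = {0: 60.0, 1: 180.0, 2: 360.0}  # key: number of 2x4 AUX connectors
VCCINT_TEMP_LIMIT_C = 125.0  # identical with and without AUX power


def shutdown_reasons(current_a, temp_c, aux_connectors):
    """Return the Table 1 thresholds that a reading reaches or exceeds."""
    if aux_connectors not in VCCINT_CURRENT_LIMIT_A:
        raise ValueError("aux_connectors must be 0, 1, or 2")
    reasons = []
    if current_a >= VCCINT_CURRENT_LIMIT_A[aux_connectors]:
        reasons.append("VCCINT current")
    if temp_c >= VCCINT_TEMP_LIMIT_C:
        reasons.append("VCCINT temperature")
    return reasons
```

For example, a reading of 200 A at 90°C is reported as a VCCINT current violation with one AUX connector attached, but returns an empty list with two.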

Unlike previous Alveo cards, the V80 card has no external system controller that monitors voltage and temperature thresholds. While the VRM protects the card from physical damage, precautions must be taken to stay below these limits and prevent a system failure. These precautions include installing the V80 card in a server that provides sufficient managed airflow, and preventative in-system monitoring of temperature and voltage, either through the design on the Versal device or through communication with the host (PCIe in-band or BMC out-of-band).

The Alveo Versal Example Design includes example RPU firmware that collects telemetry on the card and can be used as a starting point for a thermal and electrical monitoring solution. It remains the user's responsibility, however, to implement adequate protection measures for their application, such as clock throttling, to avoid the system failure that results from reaching the shutdown thresholds. Refer to the Alveo Versal Example Design documentation for more information on the example implementation.
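
One way to structure a clock-throttling decision is a threshold check with hysteresis, so that the clock is not toggled on every sample near the limit. The sketch below is illustrative only and is written in Python for brevity; it does not reproduce the example design's firmware. The function name `next_throttle_state` and the 0.90 and 0.80 fractions are assumptions chosen for the example, not AMD recommendations, and the default current limit assumes two 2x4 AUX connectors.

```python
# Shutdown thresholds from Table 1. The policy fractions below are
# hypothetical values for illustration, not AMD guidance.
TEMP_SHUTDOWN_C = 125.0
CURRENT_SHUTDOWN_A = 360.0   # assumes two 2x4 PCIe AUX connectors
THROTTLE_AT = 0.90           # start throttling at 90% of a shutdown threshold
RESUME_AT = 0.80             # stop throttling at or below 80%


def next_throttle_state(throttled, current_a, temp_c,
                        current_limit_a=CURRENT_SHUTDOWN_A,
                        temp_limit_c=TEMP_SHUTDOWN_C):
    """Return True if the clock should be throttled after this sample."""
    # Take the worse of the two readings, each as a fraction of its threshold.
    load = max(current_a / current_limit_a, temp_c / temp_limit_c)
    if load >= THROTTLE_AT:
        return True
    if load <= RESUME_AT:
        return False
    return throttled  # inside the hysteresis band: keep the previous state
```

A monitoring loop would call this once per telemetry sample and pass the previous result back in as `throttled`. The gap between the two fractions trades responsiveness against oscillation and would need tuning for the application and server airflow.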