Exacluster with 144 Nvidia H200 AI GPUs detailed by its designer: Hydra Host enters the scene

Earlier this month, we reported on ExaAILabs’s Exacluster, a cluster of 18 machines running 144 Nvidia H200 GPUs and one of the first clusters based on these processors. Since then, Hydra Host, the company that facilitated the construction of the cluster, has given us additional details about the system. The cluster is built on Lenovo systems with multiple customizations from Hydra Host, which played a significant role in the build. When it is not in use by its owner, the machine can also be rented through Hydra’s Brokkr platform.

A Lot of Compute Power

The cluster’s backbone consists of 18 Lenovo nodes with eight Nvidia H200 GPUs each, for a total of 144 GPUs and 20TB of HBM3E memory, enabling compute performance of 570 FP8 PetaFLOPS for AI. Hydra Host configured and fine-tuned 16 of the nodes for training, which requires massive computation and memory performance, while the remaining two serve as inference nodes. In addition, Hydra Host installed its Brokkr platform for GPU provisioning, management, and remote renting (more on this later).
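
For readers who want to check the headline figures, here is a rough back-of-the-envelope sketch. The node and GPU counts come from the article; the per-GPU values (141GB of HBM3E and roughly 3,958 FP8 TFLOPS with sparsity) are assumptions taken from Nvidia's published H200 SXM specifications, not numbers supplied by Hydra Host, and the function name is purely illustrative.

```python
# Back-of-the-envelope totals for the cluster described above.
# Node and GPU counts are from the article; per-GPU figures are
# ASSUMPTIONS based on Nvidia's public H200 SXM spec sheet.

NODES = 18                  # Lenovo nodes (from the article)
GPUS_PER_NODE = 8           # H200 GPUs per node (from the article)
HBM3E_GB_PER_GPU = 141      # assumed: H200 memory capacity in GB
FP8_TFLOPS_PER_GPU = 3958   # assumed: peak FP8 throughput with sparsity


def cluster_totals(nodes=NODES, gpus_per_node=GPUS_PER_NODE,
                   gb_per_gpu=HBM3E_GB_PER_GPU,
                   tflops_per_gpu=FP8_TFLOPS_PER_GPU):
    """Return (gpu_count, memory_tb, fp8_pflops) for the whole cluster."""
    gpus = nodes * gpus_per_node
    memory_tb = gpus * gb_per_gpu / 1000        # GB -> TB (decimal)
    fp8_pflops = gpus * tflops_per_gpu / 1000   # TFLOPS -> PFLOPS
    return gpus, memory_tb, fp8_pflops


if __name__ == "__main__":
    gpus, memory_tb, fp8_pflops = cluster_totals()
    print(f"GPUs: {gpus}")
    print(f"HBM3E: {memory_tb:.1f} TB")
    print(f"Peak FP8: {fp8_pflops:.0f} PFLOPS")
```

Under those assumptions the totals land at 144 GPUs, about 20TB of memory, and about 570 FP8 PetaFLOPS, in line with the quoted figures. Note that this is a theoretical peak that relies on sparsity; sustained training throughput would be lower.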


