CS 498 (Hot Topics in High Performance Computing), University of Illinois at Urbana-Champaign, Computer Science Department

Hot Topics in High Performance Parallel Computing: Networks and Fault Tolerance. Large-scale computer systems such as Petascale or upcoming Exascale machines pose significant challenges for system and software designers. In this course, we will address two very important topics in their design: HPC networking and fault tolerance. The network will soon be the most expensive and critical part of large machines, and fault tolerance is needed to ensure correct operation as the probability of failures of single elements increases. This course requires basic knowledge of graph theory and system architecture.

  • Franck Cappello (Fault tolerance)
  • Torsten Hoefler (Networking)
  • Marc Snir


  • Thomas Ropars, INRIA
  • Ana Gainaru, UIUC/NCSA
  • Leonardo Bautista Gomez, Titech

The Joint Laboratory for Petascale Computing includes researchers from the French National Institute for Research in Computer Science and Control (INRIA), the University of Illinois at Urbana-Champaign's Center for Extreme-Scale Computation, and the National Center for Supercomputing Applications. The Joint Lab is part of Parallel@Illinois.