Joint Laboratory Research

Current trends in high-performance computing will increase the number of hierarchical levels in supercomputers (CPU cores, multiprocessors, clusters of multiprocessors, clusters of clusters, and grids) and broaden the mix of memory models they expose (shared caches, shared memory, and distributed memory). This added complexity creates significant opportunities to develop new software for petascale computing, from low-level communication mechanisms to high-level numerical methods.

The Joint Laboratory for Petascale Computing focuses on these opportunities, including numerical libraries, fault tolerance, and new programming models.

Numerical libraries

The Joint Lab will design the parallel linear algebra algorithms that sit at the core of many scientific applications. Simulations built on these applications frequently require solving very large systems of linear equations, eigenvalue problems, or least squares problems, often with millions of rows and columns. Solving these problems is very time consuming, so the Lab's research focuses on designing faster algorithms that maintain the numerical accuracy of existing ones. These algorithms must address the tens of thousands of processors in contemporary supercomputers and the programming and latency issues that come with such massive parallelism.
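
To make the latency challenge concrete, below is a minimal sketch in C with MPI of a distributed conjugate-gradient solver for a linear system Ax = b, with the rows of A block-distributed across processes. It is illustrative only, not one of the Lab's algorithms: the tridiagonal test matrix, the problem size, and the assumption that the process count divides N are all conveniences of the example.

    /* Minimal sketch of a distributed linear solver: conjugate gradient for
     * Ax = b with the rows of A block-distributed across MPI processes.
     * Illustrative only, not one of the Lab's algorithms; the tridiagonal
     * test matrix and the assumption that the process count divides N
     * are conveniences. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    #define N 1024  /* global problem size (illustrative) */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int local_n = N / size;       /* assumes size divides N */
        int offset  = rank * local_n;

        /* Local rows of a tridiagonal SPD test matrix, stored densely. */
        double *A = calloc((size_t)local_n * N, sizeof(double));
        double *b = malloc(local_n * sizeof(double));
        for (int i = 0; i < local_n; i++) {
            int gi = offset + i;
            A[i * N + gi] = 2.0;
            if (gi > 0)     A[i * N + gi - 1] = -1.0;
            if (gi < N - 1) A[i * N + gi + 1] = -1.0;
            b[i] = 1.0;
        }

        double *x  = calloc(local_n, sizeof(double));
        double *r  = malloc(local_n * sizeof(double));
        double *p  = malloc(local_n * sizeof(double));
        double *Ap = malloc(local_n * sizeof(double));
        double *pg = malloc(N * sizeof(double));  /* gathered direction */
        for (int i = 0; i < local_n; i++) { r[i] = b[i]; p[i] = b[i]; }

        double rr, rr_local = 0.0;
        for (int i = 0; i < local_n; i++) rr_local += r[i] * r[i];
        MPI_Allreduce(&rr_local, &rr, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        for (int iter = 0; iter < 1000 && sqrt(rr) > 1e-8; iter++) {
            /* Every rank needs the whole search direction: one global
             * collective per iteration, a latency cost that grows with
             * the number of processes. */
            MPI_Allgather(p, local_n, MPI_DOUBLE,
                          pg, local_n, MPI_DOUBLE, MPI_COMM_WORLD);

            double pAp_local = 0.0;
            for (int i = 0; i < local_n; i++) {
                double s = 0.0;
                for (int j = 0; j < N; j++) s += A[i * N + j] * pg[j];
                Ap[i] = s;
                pAp_local += p[i] * s;
            }
            double pAp;
            MPI_Allreduce(&pAp_local, &pAp, 1, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);

            double alpha = rr / pAp, rr_new_local = 0.0;
            for (int i = 0; i < local_n; i++) {
                x[i] += alpha * p[i];
                r[i] -= alpha * Ap[i];
                rr_new_local += r[i] * r[i];
            }
            double rr_new;
            MPI_Allreduce(&rr_new_local, &rr_new, 1, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);

            for (int i = 0; i < local_n; i++)
                p[i] = r[i] + (rr_new / rr) * p[i];
            rr = rr_new;
        }

        if (rank == 0) printf("residual norm: %e\n", sqrt(rr));
        free(A); free(b); free(x); free(r); free(p); free(Ap); free(pg);
        MPI_Finalize();
        return 0;
    }

Even in this toy version, the global collectives in every iteration come to dominate at tens of thousands of processors; reducing the number and cost of such synchronizations is one direction the faster algorithms described above can take.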

The Joint Lab will evaluate the performance of these new libraries in several real-world scientific applications.

Fault tolerance

Petascale systems have reinvigorated interest in managing hardware failures and in ensuring that large jobs complete even when such failures occur. Fault tolerance research at the Joint Lab will include both traditional approaches to these challenges and new ones.

Potential topics to be explored by the Joint Lab include:

  • The origin of failures in large-scale systems.
  • Failure prediction and proactive migration of tasks before failures occur.
  • The performance of checkpoint-restart protocols (a minimal example of the basic mechanism appears after this list).
  • Techniques for checkpoint size reduction.
  • The costs and benefits of dedicating processor cores to fault-tolerance operations.
  • New fault-tolerance strategies based on techniques such as diskless checkpointing, speculative execution, and software transactional memory.
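
Several of these topics revolve around one basic mechanism, sketched below in its simplest application-level form: the program periodically writes its state to stable storage and, after a failure, restarts from the last committed checkpoint. The state layout, file name, and checkpoint interval here are illustrative assumptions, not a Joint Lab protocol.

    /* Minimal sketch of application-level checkpoint-restart; illustrative
     * only.  The state layout, file name, and interval are assumptions. */
    #include <stdio.h>
    #include <stdlib.h>

    #define STATE_SIZE 1000000
    #define CKPT_EVERY 100
    #define CKPT_FILE  "state.ckpt"

    int main(void) {
        double *state = calloc(STATE_SIZE, sizeof(double));
        int iter = 0;

        /* On (re)start, resume from the last committed checkpoint if any. */
        FILE *f = fopen(CKPT_FILE, "rb");
        if (f) {
            if (fread(&iter, sizeof iter, 1, f) == 1)
                fread(state, sizeof(double), STATE_SIZE, f);
            fclose(f);
            printf("restarted at iteration %d\n", iter);
        }

        for (; iter < 10000; iter++) {
            for (int i = 0; i < STATE_SIZE; i++)  /* stand-in for real work */
                state[i] += 1e-6;

            if ((iter + 1) % CKPT_EVERY == 0) {
                /* Write to a temporary file and rename it, so a crash
                 * mid-write never corrupts the last good checkpoint. */
                FILE *g = fopen(CKPT_FILE ".tmp", "wb");
                if (!g) { perror("checkpoint"); break; }
                int next = iter + 1;
                fwrite(&next, sizeof next, 1, g);
                fwrite(state, sizeof(double), STATE_SIZE, g);
                fclose(g);
                rename(CKPT_FILE ".tmp", CKPT_FILE);
            }
        }
        free(state);
        return 0;
    }

The checkpoint interval is itself a cost-benefit question: checkpointing too often wastes I/O bandwidth, while checkpointing too rarely wastes recomputation after a failure. Techniques such as checkpoint size reduction and diskless checkpointing target exactly the write cost this naive version incurs.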

Programming models

Emerging supercomputers feature deeper memory and computational hierarchies as well as hybrid memory models. Traditional high-performance computing environments built on a single memory model are popular, but they may not match the requirements of the next generations of platforms. Partitioned global address space (PGAS) programming languages also have limitations in dealing with many hierarchical levels. One way to deal explicitly with the hierarchy and the hybrid memory model is to mix programming models in a single code, as in the sketch below. But these hybrid programming models are still difficult to use.
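
As an illustration of mixing models in a single code, here is a minimal sketch combining MPI across processes with OpenMP threads within each process. MPI+OpenMP is one common pairing, used here purely as an example; it is not necessarily among the combinations the Lab will evaluate, and the computation (a sum reduced first across threads, then across processes) is illustrative only.

    /* Minimal sketch of one common hybrid style: MPI across processes,
     * OpenMP threads within each process.  Illustrative only. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided;
        /* Ask for an MPI library that tolerates threaded callers. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Shared-memory level: OpenMP threads inside one process. */
        double local = 0.0;
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < 1000000; i++)
            local += 1.0 / (1.0 + i + 1000000.0 * rank);

        /* Distributed-memory level: MPI reduction across processes. */
        double global;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum = %f (%d threads per rank)\n",
                   global, omp_get_max_threads());

        MPI_Finalize();
        return 0;
    }

Even this small example hints at why hybrid codes are difficult: the programmer must manage two runtimes, match the threading level requested from MPI to how threads actually call it, and tune process and thread counts for each machine.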

The Joint Lab will evaluate and compare the performance of different combinations of programming models using state-of-the-art applications and application kernels. It will also compare the performance of hybrid programming approaches with flat and PGAS languages and investigate the use of numerical libraries to improve application performance.

The Joint Lab will also explore a workflow language to handle the distribution of codes composed of dependent computational blocks and to propagate end-user expertise through the different levels of the hierarchical architecture.

The Joint Laboratory for Petascale Computing includes researchers from the French National Institute for Research in Computer Science and Control (INRIA), the University of Illinois at Urbana-Champaign's Center for Extreme-Scale Computation, and the National Center for Supercomputing Applications. The Joint Lab is part of Parallel@Illinois.