Modeling and System Support to Balance the Resource Demand and Supply in High Performance Computing
College Of William And Mary, Williamsburg VA
Investigators
Abstract
Xiaodong Zhang, 0405909

Performance evaluation and optimization of high performance computing on multiprocessors and clusters have mainly been guided by communication-oriented models, such as the LogP model, which treats communication latency as the dominant factor contributing to performance degradation. With rapid advances in commodity processors and network technologies, a modern cluster is equipped with a fast interconnection network, and each node has high capacity, with increasingly fast CPUs and larger memory. Unfortunately, the speed gaps between the CPU, the memory, and I/O storage continue to grow, seriously limiting cluster computing efficiency. Because the dominant bottleneck has shifted dramatically from communication bandwidth to memory bandwidth, a LogP-like model has several limitations. First, balancing and fully utilizing the CPU, memory, and storage resources in clusters is a serious issue that must be addressed, because it is a major factor limiting sustained performance. Second, a common consequence of the large CPU-memory speed gap is that CPU cycles are oversupplied while memory and I/O bandwidths are in high demand and insufficient.

This project addresses the growing concern of unbalanced resource demand and supply in cluster computing through several related research efforts. The first objective is to develop memory-hierarchy-oriented analytical performance models and experimental tools that provide quantitative insight into resource demand and supply in high performance cluster computing, guiding users and computer architects in optimizing their system designs and program implementations. The result will be a general model covering both communication and memory effects.
The second objective is to improve sustained performance by concentrating on two critical issues: (1) locality exploitation and (2) latency reduction beyond the on-chip cache level. To this end, the project proposes two novel and cost-effective memory system designs and their implementations. The project will also design and build a structured PSP scheme, implemented in the cluster, to decentralize resource management. These methods will be tested on three types of large, real-world, data-intensive applications: CFD computation, direct numerical simulation of turbulence, and Internet multimedia data delivery.