STI: Network Measurement, Monitoring and Analysis in Cluster Computing
University of Illinois at Urbana-Champaign, Urbana, IL
Investigators
Abstract
High-speed networks are vital to next-generation Grid applications, enabling rapid access to remote resources and allowing users to mask latency via aggressive data staging. This project intends to build tools that will allow the collection, analysis and visualization of the network interactions of nodes within a cluster. Further, the investigators anticipate building and publishing what will become the foundation for a body of knowledge about the characteristics and limitations of cluster networks. The outcome of this project should also provide those involved in the design and implementation of cluster networks with both the tools to make new measurements and empirical data about how these networks operate.

To ensure that the tools developed can interoperate with the widest variety of cluster environments, the authors propose a modular and extensible system of data collection and processing. In the proposed configuration, the tools would determine the status of the network from several different data sources at a variety of granularities. These data sources include information obtained from the switching hardware in the network, passive tapping of links, instrumentation in the networking stacks of the nodes, and instrumentation in the message-passing libraries. These tools may be repackaged and used for several purposes that extend beyond the primary scope of the project. Possible purposes include monitoring the health and status of an operational cluster, helping user-support staff understand the behavior of a particular application, detecting suspicious traffic patterns for security, and detailed accounting of network utilization in a Grid environment.

The networks that interconnect the elements of the Grid are similar in many ways to the networks that have been built for general use by the research and education community (such as the vBNS and Abilene). However, the ways in which these networks are used are significantly different and thus require the development of tools and methodologies specifically designed for the distributed cluster and Grid environment. The capabilities of traditional tools and methodologies will be harnessed and built upon as appropriate.

The NSF Distributed Terascale Facility, TeraGrid, represents the pre-eminent laboratory for exploring the network implications of distributed and Grid computing. NCSA and Argonne designed the State of Illinois-funded I-WIRE [4] DWDM network in the Midwest and, together with SDSC and Caltech, have designed and deployed the TeraGrid's 40 Gbit/s national-scale wide area network. NCSA is a major TeraGrid site, has strong working relationships with the other sites, and is in an excellent position to instrument TeraGrid with the tools developed by this project. Looking beyond TeraGrid, the Extensible Terascale Facility (ETF) will bring TeraGrid-speed networks to clusters and other high-performance computing systems throughout the research and education community in the US. NCSA will be actively involved in the development of the ETF, which will also benefit greatly from the results of this project. NCSA has a rich history of high-speed network research, development, and deployment. wide area network performance optimization tools and assists with high-performance applications.