Data Intensive GRID Benchmarks
University of California-San Diego, La Jolla, CA
Investigators
Abstract
Efficient development and deployment of Grids will be advanced by defining a suite of benchmarks to measure the expected quality-of-service of Grid architectures and the anticipated time-to-solution of Grid applications. In particular, benchmarks that focus on the impact of alternative middleware implementations on application performance are needed. The outcomes of the project will be: (1) A set of low-level benchmark probes for measuring the performance of Grid infrastructure and the overheads of Grid middleware. (2) A synthetic application benchmark suite embodying the anticipated Grid usage scenarios of several emerging data-intensive Grid applications. (3) Application profiling tools that can summarize the resource usage patterns of Grid applications, and thus inform the refinement of an evolving synthetic benchmark as production applications mature. (4) A maintained website of performance data where Grid users can obtain benchmarks, view results reported by this group and others, and submit their own new results. (5) Enabled research (some as part of this proposal, but also by the wider community) into factors affecting Grid performance.
As Grids emerge, it is important to deploy measurement methods along with them so that applications and architectures can evolve guided by scientific principles. All sciences need agreed-upon metrics: a common language for communicating results. A system, to be well engineered, must be measured so that alternative implementations can be compared quantitatively. Users of systems also need performance objectives that describe system capabilities so that they can develop and tune their applications toward informed objectives. System architects, in turn, need examples of how users will exercise the system to inform the design process. Benchmarks are thus an important part of the middleware for Grids, enabling communication about technological advances and design tradeoffs.
The proposed work will develop a suite of data-intensive Grid benchmarks to convey information back and forth between the people building Grids and the people planning to use them. Data-intensive applications form a class of problems that require access to multiple data archives remote from their computational resources. They belong on the Grid because they are already distributed (as opposed to traditional HPC applications, which may or may not scale well to Grids). A benchmark suite that embodies the resource requirements of these emerging applications is needed to help Grid architects understand how applications will stress infrastructure and middleware, and to give users an early look at the quality-of-service they can expect for their applications.