COLLABORATIVE RESEARCH: Data Mining Meets I/O Performance Evaluation: Advanced Statistical Tools for Analyzing Bursty Traffic
University of California-Santa Cruz, Santa Cruz, CA
Abstract
The goal of this collaborative research project, involving Christos Faloutsos and Ngai Hang Chang at Carnegie Mellon University (award 0083148) and Tara Madhyastha at the University of California, Santa Cruz (award 0083130), is to develop and apply statistical and data mining tools for analyzing bursty time sequences, with an emphasis on optimizing I/O traffic. The interdisciplinary team includes researchers in computer science, computer engineering, and statistics, as well as industry collaborators.

The approach has three parts: (1) advanced statistical tools based on ARFIMA (autoregressive fractionally integrated moving average) models; (2) wavelets and the related "80-20 law" for modeling disk traffic; and (3) incorporation of these models into so-called "Active Disks", with the goal of building self-tuning, adaptive disk subsystems.

The results will advance data mining and statistics as well as disk design. An easy-to-use toolkit, T-REX, will aid I/O and systems design by helping to handle bursty traffic and to improve buffering and prefetching. The theory behind T-REX will be based on new data mining algorithms and statistical methods that model self-similar time sequences (such as web and network traffic, in addition to I/O traffic). The research team has strong ties with industrial groups in databases, data mining, and disk manufacturing, which will help in testing the toolkit and in transferring its technology. The T-REX system is expected to significantly aid the design of disk subsystems, with a beneficial impact on the storage industry.
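The abstract does not say exactly how the "80-20 law" models disk traffic. One common reading, assumed here, is a recursive multiplicative cascade: split the total I/O volume in a time window into two halves, giving a fraction b = 0.8 to one half and 1 - b = 0.2 to the other, then repeat on each half. The following is a minimal Python sketch under that assumption. The function name `b_model` and its parameters are hypothetical illustrations, not part of T-REX, and the project's own models may differ.

```python
import random
from typing import List, Optional


def b_model(total: float, levels: int, b: float = 0.8,
            seed: Optional[int] = None) -> List[float]:
    """Spread `total` I/O volume over 2**levels time slots using an
    80-20 style recursive split.

    Hypothetical helper for illustration only; not the T-REX API.
    """
    if not 0.5 <= b < 1.0:
        raise ValueError("bias b must lie in [0.5, 1)")
    if levels < 0:
        raise ValueError("levels must be non-negative")
    rng = random.Random(seed)
    trace = [float(total)]
    for _ in range(levels):
        refined = []
        for volume in trace:
            # Give fraction b of this interval's volume to one half and
            # 1 - b to the other; a coin flip decides which half is heavy.
            heavy, light = volume * b, volume * (1.0 - b)
            if rng.random() < 0.5:
                refined.extend((heavy, light))
            else:
                refined.extend((light, heavy))
        trace = refined
    return trace


trace = b_model(total=1000.0, levels=10, b=0.8, seed=42)
print(len(trace))             # 1024 slots
print(round(sum(trace), 6))   # total volume is preserved: 1000.0
print(round(max(trace), 3))   # busiest slot receives 1000 * 0.8**10: 107.374
```

Each split conserves volume, so the total is preserved up to floating-point rounding, and the seed only changes which slots are busy, not how busy they are. A bias of b = 0.5 yields perfectly uniform traffic, while values closer to 1 make the trace burstier at every time scale, which is the self-similar behavior the abstract refers to.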