
CAREER: Improving Storage System Performance, Dependability and Manageability Using System Mining Techniques

$449,405 | FY2004 | CSE | NSF

University Of Illinois At Urbana-Champaign, Urbana IL

Abstract

Technology trends indicate that today's computing is becoming increasingly data-centric. The widespread use of devices and services has created unprecedented demand to store and retrieve information. Storage demand is currently growing 60% per year. By 2008, the average data center will manage 10 times as much data as it does today. According to a recent study conducted by UC Berkeley, annual storage demand is roughly 1.5 exabytes, or around 250 megabytes for every person on earth. To satisfy this growing demand for data services, modern storage systems need to address three challenges: (1) performance, delivering satisfactory performance to keep up with rapidly growing processor speeds; (2) dependability, providing reliability and availability to minimize loss of data access, which currently costs companies more than $250,000 per hour and one-third to one-half of a company's total IT budget; (3) manageability, simplifying storage administrators' jobs to reduce storage maintenance cost, which is currently almost nine times the storage equipment purchase price. This proposal addresses these three challenges. It investigates a novel technology called system mining, which applies data mining techniques to storage systems to improve their performance, dependability and manageability.
More specifically, the proposed system hinges on the following innovations: 1) Performance: using frequent sequence mining, clustering, classification and other data mining algorithms to characterize storage access patterns and infer data semantics, which in turn guide storage cache management, prefetching, disk scheduling, and data layout to maximize storage performance; 2) Dependability: applying outlier analysis, signature analysis and other data mining techniques to unified, correlated activity logs to detect and correct storage administrators' mistakes and other human errors; 3) Manageability: building a context-aware, "self-maturing" autonomic storage system that can learn from storage administrators and automatically generate administrative scripts to gradually minimize administrators' involvement.
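
As a rough illustration of the performance idea, mining access patterns to guide prefetching, the sketch below counts ordered block pairs that recur close together in an access trace and turns the frequent ones into prefetch hints. This is a much-simplified stand-in for frequent sequence mining, not the method of the proposal; the function names, the `window` and `min_support` parameters, and the sample trace are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def mine_block_correlations(trace, window=2, min_support=2):
    """Return {block: [blocks that frequently follow it]}.

    Hypothetical helper: a pair (a, b) is counted once for each access
    to `a` that is followed by `b` within the next `window` accesses.
    """
    pair_counts = Counter()
    for i, a in enumerate(trace):
        seen = set()
        for b in trace[i + 1 : i + 1 + window]:
            if b != a and b not in seen:
                pair_counts[(a, b)] += 1
                seen.add(b)

    # Keep only pairs seen at least `min_support` times, most frequent first.
    rules = defaultdict(list)
    for (a, b), count in pair_counts.most_common():
        if count >= min_support:
            rules[a].append(b)
    return dict(rules)


def prefetch_candidates(rules, block):
    """Blocks worth prefetching when `block` is accessed (may be empty)."""
    return rules.get(block, [])


# Made-up trace of block numbers: 10, 11, 12 tend to be read together.
trace = [10, 11, 12, 50, 10, 11, 12, 77, 10, 11, 93]
rules = mine_block_correlations(trace)
hints = prefetch_candidates(rules, 10)
```

In this toy trace, blocks that appear only once after block 10 (such as 93) fall below `min_support` and produce no hint, while the recurring followers are kept. A real system would need to bound memory, age out stale rules, and mine longer sequences than pairs.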
