CIF21 DIBBS: The Data Exacell
Carnegie Mellon University, Pittsburgh PA
Investigators
Abstract
The Pittsburgh Supercomputing Center (PSC) will carry out an accelerated development pilot project to create, deploy and test software building blocks and hardware implementing functionalities specifically designed to support data-analytic capabilities for data-intensive scientific research. Building on the successful Data Supercell (DSC) technology, which replaced a conventional tape-based archive with a disk-based system to economically provide the much lower latency and higher bandwidth data access necessary for data-intensive activities, PSC will implement and bring to production quality additional functionalities important to such work. These include improved local performance, additional abilities for remote data access and storage, enhanced data integrity, data tagging and improved manageability. PSC will work with partners in diverse fields of science, initially chosen from biology, astronomy and computer science, who will provide scientific and technology drivers and system validation. The project will leverage current NSF/CI investments in data analytics systems at PSC. Those investments include DSC; Blacklight, an SGI UV1000 with 2×16TB of hardware-enabled cache-coherent shared memory; and Sherlock, a YarcData "Urika" graph-analytic appliance that also supports a globally accessible shared memory. Blacklight and Sherlock are both very capable for data-analytic applications. Their tight coupling to the pilot storage system will allow analytical capabilities to be developed in step with increasingly sophisticated mechanisms for data handling. Working with the new, multi-petabyte data store, they will constitute a system specifically optimized for data-intensive work, in contrast to conventional HPC systems. In years 3 and 4, Blacklight will be upgraded with more powerful technology specifically architected to satisfy the more demanding needs of data analytics. When successful, PSC will engage the NSF to consider larger-scale deployment aiming at exascale capacity.