
Novel compression strategies for efficient storage, retrieval and analysis of sequence data

$308,794 | R01 | FY2025 | GM | NIH

Temple Univ Of The Commonwealth, Philadelphia PA

Abstract

A post-genomic era is upon us, marked by the rapid availability of massive sequence data. Storing, retrieving, and analyzing such high volumes of raw sequence data, together with the associated alignment information, pose significant challenges for researchers in the field. New methods for efficient storage and retrieval are therefore needed to keep pace with the exponential growth of sequence data. To this end, we will develop a suite of novel computational methods for compressing sequence data. The proposed research has three specific aims: to develop (1) an error-bounded lossy compressor for quality scores in sequence files, (2) an adaptive compression strategy to achieve optimal performance, and (3) a high-throughput compression framework for efficient sequence compression. We will extensively evaluate the proposed approaches on real and synthetic datasets and develop modular software tools to facilitate their application. With a well-defined research plan for developing innovative compression methodologies, this project will establish a new paradigm of data compression to support efficient storage, retrieval, transfer, and analysis of ever-growing DNA and RNA sequencing data.
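The abstract does not say how the error-bounded lossy compressor in aim (1) would work. As a purely illustrative sketch, and not the project's method, the idea of an error bound on quality scores can be shown with uniform scalar quantization: each Phred score is replaced by the centre of a fixed-width bin, so no score moves by more than a chosen `max_error`. The Phred+33 encoding, the function names, and the sample quality string below are all assumptions made for this example.

```python
# Illustrative only: uniform quantization of FASTQ quality scores with a
# guaranteed per-score error bound. This is a stand-in technique, not the
# compressor proposed in the grant.

PHRED_OFFSET = 33  # assumed Phred+33 (Sanger / Illumina 1.8+) encoding
MAX_PHRED = 93     # highest score representable in printable ASCII


def decode_quality(qual_string):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual_string]


def encode_quality(scores):
    """Convert integer Phred scores back into a FASTQ quality string."""
    return "".join(chr(q + PHRED_OFFSET) for q in scores)


def quantize_quality(scores, max_error):
    """Replace each score with its bin centre; bins are 2*max_error + 1 wide.

    Every score lies within max_error of its bin centre, and clamping to
    MAX_PHRED only moves the centre closer to scores that are <= MAX_PHRED,
    so the bound |original - quantized| <= max_error is preserved.
    """
    if max_error < 0:
        raise ValueError("max_error must be non-negative")
    width = 2 * max_error + 1
    return [min((q // width) * width + max_error, MAX_PHRED) for q in scores]


def max_abs_error(original, reconstructed):
    """Largest absolute difference between two equal-length score lists."""
    return max((abs(a - b) for a, b in zip(original, reconstructed)), default=0)


# Hypothetical quality string, used only to exercise the functions.
original = decode_quality("IIIIHHGGFFEEDDCCBBAA@@??")
lossy = quantize_quality(original, max_error=2)
lossy_string = encode_quality(lossy)
```

Quantizing this way shrinks the alphabet of distinct quality values, which is what typically makes the stream easier for a downstream lossless coder to compress; the trade-off between `max_error` and downstream analysis accuracy is exactly the kind of question the proposed evaluation would need to address.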
