GrantIndex

CAREER: Coding for Composite DNA Storage

$430,066 · FY2023 · CSE · NSF

University Of California-Irvine, Irvine CA

Investigators

Abstract

DNA, with its remarkably high density and stability, is an appealing storage medium. A DNA storage system involves encoding binary information into DNA strands, synthesis (producing the DNA molecules), storage, sequencing (reading the DNA strands), and decoding the binary information. Among these steps, synthesis is a major bottleneck because of its high cost, approximately $0.10 per nucleotide position. Conventionally, information is represented by the four types of DNA nucleotides, namely adenine (A), cytosine (C), guanine (G), and thymine (T), so each position carries at most log2(4) = 2 bits; hence synthesis costs at least $0.10 / 2 = $0.05 per bit of information. An emerging technology called composite DNA storage instead represents information by the probabilities of all four nucleotide types at a position, leading to higher information density and a lower synthesis cost per bit by several orders of magnitude. This project investigates coding for composite DNA storage, seeking efficient methods to represent binary information with composite DNA letters so as to enable cost-effective, reliable, secure, and large-scale information storage. The activities in this project will be an important step towards the ubiquitous use of DNA storage in a broad range of applications.

In composite DNA storage, a composite DNA letter is a mixture of all four standard nucleotides (A, C, G, T) at the same position across multiple strands, following a predetermined probability mass function. In this project, the fundamental limit on information density and the corresponding achievable schemes will be established, taking into account the choice of composite DNA symbols and the various characteristics of the storage channel. Moreover, information-theoretic security will be explored to combat adversarial attacks. As a unique feature, secret information can potentially be decoded by reading only the mixed DNA strands of multiple storage vessels, rather than reading all vessels as with standard DNA.
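The cost figures above follow from dividing the synthesis cost per position by the number of bits each position can carry: with four nucleotides that is log2(4) = 2 bits, so $0.10 / 2 = $0.05 per bit. The Python sketch below repeats that arithmetic and, for comparison, counts composite letters under an assumed quantization in which every probability is a multiple of 1/k (a hypothetical "resolution k", which gives C(k+3, 3) letters). Purely for illustration, it assumes a composite position costs the same $0.10 to synthesize as a standard one. The helper names are hypothetical, this is not the project's coding scheme, and it captures only the per-position alphabet-size effect, not any other factor behind the cost savings described in the abstract.

```python
from fractions import Fraction
from itertools import product
from math import comb, log2

NUCLEOTIDES = "ACGT"


def cost_per_bit(cost_per_position: float, alphabet_size: int) -> float:
    """Cost per bit when every synthesized position stores log2(alphabet_size) bits."""
    return cost_per_position / log2(alphabet_size)


def composite_letters(k: int) -> list[dict[str, Fraction]]:
    """Enumerate every PMF over A, C, G, T whose probabilities are multiples of 1/k.

    This 'resolution k' quantization is an illustrative assumption.
    """
    letters = []
    for a, c, g in product(range(k + 1), repeat=3):
        t = k - a - c - g  # the remaining probability mass goes to T
        if t >= 0:
            counts = (a, c, g, t)
            letters.append({n: Fraction(x, k) for n, x in zip(NUCLEOTIDES, counts)})
    return letters


if __name__ == "__main__":
    # Standard DNA: 4 letters -> 2 bits per position -> $0.05 per bit.
    print(f"standard: ${cost_per_bit(0.10, 4):.4f} per bit")
    # Composite DNA at a few resolutions, ASSUMING the same $0.10 per position.
    for k in (1, 2, 6):
        size = len(composite_letters(k))
        assert size == comb(k + 3, 3)  # stars-and-bars count of (a, c, g, t)
        print(f"k={k}: {size} letters, {log2(size):.2f} bits/position, "
              f"${cost_per_bit(0.10, size):.4f} per bit")
```

For example, `composite_letters(1)` returns just the four pure nucleotides, while `composite_letters(6)` returns 84 letters, roughly 6.39 bits per position under this idealized count.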
In addition, the identification of a given composite DNA strand from a storage vessel containing many composite strands will be investigated in order to facilitate large-scale storage. Finally, the theoretical schemes will be simulated and verified by developing software tools and conducting experiments, and the results will be used to further refine the theoretical models, including the channel characteristics and the choice of composite DNA symbols.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
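The abstract notes that the schemes will be simulated in software and that the characteristics of the storage channel matter. As one illustration, the Python sketch below models reading a single composite position as drawing reads independently from the letter's probability mass function (multinomial sampling), estimates the nucleotide frequencies, and decodes to the nearest letter of an assumed resolution-k alphabet (probabilities that are multiples of 1/k). The i.i.d. read model, the absence of sequencing errors, the L1 nearest-neighbor decoder, and all function names are hypothetical simplifications, not the project's channel model or decoder.

```python
import random
from itertools import product

NUCLEOTIDES = "ACGT"


def sample_reads(pmf: dict[str, float], n_reads: int, rng: random.Random) -> list[str]:
    """Idealized channel: each read shows one nucleotide drawn i.i.d. from the letter's PMF."""
    weights = [pmf[x] for x in NUCLEOTIDES]
    return rng.choices(NUCLEOTIDES, weights=weights, k=n_reads)


def estimate_pmf(reads: list[str]) -> dict[str, float]:
    """Empirical nucleotide frequencies observed at the position."""
    n = len(reads)
    return {x: reads.count(x) / n for x in NUCLEOTIDES}


def nearest_letter(freqs: dict[str, float], k: int) -> dict[str, float]:
    """Decode to the resolution-k composite letter closest in L1 distance."""
    best, best_dist = None, float("inf")
    for a, c, g in product(range(k + 1), repeat=3):
        t = k - a - c - g
        if t < 0:
            continue  # not a valid PMF
        cand = dict(zip(NUCLEOTIDES, (a / k, c / k, g / k, t / k)))
        dist = sum(abs(cand[x] - freqs[x]) for x in NUCLEOTIDES)
        if dist < best_dist:
            best, best_dist = cand, dist
    return best


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    letter = {"A": 0.5, "C": 0.0, "G": 0.25, "T": 0.25}  # a resolution-4 letter
    for n_reads in (10, 100, 1000):
        freqs = estimate_pmf(sample_reads(letter, n_reads, rng))
        print(n_reads, freqs, nearest_letter(freqs, k=4) == letter)
```

Varying `n_reads` exposes the trade-off this idealized channel introduces: with few reads the estimated frequencies can land closer to a neighboring letter, while more reads make correct decoding increasingly likely at the cost of more sequencing.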
