CDI-Type I: Collaborative Research: Supervised Learning in Molecular Classifiers
Columbia University, New York NY
Investigators
Abstract
Inborn diseases of steroid metabolism are detectable at birth and treatable with low-cost medicine. They are characterized by a gross increase of a specific steroid or set of steroids in the urine of affected infants. At present, however, there is no cost-effective method for screening for all such diseases. Even in developed countries, screening is considered cost-effective for only one subtype of one such disease (congenital adrenal hyperplasia). In this CDI project, chemical sensor arrays are being developed that are capable of cheap, powerful, and reliable screening for diseases of steroid metabolism. The arrays use oligonucleotide-based receptors known as three-way junctions (TWJs). A systematic procedure for chemical sensor array design is used, covering the phases of sensor synthesis, feature (sensor) selection, training data collection, and classifier design and analysis. The TWJ acts as a scaffold for sensor design, allowing thousands of variations, each with a different selectivity for small molecules such as steroids. Comprehensive characterization of thousands of sensor responses is made possible by microchips that can synthesize up to 90,000 sensors at fixed locations. Wrapper-based feature selection approaches are used to find small, high-quality sensor subsets among these thousands. Diagnostic decisions require detecting and quantifying gross increases in the concentrations of particular indicative steroids. These concentration changes must be detected in the presence of small concentrations of other steroids, and, owing to differences in kidney filtration, samples may span a range of overall dilutions. This requires mixed classification/regression inference algorithms capable of working over a range of input concentrations. TWJ sensors respond non-linearly to concentration and give non-additive signals for analyte mixtures, which requires new approaches to chemical sensor array analysis and classifier design. Lastly, new wrapper-based concentration coverage procedures are being developed to ensure accurate representation of sensor response profiles in training data while minimizing the number of measurements needed.
The vast majority of newborns in developing countries are not screened for inborn illnesses of steroid metabolism; even in the US, coverage is not complete. Current methods are precise but disease-specific, expensive, and impractical outside a modern hospital. TWJ sensor arrays will be cheap, stable, and reliable; they are powerful enough to test for many steroid metabolic diseases simultaneously, and can identify new diseases via anomaly detection. They have the potential to be deployed in the field, resulting in cost-effective screening for many rare diseases in developed countries and, for the first time, cost-effective screening in the rest of the world as well.