
III: Small: Simultaneous Decomposition and Predictive Modeling on Large Multi-Modal Data

$489,323 | FY2010 | CSE | NSF

University Of Texas At Austin, Austin TX

Investigators

Abstract

Several modern data mining applications involve predictive modeling on large amounts of multi-relational data with added structures such as product hierarchies or social networks among customers. The broad goal of this proposal is to develop a comprehensive framework for predictive modeling on large, heterogeneous, multi-relational data based on "Simultaneous Decomposition and Prediction" (SDaP) approaches that iteratively partition the problem into more homogeneous and manageable pieces while concurrently building multiple predictive models, one for each piece. Such approaches lead to simpler and more accurate solutions. The proposed algorithmic strategies that determine how many models to learn and where they should apply, which data to discard and which to keep, how to learn multiple related tasks defined on multi-modal data, and how to scalably implement the solutions on distributed computers, provide practical solutions to certain real-world problems for which current learning and data mining techniques are severely lacking. Application domains of ecology, bioinformatics, market research and web mining are specifically identified and targeted. There are two broad research impacts of the proposed project: (a) it further vitalizes the research in data mining towards better algorithms for predictive modeling on rich and heterogeneous multi-modal data, and (b) it provides and promotes the SDaP approach as a fundamental data analysis tool across multiple disciplines. The PI will organize a workshop and offer a tutorial at major data mining conferences to foster and promote research on various aspects of SDaP analysis. Moreover, the curated complex datasets and software developed under this project will be shared with the scientific community via a public web site as part of the proposed one-of-a-kind multi-relational data benchmarking facility. The PI will further develop a novel graduate course on Modeling and Analysis of Complex Data.
Outreach modules that illustrate data analysis concepts and capabilities at levels appropriate for pre-college students will also be developed. For further information see the project web site at the URL: http://www.ideal.ece.utexas.edu/projects/sdap/
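
The abstract describes SDaP only at a high level: alternate between partitioning the data into more homogeneous pieces and fitting one predictive model per piece. As a rough illustration of that alternation, and not the project's actual algorithm, the sketch below uses clusterwise linear regression as a stand-in. The function name `fit_clusterwise_linear`, its parameters, and the choice of least-squares models are all assumptions made for this example.

```python
import numpy as np


def fit_clusterwise_linear(X, y, k=2, n_iter=20, seed=0):
    """Illustrative stand-in for a decompose-and-predict loop (hypothetical).

    Alternates between fitting one linear model per piece and
    reassigning each sample to the piece whose model predicts it best.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])   # append an intercept column
    labels = rng.integers(0, k, size=n)    # random initial partition
    coefs = np.zeros((k, Xb.shape[1]))

    for _ in range(n_iter):
        # Prediction step: one least-squares model per piece.
        for j in range(k):
            mask = labels == j
            if mask.sum() >= Xb.shape[1]:  # skip pieces too small to fit
                coefs[j] = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)[0]

        # Decomposition step: move each sample to its best-fitting model.
        errors = (Xb @ coefs.T - y[:, None]) ** 2
        new_labels = errors.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                          # partition is stable
        labels = new_labels

    return labels, coefs
```

Because each step can only keep or lower the total squared error, the loop settles on a local optimum that depends on the random initial partition. Choosing the number of pieces `k`, handling multi-relational structure, and scaling to distributed settings are the questions the proposal targets, and this sketch does not address them.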
