EAGER: Parallel Semi-supervised Machine Learning for Volumetric Datasets

$99,998 | FY2017 | CSE | NSF

University of Florida, Gainesville, FL

Investigators

Anand Rangarajan (contact), Sanjay Ranka, Sivaramakrishna Balachandar

Abstract

Machine learning and parallel computing have come of age. Rapid advances have been made in this decade in the automated recognition of objects and faces in images, with "deep learning" almost becoming a household term. However, a clear limitation of most present work is its restriction to two-dimensional datasets such as images. When the focus shifts to large three-dimensional datasets such as 3D medical imaging, fluid dynamics simulations, remote sensing and electron microscopy, the problems become more difficult by several orders of magnitude. The automatic labeling of three-dimensional structures in large datasets requires a comprehensive integration of machine learning and parallel computing, with a "from the ground up" redesign, to be efficient, accurate and capable of scaling to ever larger volumes. The benefits to the engineering communities and society at large are clear. Successful execution of this project will enable experts in a variety of disciplines that generate three-dimensional data to efficiently perform large-scale automated labeling of structures of interest, such as the hippocampus in brain images or vortices in fluid dynamics simulations. Students trained at this nexus of machine learning and parallel computing will be capable of making their own contributions, ranging from academic research to commercialization. Finally, the software suites generated by this project should play a role in the formation of vertically integrated enterprises.

Volumetric applications require the development of novel, efficient and scalable machine learning algorithms, as existing approaches are computationally intensive and limited to small images and video. Approaches for volumetric data must classify homogeneous regions into single categories while maintaining clear-cut boundaries between classes (for example, urban versus forest in terrain classification). To this end, new methods are developed for extracting supervoxels from volumetric datasets using three-dimensional filters, nonlinear dimensionality reduction and Hamilton-Jacobi or Schrödinger geodesic solvers. Next, deep learning principles, which have resulted in automated feature extraction and discriminative convolutional filters, must be adapted to work on volumetric data. Consequently, the integration of supervoxels and deep learning is central to the proposed work. Because the datasets are too large to permit more than very limited expert interaction, semi-supervised learning approaches are called for. The integrated machine learning and parallel processing software suite created by this project will be disseminated using software management repositories and open source licensing. In summary, the intellectual merit lies in the careful integration of semi-supervised learning, volumetric supervoxel-driven segmentation, deep learning and parallel processing.
