A Framework for Exploring Data in Heterogeneous Sensor Networks

$150,000 | FY2015 | ENG | NSF

University of Texas at Arlington, Arlington, TX

Investigators

Abstract

Heterogeneous sensing systems, consisting of sensors with different types of sensing and communication capabilities, offer flexibility and provide different views of a monitored field by acquiring different types of measurements. The main challenge is that the often large volume of raw sensed data does not, by itself, reveal the structure underlying the sensed field. The acquired sensor data contain information about an unknown number of distinct phenomena of interest. The vision of this project is the development and analysis of an algorithmic framework that can learn the unknown structure of a monitored field and enable the mining and exploration of information in heterogeneous sensor data. The techniques developed in this project will enable learning of the sensed field while reducing the usually large amount of sensor data that needs to be processed, by removing irrelevant and non-informative sensor measurements. This research will benefit a wide range of areas, including the analysis and mining of ecological and climatic data. The proposed techniques will be capable of analyzing heterogeneous sensor data sets that differ in information content and adhere to different statistical models. They will pave the way for efficient learning schemes that extract the informative portion of the sensed data and remove irrelevant or noisy sensor measurements that can hurt any subsequent data processing or inference task. The project focuses on the development of general algorithms that have learning capabilities and can identify different informative portions of a data vector sequence, which may adhere to different data models. The task of clustering data into groups that contain information about different sources is translated into a sparsity-aware canonical correlation analysis formulation.
Building on this link, a generalized toolbox is proposed to address settings where the number of information sources and their statistical behavior are unknown. The proposed framework has the potential to uncover multiple clusters of informative data that may exhibit different statistical behavior, and to separate them before standard statistical inference tasks are applied. In contrast to the notion of outliers, which are present in only a small number of data points, the project involves the design of a novel framework that allows the extraction and elimination of corrupted data affecting a data portion of arbitrary size. The idea is to exploit the low dimensionality that informative data usually have, and to identify corrupted data by constraining the rank of the corresponding covariance matrix to agree with the low dimension of the informative data. Distributed optimization tools are also explored to deal with high-dimensional and spatially scattered data.
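
As a rough illustration of the low-rank idea described above, the following Python sketch fits a low-dimensional subspace to the samples currently believed to be clean and flags samples that lie far from it. It is not the project's algorithm, which the abstract does not specify: a truncated SVD with a median-based cutoff stands in for the rank-controlled covariance formulation, so this simplified version only works when fewer than half of the samples are corrupted. The function names `residual_norms` and `flag_corrupted`, the `factor` threshold, and the synthetic zero-mean data are all assumptions made for this example.

```python
import numpy as np

def residual_norms(X, rank, clean):
    """Distance of every row of X from the rank-`rank` subspace fitted to X[clean].

    Assumes zero-mean data (hypothetical setup for this sketch).
    """
    # The leading right singular vectors of the presumed-clean rows span the subspace.
    _, _, Vt = np.linalg.svd(X[clean], full_matrices=False)
    basis = Vt[:rank].T  # shape: (n_sensors, rank)
    # Remove each row's projection onto the subspace and measure what is left.
    return np.linalg.norm(X - (X @ basis) @ basis.T, axis=1)

def flag_corrupted(X, rank, factor=5.0, n_iter=5):
    """Alternate between fitting the subspace and re-selecting the clean rows."""
    clean = np.ones(X.shape[0], dtype=bool)
    for _ in range(n_iter):
        res = residual_norms(X, rank, clean)
        # Rows whose residual is far above the median are treated as corrupted.
        clean = res <= factor * np.median(res)
    return ~clean

# Synthetic example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_samples, n_sensors, rank = 200, 10, 2

# Informative data: a 2-dimensional source observed through 10 sensors.
mixing, _ = np.linalg.qr(rng.standard_normal((n_sensors, rank)))
X = 5.0 * rng.standard_normal((n_samples, rank)) @ mixing.T
X += 0.05 * rng.standard_normal(X.shape)  # small measurement noise

# Corrupt the first 30 samples with full-dimensional disturbances.
truth = np.zeros(n_samples, dtype=bool)
truth[:30] = True
X[truth] += 3.0 * rng.standard_normal((30, n_sensors))

flagged = flag_corrupted(X, rank)
print("flagged samples:", int(flagged.sum()))
```

The sketch requires the target rank to be known in advance; the abstract targets the harder setting where the number of sources is unknown.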
