EAGER: FODAVA: Spectral Analysis for Fraud Detection in Large-scale Networks
University Of North Carolina At Charlotte, Charlotte NC
Investigators
Abstract
This project takes a unified spectral transformation approach to the challenges of analyzing network topology and identifying fraud patterns in large-scale dynamic networks, combining spectral transformation of the data with network topology visualization. Large-scale social and communication networks embed rich topological information in addition to various structured, semi-structured, and unstructured data. To date, little work has been dedicated to exploring network topology, especially from the spectral analysis point of view. The proposed methods are based on the simple adjacency matrix representation of a graph and on a node representation derived from the k communities in the graph; demonstrating that they work for very large data sets would be a significant advance. The research is being integrated with information visualization and visual analytics algorithms, and a testbed of banking data is available to support the search for fraud. The research is characterizing patterns of various attacks in the spectral projection space and developing spectrum-based methods to identify these attacks. The approach, which exploits the spectral space of the underlying interaction structure of the network, is orthogonal to traditional approaches that use content profiling. The ability to perform this spectral analysis depends on the development of complex mathematical techniques. Critical issues being explored include the scalability of the methods to very large data sets and the determination of the dimensionality of the node representation in spectral space (which depends on the number of clusters in the graph). Another issue is that each component of a node's k-dimensional representation is interpreted as the 'likelihood' of that node's attachment to one of the k communities. For this interpretation to hold, either the components of the k-dimensional vector representing each node must be guaranteed to be nonnegative, or a mathematically consistent interpretation of a negative component as a 'likelihood' must be developed. These and other related mathematical issues are being explored.
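As a rough illustration of the node representation described above, the following Python sketch projects the nodes of a small graph onto the eigenvectors of its adjacency matrix that belong to the k largest eigenvalues. It is a minimal sketch under stated assumptions and not the project's own method: the function names `spectral_coordinates` and `two_clique_graph` are hypothetical, the eight-node toy graph is invented for the example, and the dense `numpy.linalg.eigh` decomposition used here would not scale to the very large networks the abstract targets.

```python
import numpy as np


def spectral_coordinates(adjacency, k):
    """Return the k largest eigenvalues and an (n, k) array of node coordinates.

    Row i of the returned array is node i's k-dimensional spectral coordinate.
    (Hypothetical helper, written for illustration only.)
    """
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(adjacency)
    top = np.argsort(eigenvalues)[::-1][:k]  # indices of the k largest eigenvalues
    return eigenvalues[top], eigenvectors[:, top]


def two_clique_graph(size):
    """Adjacency matrix of two cliques of `size` nodes joined by a single edge.

    (Invented toy graph with two obvious communities.)
    """
    n = 2 * size
    adjacency = np.zeros((n, n))
    adjacency[:size, :size] = 1.0
    adjacency[size:, size:] = 1.0
    np.fill_diagonal(adjacency, 0.0)  # no self-loops
    adjacency[size - 1, size] = adjacency[size, size - 1] = 1.0  # bridge edge
    return adjacency


adjacency = two_clique_graph(4)
eigenvalues, coords = spectral_coordinates(adjacency, k=2)

# On this toy graph, the sign of the second coordinate separates the two cliques.
labels = (coords[:, 1] > 0).astype(int)

# Some coordinates are negative, which is the interpretation issue raised above.
has_negative = bool((coords < 0).any())

print(np.round(coords, 3))
print(labels, has_negative)
```

On this toy graph each clique falls on its own side of zero along the second coordinate. Because the graph is connected, that second eigenvector necessarily mixes positive and negative entries, so the raw coordinates cannot be read directly as nonnegative 'likelihoods' without further treatment.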