CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms

$174,671FY2024CSENSF

Clark University, Worcester MA

Investigators

Abstract

Data stream monitoring relies on measurement functions that extract useful information from raw data items and pass this information to application software -- web services, e-commerce, stock trading, social networks, distributed sensing, and more. The past two decades have seen the development of many individual streaming algorithms that meet specific measurement requirements, such as determining item frequencies and identifying heavy hitters, super spreaders, anomalies, large deviations, persistent patterns, etc. However, new algorithms must be continuously designed to meet new demands from modern data streaming applications. This project charts a different course with a holistic, systematic approach that explores the design principles of fundamental streaming algorithms (e.g., Bloom Filters) and establishes several common frameworks that can be used to develop a new and highly flexible, future-proof family of streaming algorithms. This project seeks to design a one-size-fits-all family of streaming algorithms that can be easily reconfigured for different measurement tasks or multiple combined tasks. For example, the proposed streaming algorithms will be able to identify malicious intrusions and anomalies in enterprise networks, and at the same time rank the popularity of online content (e.g., social media posts, product pages) to provide guidance for business decisions. Real-world data streaming applications in large data centers must deal with heterogeneous, and often limited, resource availability. The algorithms explored in this work are designed to be lightweight. In a security context, they might be used to monitor and secure a network without interfering with other applications. Further, users / applications will also have options for multi-dimensional tradeoffs among performance, overhead, resource usage, and accuracy. For example, users will be able to dynamically commit more memory to quickly identify and prevent malicious users more accurately. The advances will impact the applications and measurement functions that are built upon them. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →