CICI:IPAAI:CANIS: Curated AI-ready Network telescope datasets for Internet Security
University of California-San Diego, La Jolla, CA
Abstract
Researchers rely on Internet Background Radiation (IBR) data to detect a wide range of malicious activities and cyber threats. However, as the volume and complexity of Internet activity grow, ensuring the integrity, authenticity, and provenance of this data becomes increasingly challenging. This is especially true for advanced machine learning and artificial intelligence (ML/AI) techniques, which depend on high-quality input data. To address these challenges, the CANIS project is developing and deploying a new monitoring framework that safeguards the integrity of cybersecurity research workflows. The framework produces AI-ready datasets that accelerate the development of ML/AI techniques for cybersecurity and enable advances in anomaly detection, threat intelligence, and attack mitigation.
The framework combines active Internet measurement with data from the University of California San Diego's Network Telescope (UCSD-NT), a long-standing NSF-funded scientific cyberinfrastructure that collects unsolicited IPv4 traffic. It sends beacon packets from globally distributed vantage points and combines this signal with traffic generated by known Internet scanning campaigns to continuously verify the fidelity of the data the UCSD-NT collects.
To support ML/AI-based cybersecurity applications, the project disseminates IBR data in AI-ready formats that include labels and rich metadata to facilitate model training, benchmarking, and evaluation. The datasets also serve as valuable resources for cybersecurity and AI education, helping to train the next generation of experts.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
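The abstract does not say how beacon observations are matched or scored. The following is a minimal sketch of one way a beacon-based fidelity check could work. It assumes each beacon carries a unique identifier that can be recovered from the captured packet. It then computes, per vantage point, the fraction of beacons aimed at the monitored address space that the telescope actually captured within a delay bound. `Beacon`, `fidelity_report`, `MONITORED_PREFIX`, `max_delay`, and `threshold` are hypothetical names, not part of the CANIS framework. The prefix is an RFC 5737 documentation range, not the real UCSD-NT address space. Traffic from known scanning campaigns, which the framework also uses, is not modeled here.

```python
"""Sketch: check telescope capture fidelity using beacon packets.

Hypothetical illustration, not the CANIS implementation. Assumes each
beacon carries a unique probe_id that is recoverable from the capture.
"""
from __future__ import annotations

import ipaddress
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical stand-in for the telescope's monitored IPv4 space
# (TEST-NET-2 documentation prefix, not the real UCSD-NT range).
MONITORED_PREFIX = ipaddress.ip_network("198.51.100.0/24")


@dataclass(frozen=True)
class Beacon:
    probe_id: str       # unique token embedded in the packet (assumed)
    vantage_point: str  # identifier of the sending vantage point
    dst: str            # destination IPv4 address
    sent_at: float      # send time, seconds since the epoch


def fidelity_report(
    beacons: list[Beacon],
    captured: dict[str, float],
    max_delay: float = 60.0,
    threshold: float = 0.95,
) -> dict[str, tuple[float, bool]]:
    """Return {vantage_point: (delivery_ratio, ratio >= threshold)}.

    `captured` maps probe_id -> capture timestamp at the telescope. A beacon
    counts as delivered if it was captured no earlier than it was sent and
    at most `max_delay` seconds after it was sent.
    """
    sent: dict[str, int] = defaultdict(int)
    seen: dict[str, int] = defaultdict(int)
    for b in beacons:
        # Only beacons aimed at the monitored space are expected at the telescope.
        if ipaddress.ip_address(b.dst) not in MONITORED_PREFIX:
            continue
        sent[b.vantage_point] += 1
        t = captured.get(b.probe_id)
        if t is not None and 0.0 <= t - b.sent_at <= max_delay:
            seen[b.vantage_point] += 1

    report: dict[str, tuple[float, bool]] = {}
    for vp, n in sent.items():
        ratio = seen[vp] / n
        report[vp] = (ratio, ratio >= threshold)
    return report


if __name__ == "__main__":
    beacons = [
        Beacon("a1", "vp-eu", "198.51.100.7", 1000.0),
        Beacon("a2", "vp-eu", "198.51.100.9", 1010.0),   # captured too late
        Beacon("b1", "vp-us", "198.51.100.20", 1000.0),
        Beacon("b2", "vp-us", "203.0.113.5", 1000.0),    # outside prefix: ignored
    ]
    captured = {"a1": 1000.4, "a2": 1200.0, "b1": 1001.0}
    print(fidelity_report(beacons, captured))
    # {'vp-eu': (0.5, False), 'vp-us': (1.0, True)}
```

In a scheme like this, a low ratio at a single vantage point could point to a path or vantage-point problem. Low ratios across all vantage points in the same window would instead suggest a capture gap at the telescope itself.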