ATD: Activity Aware Bayesian Deep Learning
University of Virginia Main Campus, Charlottesville, VA
Investigators
Abstract
This project will research next-generation deep learning AI models for comprehending objects, activity, and context from overhead imagery and sensor data. This work will use Large Language Models (LLMs) to create machine reasoning that replicates human reasoning at a scale, speed, and complexity unachievable with human analysts. Different types of data collected from satellites and aircraft are currently processed separately, leading to siloed information without context. This research will use LLMs to synthesize multiple data sources with context, location, and time, producing Activity-Aware Deep Learning AI. Current deep learning AI can turn pixel-based information in images into object-based (groups of pixels) information and, at the current state of the art, scene-based (groups of objects) information. This project will enable a new level of activity-based information: What are the objects doing in the scene? What is the broader context? For example, a blue tarp in a US suburb is likely covering an object to protect it from weather, but multiple blue tarps along streets following a natural disaster may indicate people in makeshift shelters in need of help. The Activity-Aware DL models developed in this research will comprehend these different situations using LLMs, with no activity-specific training beyond the logic already present in the LLMs. Software developed will be made available as open source, and new editions of textbooks on the area written by the investigator will be released.
There are existing models for determining object-based information from sensor data. For example, convolutional neural networks can identify objects in high-resolution imagery, and Bayesian models can accurately identify chemical species present on the ground in hyperspectral imagery. Even simple prompts to LLMs that include just objects and location can produce activity-aware information. For example, the text prompt “Why are there rows of XYZ military vehicles []?” where [] can be filled in with “outside Location A”, “in Location B”, or “at Location C” will produce different conclusions when put into the ChatGPT LLM without explicit context training. The current project will develop neural network architectures that translate ‘what is present’ class probabilities from an object-based model, combine them with geospatial information, and generate a text prompt that is input into an LLM to determine activity and context. Particular attention will be paid to developing prompts that do not cause the LLM to generate factually inaccurate output. Models will be developed in TensorFlow to facilitate sharing. Bayesian probabilities will be used throughout the models to provide regularization and interpretability, facilitating ongoing research in this area.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
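The step described in the abstract, turning 'what is present' class probabilities and geospatial information into a text prompt for an LLM, might look roughly like the following. This is only an illustrative sketch in plain Python, not the project's architecture: the `Detection` class, the `build_activity_prompt` function, the 0.5 probability threshold, and the prompt wording are all assumptions introduced here, and the call to an actual LLM is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """One output of a hypothetical object-based model."""
    label: str          # object class, e.g. "blue tarp"
    probability: float  # class probability in [0, 1]


def build_activity_prompt(
    detections: List[Detection],
    location: str,
    min_probability: float = 0.5,  # assumed cutoff, not from the abstract
) -> str:
    """Combine confident detections and a location phrase into an LLM prompt."""
    # Keep only detections the object model is reasonably sure about,
    # ordered from most to least probable.
    confident = sorted(
        (d for d in detections if d.probability >= min_probability),
        key=lambda d: d.probability,
        reverse=True,
    )
    if not confident:
        return (
            f"No objects were confidently detected {location}. "
            "What, if anything, can be said about the scene?"
        )
    described = ", ".join(
        f"{d.label} (probability {d.probability:.2f})" for d in confident
    )
    # Asking the model to admit uncertainty is one simple way to discourage
    # factually inaccurate output; it is an assumption of this sketch.
    return (
        f"The following objects were detected {location}: {described}. "
        "What activity are these objects most likely part of, and what is "
        "the broader context? If the evidence is insufficient, say so."
    )


# Example usage with made-up detections and a placeholder location.
detections = [Detection("blue tarp", 0.91), Detection("vehicle", 0.30)]
prompt = build_activity_prompt(detections, "outside Location A")
print(prompt)
```

Keeping the probabilities in the prompt text is one possible design choice: it passes the object model's uncertainty on to the LLM instead of presenting every detection as certain.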