III: Small: Query-By-Sketch: Simplifying Video Clip Retrieval Through A Visual Query Paradigm
Georgia Tech Research Corporation, Atlanta, GA
Abstract
This project addresses the growing demand for analyzing movement patterns in videos across diverse applications such as sports analytics, wildlife tracking, urban planning, and autonomous vehicle development. For example, analyzing vehicle trajectories from surveillance videos is essential for improving traffic safety. This project introduces a novel method for querying movement patterns in videos that lets users sketch events of interest on a canvas. The main innovation lies in accurately and efficiently matching free-form sketches to real-world trajectories while overcoming the challenges posed by ambiguous user intent and by variations in perspective, orientation, and camera movement. For instance, a user may sketch a left-turning vehicle as a 90-degree angle seen from a top-down perspective, yet the same turn can appear at a different angle on video because of the camera's position relative to the vehicle. The project will produce an open-source video database with a sketch-based query interface, making the analysis of movement patterns in videos more accessible and accurate. Research findings will be disseminated through publications at top conferences and incorporated into new database courses at Georgia Tech, as well as research classes for Atlanta-area high school girls interested in pursuing computing careers.

Video retrieval from trajectory queries has been explored by the database and machine learning communities through SQL-like and natural language interfaces, but these interfaces either require substantial time to specify a query or generalize poorly to unseen videos. This project addresses these challenges with a novel visual query paradigm that lets users sketch exploratory trajectory queries for video analytics through drag-and-drop actions. The project is structured around two research thrusts. The first develops a human-in-the-loop similarity search framework that uses active-learning techniques to solicit user feedback, clarifying user intent in query specifications and compensating for the inaccuracies inherent in human sketching. Domain-specific knowledge will be incorporated as additional predicates in the pre-processing and post-processing stages of similarity search to further improve retrieval efficiency and quality. The second thrust develops an end-to-end machine learning model that learns a similarity measure between user-drawn sketches and trajectories in real-world videos that is robust to variations in camera angle and movement. To address the lack of diverse, labeled datasets for video retrieval from trajectory queries, this thrust will develop a self-supervised learning framework based on trajectory simulation. Overall, the project will apply database-style optimization to reduce both the user effort and the computational resources required to use vision models in exploratory video analytics, helping to broaden the adoption of video analytics.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
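The left-turn example can be made concrete with a small worked sketch. It uses a planar homography as a stand-in for the camera (the matrix `H` below is an arbitrary, hypothetical oblique view, not calibrated from any real video) and measures the signed heading change at the corner of a three-point path. Under these assumptions, the same 90-degree turn on the ground plane shows up as roughly 78.7 degrees once projected.

```python
import math

def project(H, p):
    """Apply a 3x3 planar homography H to a 2D point p = (x, y)."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def turning_angle_deg(p0, p1, p2):
    """Signed heading change at p1 (positive = counterclockwise, i.e. left, in a y-up frame)."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return math.degrees(math.atan2(cross, dot))

# Top-down (ground-plane) path: drive north along x = 5, then turn left (west).
path = [(5.0, 0.0), (5.0, 10.0), (-5.0, 10.0)]

# Hypothetical oblique camera: foreshortening in y plus a perspective term.
H = [[1.0, 0.0, 0.0],
     [0.0, 0.5, 0.0],
     [0.0, 0.02, 1.0]]

top_down = turning_angle_deg(*path)
on_video = turning_angle_deg(*[project(H, p) for p in path])
print(round(top_down, 2), round(on_video, 2))  # 90.0 78.69
```

Because a homography maps straight lines to straight lines, both legs of the turn stay straight; only the angle between them changes. A matcher that compares raw turning angles against the sketch could therefore rank this clip poorly.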
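As a point of reference for the sketch-to-trajectory matching problem, a common hand-crafted baseline resamples both polylines by arc length, normalizes away translation and scale, and compares them with dynamic time warping (DTW). The code below is that generic baseline, not the project's method; the function names are hypothetical.

```python
import math

Polyline = list[tuple[float, float]]

def resample(poly: Polyline, n: int) -> Polyline:
    """Resample a polyline to n points spaced evenly by arc length."""
    seg = [math.dist(poly[i], poly[i + 1]) for i in range(len(poly) - 1)]
    total = sum(seg)
    out = []
    for k in range(n):
        target = total * k / (n - 1)
        i, acc = 0, 0.0
        # Walk forward to the segment that contains the target arc length.
        while i < len(seg) - 1 and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = 0.0 if seg[i] == 0 else min(max((target - acc) / seg[i], 0.0), 1.0)
        (x0, y0), (x1, y1) = poly[i], poly[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def normalize(pts: Polyline) -> Polyline:
    """Center on the centroid and divide by the RMS radius (translation and scale invariance)."""
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    centered = [(x - cx, y - cy) for x, y in pts]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / len(pts)) or 1.0
    return [(x / scale, y / scale) for x, y in centered]

def dtw(a: Polyline, b: Polyline) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with Euclidean point cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def sketch_distance(sketch: Polyline, traj: Polyline, n: int = 32) -> float:
    return dtw(normalize(resample(sketch, n)), normalize(resample(traj, n)))

sketch = [(0.0, 0.0), (0.0, 1.0), (-1.0, 1.0)]           # hand-drawn left turn
turn = [(100.0, 200.0), (100.0, 260.0), (40.0, 260.0)]    # same shape, shifted and scaled
straight = [(0.0, 0.0), (0.0, 10.0)]
print(sketch_distance(sketch, turn) < sketch_distance(sketch, straight))  # True
```

This baseline is invariant only to translation and uniform scale. It does not account for rotation or for the perspective distortion described in the abstract, which is the kind of variation the learned similarity measure in the second thrust is meant to handle.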
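The abstract does not specify which active-learning strategy the first thrust will use. One standard choice is uncertainty sampling: ask the user to label the candidate clips whose match decision is least certain, then refit the decision rule. The sketch below assumes each candidate already has a sketch-to-trajectory distance and that a single distance threshold separates matches from non-matches; the clip IDs, scores, and simulated user answers are made up for illustration.

```python
from typing import Optional

def pick_queries(scores: dict[str, float], threshold: float, k: int,
                 labeled: set[str]) -> list[str]:
    """Uncertainty sampling: the k unlabeled clips whose distance lies closest to the threshold."""
    pool = [cid for cid in scores if cid not in labeled]
    return sorted(pool, key=lambda cid: abs(scores[cid] - threshold))[:k]

def refit_threshold(scores: dict[str, float], labels: dict[str, bool]) -> Optional[float]:
    """Midpoint between the farthest labeled match and the nearest labeled non-match."""
    pos = [scores[c] for c, is_match in labels.items() if is_match]
    neg = [scores[c] for c, is_match in labels.items() if not is_match]
    if not pos or not neg:
        return None  # need at least one answer of each kind
    return (max(pos) + min(neg)) / 2

# Hypothetical sketch-to-trajectory distances (lower = more similar).
scores = {"clip_a": 0.25, "clip_b": 0.875, "clip_c": 1.5, "clip_d": 1.125, "clip_e": 3.0}
threshold = 1.0
labels: dict[str, bool] = {}

# Simulated answers standing in for the human in the loop.
user_says_match = {"clip_b": True, "clip_c": False, "clip_d": True}

for cid in pick_queries(scores, threshold, k=3, labeled=set(labels)):
    labels[cid] = user_says_match[cid]
new_threshold = refit_threshold(scores, labels)
if new_threshold is not None:
    threshold = new_threshold
print(sorted(labels), threshold)  # ['clip_b', 'clip_c', 'clip_d'] 1.3125
```

Asking about borderline clips rather than the top-ranked ones spends the user's effort where it most changes the decision boundary, which is one way to reduce the query-specification burden the abstract describes.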
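Incorporating domain knowledge as pre- and post-processing predicates might look like the following hypothetical pipeline. A cheap pre-filter (object class and minimum track length) prunes candidates before the expensive similarity call, and a post-filter enforces a domain rule (here, a net left turn) on the ranked results. The `Track` type, the predicates, and the placeholder score are all assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class Track:
    track_id: str
    object_class: str                    # assumed to come from an upstream detector/tracker
    points: list[tuple[float, float]]    # ground-plane positions over time

def heading_change_deg(pts) -> float:
    """Net signed heading change between the first and last segments (left turn > 0, y-up frame)."""
    (x0, y0), (x1, y1) = pts[0], pts[1]
    (x2, y2), (x3, y3) = pts[-2], pts[-1]
    d = math.degrees(math.atan2(y3 - y2, x3 - x2) - math.atan2(y1 - y0, x1 - x0))
    return (d + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)

def search(tracks, score: Callable[[Track], float], pre, post, k: int):
    candidates = [t for t in tracks if pre(t)]      # pre-processing: prune before scoring
    ranked = sorted(candidates, key=score)          # expensive similarity runs only on survivors
    return [t for t in ranked if post(t)][:k]       # post-processing: enforce domain rules

tracks = [
    Track("t1", "car", [(0, 0), (0, 5), (0, 10), (-5, 10)]),   # left turn
    Track("t2", "car", [(0, 0), (0, 10), (5, 10)]),            # right turn
    Track("t3", "person", [(0, 0), (0, 10), (-5, 10)]),        # wrong object class
    Track("t4", "car", [(0, 0), (1, 1)]),                       # too short to be a turn
]
result = search(
    tracks,
    score=lambda t: 0.0,  # placeholder for a real sketch-to-trajectory distance
    pre=lambda t: t.object_class == "car" and len(t.points) >= 3,
    post=lambda t: heading_change_deg(t.points) > 45.0,
    k=10,
)
print([t.track_id for t in result])  # ['t1']
```

Placing the cheap predicate before scoring is the database-style optimization in miniature: the similarity function, which in practice may involve a vision model, only runs on tracks that could possibly qualify.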
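A self-supervised framework based on trajectory simulation could generate training pairs without manual labels: simulate a ground-plane trajectory, render it through a randomly sampled camera to obtain a "video view," and treat a slightly jittered copy of the ground-plane path as the paired "sketch." The simulator, camera parameter ranges, and jitter level below are hypothetical choices, not details from the project.

```python
import math
import random

def simulate_turn(rng: random.Random, n: int = 20) -> list[tuple[float, float]]:
    """Hypothetical simulator: unit-speed straight approach, then a turn of random angle."""
    turn = math.radians(rng.uniform(-120.0, 120.0))
    x, y, heading = 0.0, 0.0, math.pi / 2
    pts = [(x, y)]
    for i in range(1, n):
        if i == n // 2:
            heading += turn
        x += math.cos(heading)
        y += math.sin(heading)
        pts.append((x, y))
    return pts

def random_camera(rng: random.Random) -> list[list[float]]:
    """Random planar homography: rotation, scale, translation, and a mild perspective term.

    Perspective terms are kept small so the homogeneous w stays positive
    for the simulated extent (|x|, |y| < 20).
    """
    th = rng.uniform(-math.pi, math.pi)
    s = rng.uniform(0.5, 2.0)
    g, h = rng.uniform(-0.01, 0.01), rng.uniform(-0.01, 0.01)
    tx, ty = rng.uniform(-50.0, 50.0), rng.uniform(-50.0, 50.0)
    c, sn = s * math.cos(th), s * math.sin(th)
    return [[c, -sn, tx], [sn, c, ty], [g, h, 1.0]]

def project(H, p):
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def make_pairs(num: int, seed: int = 0, jitter: float = 0.1):
    """Return (sketch, video_view) positive pairs; no manual labels required."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num):
        traj = simulate_turn(rng)
        # Jittered top-down copy mimics an imprecise hand-drawn sketch.
        sketch = [(x + rng.gauss(0.0, jitter), y + rng.gauss(0.0, jitter)) for x, y in traj]
        H = random_camera(rng)
        view = [project(H, p) for p in traj]
        pairs.append((sketch, view))
    return pairs

pairs = make_pairs(4, seed=7)
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))  # 4 20 20
```

In a contrastive setup, each (sketch, view) pair would serve as a positive example and the other pairs in a batch as negatives; the training loop itself is omitted here.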