SBIR Phase II: Automatically Generating Domain Specific Structured Ontologies for Video
Vidrovr Inc., New York NY
Investigators
Abstract
The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase II project is to enable more specific and granular search across many types of video content, unlocking the information held in the world's video archives and live-stream video. Corporations lose countless hours of productivity when employees must search a large video asset for one small, specific clip, costing institutions valuable time and money. The enhanced discoverability within videos that this project will provide should benefit industries such as education, media, and online gaming through more efficient publishing and video search results that better reflect user intent. The benefits of this project are not confined to enterprises. Society at large can benefit from greater access to archived video that was previously undiscoverable by the general public, which can help create a society better informed on world events and support the development of education and skills.

This Small Business Innovation Research (SBIR) Phase II project develops a video understanding framework and knowledge graph to better enable video search and discovery for enterprise video archives and live-stream video. Leveraging multimodal data and domain-specific structured ontologies provided by content creators and data maintainers, the project proposes to develop four new machine learning and computer vision technologies. First, a webly supervised content-based retrieval system for video will be created to build classifiers without annotated training data, effectively solving the cold-start data problem for video classification. Second, the project will build a multimodal ontology mapping system to map semantic concepts that are external and novel to the company's current technologies. Third, to understand and formalize temporal relationships between objects, an Allen interval algebra module for video will be developed and combined with knowledge relational learning to learn the temporal relationships between the objects that comprise the knowledge graph. Finally, to qualify the action relationships between people and objects in the knowledge graph, a holistic video modeling approach will be developed and applied to action/event recognition and other tasks, such as captioning or titling, expanding a key recognition capability to complete the knowledge graph.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
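
For readers unfamiliar with the Allen interval algebra named in the abstract, the sketch below illustrates the general idea: any two time intervals stand in exactly one of thirteen qualitative relations (before, meets, overlaps, and so on). This is a minimal illustration of the standard algebra, not the project's module. The `Interval` type, the `allen_relation` function, and the example timestamps are hypothetical names and values chosen here for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical time span (in seconds) during which an object is on screen."""
    start: float
    end: float


def allen_relation(a: Interval, b: Interval) -> str:
    """Return which of Allen's 13 interval relations holds from a to b."""
    if not (a.start < a.end and b.start < b.end):
        raise ValueError("intervals must satisfy start < end")
    # Disjoint or touching intervals.
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.end == b.start:
        return "meets"
    if b.end == a.start:
        return "met-by"
    # From here on the intervals share more than a single point.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Only partial overlap remains.
    return "overlaps" if a.start < b.start else "overlapped-by"


# Assumed example data: a person visible from 2 s to 10 s, a car from 5 s to 12 s.
person = Interval(2.0, 10.0)
car = Interval(5.0, 12.0)
print(allen_relation(person, car))
```

In a knowledge graph setting, a label such as the one returned here could plausibly be stored as a typed edge between two detected objects, though how the project represents these relations is not stated in the abstract.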