SBIR Phase I: Automatically Generating Domain Specific Structured Ontologies for Video

$225,000 · FY2016 · TIP · NSF

Vidrovr Inc., New York NY

Investigators

Abstract

The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project is to make video searchable and discoverable in a cost-effective manner by automating video annotation tasks that are currently done manually. Video is being created at an increasingly high rate, and media companies are becoming overwhelmed by the sheer volume of video in their libraries. Companies have resorted to paying people to watch their videos and "tag" them with relevant descriptions so that the content becomes searchable and therefore usable. In this project the company will build an inference engine that leverages the data and structure within video to discover multimodal concepts significant to a particular domain, and that automatically trains and refines classifiers to attach unique and meaningful metadata describing the video. Using this metadata, companies can index and search their videos more efficiently, generate video clips tailored to specific goals, and ultimately publish video at a much larger scale. The company believes the domain-specific video metadata generated by its system will greatly improve the ability of companies to disseminate informative videos and to make use of video online.

This Small Business Innovation Research (SBIR) Phase I project proposes to develop an inference engine that combines the modalities of information in video to automatically discover and train multimodal classifiers for important concepts within specific domains. Current approaches to video understanding seek to develop general visual classification models; they rely on labeled data to train supervised learning algorithms that describe the video. These approaches fail in specific verticals because their output is not granular enough to provide value within the context of the domain, and because the cost of manual annotation is high. The approach proposed in this project uses the multimodal information associated with a group of related videos to automatically learn classifiers for concepts that are important in that video domain, without expensive manual annotation. It leverages the correlations between the different modalities in video, together with existing domain-specific structure, to accomplish this task. The company expects this research to lead to the discovery of many new domain-specific video concept classifiers that do not exist in current visual ontologies, and to a reusable approach for training visual classifiers that can be applied across many domains.
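
As an illustration only, the general idea of learning concept classifiers from cross-modal correlation rather than manual tags can be sketched as follows. This is not the company's system: the `Segment` type, the toy two-dimensional visual features, the stopword list, the `min_support` threshold, and the nearest-centroid classifier are all assumptions introduced here for the example. Terms that recur across a domain's transcripts are treated as candidate concepts, and a transcript mention serves as a weak label for the co-occurring visual features.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass
class Segment:
    transcript: str  # text aligned with the segment (speech or captions)
    visual: tuple    # toy visual feature vector (hypothetical)


# Tiny illustrative stopword list (an assumption, not a real lexicon).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "with"}


def discover_concepts(segments, min_support=2):
    """Return transcript terms that occur in at least min_support segments."""
    counts = Counter()
    for seg in segments:
        counts.update(set(seg.transcript.lower().split()) - STOPWORDS)
    return sorted(term for term, n in counts.items() if n >= min_support)


def mean(vectors):
    """Component-wise mean of equal-length vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def train_centroids(segments, concept):
    """Weak supervision: a segment is positive if its transcript mentions the concept."""
    pos, neg = [], []
    for seg in segments:
        mentioned = concept in seg.transcript.lower().split()
        (pos if mentioned else neg).append(seg.visual)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative segment")
    return mean(pos), mean(neg)


def predict(centroids, visual):
    """Nearest-centroid decision using only the visual modality."""
    pos, neg = centroids
    return dist(visual, pos) < dist(visual, neg)


# Toy sports-domain data (fabricated purely for illustration).
segments = [
    Segment("the goal by the striker", (0.9, 0.1)),
    Segment("a late goal in extra time", (0.8, 0.2)),
    Segment("interview with the coach", (0.1, 0.9)),
    Segment("the coach talks tactics", (0.2, 0.8)),
]

concepts = discover_concepts(segments)
goal_model = train_centroids(segments, "goal")
```

With this toy data, `predict(goal_model, (0.7, 0.3))` classifies an untranscribed segment from its visual features alone. A real system would need far richer features, noise-tolerant training, and the domain-specific structure the abstract mentions.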
