III: Small: Collaborative Research: Algorithms for Query by Example of Audio Databases

$311,895 | FY2016 | CSE | NSF

University Of Rochester, Rochester NY

Investigators

Abstract

Finding ways to automatically index, label, and access multimedia content (such as audio documents) is increasingly important as multimedia repositories proliferate and grow. The community-generated SoundCloud repository is one example: it contains recordings of bands, sound effects, podcasts, and more, and contributors upload 12 hours of audio every minute. Repositories like SoundCloud typically tag audio at the file level with short text labels. Text-based search for a desired recording using these labels can be problematic, and text-based search within a track is not possible, since tracks are not indexed with tags in the body of the file. In this project, investigators at the University of Rochester and Northwestern University aim to develop methods and a system for audio search via query-by-example, where the example is similar, in some key way, to the desired audio in the database, but is not an exact match. This will allow search within files, bypassing the need for text-based tagging. The project focuses on vocal imitations as search keys because they are natural for humans and are widely used in interaction. It will develop a novel search engine for sounds that takes vocal imitations as queries (e.g., an imitation of a bird call to find recordings of that bird call). The technology developed for this novel way to search through audio/video collections will also benefit society in numerous other ways, such as crime surveillance (e.g., automated gunshot or scream detection for police monitoring stations), biodiversity measurement (e.g., automatic identification of bird species that sound "like this" in field recordings), environmental awareness for the hearing impaired (e.g., "alert me when my dog is the one barking"), production aids for movie sound designers (e.g., finding door-slam sounds in a database of thousands of sound effects), and sound-based diagnosis (e.g., "your car needs a new starter motor"). The project will also benefit science, technology, engineering, and mathematics (STEM) education, as audio-based research has been shown to be a successful way to attract diverse college students into STEM disciplines.
Vocal imitation conveys rich information covering many acoustic aspects: pitch, loudness, timbre, their temporal evolution, and rhythmic patterns. This lets a user query for precise sounds that are difficult to search for with text tags. For the same reason, however, vocal imitations may vary from the desired target along many dimensions. The query sound also lies in a very constrained sound space compared to the sounds to be retrieved, due to the physical constraints of the human vocal system. Building a successful query-by-vocal-imitation system will require research into methods for representing audio and for retrieving audio based on queries that are similar to target sounds only on a subset of their measurable dimensions. It will also require interfaces that facilitate providing queries and refining search results in a non-text-based context. For the former, the investigators will research methods for learning aspect-specific audio representations using deep neural networks and will develop matching algorithms suitable for these representations. For the latter, the investigators will design novel search interfaces that let users iteratively refine their search results. The system will learn from these interactions and adjust the weightings of different acoustic aspects to find the desired sound.
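To make the idea of weighting acoustic aspects and adjusting those weights from user feedback more concrete, the following is a minimal sketch and not the investigators' method. It assumes each sound is already described by one small feature vector per aspect (in the project these would come from learned representations), uses cosine similarity as a stand-in matching function, and applies a simple hypothetical update rule. All names, vectors, and the `rate` parameter are illustrative assumptions.

```python
import math

# Hypothetical acoustic aspects; the abstract names pitch, loudness, timbre,
# and rhythmic patterns, but these feature vectors are made up for illustration.
ASPECTS = ("pitch", "loudness", "timbre", "rhythm")


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def aspect_similarities(query, sound):
    """Per-aspect similarity between a vocal query and a candidate sound."""
    return {a: cosine(query[a], sound[a]) for a in ASPECTS}


def score(query, sound, weights):
    """Weighted sum of per-aspect similarities."""
    sims = aspect_similarities(query, sound)
    return sum(weights[a] * sims[a] for a in ASPECTS)


def rank(query, database, weights):
    """Return sound names ordered from best to worst match."""
    return sorted(database,
                  key=lambda name: score(query, database[name], weights),
                  reverse=True)


def update_weights(weights, query, chosen, rejected, rate=0.5):
    """Assumed feedback rule: raise the weight of aspects on which the sound the
    user chose matches the query better than the sound the user rejected."""
    s_pos = aspect_similarities(query, chosen)
    s_neg = aspect_similarities(query, rejected)
    raw = {a: max(weights[a] + rate * (s_pos[a] - s_neg[a]), 0.0) for a in ASPECTS}
    total = sum(raw.values())
    if total == 0:
        return dict(weights)  # no usable signal; keep the old weights
    return {a: raw[a] / total for a in ASPECTS}  # renormalize to sum to 1


# Toy data: the imitation matches the bird call in pitch and rhythm only.
QUERY = {"pitch": [1, 0], "loudness": [1, 0], "timbre": [1, 0], "rhythm": [1, 0]}
DATABASE = {
    "bird": {"pitch": [1, 0], "loudness": [0, 1], "timbre": [0, 1], "rhythm": [1, 0]},
    "siren": {"pitch": [1, 1], "loudness": [1, 0], "timbre": [1, 0], "rhythm": [0, 1]},
}
UNIFORM = {a: 1.0 / len(ASPECTS) for a in ASPECTS}

before = rank(QUERY, DATABASE, UNIFORM)
learned = update_weights(UNIFORM, QUERY, DATABASE["bird"], DATABASE["siren"])
after = rank(QUERY, DATABASE, learned)
```

With uniform weights the siren ranks first in this toy data because it matches the query on loudness and timbre. After one round of feedback in which the user picks the bird call, the weights shift toward pitch and rhythm and the bird call moves to the top. A real system would replace the hand-written vectors and the update rule with learned components.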
Expected outcomes of this research are: (1) audio representations that highlight perceptually relevant features of vocal queries for matching to general audio target sounds; (2) algorithms for matching and aligning vocal queries to general audio; (3) interaction methods for iteratively refining search results using vocal imitations and sound examples; (4) a large vocal imitation and sound dataset; and (5) an open-source sound retrieval system that embodies these outcomes. More information about this project can be found at the project web site (http://www.ece.rochester.edu/projects/air/projects/audiosearch).
