SBIR Phase I: A Cloud-Based Service for Audio Access to News and Blogs
Agivox, Inc., Sunnyvale CA
Investigators
Abstract
This innovation improves access to and discovery of online written content that is automatically converted to an audio format such as MP3. Audio synthesized from arbitrary written text often delivers a poor listening experience. The technical effort is motivated by the absence of applications that deliver a user's preferred news articles and blogs as high-quality, phonetically correct synthesized audio, whether for a visually impaired person or for a multitasking, visually busy person such as a car driver. This work uses techniques such as text processing informed by text understanding, and content analysis driven by domain knowledge and machine learning. Machine learning techniques are used to improve speech synthesis and to incorporate auto-discovery of user preferences into listenable news. Because scanning content by listening is slower than scanning visually for relevant results, this work will improve the auditory search process by combining user input with information retrieval for a smoother user experience. The resulting technology infrastructure is expected to support an array of compelling commercial products with far-reaching implications.
The broader/commercial impact of this technical work comes from the cloud-based infrastructure that can process online written text into high-quality audio. This cloud software benefits from virtually unlimited storage and computing capacity, and uses it to support content retrieval, machine learning, text preprocessing, content discovery, natural language processing, and interaction with commercial Text-to-Speech servers. The first version of this technology will focus on news and blogs, a corpus large enough to pose a challenge while also attracting considerable commercial interest. The cloud infrastructure can support a range of client-side applications that work on smartphones, tablets, and desktops. These applications will have access to high-quality synthesized audio usable in "eyes-busy" situations and by the low-vision community. The apps will provide customizable access to user-preferred content via intelligent information retrieval. While there is commercial potential in such client applications, the greater value lies in licensing the server technology. The societal impact of such a product is expected to be substantial, since neither the blind community nor the general public currently has such easy listening access to the large corpus of online content, curated according to user preferences and controlled entirely through voice and finger gestures.