VEC: Medium: Large-Scale Visual Recognition: From Cloud Data Centers to Wearable Devices

$960,000 | FY2015 | CSE | NSF

Regents Of The University Of Michigan - Ann Arbor, Ann Arbor MI

Investigators

Jason Mars, Kevin P Pipe, Lingjia Tang, Thomas Wenisch

Abstract

Advances in computer hardware and software promise to revolutionize the ways in which society interacts with visual information. However, visual recognition systems are limited by the lack of a practical means to classify the millions of concepts that arise in visual scenes and thus efficiently recognize when a small number of these concepts appear in a given scene. Furthermore, while real-time processing of visual data could significantly expand our perception of our surroundings, state-of-the-art vision systems cannot currently be implemented on wearable devices such as smartphones due to the limited heat dissipation (e.g., no fans or liquid cooling) and power such devices can provide. This research will overcome these challenges by developing artificial intelligence (AI) systems that efficiently manage the resources most crucial for high-performance wearable-based visual recognition, including the wearable device's real-time power consumption and computation. These systems will be empowered to initiate bursts of intense computation that are thermally managed by materials within the wearable device which are engineered to melt during heavy heating and solidify between bursts. Moreover, the AI systems will govern the communication between the device and external (cloud-based) computation resources as well as large-scale visual concept databases housed in data centers, thus providing extreme performance in a wearable form factor. Central concepts of this work will be integrated in undergraduate and graduate coursework, and a demonstration system will be made available to the research community and used in educational modules for high school students.
This effort seeks to advance the core capabilities of large-scale visual recognition by co-designing visual models and computing infrastructure. The goal is to enable encyclopedic, real-time visual recognition through seamless integration of visual computing on wearable devices and in the cloud.
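The idea of computation bursts buffered by a material that melts under load and re-solidifies between bursts can be illustrated with a rough energy budget. The sketch below is a simplification, not the project's thermal model: it assumes all excess heat goes into the latent heat of the material, ignores sensible heating, and uses hypothetical function names and parameter values throughout.

```python
def sprint_seconds(pcm_mass_g, latent_heat_j_per_g, sprint_power_w, sustained_power_w):
    """Seconds a burst can last before the phase-change material is fully melted.

    Assumption: the device can shed `sustained_power_w` continuously, and every
    watt above that is absorbed as latent heat by the material.
    """
    excess_w = sprint_power_w - sustained_power_w
    if excess_w <= 0:
        return float("inf")  # no excess heat, so the burst is never thermally limited
    return pcm_mass_g * latent_heat_j_per_g / excess_w


def recovery_seconds(pcm_mass_g, latent_heat_j_per_g, idle_power_w, sustained_power_w):
    """Seconds of idling needed for a fully melted material to re-solidify."""
    headroom_w = sustained_power_w - idle_power_w
    if headroom_w <= 0:
        return float("inf")  # no spare dissipation capacity, so it never re-solidifies
    return pcm_mass_g * latent_heat_j_per_g / headroom_w


# Hypothetical numbers chosen only for illustration.
burst = sprint_seconds(pcm_mass_g=1.0, latent_heat_j_per_g=200.0,
                       sprint_power_w=11.0, sustained_power_w=1.0)
rest = recovery_seconds(pcm_mass_g=1.0, latent_heat_j_per_g=200.0,
                        idle_power_w=0.5, sustained_power_w=1.0)
print(burst, rest)
```

Under these assumed values the ratio of rest time to burst time makes the trade-off concrete: a short burst far above the sustainable power must be paid back by a much longer period near idle.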
The PIs envision a wearable visual recognition system that continuously captures live video input while providing intelligent, real-time assistance through automatic or on-demand visual recognition by means of a combination of computation at the device and offloading to the cloud. Such a system is not currently feasible due to a number of fundamental challenges. First, the severe energy and thermal constraints of wearable devices render them incapable of performing the intensive computation necessary for visual recognition. Second, it remains an open question how to support encyclopedic recognition in terms of both visual models and data center infrastructure. In particular, it remains unclear how current visual models, although highly successful at recognizing 1,000 object categories, can scale to millions or more distinct visual concepts. Moreover, such an encyclopedic visual model must be supported through data center infrastructure, but little progress has been made on how to build such infrastructure. This project addresses these fundamental challenges through an interdisciplinary approach integrating computer vision, hardware architecture, VLSI design, and heat transfer. The PIs will investigate three research thrusts. In Thrust 1, the PIs will develop a new type of deep neural network that allows resource-efficient execution of modules. This new framework provides a unified way to design, learn, and run scalable visual models that can maximize the utility of recognition subject to resource constraints, such as latency, energy, or thermal dissipation of a wearable device. In Thrust 2, the PIs will design and fabricate a visual processing chip capable of computational sprinting (bursts of extreme computation well above steady-state thermal dissipation capabilities), leveraging the new framework developed in Thrust 1.
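Maximizing recognition utility subject to a resource constraint, as described for Thrust 1, can be pictured as a budgeted selection problem. The sketch below casts it as a 0/1 knapsack solved by dynamic programming. This formulation is an assumption made for illustration, not the framework the PIs propose: it treats each network module as having a fixed, additive utility and an integer energy cost, and the module names and numbers are hypothetical.

```python
from typing import List, Tuple


def select_modules(modules: List[Tuple[str, float, int]],
                   budget: int) -> Tuple[float, List[str]]:
    """Pick the subset of modules with the highest total utility within `budget`.

    Each module is (name, utility, cost); cost and budget share one integer unit
    (for example millijoules). Utilities are assumed to add up independently.
    """
    # best[b] holds (total utility, chosen names) using at most b units of budget.
    best: List[Tuple[float, List[str]]] = [(0.0, []) for _ in range(budget + 1)]
    for name, utility, cost in modules:
        # Iterate budgets downward so each module is used at most once.
        for b in range(budget, cost - 1, -1):
            candidate = best[b - cost][0] + utility
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - cost][1] + [name])
    return best[budget]


# Hypothetical modules: (name, utility, energy cost).
modules = [("coarse", 3.0, 2), ("parts", 4.0, 3), ("fine", 5.0, 4)]
print(select_modules(modules, budget=5))
```

With a budget of 5 units, running the two cheaper modules yields more total utility than the single most accurate one, which is the kind of trade-off a resource-aware model would have to make at run time.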
In Thrust 3, the PIs will design data center infrastructure that supports large-scale hierarchical indexing of visual concepts for encyclopedic recognition, with a focus on latency, throughput, and energy efficiency. Finally, the PIs will build a demonstration system to evaluate the proposed algorithms, software, and hardware components and to assess the overall performance of an end-to-end system. The project website (http://mivec.eecs.umich.edu/) will provide access to the results of this research including technical reports, datasets, and source code.
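One way to picture hierarchical indexing of visual concepts, as mentioned for Thrust 3, is a concept tree searched from coarse to fine. The sketch below is a minimal illustration under stated assumptions, not the project's design: each node stores a hypothetical prototype vector, similarity is a plain dot product, and the search greedily follows the best-scoring child at each level, so it scores only the children along one root-to-leaf path instead of every leaf concept.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConceptNode:
    name: str
    prototype: List[float]  # hypothetical feature-space prototype for this concept
    children: List["ConceptNode"] = field(default_factory=list)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def classify(root: ConceptNode, feature: List[float]) -> List[str]:
    """Greedy coarse-to-fine descent; returns the path of concept names visited."""
    node = root
    path = [root.name]
    while node.children:
        # Follow the child whose prototype best matches the input feature.
        node = max(node.children, key=lambda c: dot(c.prototype, feature))
        path.append(node.name)
    return path


# Toy two-level hierarchy with made-up prototypes.
root = ConceptNode("entity", [0.0, 0.0], [
    ConceptNode("animal", [1.0, 0.0], [
        ConceptNode("dog", [1.0, 0.2]),
        ConceptNode("cat", [1.0, -0.2]),
    ]),
    ConceptNode("vehicle", [0.0, 1.0], [
        ConceptNode("car", [0.2, 1.0]),
        ConceptNode("bike", [-0.2, 1.0]),
    ]),
])
print(classify(root, [0.9, 0.3]))
```

Greedy descent trades accuracy for speed: a wrong choice near the root cannot be undone, which is one reason a real system might keep several candidate branches per level.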
