Attentive knowledge device for visual assistance

$602,198 · R61 · FY2025 · EY · NIH

University of Southern California, Los Angeles, CA

Investigators

Laurent Itti (contact), Vijaykrishnan Narayanan

Abstract

The latest Artificial Intelligence (A.I.) smartphone apps have been a game changer for many persons with visual impairment (PVI). Yet the current workflow of these apps remains cumbersome: snap a picture, ask a question, wait a few seconds, listen to an (often verbose) answer; repeat. Could a workflow be created that is more natural, more goal-driven, and more synergistic with a visually impaired person? The basic tenet of this proposal is to leverage A.I. in a new way: instead of being front and center, the A.I. will operate mostly in the background. Instead of only answering user queries, it will also be informed by a collection of machine vision algorithms that collect rich real-time visual information from 4 cameras mounted on smart eyeglasses with an ultra-wide 270° field of view. The machine vision algorithms will run 100x faster than the A.I. can, to handle the dynamics of the world and of the moving user. The role of the A.I., then, will be to leverage its world knowledge to combine, unbeknownst to the user, contextual situation awareness from real-time vision with high-level tasks and goals specified verbally by the user. Our central working hypothesis is that the A.I. will be able to better assist its user by proactively integrating what the glasses have seen with what the user actually wants. Our team combines expertise in a) computational neuroscience, machine vision, and A.I. (PI Itti), b) microelectronics and machine vision hardware (co-PI Narayanan), c) human factors, lived experience focus groups, and participant recruiting (GetBraille consultants), d) wearable electronics design and fabrication (Siliconscapes consultant), and e) lived experience partners and participants at the National Federation of the Blind and several other local chapters and communities.
In the R61 phase, we aim to: 1) engage our lived experience partners in selecting the most useful algorithms for visual attention, object detection, 3D depth estimation, scene recognition, spatial navigation and mapping, and hand/face/body tracking. These will provide real-time situation awareness that runs on the glasses (no cloud servers) and is non-intrusive to the user; 2) develop visual-assistive large language models (LLMs) to digest and leverage the machine vision results and help the user achieve an unbounded range of tasks; 3) implement and accelerate the real-time machine vision algorithms on custom Field-Programmable Gate Array (FPGA) hardware processors to achieve high energy efficiency in a small form factor.
In the R33 phase, we will collaborate with our lived experience partners to create and test affordable, intuitive, and useful smart glasses that can be broadly disseminated ($399, compared with $4,250 for the Orcam MyEye 3 Pro). We aim to: 1) develop miniature, low-power, low-cost implementations of the glasses; 2) engage our lived experience partners in developing a user interface that works best for them; 3) conduct a user study with 5 groups of 50 PVI participants in each R33 year, covering a range of tasks from grocery shopping to cooking to working in a team; 4) address issues of privacy, security, social acceptance, and liability through focus groups with experts, the PVI community, and students.
