I-Corps: Multi-Cue Facial Restoration (McFAR) for Recognition and Identification

$50,000 | FY2022 | TIP | NSF

Northeastern University, Boston MA

Investigators

Abstract

The broader impact/commercial potential of this I-Corps project is the development of a multi-cue frontal face restoration and identification system that may be used by security authorities or deployed in surveillance facilities to synthesize suspect portraits and conduct identity matching with limited information. The rising adoption of facial surveillance in public safety and security has recently gained significant traction in the market. Given the threats to national borders, airports, seaports, and public transportation hubs, advanced security systems are in high demand. Surveillance camera videos are often recorded and analyzed automatically. Powerful deep learning models have significantly improved video face recognition performance, but in the real world, surveillance cameras provide limited resolution and capture only side views of target suspects because of environmental limitations. Recognition with low-resolution, non-frontal faces remains a challenge. Most existing face recognition algorithms assume high-resolution, near-frontal faces and do not perform adequately on low-resolution, side-view images. The proposed technology combines multiple sources of information to perform more faithful face identification and recognition, providing a novel option for security applications.

This I-Corps project is based on the development of a multi-cue frontal face restoration system that provides a recognition and identification framework to synthesize identity-preserving, high-resolution frontal face images from low-resolution, side-view faces and a narrative description provided by eyewitnesses. The proposed technology passes a series of extreme-pose face images through a super-resolution integrated network to synthesize high-resolution frontalized faces. It aims to recognize low-resolution, extreme-pose faces using multiple cues provided by the surveillance system as well as witness descriptions. To improve the discriminative ability of the learned representation, intra- and inter-class constraints are imposed to penalize redundant features. Instead of employing naive fusion methods, orthogonal regularization is used in a generative model for optimal training and to learn a comprehensive representation with a broader span. Eyewitnesses may provide a narrative description of the face, which is fed into a language encoder and converted to attribute-level representations. A description-guided face editing network with spatial constraints enables editing of both attribute-level and geometric content of input images by leveraging a transformer-based language encoder for image translation. The face editing network takes the encoded features and modifies the frontalized faces with various attributes to generate refined frontal faces for high-fidelity image matching.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
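The abstract does not state which form of orthogonal regularization the project uses. As a minimal sketch, assuming the common soft penalty that measures how far the Gram matrix of a set of feature vectors is from the identity (the squared Frobenius norm of W Wᵀ − I), the idea of penalizing redundant features can be illustrated in plain Python. The function names and the toy matrices below are hypothetical and are not taken from the project.

```python
# Hypothetical illustration only: a soft orthogonality penalty,
# ||W W^T - I||_F^2, for a small matrix stored as a list of row vectors.
# This is an assumed, commonly used form, not the project's actual loss.

def gram(rows):
    """Return the Gram matrix W W^T (pairwise dot products of the rows)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def orthogonality_penalty(rows):
    """Squared Frobenius norm of (W W^T - I); zero when rows are orthonormal."""
    g = gram(rows)
    total = 0.0
    for i, g_row in enumerate(g):
        for j, value in enumerate(g_row):
            target = 1.0 if i == j else 0.0  # identity matrix entry
            total += (value - target) ** 2
    return total


# Two orthonormal feature directions: no redundancy, so no penalty.
orthonormal = [[1.0, 0.0], [0.0, 1.0]]
# Two identical feature directions: fully redundant, so the penalty is positive.
redundant = [[1.0, 0.0], [1.0, 0.0]]

print(orthogonality_penalty(orthonormal))
print(orthogonality_penalty(redundant))
```

In a training setup, a term like this would typically be added to the main loss with a small weight, so that duplicated feature directions are discouraged without forcing exact orthogonality.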
