Neuro-AI, AI Safety, and Mental Health: Common Principles of Understanding in Brain Networks and AI Systems

$111,497 | ZIA | FY2025 | MH | NIH

National Institute Of Mental Health

Investigators

Abstract

In this first project period, we have collaborated with AI groups to synthesize concepts and approaches from mechanistic interpretability with those from neuroscience. As an example, we have examined a set of observational methods from AI and connected them to their neuroscience counterparts:

Feature Visualization: Generating images that maximally activate particular neurons or layers. Neuroscience analogy: Receptive field mapping. Neuroscience has a history of optimizing stimuli to maximally activate neurons. As just a few examples, in inferotemporal cortex, where neurons respond to complex items, Gross et al. identified 'toilet brush neurons' [1], Quian Quiroga and Koch identified cells that respond strongly to Jennifer Aniston and Brad Pitt, and sometimes to the two of them together [2], and Tanaka [3] used systematic mapping to find maximal activation. Stimulus optimization methods have been applied across sensory systems, in auditory and visual cortex, and for movements and cognitive properties.

Activation Maximization: Optimizing inputs to maximize the activation of specific neurons or features. Neuroscience analogy: This has been used less widely in neuroscience because of the lack of methods to record input to neurons within the brain. Some heroic experiments have been done to determine which inputs produce a particular response. These often use white-noise methods that generate a large set of driving inputs and then use dimensionality reduction to find kernels that describe the input.

Linear Probes: Using linear classifiers to test for linearly decodable information in representations. Neuroscience analogy: This is a time-honored method in neuroscience, where it is typically called ideal observer decoding. Thousands of papers have fit a variety of statistical models to recorded neural data to estimate information content and the structure of the neural representation.

Circuit Tracing: Integrating gradients along a path. Neuroscience analogy: No prominent corresponding neuroscience methods. Neuroscientists have only occasionally measured gradients of neural dynamics, but this approach is similar to tuning curve estimation, which parameterizes neural responses as a function of input variation.

Representational Similarity Analysis: Comparing similarity structures between different layer representations. Neuroscience analogy: At the single-neuron level, this is receptive field mapping or, more generally, comparing responses of neurons across brain areas. At the population level, this is a relatively new approach in neuroscience but one that is gaining momentum. There it is similarly named representational similarity or representational geometry.

As the project continues, we expect to further connect neuroscience and AI safety approaches, including using our two-photon cell-specific stimulation methods in brains to understand brain network function and applying analysis and conceptual methods from AI safety or mechanistic interpretability. We believe this will lead to a deeper understanding of brain function and mental disease and ultimately will help design treatments for diseases of wiring, from schizophrenia to the cognitive and memory symptoms of Alzheimer's disease.

References

1. Gross, C.G., Rocha-Miranda, C.E., and Bender, D.B. (1972). Visual properties of neurons in inferotemporal cortex of the Macaque. J. Neurophysiol. 35, 96–111.

2. Quian Quiroga, R., Kraskov, A., Mormann, F., Fried, I., and Koch, C. (2014). Single-cell responses to face adaptation in the human medial temporal lobe. Neuron 84, 363–369.

3. Tanaka, K. (1996). Inferotemporal Cortex and Object Vision. Annu. Rev. Neurosci. 19, 109–139.
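Illustrative sketch: The representational similarity analysis described in the abstract can be made concrete with a small example. This is not the project's analysis code. The function names (pearson, rdm, rsa_score), the choice of 1 minus Pearson correlation as the dissimilarity measure, and the synthetic "layer" data are all assumptions made for illustration. Published analyses often compare dissimilarity matrices with a rank correlation instead.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rdm(responses):
    """Representational dissimilarity matrix (hypothetical helper).

    responses[i] is the activity vector (model units or recorded neurons)
    evoked by stimulus i. Entry [i][j] is 1 - Pearson r between the
    patterns for stimuli i and j.
    """
    n = len(responses)
    return [[1.0 - pearson(responses[i], responses[j]) for j in range(n)]
            for i in range(n)]


def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal, flattened to a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def rsa_score(responses_a, responses_b):
    """Correlate the dissimilarity structure of two representations.

    Both inputs must use the same stimuli in the same order; the number
    of units may differ between the two representations.
    """
    return pearson(upper_triangle(rdm(responses_a)),
                   upper_triangle(rdm(responses_b)))


if __name__ == "__main__":
    random.seed(0)
    n_stimuli, n_units = 6, 8
    # Synthetic stand-in for one layer or brain area: stimuli x units.
    layer_a = [[random.gauss(0, 1) for _ in range(n_units)]
               for _ in range(n_stimuli)]
    # Same geometry, different "wiring": units reordered, then rescaled.
    layer_b = [[2.0 * v + 3.0 for v in reversed(row)] for row in layer_a]
    # An unrelated representation of the same stimuli.
    layer_c = [[random.gauss(0, 1) for _ in range(n_units)]
               for _ in range(n_stimuli)]
    print("a vs b:", rsa_score(layer_a, layer_b))
    print("a vs c:", rsa_score(layer_a, layer_c))
```

Because the comparison is made on stimulus-by-stimulus dissimilarities rather than on individual units, the two representations need not have the same number of units or any unit-to-unit correspondence. In this sketch, reordering and rescaling the units leaves the score essentially unchanged.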
