Transforming Analytical Learning in the Era of Big Data: A Summer Institute in Biostatistics and Data Science
Yale University, New Haven CT
Investigators
Abstract
The emergence of artificial intelligence and multi-modal data streams is taking us far beyond the structured, rectangular data matrix described in classical statistics courses, underscoring the importance of harnessing information from heterogeneous data sources and turning it into actionable knowledge. Building an intellectually dynamic and computationally mature workforce in data science is more important than ever. We propose a six-week undergraduate summer institute in Biostatistics and Data Science, "Transforming Analytical Learning in the Era of Big Data," to be held in person at the Department of Biostatistics, Yale School of Public Health, with a group of approximately 25 undergraduate students in 2025-2026. The program builds on the success of past iterations of the summer program known as the Big Data Summer Institute (BDSI) at the University of Michigan, supported by an NIH BD2K Courses and Skills grant award (2016-2018) and a SIBS award from NHLBI (2019-2024). Over the past nine years we have trained 357 undergraduate students. Of the students who have finished their undergraduate degree, approximately 60% have pursued graduate education in a relevant discipline, and 47 have already enrolled in a relevant graduate program at the University of Michigan. We propose to transfer the last two years of the ongoing award to Yale University because the PI and founding director of the program, Bhramar Mukherjee, has recently moved to Yale University. The structure of the program remains almost identical, though the location and personnel will change. We plan to expose program students to techniques, skills and problems at the intersection of Big Data and Human Health. We focus primarily on four genres of health Big Data, arising in Electronic Health Records, Genomics, Infectious Disease Epidemiology and Imaging.
The mentored research projects will be defined primarily in cardiovascular and infectious diseases in collaboration with clinicians and public health scientists. The trainees will be taught and mentored by a team of interdisciplinary faculty from Biostatistics, Statistics, Biomedical Data Science, Computer Science and Engineering, Epidemiology, Social Sciences, and Medicine, reflecting the shared intellectual landscape needed for Big Data research. At the conclusion of the program there will be a capstone symposium showcasing the research of the students via poster and oral presentations. There will be lectures by Yale researchers and outside guests, as well as a professional development workshop to prepare the students for graduate school. Along the way, students are expected to form lasting bonds over shared research experiences and social activities. The program has strong institutional support from multiple units and centers on Yale's New Haven campus and leverages the cross-disciplinary intellectual richness of Yale University and Yale College. The students will reside in Benjamin Franklin College, one of the newest Yale residential colleges. The resources developed for the summer institute, including lectures, assignments, projects, template code and datasets, will be freely available through a public-facing course webpage and a YouTube channel so that this format can be replicated anywhere in the world. This democratic dissemination plan will give people across the world access to teaching and training material in the new field of health data science. The overarching goal of our summer institute in big data is to recruit and train the next generation of data scientists using a non-traditional, active learning paradigm and to engage them in influential research related to human health. We aspire to teach, mentor, and grow undergraduate trainees in ways that will shape their vision for a career in data science.
Our goal is to create an inspiring educational experience that will have a transformative impact on the future career trajectories of our trainees. Our long-term objective is to create a skilled and motivated research workforce to handle some of the pressing challenges in biomedical big data.