REU Site: Data Science in the Life Sciences, Environmental Science and Engineering
Harvey Mudd College, Claremont CA
Abstract
This Research Experiences for Undergraduates (REU) site in Data Science in the Life Sciences, Environmental Science and Engineering at Harvey Mudd College will provide nine undergraduate students per year from across the United States with opportunities to learn and apply data science concepts and tools to projects in (i) the biological and life sciences, (ii) environmental science and (iii) engineering and industrial applications. Students apply to the REU program by ranking their preferred projects. Recruitment will consider each applicant's skills and likelihood of having a successful experience, and will aim to attract students from traditionally underrepresented population groups, including women, African American and Hispanic/Latino students. The participating students will spend 10 weeks in the summer at Harvey Mudd College (one of the five Claremont Colleges, which also include Claremont McKenna, Pomona, Pitzer and Scripps) and will work on their research projects with an experienced faculty member. They will be supported by state-of-the-art infrastructure such as computing facilities, libraries and laboratories. Students will also engage with their peers in a series of data science and professional skill workshops. Social events and educational field trips round out the program. Domain-specific research using data science methods and tools provides the students with important skills that prepare them for graduate studies and are useful for analyzing and solving problems in many disciplines and environments. The professional skill modules, including research ethics, time management and scholarly publishing, will complement and enrich the technical training and further equip students with the competencies needed to become well-rounded, successful researchers. As computational capacity continues to expand at a rapid pace, there is an increasing need for data science literacy among all scientists and engineers.
By exposing the students to relevant concepts and tools in data science, this REU program aims to encourage them to pursue careers in STEM-related fields and to help fill the persistent skill and labor gap in the U.S. by producing graduates who can have an impact on science and technology in the public and private sectors. The research projects that are part of the Harvey Mudd College REU program in Data Science in the Life Sciences, Environmental Science and Engineering address new and open problems in different STEM disciplines using computational, mathematical and statistical methods and tools. The participating faculty mentors maintain active research programs in these tracks, which offer data- and hypothesis-rich environments for exploration and in-depth analysis. Example projects include the analysis of flow cytometry data, modeling of epidemiological and public health data such as surgical cataract coverage in developing countries, spatial data modeling and analysis for health risk assessments related to unconventional oil and gas development, testing hybrid mathematical models in atmospheric chemistry, and developing predictive models for sports coaching such as real-time coaching recommendations based on play-by-play basketball data. These projects involve high-dimensional data analytics and use a range of techniques to extract insights, such as simulated data to validate mathematical models of effective health intervention coverage, multivariate spatial regression and Kriging, time series analysis of cloud-chamber data on air pollution models, and algebraic analysis of partially ranked data coupled with predictive techniques used in machine learning to predict player performance. In addition, students will receive hands-on training in R, Python and MATLAB, and will become well versed in Linux command-line processing, batch scripting, data movement, use of XSEDE supercomputers, and version control.
They will be exposed to big data environments such as Hadoop and Spark, learn to use relational databases and write SQL queries, including in PostgreSQL, and work with ESRI's ArcGIS to model high-resolution spatial data. While technical training and participation in real research processes are the main components of the REU program, the soft skills necessary for successfully developing and running research programs are taught as well. This training includes the group's participation in Harvey Mudd College's successful weekly Stauffer lecture and open lab series as well as a series of custom-tailored workshops involving additional personnel from the College's academic departments and the Writing Center. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.