A Genetic Database for Anthropology
Yale University, New Haven CT
Investigators
Abstract
The past decade has seen a virtual explosion in the amount of data available on the human genome. These data can help us understand human evolutionary history, including both our relationship as a species to the other great apes and the much more recent relationships among human populations that reflect their histories. Information derived from our knowledge of the human genome is also relevant to many of the social sciences. Yet there is little infrastructure to provide an interface between data on the human genome and data derived from the social sciences. One area of particular relevance to history, historical demography, linguistics, medical anthropology, forensic anthropology, and ethnic studies, among others, is the gene frequency variation that exists among human populations. Gene frequency variation among current populations is the net genetic effect of all of the factors in the history of those populations: relative endogamy versus exogamy, past and present population sizes, duration of relative endogamy, origins, selection, etc. Currently there is no centralized location for information on gene frequency variation for the modern DNA polymorphisms detected in the bulk of the human genome. The closest approximation to such a resource is the current implementation of ALFRED, the web-accessible ALlele FREquency Database <http://alfred.med.yale.edu/alfred/index.asp>. This project will further develop ALFRED to serve as a resource for the anthropological genetics field and the other social science areas noted above. The database will contain gene frequency data for multiple genetic loci (both functional genes and anonymous loci) and multiple populations. Each frequency will be accompanied by detailed descriptions of both the polymorphic site studied, including the protocols used, and the specific sample of the specific population studied.
The molecular definition of the polymorphism at the DNA sequence level will be linked to the molecular databases, and the description of the population (name, language, location, etc.) will be linked to at least one ethnographic database. Development will involve three aspects: curated, ongoing accumulation of gene frequency data from the current literature and that of the past 15 or so years; enhancement of the design and implementation (hardware and software) to allow robust and rapid response to queries over the internet; and implementation of interconnections to relevant human genomics databases on the molecular side and to ethnographic and other databases on the population/social science side. The enhanced database will serve many functions. It will be an educational resource at both the undergraduate and graduate levels. It will be an interdisciplinary research resource providing reference gene frequencies for comparison with new data. It will provide impetus to focus future data collection efforts in diverse labs on those genetic markers that early studies suggest will provide the best information on specific research questions. It will provide entrée to the genetics/genomics world via specific social/cultural variables such as language, geographic location, population size, etc. Conversely, it will provide links to the relevant ethnologic and historical data on populations that anthropological geneticists need in order to interpret the gene frequency data they collect. As a bridge between the genomics databases and the relevant social sciences, the database will provide infrastructure for the increasingly interdisciplinary nature of modern research.