GENCODE: comprehensive gene annotation for human and mouse
European Molecular Biology Laboratory, Heidelberg
Abstract
The GENCODE project is focused on the generation, expansion, and maintenance of high-quality reference gene sets for human and mouse, which are critical for genomic research. Our work will prioritise the continued refinement of the reference gene sets, incorporating long-read sequencing data, ribosome profiling, and constraint data, and drawing on collaborative efforts such as the MANE project. This will ensure that the annotations remain comprehensive and accurate, addressing both existing and novel genes while incorporating evolutionary insights and functional data.
Additionally, we aim to expand GENCODE's scope by extending our annotations to the human pangenome. With the completion of a draft human pangenome and the first T2T (telomere-to-telomere) human genome, the genomic landscape has changed significantly, highlighting the need for accurate gene annotation across diverse human haplotypes. GENCODE will apply manual annotation and semi-automated approaches to ensure that the genetic diversity represented in the pangenome is accurately annotated across different haplotypes. This will be critical for the long-term goal of clinical adoption of the pangenome.
We will also significantly enhance the annotation of non-canonical open reading frames (ncORFs) using ribosome profiling (Ribo-seq). These regions, often overlooked, may play significant regulatory or functional roles in disease and cellular processes. GENCODE will focus on cataloguing ncORFs across different cell types. We will analyse these ncORFs from the perspective of both evolutionary constraint and human variation constraint, to highlight both conserved ncORFs that may be functional and those potentially involved in disease.
Lastly, GENCODE data will be made more AI-ready and FAIR-compliant (Findable, Accessible, Interoperable, and Reusable). By improving metadata and providing machine-readable datasets, we will maximise the utility of GENCODE data for both academic research and clinical applications. We are committed to community collaboration and feedback, ensuring that the data produced aligns with the diverse needs of genomic consortia, industry, and the broader research community.