GENCODE: the reference human genome annotation for The ENCODE Project
- PMID: 22955987
- PMCID: PMC3431492
- DOI: 10.1101/gr.135350.111
Abstract
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of annotated alternatively spliced transcripts has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9,640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available, with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to PeptideAtlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3,689 new loci not currently in GENCODE, of which 3,127 consist of two-exon models, indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
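Summary figures of the kind reported above (loci per biotype, exons per transcript) can in principle be tallied from a GTF-format annotation file. The following is a minimal sketch, not the Consortium's own pipeline: it assumes GTF lines with nine tab-separated fields and quoted `gene_type` and `transcript_id` attributes, and the function name `summarize_gtf` and the toy records are hypothetical, made up only for illustration.

```python
import re
from collections import Counter, defaultdict

# Matches quoted GTF attribute pairs such as: gene_type "protein_coding";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def summarize_gtf(lines):
    """Return (gene count per gene_type, histogram of exons per transcript).

    Hypothetical helper; assumes 9-column GTF lines with quoted attributes.
    """
    genes_by_type = Counter()
    exons_per_transcript = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed records
        feature = fields[2]
        attrs = dict(ATTR_RE.findall(fields[8]))
        if feature == "gene":
            genes_by_type[attrs.get("gene_type", "unknown")] += 1
        elif feature == "exon" and "transcript_id" in attrs:
            exons_per_transcript[attrs["transcript_id"]] += 1
    # Map "number of exons" -> "number of transcripts with that many exons"
    exon_histogram = Counter(exons_per_transcript.values())
    return genes_by_type, exon_histogram


# Toy records (invented, not real GENCODE entries): one coding gene with a
# three-exon transcript and one noncoding gene with a two-exon transcript.
toy_lines = [
    'chr1\ttoy\tgene\t1\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
    'chr1\ttoy\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\texon\t800\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr2\ttoy\tgene\t1\t500\t.\t-\t.\tgene_id "G2"; gene_type "lincRNA";',
    'chr2\ttoy\texon\t1\t100\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
    'chr2\ttoy\texon\t400\t500\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]

genes_by_type, exon_histogram = summarize_gtf(toy_lines)
print(dict(genes_by_type))
print(dict(exon_histogram))
```

On a real annotation file the same function could be passed an open file handle instead of the toy list. Grouping biotypes into "protein-coding" and "long noncoding" categories would need an explicit mapping that this sketch does not attempt.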