Comprehensive Analysis of 'GENCODE' Annotations and Splice Site Motifs
Assign Consensus Donor and Acceptor Splice Sites to Introns
Calculate GC Content of Genomic Features
Classify Exons by Their Relative Transcript Position
Compare Annotation Counts Between Two GENCODE Releases
Convert Data Frame to FASTA File
Eliminate Redundant Genomic Elements
Extract Coding Sequences (CDS) from GTF Annotations
Extract Genomic Elements by Strand
Extract Intron Coordinates from GENCODE Annotations
Identify Single-Exon Genes/Transcripts in GENCODE Data
Extract Splice Site Motifs for MaxEntScan Analysis (5' or 3')
Identify Potential Cryptic Splice Sites
Download GFF3 File from the GENCODE Database
Download GTF File from the GENCODE Database
Get the Latest GENCODE Release Dynamically
Load a GTF or GFF3 File from GENCODE as a Data Frame
Calculate Spliced Transcript Lengths
Generate Summary Statistics for Genomic Elements
Tiny Example GTF Files
A suite of helper functions for analyzing genomic annotations from the 'GENCODE' database <https://www.gencodegenes.org/>, supporting both the human and mouse genomes. The toolkit lets users extract, filter, and analyze annotation features, including genes, transcripts, exons, and introns, across different 'GENCODE' releases. It supports cross-version comparisons, so researchers can systematically track annotation updates, structural changes, and feature-level differences between releases. In addition, the package can generate FASTA files containing donor and acceptor splice site motifs, formatted for direct input into the 'MaxEntScan' tool (Yeo and Burge, 2004 <doi:10.1089/1066527041410418>), which calculates splice site strength scores.
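
The intron and splice site motif steps described above can be illustrated with a minimal, language-independent sketch. It is written in Python purely for illustration and is not the package's API: the function names `introns_from_exons`, `splice_site_motifs`, and `to_fasta` are hypothetical, the toy sequence is invented, and only plus-strand features are handled. It assumes the input lengths used by 'MaxEntScan': a 9-mer for the 5' (donor) site, made of 3 exonic plus 6 intronic bases, and a 23-mer for the 3' (acceptor) site, made of 20 intronic plus 3 exonic bases.

```python
def introns_from_exons(exons):
    """Return 1-based inclusive intron intervals between the exons of one transcript.

    `exons` is a list of (start, end) tuples, as in GTF coordinates.
    """
    ordered = sorted(exons)
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
        if right_start - left_end > 1  # skip adjacent or overlapping exons
    ]


def splice_site_motifs(seq, intron):
    """Extract donor (9-mer) and acceptor (23-mer) motifs for a plus-strand intron.

    Assumes the intron is at least 20 bases long and has at least 3 flanking
    bases on each side within `seq`.
    """
    start, end = intron  # 1-based, inclusive
    donor = seq[start - 4:start + 5]   # 3 exonic + 6 intronic bases
    acceptor = seq[end - 20:end + 3]   # 20 intronic + 3 exonic bases
    return donor, acceptor


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{sequence}\n" for name, sequence in records)


# Toy plus-strand transcript: two 10-base exons around a 26-base intron.
exon1 = "ACGTACGCAG"
intron_seq = "GTAAGT" + "TC" * 9 + "AG"
exon2 = "GTTCAAGGTA"
seq = exon1 + intron_seq + exon2

introns = introns_from_exons([(1, 10), (37, 46)])
donor, acceptor = splice_site_motifs(seq, introns[0])
fasta = to_fasta([("intron1_donor", donor), ("intron1_acceptor", acceptor)])
print(fasta)
```

For minus-strand transcripts, the same windows would be taken on the reverse complement, with the donor at the intron's higher genomic coordinate; that case is left out to keep the sketch short.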
Useful links