Tools for calling RNA modifications from MeRIP-seq
MACS was originally designed to give robust, high-resolution peak identification for ChIP-Seq data. It can also be used to identify peaks in MeRIP-Seq data.
ExomePeak2 provides peak detection and differential methylation for Methylated RNA Immunoprecipitation Sequencing (MeRIP-Seq) data. MeRIP-Seq is a commonly applied sequencing assay that measures the location and abundance of RNA modification sites under specific cellular conditions.
MeRIPtools is a comprehensive tool to process and analyze aligned sequencing data. MeRIPtools also provides a framework in R to manage data associated with peak calling and differential methylation analysis.
MeTPeak is a graphical model-based peak calling method for transcriptome-wide detection of m6A sites from MeRIP-seq data. MeTPeak explicitly models the read count of an m6A site and introduces a hierarchical layer of Beta variables to capture the variances and a hidden Markov model to characterize the dependency of reads across a site.
BayesPeak is a Bioconductor package for the analysis of data sets from ChIP-seq experiments, particularly for identifying the genomic sites of protein–DNA interactions. It can also be used to identify RNA modification peaks in MeRIP-Seq data.
m6aViewer is a cross-platform application for analysis and visualization of m6A peaks from sequencing data. m6aViewer implements a novel m6A peak-calling algorithm that identifies high-confidence methylated residues with more precision than previously described approaches.
PEA is an integrated R toolkit to facilitate the analysis of plant epitranscriptome data. The PEA toolkit contains a comprehensive collection of functions required for read mapping, CMR calling, motif scanning and discovery and gene functional enrichment analysis.
deepEA is a convenient, freely available, web-based platform that supports deep analysis of epitranscriptome sequencing data with several general and specific functionalities. Currently, deepEA consists of six modules: Data Preparation, Quality Control, Identification of RNA Modifications, Functional Annotation, Multi-omics Integrative Analysis and Prediction Analysis Based on Machine Learning.
RNA modification predictors (Diverse modifications, Specific modification)
The RNAMethPre web server provides a user-friendly tool for the prediction and query of mRNA m6A sites.
HAMR (High-throughput Annotation of Modified Ribonucleotides) is a web application that locates RNA modifications transcriptome-wide at single-nucleotide resolution in RNA-seq data and can also differentiate between different classes of modifications.
RNAm5Cfinder is a web server that uses RNA sequence features and machine learning to predict RNA m5C sites in eight tissue/cell types from mouse and human.
WHISTLE is a prediction framework for transcriptome-wide m6A RNA-methylation site prediction. A web server was built to facilitate the query of their high-accuracy map of the human m6A epitranscriptome predicted by WHISTLE.
DeepM6ASeq is a deep-learning-based framework to predict m6A-containing sequences and visualize saliency map for sequences.
BERMP is a web server that predicts multi-species m6A sites from nucleotide sequences. It integrates a random forest classifier, which uses an encoding of extended nucleic acid content, with a deep-learning classifier based on bidirectional Gated Recurrent Units. BERMP is reported to perform better than previously existing m6A classifiers for different species.
SRAMP is a mammalian m6A sites predictor which can extract and integrate the sequence and predicted structural features around m6A sites under a machine learning framework.
RFAthM6A is a tool for predicting m6A sites in Arabidopsis thaliana, based on a reliable, manually curated dataset of m6A sites and non-m6A sites.
iRNA-Methyl is a web server for identifying N6-methyladenosine sites using pseudo nucleotide composition.
iRNA-2methyl is a web server for identifying RNA 2'-O-methylation sites by incorporating sequence-coupled effects into general PseKNC and an ensemble classifier.
PseUI was developed using a support vector machine based on three kinds of features: position-specific nucleotide propensity, nucleotide composition, and pseudo nucleotide composition. It currently supports three species: H. sapiens, M. musculus and S. cerevisiae.
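As a rough illustration of the nucleotide composition feature mentioned above, the following Python sketch computes k-mer frequencies for an RNA sequence. The function name kmer_composition and the default k are assumptions made for this example; it is not PseUI's own code, and the other two feature types are not shown.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq: str, k: int = 1) -> dict[str, float]:
    """Return the frequency of every possible k-mer over A, C, G, U in seq.

    Hypothetical helper for illustration only. DNA-style input is accepted
    by converting T to U. Windows containing other symbols (e.g. N) count
    toward the total but get no entry of their own.
    """
    seq = seq.upper().replace("T", "U")
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    return {
        "".join(p): counts["".join(p)] / total
        for p in product("ACGU", repeat=k)
    }


# Single-nucleotide composition of a short example sequence.
features = kmer_composition("AUGGCUUA", k=1)
```

A vector like this would typically be concatenated with other features before being passed to a classifier such as a support vector machine.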
RAM-NPPS is a sequence predictor for identifying N6-methyladenosine sites using multi-interval nucleotide pair position specificity and support vector machine.
MethyRNA is a sequence-based tool for the identification of N6-methyladenosine sites.
HMpre is an mRNA N6-methylation predictor for human, which exhibits good performance and robustness, owing to training on the whole dataset without sampling noise.
PEA-m5C is a machine learning-based m5C predictor trained with features extracted from the flanking sequence of m5C modifications.
M5C-HPCR is an m5C site predictor that introduces a novel heuristic nucleotide physicochemical property reduction (HPCR) algorithm and a classifier ensemble.
AthMethPre is a web server for the prediction and query of mRNA m6A sites in Arabidopsis thaliana.
M6APred-EL is a sequence-based predictor for identifying N6-methyladenosine sites using ensemble learning.
m6ASNP is a user-friendly web server that is dedicated to the identification of genetic variants that target m6A modification sites.
pRNAm-PC is a tool for predicting N(6)-methyladenosine sites in RNA sequences via physical-chemical properties.
PPUS is a web server to predict PUS-specific pseudouridine sites.
Large-scale functional prediction of individual m6A RNA methylation sites from an RNA co-methylation network.
M6AMRFS is a new machine learning based predictor for the identification of m6A sites.
Identifying RNA 5-methylcytosine sites by incorporating physical-chemical properties into pseudo dinucleotide composition.
The m6A-NeuralTool makes the final prediction for the identification of m6A sites by applying majority voting on three different sub-architectures. These sub-architectures use a set of convolution layers to extract the important features from the one-hot encoded input sequence. Further, one of the sub-architectures uses fully connected layers for the classification, while the other two use a Support Vector Machine and Naive Bayes.
The software (ELIGOS) was applied to synthetic modified IVT RNA, rRNA, and mRNA sequences. Its RNA modification prediction performance was assessed using rBEMs as the reference on modified IVT RNA datasets that contained nine types of base modifications (m6A, m1A, 5moU, psU, m7G, Ino, hm5C, f5C and m5C).
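The two mechanical steps named above, one-hot encoding of the input sequence and majority voting over three classifier outputs, can be sketched in Python as follows. The function names and the hard-coded votes are assumptions for illustration; the convolution layers and the three trained classifiers of m6A-NeuralTool are not reproduced here.

```python
from collections import Counter

# One-hot vectors in A, C, G, U order (the ordering is an assumption).
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "U": [0, 0, 0, 1],
}


def one_hot_encode(seq: str) -> list[list[int]]:
    """Encode an RNA sequence as a list of 4-element one-hot vectors."""
    return [ONE_HOT[base] for base in seq.upper().replace("T", "U")]


def majority_vote(predictions: list[int]) -> int:
    """Return the label chosen by most classifiers.

    With three binary voters a tie cannot occur.
    """
    return Counter(predictions).most_common(1)[0][0]


encoded = one_hot_encode("GGACU")
# Hypothetical outputs of the three sub-architectures for one sequence:
# 1 = m6A site, 0 = non-m6A site.
final_label = majority_vote([1, 0, 1])
```

Using an odd number of voters keeps the binary decision unambiguous without needing a tie-breaking rule.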
The web server iRNA-Methyl was developed to identify N6-methyladenosine (m6A) sites. In a rigorous cross-validation test on the benchmark dataset, the predictor achieved an accuracy of 65.59% in identifying m6A. All benchmark data can be downloaded from the Data window of this web server.
This web server is an updated version of iRNA-Methyl, developed to identify N6-methyladenosine (m6A) in the Saccharomyces cerevisiae genome. In a 10-fold cross-validation test on the benchmark dataset, the predictor achieved an accuracy greater than 88% in identifying m6A sites in the Saccharomyces cerevisiae genome, which is better than that of iRNA-Methyl.
RNA modification database (Diverse modifications, Specific modification)
MODOMICS is a database of RNA modifications that provides comprehensive information concerning the chemical structures of modified ribonucleosides, their biosynthetic pathways, the location of modified residues in RNA sequences, and RNA-modifying enzymes.
RMBase v2.0 is a comprehensive database to integrate epitranscriptome sequencing data for exploring post-transcriptional modifications of RNAs, as well as their relationships with microRNA binding events, disease-related SNPs and RNA-binding proteins (RBP). RMBase v2.0 was expanded with 566 datasets and 1 397 244 modification sites from 47 studies among 13 species, which represented about 10 times expansion when compared to the previous release. It contains ~1 373 000 N6-Methyladenosines (m6A), ~5 400 N1-Methyladenosines (m1A), ~9 600 pseudouridine (Ψ) modifications, ~1 000 5-methylcytosine (m5C) modifications, ~5 100 2′-O-methylations (2′-O-Me), and ~2 800 modifications of 100 other types.
RNAMDB has served as a focal point for information pertaining to naturally occurring RNA modifications. In its current state, the database employs an easy-to-use, searchable interface for obtaining detailed data on the 109 currently known RNA modifications. Each entry provides the chemical structure, common name and symbol, elemental composition and mass, CA registry numbers and index name, phylogenetic source, type of RNA species in which it is found, and references to the first reported structure determination and synthesis.
MeT-DB was the first comprehensive database focusing on N6-methyladenosine (m6A) methyltranscriptome. MeT-DB V2.0 is the significantly improved and redesigned version that focuses more on elucidating context-specific m6A functions.
m6AVar is a comprehensive database of m6A-associated variants that potentially influence m6A modification, which will help to interpret variants by m6A function. The m6A-associated variants were derived from three different m6A sources including miCLIP/PA-m6A-seq experiments (high confidence), MeRIP-Seq experiments (medium confidence) and transcriptome-wide predictions (low confidence).
RMVar is the updated and renamed version of m6AVar. It contains 1 678 126 RM-associated variants for 9 kinds of RNA modifications, namely m6A, m6Am, m1A, pseudouridine, m5C, m5U, 2'-O-Me, A-to-I and m7G, at three confidence levels. Moreover, RBP binding regions, miRNA targets, splicing events and circRNAs were integrated to assist investigations of the effects of RM-associated variants on posttranscriptional regulation. In addition, disease-related information was integrated from ClinVar and other genome-wide association studies (GWAS) to investigate the relationship between RM-associated variants and diseases.
m6A2Target is a comprehensive database of the target genes of writers, erasers and readers (WERs) of m6A modification. It integrates high-confidence targets validated by low-throughput experiments and potential targets with binding evidence indicated by high-throughput sequencing such as CLIP-Seq, RIP-seq and ChIP-seq, or inferred from m6A WER perturbation followed by high-throughput sequencing such as RNA-Seq, m6A-Seq and Ribo-Seq.
REPIC (RNA Epitranscriptome Collection) is a database dedicated to providing a new resource to investigate potential functions and mechanisms of N6-methyladenosine (m6A) modifications. Currently, the database includes about 700 samples from 50 public studies that were reprocessed by a refined pipeline. The database also contains ENCODE histone ChIP-seq and DNase-seq data to correlate with m6A modifications. It provides multi-dimensional information (e.g. cell or tissue specificity) to query the relevance of m6A modification in cellular processes. In addition, a built-in modern genome browser presents a comprehensive atlas of m6A modifications, which helps to visualize m6A modification sites across different samples and conditions.
m6ADD is a database containing manually collected experimentally confirmed m6A-disease data and data obtained from high-throughput disease m6A modification profiles, aimed at exploring the association between m6A modified gene disorders and diseases. The m6ADD database contains 222 experimentally confirmed m6A-disease relationship pairs (human 185, mouse 37). We screened out differential m6A data from 30 sets of sequencing data of 16 diseases by two calculation methods, and provided a statistical evaluation result. The m6A-disease data includes m6A genomic location, disease name, m6A protein, regulatory mode, tissue/cell line, experimental method, and data source. In addition, we have developed a PPI network tool for obtaining differential m6A genes to show the function of these genes and a tool to predict m6A regulatory proteins associated with 24 types of cancers.
m6A-Atlas v2.0 was expanded to include 797,091 reliable m6A sites from 13 high-resolution technologies and 109 conditions. Additionally, sites derived from single-cell techniques are presented for the first time. It estimates quantitative epitranscriptome profiles under 241 conditions for human and 129 conditions for mouse. A user-friendly graphical interface was constructed to support the query, exploration and sharing of the m6A epitranscriptomes annotated with putative post-transcriptional machinery (SNP association, conservation, RBP-binding, microRNA interaction, splicing sites, subcellular location and circRNA generation).
ENCORE (The Encyclopedia of RNA Epitranscriptome) is an upgraded version of RMBase that mainly focuses on the mechanism and function of diverse RNA modifications. ENCORE is a comprehensive and convenient platform for efficiently studying RNA modifications from large amounts of epitranscriptome high-throughput sequencing data. It provides multiple interfaces and web-based tools to integrate 73 types of RNA modification among 62 species, uncover the relationships between RNA modifications and a series of interacting factors, and reveal the distribution patterns, biochemical mechanisms and evolutionary conservation of RNA modifications and their biological roles in human diseases.
RMDisease is a database of genetic variants that can affect RNA modifications. By integrating the prediction results of 18 different RNA modification prediction tools and also 303,426 experimentally-validated RNA modification sites, RMDisease identified a total of 202,307 human SNPs that may affect (add or remove) sites of eight types of RNA modifications (m6A, m5C, m1A, m5U, Ψ, m6Am, m7G and Nm). These include 4,289 disease-associated variants that may imply disease pathogenesis functioning at the epitranscriptome layer. These SNPs were further annotated with essential information such as post-transcriptional regulations (sites for miRNA binding, interaction with RNA-binding proteins and alternative splicing) revealing putative regulatory circuits. A convenient graphical user interface was constructed to support the query, exploration and download of the relevant information.
RNA features annotation
RNAModR provides functions to map lists of genomic loci of RNA modifications to a reference mRNA transcriptome, and to perform exploratory functional analyses of sites across the transcriptome through visualisation and statistical analysis of the distribution of sites across transcriptome sections (5'UTR, CDS, 3'UTR).
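The idea of tallying sites per transcript section can be sketched in Python as below. This is a simplified illustration, not RNAModR's API (RNAModR is an R package): it assumes sites are already expressed in 0-based transcript coordinates and that the CDS is given as a half-open interval; the function names are hypothetical.

```python
from collections import Counter


def transcript_section(pos: int, cds_start: int, cds_end: int) -> str:
    """Classify a 0-based transcript position as 5'UTR, CDS or 3'UTR.

    The CDS is assumed to span the half-open interval [cds_start, cds_end).
    """
    if pos < cds_start:
        return "5'UTR"
    if pos < cds_end:
        return "CDS"
    return "3'UTR"


def section_distribution(sites: list[int], cds_start: int, cds_end: int) -> Counter:
    """Count how many modification sites fall into each transcript section."""
    return Counter(transcript_section(p, cds_start, cds_end) for p in sites)


# Four hypothetical sites on a transcript whose CDS covers positions 100-899.
dist = section_distribution([10, 150, 480, 950], cds_start=100, cds_end=900)
```

Counts like these are the raw material for the enrichment statistics and distribution plots that such packages produce.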
The package is designed for transcriptomic visualization of RNA-related genomic features represented with genome-based coordinates with respect to the landmarks of RNA transcripts.
RCAS is an R/Bioconductor package designed as a generic reporting tool for the functional analysis of transcriptome-wide regions of interest detected by high-throughput experiments. Such transcriptomic regions could be, for instance, signal peaks detected by CLIP-Seq analysis for protein-RNA interaction sites, RNA modification sites (alias the epitranscriptome), CAGE-tag locations, or any other collection of query regions at the level of the transcriptome.
RNAmod is a convenient web-based platform for the meta-analysis and functional annotation of modifications on mRNAs. As input, RNAmod takes chromosomal location information of mRNA modifications in the widely used BED format, which can be generated with common peak-calling tools, such as MACS, exomePeak and MeTPeak, or can be easily converted from other text formats.
RNA modification high-throughput technologies (Diverse modifications, Specific modification)
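For readers unfamiliar with BED input, the following Python sketch parses tab-separated BED lines into records. It relies only on the standard BED column order (chromosome, 0-based start, exclusive end, then optional name, score and strand); the BedRecord type and parse_bed function are hypothetical names for this example and are not part of RNAmod.

```python
from typing import Iterable, NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str
    strand: str


def parse_bed(lines: Iterable[str]) -> list[BedRecord]:
    """Parse BED lines, skipping blank, comment, track and browser lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # Columns 4 (name) and 6 (strand) are optional in BED.
        name = fields[3] if len(fields) > 3 else "."
        strand = fields[5] if len(fields) > 5 else "."
        records.append(
            BedRecord(fields[0], int(fields[1]), int(fields[2]), name, strand)
        )
    return records


# Two made-up peaks: one BED6 line and one minimal BED3 line.
example = [
    "# peaks from a hypothetical peak caller",
    "chr1\t1000\t1200\tpeak_1\t50\t+",
    "chr2\t500\t650",
]
peaks = parse_bed(example)
```

Because BED starts are 0-based and ends are exclusive, the length of a peak is simply end minus start.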
ICE followed by NGS identifies adenosine-to-inosine editing. In this method, RNA is treated with acrylonitrile, while control RNA is untreated. Control and treated RNAs are reverse-transcribed and PCR-amplified. Inosines in RNA fragments treated with acrylonitrile cannot be reverse-transcribed. Deep sequencing of the cDNA prepared from control and treated RNA provides high-resolution reads of inosines in RNA fragments.
MeRIP-Seq (Methylated RNA Immunoprecipitation Sequencing) maps methylated RNA. In this method, modification-specific antibodies are used to immunoprecipitate RNA. RNA is reverse-transcribed to cDNA and sequenced. Deep sequencing provides high-resolution reads of methylated RNA.
miCLIP-m6A (m6A Individual-Nucleotide-Resolution Crosslinking and Immunoprecipitation) maps m6A locations in the transcriptome with single-nucleotide resolution. In this method, anti-m6A antibodies are crosslinked to mRNA sequences, and a cDNA library is prepared and sequenced. The cDNA library preparation in miCLIP follows the iCLIP protocol closely.
PA-m6A-seq is a photo-crosslinking-assisted m6A sequencing strategy to more accurately define sites with m6A modification.
m6A-LAIC-seq (m6A-level and isoform-characterization sequencing) is a method to quantify the fraction of transcript copies of particular genes that are m6A-modified ('m6A levels') and to characterize the relationship of m6A modification(s) to alternative RNA isoforms.
Bisulfite-seq can be used to map modified cytosine sites across a human transcriptome.
m5C-RIP (m5C RNA immunoprecipitation)
Aza-IP (5-azacytidine–mediated RNA immunoprecipitation) exploits the catalytic mechanisms of the m5C methyltransferases to covalently link methyltransferase to its RNA targets. First, the cytidine analog 5-azacytidine is randomly incorporated into the nascent RNA of cells overexpressing an epitope-tagged m5C RNA methyltransferase. Due to the nitrogen substitution at the C5 position, a stable covalent bond forms when the RNA methyltransferase attacks the C6 position of its RNA targets. These targets are enriched by immunoprecipitation and subsequently sequenced.
Ψ-seq is a method for transcriptome-wide quantitative mapping of Ψ, which has been used to identify the vast majority of Ψ sites in rRNA, tRNA and snRNA and dozens of novel sites within snoRNAs and mRNAs.
CeU-Seq (N3-CMC-enriched pseudouridine sequencing) is a selective chemical labeling and pulldown method, which identified 2,084 Ψ sites within 1,929 human transcripts.
Pseudo-Seq detects pseudouridylation sites in ncRNAs with single-nucleotide resolution using high-throughput sequencing. Pseudo-Seq is very similar to PSI-seq, in that both methods use CMC to modify pseudouridines selectively and halt reverse transcription. However, Pseudo-Seq circularizes cDNA strands before PCR amplification and purification, instead of using ARTseq.
PSI-Seq (Pseudouridine Site Identification Sequencing) identifies RNA sequences containing pseudouridine sites using high-throughput sequencing. PSI-Seq uses N-cyclohexyl-N′-(2-morpholinoethyl)carbodiimide (CMC) to modify pseudouridines selectively, effectively halting reverse transcription. The cDNA libraries are prepared by the ARTseq method.
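Because CMC-modified pseudouridines halt reverse transcription, candidate sites show up as positions where an unusually large share of reads terminate. The Python sketch below scores one position by comparing its stop ratio in a CMC-treated library with that in an untreated control. The use of an untreated control, the pseudocount and the function names are all assumptions for illustration; this is not the published scoring scheme of Pseudo-Seq or PSI-Seq.

```python
def stop_ratio(stops: int, coverage: int) -> float:
    """Fraction of reads covering a position whose cDNA terminates there."""
    return stops / coverage if coverage > 0 else 0.0


def stop_enrichment(
    treated_stops: int,
    treated_coverage: int,
    control_stops: int,
    control_coverage: int,
    pseudocount: float = 0.01,
) -> float:
    """Ratio of stop ratios, CMC-treated over untreated control.

    The pseudocount (an arbitrary illustrative value) avoids division by
    zero when the control shows no stops at the position.
    """
    treated = stop_ratio(treated_stops, treated_coverage)
    control = stop_ratio(control_stops, control_coverage)
    return (treated + pseudocount) / (control + pseudocount)


# Hypothetical counts: half of the treated reads stop at the position,
# against 5% in the control.
score = stop_enrichment(50, 100, 5, 100)
```

In practice a threshold on such a score would be combined with minimum coverage requirements and replicate agreement before a site is reported.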
m1A-seq uses methylated RNA immunoprecipitation sequencing (MeRIP-seq) for transcriptome-wide localization of m1A sites and couples it to an orthogonal chemical method based on Dimroth rearrangement to obtain high-resolution m1A maps.
The m1A-ID-seq technique is based on m1A immunoprecipitation and the inherent ability of m1A to stall reverse transcription, providing a means for transcriptome-wide m1A profiling.
TRAC-seq (tRNA reduction and cleavage sequencing) is used together with m7G methylated tRNA immunoprecipitation sequencing (MeRIP-seq) to reveal the m7G tRNA methylome.
AlkAniline-Seq is a new principle of RNAseq library preparation, which relies on a chemistry based positive enrichment of reads in the resulting libraries, and therefore leads to unprecedented signal-to-noise ratios. It enables a deep sequencing-based technology for the simultaneous detection of 7-methylguanosine (m7G) and 3-methylcytidine (m3C) in RNA at single nucleotide resolution.
Other
ChIPseeker is a Bioconductor package that implements functions to retrieve the nearest genes around a peak, annotate the genomic region of the peak, and estimate the statistical significance of overlap among ChIP peak data sets.
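The nearest-gene lookup described above can be illustrated with a short Python sketch. It is a deliberate simplification and not ChIPseeker's API (ChIPseeker is an R package): it assumes all genes lie on the same chromosome as the peak, represents each gene only by a transcription start site (TSS), and measures distance from the peak midpoint. The gene names and coordinates are made up.

```python
def nearest_gene(
    peak_start: int, peak_end: int, genes: list[tuple[str, int]]
) -> tuple[str, int]:
    """Return (gene_name, distance) for the gene whose TSS is closest.

    genes is a list of (name, tss) pairs on the peak's chromosome.
    Distance is the absolute offset between the TSS and the peak midpoint.
    """
    if not genes:
        raise ValueError("no genes supplied")
    midpoint = (peak_start + peak_end) // 2
    name, tss = min(genes, key=lambda gene: abs(gene[1] - midpoint))
    return name, abs(tss - midpoint)


# Hypothetical annotation for one chromosome.
annotation = [("geneA", 1_000), ("geneB", 5_000), ("geneC", 12_000)]
closest = nearest_gene(4_400, 4_800, annotation)
```

For whole-genome annotations a sorted TSS list with binary search, or an interval tree, would replace the linear scan used here.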
The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs).
ENCODE is a public research consortium aimed at identifying all functional elements in the human and mouse genomes.
POSTAR is a resource of POST-trAnscriptional Regulation coordinated by RNA-binding proteins (RBPs). Based on new studies and resources, POSTAR supplies the largest collection of experimentally probed (~23 million) and computationally predicted (approximately 117 million) RBP binding sites in the human and mouse transcriptomes.
The m6Acorr server can not only efficiently eliminate the potential bias in m6A methylation profiles, but also perform profile-profile comparisons and functional analysis of hyper- (hypo-) methylated genes based on the corrected methylation profiles.
m6Areader can predict the putative binding readers of m6A sites.