What is miSTAR?
miRNA target prediction typically has to model two levels of information: the number of potential miRNA binding sites in a target mRNA and the genomic context of each individual site. A single model copes poorly with such training data, because the number of binding sites varies between mRNAs and the resulting feature vectors therefore differ in length. To circumvent this problem, we developed a two-layered, stacked model, named miSTAR (miRNA stacked model target prediction), in which the influence of binding site context is modeled separately. Using logistic regression and random forests, we trained the stacked model on a unique dataset of 7990 probed miRNA-mRNA interactions. The miSTAR model is described in Van Peer & De Paepe et al. (2016), Nucleic Acids Research.
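To illustrate the unequal-length problem, the sketch below scores each binding site on its own and then collapses the scores into a fixed-length summary that a second model could take as input. It is not the miSTAR implementation: the Site class, the two context features, the weights and the choice of summary statistics (count, maximum, mean) are all hypothetical, and a hand-written logistic function stands in for the trained models.

```python
from dataclasses import dataclass
from math import exp
from typing import List, Sequence


@dataclass
class Site:
    """One candidate binding site with hypothetical context features."""
    features: Sequence[float]


# Hypothetical, hand-picked parameters; a real site-level model would be trained.
WEIGHTS = [1.5, -0.5]
BIAS = -1.0


def site_score(site: Site, weights: Sequence[float], bias: float) -> float:
    """Site-level layer: map the context features of one site to a score in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, site.features))
    return 1.0 / (1.0 + exp(-z))


def interaction_features(sites: List[Site], weights: Sequence[float], bias: float) -> List[float]:
    """Collapse any number of site scores into [count, max, mean]."""
    scores = [site_score(s, weights, bias) for s in sites]
    if not scores:
        return [0.0, 0.0, 0.0]
    return [float(len(scores)), max(scores), sum(scores) / len(scores)]


# Two mRNAs with different numbers of sites for the same miRNA (made-up values).
mrna_a = [Site([0.9, 0.2])]
mrna_b = [Site([0.4, 0.7]), Site([0.8, 0.1]), Site([0.2, 0.9])]

vec_a = interaction_features(mrna_a, WEIGHTS, BIAS)
vec_b = interaction_features(mrna_b, WEIGHTS, BIAS)
print(len(vec_a), len(vec_b))  # both summaries have the same length
```

Because both summaries have the same length regardless of how many sites an mRNA carries, an interaction-level model can be trained on them directly.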
What data sources are used for miSTAR predictions?
The miSTAR webtool catalogues predictions for all human miRNAs in release 21 of miRBase and human protein-coding transcripts in version 75 of Ensembl.
Ensembl transcripts mapped to genome patches or haplotypes are not included in miSTAR. Genomic regions without annotated PhyloP and PhastCons conservation scores are omitted from the alignment that is scanned for potential miRNA binding sites.
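As a rough illustration of the transcript filter, the sketch below keeps only transcripts located on primary-assembly sequence regions. The exact rule used by miSTAR is not stated here, so treating 1-22, X, Y and MT as the primary set is an assumption, and the transcript identifiers are placeholders; the two excluded region names merely follow the naming style of patches and haplotypes.

```python
# Assumed primary-assembly region names; whether MT is included in miSTAR is not stated.
PRIMARY_REGIONS = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def on_primary_assembly(region_name: str) -> bool:
    """Return True if the region is a primary chromosome, not a patch or haplotype."""
    return region_name in PRIMARY_REGIONS


# (transcript id, sequence region) pairs; the ids are placeholders, not real records.
transcripts = [
    ("TRANSCRIPT_A", "6"),
    ("TRANSCRIPT_B", "HSCHR6_MHC_COX"),  # haplotype-style region name
    ("TRANSCRIPT_C", "HG1287_PATCH"),    # patch-style region name
    ("TRANSCRIPT_D", "X"),
]

kept = [tid for tid, region in transcripts if on_primary_assembly(region)]
print(kept)
```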