Prediction tool | Score range | Deleterious score cutoff | Training data | Features | Machine learning method |
---|---|---|---|---|---|
GERP++ | −12.0 to 6.17 | >0.047 | None | Infers conserved or constrained elements from 33 mammalian genomes | – |
fitCons | 0 to 1 | >0.4 | None | Functional genomics data mainly sourced from chromatin analysis, e.g. ChIP-seq, and evolutionary conservation data | – |
SIFT | 1 to 0 | <0.05 | None | Conservation data (MSA of homologous sequences) transformed into a normalised probability matrix | – |
PolyPhen | 0 to 1 | >0.5 | HumVar, HumDiv | Conservation data (MSA of homologous sequences), protein functional domain data and protein structural features | Naïve Bayes classifier |
CADD | 0 to 35+ | >15 | Simulated, SwissVar, HumVar | Integrates several annotations into a single score, e.g. SIFT, GERP++, PolyPhen, CpG distance, GC content | SVM |
Condel | 0 to 1 | >0.5 |  | Builds a unified classification by integrating the output from a collection of tools, e.g. SIFT, PolyPhen | Weighted average of normalised scores |
REVEL | 0 to 1 | >0.5 | HGMD, ESP | HGMD and rare ESP variants used for training | Random forest |
fathmm | 0 to 1 | >0.45 | HGMD, Swiss-Prot | Combines evolutionary conservation with disease-specific protein weights for intolerance to mutation | Hidden Markov models |
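
The cutoffs in the table do not all point the same way: SIFT flags a variant as deleterious when the score falls below its threshold, while the other tools flag scores above theirs. A minimal sketch of applying the cutoffs per tool is shown below. The names `CUTOFFS`, `is_deleterious` and `summarise` are hypothetical helpers introduced for illustration, not part of any of the listed tools, and the example scores are made up.

```python
import operator

# Cutoffs copied from the table: tool -> (comparison, threshold).
# SIFT is the only tool where a LOWER score means deleterious.
CUTOFFS = {
    "GERP++": (operator.gt, 0.047),
    "fitCons": (operator.gt, 0.4),
    "SIFT": (operator.lt, 0.05),
    "PolyPhen": (operator.gt, 0.5),
    "CADD": (operator.gt, 15),
    "Condel": (operator.gt, 0.5),
    "REVEL": (operator.gt, 0.5),
    "fathmm": (operator.gt, 0.45),
}


def is_deleterious(tool, score):
    """Return True if `score` passes the table's deleterious cutoff for `tool`."""
    compare, threshold = CUTOFFS[tool]
    return compare(score, threshold)


def summarise(scores):
    """Map each tool in `scores` (tool -> score) to its deleterious call."""
    return {tool: is_deleterious(tool, score) for tool, score in scores.items()}


if __name__ == "__main__":
    # Made-up scores for a single hypothetical variant.
    example = {"SIFT": 0.01, "PolyPhen": 0.92, "CADD": 12.3}
    print(summarise(example))
```

Keeping the comparison operator next to each threshold avoids special-casing SIFT in the calling code. The cutoffs here are the ones listed in the table; other sources may recommend different values.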