Table 2 Common statistical methods and tests used in epidemiology, genetics, and metabolomics, with references to descriptive articles on their appropriate general use

From: Beyond genomics: understanding exposotypes through metabolomics

Class of test

Type of test

Application/description

Refs

Descriptive

Mean

Median

Mode

The simplest of tests used to describe basic features within data.

Covered in all general statistical textbooks and used in most if not all scientific disciplines.

[67, 68, 69]

Range, variance, SD

Describe the spread of data within a population

Inferential

z test, t test, chi-square

Compare an observed mean, frequency, or proportion with a predetermined value.

ANOVA

Parametric method that tests the hypothesis that the means of two or more populations are equal. Frequently used to compare variance among groups relative to variance within groups

Kruskal-Wallis

Non-parametric, rank-based method that tests for statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable

Scaling

Centering, auto, Pareto, log, MD

Data pretreatment methods that aim to reduce biological and analytical bias

[70, 71]

Principal component

PCA

Unsupervised dimensionality reduction procedure used to explain the maximum variance within complex datasets.

[72, 73, 74]

Multiblock PCA

PCA extension designed to find the underlying relationships between sets of related data

[65, 66, 75]

ANOVA-PCA

Uses PC dimensionality reduction to determine the effect of the experimental factors on multiple dependent variables

[65, 76]

PC-DFA

Supervised test that summarizes the differentiation between groups while overlooking within-group variation.

[65, 77, 78]

Regression

Linear

Summarizes and quantifies the relationship between two continuous variables

[72, 79]

PLS

Used to predict a set of dependent variables from a large set of independent variables

[73, 77, 80, 81, 82]

O-PLS

Orthogonal signal correction applied to PLS that maximizes the explained covariance on the first latent variable

[77, 81, 83]

PLS-R

Combination of the predictive power of regression alongside the ability to deal with high dimensionality and multicollinearity of variables.

[77, 84]

PLS-DA

Supervised approach for predicting discrete (categorical) variables

[77, 79, 83]

LASSO

Parsimonious approach to variable selection and regularization in order to enhance interpretability and reduce noise

[79, 80, 85, 86, 87]

Elastic net

Variable reduction approach where strongly correlated predictors coalesce in or out of the model together

[79, 80, 85, 87, 167]

  1. Definitions: SD standard deviation, MD median, PCA principal component analysis, ANOVA analysis of variance, PC-DFA principal component discriminant function analysis, PLS partial least squares (also known as projection to latent structures), O-PLS orthogonal PLS, PLS-R PLS regression, LASSO least absolute shrinkage and selection operator
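
The scaling row of the table names several pretreatment methods without showing how they differ. The following is a minimal sketch of centering, auto scaling, Pareto scaling, and log transformation, using only the Python standard library. It is an illustration, not a method taken from the cited references. The function names and the `intensities` values are hypothetical, the sample standard deviation (n - 1 denominator) is an assumption, and median-based scaling (MD) is omitted.

```python
import math
import statistics


def center(values):
    """Mean centering: subtract the variable's mean from every value."""
    m = statistics.mean(values)
    return [v - m for v in values]


def autoscale(values):
    """Auto scaling: center, then divide by the sample SD (unit variance)."""
    sd = statistics.stdev(values)
    return [v / sd for v in center(values)]


def pareto_scale(values):
    """Pareto scaling: center, then divide by the square root of the sample SD."""
    root_sd = math.sqrt(statistics.stdev(values))
    return [v / root_sd for v in center(values)]


def log_transform(values):
    """Base-10 log transformation; assumes strictly positive values."""
    return [math.log10(v) for v in values]


# Hypothetical intensities of one metabolite across five samples.
intensities = [10.0, 12.0, 9.0, 11.0, 13.0]

centered = center(intensities)
auto = autoscale(intensities)
pareto = pareto_scale(intensities)
logged = log_transform(intensities)
```

After auto scaling every variable has unit standard deviation, so all variables carry equal weight. Pareto scaling leaves each variable with a standard deviation equal to the square root of its original SD, so large-variance variables are down-weighted less aggressively.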