A gene set is an unordered collection of genes that are functionally related.
The goal of Over-Representation Analysis (ORA) is to determine whether gene sets representing known biological functions or processes are over-represented (enriched) in an experimentally derived gene list L.
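ORA is commonly carried out as a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2x2 table of "in L / not in L" versus "in S / not in S". The notes above do not name the test, so treat that choice as an assumption. The sketch below is base R only. The helper `ora_test()` and the toy background of 1,000 genes are hypothetical.

```r
# Minimal ORA sketch: one-sided hypergeometric test for a single gene set.
# Assumption: the background (universe) is the set of all measured genes,
# and both the gene list and the gene set are restricted to it.
ora_test <- function(gene_list, gene_set, universe) {
  gene_list <- intersect(gene_list, universe)
  gene_set  <- intersect(gene_set, universe)
  N <- length(universe)                          # background size
  K <- length(gene_set)                          # genes in the set S
  n <- length(gene_list)                         # genes in the list L
  k <- length(intersect(gene_list, gene_set))    # observed overlap
  # P(X >= k) when n genes are drawn without replacement from N
  p <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
  list(overlap = k, expected = n * K / N, p.value = p)
}

# Toy example: a 50-gene set, and a 100-gene list that contains 20 of its members
universe  <- paste0("g", 1:1000)
gene_set  <- paste0("g", 1:50)
gene_list <- c(paste0("g", 1:20), paste0("g", 501:580))
res <- ora_test(gene_list, gene_set, universe)
res$overlap   # 20
res$expected  # 5
```

Observing 20 members where about 5 are expected by chance gives a very small p-value. In practice, many gene sets are tested at once, so the p-values are usually adjusted for multiple testing.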
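The goal of Gene Set Enrichment Analysis (GSEA) is to determine whether members of a gene set S tend to occur toward the top (or bottom) of a list L. Here, L is the ranked list of all measured genes, ordered by their association with the phenotype. If the members do cluster at either end, the gene set is correlated with the phenotypic class distinction.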
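The GSEA enrichment score is a weighted Kolmogorov-Smirnov-like statistic. Walking down L, a running sum increases when a gene in S is encountered and decreases when it is not. The enrichment score is the maximum deviation of this running sum from zero.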
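The sketch below computes the running-sum statistic in base R. It assumes that gene-level scores (for example, a correlation or t-statistic per gene) are already available as a named numeric vector. The helper `gsea_es()` is hypothetical.

With `p = 1`, it follows the weighted scheme of the original GSEA method, in which hits contribute in proportion to |score|^p. With `p = 0`, it reduces to the classic unweighted Kolmogorov-Smirnov walk. Permutation-based significance testing is omitted.

```r
# Sketch of the GSEA running-sum enrichment score (weighted KS-like statistic).
# stats: named numeric vector of gene-level scores (names = gene IDs).
# gene_set: character vector of gene IDs; must overlap names(stats).
# p: weight exponent for hits (p = 0 gives the unweighted KS walk).
gsea_es <- function(stats, gene_set, p = 1) {
  L <- sort(stats, decreasing = TRUE)          # ranked list L
  in_set <- names(L) %in% gene_set
  N  <- length(L)
  Nh <- sum(in_set)                            # number of hits
  w <- abs(L)^p * in_set                       # hit weights, zero for misses
  hit  <- cumsum(w) / sum(w)                   # fraction of hit weight seen so far
  miss <- cumsum(!in_set) / (N - Nh)           # fraction of misses seen so far
  running <- hit - miss
  unname(running[which.max(abs(running))])     # signed maximum deviation from zero
}

stats <- setNames(c(5, 4, 3, 2, 1, -1, -2, -3), paste0("g", 1:8))
gsea_es(stats, c("g1", "g2"))          # 1  (set sits at the top of L)
gsea_es(stats, c("g7", "g8"), p = 0)   # -1 (set sits at the bottom of L)
```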
Gene set variation analysis (GSVA) is a particular type of gene set enrichment method that works on single samples. It enables pathway-centric analyses of molecular data by performing a conceptually simple but powerful change in the functional unit of analysis, from gene to gene set.
Gene set variation analysis (GSVA) provides an estimate of pathway activity by transforming an input gene-by-sample expression data matrix into a corresponding gene-set-by-sample expression data matrix.
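The sketch below illustrates only the change of unit, from a gene-by-sample matrix to a gene-set-by-sample matrix. It deliberately uses a much simpler score than GSVA: each gene is standardized across samples, and each set's score in a sample is the sum of its members' z-scores divided by the square root of the set size (a combined z-score). It is not the GSVA algorithm. The helper `geneset_scores()` and the toy data are hypothetical.

```r
# Gene-by-sample -> gene-set-by-sample, using a simple combined z-score
# (an illustration of the output shape, NOT the GSVA algorithm).
# X: numeric matrix, rows = genes (rownames = gene IDs), cols = samples.
# gs: named list of character vectors of gene IDs.
# Genes with zero variance across samples would give NaN and should be filtered first.
geneset_scores <- function(X, gs) {
  Z <- t(scale(t(X)))                          # standardize each gene across samples
  do.call(rbind, lapply(gs, function(genes) {
    genes <- intersect(genes, rownames(Z))     # ignore IDs absent from X
    colSums(Z[genes, , drop = FALSE]) / sqrt(length(genes))
  }))                                          # rows = gene sets, cols = samples
}

X <- matrix(c(1, 2, 3,
              2, 4, 6,
              3, 2, 1,
              5, 5, 6), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("g", 1:4), paste0("s", 1:3)))
gs <- list(up = c("g1", "g2"), down = "g3")
S <- geneset_scores(X, gs)   # 2 gene sets x 3 samples
```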
library(GSVA)
gsva.es <- gsva(X, gs, verbose=FALSE, method="gsva")  # X: gene-by-sample matrix; gs: list of gene sets (character vectors of gene IDs)
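The method argument selects how gene-set enrichment scores are estimated per sample. By default it is set to "gsva" (Hänzelmann, Castelo, and Guinney 2013); the other options are "ssgsea", "zscore" and "plage". In newer GSVA releases, the method is instead chosen through the class of the parameter object passed to gsva(), such as gsvaParam(), ssgseaParam(), zscoreParam() or plageParam().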
The kcdf argument selects the kernel used to estimate, for each gene, the cumulative distribution function of its expression across samples. To run GSVA on RNA-seq integer count data, the only requirement is to set kcdf="Poisson"; its default is "Gaussian".
If the RNA-seq expression levels are instead continuous, such as log-CPMs, log-RPKMs or log-TPMs, kcdf should be left at its default value.
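To make the role of kcdf concrete, the sketch below is a simplified base-R version of GSVA's first step. For each gene, it estimates a kernel-smoothed cumulative distribution function of expression across samples and evaluates it at every observed value. A Poisson kernel fits integer counts; a Gaussian kernel suits continuous values.

The helper `kernel_cdf()` is hypothetical and is not the GSVA package code. The Gaussian bandwidth sd/4 and the Poisson offset 0.5 are assumptions based on our reading of the GSVA paper.

```r
# Simplified sketch of GSVA step 1 (NOT the GSVA package implementation):
# per-gene kernel estimate of the expression CDF across samples,
# evaluated at each sample's own value.
# Assumptions: Gaussian bandwidth h = sd(gene) / 4; Poisson kernel offset r = 0.5.
# Genes with zero variance give NaN under the Gaussian kernel; filter them first.
kernel_cdf <- function(X, kcdf = c("Gaussian", "Poisson")) {
  kcdf <- match.arg(kcdf)
  t(apply(X, 1, function(x) {
    if (kcdf == "Gaussian") {
      h <- sd(x) / 4                                             # assumed bandwidth
      vapply(x, function(xj) mean(pnorm((xj - x) / h)), numeric(1))
    } else {
      # discrete kernel for counts: Poisson centred near each observed count
      vapply(x, function(xj) mean(ppois(xj, lambda = x + 0.5)), numeric(1))
    }
  }))                                                            # genes x samples
}

counts <- matrix(c(0, 5, 10, 20,
                   3, 3, 4, 4), nrow = 2, byrow = TRUE)
Fp <- kernel_cdf(counts, "Poisson")               # integer counts
Fg <- kernel_cdf(log2(counts + 1), "Gaussian")    # continuous log-scale values
```

In both cases, each value is mapped to a number strictly between 0 and 1 that preserves the within-gene ordering of the samples. This puts genes with very different expression scales on a comparable footing before ranking.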