Eigengene: explaination and reproduce

WGCNA

Posted by Yuan on February 3, 2022

wake up!

Eigengene

The concept of eigengene was first introduced by Orly et al. By definition, eigengenes are the right singular vectors of the SVD of the expression matrix(row as genes, column as samples). This concept become popular when WGCNA used it to represent the overal expression of a module. Actually, eigengene and WGCNA modules are perfect matchings! In a specific WGCNA module, all genes were in the same cluster by a distance which is closely related to the correlation matrix. In this senario, all genes in one module are highly correlated, and thus their PC1 could best represent their overal expression.

In WGCNA, by default and in most of times, the first principle component is used to represent the expression values of a group of genes, under the name of eigen gene. But sometimes, we just want to calculate eigengene genes, how to do that in a simple way?

After going through the codes in WGCNA functions, I have two solutions for this: data_PCA is row as genes,and columns as samples. This kind of row-column structure is the transpose of what are generally used in statistics and machine learning field.

  1. Using WGCNA
    1
    2
    3
    4
    5
    
     library(WGCNA)
     ##calculate eigen gene value of all MT genes
     complex_colors_genes <- rep("grey",dim(data_PCA)[1])
     MEList_genes <- moduleEigengenes(as.matrix(t(data_PCA)), colors = complex_colors_genes)
     gene_eigen <- MEList_genes$eigengenes[,"MEgrey"]
    
  2. Using SVD
    1
    2
    3
    4
    5
    6
    
     eigengenes <- svd(t(scale(t(data_PCA))),nu=1,nv=1)$v
     scaledExpr <- scale(t(data_PCA))
     averExpr <- rowMeans(scaledExpr, na.rm = TRUE)
     if(cor(averExpr,eigengenes) < 0){
         eigengenes <- -eigengenes
     }
    

SVD vs PRCOMP

A good online resource for this part is in the blog SVD vs PRCOMP in R, detailed summary in stackexchange, and a fantastica vedio from 3Blue1Brown

Below are the code examples They are correct only if 𝐗 is centered

Also, we should notice that in these examples,there is a transpose transformation to data_PCA_..

svd1 = svd(t(data_PCA_scaledbyrow))

Which is not the same to the one above. Thus, svd1$u here is the same to svd$v above, where columns are the unscaled PCs.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
pca <- prcomp(t(data_PCA), scale=FALSE, center=TRUE)
summary(pca)

#X=U*D*t(V)
#X: a matrix s(samples) x g(genes) matrix 
#U is an s×p orthogonal matrix. Columns “p” are the unscaled PCs, or eigengenes in this case
#V is an g×p orthogonal matrix.
#D is an p×p diagonal matrix. It only has values on the diagonals (eigen values)
data_PCA_scaledbyrow <- data_PCA - rowMeans(data_PCA) #first, center the data on the genes
svd1 <- svd(t(data_PCA_Scaledbyrow)) # apply SVD on transposed data

#Eigen vectors
svd1$v
#or
pca$rotation
all(pca$rotation == svd1$v)

Z = t(data_PCA_scaledbyrow) %*% svd1$v
#Z==XV==UD==pca$x
pca$x
svd1$u #U are unscaled PCs, eigengenes
svd1$u %*% svd1$d #UD, the scaled PCs(principal components), D is the 'scaling factors',

#The proportion of variance explained by each PC
#method1
svd1$d^2/sum( svd1$d^2)
#method2
summary(pca)
#method3
pca$sdev^2 /sum(pca$sdev^2)

#loadings
svd1$v %*% svd1$d
#or
fviz_contrib(pca, choice="var", axes = 1,top=20 )