君子和而不同

This is the third part of the PRS blog, calculating PRS scores using PRSice2 and lassosum.

Tip: Key point to calculate PRS

The formula to calculate PRS score is given in "PRS score 2" blog: $$ PRS_j =\frac{ \sum_i^NS_i*G_{ij}}{P*M_j} $$ Here, the effect size of SNP $i$ is $S_i$; the number of effect alleles observed in sample $j$ is $G_{ij}$; the ploidy of the sample is $P$ (is genrally 2 for human); the total number of SNPs included in the PRS is N;and the number of non-missing SNPs observed in sample $j$ is $M_j$. If the sample has a missing genotype for SNP $i$, then the population minor allele frequency multiplied by the ploidy ($MAF_i * P$) is used instead of $G_{ij}$.

where 'a hyperparameter'(pvalue threshold) that determine the set of SNPs to be included when calculating PRS scores.

This hyperparameter is tuned by selecting the largest $R^2$ diff between Full regression model and the reduced model.

Full Model: $$ phenotypeTrail \backsim PRS + COV $$

Reduced Model: $$ phenotypeTrail \backsim COV $$

Required Files

File Name	Description
Height.QC.gz	The post QC base data file. While PRSice-2 can automatically apply most filtering on the base file, it cannot remove duplicated SNPs
EUR.QC.bed	This file contains the genotype data that passed the QC steps
EUR.QC.bim	This file contains the list of SNPs that passed the QC steps
EUR.QC.fam	This file contains the samples that passed the QC steps
EUR.height	This file contains the phenotype data of the samples
EUR.cov	This file contains the covariates of the samples
EUR.eigenvec	This file contains the principal components (PCs) of the samples

Running PRS analysis

To run PRSice-2, we need merge EUR.cov and EUR.eigenvec.

library(data.table)
covariate <- fread("EUR.cov")
pcs <- fread("EUR.eigenvec", header=F)
colnames(pcs) <- c("FID","IID", paste0("PC",1:6))
cov <- merge(covariate, pcs)
fwrite(cov,"EUR.covariate", sep="\t")
q()

Then, PRS could calculated through PRSice

Rscript ~/softwares/PRSice_mac/PRSice.R \
    --prsice ~/softwares/PRSice_mac/PRSice_mac \
    --base Height.QC.gz \
    --target EUR.QC \
    --binary-target F \
    --pheno EUR.height \
    --cov EUR.covariate \
    --base-maf MAF:0.01 \
    --base-info INFO:0.8 \
    --stat OR \
    --or \
    --out EUR

This will automatically perform “high-resolution scoring” and generate the “best-fit” PRS (in EUR.best), with associated plots of the results.

Tables of parameters

Parameter	Value	Description
prsice	PRSice_xxx	Informs PRSice.R that the location of the PRSice binary
base	Height.QC.gz	Informs PRSice that the name of the GWAS summary statistic
target	EUR.QC	Informs PRSice that the input genotype files should have a prefix of EUR.QC
binary-target	F	Indicate if the phenotype of interest is a binary trait. F for no, T for yes
pheno	EUR.height	Provide PRSice with the phenotype file
cov	EUR.covariate	Provide PRSice with the covariate file
base-maf	MAF:0.01	Filter out SNPs with MAF < 0.01 in the GWAS summary statistics, using information in the MAF column
base-info	INFO:0.8	Filter out SNPs with INFO < 0.8 in the GWAS summary statistics, using information in the INFO column
stat	OR	Column name of the column containing the effect size
or	-	Inform PRSice that the effect size is an Odd Ratio
out	EUR	Informs PRSice that all output should have a prefix of EUR

Using LDpred-2

LDpred-2 is an R package that uses a Bayesian approach to polygenic risk scoring. The tutorial along with LDpred-2 is also worthing trying. Based on their suggestions, “In practice, until we find a better set of variants, we recommend using the HapMap3 variants used in the PRS-CS and LDpred2 papers. If you do not have enough data to use as LD reference (e.g. at least 2000 individuals)”, you could use one a good LD references prepared by them. This part is too complicated for my current PRS applications, so… just ignore it for now.

Calculate PRS using lassosum

lassosum is an R package that uses penalised regression (LASSO) in its approach to PRS calculation.

#install.packages(c("devtools","RcppArmadillo", "data.table", "Matrix"), dependencies=TRUE)
#library(devtools)
#install_github("tshmak/lassosum")
library(lassosum)
# Prefer to work with data.table as it speeds up file reading
library(data.table)
library(methods)
library(magrittr)
# For multi-threading, you can use the parallel package and 
# invoke cl which is then passed to lassosum.pipeline
library(parallel)
# This will invoke 2 threads. 
cl <- makeCluster(2)

sum.stat <- "Height.QC.gz"
bfile <- "EUR.QC"
# Read in and process the covariates
covariate <- fread("EUR.cov")
pcs <- fread("EUR.eigenvec") %>%
    setnames(., colnames(.), c("FID","IID", paste0("PC",1:6)))
# Need as.data.frame here as lassosum doesn't handle data.table 
# covariates very well
cov <- merge(covariate, pcs)

# We will need the EUR.hg19 file provided by lassosum 
# which are LD regions defined in Berisa and Pickrell (2015) for the European population and the hg19 genome.
ld.file <- "EUR.hg19"
# output prefix
prefix <- "EUR"
# Read in the target phenotype file
target.pheno <- fread("EUR.height")[,c("FID", "IID", "Height")]
# Read in the summary statistics
ss <- fread(sum.stat)
# Remove P-value = 0, which causes problem in the transformation
ss <- ss[!P == 0]
# Transform the P-values into correlation
cor <- p2cor(p = ss$P,
        n = ss$N,
        sign = log(ss$OR)
        )
fam <- fread(paste0(bfile, ".fam"))
fam[,ID:=do.call(paste, c(.SD, sep=":")),.SDcols=c(1:2)]


# Run the lassosum pipeline
# The cluster parameter is used for multi-threading
# You can ignore that if you do not wish to perform multi-threaded processing
out <- lassosum.pipeline(
    cor = cor,
    chr = ss$CHR,
    pos = ss$BP,
    A1 = ss$A1,
    A2 = ss$A2,
    ref.bfile = bfile,
    test.bfile = bfile,
    LDblocks = ld.file, 
    cluster=cl
)
# Store the R2 results
target.res <- validate(out, pheno = as.data.frame(target.pheno), covar=as.data.frame(cov))
# Get the maximum R2
r2 <- max(target.res$validation.table$value)^2

PRS score (3)

using PRSice-2, lassosum

Required Files

Running PRS analysis

Using LDpred-2

Calculate PRS using lassosum

CATALOG

FEATURED TAGS

FRIENDS