SCENIC 的实现包括R版本和Python版本,本文参考SCENIC: Introduction and setup给出SCENIC R版本的安装教程及运行
淘宝商品: 转录组自动化分析流程
docker run --rm -it\ -e USERID=$(id -u) -e GROUPID=$(id -g) \ -e DISABLE_AUTH=true \ -p 8787:8787 \ -w $PWD \ -v $PWD:/home/rstudio \ -v $PWD:$PWD \ --entrypoint /init \ registry.cn-hangzhou.aliyuncs.com/wybioinfo/scenic:1.3.1
SCENIC数据库支持的物种包括:human, mouse and fly。如果要将SCENIC应用于其它物种,需要手动构建**RcisTarget databases**或者使用diferent motif-enrichment-analysis tool。
diferent motif-enrichment-analysis tool
SCENIC的输入数据是一个singlecell-RNA-seq的表达矩阵:
gene-symbol
gene-summarized counts
However, note that some authors recommend avoiding within sample normalization (i.e. TPM) for co-expression analysis (first step of SCENIC) because they may induce artificial co-variation (Crow et al. (2016)).
SCENIC的R版本实现是基于三个R包:
因此,需要安装以上三个包和一些额外的依赖,以运行SCIENC:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::version() BiocManager::install(c("AUCell", "RcisTarget")) BiocManager::install(c("GENIE3")) # Optional. Can be replaced by GRNBoost
# To score the network on cells (i.e. run AUCell): BiocManager::install(c("zoo", "mixtools", "rbokeh")) # For various visualizations and perform t-SNEs: BiocManager::install(c("DT", "NMF", "ComplexHeatmap", "R2HTML", "Rtsne")) # To support paralell execution (not available in Windows): BiocManager::install(c("doMC", "doRNG")) # To export/visualize in http://scope.aertslab.org if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools") devtools::install_github("aertslab/SCopeLoomR", build_vignettes = TRUE)
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools") devtools::install_github("aertslab/SCENIC") packageVersion("SCENIC")
下载RcisTarget物种特异的数据库(motif 排名):https://resources.aertslab.org/cistarget/。默认情况下,SCENIC使用数据库对基因启动子中的motif(TSS上游高达500bp)和TSS周围20kb(+/-10kbp)进行评分。
例如小鼠的数据库:
dbFiles <- c("https://resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/mm9-500bp-upstream-7species.mc9nr.feather", "https://resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/mm9-tss-centered-10kb-7species.mc9nr.feather") # mc9nr: Motif collection version 9: 24k motifs
人类数据库
cd db wget https://resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather wget https://resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather
加载library(SCENIC)后变量defaultDbNames
library(SCENIC)
> defaultDbNames $hgnc 500bp "hg19-500bp-upstream-7species.mc9nr.feather" 10kb "hg19-tss-centered-10kb-7species.mc9nr.feather" $mgi 500bp "mm9-500bp-upstream-7species.mc9nr.feather" 10kb "mm9-tss-centered-10kb-7species.mc9nr.feather" $dmel 5kbUpTx "dm6-5kb-upstream-full-tx-11species.mc8nr.feather"
wget https://resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-500bp-upstream-7species.mc9nr.feather wget https://resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather
在SCENIC的R版本中目前不适用使用新版本的数据库,具体的issue见feather v1 or v2 for R package。因此为了使用R版本的issue使用的旧版本的数据库
以下内容参考Running SCENIC
library(tidyverse) exprMat <- read_tsv("https://gitee.com/bioinfoFungi/testData/raw/master/SCENIC/exprMat.tsv") |> column_to_rownames("gene") |> as.matrix() cellInfo <- read_tsv("https://gitee.com/bioinfoFungi/testData/raw/master/SCENIC/cellInfo.tsv") |> column_to_rownames("sample")
dir.create("int") saveRDS(cellInfo, file="int/cellInfo.Rds") colVars <- list(CellType=c("microglia"="forestgreen", "endothelial-mural"="darkorange", "astrocytes_ependymal"="magenta4", "oligodendrocytes"="hotpink", "interneurons"="red3", "pyramidal CA1"="skyblue", "pyramidal SS"="darkblue")) colVars$CellType <- colVars$CellType[intersect(names(colVars$CellType), cellInfo$CellType)] saveRDS(colVars, file="int/colVars.Rds")
library(SCENIC) org <- "mgi" dbDir <- "db" myDatasetTitle <- "SCENIC example on Mouse brain" dbs <- defaultDbNames[[org]] scenicOptions <- initializeScenic(org=org, dbDir=dbDir, dbs=dbs, datasetTitle=myDatasetTitle, nCores=10,dbIndexCol='motifs') saveRDS(scenicOptions, file="int/scenicOptions.Rds")
遇到问题
> scenicOptions <- initializeScenic(org=org, dbDir=dbDir, dbs=dbs, datasetTitle=myDatasetTitle, nCores=10,dbIndexCol='motifs') Motif databases selected: hg19-500bp-upstream-7species.mc9nr.feather hg19-tss-centered-10kb-7species.mc9nr.feather [1] "The index column 'motifs' is not available in the file." [1] "The index column 'motifs' is not available in the file." Error in eval(as.name(motifAnnotName)) : object 'motifAnnotations_hgnc' not found In addition: Warning message: In initializeScenic(org = org, dbDir = dbDir, dbs = dbs, datasetTitle = myDatasetTitle, : It was not possible to load the following databses; check whether they are downloaded correctly: hg19-500bp-upstream-7species.mc9nr.feather hg19-tss-centered-10kb-7species.mc9nr.feather
data(list="motifAnnotations_hgnc_v9", package="RcisTarget") motifAnnotations_hgnc <- motifAnnotations_hgnc_v9 fromgithub
data(list="motifAnnotations_hgnc_v9", package="RcisTarget") motifAnnotations_hgnc <- motifAnnotations_hgnc_v9
genesKept <- geneFiltering(exprMat, scenicOptions=scenicOptions, minCountsPerGene=3*.01*ncol(exprMat), minSamples=ncol(exprMat)*.01) exprMat_filtered <- exprMat[genesKept, ] #### Correlation runCorrelation(exprMat_filtered, scenicOptions) #### GENIE3 exprMat_filtered <- log2(exprMat_filtered+1) runGenie3(exprMat_filtered, scenicOptions)
使用reticulate从R语言调用Python中的grnboost2
reticulate
grnboost2
library(reticulate) use_python("/home/wangyang/anaconda3/bin/python") use_condaenv("base") arboreto.algo <- import("arboreto.algo") adjacencies <- arboreto.algo$grnboost2(as.data.frame(t(as.matrix(exprMat_filtered))), tf_names = tf_name, seed = 2022L) colnames(adjacencies) <- c("TF","Target","weight") saveRDS(adjacencies,file=getIntName(scenicOptions,"genie3ll"))
scenicOptions <- runSCENIC_1_coexNetwork2modules(scenicOptions) scenicOptions <- runSCENIC_2_createRegulons(scenicOptions, coexMethod=c("top5perTarget")) scenicOptions <- runSCENIC_3_scoreCells(scenicOptions, exprMat_filtered) aucellApp <- plotTsne_AUCellApp(scenicOptions, exprMat_filtered) savedSelections <- shiny::runApp(aucellApp) scenicOptions <- runSCENIC_4_aucell_binarize(scenicOptions)
Projection the AUC and TF expression onto t-SNEs
tSNE_scenic <- readRDS(tsneFileName(scenicOptions)) aucell_regulonAUC <- loadInt(scenicOptions, "aucell_regulonAUC") # Show TF expression: png(file="a.png") par(mfrow=c(2,3)) AUCell::AUCell_plotTSNE(tSNE_scenic$Y, exprMat, aucell_regulonAUC[onlyNonDuplicatedExtended(rownames(aucell_regulonAUC))[c("Dlx5", "Sox10", "Sox9","Irf1", "Stat6")],], plots="Expression") dev.off()
# Save AUC png(file="a.png") par(mfrow=c(4,6)) AUCell::AUCell_plotTSNE(tSNE_scenic$Y, cellsAUC=aucell_regulonAUC, plots="AUC") dev.off()
regulonAUC <- loadInt(scenicOptions, "aucell_regulonAUC") regulonAUC <- regulonAUC[onlyNonDuplicatedExtended(rownames(regulonAUC)),] regulonActivity_byCellType <- sapply(split(rownames(cellInfo), cellInfo$CellType), function(cells) rowMeans(getAUC(regulonAUC)[,cells])) regulonActivity_byCellType_Scaled <- t(scale(t(regulonActivity_byCellType), center = T, scale=T)) png(file="a.png") ComplexHeatmap::Heatmap(regulonActivity_byCellType_Scaled, name="Regulon activity") dev.off()
topRegulators <- reshape2::melt(regulonActivity_byCellType_Scaled) colnames(topRegulators) <- c("Regulon", "CellType", "RelativeActivity") topRegulators <- topRegulators[which(topRegulators$RelativeActivity>0),] head(topRegulators) Regulon CellType RelativeActivity 5 Sox9 (189g) astrocytes_ependymal 1.7534026 8 Tef (405g) interneurons 0.7679025 9 Dlx5 (35g) interneurons 1.7294068 17 Irf1 (138g) microglia 1.7558155 18 Stat6 (101g) microglia 1.7880378 27 Sox8 (97g) oligodendrocytes 1.7634632
minPerc <- .7 binaryRegulonActivity <- loadInt(scenicOptions, "aucell_binary_nonDupl") cellInfo_binarizedCells <- cellInfo[which(rownames(cellInfo)%in% colnames(binaryRegulonActivity)),, drop=FALSE] regulonActivity_byCellType_Binarized <- sapply(split(rownames(cellInfo_binarizedCells), cellInfo_binarizedCells$CellType), function(cells) { rowMeans(binaryRegulonActivity[,cells, drop=FALSE]) }) binaryActPerc_subset <- regulonActivity_byCellType_Binarized[which(rowSums(regulonActivity_byCellType_Binarized>minPerc)>0),] png(file="a.png") ComplexHeatmap::Heatmap(binaryActPerc_subset, name="Regulon activity (%)", col = c("white","pink","red")) dev.off()
topRegulators <- reshape2::melt(regulonActivity_byCellType_Binarized) colnames(topRegulators) <- c("Regulon", "CellType", "RelativeActivity") topRegulators <- topRegulators[which(topRegulators$RelativeActivity>minPerc),] head(topRegulators) Regulon CellType RelativeActivity 5 Sox9 (189g) astrocytes_ependymal 0.9655172 8 Dlx5 (35g) interneurons 0.7906977 14 Tef (405g) interneurons 0.9767442 16 Irf1 (138g) microglia 1.0000000 20 Stat6 (101g) microglia 1.0000000 24 Sox10 (108g) oligodendrocytes 0.9090909
rss <- calcRSS(AUC=getAUC(regulonAUC), cellAnnotation=cellInfo[colnames(regulonAUC), "CellType"]) png(file="a.png") plotRSS_oneSet(rss, setName = "interneurons") dev.off()