SCENIC-R安装及运行
SCENIC 的实现包括**R版本**和**Python版本**,本文参考SCENIC: Introduction and setup给出SCENIC R版本的安装教程及运行
docker run --rm -it\
-e USERID=$(id -u) -e GROUPID=$(id -g) \
-e DISABLE_AUTH=true \
-p 8787:8787 \
-w $PWD \
-v $PWD:/home/rstudio \
-v $PWD:$PWD \
--entrypoint /init \
registry.cn-hangzhou.aliyuncs.com/wybioinfo/scenic:1.3.1
先决条件
支持物种
SCENIC数据库支持的物种包括:human, mouse and fly。如果要将SCENIC应用于其它物种,需要手动构建**RcisTarget databases**或者使用diferent motif-enrichment-analysis tool
。
- human:
表达矩阵
SCENIC的输入数据是一个singlecell-RNA-seq的表达矩阵:
- 每列对应一个样本(细胞),每行对应一个基因
- 基因的ID是
gene-symbol
gene-summarized counts
是推荐的输入数据
However, note that some authors recommend avoiding within sample normalization (i.e. TPM) for co-expression analysis (first step of SCENIC) because they may induce artificial co-variation (Crow et al. (2016)).
软件安装
SCENIC的R版本实现是基于三个R包:
- GENIE3: to infer the co-expression network(faster alternative: GRNBoost2)
- RcisTarget:for the analysis of transcription factor binding motifs
- AUCell:to identify cells with active gene sets (gene-network) in scRNA-seq data(识别具有活性基因集的细胞)
因此,需要安装以上三个包和一些额外的依赖,以运行SCIENC:
Required
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::version()
BiocManager::install(c("AUCell", "RcisTarget"))
BiocManager::install(c("GENIE3")) # Optional. Can be replaced by GRNBoost
Optional
# To score the network on cells (i.e. run AUCell):
BiocManager::install(c("zoo", "mixtools", "rbokeh"))
# For various visualizations and perform t-SNEs:
BiocManager::install(c("DT", "NMF", "ComplexHeatmap", "R2HTML", "Rtsne"))
# To support paralell execution (not available in Windows):
BiocManager::install(c("doMC", "doRNG"))
# To export/visualize in http://scope.aertslab.org
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("aertslab/SCopeLoomR", build_vignettes = TRUE)
安装SCENIC
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("aertslab/SCENIC")
packageVersion("SCENIC")
物种特异的数据库
下载RcisTarget物种特异的数据库(motif 排名):https://resources.aertslab.org/cistarget/。默认情况下,SCENIC使用数据库对基因启动子中的motif(TSS上游高达500bp)和TSS周围20kb(+/-10kbp)进行评分。
例如小鼠的数据库:
dbFiles <- c("https://resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/mm9-500bp-upstream-7species.mc9nr.feather",
"https://resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/mm9-tss-centered-10kb-7species.mc9nr.feather")
# mc9nr: Motif collection version 9: 24k motifs
人类数据库
cd db
wget https://resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather
wget https://resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather
加载library(SCENIC)
后变量defaultDbNames
> defaultDbNames
$hgnc
500bp
"hg19-500bp-upstream-7species.mc9nr.feather"
10kb
"hg19-tss-centered-10kb-7species.mc9nr.feather"
$mgi
500bp
"mm9-500bp-upstream-7species.mc9nr.feather"
10kb
"mm9-tss-centered-10kb-7species.mc9nr.feather"
$dmel
5kbUpTx
"dm6-5kb-upstream-full-tx-11species.mc8nr.feather"
wget https://resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-500bp-upstream-7species.mc9nr.feather
wget https://resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather
在SCENIC的R版本中目前不适用使用新版本的数据库,具体的issue见feather v1 or v2 for R package。因此为了使用R版本的issue使用的旧版本的数据库
运行SCENIC
以下内容参考Running SCENIC
- 构建基因调控网络
- 基于共表达网络鉴定每一个TF潜在的targets
- 过滤共表达矩阵
- 运行GENIE3或者GRNBoost
- 将GENIE3/GRNBoost中的targets格式化为共表达模块。
- 基于DNA-motif的分析(RcisTarget: TF motif analysis)选择潜在的直接结合的targets(regulons)
- 基于共表达网络鉴定每一个TF潜在的targets
- 鉴定细胞的状态和其调控因子
- 分析每个单独细胞中的网络活性
- 计算细胞中的regulon得分(calculate AUC)
- (可选)转换矩阵为二分类的矩阵 (binary activity matrix)
- 基于基因调控网络的活性识别稳定的细胞状态
- 分析每个单独细胞中的网络活性
加载数据
library(tidyverse)
exprMat <- read_tsv("https://gitee.com/bioinfoFungi/testData/raw/master/SCENIC/exprMat.tsv") |>
column_to_rownames("gene") |> as.matrix()
cellInfo <- read_tsv("https://gitee.com/bioinfoFungi/testData/raw/master/SCENIC/cellInfo.tsv") |>
column_to_rownames("sample")
dir.create("int")
saveRDS(cellInfo, file="int/cellInfo.Rds")
colVars <- list(CellType=c("microglia"="forestgreen",
"endothelial-mural"="darkorange",
"astrocytes_ependymal"="magenta4",
"oligodendrocytes"="hotpink",
"interneurons"="red3",
"pyramidal CA1"="skyblue",
"pyramidal SS"="darkblue"))
colVars$CellType <- colVars$CellType[intersect(names(colVars$CellType), cellInfo$CellType)]
saveRDS(colVars, file="int/colVars.Rds")
Initialize SCENIC settings
library(SCENIC)
org <- "mgi"
dbDir <- "db"
myDatasetTitle <- "SCENIC example on Mouse brain"
dbs <- defaultDbNames[[org]]
scenicOptions <- initializeScenic(org=org, dbDir=dbDir, dbs=dbs, datasetTitle=myDatasetTitle, nCores=10,dbIndexCol='motifs')
saveRDS(scenicOptions, file="int/scenicOptions.Rds")
遇到问题
> scenicOptions <- initializeScenic(org=org, dbDir=dbDir, dbs=dbs, datasetTitle=myDatasetTitle, nCores=10,dbIndexCol='motifs')
Motif databases selected:
hg19-500bp-upstream-7species.mc9nr.feather
hg19-tss-centered-10kb-7species.mc9nr.feather
[1] "The index column 'motifs' is not available in the file."
[1] "The index column 'motifs' is not available in the file."
Error in eval(as.name(motifAnnotName)) :
object 'motifAnnotations_hgnc' not found
In addition: Warning message:
In initializeScenic(org = org, dbDir = dbDir, dbs = dbs, datasetTitle = myDatasetTitle, :
It was not possible to load the following databses; check whether they are downloaded correctly:
hg19-500bp-upstream-7species.mc9nr.feather
hg19-tss-centered-10kb-7species.mc9nr.feather
data(list="motifAnnotations_hgnc_v9", package="RcisTarget") motifAnnotations_hgnc <- motifAnnotations_hgnc_v9
fromgithub
Co-expression network
genesKept <- geneFiltering(exprMat, scenicOptions=scenicOptions,
minCountsPerGene=3*.01*ncol(exprMat),
minSamples=ncol(exprMat)*.01)
exprMat_filtered <- exprMat[genesKept, ]
#### Correlation
runCorrelation(exprMat_filtered, scenicOptions)
#### GENIE3
exprMat_filtered <- log2(exprMat_filtered+1)
runGenie3(exprMat_filtered, scenicOptions)
使用reticulate
从R语言调用Python中的grnboost2
library(reticulate)
use_python("/home/wangyang/anaconda3/bin/python")
use_condaenv("base")
arboreto.algo <- import("arboreto.algo")
adjacencies <- arboreto.algo$grnboost2(as.data.frame(t(as.matrix(exprMat_filtered))),
tf_names = tf_name,
seed = 2022L)
colnames(adjacencies) <- c("TF","Target","weight")
saveRDS(adjacencies,file=getIntName(scenicOptions,"genie3ll"))
Binarize the network activity
scenicOptions <- runSCENIC_1_coexNetwork2modules(scenicOptions)
scenicOptions <- runSCENIC_2_createRegulons(scenicOptions, coexMethod=c("top5perTarget"))
scenicOptions <- runSCENIC_3_scoreCells(scenicOptions, exprMat_filtered)
aucellApp <- plotTsne_AUCellApp(scenicOptions, exprMat_filtered)
savedSelections <- shiny::runApp(aucellApp)
scenicOptions <- runSCENIC_4_aucell_binarize(scenicOptions)
Exploring/interpreting the results
Projection the AUC and TF expression onto t-SNEs
tSNE_scenic <- readRDS(tsneFileName(scenicOptions))
aucell_regulonAUC <- loadInt(scenicOptions, "aucell_regulonAUC")
# Show TF expression:
png(file="a.png")
par(mfrow=c(2,3))
AUCell::AUCell_plotTSNE(tSNE_scenic$Y, exprMat, aucell_regulonAUC[onlyNonDuplicatedExtended(rownames(aucell_regulonAUC))[c("Dlx5", "Sox10", "Sox9","Irf1", "Stat6")],], plots="Expression")
dev.off()
# Save AUC
png(file="a.png")
par(mfrow=c(4,6))
AUCell::AUCell_plotTSNE(tSNE_scenic$Y, cellsAUC=aucell_regulonAUC, plots="AUC")
dev.off()
regulonAUC <- loadInt(scenicOptions, "aucell_regulonAUC")
regulonAUC <- regulonAUC[onlyNonDuplicatedExtended(rownames(regulonAUC)),]
regulonActivity_byCellType <- sapply(split(rownames(cellInfo), cellInfo$CellType),
function(cells) rowMeans(getAUC(regulonAUC)[,cells]))
regulonActivity_byCellType_Scaled <- t(scale(t(regulonActivity_byCellType), center = T, scale=T))
png(file="a.png")
ComplexHeatmap::Heatmap(regulonActivity_byCellType_Scaled, name="Regulon activity")
dev.off()
topRegulators <- reshape2::melt(regulonActivity_byCellType_Scaled)
colnames(topRegulators) <- c("Regulon", "CellType", "RelativeActivity")
topRegulators <- topRegulators[which(topRegulators$RelativeActivity>0),]
head(topRegulators)
Regulon CellType RelativeActivity
5 Sox9 (189g) astrocytes_ependymal 1.7534026
8 Tef (405g) interneurons 0.7679025
9 Dlx5 (35g) interneurons 1.7294068
17 Irf1 (138g) microglia 1.7558155
18 Stat6 (101g) microglia 1.7880378
27 Sox8 (97g) oligodendrocytes 1.7634632
minPerc <- .7
binaryRegulonActivity <- loadInt(scenicOptions, "aucell_binary_nonDupl")
cellInfo_binarizedCells <- cellInfo[which(rownames(cellInfo)%in% colnames(binaryRegulonActivity)),, drop=FALSE]
regulonActivity_byCellType_Binarized <- sapply(split(rownames(cellInfo_binarizedCells), cellInfo_binarizedCells$CellType),
function(cells) {
rowMeans(binaryRegulonActivity[,cells, drop=FALSE])
})
binaryActPerc_subset <- regulonActivity_byCellType_Binarized[which(rowSums(regulonActivity_byCellType_Binarized>minPerc)>0),]
png(file="a.png")
ComplexHeatmap::Heatmap(binaryActPerc_subset, name="Regulon activity (%)", col = c("white","pink","red"))
dev.off()
topRegulators <- reshape2::melt(regulonActivity_byCellType_Binarized)
colnames(topRegulators) <- c("Regulon", "CellType", "RelativeActivity")
topRegulators <- topRegulators[which(topRegulators$RelativeActivity>minPerc),]
head(topRegulators)
Regulon CellType RelativeActivity
5 Sox9 (189g) astrocytes_ependymal 0.9655172
8 Dlx5 (35g) interneurons 0.7906977
14 Tef (405g) interneurons 0.9767442
16 Irf1 (138g) microglia 1.0000000
20 Stat6 (101g) microglia 1.0000000
24 Sox10 (108g) oligodendrocytes 0.9090909
rss <- calcRSS(AUC=getAUC(regulonAUC), cellAnnotation=cellInfo[colnames(regulonAUC), "CellType"])
png(file="a.png")
plotRSS_oneSet(rss, setName = "interneurons")
dev.off()