展开

SCENIC-R安装及运行

最后发布时间 : 2024-11-13 19:48:51 浏览量 :

SCENIC 的实现包括**R版本****Python版本**,本文参考SCENIC: Introduction and setup给出SCENIC R版本的安装教程及运行

淘宝商品: 转录组自动化分析流程

淘宝商品: 转录组自动化分析流程

淘宝商品: 转录组自动化分析流程

docker run --rm -it\
	-e USERID=$(id -u) -e GROUPID=$(id -g) \
	-e DISABLE_AUTH=true \
	 -p 8787:8787  \
	-w $PWD \
	-v $PWD:/home/rstudio  \
	-v $PWD:$PWD \
	--entrypoint /init  \
   registry.cn-hangzhou.aliyuncs.com/wybioinfo/scenic:1.3.1

先决条件

支持物种

SCENIC数据库支持的物种包括:human, mouse and fly。如果要将SCENIC应用于其它物种,需要手动构建**RcisTarget databases**或者使用diferent motif-enrichment-analysis tool

表达矩阵

SCENIC的输入数据是一个singlecell-RNA-seq的表达矩阵:

  • 每列对应一个样本(细胞),每行对应一个基因
  • 基因的ID是gene-symbol
  • gene-summarized counts是推荐的输入数据
TPM

However, note that some authors recommend avoiding within sample normalization (i.e. TPM) for co-expression analysis (first step of SCENIC) because they may induce artificial co-variation (Crow et al. (2016)).

软件安装

SCENIC的R版本实现是基于三个R包:

  • GENIE3: to infer the co-expression network(faster alternative: GRNBoost2)
  • RcisTarget:for the analysis of transcription factor binding motifs
  • AUCell:to identify cells with active gene sets (gene-network) in scRNA-seq data(识别具有活性基因集的细胞)

因此,需要安装以上三个包和一些额外的依赖,以运行SCIENC:

Required

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::version()

BiocManager::install(c("AUCell", "RcisTarget"))
BiocManager::install(c("GENIE3")) # Optional. Can be replaced by GRNBoost

Optional

# To score the network on cells (i.e. run AUCell):
BiocManager::install(c("zoo", "mixtools", "rbokeh"))
# For various visualizations and perform t-SNEs:
BiocManager::install(c("DT", "NMF", "ComplexHeatmap", "R2HTML", "Rtsne"))
# To support paralell execution (not available in Windows):
BiocManager::install(c("doMC", "doRNG"))
# To export/visualize in http://scope.aertslab.org
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("aertslab/SCopeLoomR", build_vignettes = TRUE)

安装SCENIC

if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("aertslab/SCENIC") 
packageVersion("SCENIC")

物种特异的数据库

下载RcisTarget物种特异的数据库(motif 排名):https://resources.aertslab.org/cistarget/。默认情况下,SCENIC使用数据库对基因启动子中的motif(TSS上游高达500bp)和TSS周围20kb(+/-10kbp)进行评分。

例如小鼠的数据库:

dbFiles <- c("https://resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/mm9-500bp-upstream-7species.mc9nr.feather",
"https://resources.aertslab.org/cistarget/databases/old/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/mm9-tss-centered-10kb-7species.mc9nr.feather")
# mc9nr: Motif collection version 9: 24k motifs

人类数据库

cd db
wget https://resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather
wget https://resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather

加载library(SCENIC)后变量defaultDbNames

> defaultDbNames
$hgnc
                                          500bp 
   "hg19-500bp-upstream-7species.mc9nr.feather" 
                                           10kb 
"hg19-tss-centered-10kb-7species.mc9nr.feather" 

$mgi
                                         500bp 
   "mm9-500bp-upstream-7species.mc9nr.feather" 
                                          10kb 
"mm9-tss-centered-10kb-7species.mc9nr.feather" 

$dmel
                                           5kbUpTx 
"dm6-5kb-upstream-full-tx-11species.mc8nr.feather" 
wget https://resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-500bp-upstream-7species.mc9nr.feather
wget https://resources.aertslab.org/cistarget/databases/old/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather

在SCENIC的R版本中目前不适用使用新版本的数据库,具体的issue见feather v1 or v2 for R package。因此为了使用R版本的issue使用的旧版本的数据库

生信小木屋

运行SCENIC

以下内容参考Running SCENIC

  • 构建基因调控网络
    • 基于共表达网络鉴定每一个TF潜在的targets
      • 过滤共表达矩阵
      • 运行GENIE3或者GRNBoost
      • 将GENIE3/GRNBoost中的targets格式化为共表达模块。
      • 基于DNA-motif的分析(RcisTarget: TF motif analysis)选择潜在的直接结合的targets(regulons)
  • 鉴定细胞的状态和其调控因子
    • 分析每个单独细胞中的网络活性
      • 计算细胞中的regulon得分(calculate AUC)
      • (可选)转换矩阵为二分类的矩阵 (binary activity matrix)
    • 基于基因调控网络的活性识别稳定的细胞状态

加载数据

library(tidyverse)
exprMat <- read_tsv("https://gitee.com/bioinfoFungi/testData/raw/master/SCENIC/exprMat.tsv") |>
    column_to_rownames("gene")  |> as.matrix()
cellInfo <- read_tsv("https://gitee.com/bioinfoFungi/testData/raw/master/SCENIC/cellInfo.tsv") |>
    column_to_rownames("sample")
dir.create("int")
saveRDS(cellInfo, file="int/cellInfo.Rds")

colVars <- list(CellType=c("microglia"="forestgreen", 
                           "endothelial-mural"="darkorange", 
                           "astrocytes_ependymal"="magenta4", 
                           "oligodendrocytes"="hotpink", 
                           "interneurons"="red3", 
                           "pyramidal CA1"="skyblue", 
                           "pyramidal SS"="darkblue"))
colVars$CellType <- colVars$CellType[intersect(names(colVars$CellType), cellInfo$CellType)]
saveRDS(colVars, file="int/colVars.Rds")

Initialize SCENIC settings

library(SCENIC)
org <- "mgi" 
dbDir <- "db"
myDatasetTitle <- "SCENIC example on Mouse brain"

dbs <- defaultDbNames[[org]]
scenicOptions <- initializeScenic(org=org, dbDir=dbDir, dbs=dbs, datasetTitle=myDatasetTitle, nCores=10,dbIndexCol='motifs') 
saveRDS(scenicOptions, file="int/scenicOptions.Rds") 

遇到问题

> scenicOptions <- initializeScenic(org=org, dbDir=dbDir, dbs=dbs, datasetTitle=myDatasetTitle, nCores=10,dbIndexCol='motifs') 
Motif databases selected: 
  hg19-500bp-upstream-7species.mc9nr.feather 
  hg19-tss-centered-10kb-7species.mc9nr.feather
[1] "The index column 'motifs' is not available in the file."
[1] "The index column 'motifs' is not available in the file."
Error in eval(as.name(motifAnnotName)) : 
  object 'motifAnnotations_hgnc' not found
In addition: Warning message:
In initializeScenic(org = org, dbDir = dbDir, dbs = dbs, datasetTitle = myDatasetTitle,  :
  It was not possible to load the following databses; check whether they are downloaded correctly: 
hg19-500bp-upstream-7species.mc9nr.feather
hg19-tss-centered-10kb-7species.mc9nr.feather

data(list="motifAnnotations_hgnc_v9", package="RcisTarget") motifAnnotations_hgnc <- motifAnnotations_hgnc_v9 fromgithub

Co-expression network

genesKept <- geneFiltering(exprMat, scenicOptions=scenicOptions,
                           minCountsPerGene=3*.01*ncol(exprMat),
                           minSamples=ncol(exprMat)*.01)

exprMat_filtered <- exprMat[genesKept, ]
#### Correlation
runCorrelation(exprMat_filtered, scenicOptions)

#### GENIE3
exprMat_filtered <- log2(exprMat_filtered+1) 
runGenie3(exprMat_filtered, scenicOptions)

使用reticulate从R语言调用Python中的grnboost2

library(reticulate)
use_python("/home/wangyang/anaconda3/bin/python")
use_condaenv("base")
arboreto.algo <- import("arboreto.algo")
adjacencies <- arboreto.algo$grnboost2(as.data.frame(t(as.matrix(exprMat_filtered))), 
                                       tf_names = tf_name,
                                       seed = 2022L)
colnames(adjacencies) <- c("TF","Target","weight")
saveRDS(adjacencies,file=getIntName(scenicOptions,"genie3ll"))

Binarize the network activity

scenicOptions <- runSCENIC_1_coexNetwork2modules(scenicOptions)
scenicOptions <- runSCENIC_2_createRegulons(scenicOptions, coexMethod=c("top5perTarget")) 
scenicOptions <- runSCENIC_3_scoreCells(scenicOptions, exprMat_filtered)

aucellApp <- plotTsne_AUCellApp(scenicOptions, exprMat_filtered)
savedSelections <- shiny::runApp(aucellApp)

scenicOptions <- runSCENIC_4_aucell_binarize(scenicOptions)

Exploring/interpreting the results

Projection the AUC and TF expression onto t-SNEs

tSNE_scenic <- readRDS(tsneFileName(scenicOptions))
aucell_regulonAUC <- loadInt(scenicOptions, "aucell_regulonAUC")

# Show TF expression:
png(file="a.png")
par(mfrow=c(2,3))
AUCell::AUCell_plotTSNE(tSNE_scenic$Y, exprMat, aucell_regulonAUC[onlyNonDuplicatedExtended(rownames(aucell_regulonAUC))[c("Dlx5", "Sox10", "Sox9","Irf1", "Stat6")],], plots="Expression")
dev.off()

生信小木屋

# Save AUC 
png(file="a.png")
par(mfrow=c(4,6))
AUCell::AUCell_plotTSNE(tSNE_scenic$Y, cellsAUC=aucell_regulonAUC, plots="AUC")
dev.off()

生信小木屋

regulonAUC <- loadInt(scenicOptions, "aucell_regulonAUC")
regulonAUC <- regulonAUC[onlyNonDuplicatedExtended(rownames(regulonAUC)),]
regulonActivity_byCellType <- sapply(split(rownames(cellInfo), cellInfo$CellType),
                                     function(cells) rowMeans(getAUC(regulonAUC)[,cells]))
regulonActivity_byCellType_Scaled <- t(scale(t(regulonActivity_byCellType), center = T, scale=T))

png(file="a.png")
ComplexHeatmap::Heatmap(regulonActivity_byCellType_Scaled, name="Regulon activity")
dev.off()

生信小木屋

topRegulators <- reshape2::melt(regulonActivity_byCellType_Scaled)
colnames(topRegulators) <- c("Regulon", "CellType", "RelativeActivity")
topRegulators <- topRegulators[which(topRegulators$RelativeActivity>0),]
head(topRegulators)
        Regulon             CellType RelativeActivity
5   Sox9 (189g) astrocytes_ependymal        1.7534026
8    Tef (405g)         interneurons        0.7679025
9    Dlx5 (35g)         interneurons        1.7294068
17  Irf1 (138g)            microglia        1.7558155
18 Stat6 (101g)            microglia        1.7880378
27   Sox8 (97g)     oligodendrocytes        1.7634632
minPerc <- .7
binaryRegulonActivity <- loadInt(scenicOptions, "aucell_binary_nonDupl")
cellInfo_binarizedCells <- cellInfo[which(rownames(cellInfo)%in% colnames(binaryRegulonActivity)),, drop=FALSE]
regulonActivity_byCellType_Binarized <- sapply(split(rownames(cellInfo_binarizedCells), cellInfo_binarizedCells$CellType),  
    function(cells) {
    rowMeans(binaryRegulonActivity[,cells, drop=FALSE])
})
binaryActPerc_subset <- regulonActivity_byCellType_Binarized[which(rowSums(regulonActivity_byCellType_Binarized>minPerc)>0),]

png(file="a.png")
ComplexHeatmap::Heatmap(binaryActPerc_subset, name="Regulon activity (%)", col = c("white","pink","red"))
dev.off()

生信小木屋

topRegulators <- reshape2::melt(regulonActivity_byCellType_Binarized)
colnames(topRegulators) <- c("Regulon", "CellType", "RelativeActivity")
topRegulators <- topRegulators[which(topRegulators$RelativeActivity>minPerc),]
head(topRegulators)
        Regulon             CellType RelativeActivity
5   Sox9 (189g) astrocytes_ependymal        0.9655172
8    Dlx5 (35g)         interneurons        0.7906977
14   Tef (405g)         interneurons        0.9767442
16  Irf1 (138g)            microglia        1.0000000
20 Stat6 (101g)            microglia        1.0000000
24 Sox10 (108g)     oligodendrocytes        0.9090909
rss <- calcRSS(AUC=getAUC(regulonAUC), cellAnnotation=cellInfo[colnames(regulonAUC), "CellType"])
png(file="a.png")
plotRSS_oneSet(rss, setName = "interneurons")
dev.off()

生信小木屋