Pathview: pathway based data integration and visualization

Last published: 2022-11-14 17:12:47

pathview^[1] is a tool set for pathway-based data integration and visualization.
First, install the pathview R package.

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("pathview")

Quick start with built-in data

library(pathview)
# Load the built-in expression data set gse16873.d
data(gse16873.d)
# Map the first sample onto KEGG pathway 04110 for human (hsa)
pv.out <- pathview(gene.data = gse16873.d[, 1], pathway.id = "04110",
                   species = "hsa", gene.idtype = "entrez",
                   out.suffix = "gse16873")

gene.data: either a vector (single sample) or matrix-like data (multiple samples). A vector should be numeric with gene IDs as names, or it may be a character vector of gene IDs. A character vector is treated as discrete or count data. A matrix-like data structure has genes as rows and samples as columns, and its row names should be gene IDs. Here gene ID is a generic concept, covering multiple types of gene, transcript and protein IDs that are uniquely mappable to KEGG gene IDs. KEGG ortholog IDs are also treated as gene IDs so that metagenomic data can be handled. Check the details of the function documentation for mappable ID types.
pathway.id: character vector, the KEGG pathway ID(s), usually 5 digits; may also include the 3-letter KEGG species code.
species: character, either the KEGG code, the scientific name or the common name of the target species. This applies to both the pathway and gene.data or cpd.data. When a KEGG ortholog pathway is considered, use species="ko". The default is species="hsa", which is equivalent to "Homo sapiens" (scientific name) or "human" (common name).
gene.idtype: character, the ID type used for gene.data, case insensitive. The default is gene.idtype="entrez", i.e. Entrez Gene, which is the primary KEGG gene ID type for many common model organisms. For other species, gene.idtype should be set to "KEGG", as KEGG uses other types of gene IDs. For the common model organisms (to check the list, do: data(bods); bods), you may also specify other types of valid IDs. To check the ID list, do: data(gene.idtype.list); gene.idtype.list.
out.suffix: character, the suffix to be added after the pathway name as part of the output graph file name. Sample names or column names of gene.data or cpd.data are also added when there are multiple samples. The default is out.suffix="pathview".
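As a rough illustration of the gene.data shapes described above, the sketch below builds them in base R. The gene IDs and first-sample values are copied from the gse16873.d excerpt shown in this post; the second sample's values and the column names sample1 and sample2 are made up for illustration.

```r
# Single sample: numeric vector whose names are Entrez Gene IDs
gene_vec <- c("10000" = -0.30764480, "10001" = 0.41586805, "10002" = 0.19854925)

# Multiple samples: genes as rows, samples as columns, row names are gene IDs
# (the sample2 values and both column names are hypothetical)
gene_mat <- cbind(sample1 = gene_vec,
                  sample2 = c(0.12, -0.55, 0.03))

# Discrete data: a character vector of gene IDs without numeric values
gene_ids <- names(gene_vec)

str(gene_vec)
dim(gene_mat)
```

Any of these objects could then be passed as gene.data to pathview().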

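In the output pathway graph, red marks genes that are up-regulated relative to the control group and green marks genes that are down-regulated; the deeper the color, the larger the fold change.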
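The coloring idea can be sketched in base R as follows. This is not pathview's internal code: the helper logfc_to_color is hypothetical, the limit of 1 and the 10 bins are chosen to resemble pathview's documented defaults, and the binning rule is an assumption, so the exact hex codes may differ from the mol.col values pathview reports.

```r
# Hypothetical helper: map log fold changes onto a green-gray-red scale
logfc_to_color <- function(x, limit = 1, bins = 10) {
  # Palette from green (down) through gray (unchanged) to red (up)
  palette <- colorRampPalette(c("green", "gray", "red"))(bins)
  # Values beyond +/- limit are clamped, so they get the most saturated color
  clamped <- pmax(pmin(x, limit), -limit)
  # Rescale [-limit, limit] to a bin index in 1..bins
  idx <- floor((clamped + limit) / (2 * limit) * bins) + 1
  idx <- pmin(idx, bins)
  palette[idx]
}

logfc_to_color(c(-2, -0.4, 0.1, 0.9, 2))
```

Values at or beyond the limit all receive the same saturated color, which is why very large fold changes cannot be told apart in the graph.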

Next, we look up the KEGG pathway name that corresponds to KEGG ID 04110.

data(paths.hsa)
paths.hsa["hsa04110"]
# hsa04110: 'Cell cycle'

The data in gse16873.d[, 1] has the following format:

head(data.frame(gse16873.d[, 1]))

# 	gse16873.d[, 1]
# 10000	-0.30764480
# 10001	0.41586805
# 10002	0.19854925
# 10003	-0.23155297
# 100048912	-0.04490724
# 10004	-0.08756237

The first column holds the Entrez Gene IDs and the second column holds the logFC values.
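If your own results live in a data frame rather than a named vector, a conversion along these lines should produce the same layout. The data frame deg and its column names entrez_id and logFC are hypothetical; the values are copied from the excerpt above.

```r
# Hypothetical differential-expression table (column names are assumptions)
deg <- data.frame(
  entrez_id = c("10000", "10001", "10002"),
  logFC     = c(-0.30764480, 0.41586805, 0.19854925),
  stringsAsFactors = FALSE
)

# Named numeric vector: values are logFC, names are Entrez Gene IDs
gene_data <- setNames(deg$logFC, deg$entrez_id)

# Drop entries with a missing ID and keep only the first of duplicated IDs
gene_data <- gene_data[!is.na(names(gene_data)) & !duplicated(names(gene_data))]

head(gene_data)
```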

The output pv.out of the pathview function is shown below; each row represents a mapped gene/compound node.

kegg.names	labels	all.mapped	type	x	y	width	height	mol.data	mol.col
1029	CDKN2A	1029	gene	532	124	46	17	0.129198738972622	#BEBEBE
51343	FZR1	51343	gene	919	536	46	17	-0.404325630326951	#5FDF5F
4171	MCM2	4171,4172,4173,4174,4175,4176	gene	553	556	46	17	-0.420218063479512	#5FDF5F
4998	ORC1	4998,4999,5000,5001,23594,23595	gene	494	556	46	17	0.986487281754076	#FF0000
996	CDC27	996,8697,8881,10393,25847,25906,29882,51433	gene	919	297	46	17	0.936301774095574	#FF0000
996	CDC27	996,8697,8881,10393,25847,25906,29882,51433	gene	919	519	46	17	0.936301774095574	#FF0000

kegg.names: standard KEGG IDs/names for mapped nodes, i.e. Entrez Gene IDs or KEGG Compound accessions.
labels: node labels to be used when needed.
all.mapped: all molecule (gene or compound) IDs mapped to this node.
type: node type; currently 4 types are supported: "gene", "enzyme", "compound" and "ortholog".
x: x coordinate in the original KEGG pathway graph.
y: y coordinate in the original KEGG pathway graph.
width: node width in the original KEGG pathway graph.
height: node height in the original KEGG pathway graph.
other columns: columns of the mapped gene/compound data and the corresponding pseudo-color codes for individual samples.
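To show how these columns can be used, the sketch below works on a hand-copied subset of the table above instead of a live pathview result. The cutoff of 0.5 is arbitrary. With a real result, the same steps would presumably start from the gene table inside pv.out (named plot.data.gene in the pathview vignette).

```r
# Hand-copied subset of the table above (not a live pathview result)
plot_data <- data.frame(
  kegg.names = c("1029", "51343", "4171", "4998", "996", "996"),
  labels     = c("CDKN2A", "FZR1", "MCM2", "ORC1", "CDC27", "CDC27"),
  all.mapped = c("1029", "51343", "4171,4172,4173,4174,4175,4176",
                 "4998,4999,5000,5001,23594,23595",
                 rep("996,8697,8881,10393,25847,25906,29882,51433", 2)),
  mol.data   = c(0.129198738972622, -0.404325630326951, -0.420218063479512,
                 0.986487281754076, 0.936301774095574, 0.936301774095574),
  stringsAsFactors = FALSE
)

# The same node can be drawn at several positions; keep one row per label
nodes <- plot_data[!duplicated(plot_data$labels), ]

# Labels of nodes whose mapped value exceeds the arbitrary cutoff of 0.5
up_nodes <- nodes$labels[nodes$mol.data > 0.5]

# Number of molecule IDs mapped to each node
n_mapped <- lengths(strsplit(nodes$all.mapped, ","))
names(n_mapped) <- nodes$labels

up_nodes
n_mapped
```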

Compound and gene data

In the examples above, we viewed gene data with canonical signaling pathways. We frequently want to look at metabolic pathways too. Besides gene nodes, these pathways also have compound nodes. Therefore, we may integrate or visualize both gene data and compound data with metabolic pathways. Here gene data is a broad concept including genes, transcripts, proteins, enzymes and their expression, modifications and any measurable attributes. The same holds for compound data, which includes metabolites, drugs, and their measurements and attributes.^[1]

Here we use the breast cancer microarray dataset as gene data. We then generate simulated compound or metabolomic data, and load proper compound ID types (with sufficient number of unique entries) for demonstration.
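A minimal sketch of these steps, modeled on the pathview vignette^[1], might look like the following. It assumes the pathview package is installed and that KEGG is reachable over the network; the choice of pathway 00640 and the output suffix are illustrative, and the code is skipped entirely when pathview is not available.

```r
# Illustrative metabolic pathway ID (an assumption, not fixed by the text)
pathway_id <- "00640"

# Sketch only: needs the pathview package and network access to KEGG
if (requireNamespace("pathview", quietly = TRUE)) {
  library(pathview)
  data(gse16873.d)

  # Simulated compound (metabolomic) data
  sim.cpd.data <- sim.mol.data(mol.type = "cpd", nmol = 3000)

  # Compound ID types with a sufficient number of unique entries
  data(cpd.simtypes)

  # Integrate gene data and compound data on one metabolic pathway
  pv.out <- pathview(gene.data = gse16873.d[, 1], cpd.data = sim.cpd.data,
                     pathway.id = pathway_id, species = "hsa",
                     out.suffix = "gse16873.cpd")
}
```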

# data(gene.idtype.list)
# gene.idtype.list
# data(bods)
# bods

[1] http://www.bioconductor.org/packages/release/bioc/vignettes/pathview/inst/doc/pathview.pdf

Author: admin