
HUMAnN

Last published: 2024-09-26 13:32:49

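This article addresses a central question in the study of metagenomic data: "What are they doing?" The natural answer is to analyze the functions of the microbial community from meta'omic data, which includes the following approaches: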

  • Use blastp to search all reads against a protein database (nucleotide reads require a translated search such as blastx)

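HUMAnN uses a tiered search to map metagenomic reads to gene sequences from known and unclassified organisms. These mappings are weighted by alignment quality and sequence length to estimate the abundance of gene families, both per species and for the whole community. Gene family abundances can then be regrouped into other functional systems (for example COGs, KOs, Pfam domains, and GO terms). Finally, gene families annotated as metabolic enzymes are analyzed further to reconstruct and quantify complete metabolic pathways per species and for the community (using MetaCyc by default).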
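
The weighting step can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not HUMAnN's actual implementation: each alignment is a hypothetical (read, gene family, species, identity) record, a read's single unit of weight is shared among its hits in proportion to identity, and the summed weight is divided by gene length in kilobases to give an RPK-like value. The function name, identifiers, and numbers are made up for illustration.

```python
from collections import defaultdict


def estimate_rpk(alignments, gene_length_bp):
    """Toy estimate of gene family abundance, per species and community-wide.

    alignments: list of (read_id, gene_family, species, identity) tuples.
    gene_length_bp: dict mapping gene_family -> length in base pairs.
    Assumption: a read's weight is split among its hits in proportion to
    alignment identity (a stand-in for "quality").
    """
    # Total identity per read, used to normalize each hit's weight.
    total_identity = defaultdict(float)
    for read_id, _, _, identity in alignments:
        total_identity[read_id] += identity

    # Accumulate length-normalized weights per (gene family, species).
    stratified = defaultdict(float)
    for read_id, family, species, identity in alignments:
        weight = identity / total_identity[read_id]
        kilobases = gene_length_bp[family] / 1000.0
        stratified[(family, species)] += weight / kilobases

    # The community total of a gene family is the sum over its species.
    community = defaultdict(float)
    for (family, _), value in stratified.items():
        community[family] += value
    return dict(stratified), dict(community)


# Made-up alignments: read2 hits two gene families equally well.
alignments = [
    ("read1", "UniRef90_X", "s__A", 100.0),
    ("read2", "UniRef90_X", "s__A", 90.0),
    ("read2", "UniRef90_Y", "s__B", 90.0),
]
lengths = {"UniRef90_X": 2000, "UniRef90_Y": 500}
stratified, community = estimate_rpk(alignments, lengths)
```

Dividing by gene length keeps long genes from appearing more abundant simply because they recruit more reads.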

HUMAnN command, database files, and result files
The HUMAnN command is as follows:

humann  --nucleotide-database /data/databases/humann/chocophlan_v31_201901 \
	--protein-database /data/databases/humann/uniref90_v201901_ec_filtered \
	--threads 10 \
	--remove-temp-output  \
	--input test.clean.gz \
	--output test   \
	--metaphlan-options="--index mpa_vJan21_CHOCOPhlAnSGB_202103     \
		--bowtie2db /data/databases/metaphlan_databases_202103"

Three databases are used:

  • chocophlan_v31_201901
  • uniref90_v201901_ec_filtered
  • metaphlan_databases_202103

Contents of the chocophlan_v31_201901 directory: [screenshot]

Contents of the uniref90_v201901_ec_filtered directory: [screenshot]

Contents of the metaphlan_databases_202103 directory: [screenshot]

The result files are:

  • test_genefamilies.tsv
  • test_pathabundance.tsv
  • test_pathcoverage.tsv


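Gene family abundances can be regrouped into the KO system (and, with the same command, into GO terms). Note that ${meta.id} is a template variable for the sample ID, as used in a Nextflow process; it is not valid as a plain shell variable: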

humann_regroup_table --input ${meta.id}_genefamilies.tsv -g uniref90_go -u N -p N -o ${meta.id}/go/${meta.id}.GO.txt
humann_regroup_table --input ${meta.id}_genefamilies.tsv -g uniref90_ko -u N -p N -o ${meta.id}/${meta.id}.KO.txt
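
Conceptually, regrouping sums the abundance of every gene family that maps to the same group. The Python sketch below shows that idea on made-up data. The mapping dictionary, the group IDs, and the function name are hypothetical; the real humann_regroup_table reads its mappings from HUMAnN's utility mapping files and also handles stratified rows. The sketch assumes that a family belonging to several groups contributes its full abundance to each, and its keep_ungrouped flag loosely mimics the -u option, which controls whether an UNGROUPED row is reported.

```python
from collections import defaultdict


def regroup(abundance, family_to_groups, keep_ungrouped=False):
    """Sum gene family abundances into functional groups.

    abundance: dict gene_family -> abundance (e.g. RPK).
    family_to_groups: dict gene_family -> list of group IDs (hypothetical).
    Assumption: a family in several groups adds its full abundance to each.
    """
    grouped = defaultdict(float)
    for family, value in abundance.items():
        groups = family_to_groups.get(family, [])
        if not groups:
            # Families without a mapping are dropped unless requested.
            if keep_ungrouped:
                grouped["UNGROUPED"] += value
            continue
        for group in groups:
            grouped[group] += value
    return dict(grouped)


# Made-up abundances and mapping, for illustration only.
abundance = {"UniRef90_X": 10.0, "UniRef90_Y": 5.0, "UniRef90_Z": 2.5}
mapping = {"UniRef90_X": ["K00001"], "UniRef90_Y": ["K00001", "K00002"]}

ko_table = regroup(abundance, mapping)
ko_with_ungrouped = regroup(abundance, mapping, keep_ungrouped=True)
```

Because one family can feed several groups, the regrouped totals need not sum to the original gene family total.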

HUMAnN (the HMP Unified Metabolic Analysis Network) is a tool for quantifying microbial metabolic pathways and other molecular functions from metagenomic and metatranscriptomic data.

Start learning from the HUMAnN source code

git clone git@github.com:biobakery/humann.git
cd humann
pip install .

An example output directory looks like this:

.
├── genefamilies.tsv
├── humann_temp_y68c605p
├── pathabundance.tsv
├── pathcoverage.tsv
├── eggnog
│   ├── eggnog.rpkm_stratified.txt
│   ├── eggnog.rpkm.txt
│   ├── eggnog.rpkm_unstratified.txt
│   └── eggnog.txt
├── gbm
│   ├── coverage-plot.svg
│   ├── Abundance-RPKs.modules
│   └── GBM.txt
├── genefamilies
│   ├── genefamilies.rpkm_stratified.tsv
│   ├── genefamilies.rpkm.tsv
│   ├── genefamilies.rpkm_unstratified.tsv
│   ├── genefamilies_stratified.tsv
│   └── genefamilies_unstratified.tsv
├── gmm
│   ├── coverage-plot.svg
│   ├── Abundance-RPKs.modules
│   └── GMM.txt
├── go
│   ├── go.rpkm_stratified.txt
│   ├── go.rpkm.txt
│   ├── go.rpkm_unstratified.txt
│   └── go.txt
├── ko
│   ├── ko.rpkm_stratified.txt
│   ├── ko.rpkm.txt
│   ├── ko.rpkm_unstratified.txt
│   └── ko.txt
├── level4ec
│   ├── level4ec.rpkm_stratified.txt
│   ├── level4ec.rpkm.txt
│   ├── level4ec.rpkm_unstratified.txt
│   └── level4ec.txt
├── module
│   └── Module.txt
├── pathabundance
│   ├── pathabundance.rpkm_stratified.tsv
│   ├── pathabundance.rpkm.tsv
│   ├── pathabundance.rpkm_unstratified.tsv
│   ├── pathabundance_stratified.tsv
│   └── pathabundance_unstratified.tsv
├── pathway
│   └── Pathway.txt
├── pfam
│   ├── pfam.rpkm_stratified.txt
│   ├── pfam.rpkm.txt
│   ├── pfam.rpkm_unstratified.txt
│   └── pfam.txt
└── rxn
    ├── rxn.rpkm_stratified.txt
    ├── rxn.rpkm.txt
    ├── rxn.rpkm_unstratified.txt
    └── rxn.txt

pathabundance.tsv

# Pathway	DM1_1_Abundance
UNMAPPED	21221155.3616584
UNINTEGRATED	5690441.85241983
UNINTEGRATED|g__Lachnospiraceae_unclassified.s__Lachnospiraceae_bacterium_3_1	878965.418971247
UNINTEGRATED|g__Mucispirillum.s__Mucispirillum_schaedleri	682884.014131736
UNINTEGRATED|unclassified	564228.771907507
UNINTEGRATED|g__Muribaculum.s__Muribaculum_intestinale	438537.115838629

pathcoverage.tsv

# Pathway	DM1_1_Coverage
UNMAPPED	1
UNINTEGRATED	1
UNINTEGRATED|g__Acutalibacter.s__Acutalibacter_muris	1
UNINTEGRATED|g__Adlercreutzia.s__Adlercreutzia_equolifaciens	1
UNINTEGRATED|g__Akkermansia.s__Akkermansia_muciniphila	1
UNINTEGRATED|g__Asaccharobacter.s__Asaccharobacter_celatus	1

genefamilies.tsv

# Gene Family	DM1_1_Abundance-RPKs
UNMAPPED	57762775
UniRef90_A7V4G2	89245.5797634043
UniRef90_A7V4G2|g__Bacteroides.s__Bacteroides_uniformis	89245.5797634043
UniRef90_A0A3A9B617	39787.5705471666
UniRef90_A0A3A9B617|g__Muribaculum.s__Muribaculum_intestinale	39787.5705471666
UniRef90_V2QIK6	34333.3333333333
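
Tables like this mix community totals with per-species rows, which are distinguished by the | separator in the feature name. The following is a minimal Python sketch for separating the two, assuming a two-column tab-separated file whose header line starts with #. The function name is hypothetical, and the sample text reuses the first rows of the genefamilies.tsv excerpt.

```python
import csv
import io


def split_stratified(handle):
    """Split a HUMAnN-style table into community totals and per-species rows."""
    totals, by_species = {}, {}
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header line
        feature, value = row[0], float(row[1])
        if "|" in feature:
            # Stratified row: "<feature>|<taxon>".
            name, species = feature.split("|", 1)
            by_species[(name, species)] = value
        else:
            totals[feature] = value
    return totals, by_species


sample = (
    "# Gene Family\tDM1_1_Abundance-RPKs\n"
    "UNMAPPED\t57762775\n"
    "UniRef90_A7V4G2\t89245.5797634043\n"
    "UniRef90_A7V4G2|g__Bacteroides.s__Bacteroides_uniformis\t89245.5797634043\n"
)
totals, by_species = split_stratified(io.StringIO(sample))
```

For real files, the humann_split_stratified_table script shipped with HUMAnN performs this kind of split.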