HUMAnN
Last published: 2024-09-26 13:32:49
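This article addresses a central question in metagenomic studies: "What are they doing?" The natural starting point is to profile the functions of a microbial community from meta'omic data, for example:
- Search every read against a protein database with BLAST (for nucleotide reads, a translated search such as blastx)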
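HUMAnN uses a tiered search to map metagenomic reads to gene sequences of known or unknown taxonomic origin. These mappings are weighted by alignment quality and sequence length to estimate gene family abundance, both per species and for the community as a whole. Gene family abundances can then be regrouped into other functional systems (for example COGs, KOs, Pfam domains, and GO terms). Finally, gene families are annotated to metabolic enzymes, and these enzymes are analyzed further to reconstruct and quantify complete metabolic pathways for each species and for the community (via MetaCyc by default).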
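To make the abundance estimate concrete, here is a minimal sketch of a reads-per-kilobase (RPK) calculation, the unit that appears in the `Abundance-RPKs` column of the gene family table. It is not HUMAnN's actual implementation: the function name, the per-alignment `weight` (standing in for alignment quality), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def gene_family_rpk(alignments, gene_lengths):
    """Sum weighted hits per gene family, then divide by gene length in kb.

    alignments:   iterable of (gene_family, weight) pairs; weight is a
                  hypothetical quality score in [0, 1] for one read alignment
    gene_lengths: dict mapping gene_family -> length in nucleotides
    """
    weighted_hits = defaultdict(float)
    for family, weight in alignments:
        weighted_hits[family] += weight
    return {
        family: hits / (gene_lengths[family] / 1000.0)
        for family, hits in weighted_hits.items()
    }


# Hypothetical toy data, not real HUMAnN output.
toy_alignments = [("famA", 1.0), ("famA", 0.5), ("famB", 1.0)]
toy_lengths = {"famA": 1500, "famB": 500}
rpk = gene_family_rpk(toy_alignments, toy_lengths)
print(rpk)
```

Dividing by length keeps long genes from looking more abundant simply because they attract more reads.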
HUMAnN command, database files, and result files
The HUMAnN command is as follows:
humann --nucleotide-database /data/databases/humann/chocophlan_v31_201901 \
--protein-database /data/databases/humann/uniref90_v201901_ec_filtered \
--threads 10 \
--remove-temp-output \
--input test.clean.gz \
--output test \
--metaphlan-options="--index mpa_vJan21_CHOCOPhlAnSGB_202103 \
--bowtie2db /data/databases/metaphlan_databases_202103"
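When the same command has to be run for many samples, it can help to assemble it programmatically. The sketch below builds the argument list using only the flags from the command above; the helper name `build_humann_command` and its parameters are hypothetical, and the actual call is left commented out so the snippet runs without HUMAnN installed.

```python
import shlex


def build_humann_command(input_path, output_dir, nucleotide_db, protein_db,
                         threads=10, metaphlan_options=None):
    """Return the humann invocation as a list suitable for subprocess.run."""
    cmd = [
        "humann",
        "--nucleotide-database", nucleotide_db,
        "--protein-database", protein_db,
        "--threads", str(threads),
        "--remove-temp-output",
        "--input", input_path,
        "--output", output_dir,
    ]
    if metaphlan_options:
        # Passed as one "--flag=value" argument, as in the shell command.
        cmd.append("--metaphlan-options=" + metaphlan_options)
    return cmd


cmd = build_humann_command(
    input_path="test.clean.gz",
    output_dir="test",
    nucleotide_db="/data/databases/humann/chocophlan_v31_201901",
    protein_db="/data/databases/humann/uniref90_v201901_ec_filtered",
    metaphlan_options=(
        "--index mpa_vJan21_CHOCOPhlAnSGB_202103 "
        "--bowtie2db /data/databases/metaphlan_databases_202103"
    ),
)
print(shlex.join(cmd))
# To execute for real (requires HUMAnN and the databases):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems around the nested MetaPhlAn options.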
Three databases are used:
- chocophlan_v31_201901
- uniref90_v201901_ec_filtered
- metaphlan_databases_202103
The directory listings of chocophlan_v31_201901, uniref90_v201901_ec_filtered, and metaphlan_databases_202103 are not reproduced here.
The result files are:
- test_genefamilies.tsv
- test_pathabundance.tsv
- test_pathcoverage.tsv
Gene family abundances can be regrouped into other systems such as GO and KO:
humann_regroup_table --input ${meta.id}_genefamilies.tsv -g uniref90_go -u N -p N -o ${meta.id}/go/${meta.id}.GO.txt
humann_regroup_table --input ${meta.id}_genefamilies.tsv -g uniref90_ko -u N -p N -o ${meta.id}/${meta.id}.KO.txt
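Conceptually, regrouping sums the abundance of every gene family that maps to the same group, while keeping the per-species stratification. The following is a simplified sketch of that idea, not the code behind humann_regroup_table. The mapping and the `KO_1`/`KO_2` identifiers are made up; families without a group are dropped, which is roughly what `-u N` asks for, and features such as UNMAPPED are not carried through, in the spirit of `-p N`.

```python
from collections import defaultdict


def regroup(abundances, family_to_groups):
    """Sum gene family abundances into groups, preserving strata.

    abundances:       dict of feature -> value, where a feature is either
                      "UniRef90_X" or "UniRef90_X|taxon" (stratified row)
    family_to_groups: dict of gene family -> list of group identifiers
    """
    grouped = defaultdict(float)
    for feature, value in abundances.items():
        family, sep, taxon = feature.partition("|")
        # Families with no group (and UNMAPPED) contribute nothing.
        for group in family_to_groups.get(family, []):
            grouped[group + sep + taxon] += value
    return dict(grouped)


# Hypothetical toy table and mapping.
toy_abundances = {
    "UNMAPPED": 10.0,
    "UniRef90_A": 3.0,
    "UniRef90_A|g__X.s__Y": 3.0,
    "UniRef90_B": 2.0,
    "UniRef90_B|g__X.s__Y": 2.0,
    "UniRef90_C": 5.0,
}
toy_mapping = {"UniRef90_A": ["KO_1"], "UniRef90_B": ["KO_1", "KO_2"]}
ko_table = regroup(toy_abundances, toy_mapping)
print(ko_table)
```

A gene family that belongs to several groups contributes its full abundance to each of them, so group totals need not add up to the gene family total.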
HUMAnN (the HMP Unified Metabolic Analysis Network) quantifies microbial metabolic pathways and other molecular functions from metagenomic or metatranscriptomic data.
Starting from the HUMAnN source code
git clone git@github.com:biobakery/humann.git
cd humann
pip install .
Example layout of an output directory:
.
├── genefamilies.tsv
├── humann_temp_y68c605p
├── pathabundance.tsv
├── pathcoverage.tsv
├── eggnog
│ ├── eggnog.rpkm_stratified.txt
│ ├── eggnog.rpkm.txt
│ ├── eggnog.rpkm_unstratified.txt
│ └── eggnog.txt
├── gbm
│ ├── coverage-plot.svg
│ ├── Abundance-RPKs.modules
│ └── GBM.txt
├── genefamilies
│ ├── genefamilies.rpkm_stratified.tsv
│ ├── genefamilies.rpkm.tsv
│ ├── genefamilies.rpkm_unstratified.tsv
│ ├── genefamilies_stratified.tsv
│ └── genefamilies_unstratified.tsv
├── gmm
│ ├── coverage-plot.svg
│ ├── Abundance-RPKs.modules
│ └── GMM.txt
├── go
│ ├── go.rpkm_stratified.txt
│ ├── go.rpkm.txt
│ ├── go.rpkm_unstratified.txt
│ └── go.txt
├── ko
│ ├── ko.rpkm_stratified.txt
│ ├── ko.rpkm.txt
│ ├── ko.rpkm_unstratified.txt
│ └── ko.txt
├── level4ec
│ ├── level4ec.rpkm_stratified.txt
│ ├── level4ec.rpkm.txt
│ ├── level4ec.rpkm_unstratified.txt
│ └── level4ec.txt
├── module
│ └── Module.txt
├── pathabundance
│ ├── pathabundance.rpkm_stratified.tsv
│ ├── pathabundance.rpkm.tsv
│ ├── pathabundance.rpkm_unstratified.tsv
│ ├── pathabundance_stratified.tsv
│ └── pathabundance_unstratified.tsv
├── pathway
│ └── Pathway.txt
├── pfam
│ ├── pfam.rpkm_stratified.txt
│ ├── pfam.rpkm.txt
│ ├── pfam.rpkm_unstratified.txt
│ └── pfam.txt
└── rxn
  ├── rxn.rpkm_stratified.txt
  ├── rxn.rpkm.txt
  ├── rxn.rpkm_unstratified.txt
  └── rxn.txt
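The `_stratified` and `_unstratified` files in this tree differ only in which rows they keep: rows whose feature name contains `|` carry a per-species contribution, and the rest are community totals. HUMAnN ships a utility for this split; the sketch below only illustrates the rule on tab-separated lines, and the function name is hypothetical.

```python
def split_stratified_lines(lines):
    """Split HUMAnN-style TSV lines into (header, stratified, unstratified)."""
    header, stratified, unstratified = None, [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header = line  # the column header line starts with "#"
            continue
        feature = line.split("\t", 1)[0]
        target = stratified if "|" in feature else unstratified
        target.append(line)
    return header, stratified, unstratified


# Rows taken from the genefamilies.tsv excerpt in this article.
example_lines = [
    "# Gene Family\tDM1_1_Abundance-RPKs\n",
    "UNMAPPED\t57762775\n",
    "UniRef90_A7V4G2\t89245.5797634043\n",
    "UniRef90_A7V4G2|g__Bacteroides.s__Bacteroides_uniformis\t89245.5797634043\n",
]
header, strat_rows, unstrat_rows = split_stratified_lines(example_lines)
print(len(strat_rows), len(unstrat_rows))
```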
pathabundance.tsv
# Pathway | DM1_1_Abundance |
---|---|
UNMAPPED | 21221155.3616584 |
UNINTEGRATED | 5690441.85241983 |
UNINTEGRATED|g__Lachnospiraceae_unclassified.s__Lachnospiraceae_bacterium_3_1 | 878965.418971247 |
UNINTEGRATED|g__Mucispirillum.s__Mucispirillum_schaedleri | 682884.014131736 |
UNINTEGRATED|unclassified | 564228.771907507 |
UNINTEGRATED|g__Muribaculum.s__Muribaculum_intestinale | 438537.115838629 |
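Each stratified row encodes the pathway and the contributing taxon in a single key of the form `NAME|g__Genus.s__Species`. As a rough sketch, the key can be split apart and each stratum expressed as a share of the community total. The helper names are hypothetical, and the values are copied from the excerpt above.

```python
def parse_feature(feature):
    """Split 'NAME|g__Genus.s__Species' into (name, genus, species)."""
    name, sep, taxon = feature.partition("|")
    if not sep:
        return name, None, None  # community-level row
    if taxon.startswith("g__") and ".s__" in taxon:
        genus, species = taxon.split(".s__", 1)
        return name, genus[len("g__"):], species
    return name, None, taxon  # e.g. "unclassified"


def stratum_shares(table, name):
    """Fraction of the community total of `name` carried by each stratum."""
    total = table[name]
    shares = {}
    for feature, value in table.items():
        feat_name, _genus, species = parse_feature(feature)
        if feat_name == name and species is not None:
            shares[species] = value / total
    return shares


path_table = {
    "UNMAPPED": 21221155.3616584,
    "UNINTEGRATED": 5690441.85241983,
    "UNINTEGRATED|g__Lachnospiraceae_unclassified.s__Lachnospiraceae_bacterium_3_1": 878965.418971247,
    "UNINTEGRATED|g__Mucispirillum.s__Mucispirillum_schaedleri": 682884.014131736,
    "UNINTEGRATED|unclassified": 564228.771907507,
    "UNINTEGRATED|g__Muribaculum.s__Muribaculum_intestinale": 438537.115838629,
}
shares = stratum_shares(path_table, "UNINTEGRATED")
for species, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{species}\t{share:.3f}")
```

Only the first few strata appear in the excerpt, so these shares do not sum to 1.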
pathcoverage.tsv
# Pathway | DM1_1_Coverage |
---|---|
UNMAPPED | 1 |
UNINTEGRATED | 1 |
UNINTEGRATED|g__Acutalibacter.s__Acutalibacter_muris | 1 |
UNINTEGRATED|g__Adlercreutzia.s__Adlercreutzia_equolifaciens | 1 |
UNINTEGRATED|g__Akkermansia.s__Akkermansia_muciniphila | 1 |
UNINTEGRATED|g__Asaccharobacter.s__Asaccharobacter_celatus | 1 |
genefamilies.tsv
# Gene Family | DM1_1_Abundance-RPKs |
---|---|
UNMAPPED | 57762775 |
UniRef90_A7V4G2 | 89245.5797634043 |
UniRef90_A7V4G2|g__Bacteroides.s__Bacteroides_uniformis | 89245.5797634043 |
UniRef90_A0A3A9B617 | 39787.5705471666 |
UniRef90_A0A3A9B617|g__Muribaculum.s__Muribaculum_intestinale | 39787.5705471666 |
UniRef90_V2QIK6 | 34333.3333333333 |
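In this excerpt, each community-level gene family value equals the sum of its per-species rows (here a single species per family). Assuming that relationship is meant to hold for gene families in general, a quick sanity check on a parsed table might look like the sketch below; the function name and tolerance are illustrative choices, not part of HUMAnN.

```python
import math
from collections import defaultdict


def strata_mismatches(table, rel_tol=1e-6):
    """Return gene families whose strata do not sum to the community total.

    Families that have a total but no stratified rows in `table` are skipped,
    since a truncated excerpt may simply not include their strata.
    """
    totals = {}
    strata_sums = defaultdict(float)
    for feature, value in table.items():
        name, sep, _taxon = feature.partition("|")
        if sep:
            strata_sums[name] += value
        else:
            totals[name] = value
    return [
        name
        for name, summed in strata_sums.items()
        if name not in totals
        or not math.isclose(summed, totals[name], rel_tol=rel_tol)
    ]


# Values copied from the genefamilies.tsv excerpt above.
gene_table = {
    "UNMAPPED": 57762775.0,
    "UniRef90_A7V4G2": 89245.5797634043,
    "UniRef90_A7V4G2|g__Bacteroides.s__Bacteroides_uniformis": 89245.5797634043,
    "UniRef90_A0A3A9B617": 39787.5705471666,
    "UniRef90_A0A3A9B617|g__Muribaculum.s__Muribaculum_intestinale": 39787.5705471666,
    "UniRef90_V2QIK6": 34333.3333333333,
}
mismatches = strata_mismatches(gene_table)
print(mismatches)
```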