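This article addresses a question that comes up in any metagenomic study: "What are they doing?" The natural answer is to profile the functions of the microbial community from meta'omic data, using approaches such as the following: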
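HUMAnN uses a tiered search to map metagenomic reads to gene sequences from known or unclassified species. These mappings are weighted by alignment quality and sequence length to estimate gene family abundance, both per species and for the community as a whole. Gene family abundances can then be regrouped into other functional systems (for example COGs, KOs, Pfam domains, and GO terms). Finally, gene families annotated as metabolic enzymes are analyzed further to reconstruct and quantify complete metabolic pathways for each species and for the community (by default via MetaCyc).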
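The weighting step described above can be illustrated with a minimal sketch. This is not HUMAnN's actual implementation: the `Alignment` record, the use of alignment identity as the quality weight, and the example identifiers and values are all assumptions made for illustration. Each alignment contributes its weight divided by the reference gene length in kilobases, which gives an RPK-style value per gene family and species, and the community total is the sum over species.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alignment:
    gene_family: str     # hypothetical gene family identifier
    species: str         # species of the reference gene, or "unclassified"
    identity: float      # alignment identity in [0, 1], used here as the quality weight
    gene_length_bp: int  # reference gene length in base pairs


def gene_family_abundance(alignments):
    """Sum quality-weighted, length-normalized hits per (gene family, species)."""
    stratified = defaultdict(float)
    for aln in alignments:
        # Divide by length in kilobases so long genes are not over-counted
        weight = aln.identity / (aln.gene_length_bp / 1000.0)
        stratified[(aln.gene_family, aln.species)] += weight

    # The community-level abundance is the sum over all species
    community = defaultdict(float)
    for (family, _species), value in stratified.items():
        community[family] += value
    return dict(stratified), dict(community)


# Hypothetical alignments, for illustration only
hits = [
    Alignment("UniRef90_A", "s__Species_one", 1.0, 1000),
    Alignment("UniRef90_A", "s__Species_one", 0.9, 1000),
    Alignment("UniRef90_A", "s__Species_two", 1.0, 2000),
    Alignment("UniRef90_B", "unclassified", 1.0, 500),
]
stratified, community = gene_family_abundance(hits)
```

Keeping the per-species values alongside the community totals mirrors the stratified and unstratified views that appear in HUMAnN's output tables.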
HUMAnN command, database files, and output files
The HUMAnN command is as follows:
humann --nucleotide-database /data/databases/humann/chocophlan_v31_201901 \
    --protein-database /data/databases/humann/uniref90_v201901_ec_filtered \
    --threads 10 \
    --remove-temp-output \
    --input test.clean.gz \
    --output test \
    --metaphlan-options="--index mpa_vJan21_CHOCOPhlAnSGB_202103 --bowtie2db /data/databases/metaphlan_databases_202103"
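The database consists of three files:
The chocophlan_v31_201901 directory is laid out as follows:
The uniref90_v201901_ec_filtered directory is laid out as follows:
The output files are as follows: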
Gene family abundances can be regrouped into the KO system (and, in the same way, into GO terms):
humann_regroup_table --input ${meta.id}_genefamilies.tsv -g uniref90_go -u N -p N -o ${meta.id}/go/${meta.id}.GO.txt
humann_regroup_table --input ${meta.id}_genefamilies.tsv -g uniref90_ko -u N -p N -o ${meta.id}/${meta.id}.KO.txt
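Conceptually, regrouping sums the abundance of every gene family into each group it is annotated to. The sketch below shows that idea only. It is not the code behind `humann_regroup_table`, and the `regroup` function, the UniRef90-to-KO mapping, and the abundance values are hypothetical. Features without an annotation are simply dropped here, which is roughly what excluding an ungrouped category amounts to.

```python
from collections import defaultdict


def regroup(abundance, mapping):
    """Sum feature abundances into groups; one feature may map to several groups."""
    grouped = defaultdict(float)
    for feature, value in abundance.items():
        # Features missing from the mapping contribute to no group
        for group in mapping.get(feature, []):
            grouped[group] += value
    return dict(grouped)


# Hypothetical gene family abundances and a hypothetical UniRef90 -> KO mapping
genefamilies = {"UniRef90_A": 2.4, "UniRef90_B": 2.0, "UniRef90_C": 1.5}
uniref90_to_ko = {
    "UniRef90_A": ["K00001"],
    "UniRef90_B": ["K00001", "K00002"],
    # UniRef90_C has no KO annotation in this example
}
ko_table = regroup(genefamilies, uniref90_to_ko)
```

Because a gene family can carry several annotations, the regrouped totals do not have to add up to the original gene family total.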
HUMAnN (the HMP Unified Metabolic Analysis Network) quantifies microbial metabolic pathways and other molecular functions from metagenomic or metatranscriptomic data.
Start by studying HUMAnN's source code
git clone git@github.com:biobakery/humann.git
cd humann
pip install .
.
├── genefamilies.tsv
├── humann_temp_y68c605p
├── pathabundance.tsv
├── pathcoverage.tsv
├── eggnog
│   ├── eggnog.rpkm_stratified.txt
│   ├── eggnog.rpkm.txt
│   ├── eggnog.rpkm_unstratified.txt
│   └── eggnog.txt
├── gbm
│   ├── coverage-plot.svg
│   ├── Abundance-RPKs.modules
│   └── GBM.txt
├── genefamilies
│   ├── genefamilies.rpkm_stratified.tsv
│   ├── genefamilies.rpkm.tsv
│   ├── genefamilies.rpkm_unstratified.tsv
│   ├── genefamilies_stratified.tsv
│   └── genefamilies_unstratified.tsv
├── gmm
│   ├── coverage-plot.svg
│   ├── Abundance-RPKs.modules
│   └── GMM.txt
├── go
│   ├── go.rpkm_stratified.txt
│   ├── go.rpkm.txt
│   ├── go.rpkm_unstratified.txt
│   └── go.txt
├── ko
│   ├── ko.rpkm_stratified.txt
│   ├── ko.rpkm.txt
│   ├── ko.rpkm_unstratified.txt
│   └── ko.txt
├── level4ec
│   ├── level4ec.rpkm_stratified.txt
│   ├── level4ec.rpkm.txt
│   ├── level4ec.rpkm_unstratified.txt
│   └── level4ec.txt
├── module
│   └── Module.txt
├── pathabundance
│   ├── pathabundance.rpkm_stratified.tsv
│   ├── pathabundance.rpkm.tsv
│   ├── pathabundance.rpkm_unstratified.tsv
│   ├── pathabundance_stratified.tsv
│   └── pathabundance_unstratified.tsv
├── pathway
│   └── Pathway.txt
├── pfam
│   ├── pfam.rpkm_stratified.txt
│   ├── pfam.rpkm.txt
│   ├── pfam.rpkm_unstratified.txt
│   └── pfam.txt
└── rxn
    ├── rxn.rpkm_stratified.txt
    ├── rxn.rpkm.txt
    ├── rxn.rpkm_unstratified.txt
    └── rxn.txt
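Many of the files in this tree come in `_stratified` and `_unstratified` pairs. A minimal sketch of how such a pair can be derived from one table is given below. It assumes that a stratified row is marked by a `|` between the feature and its taxon, and that rows without `|` are community totals. The `split_stratified` function and the example rows are hypothetical, not taken from the author's pipeline.

```python
def split_stratified(rows):
    """Separate community totals from per-species rows.

    Assumes stratified feature IDs look like "FEATURE|TAXON".
    """
    unstratified, stratified = {}, {}
    for feature, value in rows.items():
        target = stratified if "|" in feature else unstratified
        target[feature] = value
    return unstratified, stratified


# Hypothetical rows, for illustration only
rows = {
    "UNMAPPED": 10.0,
    "UniRef90_A": 2.4,
    "UniRef90_A|g__Genus_one.s__Species_one": 1.9,
    "UniRef90_A|g__Genus_two.s__Species_two": 0.5,
}
unstratified, stratified = split_stratified(rows)
```

The unstratified table is usually the one to use for community-level comparisons, while the stratified table shows which species contribute to each function.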
pathabundance.tsv
pathcoverage.tsv
genefamilies.tsv