Metagenomics software

Last published: 2023-10-12 15:17:33

I’ve had really good results with SPAdes for isolate or enrichment cultures when I’m trying to reconstruct just one or a few genomes. But when working with high diversity metagenomic samples, sometimes SPAdes can’t handle it and MEGAHIT is pretty awesome with how well it does with such a small memory footprint – and it’s insanely fast.
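The rule of thumb above can be sketched as a small helper that picks an assembler and builds its command line. This is only an illustration: the function name, the `high_diversity` switch, the default thread count and the file names are assumptions made here. Only the basic paired-end options of `megahit` and `spades.py` (`-1`, `-2`, `-o`, `-t`) are used.

```python
from typing import List


def assembly_command(reads_1: str, reads_2: str, out_dir: str,
                     high_diversity: bool, threads: int = 8) -> List[str]:
    """Build (but do not run) an argv list for a paired-end assembly."""
    if high_diversity:
        # Complex community: MEGAHIT, chosen for its small memory footprint.
        program = "megahit"
    else:
        # Isolate or enrichment culture with one or a few genomes: SPAdes.
        program = "spades.py"
    return [program, "-1", reads_1, "-2", reads_2,
            "-o", out_dir, "-t", str(threads)]


# Hypothetical file names, for illustration only.
print(assembly_command("sample_R1.fq.gz", "sample_R2.fq.gz",
                       "assembly_out", high_diversity=True))
```

The returned list could be passed to `subprocess.run(cmd, check=True)` once the chosen assembler is installed.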

A practical guide to amplicon and metagenomic analysis of microbiome data

“Introduction to software for amplicon and metagenomic analysis” (Liu et al., 2021, p. 319) (pdf)

Name	Link	Description and advantages	Reference
QIIME	http://qiime.org	The most highly cited and comprehensive amplicon analysis pipeline, providing hundreds of scripts for analyzing various data types and visualizations	(Caporaso et al., 2010)
QIIME 2	https://qiime2.org https://github.com/YongxinLiu/QIIME2ChineseManual	This next-generation amplicon pipeline provides integrated command lines and GUI, and supports reproducible analysis and big data. Provides interactive visualization and Chinese tutorial documents and videos	(Bolyen et al., 2019)
USEARCH	http://www.drive5.com/usearch https://github.com/YongxinLiu/UsearchChineseManual	Alignment tool with more than 200 subcommands for amplicon analysis; small (1 Mb), cross-platform and fast, with a free 32-bit version. The 64-bit version is commercial ($1485)	(Edgar, 2010)
VSEARCH	https://github.com/torognes/vsearch	A free USEARCH-like software tool. We recommend using it alone or in addition to USEARCH. Available as a plugin in QIIME 2	(Rognes et al., 2016)
Trimmomatic	http://www.usadellab.org/cms/index.php?page=trimmomatic	Java based software for quality control of metagenomic raw reads	(Bolger et al., 2014)
Bowtie 2	http://bowtie-bio.sourceforge.net/bowtie2	Rapid alignment tool used to remove host contamination or for quantification	(Langmead and Salzberg, 2012)
MetaPhlAn2	https://bitbucket.org/biobakery/metaphlan2	Taxonomic profiling tool with a marker gene database from more than 10,000 species. The output is relative abundance of strains	(Truong et al., 2015)
Kraken 2	https://ccb.jhu.edu/software/kraken2	A taxonomic classification tool that uses exact k-mer matches against the NCBI database for accurate, rapid classification, and outputs read counts for each species	(Wood et al., 2019)
HUMAnN2	https://bitbucket.org/biobakery/humann2	Based on the UniRef protein database, calculates gene family abundance, pathway coverage, and pathway abundance from metagenomic or metatranscriptomic data. Provides species’ contributions to a specific function	(Franzosa et al., 2018)
MEGAN	https://github.com/husonlab/megan-ce http://www-ab.informatik.uni-tuebingen.de/software/megan6	A GUI, cross-platform software for taxonomic and functional analysis of metagenomic data. Supports many types of visualizations with metadata, including scatter plot, word clouds, Voronoi tree maps, clustering, and networks	(Huson et al., 2016)
MEGAHIT	https://github.com/voutcn/megahit	Ultra-fast and memory-efficient metagenomic assembler	(Li et al., 2015)
metaSPAdes	http://cab.spbu.ru/software/spades	High-quality metagenomic assembler but time-consuming and large memory requirement	(Nurk et al., 2017)
MetaQUAST	http://quast.sourceforge.net/metaquast	Evaluates the quality of metagenomic assemblies, including N50 and misassemblies, and outputs PDF and interactive HTML reports	(Mikheenko et al., 2016)
MetaGeneMark	http://exon.gatech.edu/GeneMark/	Gene prediction in bacteria, archaea, metagenomes and metatranscriptomes. Supports Linux and Mac OS X. Provides a web server for online analysis	(Zhu et al., 2010)
Prokka	http://www.vicbioinformatics.com/software.prokka.shtml	Provides rapid prokaryotic genome annotation, calls metaProdigal (Hyatt et al., 2012) for metagenomic gene prediction. Outputs nucleotide sequences, protein sequences, and annotation files of genes	(Seemann, 2014)
CD-HIT	http://weizhongli-lab.org/cd-hit	Used to construct non-redundant gene catalogs	(Fu et al., 2012)
Salmon	https://combine-lab.github.io/salmon	Provides ultra-fast quantification of gene read counts using a k-mer-based method	(Patro et al., 2017)
metaWRAP	https://github.com/bxlab/metaWRAP	Binning pipeline that includes 140 tools and supports conda installation; bins by default with MetaBAT, MaxBin, and CONCOCT. Provides refinement, quantification, taxonomic classification and visualization of bins	(Uritskiy et al., 2018)
DAS Tool	https://github.com/cmks/DAS_Tool	Binning pipeline that integrates five binning software packages and performs refinement	(Sieber et al., 2018)
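The table lists N50 among the assembly statistics reported by MetaQUAST. As a minimal sketch of the usual definition (the function name is chosen here and this is not MetaQUAST's own code), N50 is the contig length at which the contigs, sorted from longest to shortest, first cover at least half of the total assembly length:

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (0 for an empty assembly)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running sum to half the total
        # defines N50.
        if running >= half_total:
            return length
    return 0


print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # total length 32, so N50 is 8
```

A larger N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why it is reported alongside misassembly counts.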

A review of methods and databases for metagenomic classification and assembly

“A selection of quality control software tools for metagenomics data” (Breitwieser et al., 2019, p. 1127) (pdf)

Tool	Synopsis	Reference	Web site
FastQC	Quality control tool showing statistics such as quality values, sequence length distribution and GC content distribution	[33]	http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
FastQ Screen	Screen a library against sequence databases to see if composition of library matches expectations	[37]	http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen
BBtools	BBDuk trims and filters reads using k-mers and entropy information. BBNorm normalizes coverage by down-sampling reads (digital normalization)	[35]	http://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/
Trimmomatic	Flexible read trimming tool for Illumina data	[36]	http://www.usadellab.org/cms/?page=trimmomatic
Cutadapt	Find and remove adapter sequences, primers, poly-A tails and other types of unwanted sequence	[34]	https://cutadapt.readthedocs.io
khmer/diginorm	Tools for k-mer error trimming of reads and digital normalization of samples	[38, 39]	http://khmer.readthedocs.io
MultiQC	Summarize results from different analyses (such as FastQC) into one report	[40]	http://multiqc.info
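Digital normalization, mentioned for BBNorm and khmer/diginorm in the table, can be illustrated with a toy version. A read is kept only while the median count of its k-mers among the reads already kept is below a coverage cutoff. This sketch uses an exact in-memory counter and ignores reverse complements; the function names, `k` and `cutoff` values are assumptions for illustration, not the defaults of either tool.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads: Iterable[str], k: int = 5,
                          cutoff: int = 2) -> List[str]:
    """Down-sample reads so estimated coverage stays near `cutoff`."""
    counts: Counter = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k
        # Median k-mer count approximates the coverage already seen.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


reads = ["ACGTTGCAAT"] * 5 + ["TTTTTGGGGG"]
print(digital_normalization(reads))
```

In this example the five identical reads are reduced to two copies, while the single low-coverage read is retained.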

“Metagenomic classifiers, aligners and profilers” (Breitwieser et al., 2019, p. 1129) (pdf)

Tool	Synopsis	Reference	Web site
Kraken	Fast taxonomic classifier using in-memory k-mer search of metagenomics reads against a database built from multiple genomes	[64]	https://ccb.jhu.edu/software/kraken/
Kraken-HLL	Extension of Kraken counting unique k-mers for taxa and allowing multiple databases		https://github.com/fbreitwieser/kraken-hll
CLARK(-S)	Fast taxonomic classifier using in-memory k-mer search of metagenomics reads against a database built from completed genomes. S extension uses spaced k-mer seeds for better classification	[65, 66]	http://clark.cs.ucr.edu
Kallisto	Taxonomic profiler using pseudo-alignment with k-mers using techniques based on transcript (RNA-seq) quantification	[67]	https://github.com/pachterlab/kallisto
k-SLAM	Taxonomic classifier using database of nonoverlapping k-mers in genomes. Reads are split into k-mers, and overlaps found by lexicographical ordering are pseudo-assembled	[68]	https://github.com/aindj/k-SLAM
Kaiju	Fast taxonomic classifier against protein sequences using FM-index with reduced amino acid alphabet	[69]	https://github.com/bioinformatics-centre/kaiju
DIAMOND	Protein homology search using spaced seeds with a reduced amino acid alphabet, 2000–20 000 times faster than BLASTX	[70]	https://github.com/bbuchfink/diamond
BLAST+	Highly sensitive nucleotide and translated-nucleotide protein alignment	[61, 71]	https://blast.ncbi.nlm.nih.gov
MEGAN6/CE	Desktop and Web metagenomics analysis suite. Uses BLAST or DIAMOND to match sequences and assigns LCA of matches	[72, 73]	http://ab.inf.uni-tuebingen.de/software/megan6/
DUDes	Top-down assignment of metagenomics reads	[74]	https://sourceforge.net/projects/dudes/
Taxonomer	Web-based metagenomics classifier including binning and visualization	[75]	http://taxonomer.iobio.io/
GOTTCHA	Taxonomic profiler that maps reads against short unique subsequences (‘signature’) at multiple taxonomic ranks	[76]	http://lanl-bioinformatics.github.io/GOTTCHA/
LMAT(-ML)	K-mer-based taxonomic read classifier using extensive database including draft genomes and eukaryotes. ML (Marker Library) extension reduces RAM requirements by stringent pruning of non-informative and overlapping k-mers	[77, 78]	https://sourceforge.net/projects/lmat/
taxator-tk	Uses BLAST or LAST output for binning and taxonomic assignment via overlapping regions and pairwise distance measures	[79]	https://github.com/fungs/taxator-tk
Centrifuge	Fast taxonomic classifier using database compressed with FM-index, database and output format similar to Kraken	[80]	http://ccb.jhu.edu/software/centrifuge/
MetaPhlAn 2	Marker gene-based taxonomic profiler	[81]	https://bitbucket.org/biobakery/metaphlan2
mOTU	Taxonomic profiler based on a set of 40 prokaryotic marker genes	[82]	http://www.bork.embl.de/software/mOTU/
Mash	MinHash-based taxonomic profiler enabling super-fast overlap estimations	[83]	http://mash.readthedocs.io
sourmash	Alternative implementation of MinHash algorithm using fast searches with sequence bloom trees for taxonomic profiling	[84]	https://github.com/dib-lab/sourmash
PanPhlAn	Pan-genome-based phylogenomic analysis	[2]	http://segatalab.cibio.unitn.it/tools/panphlan/
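The MinHash idea behind Mash and sourmash can be sketched with a bottom-s sketch: keep only the s smallest k-mer hashes of each sequence, then estimate the Jaccard similarity of the two k-mer sets from the s smallest hashes of the merged sketches. The code below is a simplified illustration, not either tool's implementation. It uses BLAKE2b from the standard library as a stable hash and does not canonicalize k-mers by reverse complement, and the function names and parameter values are assumptions made here.

```python
import hashlib
from typing import List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """Distinct overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq: str, k: int = 4, size: int = 16) -> List[int]:
    """Bottom-`size` sketch: the `size` smallest distinct k-mer hashes."""
    return sorted({hash64(km) for km in kmer_set(seq, k)})[:size]


def jaccard_estimate(a: List[int], b: List[int], size: int = 16) -> float:
    """Estimate Jaccard similarity from two bottom-`size` sketches."""
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


s1 = sketch("ACGTACGT")
s2 = sketch("ACGTTTTT")
print(jaccard_estimate(s1, s2))
```

When both sequences have fewer distinct k-mers than the sketch size, as in this tiny example, the estimate equals the exact Jaccard index. For genome-scale inputs it is an approximation whose error shrinks as the sketch size grows.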

“Tools for whole-genome assembly and metagenomics assembly” (Breitwieser et al., 2019, p. 1131) (pdf)

Tool	Synopsis	Reference	Web site
Megahit	Co-assembly of metagenomic reads with variable k-mer lengths and low memory usage	[101]	https://github.com/voutcn/megahit
SPAdes	DBG assembler using multiple k-mers, works also for simple metagenomes	[102]	http://cab.spbu.ru/software/spades
MetaSPAdes	Extension of SPAdes that gives better assemblies for samples with different abundances, conserved regions and strain mixtures	[103]	http://cab.spbu.ru/software/spades/
Ray Meta	DBG assembler with fixed k-mer size	[104]	http://denovoassembler.sourceforge.net/
MetaVelvet(-SL)	DBG assembler using fixed k-mer size. SL extension identifies and splits chimeric nodes	[105, 106]	http://metavelvet.dna.bio.keio.ac.jp
IDBA-UD	DBG assembler using multiple k-mer sizes, analyzes coverages between paths to give better assemblies in complex metagenomes with uneven coverage	[107]	http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/
MetAMOS	Framework for metagenomic assembly, analysis and validation	[108]	http://metamos.readthedocs.io
MOCAT2	Pipeline for read filtering, taxonomic profiling, assembly, gene prediction and functional analysis	[109]	http://mocat.embl.de/
Anvi’o	Analysis and visualization platform for metagenomics assembly and binning	[110]	http://merenlab.org/software/anvio/
*Contig binning*
MaxBin	Efficient binning of metagenomic contigs based on EM algorithm using nucleotide composition	[111]	https://downloads.jbei.org/data/microbial_communities/MaxBin/MaxBin.html
CONCOCT	Bins contigs using nucleotide composition, coverage data in multiple samples and paired-end read information	[112]	https://github.com/BinPro/CONCOCT
COCACOLA	Bins contigs using read coverage, correlation, sequence composition and paired-end read linkage	[113]	https://github.com/younglululu/COCACOLA
MetaBAT	Metagenome binning with abundance and tetra-nucleotide frequencies	[114]	https://bitbucket.org/berkeleylab/metabat
VizBin	Visualization of metagenomic data based on nonlinear dimension reduction	[115]	http://claczny.github.io/VizBin/
AbundanceBin	Binning method based on k-mer frequency in reads	[116]	http://omics.informatics.indiana.edu/AbundanceBin/
GroopM	Identifies population genomes using differential coverage of contigs	[117]	http://ecogenomics.github.io/GroopM/
MetaCluster	Read and contig binning in two rounds for low- and high-abundance organisms using various k-mer lengths	[118, 119]	http://i.cs.hku.hk/~alse/MetaCluster/
PhyloPythiaS(+)	Assigns contigs to taxonomic bin using support vector machine trained on reference sequences	[120, 121]	https://github.com/algbioi/ppsp/wiki
*Assembly and binning quality assessment*
MetaQuast	Evaluate and compare metagenomics assemblies based on alignments with reference genomes	[122]	http://quast.sourceforge.net/metaquast
BUSCO	Assess genome assembly and gene set completeness based on single-copy orthologs, also for eukaryotes	[123]	http://busco.ezlab.org/
CheckM	Tools for assessing quality of (meta)genomic assemblies providing genome completion and contamination estimates, especially for bacteria and viruses	[56]	http://ecogenomics.github.io/CheckM/
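Several binners in the table rely on nucleotide composition, for example the tetra-nucleotide frequencies used by MetaBAT. A minimal sketch of such a composition vector follows. It is an illustration under stated assumptions, not the code of any listed tool: each 4-mer is merged with its reverse complement, windows containing ambiguous bases are skipped, and the function names are chosen here.

```python
from collections import Counter
from itertools import product
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def tetranucleotide_frequencies(contig: str) -> Dict[str, float]:
    """Relative frequency of each canonical 4-mer in a contig."""
    contig = contig.upper()
    counts: Counter = Counter()
    for i in range(len(contig) - 3):
        kmer = contig[i:i + 4]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})
    return {key: (counts[key] / total if total else 0.0) for key in keys}


freqs = tetranucleotide_frequencies("AAAATTTT")
print(len(freqs), freqs["AAAA"], freqs["AATT"])
```

Contigs from the same genome tend to have similar vectors. As the table notes, binners combine this composition signal with coverage or abundance information across samples.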

Pathogen detection: finding conserved sequences