Metagenomics software
Last published: 2023-10-12 15:17:33
I’ve had really good results with SPAdes for isolate or enrichment cultures when I’m trying to reconstruct just one or a few genomes. But with high-diversity metagenomic samples, SPAdes sometimes can’t handle the data, and MEGAHIT does remarkably well there with a very small memory footprint, and it’s extremely fast.
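The rule of thumb above can be sketched as a small helper that only builds a command line and does not run anything. The function name, the `high_diversity` switch, and the file names are hypothetical. The `-1`, `-2`, `-o`, and `-t` options are the usual paired-end, output, and thread options of both assemblers, but check them against the versions you have installed.

```python
from typing import List


def assembler_command(reads_1: str, reads_2: str, out_dir: str,
                      high_diversity: bool, threads: int = 8) -> List[str]:
    """Return an argument list for MEGAHIT or SPAdes (nothing is executed)."""
    if high_diversity:
        # Complex community: prefer MEGAHIT for its low memory use and speed.
        return ["megahit", "-1", reads_1, "-2", reads_2,
                "-o", out_dir, "-t", str(threads)]
    # One or a few genomes (isolate or enrichment culture): use SPAdes.
    return ["spades.py", "-1", reads_1, "-2", reads_2,
            "-o", out_dir, "-t", str(threads)]


# Hypothetical file names, for illustration only.
print(" ".join(assembler_command("R1.fastq.gz", "R2.fastq.gz", "asm",
                                 high_diversity=True, threads=16)))
```

The returned list can be passed to `subprocess.run` once the paths are real. SPAdes also has a `--meta` mode (metaSPAdes) for metagenomes when enough memory is available.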
A practical guide to amplicon and metagenomic analysis of microbiome data
“Introduction to software for amplicon and metagenomic analysis” (Liu et al., 2021, p. 319) (pdf)
Name | Link | Description and advantages | Reference |
---|---|---|---|
QIIME | http://qiime.org | The most highly cited and comprehensive amplicon analysis pipeline, providing hundreds of scripts for analyzing various data types and visualizations | (Caporaso et al., 2010) |
QIIME 2 | https://qiime2.org https://github.com/YongxinLiu/QIIME2ChineseManual | This next-generation amplicon pipeline provides integrated command lines and GUI, and supports reproducible analysis and big data. Provides interactive visualization and Chinese tutorial documents and videos | (Bolyen et al., 2019) |
USEARCH | http://www.drive5.com/usearch https://github.com/YongxinLiu/UsearchChineseManual | Alignment tool includes more than 200 subcommands for amplicon analysis with a small size (1 Mb), cross-platform, high-speed calculation, and free 32-bit version. The 64-bit version is commercial ($1485) | (Edgar, 2010) |
VSEARCH | https://github.com/torognes/vsearch | A free USEARCH-like software tool. We recommend using it alone or in addition to USEARCH. Available as a plugin in QIIME 2 | (Rognes et al., 2016) |
Trimmomatic | http://www.usadellab.org/cms/index.php?page=trimmomatic | Java based software for quality control of metagenomic raw reads | (Bolger et al., 2014) |
Bowtie 2 | http://bowtie-bio.sourceforge.net/bowtie2 | Rapid alignment tool used to remove host contamination or for quantification | (Langmead and Salzberg, 2012) |
MetaPhlAn2 | https://bitbucket.org/biobakery/metaphlan2 | Taxonomic profiling tool with a marker gene database from more than 10,000 species. The output is relative abundance of strains | (Truong et al., 2015) |
Kraken 2 | https://ccb.jhu.edu/software/kraken2 | A taxonomic classification tool that uses exact k-mer matches to the NCBI database, high accuracy and rapid classification, and outputs reads counts for each species | (Wood et al., 2019) |
HUMAnN2 | https://bitbucket.org/biobakery/humann2 | Based on the UniRef protein database, calculates gene family abundance, pathway coverage, and pathway abundance from metagenomic or metatranscriptomic data. Provides species’ contributions to a specific function | (Franzosa et al., 2018) |
MEGAN | https://github.com/husonlab/megan-ce http://www-ab.informatik.uni-tuebingen.de/software/megan6 | A GUI, cross-platform software for taxonomic and functional analysis of metagenomic data. Supports many types of visualizations with metadata, including scatter plot, word clouds, Voronoi tree maps, clustering, and networks | (Huson et al., 2016) |
MEGAHIT | https://github.com/voutcn/megahit | Ultra-fast and memory-efficient metagenomic assembler | (Li et al., 2015) |
metaSPAdes | http://cab.spbu.ru/software/spades | High-quality metagenomic assembler but time-consuming and large memory requirement | (Nurk et al., 2017) |
MetaQUAST | http://quast.sourceforge.net/metaquast | Evaluates the quality of metagenomic assemblies, including N50 and misassemblies, and outputs PDF and interactive HTML reports | (Mikheenko et al., 2016) |
MetaGeneMark | http://exon.gatech.edu/GeneMark/ | Gene prediction in bacteria, archaea, metagenomes and metatranscriptomes. Supports Linux and macOS. Provides a web server for online analysis | (Zhu et al., 2010) |
Prokka | http://www.vicbioinformatics.com/software.prokka.shtml | Provides rapid prokaryotic genome annotation, calls metaProdigal (Hyatt et al., 2012) for metagenomic gene prediction. Outputs nucleotide sequences, protein sequences, and annotation files of genes | (Seemann, 2014) |
CD-HIT | http://weizhongli-lab.org/cd-hit | Used to construct non-redundant gene catalogs | (Fu et al., 2012) |
Salmon | https://combine-lab.github.io/salmon | Provides ultra-fast quantification of reads counts of genes using a k-mer-based method | (Patro et al., 2017) |
metaWRAP | https://github.com/bxlab/metaWRAP | Binning pipeline includes 140 tools and supports conda install, default binning by MetaBAT, MaxBin, and CONCOCT. Provides refinement, quantification, taxonomic classification and visualization of bins | (Uritskiy et al., 2018) |
DAS Tool | https://github.com/cmks/DAS_Tool | Binning pipeline that integrates five binning software packages and performs refinement | (Sieber et al., 2018) |
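The MetaQUAST entry in the table above mentions N50. As a minimal sketch of what that statistic means (this is the usual textbook definition, not MetaQUAST's own code), the function below returns the contig length at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly size. The example lengths are made up.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Made-up contig lengths: total 280, half 140, reached at the second contig.
print(n50([100, 80, 50, 30, 20]))  # 80
```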
A review of methods and databases for metagenomic classification and assembly
“A selection of quality control software tools for metagenomics data” (Breitwieser et al., 2019, p. 1127) (pdf)
Tool | Synopsis | Reference | Web site |
---|---|---|---|
FastQC | Quality control tool showing statistics such as quality values, sequence length distribution and GC content distribution | [33] | http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |
FastQ Screen | Screen a library against sequence databases to see if composition of library matches expectations | [37] | http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen |
BBtools | BBDuk trims and filters reads using k-mers and entropy information. BBNorm normalizes coverage by down-sampling reads (digital normalization) | [35] | http://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/ |
Trimmomatic | Flexible read trimming tool for Illumina data | [36] | http://www.usadellab.org/cms/?page=trimmomatic |
Cutadapt | Find and remove adapter sequences, primers, poly-A tails and other types of unwanted sequence | [34] | https://cutadapt.readthedocs.io |
khmer/diginorm | Tools for k-mer error trimming of reads and digital normalization of samples | [38, 39] | http://khmer.readthedocs.io |
MultiQC | Summarize results from different analysis (such as FastQC) into one report | [40] | http://multiqc.info |
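To make the kind of per-read statistics listed for FastQC concrete, here is a minimal sketch that computes GC content and mean Phred quality from FASTQ records. It assumes four lines per record and Phred+33 quality encoding. The function names and the example read are hypothetical, and this is not how FastQC itself is implemented.

```python
import io
from statistics import mean
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the read."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33 encoding by default."""
    return mean(ord(c) - offset for c in qual) if qual else 0.0


# Made-up single-read example.
example = io.StringIO("@read1\nGGCCAATT\n+\nIIIIIIII\n")
for name, seq, qual in read_fastq(example):
    print(name, gc_fraction(seq), mean_phred(qual))
```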
“Metagenomic classifiers, aligners and profilers” (Breitwieser et al., 2019, p. 1129) (pdf)
Tool | Synopsis | Reference | Web site |
---|---|---|---|
Kraken | Fast taxonomic classifier using in-memory k-mer search of metagenomics reads against a database built from multiple genomes | [64] | https://ccb.jhu.edu/software/kraken/ |
Kraken-HLL | Extension of Kraken counting unique k-mers for taxa and allowing multiple databases | | https://github.com/fbreitwieser/kraken-hll |
CLARK(-S) | Fast taxonomic classifier using in-memory k-mer search of metagenomics reads against a database built from completed genomes. S extension uses spaced k-mer seeds for better classification | [65, 66] | http://clark.cs.ucr.edu |
Kallisto | Taxonomic profiler using pseudo-alignment with k-mers using techniques based on transcript (RNA-seq) quantification | [67] | https://github.com/pachterlab/kallisto |
k-SLAM | Taxonomic classifier using database of nonoverlapping k-mers in genomes. Reads are split into k-mers, and overlaps found by lexicographical ordering are pseudo-assembled | [68] | https://github.com/aindj/k-SLAM |
Kaiju | Fast taxonomic classifier against protein sequences using FM-index with reduced amino acid alphabet | [69] | https://github.com/bioinformatics-centre/kaiju |
DIAMOND | Protein homology search using spaced seeds with a reduced amino acid alphabet, 2000–20 000 times faster than BLASTX | [70] | https://github.com/bbuchfink/diamond |
BLAST+ | Highly sensitive nucleotide and translated-nucleotide protein alignment | [61, 71] | https://blast.ncbi.nlm.nih.gov |
MEGAN6/CE | Desktop and Web metagenomics analysis suite. Uses BLAST or DIAMOND to match sequences and assigns LCA of matches | [72, 73] | http://ab.inf.uni-tuebingen.de/software/megan6/ |
DUDes | Top-down assignment of metagenomics reads | [74] | https://sourceforge.net/projects/dudes/ |
Taxonomer | Web-based metagenomics classifier including binning and visualization | [75] | http://taxonomer.iobio.io/ |
GOTTCHA | Taxonomic profiler that maps reads against short unique subsequences (‘signature’) at multiple taxonomic ranks | [76] | http://lanl-bioinformatics.github.io/GOTTCHA/ |
LMAT(-ML) | K-mer-based taxonomic read classifier using extensive database including draft genomes and eukaryotes. ML (Marker Library) extension reduces RAM requirements by stringent pruning of non-informative and overlapping k-mers | [77, 78] | https://sourceforge.net/projects/lmat/ |
taxator-tk | Uses BLAST or LAST output for binning and taxonomic assignment via overlapping regions and pairwise distance measures | [79] | https://github.com/fungs/taxator-tk |
Centrifuge | Fast taxonomic classifier using database compressed with FM-index, database and output format similar to Kraken | [80] | http://ccb.jhu.edu/software/centrifuge/ |
MetaPhlAn 2 | Marker gene-based taxonomic profiler | [81] | https://bitbucket.org/biobakery/metaphlan2 |
mOTU | Taxonomic profiler based on a set of 40 prokaryotic marker genes | [82] | http://www.bork.embl.de/software/mOTU/ |
Mash | MinHash-based taxonomic profiler enabling super-fast overlap estimations | [83] | http://mash.readthedocs.io |
sourmash | Alternative implementation of MinHash algorithm using fast searches with sequence bloom trees for taxonomic profiling | [84] | https://github.com/dib-lab/sourmash |
PanPhlAn | Pan-genome-based phylogenomic analysis | [2] | http://segatalab.cibio.unitn.it/tools/panphlan/ |
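Mash and sourmash in the table above are described as MinHash-based. The sketch below illustrates the general idea with a bottom-k MinHash: keep only the smallest hash values of each sequence's k-mers and estimate Jaccard similarity from those. It is a toy illustration, not either tool's implementation. SHA-1 from `hashlib` is a stand-in chosen because it is deterministic and in the standard library, and the values of `k`, `size`, and the example sequences are arbitrary.

```python
import hashlib
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer: str) -> int:
    """Stable 64-bit hash taken from the first 8 bytes of SHA-1 (stand-in hash)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def bottom_sketch(seq: str, k: int = 4, size: int = 16) -> Set[int]:
    """Keep the `size` smallest hash values of the k-mers in `seq`."""
    return set(sorted(hash_kmer(km) for km in kmers(seq, k))[:size])


def jaccard_estimate(a: Set[int], b: Set[int], size: int = 16) -> float:
    """Estimate Jaccard similarity from the `size` smallest hashes of the union."""
    merged = sorted(a | b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


# Made-up sequences: identical inputs give 1.0, inputs with no shared k-mers give 0.0.
s1 = bottom_sketch("ACGTACGTTGCAACGT")
s2 = bottom_sketch("ACGTACGTTGCAACGT")
s3 = bottom_sketch("CCCCCCCCCCCC")
print(jaccard_estimate(s1, s2), jaccard_estimate(s1, s3))
```

The appeal of this approach is that a fixed-size sketch stands in for a whole genome or read set, so comparisons stay cheap regardless of input size.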
“Tools for whole-genome assembly and metagenomics assembly” (Breitwieser et al., 2019, p. 1131) (pdf)
Tool | Synopsis | Reference | Web site |
---|---|---|---|
MEGAHIT | Co-assembly of metagenomic reads with variable k-mer lengths and low memory usage | [101] | https://github.com/voutcn/megahit |
SPAdes | DBG assembler using multiple k-mers, works also for simple metagenomes | [102] | http://cab.spbu.ru/software/spades |
MetaSPAdes | Extension of SPAdes that gives better assemblies for samples with different abundances, conserved regions and strain mixtures | [103] | http://cab.spbu.ru/software/spades/ |
Ray Meta | DBG assembler with fixed k-mer size | [104] | http://denovoassembler.sourceforge.net/ |
MetaVelvet(-SL) | DBG assembler using fixed k-mer size. SL extension identifies and splits chimeric nodes | [105, 106] | http://metavelvet.dna.bio.keio.ac.jp |
IDBA-UD | DBG assembler using multiple k-mer sizes, analyzes coverages between paths to give better assemblies in complex metagenomes with uneven coverage | [107] | http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/ |
MetAMOS | Framework for metagenomic assembly, analysis and validation | [108] | http://metamos.readthedocs.io |
MOCAT2 | Pipeline for read filtering, taxonomic profiling, assembly, gene prediction and functional analysis | [109] | http://mocat.embl.de/ |
Anvi’o | Analysis and visualization platform for metagenomics assembly and binning | [110] | http://merenlab.org/software/anvio/ |
Contig binning | | | |
MaxBin | Efficient binning of metagenomic contigs based on EM algorithm using nucleotide composition | [111] | https://downloads.jbei.org/data/microbial_communities/MaxBin/MaxBin.html |
CONCOCT | Bins contigs using nucleotide composition, coverage data in multiple samples and paired-end read information | [112] | https://github.com/BinPro/CONCOCT |
COCACOLA | Bins contigs using read coverage, correlation, sequence composition and paired-end read linkage | [113] | https://github.com/younglululu/COCACOLA |
MetaBAT | Metagenome binning with abundance and tetra-nucleotide frequencies | [114] | https://bitbucket.org/berkeleylab/metabat |
VizBin | Visualization of metagenomic data based on nonlinear dimension reduction | [115] | http://claczny.github.io/VizBin/ |
AbundanceBin | Binning method based on k-mer frequency in reads | [116] | http://omics.informatics.indiana.edu/AbundanceBin/ |
GroopM | Identifies population genomes using differential coverage of contigs | [117] | http://ecogenomics.github.io/GroopM/ |
MetaCluster | Read and contig binning in two rounds for low- and high-abundance organisms using various k-mer lengths | [118, 119] | http://i.cs.hku.hk/~alse/MetaCluster/ |
PhyloPythiaS(+) | Assigns contigs to taxonomic bin using support vector machine trained on reference sequences | [120, 121] | https://github.com/algbioi/ppsp/wiki |
Assembly and binning quality assessment | | | |
MetaQUAST | Evaluate and compare metagenomics assemblies based on alignments with reference genomes | [122] | http://quast.sourceforge.net/metaquast |
BUSCO | Assess genome assembly and gene set completeness based on single-copy orthologs, also for eukaryotes | [123] | http://busco.ezlab.org/ |
CheckM | Tools for assessing quality of (meta)genomic assemblies providing genome completion and contamination estimates, especially for bacteria and archaea | [56] | http://ecogenomics.github.io/CheckM/ |
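Several binners in the table above (MaxBin, CONCOCT, MetaBAT) use nucleotide composition, and MetaBAT is described as using tetra-nucleotide frequencies. As a minimal sketch of that feature only (not any tool's actual implementation, which typically also merges reverse complements and combines composition with coverage), the function below turns a contig into a normalized 256-value tetranucleotide profile. The function name and example sequence are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict

# All 256 possible tetramers over the unambiguous DNA alphabet.
ALL_TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(contig: str) -> Dict[str, float]:
    """Relative frequency of each 4-mer; windows with N or other symbols are ignored."""
    contig = contig.upper()
    counts = Counter(contig[i:i + 4] for i in range(len(contig) - 3))
    total = sum(counts[t] for t in ALL_TETRAMERS)
    return {t: (counts[t] / total if total else 0.0) for t in ALL_TETRAMERS}


# Made-up contig: the windows are ACGT, CGTA, GTAC, TACG, ACGT.
profile = tetranucleotide_frequencies("ACGTACGT")
print(profile["ACGT"], profile["CGTA"])  # 0.4 0.2
```

Contigs from the same genome tend to have similar profiles, so distances between these vectors can feed a clustering step.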