Metagenomics software

Last published: 2023-10-12 15:17:33

I’ve had really good results with SPAdes for isolate or enrichment cultures when I’m trying to reconstruct just one or a few genomes. But when working with high diversity metagenomic samples, sometimes SPAdes can’t handle it and MEGAHIT is pretty awesome with how well it does with such a small memory footprint – and it’s insanely fast.
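
The rule of thumb above can be written down as a small helper that only builds the command line and does not run anything. This is a minimal sketch: the function name, the `high_diversity` switch, and the file names are hypothetical, and only the basic paired-end options of each assembler (`-1`, `-2`, `-o`) are used.

```python
from typing import List


def assembly_command(r1: str, r2: str, outdir: str, high_diversity: bool) -> List[str]:
    """Build (but do not run) an assembler command for paired-end reads.

    high_diversity=True  -> MEGAHIT (small memory footprint, fast)
    high_diversity=False -> SPAdes (isolates, enrichments, a few genomes)
    """
    if high_diversity:
        return ["megahit", "-1", r1, "-2", r2, "-o", outdir]
    return ["spades.py", "-1", r1, "-2", r2, "-o", outdir]


# Hypothetical file names, for illustration only.
cmd = assembly_command("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                       "assembly_out", high_diversity=True)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the assembler is installed. Thread and memory options are left out on purpose, since suitable values depend on the machine.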

A practical guide to amplicon and metagenomic analysis of microbiome data

“Introduction to software for amplicon and metagenomic analysis” (Liu et al., 2021, p. 319) (pdf)

| Name | Link | Description and advantages | Reference |
| --- | --- | --- | --- |
| QIIME | http://qiime.org | The most highly cited and comprehensive amplicon analysis pipeline, providing hundreds of scripts for analyzing various data types and visualizations | (Caporaso et al., 2010) |
| QIIME 2 | https://qiime2.org; https://github.com/YongxinLiu/QIIME2ChineseManual | Next-generation amplicon pipeline that provides integrated command lines and a GUI, and supports reproducible analysis and big data. Provides interactive visualization and Chinese tutorial documents and videos | (Bolyen et al., 2019) |
| USEARCH | http://www.drive5.com/usearch; https://github.com/YongxinLiu/UsearchChineseManual | Alignment tool that includes more than 200 subcommands for amplicon analysis, with a small size (1 Mb), cross-platform support, high-speed calculation, and a free 32-bit version. The 64-bit version is commercial ($1485) | (Edgar, 2010) |
| VSEARCH | https://github.com/torognes/vsearch | A free USEARCH-like software tool. We recommend using it alone or in addition to USEARCH. Available as a plugin in QIIME 2 | (Rognes et al., 2016) |
| Trimmomatic | http://www.usadellab.org/cms/index.php?page=trimmomatic | Java-based software for quality control of metagenomic raw reads | (Bolger et al., 2014) |
| Bowtie 2 | http://bowtie-bio.sourceforge.net/bowtie2 | Rapid alignment tool used to remove host contamination or for quantification | (Langmead and Salzberg, 2012) |
| MetaPhlAn2 | https://bitbucket.org/biobakery/metaphlan2 | Taxonomic profiling tool with a marker gene database from more than 10,000 species. The output is relative abundance of strains | (Truong et al., 2015) |
| Kraken 2 | https://ccb.jhu.edu/software/kraken2 | A taxonomic classification tool that uses exact k-mer matches to the NCBI database, offers high accuracy and rapid classification, and outputs read counts for each species | (Wood et al., 2019) |
| HUMAnN2 | https://bitbucket.org/biobakery/humann2 | Based on the UniRef protein database, calculates gene family abundance, pathway coverage, and pathway abundance from metagenomic or metatranscriptomic data. Provides species’ contributions to a specific function | (Franzosa et al., 2018) |
| MEGAN | https://github.com/husonlab/megan-ce; http://www-ab.informatik.uni-tuebingen.de/software/megan6 | A cross-platform GUI software for taxonomic and functional analysis of metagenomic data. Supports many types of visualizations with metadata, including scatter plots, word clouds, Voronoi tree maps, clustering, and networks | (Huson et al., 2016) |
| MEGAHIT | https://github.com/voutcn/megahit | Ultra-fast and memory-efficient metagenomic assembler | (Li et al., 2015) |
| metaSPAdes | http://cab.spbu.ru/software/spades | High-quality metagenomic assembler, but time-consuming and with a large memory requirement | (Nurk et al., 2017) |
| MetaQUAST | http://quast.sourceforge.net/metaquast | Evaluates the quality of metagenomic assemblies, including N50 and misassemblies, and outputs PDF and interactive HTML reports | (Mikheenko et al., 2016) |
| MetaGeneMark | http://exon.gatech.edu/GeneMark/ | Gene prediction in bacteria, archaea, metagenomes and metatranscriptomes. Supports Linux and Mac OS X. Provides a web server for online analysis | (Zhu et al., 2010) |
| Prokka | http://www.vicbioinformatics.com/software.prokka.shtml | Provides rapid prokaryotic genome annotation; calls metaProdigal (Hyatt et al., 2012) for metagenomic gene prediction. Outputs nucleotide sequences, protein sequences, and annotation files of genes | (Seemann, 2014) |
| CD-HIT | http://weizhongli-lab.org/cd-hit | Used to construct non-redundant gene catalogs | (Fu et al., 2012) |
| Salmon | https://combine-lab.github.io/salmon | Provides ultra-fast quantification of read counts of genes using a k-mer-based method | (Patro et al., 2017) |
| metaWRAP | https://github.com/bxlab/metaWRAP | Binning pipeline that includes 140 tools and supports conda installation; default binning by MetaBAT, MaxBin, and CONCOCT. Provides refinement, quantification, taxonomic classification and visualization of bins | (Uritskiy et al., 2018) |
| DAS Tool | https://github.com/cmks/DAS_Tool | Binning pipeline that integrates five binning software packages and performs refinement | (Sieber et al., 2018) |
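
N50, one of the assembly statistics listed for MetaQUAST above, is simple enough to compute by hand. The sketch below is not MetaQUAST's implementation. It follows the usual definition: the contig length L such that contigs of length at least L hold at least half of the assembled bases. The function name and the example lengths are made up for illustration.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly, or 0 for an empty assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum past half the
        # total assembly size defines the N50.
        if running >= half_total:
            return length
    return 0


# Total is 300 bases; 100 + 80 = 180 already covers half of it.
print(n50([100, 80, 60, 40, 20]))  # 80
```

A higher N50 means longer contigs, but it says nothing about correctness, which is why MetaQUAST also reports misassemblies.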

A review of methods and databases for metagenomic classification and assembly

“A selection of quality control software tools for metagenomics data” (Breitwieser et al., 2019, p. 1127) (pdf)

| Tool | Synopsis | Reference | Web site |
| --- | --- | --- | --- |
| FastQC | Quality control tool showing statistics such as quality values, sequence length distribution and GC content distribution | [33] | http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |
| FastQ Screen | Screen a library against sequence databases to see if composition of library matches expectations | [37] | http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen |
| BBtools | BBDuk trims and filters reads using k-mers and entropy information. BBNorm normalizes coverage by down-sampling reads (digital normalization) | [35] | http://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/ |
| Trimmomatic | Flexible read trimming tool for Illumina data | [36] | http://www.usadellab.org/cms/?page=trimmomatic |
| Cutadapt | Find and remove adapter sequences, primers, poly-A tails and other types of unwanted sequence | [34] | https://cutadapt.readthedocs.io |
| khmer/diginorm | Tools for k-mer error trimming of reads and digital normalization of samples | [38, 39] | http://khmer.readthedocs.io |
| MultiQC | Summarize results from different analyses (such as FastQC) into one report | [40] | http://multiqc.info |
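
Digital normalization, mentioned for BBNorm and khmer/diginorm above, can be illustrated in a few lines. The idea is to keep a read only while the median count of its k-mers, among the reads kept so far, is below a coverage cutoff. This is a simplified sketch and not khmer's or BBNorm's code: it uses an exact `Counter` instead of a memory-efficient probabilistic counting structure, ignores reverse complements, and the function names, `k`, and `cutoff` values are illustrative assumptions.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads: Iterable[str], k: int = 5, cutoff: int = 2) -> List[str]:
    """Keep a read only if its median k-mer count so far is below cutoff."""
    counts: Counter = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add coverage
    return kept


# Five copies of one read plus one unrelated read (toy data).
reads = ["ACGTTGCAAG"] * 5 + ["TTTTTGGGGG"]
print(digital_normalization(reads, k=5, cutoff=2))
```

With these toy reads, only two copies of the abundant read survive and the rare read is kept, which is the coverage-flattening effect that makes downstream assembly cheaper.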

“Metagenomic classifiers, aligners and profilers” (Breitwieser et al., 2019, p. 1129) (pdf)

| Tool | Synopsis | Reference | Web site |
| --- | --- | --- | --- |
| Kraken | Fast taxonomic classifier using in-memory k-mer search of metagenomics reads against a database built from multiple genomes | [64] | https://ccb.jhu.edu/software/kraken/ |
| Kraken-HLL | Extension of Kraken counting unique k-mers for taxa and allowing multiple databases | | https://github.com/fbreitwieser/kraken-hll |
| CLARK(-S) | Fast taxonomic classifier using in-memory k-mer search of metagenomics reads against a database built from completed genomes. S extension uses spaced k-mer seeds for better classification | [65, 66] | http://clark.cs.ucr.edu |
| Kallisto | Taxonomic profiler using pseudo-alignment with k-mers using techniques based on transcript (RNA-seq) quantification | [67] | https://github.com/pachterlab/kallisto |
| k-SLAM | Taxonomic classifier using database of nonoverlapping k-mers in genomes. Reads are split into k-mers, and overlaps found by lexicographical ordering are pseudo-assembled | [68] | https://github.com/aindj/k-SLAM |
| Kaiju | Fast taxonomic classifier against protein sequences using FM-index with reduced amino acid alphabet | [69] | https://github.com/bioinformatics-centre/kaiju |
| DIAMOND | Protein homology search using spaced seeds with a reduced amino acid alphabet, 2000–20 000 times faster than BLASTX | [70] | https://github.com/bbuchfink/diamond |
| BLAST+ | Highly sensitive nucleotide and translated-nucleotide protein alignment | [61, 71] | https://blast.ncbi.nlm.nih.gov |
| MEGAN6/CE | Desktop and Web metagenomics analysis suite. Uses BLAST or diamond to match sequences and assigns LCA of matches | [72, 73] | http://ab.inf.uni-tuebingen.de/software/megan6/ |
| DUDes | Top-down assignment of metagenomics reads | [74] | https://sourceforge.net/projects/dudes/ |
| Taxonomer | Web-based metagenomics classifier including binning and visualization | [75] | http://taxonomer.iobio.io/ |
| GOTTCHA | Taxonomic profiler that maps reads against short unique subsequences (‘signature’) at multiple taxonomic ranks | [76] | http://lanl-bioinformatics.github.io/GOTTCHA/ |
| LMAT(-ML) | K-mer-based taxonomic read classifier using extensive database including draft genomes and eukaryotes. ML (Marker Library) extension reduces RAM requirements by stringent pruning of non-informative and overlapping k-mers | [77, 78] | https://sourceforge.net/projects/lmat/ |
| taxator-tk | Uses BLAST or LAST output for binning and taxonomic assignment via overlapping regions and pairwise distance measures | [79] | https://github.com/fungs/taxator-tk |
| Centrifuge | Fast taxonomic classifier using database compressed with FM-index, database and output format similar to Kraken | [80] | http://ccb.jhu.edu/software/centrifuge/ |
| MetaPhlAn 2 | Marker gene-based taxonomic profiler | [81] | https://bitbucket.org/biobakery/metaphlan2 |
| mOTU | Taxonomic profiler based on a set of 40 prokaryotic marker genes | [82] | http://www.bork.embl.de/software/mOTU/ |
| Mash | MinHash-based taxonomic profiler enabling super-fast overlap estimations | [83] | http://mash.readthedocs.io |
| sourmash | Alternative implementation of MinHash algorithm using fast searches with sequence bloom trees for taxonomic profiling | [84] | https://github.com/dib-lab/sourmash |
| PanPhlAn | Pan-genome-based phylogenomic analysis | [2] | http://segatalab.cibio.unitn.it/tools/panphlan/ |
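
The MinHash idea behind Mash and sourmash in the table above is to estimate how similar two k-mer sets are by comparing only their smallest hash values. Below is a minimal bottom-s sketch of that estimator. It is not the code of either tool: it hashes with SHA-1 from the standard library rather than a fast non-cryptographic hash, does not canonicalize reverse complements, and the function names, `k`, and `sketch_size` defaults are illustrative assumptions.

```python
import hashlib
from typing import Set


def kmer_hashes(seq: str, k: int) -> Set[int]:
    """Hash every k-mer of seq to a 64-bit integer."""
    return {
        int.from_bytes(hashlib.sha1(seq[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(seq) - k + 1)
    }


def jaccard_estimate(seq_a: str, seq_b: str, k: int = 4, sketch_size: int = 64) -> float:
    """Estimate the Jaccard index of two k-mer sets from bottom-s sketches."""
    sketch_a = set(sorted(kmer_hashes(seq_a, k))[:sketch_size])
    sketch_b = set(sorted(kmer_hashes(seq_b, k))[:sketch_size])
    # The smallest hashes of the union act as a random sample of the union.
    merged = sorted(sketch_a | sketch_b)[:sketch_size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in sketch_a and h in sketch_b)
    return shared / len(merged)


print(jaccard_estimate("ACGTACGTTGCA", "ACGTACGTTGCA"))  # identical input gives 1.0
print(jaccard_estimate("AAAAAAAA", "CCCCCCCC"))          # no shared k-mers gives 0.0
```

Because only `sketch_size` hashes are stored per sequence, comparisons stay cheap however long the genomes are, at the cost of some estimation error.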

“Tools for whole-genome assembly and metagenomics assembly” (Breitwieser et al., 2019, p. 1131) (pdf)

| Tool | Synopsis | Reference | Web site |
| --- | --- | --- | --- |
| Megahit | Co-assembly of metagenomic reads with variable k-mer lengths and low memory usage | [101] | https://github.com/voutcn/megahit |
| SPAdes | DBG assembler using multiple k-mers, works also for simple metagenomes | [102] | http://cab.spbu.ru/software/spades |
| MetaSPAdes | Extension of SPAdes with better assemblies with different abundances, conserved regions and strain mixtures | [103] | http://cab.spbu.ru/software/spades/ |
| Ray Meta | DBG assembler with fixed k-mer size | [104] | http://denovoassembler.sourceforge.net/ |
| MetaVelvet(-SL) | DBG assembler using fixed k-mer size. SL extension identifies and splits chimeric nodes | [105, 106] | http://metavelvet.dna.bio.keio.ac.jp |
| IDBA-UD | DBG assembler using multiple k-mer sizes, analyzes coverages between paths to give better assemblies in complex metagenomes with uneven coverage | [107] | http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/ |
| MetAMOS | Framework for metagenomic assembly, analysis and validation | [108] | http://metamos.readthedocs.io |
| MOCAT2 | Pipeline for read filtering, taxonomic profiling, assembly, gene prediction and functional analysis | [109] | http://mocat.embl.de/ |
| Anvi’o | Analysis and visualization platform for metagenomics assembly and binning | [110] | http://merenlab.org/software/anvio/ |

Contig binning

| Tool | Synopsis | Reference | Web site |
| --- | --- | --- | --- |
| MaxBin | Efficient binning of metagenomic contigs based on EM algorithm using nucleotide composition | [111] | https://downloads.jbei.org/data/microbial_communities/MaxBin/MaxBin.html |
| CONCOCT | Bins contigs using nucleotide composition, coverage data in multiple samples and paired-end read information | [112] | https://github.com/BinPro/CONCOCT |
| COCACOLA | Bins contigs using read coverage, correlation, sequence composition and paired-end read linkage | [113] | https://github.com/younglululu/COCACOLA |
| MetaBAT | Metagenome binning with abundance and tetra-nucleotide frequencies | [114] | https://bitbucket.org/berkeleylab/metabat |
| VizBin | Visualization of metagenomic data based on nonlinear dimension reduction | [115] | http://claczny.github.io/VizBin/ |
| AbundanceBin | Binning method based on k-mer frequency in reads | [116] | http://omics.informatics.indiana.edu/AbundanceBin/ |
| GroopM | Identifies population genomes using differential coverage of contigs | [117] | http://ecogenomics.github.io/GroopM/ |
| MetaCluster | Read and contig binning in two rounds for low- and high-abundance organisms using various k-mer lengths | [118, 119] | http://i.cs.hku.hk/~alse/MetaCluster/ |
| PhyloPythiaS(+) | Assigns contigs to taxonomic bin using support vector machine trained on reference sequences | [120, 121] | https://github.com/algbioi/ppsp/wiki |

Assembly and binning quality assessment

| Tool | Synopsis | Reference | Web site |
| --- | --- | --- | --- |
| MetaQuast | Evaluate and compare metagenomics assemblies based on alignments with reference genomes | [122] | http://quast.sourceforge.net/metaquast |
| BUSCO | Assess genome assembly and gene set completeness based on single-copy orthologs, also for eukaryotes | [123] | http://busco.ezlab.org/ |
| CheckM | Tools for assessing quality of (meta)genomic assemblies providing genome completion and contamination estimates, especially for bacteria and viruses | [56] | http://ecogenomics.github.io/CheckM/ |