
Removing plastids and non-bacteria

Last published: 2022-11-04 11:23:23

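The R script keeps bacteria and archaea (eukaryotes), removes chloroplasts and mitochondria, and reports the proportion of each; it outputs a filtered and sorted OTU table.
Inputs: the OTU table results/matrix/otutab.txt and the taxonomy annotation results/matrix/otus.sintax.
Outputs: the filtered and sorted feature table results/otutab.txt,
the contamination statistics file results/matrix/otutab_nonBac.stat, and the filtering details results/matrix/otus.sintax.discard.
For fungal ITS data, use the otutab_filter_nonFungi.R script instead, which keeps only fungi.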

Filtering the feature table

otutab.txt

#OTU IDKO1KO2KO3KO4KO5KO6OE1OE2OE3OE4OE5OE6WT1WT2WT3WT4WT5WT6
ASV_1382671282438381390476590500360490357799830567664508519
ASV_10107129231264372240706582105138103138201839418993
ASV_1002234461022432027232338191228162227
ASV_1000113002113120001143
ASV_1001310003330112103401
ASV_1002411442522537673501

otus.sintax

ASV_13	d:Bacteria(1.00),p:Bacteroidetes(1.00),c:Flavobacteriia(1.00),o:Flavobacteriales(1.00),f:Flavobacteriaceae(1.00),g:Flavobacterium(1.00)	+	d:Bacteria,p:Bacteroidetes,c:Flavobacteriia,o:Flavobacteriales,f:Flavobacteriaceae,g:Flavobacterium
ASV_17	d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(1.00),g:Piscinibacter(0.32)	+	d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Piscinibacter
ASV_7	d:Bacteria(1.00),p:Bacteroidetes(1.00),c:Flavobacteriia(1.00),o:Flavobacteriales(1.00),f:Flavobacteriaceae(1.00),g:Flavobacterium(1.00)	+	d:Bacteria,p:Bacteroidetes,c:Flavobacteriia,o:Flavobacteriales,f:Flavobacteriaceae,g:Flavobacterium
ASV_5	d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Chloroplast(1.00),f:Chloroplast(1.00),g:Streptophyta(1.00)	+	d:Bacteria,p:Cyanobacteria/Chloroplast,c:Chloroplast,f:Chloroplast,g:Streptophyta
ASV_1	d:Bacteria(1.00),p:Actinobacteria(1.00),c:Actinobacteria(1.00),o:Streptosporangiales(0.85),f:Thermomonosporaceae(0.80),g:Actinocorallia(0.48)	+	d:Bacteria,p:Actinobacteria,c:Actinobacteria,o:Streptosporangiales,f:Thermomonosporaceae,g:Actinocorallia
ASV_2	d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(1.00),g:Pelomonas(1.00)	+	d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Pelomonas
ASV_3	d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(1.00),g:Rhizobacter(0.92)	+	d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Rhizobacter

Run the filtering script:

Rscript script/otutab_filter_nonBac.R \
    --input results/matrix/otutab.txt \
    --taxonomy results/matrix/otus.sintax \
    --output results/otutab.txt \
    --stat results/matrix/otutab_nonBac.stat \
    --discard results/matrix/otus.sintax.discard
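
The kind of per-OTU decision the script reports can be illustrated with a short shell sketch. This is not the R script's actual logic: the substring patterns, the category names, and the helper `classify_sintax` are assumptions made for illustration. It reads the fourth sintax column (the confidence-filtered lineage) and labels each OTU as keep, chloroplast, mitochondria, or nonspecific.

```shell
# classify_sintax FILE: print "OTU_ID<TAB>category" for each sintax record.
# Hypothetical helper; the matching rules below are assumptions, not the
# rules used by otutab_filter_nonBac.R.
classify_sintax() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
  {
    tax = $4                                   # confidence-filtered lineage
    if (tax ~ /[Mm]itochondria/)               category = "mitochondria"
    else if (tax ~ /[Cc]hloroplast/)           category = "chloroplast"
    else if (tax ~ /^d:Bacteria/ || tax ~ /^d:Archaea/) category = "keep"
    else                                       category = "nonspecific"
    print $1, category
  }' "$1"
}

# Small demo file built from shortened versions of records shown on this page.
sintax_demo=$(mktemp -d)
printf 'ASV_1\td:Bacteria(1.00),p:Actinobacteria(1.00)\t+\td:Bacteria,p:Actinobacteria\n' > "$sintax_demo/otus.sintax"
printf 'ASV_5\td:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Chloroplast(1.00)\t+\td:Bacteria,p:Cyanobacteria/Chloroplast,c:Chloroplast\n' >> "$sintax_demo/otus.sintax"
printf 'ASV_944\td:Eukaryota(0.68),p:Plantae(0.68)\t+\td:Eukaryota,p:Plantae\n' >> "$sintax_demo/otus.sintax"

classify_sintax "$sintax_demo/otus.sintax"
```

Matching on the substring Chloroplast also flags the combined phylum Cyanobacteria/Chloroplast, which is consistent with the otus.sintax.discard listing on this page, where Cyanobacteria OTUs appear alongside chloroplast ones.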

otutab.txt

#OTUIDKO1KO2KO3KO4KO5KO6OE1OE2OE3OE4OE5OE6WT1WT2WT3WT4WT5WT6
ASV_1382671282438381390476590500360490357799830567664508519
ASV_26714128637931032631245169414440348239440536267386497424
ASV_3213173369296445308211151372367316242327635357416413274
ASV_4506141181257480178203196134210214129362473196192394290
ASV_818919521514877101132183540300211190230275199356169184
ASV_63022323073094903658995114143134137178191154184158158

otutab_nonBac.stat

SampleID	total_reads	nonspecific_reads	chloroplast_reads	mitochondria_reads	filtered_reads
KO1	11355	8	17	0	11330
KO2	12473	4	52	0	12417
KO3	13494	1	74	0	13419
KO4	13171	1	13	0	13157
KO5	13847	0	61	0	13786
KO6	12598	1	76	0	12521
OE1	11992	14	513	0	11465
OE2	11738	6	431	0	11301
OE3	11956	6	282	0	11668
OE4	11799	10	143	0	11646
OE5	12692	4	629	0	12059
OE6	11585	5	552	0	11028
WT1	13162	9	520	0	12633
WT2	13389	11	450	0	12928
WT3	12370	10	372	0	11988
WT4	12807	12	325	0	12470
WT5	13303	10	592	0	12701
WT6	12615	11	170	0	12434
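
To turn these counts into the contamination proportion per sample, a small awk helper can be used. This is a sketch that assumes the stat file is tab-separated with the six columns named in its header; `removed_percent` is a hypothetical helper, not part of the pipeline.

```shell
# removed_percent FILE: per-sample number and percentage of reads removed,
# computed as total_reads (column 2) minus filtered_reads (column 6).
removed_percent() {
  awk -F'\t' '
  NR == 1 { print "SampleID\tremoved_reads\tremoved_pct"; next }
  {
    removed = $2 - $6
    pct = ($2 > 0) ? 100 * removed / $2 : 0    # guard against empty samples
    printf "%s\t%d\t%.2f\n", $1, removed, pct
  }' "$1"
}

# Demo with the KO1 and OE1 rows of the stat file shown above.
stat_demo=$(mktemp -d)
printf 'SampleID\ttotal_reads\tnonspecific_reads\tchloroplast_reads\tmitochondria_reads\tfiltered_reads\n' > "$stat_demo/otutab_nonBac.stat"
printf 'KO1\t11355\t8\t17\t0\t11330\n' >> "$stat_demo/otutab_nonBac.stat"
printf 'OE1\t11992\t14\t513\t0\t11465\n' >> "$stat_demo/otutab_nonBac.stat"

removed_percent "$stat_demo/otutab_nonBac.stat"
```

For these two rows the helper reports 25 removed reads (0.22%) for KO1 and 527 removed reads (4.39%) for OE1.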

otus.sintax.discard

ASV_944	d:Eukaryota(0.68),p:Plantae(0.68),c:Liliopsida(0.68),o:Poales(0.68),f:Poaceae(0.68),g:Zea(0.68)	+	d:Eukaryota,p:Plantae,c:Liliopsida,o:Poales,f:Poaceae,g:Zea
ASV_1428	NA	NA	NA
ASV_5	d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Chloroplast(1.00),f:Chloroplast(1.00),g:Streptophyta(1.00)	+	d:Bacteria,p:Cyanobacteria/Chloroplast,c:Chloroplast,f:Chloroplast,g:Streptophyta
ASV_137	d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Cyanobacteria(1.00),f:Family_IX(0.01),g:GpIX(0.01)	+	d:Bacteria,p:Cyanobacteria/Chloroplast,c:Cyanobacteria
ASV_131	d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Cyanobacteria(1.00),f:Family_I(0.82),g:Potamolinea(0.08)	+	d:Bacteria,p:Cyanobacteria/Chloroplast,c:Cyanobacteria,f:Family_I
ASV_597	d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Cyanobacteria(1.00),f:Family_I(1.00),g:Desikacharya(0.58)	+	d:Bacteria,p:Cyanobacteria/Chloroplast,c:Cyanobacteria,f:Family_I,g:Desikacharya
ASV_978	d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Chloroplast(1.00),f:Chloroplast(1.00),g:Streptophyta(1.00)	+	d:Bacteria,p:Cyanobacteria/Chloroplast,c:Chloroplast,f:Chloroplast,g:Streptophyta
ASV_979	d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(0.97),c:Cyanobacteria(0.84),f:Family_XIII(0.12),g:Cephalothrix(0.07)	+	d:Bacteria,p:Cyanobacteria/Chloroplast,c:Cyanobacteria,f:Family_XIII
ASV_1147	d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Cyanobacteria(1.00),f:Family_I(0.68),g:Potamolinea(0.14)	+	d:Bacteria,p:Cyanobacteria/Chloroplast,c:Cyanobacteria,f:Family_I,g:Potamolinea

Filtering the sequences that correspond to the feature table

cut -f 1 results/otutab.txt | tail -n+2 > results/otutab.id
usearch -fastx_getseqs results/raw/otus.fa \
    -labels results/otutab.id -fastaout results/otus.fa

otus.fa

>ASV_1
GTAGTCCACGCCGTAAACGGTGGGCGCTAGATGTGGGGACCTTCCACGGTTTCTGCGTCGCAGCTAACGCATTAAGCGCC
CCGCCTGGGGAGTACGGTCGCAAGACTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGTTGCTTA
ATTCGACGCAACGCGAAGAACCTTACCAAGGCTTGACATCGCCGGAAAACTCGCAGAGATGCGGGGTCCTTTTGGGCCGG
TGACAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGTTCT
ATGTTGCCAGCACGCCCTTCGGGGTGGTGGGGACTCATAGGAGACTGCCGGGGTCAACTCGGA
>ASV_2
GTAGTCCACGCCCTAAACGATGTCAACTGGTTGTTGGGAGGGTTTCTTCTCAGTAACGTAGCTAACGCGTGAAGTTGACC
GCCTGGGGAGTACGGCCGCAAGGTTGAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGTTTAAT
TCGATGCAACGCGAAAAACCTTACCTACCCTTGACATGTCTGGAATCCTGAAGAGATTTGGGAGTGCTCGAAAGAGAGCC
AGAACACAGGTGCTGCATGGCCGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGT
CATTAGTTGCTACGAAAGGGCACTCTAATGAGACTGCCGGTGACAAACCGGA

Filtering the taxonomy annotations that correspond to the feature table

awk 'NR==FNR{a[$1]=$0}NR>FNR{print a[$1]}' \
    results/matrix/otus.sintax results/otutab.id \
    > results/otus.sintax

otus.sintax

ASV_1	d:Bacteria(1.00),p:Actinobacteria(1.00),c:Actinobacteria(1.00),o:Streptosporangiales(0.85),f:Thermomonosporaceae(0.80),g:Actinocorallia(0.48)	+	d:Bacteria,p:Actinobacteria,c:Actinobacteria,o:Streptosporangiales,f:Thermomonosporaceae,g:Actinocorallia
ASV_2	d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(1.00),g:Pelomonas(1.00)	+	d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Pelomonas
ASV_3	d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(1.00),g:Rhizobacter(0.92)	+	d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Rhizobacter
ASV_4	d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(1.00),g:Rhizobacter(1.00)	+	d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Rhizobacter
ASV_8	d:Bacteria(1.00),p:Actinobacteria(1.00),c:Actinobacteria(1.00),o:Streptomycetales(1.00),f:Streptomycetaceae(1.00),g:Streptomyces(0.99)	+	d:Bacteria,p:Actinobacteria,c:Actinobacteria,o:Streptomycetales,f:Streptomycetaceae,g:Streptomyces
ASV_6	d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(0.93),g:Piscinibacter(0.28)	+	d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Piscinibacter
ASV_7	d:Bacteria(1.00),p:Bacteroidetes(1.00),c:Flavobacteriia(1.00),o:Flavobacteriales(1.00),f:Flavobacteriaceae(1.00),g:Flavobacterium(1.00)	+	d:Bacteria,p:Bacteroidetes,c:Flavobacteriia,o:Flavobacteriales,f:Flavobacteriaceae,g:Flavobacterium
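
The awk lookup prints an empty line for any ID that is missing from the raw sintax file, so it can be worth confirming that the filtered table, sequences, and annotations all list the same OTU IDs. The sketch below assumes tab-separated files and FASTA headers that contain only the OTU ID; `check_ids` is a hypothetical helper and the demo files hold toy data.

```shell
# check_ids OTUTAB FASTA SINTAX: succeed only if all three files list
# exactly the same set of OTU IDs.
check_ids() {
  local tab_ids fa_ids tax_ids
  tab_ids=$(tail -n +2 "$1" | cut -f 1 | sort)       # skip the header row
  fa_ids=$(grep '^>' "$2" | sed 's/^>//' | sort)     # FASTA header lines
  tax_ids=$(cut -f 1 "$3" | sort)                    # first sintax column
  if [ "$tab_ids" = "$fa_ids" ] && [ "$tab_ids" = "$tax_ids" ]; then
    echo "consistent"
  else
    echo "mismatch"
    return 1
  fi
}

# Toy files (placeholder counts and sequences) to exercise the check.
ids_demo=$(mktemp -d)
printf '#OTU ID\tS1\nASV_1\t3\nASV_2\t5\n' > "$ids_demo/otutab.txt"
printf '>ASV_1\nACGT\n>ASV_2\nTTGA\n' > "$ids_demo/otus.fa"
printf 'ASV_1\td:Bacteria(1.00)\t+\td:Bacteria\nASV_2\td:Bacteria(1.00)\t+\td:Bacteria\n' > "$ids_demo/otus.sintax"

check_ids "$ids_demo/otutab.txt" "$ids_demo/otus.fa" "$ids_demo/otus.sintax"
```

On the real outputs the call would be `check_ids results/otutab.txt results/otus.fa results/otus.sintax`.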

OTU table statistics

usearch -otutab_stats results/otutab.txt \
    -output results/otutab.stat

otutab.stat

  220951  Reads (221.0k)
        18  Samples
      1521  OTUs

     27378  Counts
      5170  Count  =0  (18.9%)
      4891  Count  =1  (17.9%)
      4017  Count >=10 (14.7%)

       437  OTUs found in all samples (28.7%)
       609  OTUs found in 90% of samples (40.0%)
      1426  OTUs found in 50% of samples (93.8%)

Sample sizes: min 11028, lo 11646, med 12434, mean 12275.1, hi 12701, max 13786
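
The sample sizes summarized above are the per-sample column sums of the OTU table, so they can be cross-checked without usearch. The sketch below assumes a tab-separated table whose first column is the OTU ID; `sample_sizes` is a hypothetical helper and the demo table holds toy counts.

```shell
# sample_sizes FILE: print "sample<TAB>total_reads" for every sample column.
sample_sizes() {
  awk -F'\t' '
  NR == 1 { n = NF; for (i = 2; i <= n; i++) name[i] = $i; next }
  { for (i = 2; i <= n; i++) sum[i] += $i }
  END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], sum[i] }' "$1"
}

# Toy table with two samples and two OTUs.
size_demo=$(mktemp -d)
printf '#OTU ID\tS1\tS2\nASV_1\t10\t5\nASV_2\t0\t7\n' > "$size_demo/otutab.txt"

sample_sizes "$size_demo/otutab.txt"
```

Run on results/otutab.txt, the totals should agree with the filtered_reads column of otutab_nonBac.stat.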

The following files are now available for downstream analysis:

results/otutab.txt
results/otus.fa
results/otus.sintax