Removing plastids and non-bacteria
Last published: 2022-11-04 11:23:23
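The R script selects bacteria and archaea (or eukaryotes), removes chloroplast and mitochondrial sequences, and reports their proportions; it outputs a filtered and sorted OTU table.
Input: the OTU table results/matrix/otutab.txt and the taxonomy annotation results/matrix/otus.sintax.
Output: the filtered and sorted feature table results/otutab.txt, plus
the contamination statistics file results/matrix/otutab_nonBac.stat and the filtering details in results/matrix/otus.sintax.discard.
For fungal ITS data, use the otutab_filter_nonFungi.R script instead, which keeps only fungi.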
Filter the feature table
otutab.txt
#OTU ID | KO1 | KO2 | KO3 | KO4 | KO5 | KO6 | OE1 | OE2 | OE3 | OE4 | OE5 | OE6 | WT1 | WT2 | WT3 | WT4 | WT5 | WT6 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ASV_1 | 382 | 671 | 282 | 438 | 381 | 390 | 476 | 590 | 500 | 360 | 490 | 357 | 799 | 830 | 567 | 664 | 508 | 519 |
ASV_10 | 107 | 129 | 231 | 264 | 372 | 240 | 70 | 65 | 82 | 105 | 138 | 103 | 138 | 201 | 83 | 94 | 189 | 93 |
ASV_100 | 22 | 34 | 4 | 6 | 10 | 22 | 43 | 20 | 27 | 23 | 23 | 38 | 19 | 12 | 28 | 16 | 22 | 27 |
ASV_1000 | 1 | 1 | 3 | 0 | 0 | 2 | 1 | 1 | 3 | 1 | 2 | 0 | 0 | 0 | 1 | 1 | 4 | 3 |
ASV_1001 | 3 | 1 | 0 | 0 | 0 | 3 | 3 | 3 | 0 | 1 | 1 | 2 | 1 | 0 | 3 | 4 | 0 | 1 |
ASV_1002 | 4 | 1 | 1 | 4 | 4 | 2 | 5 | 2 | 2 | 5 | 3 | 7 | 6 | 7 | 3 | 5 | 0 | 1 |
otus.sintax
OTU ID | Prediction with confidence | Strand | Prediction above cutoff |
---|---|---|---|
ASV_13 | d:Bacteria(1.00),p:Bacteroidetes(1.00),c:Flavobacteriia(1.00),o:Flavobacteriales(1.00),f:Flavobacteriaceae(1.00),g:Flavobacterium(1.00) | + | d:Bacteria,p:Bacteroidetes,c:Flavobacteriia,o:Flavobacteriales,f:Flavobacteriaceae,g:Flavobacterium |
ASV_17 | d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(1.00),g:Piscinibacter(0.32) | + | d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Piscinibacter |
ASV_7 | d:Bacteria(1.00),p:Bacteroidetes(1.00),c:Flavobacteriia(1.00),o:Flavobacteriales(1.00),f:Flavobacteriaceae(1.00),g:Flavobacterium(1.00) | + | d:Bacteria,p:Bacteroidetes,c:Flavobacteriia,o:Flavobacteriales,f:Flavobacteriaceae,g:Flavobacterium |
ASV_5 | d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Chloroplast(1.00),f:Chloroplast(1.00),g:Streptophyta(1.00) | + | d:Bacteria,p:Cyanobacteria/Chloroplast,c:Chloroplast,f:Chloroplast,g:Streptophyta |
ASV_1 | d:Bacteria(1.00),p:Actinobacteria(1.00),c:Actinobacteria(1.00),o:Streptosporangiales(0.85),f:Thermomonosporaceae(0.80),g:Actinocorallia(0.48) | + | d:Bacteria,p:Actinobacteria,c:Actinobacteria,o:Streptosporangiales,f:Thermomonosporaceae,g:Actinocorallia |
ASV_2 | d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(1.00),g:Pelomonas(1.00) | + | d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Pelomonas |
ASV_3 | d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(1.00),g:Rhizobacter(0.92) | + | d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Rhizobacter |
Rscript script/otutab_filter_nonBac.R \
  --input results/matrix/otutab.txt \
  --taxonomy results/matrix/otus.sintax \
  --output results/otutab.txt \
  --stat results/matrix/otutab_nonBac.stat \
  --discard results/matrix/otus.sintax.discard
otutab.txt
#OTUID | KO1 | KO2 | KO3 | KO4 | KO5 | KO6 | OE1 | OE2 | OE3 | OE4 | OE5 | OE6 | WT1 | WT2 | WT3 | WT4 | WT5 | WT6 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ASV_1 | 382 | 671 | 282 | 438 | 381 | 390 | 476 | 590 | 500 | 360 | 490 | 357 | 799 | 830 | 567 | 664 | 508 | 519 |
ASV_2 | 671 | 412 | 863 | 793 | 1032 | 631 | 245 | 169 | 414 | 440 | 348 | 239 | 440 | 536 | 267 | 386 | 497 | 424 |
ASV_3 | 213 | 173 | 369 | 296 | 445 | 308 | 211 | 151 | 372 | 367 | 316 | 242 | 327 | 635 | 357 | 416 | 413 | 274 |
ASV_4 | 506 | 141 | 181 | 257 | 480 | 178 | 203 | 196 | 134 | 210 | 214 | 129 | 362 | 473 | 196 | 192 | 394 | 290 |
ASV_8 | 189 | 195 | 215 | 148 | 77 | 101 | 132 | 183 | 540 | 300 | 211 | 190 | 230 | 275 | 199 | 356 | 169 | 184 |
ASV_6 | 302 | 232 | 307 | 309 | 490 | 365 | 89 | 95 | 114 | 143 | 134 | 137 | 178 | 191 | 154 | 184 | 158 | 158 |
otutab_nonBac.stat
SampleID | total_reads | nonspecific_reads | chloroplast_reads | mitochondria_reads | filtered_reads |
---|---|---|---|---|---|
KO1 | 11355 | 8 | 17 | 0 | 11330 |
KO2 | 12473 | 4 | 52 | 0 | 12417 |
KO3 | 13494 | 1 | 74 | 0 | 13419 |
KO4 | 13171 | 1 | 13 | 0 | 13157 |
KO5 | 13847 | 0 | 61 | 0 | 13786 |
KO6 | 12598 | 1 | 76 | 0 | 12521 |
OE1 | 11992 | 14 | 513 | 0 | 11465 |
OE2 | 11738 | 6 | 431 | 0 | 11301 |
OE3 | 11956 | 6 | 282 | 0 | 11668 |
OE4 | 11799 | 10 | 143 | 0 | 11646 |
OE5 | 12692 | 4 | 629 | 0 | 12059 |
OE6 | 11585 | 5 | 552 | 0 | 11028 |
WT1 | 13162 | 9 | 520 | 0 | 12633 |
WT2 | 13389 | 11 | 450 | 0 | 12928 |
WT3 | 12370 | 10 | 372 | 0 | 11988 |
WT4 | 12807 | 12 | 325 | 0 | 12470 |
WT5 | 13303 | 10 | 592 | 0 | 12701 |
WT6 | 12615 | 11 | 170 | 0 | 12434 |
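The read counts above can be turned into per-sample contamination percentages. The following is a minimal sketch, assuming the stat file is tab-separated with the six columns shown in the table; the three data rows are copied from the table, and the temporary file and variable names are hypothetical.

```bash
# Sketch: percentage of reads removed per sample from a stat-style table.
# Assumption: tab-separated columns SampleID, total_reads, nonspecific_reads,
# chloroplast_reads, mitochondria_reads, filtered_reads.
stat_file=$(mktemp)
printf 'SampleID\ttotal_reads\tnonspecific_reads\tchloroplast_reads\tmitochondria_reads\tfiltered_reads\n' > "$stat_file"
printf 'KO1\t11355\t8\t17\t0\t11330\n' >> "$stat_file"
printf 'OE5\t12692\t4\t629\t0\t12059\n' >> "$stat_file"
printf 'WT5\t13303\t10\t592\t0\t12701\n' >> "$stat_file"

removed_pct=$(awk -F'\t' 'NR > 1 {
    removed = $3 + $4 + $5
    # Consistency check: removed reads plus kept reads should equal the total.
    if (removed + $6 != $2) { print $1 "\tINCONSISTENT"; next }
    printf "%s\t%.2f\n", $1, 100 * removed / $2
}' "$stat_file")

echo "$removed_pct"
rm -f "$stat_file"
```

For these three rows the removed fraction comes out at 0.22%, 4.99% and 4.53% of total reads, so the OE and WT samples carry noticeably more chloroplast reads than KO1.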
otus.sintax.discard
OTU ID | Prediction with confidence | Strand | Prediction above cutoff |
---|---|---|---|
ASV_944 | d:Eukaryota(0.68),p:Plantae(0.68),c:Liliopsida(0.68),o:Poales(0.68),f:Poaceae(0.68),g:Zea(0.68) | + | d:Eukaryota,p:Plantae,c:Liliopsida,o:Poales,f:Poaceae,g:Zea |
ASV_1428 | NA | NA | NA |
ASV_5 | d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Chloroplast(1.00),f:Chloroplast(1.00),g:Streptophyta(1.00) | + | d:Bacteria,p:Cyanobacteria/Chloroplast,c:Chloroplast,f:Chloroplast,g:Streptophyta |
ASV_137 | d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Cyanobacteria(1.00),f:Family_IX(0.01),g:GpIX(0.01) | + | d:Bacteria,p:Cyanobacteria/Chloroplast,c:Cyanobacteria |
ASV_131 | d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Cyanobacteria(1.00),f:Family_I(0.82),g:Potamolinea(0.08) | + | d:Bacteria,p:Cyanobacteria/Chloroplast,c:Cyanobacteria,f:Family_I |
ASV_597 | d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Cyanobacteria(1.00),f:Family_I(1.00),g:Desikacharya(0.58) | + | d:Bacteria,p:Cyanobacteria/Chloroplast,c:Cyanobacteria,f:Family_I,g:Desikacharya |
ASV_978 | d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Chloroplast(1.00),f:Chloroplast(1.00),g:Streptophyta(1.00) | + | d:Bacteria,p:Cyanobacteria/Chloroplast,c:Chloroplast,f:Chloroplast,g:Streptophyta |
ASV_979 | d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(0.97),c:Cyanobacteria(0.84),f:Family_XIII(0.12),g:Cephalothrix(0.07) | + | d:Bacteria,p:Cyanobacteria/Chloroplast,c:Cyanobacteria,f:Family_XIII |
ASV_1147 | d:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Cyanobacteria(1.00),f:Family_I(0.68),g:Potamolinea(0.14) | + | d:Bacteria,p:Cyanobacteria/Chloroplast,c:Cyanobacteria,f:Family_I,g:Potamolinea |
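To get an overview of why entries landed in the discard listing, the rows can be tallied by category. This is only a sketch: the category rules below are inferred from the rows shown above and are not taken from otutab_filter_nonBac.R, the taxonomy strings are abbreviated copies of those rows, and the file is assumed to be tab-separated.

```bash
# Sketch: tally discard reasons in a SINTAX-style discard file.
# Assumption: the rules below are guesses based on the listing, not the
# actual logic of the R script.
discard_file=$(mktemp)
printf '%s\t%s\t%s\t%s\n' 'ASV_944'  'd:Eukaryota(0.68),p:Plantae(0.68)' '+' 'd:Eukaryota,p:Plantae' > "$discard_file"
printf '%s\t%s\t%s\t%s\n' 'ASV_1428' 'NA' 'NA' 'NA' >> "$discard_file"
printf '%s\t%s\t%s\t%s\n' 'ASV_5'    'd:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Chloroplast(1.00)' '+' 'd:Bacteria,p:Cyanobacteria/Chloroplast,c:Chloroplast' >> "$discard_file"
printf '%s\t%s\t%s\t%s\n' 'ASV_137'  'd:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00),c:Cyanobacteria(1.00)' '+' 'd:Bacteria,p:Cyanobacteria/Chloroplast,c:Cyanobacteria' >> "$discard_file"

reason_counts=$(awk -F'\t' '{
    if ($2 == "NA")                         reason = "unclassified"
    else if ($2 ~ /Chloroplast/)            reason = "chloroplast_label"
    else if ($2 ~ /Mitochondria/)           reason = "mitochondria_label"
    else if ($2 !~ /^d:(Bacteria|Archaea)/) reason = "non_prokaryote"
    else                                    reason = "other"
    n[reason]++
}
END { for (r in n) print r "\t" n[r] }' "$discard_file" | sort)

echo "$reason_counts"
rm -f "$discard_file"
```

Note that a plain match on "Chloroplast" also catches rows whose class is Cyanobacteria, because the phylum label itself is Cyanobacteria/Chloroplast. That is consistent with ASV_137 appearing in the listing, but whether it is intended is worth checking for your own data.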
Filter the sequences corresponding to the feature table
cut -f 1 results/otutab.txt | tail -n+2 > results/otutab.id
usearch -fastx_getseqs results/raw/otus.fa \
-labels results/otutab.id -fastaout results/otus.fa
otus.fa
>ASV_1
GTAGTCCACGCCGTAAACGGTGGGCGCTAGATGTGGGGACCTTCCACGGTTTCTGCGTCGCAGCTAACGCATTAAGCGCC
CCGCCTGGGGAGTACGGTCGCAAGACTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGTTGCTTA
ATTCGACGCAACGCGAAGAACCTTACCAAGGCTTGACATCGCCGGAAAACTCGCAGAGATGCGGGGTCCTTTTGGGCCGG
TGACAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGTTCT
ATGTTGCCAGCACGCCCTTCGGGGTGGTGGGGACTCATAGGAGACTGCCGGGGTCAACTCGGA
>ASV_2
GTAGTCCACGCCCTAAACGATGTCAACTGGTTGTTGGGAGGGTTTCTTCTCAGTAACGTAGCTAACGCGTGAAGTTGACC
GCCTGGGGAGTACGGCCGCAAGGTTGAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGTTTAAT
TCGATGCAACGCGAAAAACCTTACCTACCCTTGACATGTCTGGAATCCTGAAGAGATTTGGGAGTGCTCGAAAGAGAGCC
AGAACACAGGTGCTGCATGGCCGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGT
CATTAGTTGCTACGAAAGGGCACTCTAATGAGACTGCCGGTGACAAACCGGA
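After extracting sequences, it can be worth confirming that every ID in the list actually has a sequence in the FASTA file. A minimal sketch with toy data follows; the sequences are placeholders, ASV_3 is left out on purpose, and all file and variable names are hypothetical.

```bash
# Sketch: compare an ID list against the headers of a FASTA file.
ids=$(mktemp)
fasta=$(mktemp)
printf 'ASV_1\nASV_2\nASV_3\n' > "$ids"
printf '>ASV_1\nGTAGTCCACG\n>ASV_2\nGTAGTCCACG\n' > "$fasta"   # toy sequences

n_ids=$(wc -l < "$ids" | tr -d ' ')      # number of requested IDs
n_seqs=$(grep -c '^>' "$fasta")          # number of FASTA records

# First pass (FASTA): remember each header without the leading ">".
# Second pass (ID list): print IDs that were never seen as a header.
missing_ids=$(awk 'NR==FNR { if (/^>/) seen[substr($1, 2)] = 1; next }
                   !($1 in seen)' "$fasta" "$ids")

echo "ids=$n_ids seqs=$n_seqs missing=$missing_ids"
rm -f "$ids" "$fasta"
```

With real data, the same three commands would be pointed at results/otutab.id and results/otus.fa; an empty missing list and equal counts indicate that the feature table and the sequence file are in sync.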
Filter the taxonomy annotations corresponding to the feature table
awk 'NR==FNR{a[$1]=$0}NR>FNR{print a[$1]}' \
  results/matrix/otus.sintax results/otutab.id \
  > results/otus.sintax
otus.sintax
OTU ID | Prediction with confidence | Strand | Prediction above cutoff |
---|---|---|---|
ASV_1 | d:Bacteria(1.00),p:Actinobacteria(1.00),c:Actinobacteria(1.00),o:Streptosporangiales(0.85),f:Thermomonosporaceae(0.80),g:Actinocorallia(0.48) | + | d:Bacteria,p:Actinobacteria,c:Actinobacteria,o:Streptosporangiales,f:Thermomonosporaceae,g:Actinocorallia |
ASV_2 | d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(1.00),g:Pelomonas(1.00) | + | d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Pelomonas |
ASV_3 | d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(1.00),g:Rhizobacter(0.92) | + | d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Rhizobacter |
ASV_4 | d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(1.00),g:Rhizobacter(1.00) | + | d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Rhizobacter |
ASV_8 | d:Bacteria(1.00),p:Actinobacteria(1.00),c:Actinobacteria(1.00),o:Streptomycetales(1.00),f:Streptomycetaceae(1.00),g:Streptomyces(0.99) | + | d:Bacteria,p:Actinobacteria,c:Actinobacteria,o:Streptomycetales,f:Streptomycetaceae,g:Streptomyces |
ASV_6 | d:Bacteria(1.00),p:Proteobacteria(1.00),c:Betaproteobacteria(1.00),o:Burkholderiales(1.00),f:Comamonadaceae(0.93),g:Piscinibacter(0.28) | + | d:Bacteria,p:Proteobacteria,c:Betaproteobacteria,o:Burkholderiales,f:Comamonadaceae,g:Piscinibacter |
ASV_7 | d:Bacteria(1.00),p:Bacteroidetes(1.00),c:Flavobacteriia(1.00),o:Flavobacteriales(1.00),f:Flavobacteriaceae(1.00),g:Flavobacterium(1.00) | + | d:Bacteria,p:Bacteroidetes,c:Flavobacteriia,o:Flavobacteriales,f:Flavobacteriaceae,g:Flavobacterium |
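The awk command for this step relies on the two-file idiom: while reading the first file (NR==FNR) it stores each annotation line keyed by OTU ID, and while reading the second file it prints the stored line for each ID, so the output follows the order of the ID list. One caveat is that an ID with no stored annotation prints an empty line. The sketch below shows a guarded variant on toy data; the taxonomy strings are abbreviated, ASV_9 is a hypothetical ID with no annotation, and the file names are made up.

```bash
# Sketch: subset annotations by an ID list, skipping IDs without annotation.
sintax=$(mktemp)
ids=$(mktemp)
printf '%s\t%s\t%s\t%s\n' 'ASV_5' 'd:Bacteria(1.00),p:Cyanobacteria/Chloroplast(1.00)' '+' 'd:Bacteria,p:Cyanobacteria/Chloroplast' > "$sintax"
printf '%s\t%s\t%s\t%s\n' 'ASV_1' 'd:Bacteria(1.00),p:Actinobacteria(1.00)' '+' 'd:Bacteria,p:Actinobacteria' >> "$sintax"
printf '%s\t%s\t%s\t%s\n' 'ASV_2' 'd:Bacteria(1.00),p:Proteobacteria(1.00)' '+' 'd:Bacteria,p:Proteobacteria' >> "$sintax"
printf 'ASV_2\nASV_1\nASV_9\n' > "$ids"   # ASV_9 has no annotation

# Pass 1 stores every annotation line by ID; pass 2 prints only known IDs,
# in the order given by the ID list.
subset=$(awk 'NR==FNR { a[$1] = $0; next }
              ($1 in a) { print a[$1] }' "$sintax" "$ids")

echo "$subset"
rm -f "$sintax" "$ids"
```

Here ASV_5 is dropped because it is not in the ID list, ASV_9 is skipped rather than emitted as a blank line, and the remaining rows appear as ASV_2 then ASV_1.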
OTU table statistics
usearch -otutab_stats results/otutab.txt \
-output results/otutab.stat
otutab.stat
220951 Reads (221.0k)
18 Samples
1521 OTUs
27378 Counts
5170 Count =0 (18.9%)
4891 Count =1 (17.9%)
4017 Count >=10 (14.7%)
437 OTUs found in all samples (28.7%)
609 OTUs found in 90% of samples (40.0%)
1426 OTUs found in 50% of samples (93.8%)
Sample sizes: min 11028, lo 11646, med 12434, mean 12275.1, hi 12701, max 13786
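The summary figures above can be cross-checked with simple arithmetic. The sketch below assumes that "Counts" refers to the number of cells in the table (OTUs times samples), which is an inference from the numbers rather than something stated in the report; the variable names are hypothetical.

```bash
# Sketch: cross-check the reported summary using values copied from it.
reads=220951
samples=18
otus=1521
zero_counts=5170

cells=$((otus * samples))   # assumed meaning of "Counts"
zero_pct=$(awk -v z="$zero_counts" -v c="$cells" 'BEGIN { printf "%.1f", 100 * z / c }')
mean_size=$(awk -v r="$reads" -v s="$samples" 'BEGIN { printf "%.1f", r / s }')

echo "cells=$cells zero_pct=$zero_pct mean_size=$mean_size"
```

These reproduce the reported 27378 counts, 18.9% zero entries and a mean sample size of 12275.1. The total of 220951 reads also equals the sum of the filtered_reads column in otutab_nonBac.stat.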
At this point, the following files are available for downstream analysis:
results/otutab.txt
results/otus.fa
results/otus.sintax