Taxonomic annotation: classification and summary

Last published: 2022-11-04 13:49:24

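Two-column OTU-to-taxonomy format: drop the confidence values from the sintax output and keep only the taxonomic annotation, replace each ":" with "__", and delete the quotes.
The otus.sintax file used here is the filtered result left after removing plastid and non-bacterial sequences.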

cut -f 1,4 results/otus.sintax \
    |sed 's/\td/\tk/;s/:/__/g;s/,/;/g;s/"//g' \
    > results/taxonomy2.txt
ASV_1	k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Streptosporangiales;f__Thermomonosporaceae;g__Actinocorallia
ASV_2	k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Pelomonas
ASV_3	k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Rhizobacter
ASV_4	k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Rhizobacter
ASV_8	k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Streptomycetales;f__Streptomycetaceae;g__Streptomyces
ASV_6	k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Piscinibacter
ASV_7	k__Bacteria;p__Bacteroidetes;c__Flavobacteriia;o__Flavobacteriales;f__Flavobacteriaceae;g__Flavobacterium
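
To see what the cut and sed pipeline does to one record, the sketch below runs it on a single made-up sintax line instead of the real file. The record contents (ID, confidence values, quoting) are hypothetical, and the `\t` escapes in the sed expression assume GNU sed.

```shell
# Hypothetical sintax record, tab-separated:
# 1) OTU ID  2) predictions with confidence  3) strand  4) predictions kept after the cutoff
demo=$(printf 'ASV_1\td:Bacteria(1.00),p:"Actinobacteria"(0.98)\t+\td:Bacteria,p:"Actinobacteria"\n' \
    | cut -f 1,4 \
    | sed 's/\td/\tk/;s/:/__/g;s/,/;/g;s/"//g')

# Show the resulting two-column line (ID, then the k__/p__ lineage string)
printf '%s\n' "$demo"
```

Column 2, which carries the confidence values, is discarded by `cut`, so only the ID and the cutoff-filtered lineage reach sed.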

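Eight-column OTU-to-taxonomy format: note that the annotations are ragged, since not every OTU is classified down to the same rank.

Build the taxonomy table, filling the missing ranks of each OTU/ASV with Unassigned.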

# Create the output directory first if it does not exist yet
mkdir -p results/matrix
awk 'BEGIN{OFS=FS="\t"}{delete a; a["k"]="Unassigned";a["p"]="Unassigned";a["c"]="Unassigned";a["o"]="Unassigned";a["f"]="Unassigned";a["g"]="Unassigned";a["s"]="Unassigned";\
    split($2,x,";");for(i in x){split(x[i],b,"__");a[b[1]]=b[2];} \
    print $1,a["k"],a["p"],a["c"],a["o"],a["f"],a["g"],a["s"];}' \
    results/taxonomy2.txt > results/matrix/otus.tax

sed 's/;/\t/g;s/.__//g;' results/matrix/otus.tax|cut -f 1-8 | \
    sed '1 s/^/OTUID\tKingdom\tPhylum\tClass\tOrder\tFamily\tGenus\tSpecies\n/' \
    > results/taxonomy.txt

taxonomy.txt

OTUID	Kingdom	Phylum	Class	Order	Family	Genus	Species
ASV_1	Bacteria	Actinobacteria	Actinobacteria	Streptosporangiales	Thermomonosporaceae	Actinocorallia	Unassigned
ASV_2	Bacteria	Proteobacteria	Betaproteobacteria	Burkholderiales	Comamonadaceae	Pelomonas	Unassigned
ASV_3	Bacteria	Proteobacteria	Betaproteobacteria	Burkholderiales	Comamonadaceae	Rhizobacter	Unassigned
ASV_4	Bacteria	Proteobacteria	Betaproteobacteria	Burkholderiales	Comamonadaceae	Rhizobacter	Unassigned
ASV_8	Bacteria	Actinobacteria	Actinobacteria	Streptomycetales	Streptomycetaceae	Streptomyces	Unassigned
ASV_6	Bacteria	Proteobacteria	Betaproteobacteria	Burkholderiales	Comamonadaceae	Piscinibacter	Unassigned
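
The padding of ragged annotations can be checked on a single short record. This is a minimal sketch rather than the command used above: it applies the same idea (start every rank at Unassigned, then overwrite the ranks that are present) but loops over an explicit rank list, and the input line for ASV_9 is hypothetical.

```shell
# Hypothetical record annotated only down to phylum
filled=$(printf 'ASV_9\tk__Bacteria;p__Firmicutes\n' | awk '
BEGIN { OFS = FS = "\t"; n = split("k p c o f g s", ranks, " ") }
{
    # Reset all seven ranks to the placeholder for every record
    for (j = 1; j <= n; j++) a[ranks[j]] = "Unassigned"
    # Overwrite the ranks that actually appear in column 2
    m = split($2, x, ";")
    for (i = 1; i <= m; i++) { split(x[i], b, "__"); a[b[1]] = b[2] }
    # Emit the ID followed by the seven ranks in fixed order
    line = $1
    for (j = 1; j <= n; j++) line = line OFS a[ranks[j]]
    print line
}')

printf '%s\n' "$filled"
# Every row should now have exactly 8 tab-separated columns
printf '%s\n' "$filled" | awk -F'\t' '{ print NF }'
```

Running the same column count over results/taxonomy.txt is a quick way to confirm that no row is short before the table is loaded elsewhere.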

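Summarize abundance at the phylum, class, order, family, and genus levels by passing p, c, o, f, and g to the -rank option; these are the abbreviations of phylum, class, order, family, and genus.
The full hierarchy is kingdom, phylum, class, order, family, genus, species.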

mkdir -p results/tax
for i in p c o f g;do
    usearch -sintax_summary results/otus.sintax \
    -otutabin results/otutab_rare.txt -rank ${i} \
    -output results/tax/sum_${i}.txt
done
sed -i 's/(//g;s/)//g;s/\"//g;s/\#//g;s/\/Chloroplast//g' results/tax/sum_*.txt
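
The effect of the final cleanup can be previewed on one line without touching the result files. The input string below is made up for illustration and is not real usearch output; the sed expression is a compact form that strips the same characters as the command above (parentheses, double quotes, "#", and the "/Chloroplast" suffix).

```shell
# Hypothetical line containing every character class the cleanup targets
cleaned=$(printf '#Phylum\t"Cyanobacteria/Chloroplast"\t(All)\n' \
    | sed 's/[()"#]//g;s/\/Chloroplast//g')

# Show the line after cleanup
printf '%s\n' "$cleaned"
```

Because the real command uses `sed -i`, it edits the files in place; previewing through a pipe like this leaves the originals untouched.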

sum_p.txt

Phylum	KO1	KO2	KO3	KO4	KO5	KO6	OE1	OE2	OE3	OE4	OE5	OE6	WT1	WT2	WT3	WT4	WT5	WT6	All
Proteobacteria	65.9	51.6	62.4	63	74.8	70	57.5	48.2	51	56.9	59.5	57.3	51.6	61.2	57.4	56.7	57	57.8	53.9
Actinobacteria	26.5	40.3	27.7	28.7	17.4	24.3	29.5	36	34.2	29.1	31	29.7	36.1	30	32.5	34.5	34.8	32	28.5
Bacteroidetes	3.03	2.67	7.29	3.05	4.75	2.12	3.62	8.07	8.71	7.4	3.72	3.53	5.36	6.03	4.7	4.11	4.83	5.75	5
Firmicutes	1.74	1.96	1.22	3.23	0.73	1.52	5.19	2.81	2.56	3.37	1.79	5.2	0.82	0.62	1.22	1.02	0.53	1.11	3.9
Chloroflexi	1.85	2.43	0.63	1.38	1.56	1.2	1.64	3.03	1.7	1.88	2.12	1.93	4.76	0.99	1.87	2.04	2.01	2.1	3.7
Acidobacteria	0.3	0.34	0.19	0.13	0.22	0.27	0.95	0.41	0.55	0.27	0.72	0.84	0.33	0.34	0.92	0.46	0.13	0.33	1.4

sum_g.txt

Genus	KO1	KO2	KO3	KO4	KO5	KO6	OE1	OE2	OE3	OE4	OE5	OE6	WT1	WT2	WT3	WT4	WT5	WT6	All
Unassigned	10.6	10.6	11.3	9.83	7.32	11.2	12.4	10.2	9.72	10.5	10.6	12.3	9.14	8.85	12.4	10.3	8.41	10.7	17.1
Nocardioides	1.67	2.44	2.58	1.84	1.79	2.29	1.72	2.8	1.96	1.6	2.25	1.75	1.68	1.45	2.19	1.8	1.85	1.83	2.8
Gaiella	0.57	0.92	0.51	0.71	0.39	0.6	0.84	1.09	0.6	0.81	0.83	1	0.97	0.74	0.8	0.8	0.75	0.66	2.6
Steroidobacter	2.52	2.21	1.77	1.74	1.26	1.91	3.45	2.53	2.51	2.46	3.07	2.97	1.95	2.77	3.31	3.03	2.55	3.38	2.4
Acidibacter	1.61	1.5	1.19	1.14	0.87	1.19	2.49	1.31	1.46	1.36	1.75	2.06	1.35	1.19	2.13	1.59	1.19	2.08	1.6
Streptomyces	3.22	2.83	3.14	2.27	1.17	1.84	2.61	3.74	8.57	4.99	3.16	3.24	4.19	3.58	3.36	6.34	4.6	3.5	1.6
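
Once the summary tables are tab-separated, standard tools can rank taxa by any sample column. The sketch below is one possible way to list the most abundant taxa in a single sample; it works on a small temporary excerpt (three phyla, columns KO1 and All, values taken from sum_p.txt above, deliberately out of order) rather than on the real file, and the variable names are illustrative.

```shell
tab=$(printf '\t')
sum_demo=$(mktemp)

# Small excerpt of a summary table: taxon name, one sample column, the All column
printf 'Phylum\tKO1\tAll\nBacteroidetes\t3.03\t5\nProteobacteria\t65.9\t53.9\nActinobacteria\t26.5\t28.5\n' > "$sum_demo"

# Skip the header, sort numerically by column 2 (KO1) in descending order, keep the top two names
top=$(tail -n +2 "$sum_demo" | LC_ALL=C sort -t "$tab" -k2,2nr | head -n 2 | cut -f 1)
printf '%s\n' "$top"

rm -f "$sum_demo"
```

To rank by a different sample, change the field number in `-k2,2nr`; pointing the pipeline at results/tax/sum_p.txt or sum_g.txt applies it to the full tables.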