one vs one
Alignment
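- Usually refers to low-throughput comparison, with the emphasis on aligning two sequences against each other.
- Pairwise alignment:
  - Needleman-Wunsch (global pairwise alignment)
  - Smith-Waterman (local alignment)
- Multiple sequence alignment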
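
As a rough illustration of the global pairwise alignment named above, here is a minimal sketch of the Needleman-Wunsch score recurrence. The function name `nw_score` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for the example, not values from these notes. It returns only the optimal score and does no traceback.

```shell
#!/bin/bash
# nw_score SEQ_A SEQ_B: print the optimal global alignment score.
# Scoring scheme is hypothetical: match=+1, mismatch=-1, gap=-1.
nw_score() {
    awk -v a="$1" -v b="$2" -v match_s=1 -v mismatch=-1 -v gap=-1 'BEGIN {
        n = length(a); m = length(b)
        # First row and column: aligning a prefix against nothing costs only gaps.
        for (i = 0; i <= n; i++) F[i, 0] = i * gap
        for (j = 0; j <= m; j++) F[0, j] = j * gap
        for (i = 1; i <= n; i++) {
            for (j = 1; j <= m; j++) {
                s = (substr(a, i, 1) == substr(b, j, 1)) ? match_s : mismatch
                best = F[i-1, j-1] + s                              # align a[i] with b[j]
                if (F[i-1, j] + gap > best) best = F[i-1, j] + gap  # gap in b
                if (F[i, j-1] + gap > best) best = F[i, j-1] + gap  # gap in a
                F[i, j] = best
            }
        }
        print F[n, m]
    }'
}

nw_score ACGT AGT   # prints 2 (three matches, one gap)
```

Smith-Waterman uses the same table but clamps every cell at 0 and reports the maximum cell, which is what makes it local rather than global.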
one vs many
BLAST
many vs one
Mapping
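- Usually refers to placing high-throughput sequencing reads back at their positions on the genome, with the emphasis on the act of tracing each read back to its origin.
- Sequence alignment in this sense: reads mapping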
BWT algorithm (Burrows-Wheeler transform)
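
To make the transform concrete, here is a naive sketch that builds every rotation of the input, sorts them, and reads off the last column. The function name `bwt` and the `$` end marker are choices made for this example. Real aligners do not build the rotations explicitly; this version is only meant for short strings.

```shell
#!/bin/bash
# bwt TEXT: print the Burrows-Wheeler transform of TEXT with "$" appended
# as the end-of-string marker. Naive version for illustration only.
bwt() {
    local s="$1\$"
    local n=${#s} i
    for ((i = 0; i < n; i++)); do
        # Rotation starting at position i.
        printf '%s\n' "${s:i}${s:0:i}"
    done |
        LC_ALL=C sort |   # byte order, so "$" sorts before every letter
        awk '{ printf "%s", substr($0, length($0), 1) } END { print "" }'
}

bwt banana   # prints annb$aa
```

Identical characters tend to cluster in the output, which is the property that compressed indexes built on the BWT take advantage of.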
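Timeline of alignment software
- bowtie sped up the mapping of short reads to a genome, but it does not support gaps.
- With paired-end 150 bp sequencing, reads often span indels. bowtie cannot map such reads back to the genome, whereas bowtie2 supports gapped alignment.
- bwa mem can map longer reads (for example, third-generation sequencing reads).
Test data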
HISAT2 summary stats:
    Total pairs: 32617089
        Aligned concordantly or discordantly 0 time: 8741165 (26.80%)
        Aligned concordantly 1 time: 21568529 (66.13%)
        Aligned concordantly >1 times: 1367753 (4.19%)
        Aligned discordantly 1 time: 939642 (2.88%)
    Total unpaired reads: 17482330
        Aligned 0 time: 10036571 (57.41%)
        Aligned 1 time: 6761396 (38.68%)
        Aligned >1 times: 684363 (3.91%)
    Overall alignment rate: 84.61%
fastp:
reads passed filters: 65.234178 M (97.655265%)
Check: 32617089 pairs * 2 = 65234178 reads, which matches the number of reads that passed the fastp filters.
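
The overall alignment rate can be reproduced from the counts in the summary. The sketch below assumes the rate is the share of individual reads (2 per pair) that aligned in any way, which is consistent with the numbers shown. The function name `overall_rate` is made up for the example.

```shell
#!/bin/bash
# Counts copied from the HISAT2 summary.
total_pairs=32617089
unaligned_mates=10036571   # "Aligned 0 time" under "Total unpaired reads"

# overall_rate PAIRS UNALIGNED_MATES: percentage of reads aligned at least once.
overall_rate() {
    awk -v pairs="$1" -v unaligned="$2" 'BEGIN {
        total_reads = 2 * pairs
        printf "%.2f\n", 100 * (total_reads - unaligned) / total_reads
    }'
}

overall_rate "$total_pairs" "$unaligned_mates"   # prints 84.61
```

The same counts also add up internally: the four pair categories sum to 32617089, and the 8741165 pairs that aligned neither concordantly nor discordantly give the 17482330 unpaired reads (8741165 * 2).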
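#!/bin/bash
# index-gtf.sh: sort a GTF by sequence name and start, bgzip it, and build a tabix index.
if [ "$#" -ne 1 ]; then
    echo -e "Incorrect number of parameters! Usage:\n    index-gtf.sh <file.gtf(.gz)>" >&2
    exit 1
fi
gtf="$1"
if [[ $gtf =~ \.gz$ ]]; then
    output="${gtf%.gtf.gz}.sorted.gtf.gz"
    # Drop comment lines, then sort by seqname (column 1) and numeric start (column 4).
    zcat "$gtf" | awk '!( $0 ~ /^#/ )' | sort --parallel=4 -S4G -k1,1 -k4,4n | bgzip -c > "$output"
else
    output="${gtf%.gtf}.sorted.gtf.gz"
    awk '!( $0 ~ /^#/ )' "$gtf" | sort --parallel=4 -S4G -k1,1 -k4,4n | bgzip -c > "$output"
fi
# GTF shares its column layout with GFF, so the gff preset applies.
tabix -p gff "$output"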
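# The same steps run by hand on the Rnor_6.0 annotation:
cat Rattus_norvegicus.Rnor_6.0.104.gtf | awk '!( $0 ~ /^#/ )' | sort --parallel=4 -S4G -k1,1 -k4,4n > Rattus_norvegicus.Rnor_6.0.104.sorted.gtf
bgzip Rattus_norvegicus.Rnor_6.0.104.sorted.gtf
# bgzip replaces the input with a .gz file, so index that file.
tabix -p gff Rattus_norvegicus.Rnor_6.0.104.sorted.gtf.gz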