
one vs one

Alignment

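  • Typically refers to low-throughput comparisons, with emphasis on aligning two sequences against each other;
  • Pairwise alignment:
    • Needleman-Wunsch (global pairwise alignment)
    • Smith-Waterman (local pairwise alignment)
  • Multiple sequence alignment (MSA)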
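To make the pairwise case concrete, here is a minimal bash sketch of the Needleman-Wunsch dynamic-programming recurrence. It computes only the optimal global score (no traceback). The scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption chosen for illustration, and `nw_score` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Minimal Needleman-Wunsch global alignment score (score only, no traceback).
# Assumed scoring scheme for illustration: match +1, mismatch -1, linear gap -1.

nw_score() {
  local a="$1" b="$2"
  local match=1 mismatch=-1 gap=-1
  local n=${#a} m=${#b}
  local -a prev cur
  local i j s diag up left best

  # Row 0: aligning a prefix of b against nothing costs one gap per base.
  for (( j = 0; j <= m; j++ )); do
    prev[j]=$(( j * gap ))
  done

  # Fill the matrix row by row, keeping only the previous row in memory.
  for (( i = 1; i <= n; i++ )); do
    cur[0]=$(( i * gap ))            # column 0: prefix of a against nothing
    for (( j = 1; j <= m; j++ )); do
      if [[ "${a:i-1:1}" == "${b:j-1:1}" ]]; then s=$match; else s=$mismatch; fi
      diag=$(( prev[j-1] + s ))      # align a[i] with b[j]
      up=$(( prev[j] + gap ))        # a[i] against a gap
      left=$(( cur[j-1] + gap ))     # b[j] against a gap
      best=$diag
      if (( up > best )); then best=$up; fi
      if (( left > best )); then best=$left; fi
      cur[j]=$best
    done
    prev=("${cur[@]}")
  done

  # The bottom-right cell holds the optimal global score.
  echo "${prev[m]}"
}

nw_score ACGT AGT   # prints 2 (ACGT vs A-GT: three matches, one gap)
```

Smith-Waterman uses the same recurrence but also clamps every cell at 0 and reports the maximum cell anywhere in the matrix instead of the bottom-right corner, which is what makes it a local alignment.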

one vs many

BLAST

many vs one

Mapping

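  • Typically refers to placing high-throughput sequencing reads back at their positions in the genome, with emphasis on the act of tracing each read back;
  • In this sense, sequence alignment means reads mapping.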

BWT (Burrows-Wheeler Transform) algorithm
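A naive bash sketch of the transform itself: append a sentinel `$`, sort all cyclic rotations of the text, and keep the last column. This is only meant to show what the BWT is. Aligners such as Bowtie and BWA build an FM-index on top of the BWT and construct it from a suffix array rather than by sorting rotations; `bwt` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Naive Burrows-Wheeler Transform: sort every cyclic rotation of "text$"
# and output the last character of each sorted rotation.
# O(n^2 log n); for illustration only.

bwt() {
  local s="$1\$"                     # append the sentinel character '$'
  local n=${#s} i
  for (( i = 0; i < n; i++ )); do
    printf '%s\n' "${s:i}${s:0:i}"   # i-th cyclic rotation
  done | LC_ALL=C sort | while IFS= read -r rot; do
    printf '%s' "${rot: -1}"         # last column of the sorted rotation matrix
  done
  echo
}

bwt banana   # prints annb$aa
```

`LC_ALL=C` makes `sort` compare bytes, so `$` sorts before every letter, as the sentinel is required to.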

Timeline of alignment software (figure from 生信小木屋)

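  • Bowtie sped up the mapping of short reads to a genome, but it does not support gapped alignment.
  • With 2×150 bp paired-end reads, reads that span an indel are common. Bowtie cannot map such reads back to the genome, whereas Bowtie2 supports gapped alignment.
  • BWA-MEM can map longer reads, including third-generation (long-read) sequencing reads.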

Test data

HISAT2 summary stats:
	Total pairs: 32617089
		Aligned concordantly or discordantly 0 time: 8741165 (26.80%)
		Aligned concordantly 1 time: 21568529 (66.13%)
		Aligned concordantly >1 times: 1367753 (4.19%)
		Aligned discordantly 1 time: 939642 (2.88%)
	Total unpaired reads: 17482330
		Aligned 0 time: 10036571 (57.41%)
		Aligned 1 time: 6761396 (38.68%)
		Aligned >1 times: 684363 (3.91%)
	Overall alignment rate: 84.61%
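The overall rate in this summary can be reproduced from the counts above. The sketch below assumes the rate is (aligned reads) / (all reads), where all reads = 2 × total pairs and the only unaligned reads are the mates reported as "Aligned 0 time" under the unpaired section. `overall_rate` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Reproduce HISAT2's "Overall alignment rate" from the summary counts.
# Assumption: rate = (2 * pairs - unaligned mates) / (2 * pairs).

overall_rate() {
  local pairs=$1 unaligned_mates=$2
  LC_ALL=C awk -v p="$pairs" -v u="$unaligned_mates" \
    'BEGIN { total = 2 * p; printf "%.2f\n", 100 * (total - u) / total }'
}

pairs=32617089               # "Total pairs"
pairs_not_aligned=8741165    # "Aligned concordantly or discordantly 0 time"
unpaired_reads=17482330      # "Total unpaired reads"
unaligned_mates=10036571     # unpaired "Aligned 0 time"

# Every pair that failed to align as a pair is re-tried as two single mates.
if (( unpaired_reads == 2 * pairs_not_aligned )); then
  echo "mate count consistent"
fi

overall_rate "$pairs" "$unaligned_mates"   # prints 84.61
```

The check in the middle explains why "Total unpaired reads" is exactly twice the number of pairs that aligned 0 times.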

fastp:
reads passed filters:	65.234178 M (97.655265%)
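32617089 × 2 = 65234178, i.e. the reads that passed fastp filters equal HISAT2's total pairs counted as two mates each.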
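The same cross-check as a small script. It assumes paired-end input and that fastp's percentage on the "reads passed filters" line is relative to all reads before filtering; under that assumption the raw read count can be estimated at roughly 66.8 M.

```shell
#!/usr/bin/env bash
# Cross-check the fastp filtering summary against the HISAT2 input.
# Assumptions: paired-end data; fastp's percentage is relative to raw reads.

pairs=32617089            # HISAT2 "Total pairs"
fastp_passed=65234178     # fastp "reads passed filters" (65.234178 M)
fastp_pct=97.655265       # fastp percentage on the same line

if (( 2 * pairs == fastp_passed )); then
  echo "HISAT2 received every read that passed fastp"
fi

# Estimate the raw read count before filtering, rounded to whole reads.
raw_reads=$(LC_ALL=C awk -v r="$fastp_passed" -v pct="$fastp_pct" \
  'BEGIN { printf "%.0f\n", r / (pct / 100) }')
echo "estimated raw reads: $raw_reads"
```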


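#!/bin/bash
# index-gtf.sh: drop comment lines from a GTF, sort by chromosome and start,
# compress with bgzip and build a tabix index (bgzip/tabix come from htslib).
set -euo pipefail

if [ "$#" -ne 1 ]; then
  echo -e "Incorrect number of parameters! Usage:\n    index-gtf.sh <file.gtf(.gz)>" >&2
  exit 1
fi

gtf="$1"

# Sort key: chromosome (field 1), then numeric start position (field 4).
if [[ $gtf =~ \.gz$ ]]; then
  output="${gtf%.gtf.gz}.sorted.gtf.gz"
  zcat "$gtf" | awk '!/^#/' | sort -t $'\t' --parallel=4 -S4G -k1,1 -k4,4n | bgzip -c > "$output"
else
  output="${gtf%.gtf}.sorted.gtf.gz"
  awk '!/^#/' "$gtf" | sort -t $'\t' --parallel=4 -S4G -k1,1 -k4,4n | bgzip -c > "$output"
fi

# -p gff selects the GFF/GTF column layout for the index.
tabix -p gff "$output"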

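# The same steps run by hand on Rattus_norvegicus.Rnor_6.0.104.gtf:
awk '!/^#/' Rattus_norvegicus.Rnor_6.0.104.gtf | sort -t $'\t' --parallel=4 -S4G -k1,1 -k4,4n > Rattus_norvegicus.Rnor_6.0.104.sorted.gtf
bgzip Rattus_norvegicus.Rnor_6.0.104.sorted.gtf
# bgzip replaces the file with Rattus_norvegicus.Rnor_6.0.104.sorted.gtf.gz, so index that:
tabix -p gff Rattus_norvegicus.Rnor_6.0.104.sorted.gtf.gz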
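tabix expects each chromosome's records to be contiguous and sorted by start position, which is what the `sort -k1,1 -k4,4n` step produces. Below is a small sketch for checking a GTF (plain or bgzip/gzip-compressed) before indexing it. `check_gtf_sorted` is a hypothetical helper, not part of the script above.

```shell
#!/usr/bin/env bash
# Check that a GTF is grouped by chromosome and sorted by start within each one.
# Exits 0 and prints "sorted" on success; prints the first offending line otherwise.

check_gtf_sorted() {
  local gtf="$1" reader=cat
  [[ $gtf == *.gz ]] && reader="gzip -dc"   # bgzip output is gzip-compatible
  $reader "$gtf" | awk -F'\t' '
    /^#/ { next }                           # skip header/comment lines
    {
      if ($1 != chr) {
        if ($1 in seen) { print "chromosome " $1 " is not contiguous (line " NR ")"; bad = 1; exit }
        seen[$1] = 1; chr = $1; last = 0
      }
      if ($4 + 0 < last) { print "start " $4 " < " last " on " $1 " (line " NR ")"; bad = 1; exit }
      last = $4 + 0
    }
    END { if (bad) exit 1; print "sorted" }'
}
```

Only contiguity is required for chromosomes, not a particular order, which matches the lexicographic `-k1,1` sort used above.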