Last published: 2023-07-04 15:11:13

infer_experiment.py

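  • This program "guesses" how RNA-seq sequencing was configured, in particular how reads were stranded for strand-specific RNA-seq data, by comparing the strandness of reads with the strandness of transcripts.
  • The strandness of reads is determined from the alignment; the strandness of transcripts is determined from the annotation.
  • For non-strand-specific RNA-seq data, the strandness of reads and the strandness of transcripts are independent.
  • For strand-specific RNA-seq data, the strandness of reads is largely determined by the strandness of transcripts. See the three examples below for details.
  • You do not need to know the RNA-seq protocol before mapping reads to the reference genome. Align the reads as if the data were non-strand-specific, and this script can then "guess" how the reads were stranded.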

For paired-end RNA-seq, there are two different ways to strand reads (such as the Illumina ScriptSeq protocol):

  • 1++,1--,2+-,2-+
    • read1 mapped to '+' strand indicates parental gene on '+' strand
    • read1 mapped to '-' strand indicates parental gene on '-' strand
    • read2 mapped to '+' strand indicates parental gene on '-' strand
    • read2 mapped to '-' strand indicates parental gene on '+' strand
  • 1+-,1-+,2++,2--
    • read1 mapped to '+' strand indicates parental gene on '-' strand
    • read1 mapped to '-' strand indicates parental gene on '+' strand
    • read2 mapped to '+' strand indicates parental gene on '+' strand
    • read2 mapped to '-' strand indicates parental gene on '-' strand

For single-end RNA-seq, there are also two different ways to strand reads:

  • ++,--
    • read mapped to '+' strand indicates parental gene on '+' strand
    • read mapped to '-' strand indicates parental gene on '-' strand
  • +-,-+
    • read mapped to '+' strand indicates parental gene on '-' strand
    • read mapped to '-' strand indicates parental gene on '+' strand
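The rules in the lists above can be written as a small lookup. The following is a minimal sketch, not the script's own implementation: the function name `explain_read` and its arguments are hypothetical, chosen only for illustration. Given the strand a read mapped to, the strand of its parental gene, and (for paired-end data) the read number, it returns the configuration label that the read is consistent with.

```python
from typing import Optional

# Labels use the same notation as the lists above.
PAIRED_A = "1++,1--,2+-,2-+"
PAIRED_B = "1+-,1-+,2++,2--"
SINGLE_A = "++,--"
SINGLE_B = "+-,-+"


def explain_read(read_strand: str, gene_strand: str,
                 read_number: Optional[int] = None) -> str:
    """Return the configuration label consistent with one read.

    Hypothetical helper for illustration only.
    read_strand / gene_strand: '+' or '-'.
    read_number: 1 or 2 for paired-end data, None for single-end data.
    """
    if read_strand not in {"+", "-"} or gene_strand not in {"+", "-"}:
        raise ValueError("strands must be '+' or '-'")

    if read_number is None:
        # Single-end: compare the read strand with the gene strand directly.
        return SINGLE_A if read_strand == gene_strand else SINGLE_B

    if read_number not in (1, 2):
        raise ValueError("read_number must be 1, 2, or None")

    # Paired-end: build a code such as '1++' or '2+-' and look it up.
    code = f"{read_number}{read_strand}{gene_strand}"
    return PAIRED_A if code in PAIRED_A.split(",") else PAIRED_B


print(explain_read("+", "+", read_number=1))
print(explain_read("+", "+", read_number=2))
print(explain_read("-", "+"))
```

Counting how many reads fall under each label, and dividing by the total, gives fractions of the kind reported in the examples that follow.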

Converting a GTF file to BED

Paired-end, non-strand-specific

infer_experiment.py -r hg19.refseq.bed12 -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam
This is PairEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "1++,1--,2+-,2-+": 0.4903
Fraction of reads explained by "1+-,1-+,2++,2--": 0.4925

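Here 1.72% of the reads fall in genomic regions where the strandness of transcripts cannot be determined (for example, regions with genes on both the plus and minus strands). Of the remaining 98.28% (1 - 0.0172 = 0.9828), about half (0.4903) are explained by "1++,1--,2+-,2-+" and about half (0.4925) by "1+-,1-+,2++,2--". We conclude that this is not a strand-specific dataset, because the strandness of reads is independent of the strandness of transcripts.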

Paired-end, strand-specific

infer_experiment.py -r hg19.refseq.bed12 -i Pairend_StrandSpecific_51mer_Human_hg19.bam
This is PairEnd Data
Fraction of reads failed to determine: 0.0072
Fraction of reads explained by "1++,1--,2+-,2-+": 0.9441
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0487

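Here 0.72% of the reads fall in genomic regions where the strandness of transcripts cannot be determined (for example, regions with genes on both the plus and minus strands). Of the remaining 99.28% (1 - 0.0072 = 0.9928), the vast majority (0.9441) are explained by "1++,1--,2+-,2-+". We conclude that this is a strand-specific dataset, because the strandness of reads is largely determined by the strandness of transcripts.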

Single-end, strand-specific

infer_experiment.py -r hg19.refseq.bed12 -i SingleEnd_StrandSpecific_36mer_Human_hg19.bam
This is SingleEnd Data
Fraction of reads failed to determine: 0.0170
Fraction of reads explained by "++,--": 0.9669
Fraction of reads explained by "+-,-+": 0.0161
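The reasoning used in the three examples can be sketched in code. The snippet below parses a report in the format shown above and applies a simple rule of thumb. Both function names and the 0.8 and 0.6 cut-offs are assumptions made for illustration; they are not part of `infer_experiment.py`, which only prints the fractions and leaves the interpretation to the reader.

```python
import re

# Matches the "Fraction of reads ..." lines in the report format shown above.
LINE_RE = re.compile(
    r'Fraction of reads (?:failed to determine|explained by "([^"]+)"): ([0-9.]+)'
)


def parse_fractions(report: str) -> dict:
    """Map each configuration label (or 'failed') to its reported fraction."""
    fractions = {}
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            label = match.group(1) or "failed"
            fractions[label] = float(match.group(2))
    return fractions


def call_strandedness(fractions: dict, high: float = 0.8, low: float = 0.6) -> str:
    """Interpret the fractions with illustrative (assumed) thresholds."""
    explained = {k: v for k, v in fractions.items() if k != "failed"}
    determined = sum(explained.values())
    if determined == 0:
        return "undetermined"
    best_label, best = max(explained.items(), key=lambda item: item[1])
    share = best / determined  # share among reads whose strand was determined
    if share >= high:
        return f"strand-specific ({best_label})"
    if share <= low:
        return "non-strand-specific"
    return "ambiguous"


report = """This is SingleEnd Data
Fraction of reads failed to determine: 0.0170
Fraction of reads explained by "++,--": 0.9669
Fraction of reads explained by "+-,-+": 0.0161
"""
print(call_strandedness(parse_fractions(report)))
```

Applied to the single-end report above, the dominant configuration is "++,--", so under these assumed thresholds the dataset would be called strand-specific. Values that fall between the two cut-offs are better inspected by hand than classified automatically.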