infer_experiment.py


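This program "guesses" how an RNA-seq experiment was configured, in particular which strand the reads of a strand-specific RNA-seq dataset come from, by comparing the strandness of reads with the strandness of transcripts.
The strandness of reads is determined from the alignment; the strandness of transcripts is determined from the annotation.
For non-strand-specific RNA-seq data, the strandness of reads and the strandness of transcripts are independent.
For strand-specific RNA-seq data, the strandness of reads is largely determined by the strandness of transcripts; see the three examples below for details.
You do not need to know the RNA-seq protocol before mapping reads to the reference genome. Align the data as if it were non-strand-specific, and this script can then "guess" which strand-specific protocol was used.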

For paired-end RNA-seq, there are two different ways to strand reads (such as the Illumina ScriptSeq protocol):

1++,1--,2+-,2-+
- read1 mapped to '+' strand indicates parental gene on '+' strand
- read1 mapped to '-' strand indicates parental gene on '-' strand
- read2 mapped to '+' strand indicates parental gene on '-' strand
- read2 mapped to '-' strand indicates parental gene on '+' strand
1+-,1-+,2++,2--
- read1 mapped to '+' strand indicates parental gene on '-' strand
- read1 mapped to '-' strand indicates parental gene on '+' strand
- read2 mapped to '+' strand indicates parental gene on '+' strand
- read2 mapped to '-' strand indicates parental gene on '-' strand
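
Each rule string is a compact table: every three-character token reads as read number, strand the read mapped to, strand of the parental gene. The sketch below decodes that notation; `parse_rule` and `gene_strand` are hypothetical helper names used for illustration and are not part of RSeQC.

```python
def parse_rule(rule):
    """Decode a rule such as '1++,1--,2+-,2-+' into
    {(read_number, mapped_strand): gene_strand}."""
    table = {}
    for token in rule.split(","):
        # Each token has exactly three characters, e.g. '2+-'.
        read_number, mapped_strand, parental_strand = token
        table[(read_number, mapped_strand)] = parental_strand
    return table


def gene_strand(rule, read_number, mapped_strand):
    """Return the parental gene strand implied by the rule for one read."""
    return parse_rule(rule)[(read_number, mapped_strand)]


if __name__ == "__main__":
    for rule in ("1++,1--,2+-,2-+", "1+-,1-+,2++,2--"):
        print(rule)
        for (read_number, mapped), gene in sorted(parse_rule(rule).items()):
            print(f"  read{read_number} mapped to '{mapped}' -> gene on '{gene}'")
```

Running the script prints the same eight statements as the two lists above.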

For single-end RNA-seq, there are also two different ways to strand reads:

++,--
- read mapped to '+' strand indicates parental gene on '+' strand
- read mapped to '-' strand indicates parental gene on '-' strand
+-,-+
- read mapped to '+' strand indicates parental gene on '-' strand
- read mapped to '-' strand indicates parental gene on '+' strand
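
To make the idea of "fraction of reads explained by a rule" concrete, here is a simplified model of the tally for single-end data. It is a sketch under stated assumptions, not RSeQC's actual implementation: the input format (a read strand paired with the set of strands of overlapping annotated genes) and the function name `tally_single_end` are invented for illustration, and reads overlapping no gene or genes on both strands are simply counted as undetermined.

```python
from collections import Counter


def tally_single_end(observations):
    """observations: iterable of (read_strand, gene_strands), where
    gene_strands is the set of strands of annotated genes the read overlaps.
    Returns the fraction of reads in each category."""
    counts = Counter()
    for read_strand, gene_strands in observations:
        if len(gene_strands) != 1:
            # Assumption: no gene, or genes on both strands, means the
            # strandness of the transcript cannot be determined.
            counts["undetermined"] += 1
            continue
        (parental_strand,) = gene_strands
        if read_strand == parental_strand:
            counts["++,--"] += 1
        else:
            counts["+-,-+"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: counts[key] / total for key in ("undetermined", "++,--", "+-,-+")}


if __name__ == "__main__":
    toy_reads = [
        ("+", {"+"}),       # explained by "++,--"
        ("-", {"-"}),       # explained by "++,--"
        ("+", {"-"}),       # explained by "+-,-+"
        ("-", {"+", "-"}),  # genes on both strands: undetermined
    ]
    print(tally_single_end(toy_reads))
```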

Converting a GTF file to BED

Pair-end non strand specific

infer_experiment.py -r hg19.refseq.bed12 -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam

This is PairEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "1++,1--,2+-,2-+": 0.4903
Fraction of reads explained by "1+-,1-+,2++,2--": 0.4925

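For 1.72% of the reads the strand could not be determined (they map to genomic regions with genes on both the plus and minus strands). Of the remaining 98.28% (1 - 0.0172 = 0.9828), half can be explained by "1++,1--,2+-,2-+" and half by "1+-,1-+,2++,2--". The conclusion is that this is not a strand-specific dataset, because the strandness of reads is independent of the strandness of transcripts.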
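
The reasoning above can be sketched as a small decision function. The thresholds `dominance` and `balance` are assumptions chosen for illustration; infer_experiment.py only reports the fractions and leaves the interpretation to the reader.

```python
def classify(frac_failed, frac_rule_a, frac_rule_b, dominance=0.8, balance=0.1):
    """Interpret the three fractions reported for one BAM file.
    dominance and balance are illustrative cutoffs, not RSeQC settings."""
    determined = 1.0 - frac_failed
    if determined <= 0:
        return "undetermined"
    # Share of each rule among the reads whose strand could be determined.
    share_a = frac_rule_a / determined
    share_b = frac_rule_b / determined
    if share_a >= dominance:
        return "strand-specific (first rule)"
    if share_b >= dominance:
        return "strand-specific (second rule)"
    if abs(share_a - share_b) <= balance:
        return "non-strand-specific"
    return "ambiguous"


if __name__ == "__main__":
    # Fractions from the paired-end, non-strand-specific report above.
    print(classify(0.0172, 0.4903, 0.4925))
```

With the fractions from this report, both rules account for roughly half of the determined reads, so the function returns "non-strand-specific".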

Pair-end strand specific

infer_experiment.py -r hg19.refseq.bed12 -i Pairend_StrandSpecific_51mer_Human_hg19.bam

This is PairEnd Data
Fraction of reads failed to determine: 0.0072
Fraction of reads explained by "1++,1--,2+-,2-+": 0.9441
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0487

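For 0.72% of the reads the strand could not be determined (they map to genomic regions with genes on both the plus and minus strands). Of the remaining 99.28% (1 - 0.0072 = 0.9928), the vast majority can be explained by "1++,1--,2+-,2-+". The strandness of reads therefore depends on the strandness of transcripts, which indicates a strand-specific dataset.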

Single-end strand specific

infer_experiment.py -r hg19.refseq.bed12 -i SingleEnd_StrandSpecific_36mer_Human_hg19.bam

This is SingleEnd Data
Fraction of reads failed to determine: 0.0170
Fraction of reads explained by "++,--": 0.9669
Fraction of reads explained by "+-,-+": 0.0161
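
When many samples are processed, it can help to read these reports programmatically. The following is a minimal sketch that parses the text shown above, assuming the output has exactly this layout; `parse_report` is a hypothetical helper, not something shipped with RSeQC.

```python
import re

REPORT = '''This is SingleEnd Data
Fraction of reads failed to determine: 0.0170
Fraction of reads explained by "++,--": 0.9669
Fraction of reads explained by "+-,-+": 0.0161
'''


def parse_report(text):
    """Extract the layout, the undetermined fraction and the per-rule
    fractions from an infer_experiment.py report."""
    result = {"layout": None, "failed": None, "rules": {}}
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"This is (\w+) Data", line)
        if match:
            result["layout"] = match.group(1)
            continue
        match = re.match(r"Fraction of reads failed to determine:\s*([\d.]+)", line)
        if match:
            result["failed"] = float(match.group(1))
            continue
        match = re.match(r'Fraction of reads explained by "([^"]+)":\s*([\d.]+)', line)
        if match:
            result["rules"][match.group(1)] = float(match.group(2))
    return result


if __name__ == "__main__":
    parsed = parse_report(REPORT)
    dominant = max(parsed["rules"], key=parsed["rules"].get)
    print(parsed["layout"], dominant, parsed["rules"][dominant])
```

For this report the rule with the largest fraction is "++,--" at 0.9669.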