Mapping RNA-seq Reads with STAR

Last published: 2022-04-02 10:14:43

Output files

  • Log.out: the main log file, containing a large amount of detailed information about the run. It is most useful for troubleshooting and debugging.
STAR version=2.7.10a
STAR compilation time,server,dir=2022-01-14T18:50:00-05:00 :/home/dobin/data/STAR/STARcode/STAR.master/source
STAR git: On branch master ; commit ae26add7ea1724f3281ec8abedb71bcff6a4ae73 ; diff files: CHANGES.md README.md RELEASEnotes.md doc/STARmanual.pdf extras/doc-latex/STARmanual.tex extras/doc-latex/convertParDefT>
 Command Line:
STAR --genomeDir genome_index --runThreadN 15 --readFilesIn fastp_output/SRR15881871_1_good.fq.gz fastp_output/SRR15881871_2_good.fq.gz --readFilesCommand zcat --outFileNamePrefix mapping/SRR15881871/SRR1588187>
Initial USER parameters from Command Line:
outFileNamePrefix                 mapping/SRR15881871/SRR15881871_
 All USER parameters from Command Line:
genomeDir                     genome_index     ~RE-DEFINED
runThreadN                    15     ~RE-DEFINED
readFilesIn                   fastp_output/SRR15881871_1_good.fq.gz   fastp_output/SRR15881871_2_good.fq.gz        ~RE-DEFINED
readFilesCommand              zcat        ~RE-DEFINED
outFileNamePrefix             mapping/SRR15881871/SRR15881871_     ~RE-DEFINED
outSAMtype                    BAM   SortedByCoordinate        ~RE-DEFINED
outSAMstrandField             intronMotif     ~RE-DEFINED
outSAMattributes              All        ~RE-DEFINED
outFilterIntronMotifs         RemoveNoncanonical     ~RE-DEFINED
 Finished reading parameters from all sources

 Final user re-defined parameters-----------------:
runThreadN                        15
genomeDir                         genome_index
readFilesIn                       fastp_output/SRR15881871_1_good.fq.gz   fastp_output/SRR15881871_2_good.fq.gz   
readFilesCommand                  zcat   
outFileNamePrefix                 mapping/SRR15881871/SRR15881871_
outSAMtype                        BAM   SortedByCoordinate   
outSAMstrandField                 intronMotif
outSAMattributes                  All   
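When troubleshooting, it can help to pull the user-supplied parameters out of Log.out programmatically. The following is a minimal sketch that assumes the layout shown in the excerpt above, where each command-line parameter is echoed on a line ending in `~RE-DEFINED`. The function name `parse_user_parameters` is hypothetical and not part of STAR.

```python
def parse_user_parameters(log_text):
    """Return {parameter: [values]} for Log.out lines tagged ~RE-DEFINED.

    Assumption: each user parameter is echoed as
    '<name> <value> [<value> ...] ~RE-DEFINED', as in the excerpt above.
    """
    marker = "~RE-DEFINED"
    params = {}
    for line in log_text.splitlines():
        line = line.strip()
        if not line.endswith(marker):
            continue
        fields = line[: -len(marker)].split()
        if fields:
            params[fields[0]] = fields[1:]
    return params


# A few lines copied from the Log.out excerpt above.
sample_log = """\
genomeDir                     genome_index     ~RE-DEFINED
runThreadN                    15     ~RE-DEFINED
outSAMtype                    BAM   SortedByCoordinate        ~RE-DEFINED
outFilterIntronMotifs         RemoveNoncanonical     ~RE-DEFINED
"""

user_params = parse_user_parameters(sample_log)
for name, values in user_params.items():
    print(name, values)
```

Values are kept as lists of strings because some parameters, such as outSAMtype, take more than one word.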
  • Log.progress.out: reports job progress statistics, such as the number of reads processed and the percentage of mapped reads. It is updated every minute.
           Time    Speed        Read     Read   Mapped   Mapped   Mapped   Mapped Unmapped Unmapped Unmapped Unmapped
                    M/hr      number   length   unique   length   MMrate    multi   multi+       MM    short    other
Mar 29 21:36:51    189.6     3160425      199    85.7%    198.9     0.3%    12.5%     0.2%     0.0%     1.3%     0.2%
Mar 29 21:37:55    214.4     7383400      199    85.7%    198.8     0.3%    12.5%     0.2%     0.0%     1.4%     0.2%
Mar 29 21:38:58    237.6    12340759      199    85.7%    198.8     0.3%    12.5%     0.2%     0.0%     1.4%     0.2%
Mar 29 21:39:59    243.6    16778478      199    85.7%    198.8     0.3%    12.5%     0.2%     0.0%     1.4%     0.2%
Mar 29 21:40:59    233.4    19966239      199    85.7%    198.8     0.3%    12.5%     0.2%     0.0%     1.4%     0.2%
Mar 29 21:42:01    222.2    22841274      199    85.6%    198.8     0.3%    12.5%     0.2%     0.0%     1.4%     0.2%
Mar 29 21:44:36    172.1    25095163      199    85.6%    198.8     0.3%    12.5%     0.2%     0.0%     1.4%     0.2%
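To monitor a running job, the progress table can be read into a list of records. This is a rough sketch that assumes the 14-token row layout shown above (three timestamp tokens followed by eleven numeric columns). The column names in `COLUMNS` are my own labels for the two-line header, not identifiers defined by STAR.

```python
# Hypothetical labels for the eleven numeric columns, read off the header above.
COLUMNS = [
    "speed_M_per_hr", "reads", "read_length",
    "mapped_unique_pct", "mapped_length", "mismatch_rate_pct",
    "mapped_multi_pct", "too_many_loci_pct",
    "unmapped_mismatch_pct", "unmapped_short_pct", "unmapped_other_pct",
]


def parse_progress(text):
    """Parse the data rows of Log.progress.out into a list of dicts."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows have 3 timestamp tokens + 11 values; header rows do not.
        if len(tokens) != 14 or ":" not in tokens[2]:
            continue
        values = [float(tok.rstrip("%")) for tok in tokens[3:]]
        row = dict(zip(COLUMNS, values))
        row["time"] = " ".join(tokens[:3])
        rows.append(row)
    return rows


# Header plus two rows copied from the table above.
sample_progress = """\
           Time    Speed        Read     Read   Mapped   Mapped   Mapped   Mapped Unmapped Unmapped Unmapped Unmapped
                    M/hr      number   length   unique   length   MMrate    multi   multi+       MM    short    other
Mar 29 21:36:51    189.6     3160425      199    85.7%    198.9     0.3%    12.5%     0.2%     0.0%     1.3%     0.2%
Mar 29 21:37:55    214.4     7383400      199    85.7%    198.8     0.3%    12.5%     0.2%     0.0%     1.4%     0.2%
"""

progress = parse_progress(sample_progress)
latest = progress[-1]
print(latest["time"], latest["reads"], latest["mapped_unique_pct"])
```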
  • Log.final.out: summary mapping statistics written after the mapping job is complete. It is very useful for quality control.
                                 Started job on |       Mar 29 21:35:30
                             Started mapping on |       Mar 29 21:35:51
                                    Finished on |       Mar 29 21:44:36
       Mapping speed, Million of reads per hour |       172.08

                          Number of input reads |       25095163
                      Average input read length |       199
                                    UNIQUE READS:
                   Uniquely mapped reads number |       21489980
                        Uniquely mapped reads % |       85.63%
                          Average mapped length |       198.84
                       Number of splices: Total |       4482974
            Number of splices: Annotated (sjdb) |       4414683
                       Number of splices: GT/AG |       4446804
                       Number of splices: GC/AG |       32461
                       Number of splices: AT/AC |       3709
               Number of splices: Non-canonical |       0
                      Mismatch rate per base, % |       0.26%
                         Deletion rate per base |       0.01%
                        Deletion average length |       1.82
                        Insertion rate per base |       0.01%
                       Insertion average length |       1.45
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       3149053
             % of reads mapped to multiple loci |       12.55%
        Number of reads mapped to too many loci |       55833
             % of reads mapped to too many loci |       0.22%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       344611
                 % of reads unmapped: too short |       1.37%
                Number of reads unmapped: other |       55686
                     % of reads unmapped: other |       0.22%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%

Note that STAR counts a paired-end read as one read (unlike samtools flagstat/idxstats, which count each mate separately).
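For quality control across many samples, the summary can be loaded into a dictionary and sanity-checked. The sketch below assumes the `label | value` layout shown above. `parse_final_log` is a hypothetical helper, and the sample text is a subset of the lines from this run.

```python
def parse_final_log(text):
    """Map each 'label | value' line of Log.final.out to {label: value}."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no value
        key, value = (part.strip() for part in line.split("|", 1))
        stats[key] = value
    return stats


# Selected lines copied from the Log.final.out shown above.
sample_final = """\
                          Number of input reads |       25095163
                   Uniquely mapped reads number |       21489980
                       Number of splices: Total |       4482974
                       Number of splices: GT/AG |       4446804
                       Number of splices: GC/AG |       32461
                       Number of splices: AT/AC |       3709
               Number of splices: Non-canonical |       0
        Number of reads mapped to multiple loci |       3149053
        Number of reads mapped to too many loci |       55833
  Number of reads unmapped: too many mismatches |       0
            Number of reads unmapped: too short |       344611
                Number of reads unmapped: other |       55686
"""

stats = parse_final_log(sample_final)
n_input = int(stats["Number of input reads"])
unique_pct = 100 * int(stats["Uniquely mapped reads number"]) / n_input

# Splices broken down by motif should add up to the reported total.
motif_sum = sum(
    int(stats[f"Number of splices: {motif}"])
    for motif in ("GT/AG", "GC/AG", "AT/AC", "Non-canonical")
)

# Unique + multi + too-many-loci + unmapped classes, for this run.
category_sum = sum(
    int(stats[key])
    for key in (
        "Uniquely mapped reads number",
        "Number of reads mapped to multiple loci",
        "Number of reads mapped to too many loci",
        "Number of reads unmapped: too many mismatches",
        "Number of reads unmapped: too short",
        "Number of reads unmapped: other",
    )
)

print(f"uniquely mapped: {unique_pct:.2f}%")
print("motif classes sum to total:", motif_sum == int(stats["Number of splices: Total"]))
print("read categories sum to input:", category_sum == n_input)
```

In this run the four motif classes add up to the total splice count, and the mapped and unmapped categories add up to the number of input reads. The zero non-canonical count is consistent with the `outFilterIntronMotifs RemoveNoncanonical` setting recorded in Log.out.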

  • SJ.out.tab
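    SJ.out.tab contains high-confidence collapsed splice junctions in tab-delimited format. Note that STAR defines the junction start/end as intronic bases, whereas many other tools define them as exonic bases.
    Each splice is counted in the splice numbers reported in Log.final.out, so those numbers correspond to summing the counts in SJ.out.tab.
    The columns have the following meaning: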
      • column 1: chromosome
      • column 2: first base of the intron (1-based)
      • column 3: last base of the intron (1-based)
      • column 4: strand (0: undefined, 1: +, 2: -)
      • column 5: intron motif: 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT
      • column 6: 0: unannotated, 1: annotated in the splice junctions database. Note that in 2-pass mode, junctions detected in the 1st pass are reported as annotated, in addition to annotated junctions from GTF.
      • column 7: number of uniquely mapping reads crossing the junction
      • column 8: number of multi-mapping reads crossing the junction
      • column 9: maximum spliced alignment overhang
chr1    10060   10106   2       2       0       0       1       41
chr1    10060   10178   2       2       0       0       1       46
chr1    10066   10106   2       2       0       0       1       41
chr1    10066   10178   2       2       0       0       1       46
chr1    10072   10106   2       2       0       0       1       41
chr1    10072   10178   2       2       0       0       1       46
chr1    10078   10106   2       2       0       0       1       41
chr1    10078   10178   2       2       0       0       1       46
chr1    10084   10106   2       2       0       0       1       41
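The column definitions above can be turned into a small reader. This is a sketch rather than an official parser: the `Junction` type and the helper names are hypothetical, and the code splits on any whitespace so that it works both on real tab-delimited files and on the space-aligned rows shown here.

```python
from typing import NamedTuple

# Code tables taken from the column descriptions above.
STRAND = {0: ".", 1: "+", 2: "-"}
MOTIF = {0: "non-canonical", 1: "GT/AG", 2: "CT/AC", 3: "GC/AG",
         4: "CT/GC", 5: "AT/AC", 6: "GT/AT"}


class Junction(NamedTuple):
    chrom: str
    intron_start: int   # first base of the intron, 1-based
    intron_end: int     # last base of the intron, 1-based
    strand: str
    motif: str
    annotated: bool
    unique_reads: int
    multi_reads: int
    max_overhang: int


def parse_sj_line(line):
    """Decode one SJ.out.tab row into a Junction."""
    f = line.split()
    return Junction(f[0], int(f[1]), int(f[2]), STRAND[int(f[3])],
                    MOTIF[int(f[4])], f[5] == "1",
                    int(f[6]), int(f[7]), int(f[8]))


def flanking_exon_bases(j):
    """Convert STAR's intronic coordinates to the adjacent exonic bases."""
    return j.intron_start - 1, j.intron_end + 1


# Two rows copied from the sample above.
sample_sj = """\
chr1    10060   10106   2       2       0       0       1       41
chr1    10060   10178   2       2       0       0       1       46
"""

junctions = [parse_sj_line(line) for line in sample_sj.splitlines() if line.strip()]
first = junctions[0]
print(first)
print(flanking_exon_bases(first))  # (10059, 10107)

# Summing column 7 gives the number of uniquely mapped splices over these rows.
total_unique = sum(j.unique_reads for j in junctions)
print(total_unique)  # 0
```

The first sample row therefore describes an unannotated minus-strand CT/AC junction supported only by one multi-mapping read. Tools that expect exonic junction coordinates would need the shifted values returned by `flanking_exon_bases`.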
