Salmon
- https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html
- https://bioconductor.org/packages/release/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf
For transcript quantification, the tools found to be most accurate are lightweight alignment tools such as Kallisto, Sailfish and Salmon, each of which works slightly differently. We will focus on Salmon, the successor of Sailfish, for this workshop. However, Kallisto is an equally good choice, with similar speed and accuracy.
What these tools have in common is that they avoid base-by-base alignment of the reads, which is the time-consuming step in traditional splice-aware aligners such as STAR and HISAT2. As a result, they produce quantification estimates much faster than alignment-based workflows (typically more than 20 times faster), with improved accuracy [1]. These transcript expression estimates, often referred to as ‘pseudocounts’ or ‘abundance estimates’, can be aggregated to the gene level for differential gene expression tools such as DESeq2, or used directly for isoform-level differential expression with a tool such as Sleuth.
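Aggregating transcript estimates to the gene level is normally done in R with tximport (first link above). As a rough shell sketch of the simplest part of that step, the function below sums Salmon's `NumReads` column per gene. It assumes a hypothetical tab-separated `tx2gene.tsv` (transcript ID, gene ID) whose IDs match the `Name` column of `quant.sf` exactly. Unlike tximport it ignores transcript lengths, so treat the output only as raw gene-level counts; `sum_counts_by_gene` is a made-up name.

```shell
# Sum Salmon transcript-level estimated counts (NumReads, column 5 of quant.sf)
# into gene-level counts. Assumes a hypothetical tab-separated tx2gene file:
#   transcript_id<TAB>gene_id
# with IDs identical to the Name column of quant.sf (version suffixes included).
# This is a plain sum: it does not apply tximport's length offsets.
sum_counts_by_gene() {
  local tx2gene="$1" quant="$2"
  awk -F'\t' -v OFS='\t' '
    NR == FNR { gene[$1] = $2; next }            # 1st file: transcript -> gene
    FNR == 1  { next }                           # 2nd file: skip quant.sf header
    ($1 in gene) { sum[gene[$1]] += $5; next }   # add NumReads to its gene
    { unmapped++ }                               # transcript absent from tx2gene
    END {
      for (g in sum) print g, sum[g]
      if (unmapped) print unmapped " transcript(s) not in tx2gene" > "/dev/stderr"
    }
  ' "$tx2gene" "$quant" | LC_ALL=C sort -k1,1
}
```

Usage: `sum_counts_by_gene tx2gene.tsv salmon_output/quant.sf > gene_counts.tsv`. Transcripts missing from the table are counted and reported on stderr rather than silently dropped.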
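# Build a Salmon index from the Ensembl GRCh38 cDNA (transcript) FASTA.
# ${testData} must point to the workshop data directory; -k 31 is Salmon's default k-mer size.
mkdir salmon_index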
salmon index \
-t ${testData}/RNA-seq/genomic/Homo_sapiens.GRCh38.cdna.all.fa \
-i salmon_index \
-k 31
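Gene-level aggregation needs a transcript-to-gene table. For an Ensembl cDNA FASTA like the one indexed above, one can be read straight from the headers, which carry a `gene:` field. A minimal sketch, assuming that Ensembl header layout; Salmon names each transcript by the first whitespace-delimited token of its header, versions included, so the table keeps the versions too. `make_tx2gene` is a hypothetical helper, not part of Salmon.

```shell
# Build a transcript-to-gene table (transcript_id<TAB>gene_id) from an Ensembl
# cDNA FASTA. Assumes Ensembl-style headers (IDs here are illustrative):
#   >ENSTxxxxxxxxxxx.1 cdna chromosome:GRCh38:... gene:ENSGxxxxxxxxxxx.1 ...
# The transcript ID is the first header token, which is what Salmon uses as the name.
make_tx2gene() {
  local fasta="$1"
  # Decompress on the fly if the FASTA is gzipped.
  if [[ "$fasta" == *.gz ]]; then
    gzip -dc -- "$fasta"
  else
    cat -- "$fasta"
  fi | awk -v OFS='\t' '
    /^>/ {
      tx = substr($1, 2)                      # strip the leading ">"
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^gene:/) {                  # matches "gene:", not "gene_biotype:"
          print tx, substr($i, 6)             # text after the 5-char "gene:"
          break
        }
      }
    }'
}
```

Usage: `make_tx2gene ${testData}/RNA-seq/genomic/Homo_sapiens.GRCh38.cdna.all.fa > tx2gene.tsv`. Headers without a `gene:` field are skipped.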
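# Quantify one paired-end sample against the index:
#   -l A                let Salmon infer the library type automatically
#   -1 / -2             read 1 and read 2 FASTQ files
#   --useVBOpt          use the variational Bayesian EM to optimize abundance estimates
#   --seqBias           learn and correct sequence-specific (random hexamer priming) bias
#   --validateMappings  selective alignment; already the default since Salmon 1.0, only needed for older versions
salmon quant -i salmon_index \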
-l A \
-1 ${testData}/RNA-seq/reads/HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read1.fastq.gz \
-2 ${testData}/RNA-seq/reads/HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read2.fastq.gz \
-o salmon_output \
--useVBOpt \
--seqBias \
--validateMappings
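After `salmon quant` finishes, the mapping rate is the first thing worth checking. A small sketch that reads it from `salmon_output/aux_info/meta_info.json`; the key names (`num_processed`, `num_mapped`, `percent_mapped`) are assumed to match those written by Salmon 1.x, so check them against your own output. It uses `sed` rather than `jq` and assumes one key per line, as in Salmon's pretty-printed JSON. `salmon_mapping_summary` is a hypothetical helper name.

```shell
# Print read counts and mapping rate from a Salmon output directory.
# Assumes the Salmon 1.x layout <outdir>/aux_info/meta_info.json, pretty-printed
# with one "key": value pair per line. Key names are assumptions; verify them.
salmon_mapping_summary() {
  local meta="$1/aux_info/meta_info.json"
  if [ ! -r "$meta" ]; then
    echo "cannot read $meta" >&2
    return 1
  fi
  local key val
  for key in num_processed num_mapped percent_mapped; do
    # Capture the numeric value that follows "key": on its line.
    val=$(sed -n "s/.*\"$key\": *\([0-9.eE+-]*\).*/\1/p" "$meta" | head -n 1)
    printf '%s\t%s\n' "$key" "${val:-NA}"
  done
}
```

Usage: `salmon_mapping_summary salmon_output`. A key that cannot be found is reported as `NA` instead of aborting.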
Test data
https://www.hadriengourle.com/tutorials/rna/#indexing-transcriptome
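# -L follows redirects; -O -J saves the download under the server-supplied file name (toy_rna.tar.gz)
curl -O -J -L https://osf.io/7zepj/download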
tar xzf toy_rna.tar.gz
cd toy_rna
Use the file gentrome.fa
Create the Salmon index from it?
https://github.com/COMBINE-lab/SalmonTools/blob/master/README.md
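If `gentrome.fa` has to be built rather than taken from the test data, the full-genome decoy recipe in the Salmon documentation is: list the genome sequence names as decoys, then concatenate transcriptome and genome with the transcripts first. A sketch, with hypothetical input names `transcripts.fa` and `genome.fa` and made-up helper names; the SalmonTools README linked above describes a different, MashMap-based script that keeps only genome regions similar to transcripts.

```shell
# Write the genome sequence names (first header token, without ">") to a decoy list.
make_decoys() {
  local genome="$1" out="$2"
  if [[ "$genome" == *.gz ]]; then
    gzip -dc -- "$genome"
  else
    cat -- "$genome"
  fi | awk '/^>/ { print substr($1, 2) }' > "$out"
}

# Concatenate transcripts FIRST, then the genome: decoys must come after the targets.
# Both inputs must use the same compression (both plain or both gzipped).
make_gentrome() {
  local transcripts="$1" genome="$2" out="$3"
  cat -- "$transcripts" "$genome" > "$out"
}

# Example (file names are hypothetical):
#   make_decoys genome.fa decoys.txt
#   make_gentrome transcripts.fa genome.fa gentrome.fa
#   salmon index -t gentrome.fa -d decoys.txt -i salmon_index -k 31
```

The `-d` file tells `salmon index` which sequences in `gentrome.fa` are decoys, so reads that fit the genome better than any transcript are not forced onto a transcript.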