Predicting small RNA target genes in R

Last published: 2022-03-01 09:37:54

TargetScan


http://www.targetscan.org/cgi-bin/targetscan/data_download.vert80.cgi

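library(tidyverse)  # read_tsv, separate_rows, filter, mutate, str_extract, write_tsv

# Load the default predictions table downloaded from the TargetScan page above
targetScan <- read_tsv("/home/wangyang/Documents/miRNA/Predicted_Targets_Info.default_predictions.txt.gz")
# Turn headers such as "miR Family" into syntactic names such as miR.Family
colnames(targetScan) <- make.names(colnames(targetScan))
head(targetScan)
targetScan |>
    # One row per family member: "a/b" becomes two rows
    separate_rows(miR.Family, sep = "/") |>
    # Keep human predictions only (NCBI taxonomy ID 9606)
    filter(Species.ID == 9606) |>
    mutate(mature_miRNA = case_when(
                # Members that already carry a miR- or let- prefix only need the species tag;
                # without the let- case, let-7 members would become "hsa-miR-let-7-..."
                grepl("^(miR|let)-", miR.Family) ~ paste0("hsa-", miR.Family),
                # Bare members such as "20-5p" also need the miR- prefix
                TRUE ~ paste0("hsa-miR-", miR.Family)
                )) |>
    # Precursor-style name: drop the arm suffix (-5p/-3p) and lowercase "miR" to "mir"
    mutate(miRNA = str_extract(mature_miRNA, "[a-z]+-(miR|let)-[0-9a-z]+")) |>
    mutate(miRNA = str_replace(miRNA, "miR", "mir")) |>
    select(mature_miRNA, miRNA, mRNA = Gene.ID, Species.ID, UTR.start, UTR.end) |>
    distinct() |>
    write_tsv(file = "/home/wangyang/Documents/miRNA/Predicted_Targets_Info.default_predictions.txt.processs.gz")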
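
To make the name handling in the pipeline above easier to follow, here is a minimal base R sketch that applies the same steps to one toy family string. The input value and the variable names are illustrative assumptions, not rows taken from the TargetScan file.

```r
# Toy value in the style of the miR.Family column (illustrative, not real data)
family <- "miR-17-5p/20-5p/93-5p"

# Step 1: split the family into its members (what separate_rows does per row)
members <- unlist(strsplit(family, "/", fixed = TRUE))

# Step 2: add the species tag, plus "miR-" for members that lack it
mature <- ifelse(grepl("miR", members),
                 paste0("hsa-", members),
                 paste0("hsa-miR-", members))

# Step 3: keep "hsa-miR-<number>" and lowercase "miR" to get a precursor-style name
precursor <- regmatches(mature, regexpr("^[a-z]+-miR-[0-9a-z]+", mature))
precursor <- sub("miR", "mir", precursor, fixed = TRUE)

result <- data.frame(member = members, mature_miRNA = mature, miRNA = precursor)
print(result)
```

The arm suffix (-5p or -3p) is kept in mature_miRNA and dropped in miRNA, so the two columns can be joined against tables keyed by either naming style.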

miRanda

http://cbio.mskcc.org/miRNA2003/miranda.html

Query sequence

>embl|AJ550546|DME550546 Drosophila melanogaster microRNA miR-bantam 
gtgagatcattttgaaagctg

Reference sequence

>embl|U31226|DM31226	Drosophila melanogaster		head involution defective protein (hid) mRNA, complete cds. 
xxxxxxxxxxgaaagcgcaggagacgtgtaatcgaatgatctatagtgaaatcagctagc
ccttaagatatatgccgatctaaacatagttgtagttaaaccgtacataagtgcaacgaa
tttattgaactgcaggagcgaaagcagaaagtcattaattcgtaaacggattgttagata
cacaaacagccaacatacacgaagagtgtgcctaagattaagaaggttgacgggacacaa
gaacaatatattctatctgtctatggtaactgcatttgtatttctaaaacgaaacgaaag
ataacaatcttaactgctcaaagtaatgaaaactcttagactggcaagagactcaaatca
cacttatttttttgctgatccatatttttgtacaaccttttgagcgatatttacaaatta
tactagtacaaaaaaaagagagagagagataagcaaaagaaaactgccacttttgagata
cttttgataatctttgatttgcatttaatcatttccacacttgcattttttataaacaac
aaacaaaattacttccattgtagaacaaagtaaactgcaatttcaatgtcttcgcatttg
taattccgaattgcaagaaaaacaaaaatattttaaatatgtttaactagtagaattttt
taaacgtaagtccacaaaaacaagcacatctagctttaattgttgaaacaaaagcagaaa
aaacgcaacaaaaaaatgaatgaaaatcattaaattaattttgtatatagtttttatgcc
atttttgtgatgttttgtgtctacggtttatgtcatgttattttagttaaatttcttatg
atttatgtttatttgtaatattttttgtcattgtttgttcatcatcatattcaaattggt
ctcacaatataatagttttaagctccacgcccgggagattgatggcaaaacgattgaaat
ttggccagaagagagatagttttccccattcgtacacagtcttttttggaatgcacatta
atgatctctcacaatggaaattaatgaaaattgatctccgcagctagccaaagttaaaaa
agaaatgaagaggaaaacatattctataggcaattttcactatatgctagaatttcccgg
gcgtttcaatgctaatcgaatacagtgacatgaaagcaaacatagcgaaaatattaagaa
aatcaatcaaaaagaaagaaaaaccaattcccaaaaatcgcattgatctcatggatttat
acaatacaattacatcaaccgtttttttacaatgagaaatgttataaaaagcagaaagtg
aaacacagaaacataaacaaaaattaacgaaaagcttagatataagttcgccaagcgttt
tagttctattttctagaatgtctaagtcggtttagtgagtttattaagctgtcttcggac
acaagtttatttgtatataagcaatattatttgtgtagcctaagtgacagtcccaatcaa
atccaatccaatatcacccagtcccggacatttcccagcaaaacaatagactattctcgc
gttcacatgtatcaatcttaatttgaattaccacaaaatgaaatgaaatactaaaaccat
acacaaatgaaaaattatttttgtaaattgtttgcatcaagtgagcaaggggattagatt
aaggaatcatccttgctttatcccctgcttattgctaattagttttcacaatgatctcgg
taaagttttgtggccttgcgcccaaaagtcgtacagatttttggtttgccataaatactc
gaacaaaaagttaatgaaaaacgaagcaaatggaaaaaaaatcagaatgaaacacaagaa
atttatatttttgacccaatgctacttaatccgtttttgtaatttaagtatctttactcg
accttgtatatagcgcagttcgaatcacagaatcaaatgccatttttgtatagaatttta
tttggtgccaaaacagtgacagataattaaatgtctatgaacccgtgtatttcgcatatt
atacatttatacatatatcgtaacttcaatgataagtttgattctgaaattttgtcaact
caatttaagaaacatttctgttgtagtttagtgattgctagcagaaagcactttgtttaa
ttgtacattttatattatgctgtaatattttaatatacataaatatcattattgatctca
tgaatatgttcataagacaacaaaaattatatatatgaatacatctatgtgtatgtgtaa
ag

Result

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
miRanda v1.9    microRNA Target Scanning Algorithm
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
(c) 2003 Memorial Sloan-Kettering Cancer Center, New York

Authors: Anton Enright, Bino John, Chris Sander and Debora Marks
(mirnatargets@cbio.mskcc.org - reaches all authors)

Software written by: Anton Enright
Distributed for anyone to use under the GNU Public License (GPL),
See the files 'COPYING' and 'LICENSE' for details

If you use this software please cite:
Enright AJ, John B, Gaul U, Tuschl T, Sander C and Marks DS;
(2003) Genome Biology; 5(1):R1.

   miRanda comes with ABSOLUTELY NO WARRANTY;
   This is free software, and you are welcome to redistribute it
   under certain conditions; type `miranda --license' for details.

Current Settings:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Query Filename:		bantam_stRNA.fasta
Reference Filename:	hid_UTR.fasta
Gap Open Penalty:	-8.000000
Gap Extend:		-2.000000
Score Threshold		50.000000
Energy Threshold	-20.000000 kcal/mol
Scaling Parameter:	4.000000
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Read Sequence:embl|AJ550546|DME550546  Drosophila melanogaster microRNA miR-bantam (21 nt)
Read Sequence:embl|U31226|DM31226  Drosophila melanogaster  head involution defective protein (hid) mRNA, complete cds. (2282 nt)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Performing Scan: embl|AJ550546|DME550546 vs embl|U31226|DM31226
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

   Forward:	Score: 161.000000  Q:2 to 20  R:1720 to 1740 Align Len (18) (88.89%) (94.44%)

   Query:    3' gTCGAAAGTTTTACTAGAGtg 5'
                 ||:||||| |||||||||  
   Ref:      5' tAGTTTTCACAATGATCTCgg 3'

   Energy:  -24.540001 kCal/Mol

Scores for this hit:
>embl|AJ550546|DME550546	embl|U31226|DM31226	161.00	-24.54	0.00	2 20	1720 1740	18	88.89%	94.44%


   Forward:	Score: 154.000000  Q:1 to 22  R:883 to 905 Align Len (23) (78.26%) (82.61%)

   Query:    3' GTCG-A-AAGTTTTACTAGAGTG 5'
                || | | |||||| ||:||||||
   Ref:      5' CATCATATTCAAATTGGTCTCAC 3'

   Energy:  -20.030001 kCal/Mol

Scores for this hit:
>embl|AJ550546|DME550546	embl|U31226|DM31226	154.00	-20.03	0.00	1 22	883 905	23	78.26%	82.61%

Score for this Scan:
Seq1,Seq2,Tot Score,Tot Energy,Max Score,Max Energy,Strand,Len1,Len2,Positions
>>embl|AJ550546|DME550546	embl|U31226|DM31226	315.00	-44.57	161.00	-24.54	1	21	2282	 1719 882
Complete

Run Complete
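
The per-hit lines in the output above (those starting with a single ">") are tab-separated, so they can be collected into a table. The following base R sketch parses lines copied from that output. The column meanings are inferred from the matching "Forward:" lines rather than from the miRanda documentation, so treat them as an assumption.

```r
# Lines copied from the sample output above; in practice these would come from
# readLines() on a saved miRanda result file
out <- c(
  "Scores for this hit:",
  ">embl|AJ550546|DME550546\tembl|U31226|DM31226\t161.00\t-24.54\t0.00\t2 20\t1720 1740\t18\t88.89%\t94.44%",
  ">embl|AJ550546|DME550546\tembl|U31226|DM31226\t154.00\t-20.03\t0.00\t1 22\t883 905\t23\t78.26%\t82.61%",
  ">>embl|AJ550546|DME550546\tembl|U31226|DM31226\t315.00\t-44.57\t161.00\t-24.54\t1\t21\t2282\t 1719 882"
)

# Keep per-hit lines (one leading ">"), skip the ">>" per-scan summary line
hit_lines <- grep("^>[^>]", out, value = TRUE)
fields <- strsplit(sub("^>", "", hit_lines), "\t", fixed = TRUE)

# Pick the n-th tab-separated field from every hit line
field <- function(n) vapply(fields, `[`, character(1), n)

hits <- data.frame(
  query     = field(1),
  reference = field(2),
  score     = as.numeric(field(3)),
  energy    = as.numeric(field(4)),  # kcal/mol
  ref_pos   = field(7),              # "start end" on the reference sequence
  stringsAsFactors = FALSE
)
print(hits)
```
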
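  • Parameter meanings
    • -sc score: sets the alignment score threshold to score. Only alignments with a score >= score are kept for further analysis.
    • -en energy: sets the energy threshold to energy (kcal/mol). Only alignments with an energy <= energy are kept for further analysis. The value must be negative for filtering to take effect.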
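
As a hedged illustration of how these two thresholds are passed, the sketch below builds a miRanda command line from R and runs it only when the executable and both input files are present. The input file names are taken from the "Current Settings" in the output above; the output file name is a placeholder, and the miranda binary being on the PATH is an assumption.

```r
# Thresholds matching the settings reported in the sample run
score_threshold  <- 50    # -sc: keep alignments with score >= 50
energy_threshold <- -20   # -en: keep alignments with energy <= -20 kcal/mol

# Query FASTA, reference FASTA, then options; "-out" writes results to a file
args <- c("bantam_stRNA.fasta", "hid_UTR.fasta",
          "-sc", score_threshold,
          "-en", energy_threshold,
          "-out", "bantam_vs_hid.txt")  # placeholder output file name

# Run only if miranda is installed and both input files exist
if (nzchar(Sys.which("miranda")) && all(file.exists(args[1:2]))) {
  status <- system2("miranda", args)
}
```

Raising -sc or lowering -en makes the scan stricter, which usually shortens the list of reported hits.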

Downloading 3' UTR sequences

http://uswest.ensembl.org/biomart/
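
The same download can also be scripted. This is a minimal sketch that assumes the Bioconductor package biomaRt is installed and Ensembl is reachable; it is an alternative to the web interface, not the procedure used in this post. The helper names, the gene ID, and the file name are placeholders.

```r
# Fetch 3' UTR sequences for a set of Ensembl gene IDs (needs network access)
fetch_3utr <- function(gene_ids) {
  mart <- biomaRt::useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")
  biomaRt::getSequence(id = gene_ids, type = "ensembl_gene_id",
                       seqType = "3utr", mart = mart)
}

# BioMart returns "Sequence unavailable" for transcripts without an annotated
# 3' UTR; drop those rows before writing a FASTA file for miRanda
drop_missing_utr <- function(seqs) {
  seqs[seqs[["3utr"]] != "Sequence unavailable", , drop = FALSE]
}

# Example usage (not run here; gene ID and file name are placeholders):
# utr <- drop_missing_utr(fetch_3utr("ENSG00000141510"))
# biomaRt::exportFASTA(utr, file = "utr3.fasta")
```
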


Reference

https://www.jianshu.com/p/6fbfebbc818b/