Predicting small RNA (miRNA) target genes in R
Last published: 2022-03-01 09:37:54
TargetScan
http://www.targetscan.org/cgi-bin/targetscan/data_download.vert80.cgi
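library(tidyverse)  # readr, tidyr, dplyr and stringr are all used below

# Load the TargetScan default predictions downloaded from the page above
targetScan <- read_tsv("/home/wangyang/Documents/miRNA/Predicted_Targets_Info.default_predictions.txt.gz")
# Make column names syntactic, e.g. "miR Family" becomes "miR.Family"
colnames(targetScan) <- make.names(colnames(targetScan))
head(targetScan)

targetScan |>
  # One row per family member, e.g. "miR-1-3p/206" gives "miR-1-3p" and "206"
  separate_rows(miR.Family, sep = "/") |>
  # Keep human sites only (NCBI taxonomy ID 9606)
  filter(Species.ID == 9606) |>
  # Add the species prefix; bare members such as "206" also need "miR-"
  mutate(mature_miRNA = case_when(
    grepl("miR|let", miR.Family) ~ paste0("hsa-", miR.Family),
    TRUE ~ paste0("hsa-miR-", miR.Family)
  )) |>
  # Precursor-style name: drop the arm suffix and lower-case "miR" to "mir"
  mutate(miRNA = str_extract(mature_miRNA, "[a-z]+-(miR|let)-[0-9a-z]+")) |>
  mutate(miRNA = str_replace(miRNA, "miR", "mir")) |>
  select(mature_miRNA, miRNA, mRNA = Gene.ID, Species.ID, UTR.start, UTR.end) |>
  distinct() |>
  write_tsv(file = "/home/wangyang/Documents/miRNA/Predicted_Targets_Info.default_predictions.txt.processs.gz")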
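The renaming steps above are easier to follow on a tiny example. The sketch below repeats them in base R on family strings typed in by hand (they are illustrations, not rows read from the real file). The helper name `normalize_family` is hypothetical, and treating let-7 members as already carrying their own prefix is an assumption of this sketch.

```r
# Hypothetical helper: expand one TargetScan-style family string into
# mature miRNA names and precursor-style names (illustration only).
normalize_family <- function(family) {
  members <- strsplit(family, "/", fixed = TRUE)[[1]]
  # Members that already start with "miR" or "let" only need the species prefix
  mature <- ifelse(grepl("miR|let", members),
                   paste0("hsa-", members),
                   paste0("hsa-miR-", members))
  # Keep the part up to the miRNA number; names that do not match stay NA
  m <- regexpr("[a-z]+-(miR|let)-[0-9a-z]+", mature)
  stem <- rep(NA_character_, length(mature))
  stem[m > 0] <- regmatches(mature, m)
  data.frame(mature_miRNA = mature,
             miRNA = sub("miR", "mir", stem, fixed = TRUE),
             stringsAsFactors = FALSE)
}

print(normalize_family("miR-1-3p/206"))
print(normalize_family("let-7-5p/98-5p"))
```

With "miR-1-3p/206" the second member has no prefix of its own, so it receives "hsa-miR-"; the arm suffix "-3p" is dropped only in the `miRNA` column.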
miRanda
http://cbio.mskcc.org/miRNA2003/miranda.html
Query sequence
>embl|AJ550546|DME550546 Drosophila melanogaster microRNA miR-bantam
gtgagatcattttgaaagctg
Reference sequence
>embl|U31226|DM31226 Drosophila melanogaster head involution defective protein (hid) mRNA, complete cds.
xxxxxxxxxxgaaagcgcaggagacgtgtaatcgaatgatctatagtgaaatcagctagc
ccttaagatatatgccgatctaaacatagttgtagttaaaccgtacataagtgcaacgaa
tttattgaactgcaggagcgaaagcagaaagtcattaattcgtaaacggattgttagata
cacaaacagccaacatacacgaagagtgtgcctaagattaagaaggttgacgggacacaa
gaacaatatattctatctgtctatggtaactgcatttgtatttctaaaacgaaacgaaag
ataacaatcttaactgctcaaagtaatgaaaactcttagactggcaagagactcaaatca
cacttatttttttgctgatccatatttttgtacaaccttttgagcgatatttacaaatta
tactagtacaaaaaaaagagagagagagataagcaaaagaaaactgccacttttgagata
cttttgataatctttgatttgcatttaatcatttccacacttgcattttttataaacaac
aaacaaaattacttccattgtagaacaaagtaaactgcaatttcaatgtcttcgcatttg
taattccgaattgcaagaaaaacaaaaatattttaaatatgtttaactagtagaattttt
taaacgtaagtccacaaaaacaagcacatctagctttaattgttgaaacaaaagcagaaa
aaacgcaacaaaaaaatgaatgaaaatcattaaattaattttgtatatagtttttatgcc
atttttgtgatgttttgtgtctacggtttatgtcatgttattttagttaaatttcttatg
atttatgtttatttgtaatattttttgtcattgtttgttcatcatcatattcaaattggt
ctcacaatataatagttttaagctccacgcccgggagattgatggcaaaacgattgaaat
ttggccagaagagagatagttttccccattcgtacacagtcttttttggaatgcacatta
atgatctctcacaatggaaattaatgaaaattgatctccgcagctagccaaagttaaaaa
agaaatgaagaggaaaacatattctataggcaattttcactatatgctagaatttcccgg
gcgtttcaatgctaatcgaatacagtgacatgaaagcaaacatagcgaaaatattaagaa
aatcaatcaaaaagaaagaaaaaccaattcccaaaaatcgcattgatctcatggatttat
acaatacaattacatcaaccgtttttttacaatgagaaatgttataaaaagcagaaagtg
aaacacagaaacataaacaaaaattaacgaaaagcttagatataagttcgccaagcgttt
tagttctattttctagaatgtctaagtcggtttagtgagtttattaagctgtcttcggac
acaagtttatttgtatataagcaatattatttgtgtagcctaagtgacagtcccaatcaa
atccaatccaatatcacccagtcccggacatttcccagcaaaacaatagactattctcgc
gttcacatgtatcaatcttaatttgaattaccacaaaatgaaatgaaatactaaaaccat
acacaaatgaaaaattatttttgtaaattgtttgcatcaagtgagcaaggggattagatt
aaggaatcatccttgctttatcccctgcttattgctaattagttttcacaatgatctcgg
taaagttttgtggccttgcgcccaaaagtcgtacagatttttggtttgccataaatactc
gaacaaaaagttaatgaaaaacgaagcaaatggaaaaaaaatcagaatgaaacacaagaa
atttatatttttgacccaatgctacttaatccgtttttgtaatttaagtatctttactcg
accttgtatatagcgcagttcgaatcacagaatcaaatgccatttttgtatagaatttta
tttggtgccaaaacagtgacagataattaaatgtctatgaacccgtgtatttcgcatatt
atacatttatacatatatcgtaacttcaatgataagtttgattctgaaattttgtcaact
caatttaagaaacatttctgttgtagtttagtgattgctagcagaaagcactttgtttaa
ttgtacattttatattatgctgtaatattttaatatacataaatatcattattgatctca
tgaatatgttcataagacaacaaaaattatatatatgaatacatctatgtgtatgtgtaa
ag
Results
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
miRanda v1.9 microRNA Target Scanning Algorithm
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
(c) 2003 Memorial Sloan-Kettering Cancer Center, New York
Authors: Anton Enright, Bino John, Chris Sander and Debora Marks
(mirnatargets@cbio.mskcc.org - reaches all authors)
Software written by: Anton Enright
Distributed for anyone to use under the GNU Public License (GPL),
See the files 'COPYING' and 'LICENSE' for details
If you use this software please cite:
Enright AJ, John B, Gaul U, Tuschl T, Sander C and Marks DS;
(2003) Genome Biology; 5(1):R1.
miRanda comes with ABSOLUTELY NO WARRANTY;
This is free software, and you are welcome to redistribute it
under certain conditions; type `miranda --license' for details.
Current Settings:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Query Filename: bantam_stRNA.fasta
Reference Filename: hid_UTR.fasta
Gap Open Penalty: -8.000000
Gap Extend: -2.000000
Score Threshold 50.000000
Energy Threshold -20.000000 kcal/mol
Scaling Parameter: 4.000000
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Read Sequence:embl|AJ550546|DME550546 Drosophila melanogaster microRNA miR-bantam (21 nt)
Read Sequence:embl|U31226|DM31226 Drosophila melanogaster head involution defective protein (hid) mRNA, complete cds. (2282 nt)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Performing Scan: embl|AJ550546|DME550546 vs embl|U31226|DM31226
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Forward: Score: 161.000000 Q:2 to 20 R:1720 to 1740 Align Len (18) (88.89%) (94.44%)
Query: 3' gTCGAAAGTTTTACTAGAGtg 5'
           ||:||||| |||||||||
Ref:   5' tAGTTTTCACAATGATCTCgg 3'
Energy: -24.540001 kCal/Mol
Scores for this hit:
>embl|AJ550546|DME550546 embl|U31226|DM31226 161.00 -24.54 0.00 2 20 1720 1740 18 88.89% 94.44%
Forward: Score: 154.000000 Q:1 to 22 R:883 to 905 Align Len (23) (78.26%) (82.61%)
Query: 3' GTCG-A-AAGTTTTACTAGAGTG 5'
          || | | |||||| ||:||||||
Ref:   5' CATCATATTCAAATTGGTCTCAC 3'
Energy: -20.030001 kCal/Mol
Scores for this hit:
>embl|AJ550546|DME550546 embl|U31226|DM31226 154.00 -20.03 0.00 1 22 883 905 23 78.26% 82.61%
Score for this Scan:
Seq1,Seq2,Tot Score,Tot Energy,Max Score,Max Energy,Strand,Len1,Len2,Positions
>>embl|AJ550546|DME550546 embl|U31226|DM31226 315.00 -44.57 161.00 -24.54 1 21 2282 1719 882
Complete
Run Complete
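The report above is meant for reading, but the per-hit summary lines (those starting with a single ">") are regular enough to load into a data frame. The following is a minimal sketch, assuming the field order seen in this particular report; `parse_miranda_hits` and all column names are hypothetical labels chosen here, and the third numeric field and the two percentages are skipped because the report does not label them.

```r
# Hypothetical parser for the per-hit summary lines of a miRanda report
# (lines that start with a single ">", not the ">>" scan totals).
parse_miranda_hits <- function(lines) {
  lines <- trimws(lines)
  hits <- lines[grepl("^>[^>]", lines)]
  fields <- strsplit(hits, "[[:space:]]+")
  fields <- fields[lengths(fields) == 12]  # the hit lines above carry 12 values
  if (length(fields) == 0) return(data.frame())
  mat <- do.call(rbind, fields)
  data.frame(
    query     = sub("^>", "", mat[, 1]),
    reference = mat[, 2],
    score     = as.numeric(mat[, 3]),
    energy    = as.numeric(mat[, 4]),
    q_start   = as.integer(mat[, 6]),
    q_end     = as.integer(mat[, 7]),
    r_start   = as.integer(mat[, 8]),
    r_end     = as.integer(mat[, 9]),
    align_len = as.integer(mat[, 10]),
    stringsAsFactors = FALSE
  )
}

# The two hit lines and the scan total copied from the report above
report <- c(
  ">embl|AJ550546|DME550546 embl|U31226|DM31226 161.00 -24.54 0.00 2 20 1720 1740 18 88.89% 94.44%",
  ">embl|AJ550546|DME550546 embl|U31226|DM31226 154.00 -20.03 0.00 1 22 883 905 23 78.26% 82.61%",
  ">>embl|AJ550546|DME550546 embl|U31226|DM31226 315.00 -44.57 161.00 -24.54 1 21 2282 1719 882"
)
hits <- parse_miranda_hits(report)

# Stricter, purely illustrative cutoffs: higher score, lower (more negative) energy
strong <- hits[hits$score >= 155 & hits$energy <= -22, ]
print(strong)
```

The filter keeps hits in the same direction as miRanda's own thresholds: scores at or above the cutoff and energies at or below it.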
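- Parameter meanings
- -sc score: sets the alignment score threshold to score. Only alignments with a score >= score are kept for further analysis.
- -en energy: sets the energy threshold to energy. Only alignments with an energy <= energy are kept for further analysis. The value must be negative for the filter to take effect.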
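To show how the two options fit into a call, here is a hedged sketch that assembles the arguments from R. The file names and the values 50 and -20 are taken from the "Current Settings" block of the report; the helper `miranda_args` is hypothetical, and the program is only run if a `miranda` binary is found on the PATH and both FASTA files exist.

```r
# Hypothetical helper: build the argument vector for a miRanda run.
miranda_args <- function(query, reference, score = 50, energy = -20) {
  # The -en filter only takes effect for negative values
  stopifnot(is.numeric(score), is.numeric(energy), energy < 0)
  c(query, reference, "-sc", as.character(score), "-en", as.character(energy))
}

args <- miranda_args("bantam_stRNA.fasta", "hid_UTR.fasta")
print(args)

# Run only when the binary and both input files are actually available
if (nzchar(Sys.which("miranda")) && all(file.exists(args[1:2]))) {
  out <- system2("miranda", args, stdout = TRUE)
  writeLines(head(out))
}
```

Keeping the argument vector separate from the `system2()` call makes it easy to inspect or log the exact command before anything is executed.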
3' UTR download
http://uswest.ensembl.org/biomart/
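BioMart can also be queried from R instead of through the web page. The sketch below uses the Bioconductor package biomaRt and is an assumption about how one might do it, not the workflow used in this post: the wrapper name `fetch_3utr`, the choice of the human dataset, and the example gene ID are all illustrative. It needs network access, so the actual call is left commented out.

```r
# Sketch only: needs the Bioconductor package biomaRt and network access.
fetch_3utr <- function(gene_ids, dataset = "hsapiens_gene_ensembl") {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("Install biomaRt first, e.g. BiocManager::install('biomaRt')")
  }
  mart <- biomaRt::useEnsembl(biomart = "genes", dataset = dataset)
  # Returns a data frame with the 3' UTR sequence and the gene ID
  biomaRt::getSequence(id = gene_ids, type = "ensembl_gene_id",
                       seqType = "3utr", mart = mart)
}

# Example call (not run here); ENSG00000141510 is the Ensembl ID of human TP53
# utr <- fetch_3utr("ENSG00000141510")
# biomaRt::exportFASTA(utr, file = "utr3.fasta")
```

The resulting FASTA file could then serve as the reference file for miRanda, in place of a hand-prepared sequence.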