RiboSearch Ben Daniel Ariel Kirshner Naomi Instructor : Dr. Danny Barash Adaya Cohen
Introduction • Biological Introduction • Method Layout • “The merge strategy” • Results and Conclusions
RNA A single-stranded nucleic acid made up of 4 nucleotides: Purines: adenine (A), guanine (G). Pyrimidines: cytosine (C), uracil (U). Watson-Crick (WC) pairs: A-U, G-C
Biological Introduction Old scheme [Figure: DNA → RNA → Protein] • Proteins carry out all biological functions • RNA: only an intermediate stage between DNA and protein, with no catalytic function
Biological Introduction New scheme [Figure: DNA, RNA, Protein] • Since the discovery of self-splicing RNAs in the early 1980s, a number of new structural and catalytic RNAs have been discovered. • Recent studies focusing on non-coding and small RNAs have led to the discovery of RNA molecules that possess essential regulatory functions
RNA Secondary Structure The secondary structure of many RNAs is more conserved than their sequence. Structural elements: • Hairpin • Internal loop • Bulge loop • Junction • Stem (double strand) • Pseudoknot
Riboswitch [Figure labels: 5’ UTR, Aptamer, Expression platform, Coding section, 3’ UTR] • RNA control elements that regulate gene expression without the participation of proteins • Use a unique mechanism whereby small molecules bind to the aptamer/box region, causing a conformational switch • Initially found in the 5’ UTRs of bacteria, with successive discoveries in prokaryotes • There is evidence suggesting that riboswitches may also be found in eukaryotes.
Riboswitch mechanism Guanine binds to the aptamer region, which causes a conformational change in the expression platform; this in turn regulates guanine metabolism.
G-box • Regulates genes related to purine metabolism and transport • Binds purines • Consists of 2 hairpins and 1 internal junction
RiboSearch Goal • Finding G-box in eukaryotic genomes Method • Combining existing search methods into one overall package
Search Methods • Whiffer – CS department, BGU • RNAMotif – Macke et al., 2001 • RNAProfile – Pavesi et al., 2004 • STR2 – CS department, BGU
Whiffer Input • Pattern that consists of: • Sequence information • Variable gaps • Base-pairing brackets representing WC pairs Output • Candidate locations that meet the constraints imposed by the method Sample pattern: <<<< [2] TA [5] GTNTCTAC [3] <<<<< [3] CCNNNAA [3] >>>>> [5] >>>>
Whiffer Method • Uses simple matching, based on the constraints, as opposed to dynamic programming.
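The constraint-matching idea can be sketched in a few lines of Python. This is not Whiffer's code: the function names are hypothetical, and the pattern semantics are assumptions read off the sample pattern (`[n]` is taken to mean a gap of 0 to n arbitrary nucleotides, `N` any nucleotide, and each run of `>` must pair, WC only, with the most recent unclosed run of `<` of the same length).

```python
# Assumed pairing rule: Watson-Crick only (T accepted so DNA genomes can be scanned).
WC = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("A", "T"), ("T", "A")}


def match_at(seq, pos, tokens, stack=()):
    """Yield every end position at which `tokens` can match starting at `pos`.

    `stack` holds the 5' strands of stems that are still open.
    """
    if not tokens:
        yield pos
        return
    tok, rest = tokens[0], tokens[1:]
    n = len(tok)
    if tok[0] == "[":  # assumed meaning: gap of 0..k nucleotides
        for gap in range(int(tok[1:-1]) + 1):
            if pos + gap <= len(seq):
                yield from match_at(seq, pos + gap, rest, stack)
    elif tok[0] == "<":  # open a stem: remember its 5' strand
        if pos + n <= len(seq):
            yield from match_at(seq, pos + n, rest, stack + (seq[pos:pos + n],))
    elif tok[0] == ">":  # close the most recent stem: check pairing
        if stack and len(stack[-1]) == n and pos + n <= len(seq):
            three = seq[pos:pos + n]
            if all((a, b) in WC for a, b in zip(stack[-1], reversed(three))):
                yield from match_at(seq, pos + n, rest, stack[:-1])
    else:  # literal sequence, N matches anything
        window = seq[pos:pos + n]
        if len(window) == n and all(p in ("N", c) for p, c in zip(tok, window)):
            yield from match_at(seq, pos + n, rest, stack)


def search(seq, pattern):
    """Return sorted (start, end) half-open intervals where the pattern matches."""
    tokens = tuple(pattern.split())
    hits = {(s, e) for s in range(len(seq)) for e in match_at(seq.upper(), s, tokens)}
    return sorted(hits)


hits = search("AAGGCGAAGCCAA", "<<< [1] GAA [1] >>>")
```

Because every token is checked directly against the sequence, no alignment table is built; the cost is that any deviation from the pattern rules a candidate out.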
RNAMotif Input • Database of nucleotide sequences • Description file that consists of: • Descriptor section • Score section (optional) Output • Candidates that meet the conditions of the descriptor and the scoring scheme
RNAMotif Sample descriptor file (a hairpin with a 6-8 bp stem and a 4-6 nt loop, rejected when the GC fraction is below 0.4):

    descr
        h5 (minlen=6, maxlen=8)
        ss (minlen=4, maxlen=6)
        h3
    score
        {
            gcnt = 0;
            glen = 0;
            for( i = 1; i <= NSE; i++ ){
                llen = length( se[i] );
                glen = glen + llen;
                for( j = 1; j <= llen; j++ ){
                    b = se[i,j,1];
                    if( b == "g" || b == "c" )
                        gcnt++;
                }
            }
            SCORE = 1.0 * gcnt / glen;
            if( SCORE < .4 )
                REJECT;
        }

[Figure: hairpin with the elements h5, ss, h3 labeled]
RNAMotif Method • Two-stage algorithm • Stage I: Compilation • The motif description, called a descriptor, is analyzed and converted into a search tree based on the helical nesting of the motif
RNAMotif Method • Two-stage algorithm • Stage II: DFS • Depth-first search of the tree created by the compilation stage • Each time a complete solution to the descriptor is found, the candidate is passed to the optional score section for scoring and ranking • In the absence of a score section, the candidate is accepted
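To make the search-then-score flow concrete, here is a brute-force Python sketch for a single hairpin (h5, ss, h3) with a GC-fraction score. It does not reproduce RNAMotif's compiled search tree; the nested loops simply try each choice of stem and loop length in depth-first order. The function names, default length ranges, and the WC-only pairing rule are assumptions made for illustration.

```python
# Assumed pairing rule for the helix: Watson-Crick pairs only.
PAIRS = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g")}


def gc_fraction(elements):
    """Fraction of g/c over all structure elements of a candidate."""
    total = sum(len(e) for e in elements)
    gc = sum(e.count("g") + e.count("c") for e in elements)
    return gc / total


def find_hairpins(seq, stem=(6, 8), loop=(4, 6), min_gc=0.4):
    """Return (start, end, score) for every hairpin that passes the score filter."""
    seq = seq.lower().replace("t", "u")
    hits = []
    for start in range(len(seq)):
        for s in range(stem[0], stem[1] + 1):        # choose the helix length
            for l in range(loop[0], loop[1] + 1):    # choose the loop length
                end = start + 2 * s + l
                if end > len(seq):
                    continue
                h5 = seq[start:start + s]
                ss = seq[start + s:start + s + l]
                h3 = seq[start + s + l:end]
                # h5 must pair with h3 read in the opposite direction
                if all((a, b) in PAIRS for a, b in zip(h5, reversed(h3))):
                    score = gc_fraction([h5, ss, h3])
                    if score >= min_gc:              # otherwise the candidate is rejected
                        hits.append((start, end, round(score, 3)))
    return hits


hairpins = find_hairpins("GGGCGCGAAAAGCGCCC")
```

Passing `min_gc=0.0` mimics running without a score section: every structural match is accepted.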
RNAProfile Input • Number of distinct hairpins a motif has to contain • Set of unaligned RNA sequences expected to share a common motif
RNAProfile Output • Regions that are most conserved across the sequences, according to: • Sequence of the regions • Secondary structure that can be formed according to base-pairing and thermodynamic rules
RNAProfile Method • Two phases • Phase I : Extracting a set of candidate regions from each input sequence, whose predicted optimal secondary structure contains the number of hairpins given as input • Phase II : The regions selected are compared with each other to find the group of most similar ones, formed by a region taken from each sequence
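Phase II can be illustrated with a toy selection step: given a few candidate regions per sequence, pick one region from each so that the group is as similar as possible. The sketch below is not RNAProfile's algorithm. It enumerates all groups exhaustively, and it swaps in `difflib.SequenceMatcher` as a stand-in similarity that looks at sequence only and ignores secondary structure. The function names and example regions are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations, product


def similarity(a, b):
    """Stand-in similarity in [0, 1]; sequence only, structure is ignored."""
    return SequenceMatcher(None, a, b).ratio()


def most_similar_group(regions_per_sequence):
    """Pick one region per sequence, maximizing the summed pairwise similarity."""
    best_group, best_score = None, float("-inf")
    for group in product(*regions_per_sequence):
        score = sum(similarity(a, b) for a, b in combinations(group, 2))
        if score > best_score:
            best_group, best_score = group, score
    return best_group, best_score


# Hypothetical candidate regions extracted from three input sequences.
regions = [
    ["GGGAAACCC", "AUAUAUAUA"],
    ["CACACACAC", "GGGAAACCC"],
    ["GGGAAACCC", "UUUUUUUUU"],
]
group, score = most_similar_group(regions)
```

Exhaustive enumeration grows exponentially with the number of sequences, so it is only workable for small inputs like this one.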
Method Summary • Whiffer: combines sequence and structure similarity; very high specificity, so potential candidates may be ruled out • RNAMotif: similarity based mostly on structural elements, according to the descriptor • RNAProfile: similarity based on both sequence and structure; recommended as a post-processing step
The merge strategy Query: sequence and structure (bracket notation), e.g. (((..((((...)))).)) → Input parsing → Whiffer, RNAMotif → Output parsing → Candidates
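The parsing step has to turn a bracket-notation structure into something the search tools can use. A minimal Python sketch of that first step follows: it reads a dot-bracket string into base pairs and groups stacked pairs into stems. The function names and the example string are hypothetical, and how RiboSearch converts the result into tool-specific input is not shown here.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs (0-based) in a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def stems(pairs):
    """Group stacked pairs (i, j), (i+1, j-1), ... into stems."""
    result = []
    for i, j in pairs:
        if result and result[-1][-1] == (i - 1, j + 1):
            result[-1].append((i, j))   # continues the current stem
        else:
            result.append([(i, j)])     # starts a new stem
    return result


# Hypothetical, balanced example structure.
pairs = parse_dot_bracket("((..(((...))).))")
stem_lengths = [len(s) for s in stems(pairs)]
```

The parser raises `ValueError` on unbalanced input, which is a convenient place to catch a mistyped query before any search is launched.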
Candidates → Filtering: • The location is contained within a gene • The gene is relevant to the requested function (purine metabolism) → RNAProfile post-processing → Final candidates
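The two filtering criteria can be written down as a short Python sketch. Everything here is illustrative: the `Candidate` and `Gene` records, the keyword test for "relevant to the requested function", and the toy coordinates are assumptions, and pooling the hits of both tools before filtering is one possible reading of the merge step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    chrom: str
    start: int
    end: int
    source: str       # which tool reported the hit


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int
    end: int
    annotation: str   # free-text function description


def pool(*candidate_lists):
    """Merge hit lists, keeping one entry per (chrom, start, end)."""
    seen = {}
    for candidates in candidate_lists:
        for c in candidates:
            seen.setdefault((c.chrom, c.start, c.end), c)
    return list(seen.values())


def filter_candidates(candidates, genes, keywords):
    """Keep candidates that lie inside a gene whose annotation mentions a keyword."""
    kept = []
    for c in candidates:
        for g in genes:
            inside = g.chrom == c.chrom and g.start <= c.start and c.end <= g.end
            relevant = any(k.lower() in g.annotation.lower() for k in keywords)
            if inside and relevant:
                kept.append(c)
                break
    return kept


# Toy data, invented for illustration only.
genes = [Gene("chr1", 50, 900, "purine transporter"), Gene("chr2", 10, 400, "photosynthesis")]
whiffer_hits = [Candidate("chr1", 100, 156, "Whiffer")]
rnamotif_hits = [Candidate("chr1", 100, 156, "RNAMotif"), Candidate("chr2", 20, 76, "RNAMotif")]
final = filter_candidates(pool(whiffer_hits, rnamotif_hits), genes, ["purine"])
```

Only the surviving candidates would then be handed to RNAProfile for post-processing.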
Final candidates → Sequence alignment → Biological experiments
Results – eukaryote: Arabidopsis thaliana • Most promising candidates
c2__11199940_11199996

    queryGBox               CGTGGATATGGCACGCAAGTTTCTACCGGGCACCGTAAATGTCCGACTAT 50
    c2__11199940_11199996_  --TTCAGGTC-CATCTTTGGCTAGACCGAAGTCAGATAATTTGGCGTTAT 47
                              *  *  *  **     *  *  ****    * *  *** *     ***

    queryGBox               G-------- 51
    c2__11199940_11199996_  AGTCCTGAA 56
c3_20894864_20894920

    c3_sequences  GGATGAGGAACCAATTGACCCTGGATTTCAAGATT-TACAAAAGAACGTA 49
    queryGBox     -------------CGTGGATATGGCACGCAAGTTTCTACCGGGCACCGTA 37
                                 **    ***    **** ** ***     * ****

    c3_sequences  AGCATCC------- 56
    queryGBox     AATGTCCGACTATG 51
                  *   ***
RiboSearch - Conclusions • Filters false positives • Sequences are far less conserved in eukaryotes than in prokaryotes • The merge strategy is essential for searching eukaryotic genomes
Our thanks • Dr. Danny Barash • Adaya Cohen