150 likes | 305 Views
http:// www.rnaparse.com. Operations on RNA Strings. RNA is the workhorse that uses this information to do everything else. * DNA is a linear string of four nucleotides ATGC that stores information. Shown above is the process of the transcription of DNA into single strands of RNA.
E N D
http:// www.rnaparse.com Operations on RNA Strings
RNA is the workhorse that uses this information to do everything else ... * DNA is a linear string of four nucleotides ATGC that stores information.
Shown above is the process of the transcription of DNA into single strands of RNA Roughly, RNA can be thought of a single stranded analog of DNA where ATCG gets turned into the bases AUCG.
RNA is a single, linear strand of nucleotides connected by a backbone of sugar-phosphate. As shown below, the strand is read as GCAGACCAAUGCAUGCGAUGAAGUUGAUCAUA...
Interestingly, we can see the string “GCAGACCAAUGCAUGCGAUGAAGUUGAUCAUA...” looks as if it has no structure until we see it represented as the stem and loop structure shown below. This is essentially due to only two rules of bonding between individual nucleotides.As you can imagine, this allows for an infinite number of convoluted RNA shapes to form depending on the sequence of nucleotides.
For purposes of discussion, RNA bonding takes place between certain nucleotides as follows (exceptions do exist.) A with UG with CHow can such a simple scheme produce EVERYTHING that living things are?
Axiom #1When talking about the function of RNA, shape rather than sequence means everything.Consider the hypothetical strings AUGCGGCAU and GACGCCGUC forming the hypothetical loop and stem shapes shown below.
In fact, we could think up a few thousand ways to create the same simple shape shown above using the same rule set and by choosing nucleotides that fit the rules: AAAACUUUU works...CUCUGAGAG works... The sequence of the two shapes below are readAUGCGGCAU and GACGCCGUCBut you will notice the shape or secondary structure does not differ between the two. Following the A:U, G:C rules of bonding we have the same shape with different sequences. ? =
RNA is a linear string that has the ability to fold back on itself to form complex shapes that have bio-functional meaning. RNA is composed of four nucleotides A U G C RNA rules of bonding are A with U, G with C (sometimes U with G) When talking about RNA structures secondary shape is often more important to function than the composition of linear sequence. Before continuing I'd like to stress a few key points:
Possible operations on a string of the 4 elements {AUGC} where the individual properties are such that (A binds to U) and (G binds to C)and their relationship to formal language * see Noam Chomsky
Linear, Type 3 language is easy – we just use simple regular expressions (e.g. find AUG[A or C] CCGAA) Loops are a bit harder but are still context-free language Pseudoknots are extremely hard to find and are of context-sensitive language. Pseudoknots are essentially an rna loop that have formed a stem-structure outside of itself. Searches for these RNA's as they exist in nature don't get extraordinarily complex until the string becomes folded back on itself in certain ways.
AAAAGGGGUUUUCCCC (((([[[[))))]]]] *Descriptive nomenclature* The RNA PseudoknotA string such as RNA can fold back on itself to form quite complex (non-knoted) structures that are exceedingly difficult to describe with programming languages.Show here is an RNA pseudoknot having the sequence: -AAAAGGGGUUUUCCCC-(note crossing dependencies between the two stems)
We parse RNA strings for structure rather than searching for specific nucleotide strings by treating RNA as a language where the predicates are based on properties of individual nucleotides: {A} concatenates to {U} {U} concatenates to {A} {G} concatenates to {U} {U} concatenates to {G} These predicates accept a stem of n-length and loops of n-lenght are added as needed. Then predicates are arranged in the same order as the structure we wish to match (locate.) Predicate A (Left part of stem) Loop Predicate (simple regular expression) Predicate B (Left part of stem) Loop Predicate (simple regular expression) Predicate A' (Right part of stem) Loop Predicate (simple regular expression) Predicate B' (Right part of stem) Or however we wish to arrange them to match some given structure ... www.RNAParse.com
The RNAparse database (screen shot below) is free to use and content is added as its discovered. Users are free to log on as GUEST. http://www.rnaparse.com/database/Main_Database_list.php http://www.rnaparse.com/database/Sequence_to_Structure_Report_report.php http://www.rnaparse.com/mirror/Main_DB_Compliment_Repeats_list.php www.RNAParse.comThe method is elegantly simple and is computationally linear.Results are parsed and fed into a database for further study. An important caveat in computational biology being that computer-located structure does not always mean a given structure exists in nature.
Many thanks to those who have contributed to make this research possible.Primary contact James F. Lynn jlynn@acsalaska.net