David Goldberg CS 1950 Directed Study
RNA Sequence
[Diagram: RNA sequence with regions labeled Exon, Down Intron, Up Intron]
Sequences shown:
GATTACACATGCCGTAG
CCCACTCCATGATTACAC
CATGCCGTAGCTCATGCC
GCCACGTCTTTTGCTCTTTGCAGGATTACATCACTGGAAACTTTAGCCACGTAAACTTTA
Pattern 1: ACATCAC
Pattern 2: ACGT
Desired Upgrades
Current Program:
• Command line arguments only (2 patterns)
• Cannot use Y, R, or N
• Only checks Human RNA for patterns
• Has static search length
• Result file displays Human RNA id, Mouse RNA id, and last 75 characters
• Only searches down intron
New Program:
• Better user interface
• Ability to use Y, R, and N
• Control lengths between patterns and length of search
• Easier to decipher result file
• Checks Human and Mouse RNA
• Can search up or down introns and exons
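One way to get the "Ability to use Y, R, and N" upgrade is to expand each ambiguity code into a regular-expression character class before searching. The sketch below is in Python for illustration only (the project itself was written in Perl). It assumes the standard IUPAC meanings: R is a purine (A or G), Y is a pyrimidine (C or T/U), and N is any base. The names `IUPAC_CLASSES` and `pattern_to_regex` are hypothetical, not taken from the original program.

```python
import re

# Assumed IUPAC meanings: R = purine, Y = pyrimidine, N = any base.
# T and U are both accepted so one pattern works on DNA- or RNA-style text.
IUPAC_CLASSES = {"R": "[AG]", "Y": "[CTU]", "N": "[ACGTU]"}


def pattern_to_regex(pattern):
    """Expand Y, R, and N into character classes; copy other letters through."""
    return "".join(
        IUPAC_CLASSES.get(base, re.escape(base)) for base in pattern.upper()
    )


if __name__ == "__main__":
    regex = pattern_to_regex("ACRY")
    print(regex)  # AC[AG][CTU]
    # "ACAC" inside GATTACACATG satisfies ACRY (A is a purine, C a pyrimidine).
    print(bool(re.search(regex, "GATTACACATG")))  # True
```

Expanding the pattern once, up front, keeps the rest of the search code unchanged: it still receives an ordinary regular expression.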
Possible Problems
• Programming in Perl
• Extensive use of regular expressions
• Trouble figuring out exactly what needs to be done
• Don’t know whether what we want can be done
Old Program:
• Command line arguments only (2 patterns)
• Cannot use Y, R, or N
• Only checks Human RNA for patterns
• Has static search length
• Result file displays Human RNA id, Mouse RNA id, and last 75 characters
• Only searches down intron
New Program:
• Prompts user for inputs:
  • Path of database, with default
  • 2 patterns
  • Minimum and maximum distance between patterns
  • Searches from either 3’ splice site (beginning) or 5’ splice site (end)
  • Length from beginning or end to search
  • Which part to search (down intron, exon, up intron)
• Will find matches in either the Human RNA, Mouse RNA, or both
• Result file displays Human RNA id, Mouse RNA id, sequence searched, 1st pattern found, sequence in between 1st pattern and 2nd pattern, and 2nd pattern
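The new program's core search (two patterns, a minimum and maximum distance between them, and a search window measured from the beginning or the end) can be sketched with one regular expression. This is a Python illustration under stated assumptions, not the original Perl code. `find_pair` and its parameters are hypothetical names, patterns are treated as literal text, `length` is assumed to be positive, and the leftmost first pattern with the shortest allowed gap is reported.

```python
import re


def find_pair(seq, p1, p2, min_gap, max_gap, length, from_end=False):
    """Return (pattern1, between, pattern2) for the first hit, or None.

    Only the first `length` characters are searched, or the last `length`
    characters when from_end is True. `length` is assumed to be positive.
    """
    window = seq[-length:] if from_end else seq[:length]
    # The lazy quantifier keeps the gap as short as the limits allow.
    regex = re.compile(
        "(%s)(.{%d,%d}?)(%s)" % (re.escape(p1), min_gap, max_gap, re.escape(p2))
    )
    match = regex.search(window)
    return match.groups() if match else None


if __name__ == "__main__":
    # A 20-character searched sequence from the new program's sample results.
    print(find_pair("CCACGTCTTCTTCTTTTCAG", "ACG", "TT", 0, 10, 20, from_end=True))
    # ('ACG', 'TC', 'TT')
```

The three returned pieces correspond to the last three columns of the new result file: 1st pattern, sequence in between, 2nd pattern.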
Old Program Results File
Pattern1=ACG Pattern2=TT
humanID mouseID
ENSG00000124721_61 ENSMUSG00000033826_64 CENA GTAAGTTTTTATTTTTATTTATATCTACGTAGAAAGAGTTCCTTATTTAAAGGTGCTTAGTTTGCCTTCTCTGAT
ENSG00000113569_8 ENSMUSG00000022142_8 CENA GTAAGTAGAAAACAATAAATTTGGCAAGTACAACTAATTTCTAACACATTGTTCCCTCAACGTTTTCTTCAGAAA
ENSG00000105323_14 ENSMUSG00000040725_13 CENA GTGAGAGAATGAGTGTGTGTTTGTATGTAGTGATCGCACGTGTGCTTTTGAACCTGAGCAAGTTAGGTGGAGGCG
...
New Program Results File
Pattern1=ACG Pattern2=TT Search=up SITE=3'
humanID mouseID
ENSG00000134690_4 SE CTACAACGTTCTTTTTAAAG ACG TT
ENSMUSG00000028873_3 SE Not Found
ENSMUSG00000026954_6 CENA TTTTATTCATACGCTTACAG ACG C TT
ENSG00000115145_5 CENA Not Found
ENSG00000124721_67 CENA CCACGTCTTCTTCTTTTCAG ACG TC TT
ENSMUSG00000033826_70 CENA Not Found
ENSG00000052126_20 CENA ACGTTTTCTAATATTCCCAG ACG TT
ENSMUSG00000030231_11 CENA Not Found
ENSG00000138468_2 SE CACGTCTTTGGTTTTTGTAG ACG TC TT
ENSMUSG00000022591_2 SE TACGTCTTTCATTTTTGTAG ACG TC TT
ENSG00000151376_4 CENA ACGTGTTTTATTTCTTTTAG ACG TG TT
ENSMUSG00000030621_4 CENA Not Found
...
Exon, Intron Program
• Wanted a program that searches the end of the down intron and the beginning of the exon.
• The first pattern would be in the intron.
• The second pattern would be in the exon.
• Exons usually start with a GT pattern. If the exon starts with GT, that GT is ignored during pattern matching; if the GT is not present, the program still tries to match the 2 patterns.
RNA Sequence
[Diagram: RNA sequence with regions labeled Exon, Down Intron, Up Intron]
Sequences shown:
GATTACACATGCCGTAG
CCCACTCCATGATTACAC
CATGCCGTAGCTCATGCC
Fragments searched:
ACTCCATGATTACAC
GATTACACATG
Pattern 1: GATT
Pattern 2: ACAT
Exon, Intron Program
• Prompts user for inputs:
  • Path of database, with default
  • 2 patterns
  • Minimum distance between patterns
• Will find matches in either the Human RNA, Mouse RNA, or both
• Result file displays:
  • Human RNA id and Mouse RNA id
  • Small part of the down intron before the 1st pattern
  • 1st pattern found
  • Sequence between the 1st pattern and the end of the down intron
  • The GT sequence, if it was at the start of the exon
  • The beginning of the exon up to the 2nd pattern
  • 2nd pattern
  • Small part of the exon after the 2nd pattern
  • Length of the sequence between the 1st pattern and the end of the intron
  • Length of the sequence between the start of the exon and the 2nd pattern
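RNA Sequence
[Diagram: RNA sequence with regions labeled Exon, Down Intron, Up Intron]
Sequences shown:
GTATTACACATGCCGTA
CCCACTCCATGATTACAC
CATGCCGTAGCTCATGCC
Fragments searched:
ACTCCATGATTACAC
GTATTACACATGCCGTA
Lengths marked on the diagram: 4 and 5
Pattern 1: GATT
Pattern 2: ACAT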
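The two lengths on this slide can be reproduced with a short Python sketch (illustration only; the original program was in Perl). It assumes the two searched fragments are the intron end and the exon start, that the occurrence of the first pattern closest to the junction is the one reported, and that a leading GT on the exon is skipped before matching. `gap_lengths` is a hypothetical name.

```python
def gap_lengths(intron_end, exon_start, p1, p2):
    """Return (intron_gap, exon_gap), or None if either pattern is missing.

    intron_gap: bases between the 1st pattern and the end of the intron.
    exon_gap:   bases between the start of the exon (after an optional
                leading GT) and the 2nd pattern.
    """
    i = intron_end.rfind(p1)  # assumption: occurrence nearest the junction
    if i == -1:
        return None
    # Ignore a leading GT, but still search when it is absent.
    body = exon_start[2:] if exon_start.startswith("GT") else exon_start
    j = body.find(p2)
    if j == -1:
        return None
    return len(intron_end) - (i + len(p1)), j


if __name__ == "__main__":
    print(gap_lengths("ACTCCATGATTACAC", "GTATTACACATGCCGTA", "GATT", "ACAT"))
    # (4, 5)
```

With the earlier fragment that has no leading GT (GATTACACATG), the same call still matches and reports an exon gap of 6.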
Exon, Intron Program Results File
Pattern1=ACTG Pattern2=TTAC Max Space:15
humanID mouseID
ENSG00000163872_13 ENSMUSG00000041215_12 CENA TGTAACATCT ACTG TCAAG GT AACATTC TTAC TGCGTT 5 7
ENSG00000135390_3 ENSMUSG00000010371_3 CENA GGAGAT ACTG ACAGATGAG GT ACC TTAC AGTGGAGTTG 9 3
ENSG00000103876_8 ENSMUSG00000030630_8 CENA CTTATGAACG ACTG GAGTG GT AA TTAC TGGAGCTCTGC 5 2
ENSG00000156253_3 ENSMUSG00000041079_3 SE TGCCTGAAATT ACTG TCAG GT ACG TTAC AGAAGCTCTG 4 3
ENSG00000151490_18 ENSMUSG00000030223_18 CENA AGAAGAGGAA ACTG ACAAA GT AAGTTTTTC TTAC TATG 5 9
...
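In every sample row, the intron-side fields add up to 19 characters and so do the exon-side fields (GT included), which is Max Space (15) plus the pattern length (4). Assuming that is how the search window was chosen, a row can be rebuilt as in the Python sketch below. The window rule, the choice of the occurrence nearest the junction, and the name `build_row` are all assumptions for illustration, not the original Perl code.

```python
def build_row(human_id, mouse_id, kind, intron, exon, p1, p2, max_space):
    """Build one space-separated result row, or return None if no match."""
    # Assumed window: max_space plus the pattern length on each side.
    intron_win = intron[-(max_space + len(p1)):]
    exon_win = exon[:max_space + len(p2)]
    i = intron_win.rfind(p1)  # assumption: occurrence nearest the junction
    gt = "GT" if exon_win.startswith("GT") else ""
    body = exon_win[len(gt):]  # exon text after the optional GT
    j = body.find(p2)
    if i == -1 or j == -1:
        return None
    fields = [
        human_id, mouse_id, kind,
        intron_win[:i], p1, intron_win[i + len(p1):],
        gt, body[:j], p2, body[j + len(p2):],
        str(len(intron_win) - i - len(p1)), str(j),
    ]
    # Empty fields (for example a missing GT) are left out of the row.
    return " ".join(field for field in fields if field)


if __name__ == "__main__":
    print(build_row(
        "ENSG00000163872_13", "ENSMUSG00000041215_12", "CENA",
        "TGTAACATCTACTGTCAAG", "GTAACATTCTTACTGCGTT",
        "ACTG", "TTAC", 15,
    ))
```

The intron and exon strings in the usage example are reassembled from the first sample row, so the printed line should match that row field for field.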