RBP1 Splicing Regulation in Drosophila melanogaster
03-711 - Fall 2005
Jacob Joseph, Ahmet Bakan, Amina Abdulla
This presentation available at http://www.jjoseph.org/biology/
RBP1 Regulation
• Involved in dsx splicing and Rbp1 auto-regulation
• Suspected in many other related pathways
Genome Data
• Sequences of all introns of known splice variants
• Two annotated genomes available
  • D. melanogaster (D. Mel.)
  • D. pseudoobscura (D. Pseu.)
• Because gene names differ between D. Mel. and D. Pseu., a list of gene orthologs was also obtained
Computational Approach
• Create a profile HMM for each motif (B-B, B-A)
• Select the end of every intron (~50 bases)
• Perform an HMM search on each intron segment, in both D. Mel. and D. Pseu.
• Keep matches found in both species
• Keep matches at the end of introns (~15 bases)
• Return an alignment of both species
• Examine the biological similarity of matches
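
The intron-tail selection and the "found in both species" filter can be sketched roughly as follows. This is a minimal illustration, not the presenters' code: the function names, the dictionary layout, and the entries marked hypothetical are assumptions, and the HMM search itself is taken as already done.

```python
def intron_tail(intron_seq, length=50):
    """Return the last `length` bases of an intron (its 3' end)."""
    return intron_seq[-length:]


def conserved_matches(mel_hits, pse_hits, orthologs):
    """Keep D. Mel. hits whose ortholog also has a hit in D. Pseu.

    mel_hits / pse_hits map a gene name to a (start, end) motif hit;
    orthologs maps a D. Mel. gene name to its D. Pseu. gene name.
    """
    kept = []
    for mel_gene, mel_hit in mel_hits.items():
        pse_gene = orthologs.get(mel_gene)
        if pse_gene in pse_hits:
            kept.append((mel_gene, mel_hit, pse_gene, pse_hits[pse_gene]))
    return kept


# fru / GA12896 coordinates come from the results slide; the second
# entry is hypothetical and only shows a hit being filtered out.
mel_hits = {"fru": (26, 37), "hypothetical_gene": (30, 41)}
pse_hits = {"GA12896": (24, 35)}
orthologs = {"fru": "GA12896", "hypothetical_gene": "GA_hypothetical"}

print(conserved_matches(mel_hits, pse_hits, orthologs))
```

Keying the filter on the ortholog table is what makes the comparison possible at all, since the two genomes use different gene names.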
Profile Hidden Markov Models (HMM) and HMMer
• We needed a profile HMM builder and search program.
• HMMer uses the Plan 7 architecture, a revised version of the Krogh/Haussler profile HMM model
• Supports more than global alignment
HMMer Advantages
• Possible Alignments
  • Classic global alignment
  • Classic local alignment
  • Global profile, local sequence alignment
  • Fully local “multihit” alignment
• Scoring
  • Raw alignment score
  • E-value, showing the significance of the alignment
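
To give a feel for what a raw score measures, here is a small sketch of an ungapped, position-specific log-odds score. It is a deliberate simplification of a profile HMM (match states only, no insert or delete states, uniform background) and is not how HMMer computes its scores; the function names and the pseudocount are assumptions. Three D. Mel. B-A matches from the results slide serve purely as toy input.

```python
import math


def build_profile(aligned, alphabet="ACGT", pseudocount=1.0):
    """Per-column base probabilities from equal-length aligned sequences."""
    profile = []
    for column in zip(*aligned):
        total = len(column) + pseudocount * len(alphabet)
        profile.append(
            {b: (column.count(b) + pseudocount) / total for b in alphabet}
        )
    return profile


def log_odds(profile, segment, background=0.25):
    """Raw score in bits: sum over columns of log2(p(base) / background)."""
    return sum(
        math.log2(column[base] / background)
        for column, base in zip(profile, segment.upper())
    )


# Toy input: three B-A motif matches taken from the results slide.
toy_alignment = ["GTCGACAATTGTT", "TTCTACTATTTTT", "TTCCACAATTTTT"]
profile = build_profile(toy_alignment)

print(log_odds(profile, "TTCTACTATTTTT"))  # motif-like segment
print(log_odds(profile, "GGGGGGGGGGGGG"))  # unrelated segment
```

A positive score means the segment looks more like the motif than like background sequence; an E-value then expresses how many hits at least that good would be expected by chance in a search of the same size.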
HMMer
• Build a profile HMM from the multiple alignment of each motif (B-B and B-A)
• The genome is scanned for high-scoring matches
• Only hits within 15 base pairs of the 3’ splice site are considered
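
The distance cutoff can be sketched as below. This assumes that hit coordinates are 1-based positions within a 50-base intron tail (as in the "(27 - 39)" ranges on the results slides) and that distance is measured from the end of the hit to the end of the tail; both are assumptions about the setup, and the function names are illustrative.

```python
def distance_to_splice_site(hit_end, tail_length=50):
    """Bases between the end of a hit and the 3' end of the intron tail."""
    return tail_length - hit_end


def near_splice_site(hit_end, max_distance=15, tail_length=50):
    """True if the hit ends within max_distance bases of the 3' splice site."""
    return distance_to_splice_site(hit_end, tail_length) <= max_distance


# (27, 39) is a hit range from the results slide; (5, 17) is hypothetical.
for start, end in [(27, 39), (5, 17)]:
    print(start, end, near_splice_site(end))
```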
Results: B-A Motif

CG30271-RC-in_5 (27 - 39), GA15740-in_5 (27 - 39) score: -6
ctgttgaatcacttggaaagcaatcaGTCGACAATTGTTtacttttacag
|  |||||||||| |||||||||||||||||||||||||||||||||||
cctttgaatcactcggaaagcaatcaGTCGACAATTGTTtacttttacag

CG30020-RA-in_3 (25 - 37), GA15581-in_9 (24 - 36) score: -8
ccgtcccagtgacttacaatacgaTTCTACTATTTTTtgtacgcttacag
      |  |   |       | | ||||| |||| |        |
 taaggctcttcatactttatcaaATCTACAATTTCTcaatgtaattgcag

Klp3A-RA-in_3 (31 - 43), GA21186-in_3 (26 - 38) score: -9
ttgaagttcgaaaactcctgaaactaattgTTCCACAATTTTTttttatt
       |  ||  ||     ||       ||| ||  ||||| | |
     tgttcaattcttaaataaaaccaatTTCGACTCTTTTTctcttctttcag

na-RB-in_0 (33 - 45), GA13546-in_2 (25 - 37) score: -9
tctggtgcactgagagaaatgccatctacttcATCGATACTCTTTtgcag
          | |                   ||  | | ||  || |
        tgtaaacactcgttgcaaacacaaATTTACAATCAATttccatgttttat

CG30428-RA-in_2 (33 - 45), GA15840-in_1 (25 - 37) score: -9
ggtaaggaagcgtaaaaataaattctttttttATCACCAATATTTttcag
        |   ||  ||           |||||   |||| |||||
        aaaatatcaagccgaaacaaatttATGTACAATTTTTtttttatggaaag

CG2199-RB-in_0 (36 - 48), GA15296-in_0 (33 - 45) score: -10
ttgctactgccattataggtagtttaaaaactgttTTCTACACTCTTTct
              | |  | |       |   ||    ||||| | |
   aacaaaaacaaaaatatggccctctgataattGGGGACACTTTATttcag
Results: B-B Motif

ps-RD-in_4 (31 - 42), GA20847-in_4 (31 - 42) score: -11
catttaatatcttgaaaatatttaacataaATCTGATGCAAAtattccag
  |   ||   |  || ||||||||||||||||||||||||||||||||
attactattcttaaaatatatttaacataaATCTGATGCAAAtattccag

fru-RE-in_6 (26 - 37), GA12896-in_5 (24 - 35) score: -13
cccacccccacagtgatgacgcctaATATGAACCAAGcaaatgtttgcag
    |      |   |     |  | | ||| | ||  | |    |  |
  tgctaaataaaccaaattccaaaCTCTGATCAAAAaataccgataaaaag

Ptp52F-RA-in_0 (38 - 49), GA14851-in_14 (34 - 45) score: -13
tactctttgaaaaataagcatatggatgtcactgataATATGATATTAAt
      |  |         |   | ||   |     |||   ||   ||
    tctaaatcgtattcaaatcgaattgaaacataaATCGAATCCAAAaacag

CG9455-RA-in_0 (32 - 43), GA21800-in_0 (27 - 38) score: -13
aatagtggctttgttttaataacaatgtaatATCTGATATTTAttctcag
       |           |   |   |   | ||||| |    | |
     cagagcgtgccccgtctgatgatccgAACTGATCTGATgtttttcggtag

CG8709-RA-in_2 (34 - 45), GA21271-in_9 (34 - 45) score: -13
acaaatcttaggaaataccaaagttgttctacgATCTTATCTATGgagtc
 |         |  |     |  |    | || || | ||||||
gccccatcagtgtcagtggcagctgaccccaccATTTGATCTATTtgcag

CG7966-RA-in_0 (37 - 48), GA20727-in_4 (26 - 37) score: -13
tatatgtacacattgtactgcaaacacatgccctgaATCTTTGATAAAga
            |  |     ||| |     | ||||||  |  ||||
           gtgttgaatgaaagaatacacttgaATCGGTTCTAAAttgcatcgcacag
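
The "|" marker rows in these results mark bases that are identical in both species once the two motif start positions are lined up. A rough sketch of that comparison follows; the function name and the case-insensitive comparison are assumptions, and the original marker rows may have been produced differently.

```python
def match_line(top, bottom, top_start, bottom_start):
    """Mark columns where `top` and `bottom` agree after shifting
    `bottom` so that both motif start positions share a column."""
    offset = top_start - bottom_start
    marks = []
    for i, base in enumerate(top):
        j = i - offset  # index into `bottom` for the same column
        same = 0 <= j < len(bottom) and base.lower() == bottom[j].lower()
        marks.append("|" if same else " ")
    return "".join(marks).rstrip()


# Second B-A result: CG30020-RA-in_3 (25 - 37) vs GA15581-in_9 (24 - 36).
mel = "ccgtcccagtgacttacaatacgaTTCTACTATTTTTtgtacgcttacag"
pse = "taaggctcttcatactttatcaaATCTACAATTTCTcaatgtaattgcag"

print(mel)
print(match_line(mel, pse, 25, 24))
print(" " * (25 - 24) + pse)
```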
Biomolecular activity analysis
• The fru gene, which is regulated by the tra and tra2 genes and expressed at the same time as the dsx gene, appears among the matches, which helps validate our results.
• Presence of the sxl and tra genes was expected.
• Functional Similarity:
  • B-A motif: SNF4Agamma, rdgc, qtc.
  • B-B motif: ps, ptp, CG9455.
Difficulties & Future Directions
• Support Vector Machines were applied
  • Lack of significant training data.
  • Lack of direct experimental data for cross-validation.
• Since the current D. Pseu. genome has far fewer intron sequences, reliance upon orthologs introduces many false negatives.
Alternate Approach: Support Vector Machines (SVM)
• Used for data classification
• Creates a hyperplane that separates data into two classes with maximum margin
• Appropriate for multidimensional classification problems
• Examples
  • Article classification
  • Protein classification
• Critical points
  • Feature selection
  • Training
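
As a rough illustration of the maximum-margin idea, the sketch below fits a linear soft-margin SVM by subgradient descent on the regularized hinge loss. It is a stand-in written for this example, not the SVM software used in the project; the function names, hyperparameters, and two-dimensional toy points are all assumptions.

```python
def train_linear_svm(samples, labels, reg=0.01, rate=0.1, epochs=200):
    """Fit weights and bias by subgradient descent on the
    L2-regularized hinge loss (labels must be +1 or -1)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Regularization step: shrinking the weights favors a wider margin.
            weights = [w * (1.0 - rate * reg) for w in weights]
            if margin < 1.0:
                # Sample is misclassified or inside the margin: move toward it.
                weights = [w + rate * y * xi for w, xi in zip(weights, x)]
                bias += rate * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data.
samples = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]
weights, bias = train_linear_svm(samples, labels)

print(predict(weights, bias, (2.5, 2.5)), predict(weights, bias, (-2, -2.5)))
```

With real data, the quality of the result depends far more on the chosen features and the amount of training data than on the optimizer.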
HMM and SVM
• HMMer is used to generate features
  • The whole genome is searched for the A and B consensus sequences
  • Search results for each intron are combined to create features
• Features
  • Scores of the two motifs upstream of the splice site (2)
  • Distance of the motifs to the splice site (1)
  • Length of consensus sequence overlap (1)
  • Length of motif (1)
  • Whether consensus sequence B precedes A (1)
• Number of features = 6
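
One way the six features might be assembled for a single intron is sketched below. The slide does not define the features precisely, so several choices here are assumptions: distance is measured from the end of the combined motif to the end of a 50-base intron tail, motif length is the span covered by both hits, and the class, field, and function names are illustrative. The hit positions and scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MotifHit:
    score: float
    start: int  # 1-based, inclusive, within the intron tail
    end: int    # 1-based, inclusive


def feature_vector(hit_a, hit_b, tail_length=50):
    """Build the six-element feature list for one intron."""
    span_start = min(hit_a.start, hit_b.start)
    span_end = max(hit_a.end, hit_b.end)
    overlap = max(
        0, min(hit_a.end, hit_b.end) - max(hit_a.start, hit_b.start) + 1
    )
    return [
        hit_a.score,                             # score of motif A
        hit_b.score,                             # score of motif B
        tail_length - span_end,                  # distance to splice site
        overlap,                                 # consensus overlap length
        span_end - span_start + 1,               # combined motif length
        1 if hit_b.start < hit_a.start else 0,   # does B precede A?
    ]


# Hypothetical hits, for illustration only.
hit_b = MotifHit(score=-4.0, start=25, end=31)
hit_a = MotifHit(score=-5.0, start=30, end=37)
print(feature_vector(hit_a, hit_b))
```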
Summary
• Profile HMMs used for motif modeling
• Comparative analysis with the D. Pseu. genome
• High-scoring alignments for both motifs were further analyzed for biomolecular activity
• The presence of fru and other close matches helps validate our results