BBSI Research Simulation News

nora

Presentation Transcript


  1. BBSI Research Simulation: News
     • Project proposals
       - Monday, June 16
       - Format (see News, Presentations and other dates)
     • Renaissance fair and other events
     • Party at Greg’s house

  2. BBSI Research Simulation: PSSMs and Search for Repeats in DNA. Application of PSSMs
     • Regulatory proteins and their binding sites
     • Palindromic DNA and its significance
     • How to find protein binding sites: Meme
     • PSSMs to find beginning of genes
     • Repeated sequences and location of protein binding sites: Li et al (2002)

  3. Regulatory Proteins and their Binding Sites
     Presence of CRP sites → Regulation by carbon source
     Presence of X sites → Regulation by Y
     5’-GTGAGTTAGCTCACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN
     3’-CACTCAATCGAGTGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN
     Diagram labels: lacZ; Operator; Crp; GTA ..(8).. TAC; RNA Polymerase

  4. Regulatory Proteins and their Binding Sites
     5’-GTGAGTTAGCTCACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN
     3’-CACTCAATCGAGTGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN

  5. Regulatory Proteins and their Binding Sites: Palindromic sequences
     recognizes GTGAGTT
     NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNN
     TTAATGTGAGTTAGCTCACTCATT
     AATTACACTCAATCGAGTGAGTAA

  7. Regulatory Proteins and their Binding Sites: Palindromic sequences
     NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNN
     TTAATGTGAGTTAGCTCACTCATT
     AATTACACTCAATCGAGTGAGTAA

  11. Regulatory Proteins and their Binding Sites: Palindromic sequences
      NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNN
      TTAATGTGAGTTAGCTCACTCATT
      AATTACACTCAATCGAGTGAGTAA
      recognizes GTGAGTT

  14. Regulatory Proteins and their Binding Sites: Palindromic sequences
      NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNN
      TTAATGTGAGTTAGCTCACTCATT
      AATTACACTCAATCGAGTGAGTAA
      Palindromes: Serve as binding sites for dimeric proteins
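A DNA palindrome in the sense used on these slides is a sequence that equals its own reverse complement. The short Python sketch below checks this; the function names are illustrative and not from the presentation. It suggests that the 14-nt site shown on the slides, GTGAGTTAGCTCAC, is a near-palindrome, while GTGAGTTAACTCAC (the dimer used in the Li et al slides) is an exact one.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def palindrome_mismatches(seq: str) -> int:
    """Count positions where seq differs from its own reverse complement."""
    return sum(a != b for a, b in zip(seq, reverse_complement(seq)))

site = "GTGAGTTAGCTCAC"     # site shown on the slides
perfect = "GTGAGTTAACTCAC"  # GTGAGTT followed by its reverse complement

print(reverse_complement("GTGAGTT"))   # AACTCAC
print(palindrome_mismatches(perfect))  # 0
print(palindrome_mismatches(site))     # 2
```

A mismatch count of zero means each half-site can be read identically on opposite strands, which is what lets the two subunits of a dimeric protein contact equivalent bases.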

  15. Regulatory Proteins and their Binding Sites: Palindromic sequences
      5’-TTAATGTGAGTTAGCTCACTCATT-3’
      3’-AATTACACTCAATCGAGTGAGTAA-5’
      Hairpin diagram fragments: TAT GGCATGCTAGCTTAAT TCATTAATTA AGTAACGTACGATCGG TAT
      Labels: tRNA; DNA: cruciform; RNA: stem/loop

  16. Regulatory Proteins and their Binding Sites: Palindromic sequences
      5’-TTAATGTGAGTTAGCTCACTCATT-3’
      3’-AATTACACTCAATCGAGTGAGTAA-5’
      Hairpin diagram fragments: TAT GGCATGCTAGCTTAAT TCATTAATTA AGTAACGTACGATCGG TAT
      Function of palindrome: RNA secondary structure? Binding site for dimeric protein? How to tell?
      TTAATGTGAGTTAGCTCACTCATT
      AATTACACTCAATCGAGTGAGTAA

  17. Regulatory Proteins and their Binding Sites: Palindromic sequences
      5’-TTAATGTGAGTTAGCTCACTCATT-3’
      3’-AATTACACTCAATCGAGTGAGTAA-5’
      Hairpin diagram fragments: TAT GGCATACTAGCTTAAT TCATTAATTA AGTAACGTACGATCGG TAT
      Function of palindrome: RNA secondary structure? Binding site for dimeric protein? How to tell?
      TTAATGTGAGTTAGCTCACTCATT
      AATTACACTCAATCGAGTGAGTAA

  18. Regulatory Proteins and their Binding Sites: Palindromic sequences
      5’-TTAATGTGAGTTAGCTCACTCATT-3’
      3’-AATTACACTCAATCGAGTGAGTAA-5’
      Hairpin diagram fragments: TAT GGCATA CTAGCTTAAT TCATTAATTA AGTAACGTACGATCGG TAT
      Function of palindrome: RNA secondary structure? Binding site for dimeric protein? How to tell?
      TTAATGTGAGTTAGCTCACTCATT
      AATTACACTCAATCGAGTGAGTAA

  19. Regulatory Proteins and their Binding Sites: Palindromic sequences
      5’-TTAATGTGAGTTAGCTCACTCATT-3’
      3’-AATTACACTCAATCGAGTGAGTAA-5’
      Hairpin diagram fragments: TAT GGCATA TTAGCTTAAT TCATTAATTA AGTAACGTACGATCGG TAT
      Function of palindrome: RNA secondary structure? Binding site for dimeric protein? How to tell?
      TTAATGTGAGTTAGCTCACTCATT
      AATTACACTCAATCGAGTGAGTAA

  20. Regulatory Proteins and their Binding Sites: Palindromic sequences
      5’-TTAATGTGAGTTAGCTCACTCATT-3’
      3’-AATTACACTCAATCGAGTGAGTAA-5’
      Hairpin diagram fragments: TAT GGCATATTAGCTTAAT TCATTAATTA AGTAACGTACGATCGG TAT
      Function of palindrome: RNA secondary structure? Binding site for dimeric protein? How to tell?
      TTAATGTGAGTTAGCTCACTCATT
      AATTACACTCAATCGAGTGAGTAA

  21. Regulatory Proteins and their Binding Sites: Palindromic sequences
      5’-TTAATGTGAGTTAGCTCACTCATT-3’
      3’-AATTACACTCAATCGAGTGAGTAA-5’
      Hairpin diagram fragments: TAT GGCATGCTAGCTTAAT TCATTAATTA AGTAACGTACGATCGG TAT
      Function of palindrome: RNA secondary structure? Binding site for dimeric protein? How to tell?
      TTAATGTGAGTTAGCTCACTCATT
      AATTACACTCAATCGAGTGAGTAA

  22. Regulatory Proteins and their Binding Sites: Palindromic sequences
      5’-TTAATGTGAGTTAGCTCACTCATT-3’
      3’-AATTACACTCAATCGAGTGAGTAA-5’
      Hairpin diagram fragments: TAT GGCATGCTAGCTTAAT TCATTAATTA AGTAACGTACGATCGG TAT
      Function of palindrome: RNA secondary structure? Binding site for dimeric protein? How to tell?
      TTAATGTAAGTTAGCTCACTCATT
      AATTACACTCAATCGAGTGAGTAA

  23. Regulatory Proteins and their Binding Sites: Palindromic sequences
      5’-TTAATGTGAGTTAGCTCACTCATT-3’
      3’-AATTACACTCAATCGAGTGAGTAA-5’
      Hairpin diagram fragments: TAT GGCATGCTAGCTTAAT TCATTAATTA AGTAACGTACGATCGG TAT
      Function of palindrome: RNA secondary structure? Binding site for dimeric protein? How to tell?
      TTAATGTAAGTTAGCTCACTCATT
      AATTACATTCAATCGAGTGAGTAA

  25. Regulatory Proteins and their Binding Sites: Palindromic sequences
      5’-TTAATGTGAGTTAGCTCACTCATT-3’
      3’-AATTACACTCAATCGAGTGAGTAA-5’
      Hairpin diagram fragments: TAT GGCATGCTAGCTTAAT TCATTAATTA AGTAACGTACGATCGG TAT
      How to tell?
      Compensatory mutations: RNA
      Uncorrelated mutations: protein
      TTAATGTAAGTTAGCTCACTCATT
      AATTACATTCAATCGAGTGAGTAA
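The compensatory-mutation test can be illustrated with a small Python sketch. It rests on assumptions that are not in the slides: the palindrome is treated as a single strand folding back on itself so that position i pairs with position n-1-i, only Watson-Crick pairs count (no G·U wobble), and the DNA alphabet is used for simplicity. The helper names and the chosen mutations are hypothetical.

```python
# Watson-Crick pairs only; G-U wobble pairs are ignored in this sketch.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}

def paired_positions(seq: str) -> int:
    """Count positions i that can pair with position len(seq)-1-i in a hairpin."""
    n = len(seq)
    return sum((seq[i], seq[n - 1 - i]) in PAIRS for i in range(n // 2))

def mutate(seq: str, pos: int, base: str) -> str:
    """Return seq with the base at index pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]

wild_type = "GTGAGTTAACTCAC"           # exact palindrome: 7 possible stem pairs
single = mutate(wild_type, 2, "A")     # one arm mutated: the pair 2/11 is broken
double = mutate(single, 11, "T")       # partner mutated too: pairing restored

for name, seq in [("wild type", wild_type), ("single", single), ("double", double)]:
    print(name, seq, paired_positions(seq))
```

If a palindrome functions through RNA secondary structure, second mutations that restore pairing (as in `double`) would be expected to be tolerated; if it is a protein binding site, changes in the two arms need not be correlated in this way.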

  26. Regulatory Proteins and their Binding Sites: How to find them?
      Human sequences 5’ to transcriptional start:
      snRNA U1 (pU1-6)   AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
      histone H1t        GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
      HMG-14             CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
      TP1                GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
      protamine P1       CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
      nucleolin          GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
      snRNP E            TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
      rp S14             GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
      rp S17             TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
      ribosomal p. S19   ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
      α-tubulin bα1      GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
      β-tubulin β2       GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
      α-actin skel-m.    CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
      α-cardiac actin    TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
      β-actin            CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA
      Count all in certain class (Li et al, 2000)
      Guess a pattern and improve (Meme, Gibbs sampler)

  27. Regulatory Proteins and their Binding Sites: How Meme finds them
      Human sequences 5’ to transcriptional start:
      snRNA U1 (pU1-6)   AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
      histone H1t        GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
      HMG-14             CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
      TP1                GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
      protamine P1       CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
      Highlighted segments, one per sequence: ACAGGGCAGAA CCCGGGTGTTT CCGGGGACGCG CCCCCGGGCCT CCGCAGAGCTG
      Step 1. Arbitrarily choose candidate pattern from a sequence
      Step 2. Find best matches to pattern in all sequences
      Step 3. Construct position-dependent frequency table based on matches
      Step 4. Calculate relative probability of matches from frequency table

  28. Regulatory Proteins and their Binding Sites: How Meme finds them
      How do pattern finders work?
      snRNA U1 (pU1-6)   AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
      histone H1t        GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
      HMG-14             CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
      TP1                GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
      protamine P1       CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
      Step 1. Arbitrarily choose candidate pattern from a sequence
      Step 2. Find best matches to pattern in all sequences
      Step 3. Construct position-dependent frequency table based on matches
      Step 4. Calculate relative probability of matches from frequency table
      Step 5. Move around to find local maximum

  29. Regulatory Proteins and their Binding Sites: How Meme finds them
      How do pattern finders work?
      snRNA U1 (pU1-6)   AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
      histone H1t        GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
      HMG-14             CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
      TP1                GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
      protamine P1       CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
      Step 1. Arbitrarily choose candidate pattern from a sequence
      Step 2. Find best matches to pattern in all sequences
      Step 3. Construct position-dependent frequency table based on matches
      Step 4. Calculate relative probability of matches from frequency table
      Step 5. Move around to find local maximum
      Step 6. If probability score high, remember pattern and score

  30. Regulatory Proteins and their Binding Sites: How Meme finds them
      How do pattern finders work?
      snRNA U1 (pU1-6)   AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
      histone H1t        GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
      HMG-14             CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
      TP1                GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
      protamine P1       CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
      Step 1. Arbitrarily choose candidate pattern from a sequence
      Step 2. Find best matches to pattern in all sequences
      Step 3. Construct position-dependent frequency table based on matches
      Step 4. Calculate relative probability of matches from frequency table
      Step 5. Move around to find local maximum
      Step 6. If probability score high, remember pattern and score
      Step 7. Repeat Steps 1 - 5
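The seven steps above can be sketched in Python as a simple greedy search. This is an illustration of the steps as the slides describe them, not the actual MEME program, which uses expectation maximization rather than the hard one-match-per-sequence assignment used here. The motif width, pseudocount, number of tries, uniform background, and all function names are assumptions made for the sketch.

```python
import math
import random

SEQS = [
    "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",  # snRNA U1 (pU1-6)
    "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",  # histone H1t
    "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",  # HMG-14
    "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",  # TP1
    "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",  # protamine P1
]
BASES = "ACGT"

def freq_table(matches, pseudo=0.5):
    """Step 3: position-dependent base frequencies, with pseudocounts."""
    total = len(matches) + 4 * pseudo
    return [{b: (sum(m[i] == b for m in matches) + pseudo) / total for b in BASES}
            for i in range(len(matches[0]))]

def log_odds(word, table, background=0.25):
    """Step 4: log-probability of word under the table, relative to background."""
    return sum(math.log(table[i][c] / background) for i, c in enumerate(word))

def best_match(seq, score, width):
    """Step 2: the best-scoring window of the given width in one sequence."""
    return max((seq[i:i + width] for i in range(len(seq) - width + 1)), key=score)

def refine(seed, seqs, max_rounds=20):
    """Steps 2-5: start from a seed word, iterate until the matches stop changing."""
    width = len(seed)
    identity = lambda w: sum(a == b for a, b in zip(w, seed))
    matches = [best_match(s, identity, width) for s in seqs]
    for _ in range(max_rounds):
        table = freq_table(matches)
        updated = [best_match(s, lambda w: log_odds(w, table), width) for s in seqs]
        if updated == matches:  # local maximum reached
            break
        matches = updated
    table = freq_table(matches)
    return sum(log_odds(m, table) for m in matches), matches

def search(seqs, width=6, tries=50, seed=0):
    """Steps 1, 6, 7: try random seed words and remember the best-scoring pattern."""
    rng = random.Random(seed)
    best = (-math.inf, [])
    for _ in range(tries):
        s = rng.choice(seqs)
        start = rng.randrange(len(s) - width + 1)
        best = max(best, refine(s[start:start + width], seqs))
    return best

score, matches = search(SEQS)
print(round(score, 2), matches)
```

Because each run only climbs to a local maximum, the result depends on the seed word, which is why the search is repeated from many starting points.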

  31. Regulatory Proteins and their Binding Sites: How Meme finds them
      • You’ve found a gene related to Purple Tongue Syndrome
      • BlastP: Encoded protein related to cAMP-binding proteins
      • Are the similarities trivial? Related to cAMP binding?
      • Does your protein contain a cAMP-binding site?
      • What IS a cAMP-binding site?
      Task
      • Determine what is a cAMP-binding site
      • Determine if your protein has one

  32. Regulatory Proteins and their Binding Sites: How Meme finds them
      Strategy
      • Collect sequences of known cAMP-binding proteins
      • Run Meme, a pattern-finding program. Ask it to find any significant motifs
      Do it
      • Rerun Meme. Demand that every protein has identified motifs
      • Run Pfam over known sequence to check

  33. PSSMs in action: Identification of beginning of gene
      Experimentally proven start sites:
      aceB   ACTATGGAGCATCTGCACATGAAAACC
      atpI   ACCTCGAAGGGAGCAGGAGTGAAAAAC
      bioB   ACGTTTTGGAGAAGCCCCATGGCTCAC
      glnA   ATCCAGGAGAGTTAAAGTATGTCCGCT
      glnH   TAGAAAAAAGGAAATGCTATGAAGTCT
      lacZ   TTCACACAGGAAACAGCTATGACCATG
      rpsJ   AATTGGAGCTCTGGTCTCATGCAGAAC
      serC   GCAACGTGGTGAGGGGAAATGGCTCAA
      sucA   GATGCTTAAGGGATCACGATGCAGAAC
      trpE   CAAAATTAGAGAATAACAATGCAAACA
      unknown

  35. PSSMs in action: Identification of beginning of gene
      ACGT
      aceB   ACCACATAACTATGGAGCATCTGCACATGAAAACC
      atpI   ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
      bioB   ACGTTTTGGAGAAGC...CCCATGGCTCAC
      glnA   ATCCAGGAGAGTTA.AAGTATGTCCGCT
      glnH   TAGAAAAAAGGAAATG.....CTATGAAGTCT
      lacZ   TTCACACAGGAAACAG....CTATGACCATG
      rpsJ   AATTGGAGCTCTGGTCTCATGCAGAAC
      serC   GCAACGTGGTGAGGG...GAAATGGCTCAA
      sucA   GATGCTTAAGGGATCA....CGATGCAGAAC
      trpE   CAAAATTAGAGAATA...ACAATGCAAACA

  36. PSSMs in action: Identification of beginning of gene
      ACGT
      aceB   ACCACATAACTATGGAGCATCT.GCACATGAAAACC
      atpI   ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
      bioB   ACGTTTTGGAGAAGC...CCCATGGCTCAC
      glnA   ATCCAGGAGAGTTA.AAGTATGTCCGCT
      glnH   TAGAAAAAAGGAAATG.....CTATGAAGTCT
      lacZ   TTCACACAGGAAACAG....CTATGACCATG
      rpsJ   AATTGGAGCTCTGGTCTCATGCAGAAC
      serC   GCAACGTGGTGAGGG...GAAATGGCTCAA
      sucA   GATGCTTAAGGGATCA....CGATGCAGAAC
      trpE   CAAAATTAGAGAATA...ACAATGCAAACA
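A position-specific scoring matrix can be built from the experimentally proven start sites roughly as follows. This Python sketch uses the ungapped 27-nt windows from the earlier slide (aligned on the start codon), not the gapped alignment shown above. The pseudocount of 1, the uniform background of 0.25, the log2-odds scoring, and the function names are assumptions for illustration.

```python
import math

# Ungapped 27-nt windows around experimentally proven start sites (from the slides).
START_SITES = {
    "aceB": "ACTATGGAGCATCTGCACATGAAAACC",
    "atpI": "ACCTCGAAGGGAGCAGGAGTGAAAAAC",
    "bioB": "ACGTTTTGGAGAAGCCCCATGGCTCAC",
    "glnA": "ATCCAGGAGAGTTAAAGTATGTCCGCT",
    "glnH": "TAGAAAAAAGGAAATGCTATGAAGTCT",
    "lacZ": "TTCACACAGGAAACAGCTATGACCATG",
    "rpsJ": "AATTGGAGCTCTGGTCTCATGCAGAAC",
    "serC": "GCAACGTGGTGAGGGGAAATGGCTCAA",
    "sucA": "GATGCTTAAGGGATCACGATGCAGAAC",
    "trpE": "CAAAATTAGAGAATAACAATGCAAACA",
}
BASES = "ACGT"

def build_pssm(seqs, pseudo=1.0, background=0.25):
    """Column-wise log2-odds scores from equal-length, pre-aligned sequences."""
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("sequences must all have the same length")
    total = len(seqs) + 4 * pseudo
    pssm = []
    for i in range(width):
        column = [s[i] for s in seqs]
        pssm.append({b: math.log2((column.count(b) + pseudo) / total / background)
                     for b in BASES})
    return pssm

def score(window, pssm):
    """Sum the per-column scores of a window as long as the PSSM."""
    return sum(col[base] for col, base in zip(pssm, window))

def best_start(seq, pssm):
    """Return (score, offset) of the best-scoring window in seq."""
    width = len(pssm)
    return max((score(seq[i:i + width], pssm), i)
               for i in range(len(seq) - width + 1))

pssm = build_pssm(list(START_SITES.values()))
for gene, site in START_SITES.items():
    print(gene, round(score(site, pssm), 2))
```

To look for the start of an uncharacterized gene, one would slide the matrix along the upstream region with `best_start` and inspect the highest-scoring offsets.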

  37. PSSMs in action: Algorithm to find binding sites (Li et al)

  38. Li et al (2002) Algorithm: Calculation of probability by Poisson equation
      How likely is it to find: GTGAGTTAACTCAC
      Frequency of GTGAGTT = $f_1$
      Frequency of AACTCAC = $f_2$
      Frequency of joint occurrence = $f_1 \cdot f_2 = f_{12}$
      Dimer occurred n times. How likely is that?
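The quantities on this slide can be estimated from a sequence as sketched below in Python. The toy genome is hypothetical and far too short to be meaningful. Taking a frequency as the number of occurrences divided by the number of possible start positions, and N as the number of positions where a 14-nt dimer could start, are assumptions of this sketch rather than definitions from Li et al. The expected number m = f12 · N is the quantity the following slides use.

```python
def count_overlapping(genome: str, word: str) -> int:
    """Count occurrences of word in genome, allowing overlapping matches."""
    return sum(genome.startswith(word, i)
               for i in range(len(genome) - len(word) + 1))

def frequency(genome: str, word: str) -> float:
    """Occurrences of word per possible start position (an assumed definition)."""
    return count_overlapping(genome, word) / (len(genome) - len(word) + 1)

# Hypothetical toy genome containing each half-site twice.
genome = "AC" + "GTGAGTT" + "AACTCAC" + "G" + "GTGAGTT" + "CC" + "AACTCAC" + "TT"

f1 = frequency(genome, "GTGAGTT")
f2 = frequency(genome, "AACTCAC")
f12 = f1 * f2                      # joint frequency if the halves were independent
N = len(genome) - 14 + 1           # positions where a 14-nt dimer could start
m = f12 * N                        # expected number of dimers
n = count_overlapping(genome, "GTGAGTTAACTCAC")  # observed number of dimers

print(f1, f2, f12, m, n)
```

Comparing the observed count n with the expected count m is what the probability calculation on the next slides makes quantitative.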

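  39. Li et al (2002) Algorithm: Calculation of probability by Poisson equation
      Probability of n occurrences of dimer =
      $$ {}_{N}C_{n} \cdot \underbrace{f_{12} \cdot f_{12} \cdot f_{12} \cdots}_{n \text{ times}} \cdot \underbrace{(1-f_{12}) \cdot (1-f_{12}) \cdot (1-f_{12}) \cdots}_{N-n \text{ times}}, \qquad {}_{N}C_{n} = \frac{N!}{n! \cdot (N-n)!} $$

  40. Li et al (2002) Algorithm: Calculation of probability by Poisson equation
      Probability of n occurrences of dimer =
      $$ \frac{N!}{n! \cdot (N-n)!} \cdot (f_{12})^{n} \cdot (1-f_{12})^{N-n} = \frac{N!}{n! \cdot (N-n)!} \cdot \left(\frac{m}{N}\right)^{n} \cdot \left(1-\frac{m}{N}\right)^{N-n} $$
      Expected number = $m = f_{12} \cdot N$, so $f_{12} = m / N$

  41. Li et al (2002) Algorithm: Calculation of probability by Poisson equation
      Probability of n occurrences of dimer =
      $$ \begin{aligned}
      & \frac{N!}{n! \cdot (N-n)!} \cdot \left(\frac{m}{N}\right)^{n} \cdot \left(1-\frac{m}{N}\right)^{N-n} \\
      ={} & \frac{N!}{n! \cdot (N-n)!} \cdot \frac{m^{n} \cdot (1-m/N)^{N}}{N^{n} \cdot (1-m/N)^{n}} \\
      \approx{} & \frac{N!}{n! \cdot (N-n)!} \cdot \frac{m^{n} \cdot (1-m/N)^{N}}{N^{n} \cdot (1)^{n}} \\
      \approx{} & \frac{N!}{n! \cdot (N-n)!} \cdot \frac{m^{n} \cdot e^{-m}}{N^{n} \cdot (1)^{n}}
      \end{aligned} $$

  42. Li et al (2002) Algorithm: Calculation of probability by Poisson equation
      Probability of n occurrences of dimer =
      $$ \frac{N!}{n! \cdot (N-n)!} \cdot \frac{m^{n} \cdot e^{-m}}{N^{n} \cdot (1)^{n}} = \frac{N \cdot (N-1) \cdot (N-2) \cdots (N-n+1)}{n!} \cdot \frac{m^{n} \cdot e^{-m}}{N^{n}} \approx \frac{m^{n} \cdot e^{-m}}{n!} $$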
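The approximation derived above can be checked numerically. The Python sketch below compares the exact binomial probability with the Poisson form for a large N; the particular values of N and m are arbitrary choices for illustration, and the tail function (probability of n or more occurrences) is an addition that is not on the slides.

```python
import math

def binomial_prob(n: int, N: int, f: float) -> float:
    """Exact probability of n occurrences in N trials, each with probability f."""
    return math.comb(N, n) * f**n * (1 - f)**(N - n)

def poisson_prob(n: int, m: float) -> float:
    """Poisson approximation: m^n * e^(-m) / n!, with m the expected number."""
    return m**n * math.exp(-m) / math.factorial(n)

def poisson_tail(n: int, m: float) -> float:
    """Probability of observing n or more occurrences under the Poisson model."""
    return 1.0 - sum(poisson_prob(k, m) for k in range(n))

N = 1_000_000      # number of positions (illustrative)
m = 2.0            # expected number of dimers (illustrative)
f12 = m / N

for n in range(6):
    print(n, binomial_prob(n, N, f12), poisson_prob(n, m))

print("P(5 or more) =", poisson_tail(5, m))
```

When the observed number of dimers is far above m, the tail probability becomes very small, which is the sense in which the dimer occurs more often than chance would suggest.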
