BBSI Research Simulation: News
• Project proposals
  - Monday, June 16
  - Format (see News, Presentations and other dates)
• Renaissance fair and other events
• Party at Greg’s house
BBSI Research Simulation: PSSMs and Search for Repeats in DNA
Application of PSSMs
• Regulatory proteins and their binding sites
• Palindromic DNA and its significance
• How to find protein binding sites: Meme
• PSSMs to find beginning of genes
• Repeated sequences and location of protein binding sites: Li et al (2002)
Regulatory Proteins and Their Binding Sites
[Figure: lacZ regulatory region. Labels: lacZ, Operator, Crp, GTA ..(8).. TAC, RNA Polymerase]
5’-GTGAGTTAGCTCACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN
3’-CACTCAATCGAGTGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN
• Presence of CRP sites: regulation by carbon source
• Presence of X sites: regulation by Y
Regulatory Proteins and Their Binding Sites
Palindromic sequences
[Figure: protein bound to the DNA, labeled "recognizes GTGAGTT"; flanking bases shown as N]
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
Regulatory Proteins and Their Binding Sites
Palindromic sequences
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
Palindromes: serve as binding sites for dimeric proteins
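A DNA palindrome reads the same 5’ to 3’ on both strands, which is why each subunit of a dimeric protein can see the same half-site. The following is a minimal sketch of that check, not code from the lecture; the function names are hypothetical, and the sequences are the half-site GTGAGTT and the 14-bp stretch GTGAGTTAGCTCAC from the slides.

```python
# Watson-Crick pairing used to build the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_dna_palindrome(seq: str) -> bool:
    """True when the sequence reads the same 5' to 3' on both strands."""
    return seq == reverse_complement(seq)


def palindrome_mismatches(seq: str) -> int:
    """Number of positions at which seq differs from its reverse complement."""
    return sum(a != b for a, b in zip(seq, reverse_complement(seq)))


half_site = "GTGAGTT"          # half-site named on the slide
site = "GTGAGTTAGCTCAC"        # 14-bp stretch from the slide's sequence

# A perfect palindrome built from the half-site and its reverse complement.
perfect = half_site + reverse_complement(half_site)

print(reverse_complement(half_site))
print(is_dna_palindrome(perfect))
print(is_dna_palindrome(site), palindrome_mismatches(site))
```

Counting mismatches rather than returning only a yes/no answer is a design choice for this sketch: natural binding sites are often imperfect palindromes, so a tolerance is usually more useful than an exact test.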
Regulatory Proteins and Their Binding Sites
Palindromic sequences
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
[Figure: tRNA; hairpin drawn from the fragments TAT, GGCATGCTAGCTTAAT, TCATTAATTA, AGTAACGTACGATCGG, TAT]
DNA: cruciform
RNA: stem/loop
Regulatory Proteins and Their Binding Sites
Palindromic sequences
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
[Figure: hairpin drawn from the fragments TAT, GGCATGCTAGCTTAAT, TCATTAATTA, AGTAACGTACGATCGG, TAT]
Function of palindrome
• RNA secondary structure?
• Binding site for dimeric protein?
How to tell?
Regulatory Protein and their Binding Sites
Palindromic sequences (animation frames: mutations introduced step by step)
Fragment in the hairpin figure: GGCATGCTAGCTTAAT → GGCATACTAGCTTAAT → GGCATATTAGCTTAAT
Double-stranded sequence, top strand: TTAATGTGAGTTAGCTCACTCATT → TTAATGTAAGTTAGCTCACTCATT
Double-stranded sequence, bottom strand: AATTACACTCAATCGAGTGAGTAA → AATTACATTCAATCGAGTGAGTAA
Function of palindrome
• RNA secondary structure?
• Binding site for dimeric protein?
How to tell?
Regulatory Proteins and Their Binding Sites
Palindromic sequences
Original:
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
Mutated:
5’-TTAATGTAAGTTAGCTCACTCATT-3’
3’-AATTACATTCAATCGAGTGAGTAA-5’
[Figure: hairpin drawn from the fragments TAT, GGCATGCTAGCTTAAT, TCATTAATTA, AGTAACGTACGATCGG, TAT]
How to tell?
• Compensatory mutations: RNA
• Uncorrelated mutations: protein
Regulatory Proteins and Their Binding Sites
How to find them?
• Count all in certain class (Li et al, 2000)
• Guess a pattern and improve (Meme, Gibbs sampler)

Human sequences 5’ to transcriptional start
snRNA U1 (pU1-6)   AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t        GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14             CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1       CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
nucleolin          GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
snRNP E            TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
rp S14             GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
rp S17             TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
ribosomal p. S19   ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
α-tubulin bα1      GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
β-tubulin β2       GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
α-actin skel-m.    CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
α-cardiac actin    TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
β-actin            CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA
Regulatory Proteins and Their Binding Sites
How Meme finds them

Human sequences 5’ to transcriptional start
snRNA U1 (pU1-6)   AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t        GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14             CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1       CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT

Highlighted 11-mers, one per sequence (in the order above):
ACAGGGCAGAA  CCCGGGTGTTT  CCGGGGACGCG  CCCCCGGGCCT  CCGCAGAGCTG

Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
Step 4. Calculate relative probability of matches from frequency table
Regulatory Proteins and Their Binding Sites
How Meme finds them
How do pattern finders work?

snRNA U1 (pU1-6)   AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t        GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14             CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1                GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1       CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT

Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
Step 4. Calculate relative probability of matches from frequency table
Step 5. Move around to find local maximum
Step 6. If probability score high, remember pattern and score
Step 7. Repeat Steps 1 - 5
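The seven steps can be sketched as a small greedy search. This is an illustration of the slide's outline only, not Meme's implementation (Meme itself fits motifs by expectation maximization). Several choices here are assumptions: every window of the first sequence is tried as the seed instead of an arbitrary pick, one site per sequence is required, a pseudocount of 1 is added to each count, and the background is uniform (0.25 per base). All function names are hypothetical.

```python
import math

BASES = "ACGT"


def build_profile(sites, pseudocount=1.0):
    """Step 3: position-dependent frequency table, one dict per column."""
    profile = []
    for col in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        profile.append({b: counts[b] / total for b in BASES})
    return profile


def log_odds(window, profile, background=0.25):
    """Step 4: log2 of P(window | table) relative to a uniform background."""
    return sum(math.log2(profile[i][b] / background)
               for i, b in enumerate(window))


def best_match(seq, profile):
    """Step 2: highest-scoring window of the table's width in one sequence."""
    width = len(profile)
    windows = [seq[i:i + width] for i in range(len(seq) - width + 1)]
    return max(windows, key=lambda w: log_odds(w, profile))


def refine(seed, seqs, rounds=10):
    """Steps 2-5: match, rebuild the table, and repeat until the sites stop moving."""
    sites = [seed]
    for _ in range(rounds):
        profile = build_profile(sites)
        new_sites = [best_match(s, profile) for s in seqs]
        if new_sites == sites:
            break
        sites = new_sites
    profile = build_profile(sites)
    return sum(log_odds(s, profile) for s in sites), sites


def find_motif(seqs, width):
    """Steps 1, 6, 7: try each seed and remember the best-scoring pattern."""
    first = seqs[0]
    seeds = [first[i:i + width] for i in range(len(first) - width + 1)]
    return max((refine(seed, seqs) for seed in seeds), key=lambda r: r[0])


# The five upstream sequences shown on the slide.
upstream = [
    "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",  # snRNA U1 (pU1-6)
    "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",  # histone H1t
    "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",  # HMG-14
    "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",  # TP1
    "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",  # protamine P1
]
score, sites = find_motif(upstream, width=6)
print(round(score, 2), sites)
```

Like any local search, this can settle on a local maximum, which is why Steps 6 and 7 restart from other seeds and keep the best result.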
Regulatory Proteins and Their Binding Sites
How Meme finds them
• You’ve found a gene related to Purple Tongue Syndrome
• BlastP: Encoded protein related to cAMP-binding proteins
• Are the similarities trivial? Related to cAMP binding?
• Does your protein contain a cAMP-binding site?
• What IS a cAMP-binding site?
Task
• Determine what a cAMP-binding site is
• Determine if your protein has one
Regulatory Proteins and Their Binding Sites
How Meme finds them
Strategy
• Collect sequences of known cAMP-binding proteins
• Run Meme, a pattern-finding program
  - Ask it to find any significant motifs
  - Do it
• Rerun Meme. Demand that every protein has identified motifs
• Run Pfam over known sequence to check
PSSMs in action: Identification of beginning of gene
aceB   ACTATGGAGCATCTGCACATGAAAACC
atpI   ACCTCGAAGGGAGCAGGAGTGAAAAAC
bioB   ACGTTTTGGAGAAGCCCCATGGCTCAC
glnA   ATCCAGGAGAGTTAAAGTATGTCCGCT
glnH   TAGAAAAAAGGAAATGCTATGAAGTCT
lacZ   TTCACACAGGAAACAGCTATGACCATG
rpsJ   AATTGGAGCTCTGGTCTCATGCAGAAC
serC   GCAACGTGGTGAGGGGAAATGGCTCAA
sucA   GATGCTTAAGGGATCACGATGCAGAAC
trpE   CAAAATTAGAGAATAACAATGCAAACA
[Figure labels: unknown; Experimentally proven start sites]
PSSMs in action: Identification of beginning of gene
[Figure: table with rows A, C, G, T]
aceB   ACCACATAACTATGGAGCATCT.GCACATGAAAACC
atpI   ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
bioB   ACGTTTTGGAGAAGC...CCCATGGCTCAC
glnA   ATCCAGGAGAGTTA.AAGTATGTCCGCT
glnH   TAGAAAAAAGGAAATG.....CTATGAAGTCT
lacZ   TTCACACAGGAAACAG....CTATGACCATG
rpsJ   AATTGGAGCTCTGGTCTCATGCAGAAC
serC   GCAACGTGGTGAGGG...GAAATGGCTCAA
sucA   GATGCTTAAGGGATCA....CGATGCAGAAC
trpE   CAAAATTAGAGAATA...ACAATGCAAACA
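A position-specific scoring matrix can be built directly from the ten proven start regions. The sketch below uses the ungapped 27-nt versions of those sequences and makes two assumptions that are not in the slides: a pseudocount of 1 per base and a uniform background of 0.25. The function names (`build_pssm`, `score`, `scan`) are hypothetical.

```python
import math

BASES = "ACGT"

# Ungapped 27-nt regions around experimentally proven start sites.
START_SITES = {
    "aceB": "ACTATGGAGCATCTGCACATGAAAACC",
    "atpI": "ACCTCGAAGGGAGCAGGAGTGAAAAAC",
    "bioB": "ACGTTTTGGAGAAGCCCCATGGCTCAC",
    "glnA": "ATCCAGGAGAGTTAAAGTATGTCCGCT",
    "glnH": "TAGAAAAAAGGAAATGCTATGAAGTCT",
    "lacZ": "TTCACACAGGAAACAGCTATGACCATG",
    "rpsJ": "AATTGGAGCTCTGGTCTCATGCAGAAC",
    "serC": "GCAACGTGGTGAGGGGAAATGGCTCAA",
    "sucA": "GATGCTTAAGGGATCACGATGCAGAAC",
    "trpE": "CAAAATTAGAGAATAACAATGCAAACA",
}


def build_pssm(seqs, pseudocount=1.0, background=0.25):
    """Log2-odds score for each base at each column of the aligned sequences."""
    pssm = []
    for col in range(len(seqs[0])):
        counts = {b: pseudocount for b in BASES}
        for s in seqs:
            counts[s[col]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background)
                     for b in BASES})
    return pssm


def score(window, pssm):
    """Sum of the column scores for a window as wide as the matrix."""
    return sum(pssm[i][b] for i, b in enumerate(window))


def scan(seq, pssm):
    """Best (score, offset) over every window of a longer sequence."""
    width = len(pssm)
    return max((score(seq[i:i + width], pssm), i)
               for i in range(len(seq) - width + 1))


pssm = build_pssm(list(START_SITES.values()))

# Highest-scoring base in each column.
consensus = "".join(max(BASES, key=lambda b: col[b]) for col in pssm)
print(consensus)
for gene, seq in START_SITES.items():
    print(gene, round(score(seq, pssm), 2))
```

To test a candidate start in an unannotated region, one would pass the surrounding sequence to `scan` and compare its best score with the scores of the proven sites. Building the matrix from the gapped alignment instead would need a rule for scoring gap columns, which this sketch leaves out.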
Li et al (2002): Algorithm
Calculation of probability by Poisson equation
How likely is it to find: GTGAGTTAACTCAC
Frequency of GTGAGTT = $f_1$
Frequency of AACTCAC = $f_2$
Frequency of joint occurrence = $f_1 \cdot f_2 = f_{12}$
Dimer occurred $n$ times. How likely is that?
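Li et al (2002): Algorithm
Calculation of probability by Poisson equation
Probability of $n$ occurrences of dimer =
$$ {}_N C_n \cdot \underbrace{f_{12} \cdot f_{12} \cdots f_{12}}_{n \text{ times}} \cdot \underbrace{(1-f_{12}) \cdot (1-f_{12}) \cdots (1-f_{12})}_{N-n \text{ times}}, \qquad {}_N C_n = \frac{N!}{n!\,(N-n)!} $$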
Li et al (2002): Algorithm
Calculation of probability by Poisson equation
Expected number $= m = f_{12} \cdot N$, so $f_{12} = m/N$
Probability of $n$ occurrences of dimer =
$$ \frac{N!}{n!\,(N-n)!} \cdot (f_{12})^{n} \cdot (1-f_{12})^{N-n} = \frac{N!}{n!\,(N-n)!} \cdot \left(\frac{m}{N}\right)^{n} \cdot \left(1-\frac{m}{N}\right)^{N-n} $$
Li et al (2002): Algorithm
Calculation of probability by Poisson equation
Probability of $n$ occurrences of dimer =
$$ \frac{N!}{n!\,(N-n)!} \cdot \left(\frac{m}{N}\right)^{n} \left(1-\frac{m}{N}\right)^{N-n} = \frac{N!}{n!\,(N-n)!} \cdot \frac{m^{n}\,(1-m/N)^{N}}{N^{n}\,(1-m/N)^{n}} \approx \frac{N!}{n!\,(N-n)!} \cdot \frac{m^{n}\,(1-m/N)^{N}}{N^{n} \cdot 1^{n}} \approx \frac{N!}{n!\,(N-n)!} \cdot \frac{m^{n}\,e^{-m}}{N^{n} \cdot 1^{n}} $$
Li et al (2002): Algorithm
Calculation of probability by Poisson equation
Probability of $n$ occurrences of dimer =
$$ \frac{N!}{n!\,(N-n)!} \cdot \frac{m^{n}\,e^{-m}}{N^{n} \cdot 1^{n}} = \frac{N\,(N-1)\,(N-2) \cdots (N-n+1)}{N^{n} \cdot 1^{n}} \cdot \frac{m^{n}\,e^{-m}}{n!} \approx \frac{m^{n}\,e^{-m}}{n!} $$
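The approximation can be checked numerically by comparing the exact binomial probability with the Poisson form for a large N. The numbers below (N = 1,000,000 and f12 = 2e-6, hence m = 2) are made-up illustration values, not data from Li et al, and the function names are hypothetical. The tail function answers the slide's question "Dimer occurred n times. How likely is that?" as the chance of seeing n or more occurrences.

```python
import math


def binomial_pmf(n, N, f12):
    """Exact probability of n occurrences in N trials."""
    return math.comb(N, n) * f12 ** n * (1 - f12) ** (N - n)


def poisson_pmf(n, m):
    """Poisson approximation: m^n * e^(-m) / n!, with m = f12 * N."""
    return m ** n * math.exp(-m) / math.factorial(n)


def poisson_tail(n, m):
    """Probability of n or more occurrences when m are expected."""
    return 1.0 - sum(poisson_pmf(k, m) for k in range(n))


N = 1_000_000        # hypothetical number of positions examined
f12 = 2e-6           # hypothetical joint frequency f1 * f2
m = f12 * N          # expected number of occurrences

for n in (0, 2, 5, 10):
    print(n, binomial_pmf(n, N, f12), poisson_pmf(n, m), poisson_tail(n, m))
```

The two probabilities agree closely because N is much larger than both m and n, which is the condition used in each approximation step of the derivation.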