
Mathematical Challenges in Protein Motif Recognition

Explore approaches and techniques for recognizing protein motifs, including alignments, multiple alignments, HMMs, threading, and statistical methods.

Presentation Transcript


  1. Mathematical Challenges in Protein Motif Recognition • Bonnie Berger, MIT

  2. Approaches to Structural Motif Recognition • Alignments • Multiple alignments & HMMs • Threading • Profile methods (1D, 3D) • Statistical methods*

  3. Structural Motif Recognition 1) Collect a database of positive examples of a motif (e.g., coiled coil, beta helix). 2) Devise a method to determine if an unknown sequence folds as the motif or not. 3) Verification in lab.

  4. Our Coiled-Coil Programs • PairCoil [Berger, Wilson, Wolf, Tonchev, Milla, Kim, 1995]: predicts 2-stranded CCs (http://theory.lcs.mit.edu/paircoil) • MultiCoil [Wolf, Kim, Berger, 1997]: predicts 3-stranded CCs (http://theory.lcs.mit.edu/multicoil) • LearnCoil-Histidine Kinase [Singh, Berger, Kim, Berger, Cochran, 1998]: predicts CCs in histidine kinase linker domains (http://theory.lcs.mit.edu/learncoil) • LearnCoil-VMF [Singh, Berger, Kim, 1999]: predicts CCs in viral membrane fusion proteins (http://theory.lcs.mit.edu/learncoil-vmf)

  5. Long Distance Correlations • In beta structures, amino acids that are close in the folded 3D structure may be far apart in the linear sequence.

  6. Biological Importance of Beta Helices • Surface proteins in human infectious disease: virulence factors (plants, too), adhesins, toxins, allergens • Amyloid fibrils (e.g., Alzheimer's, Creutzfeldt-Jakob (Mad Cow) disease) • Potential new materials

  7. What is Known • Solved beta-helix structures: 12 structures in PDB in 7 different SCOP families • Related work: 1D profile of pectate lyase (Heffron et al. '98); HMM (e.g., HMMER); Threading (e.g., 3D-PSSM)

  8. Key Databases • Solved structures: Protein Data Bank (PDB) (100s of non-redundant structures) [www.rcsb.org/pdb/] • Sequence databases: GenBank (100s of thousands of protein sequences) [www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html]; SWISS-PROT (10s of thousands of protein sequences) [www.ebi.ac.uk/swissprot]

  9. BetaWrap Program [Bradley, Cowen, Menke, King, Berger: RECOMB 2001] • Performance: • On PDB: no false positives & no false negatives. • Recognizes beta helices in PDB across SCOP families in cross-validation. • Recognizes many new potential beta helices. • Runs in linear time (~5 min. on SWISS-PROT).
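
Cross-validation across families can be illustrated with a leave-one-family-out split: all known examples from one family are held out, and the method is evaluated on them after being built from the remaining families. The sketch below is only an illustration of that splitting idea; the protein identifiers and family labels are hypothetical placeholders, and the slide does not specify how the actual evaluation was organized.

```python
from collections import defaultdict

def leave_one_family_out(examples):
    """Yield (held_out_family, train_ids, test_ids) for each family.

    examples: list of (protein_id, family) pairs (hypothetical labels).
    """
    by_family = defaultdict(list)
    for protein_id, family in examples:
        by_family[family].append(protein_id)
    for family in sorted(by_family):
        test_ids = by_family[family]
        # Training set: every protein from every other family.
        train_ids = [
            pid
            for other, ids in by_family.items()
            if other != family
            for pid in ids
        ]
        yield family, train_ids, test_ids

# Toy usage with made-up identifiers and families.
examples = [("p1", "A"), ("p2", "A"), ("p3", "B"), ("p4", "C")]
splits = list(leave_one_family_out(examples))
```

Holding out a whole family at a time, rather than single proteins, checks whether recognition carries over to families the method has not seen.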

  10. BetaWrap Program • Histogram of protein scores for: beta helices not in database (12 proteins); non-beta helices in PDB (1346 proteins)

  11. Single Rung of a Beta Helix

  12. 3D Pairwise Correlations [figure labels: B1, B2, T2, B3] • Stacking residues in adjacent beta-strands exhibit strong correlations. • Residues in the T2 turn have special correlations (asparagine ladder, aliphatic stacking).

  14. Question: How can we find these correlations when the correlated residues are a variable distance apart in sequence? [Example: tailspike, 63-residue turn]

  15. Finding Candidate Wraps • Assume we have the correct location of a single T2 turn (fixed B2 & B3). • Generate the 5 best-scoring candidates for the next rung. [figure labels: Candidate Rung, B3, T2, B2]
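
The candidate-generation step can be sketched as a top-k selection over possible start positions for the next rung. This is a minimal sketch, not the BetaWrap implementation: the strand length, the gap window, the hydrophobic residue class, and the `toy_score` function are all assumptions made for illustration, and a real scorer would use the rung-to-rung score described on the next slide.

```python
import heapq

HYDROPHOBIC = set("AVILMFWC")  # illustrative residue class (assumption)
STRAND_LEN = 5                 # hypothetical strand length

def toy_score(seq, cur, nxt):
    """Toy stand-in score: aligned positions where both strands are hydrophobic."""
    return sum(
        seq[cur + i] in HYDROPHOBIC and seq[nxt + i] in HYDROPHOBIC
        for i in range(STRAND_LEN)
    )

def top_candidates(seq, cur, score_fn=toy_score, min_gap=10, max_gap=20, k=5):
    """Return the k best-scoring start positions for the next rung's strand.

    cur: start of the current strand; the next strand may start anywhere
    from cur + min_gap to cur + max_gap (hypothetical window).
    """
    last_start = len(seq) - STRAND_LEN
    starts = range(cur + min_gap, min(cur + max_gap, last_start) + 1)
    return heapq.nlargest(k, starts, key=lambda nxt: score_fn(seq, cur, nxt))

# Toy usage: two hydrophobic strands separated by a glycine spacer.
seq = "VIVIV" + "G" * 10 + "VIVIV" + "G" * 10
candidates = top_candidates(seq, cur=0)
```

Searching a window of start positions, instead of a fixed offset, is what lets the method tolerate turns of variable length between rungs.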

  16. Scoring Candidate Wraps (rung-to-rung) Similar to probabilistic framework plus: • Pairwise probabilities taken from amphipathic beta (not beta helix) structures in PDB. • Additional stacking bonuses on internal pairs. • Incorporates distribution on turn lengths.
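
One way to read the three ingredients above is as an additive log-score: a log pairwise probability for each stacked residue pair, a bonus for pairs at inward-facing positions, and a log probability for the turn length. The sketch below follows that reading under stated assumptions. Every number in the tables is made up for illustration, the choice of inward-facing positions is hypothetical, and the slide does not give the exact functional form used.

```python
import math

# Hypothetical toy tables; real values would be estimated from PDB structures.
PAIR_PROB = {("V", "V"): 0.08, ("V", "I"): 0.06, ("I", "I"): 0.07}
DEFAULT_PAIR_PROB = 0.01
STACK_BONUS = {("V", "V"): 1.0, ("V", "I"): 1.0, ("I", "I"): 1.0}
TURN_LEN_PROB = {2: 0.5, 3: 0.2, 4: 0.1}
DEFAULT_TURN_PROB = 0.01

def lookup(table, a, b, default):
    """Symmetric pair lookup: (a, b) and (b, a) share one entry."""
    return table.get((a, b), table.get((b, a), default))

def rung_to_rung_score(strand_a, strand_b, turn_len, inward=(0, 2, 4)):
    """Additive log-score for stacking strand_b on strand_a.

    inward: indices assumed to face the interior (alternating pattern,
    an assumption about amphipathic strands).
    """
    score = math.log(TURN_LEN_PROB.get(turn_len, DEFAULT_TURN_PROB))
    for i, (a, b) in enumerate(zip(strand_a, strand_b)):
        score += math.log(lookup(PAIR_PROB, a, b, DEFAULT_PAIR_PROB))
        if i in inward:
            score += lookup(STACK_BONUS, a, b, 0.0)
    return score

# Toy usage: a hydrophobic stack versus a glycine stack, same turn length.
good = rung_to_rung_score("VIVIV", "VIVIV", turn_len=2)
poor = rung_to_rung_score("GGGGG", "GGGGG", turn_len=2)
```

Working in log space turns the product of probabilities into a sum, so the stacking bonus and the turn-length term can simply be added to it.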

  17. Scoring Candidate Wraps (5 rungs) • Iterate out to 5 rungs, generating candidate wraps. • Score each wrap: - sum the rung-to-rung scores - B1 correlations filter - screen for alpha-helical content
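
The iteration can be sketched as repeated extension of partial wraps, where each partial wrap is extended by its k best next rungs and the wrap score is the sum of the rung-to-rung scores. This is a sketch under assumptions: `next_candidates` and `rung_score` are hypothetical callables supplied by the caller, the toy versions below simply prefer a spacing of 22 residues, and the B1 correlations filter and alpha-helix screen from the slide are not modeled.

```python
import heapq

def extend_wraps(start, next_candidates, rung_score, n_rungs=5, k=5):
    """Enumerate wraps of n_rungs rungs, keeping the k best next rungs per step.

    start: position of the first rung.
    next_candidates(pos): iterable of positions for the following rung.
    rung_score(prev, nxt): rung-to-rung score, higher is better.
    Returns (total_score, positions) tuples sorted best first.
    """
    wraps = [(0.0, [start])]
    for _ in range(n_rungs - 1):
        extended = []
        for total, positions in wraps:
            prev = positions[-1]
            best = heapq.nlargest(
                k, next_candidates(prev), key=lambda nxt: rung_score(prev, nxt)
            )
            for nxt in best:
                extended.append((total + rung_score(prev, nxt), positions + [nxt]))
        wraps = extended
    return sorted(wraps, key=lambda w: w[0], reverse=True)

# Toy usage: hypothetical candidates 18 to 26 residues ahead, ideal spacing 22.
def toy_candidates(pos):
    return range(pos + 18, pos + 27)

def toy_rung_score(prev, nxt):
    return -abs((nxt - prev) - 22)

wraps = extend_wraps(0, toy_candidates, toy_rung_score, n_rungs=5, k=2)
best_score, best_positions = wraps[0]
```

With k candidates kept per step, the number of wraps grows as k to the power of the number of extensions, which stays small for 5 rungs and k = 5.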

  18. Potential Beta Helices • Toxins: • Vacuolating cytotoxin from the human gastric pathogen H. pylori • Toxin B from the enterohemorrhagic E. coli strain O157:H7 • Allergens: • Antigen AMB A II, major allergen from A. artemisiifolia (ragweed) • Major pollen allergen CRY J II, from C. japonica (Japanese cedar) • Adhesins: • AIDA-I, involved in diffuse adherence of diarrheagenic E. coli • Other cell surface proteins: • Outer membrane protein B from Rickettsia japonica • Putative outer membrane protein F from Chlamydia trachomatis • Toxin-like outer membrane protein from Helicobacter pylori

  19. The Problem • Given an amino acid residue subsequence, does it fold as a coiled coil? A beta helix? • Very difficult: • peptide synthesis (1-2 months) • X-ray crystallization, NMR (>1 year) • molecular dynamics • Our goal: predict folded structure based on a template of positive examples.

  20. Collaborators • Math / CS: Mona Singh, Ethan Wolf, Phil Bradley, Lenore Cowen, Matt Menke, David Wilson, Theo Tonchev • Biologists: Peter S. Kim, Jonathan King, Andrea Cochran, James Berger, Mari Milla
