Explore approaches and techniques for recognizing protein motifs, including alignments, multiple alignments, HMMs, threading, and statistical methods.
Mathematical Challenges in Protein Motif Recognition Bonnie Berger MIT
Approaches to Structural Motif Recognition Alignments Multiple alignments & HMMs Threading Profile methods (1D, 3D) * Statistical methods
Structural Motif Recognition 1) Collect a database of positive examples of a motif (e.g., coiled coil, beta helix). 2) Devise a method to determine if an unknown sequence folds as the motif or not. 3) Verification in lab.
Our Coiled-Coil Programs • PairCoil [Berger, Wilson, Wolf, Tonchev, Milla, Kim, 1995] • predicts 2-stranded CCs • http://theory.lcs.mit.edu/paircoil • MultiCoil [Wolf, Kim, Berger, 1997] • predicts 3-stranded CCs • http://theory.lcs.mit.edu/multicoil • LearnCoil-Histidine Kinase [Singh, Berger, Kim, Berger, Cochran, 1998] • predicts CCs in histidine kinase linker domains • http://theory.lcs.mit.edu/learncoil • LearnCoil-VMF [Singh, Berger, Kim, 1999] • predicts CCs in viral membrane fusion proteins • http://theory.lcs.mit.edu/learncoil-vmf
Long Distance Correlations In beta structures, amino acids close in the folded 3D structure may be far away in the linear sequence
Biological Importance of Beta Helices • Surface proteins in human infectious disease: • virulence factors (plants, too) • adhesins • toxins • allergens • Amyloid fibrils (e.g., Alzheimer’s, Creutzfeldt-Jakob (Mad Cow) disease) • Potential new materials
What is Known • Solved beta-helix structures: • 12 structures in PDB in 7 different SCOP families • Related work: • 1D profile of pectate lyase (Heffron et al. ‘98) • HMM (e.g., HMMER) • Threading (e.g., 3D-PSSM)
Key Databases • Solved structures: Protein Data Bank (PDB) (100’s of non-redundant structures) [www.rcsb.org/pdb/] • Sequence databases: GenBank (100’s of thousands of protein sequences) [www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html]; SWISS-PROT (10’s of thousands of protein sequences) [www.ebi.ac.uk/swissprot]
BetaWrap Program [Bradley, Cowen, Menke, King, Berger: RECOMB 2001] • Performance: • On PDB: no false positives & no false negatives. • Recognizes beta helices in PDB across SCOP families in cross-validation. • Recognizes many new potential beta helices. • Runs in linear time (~5 min. on SWISS-PROT).
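One way to read "across SCOP families in cross-validation" is as a leave-one-family-out protocol: hold out every structure from one family, fit on the rest, and score the held-out proteins. The sketch below only illustrates that protocol under this assumption. It is not the BetaWrap code, and `train_fn`, `score_fn`, the toy model, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def leave_one_family_out(proteins, train_fn, score_fn):
    """Score each protein with a model that never saw its own family.

    proteins: list of (name, family, sequence) tuples.
    train_fn: hypothetical callable, list of sequences -> model.
    score_fn: hypothetical callable, (model, sequence) -> score.
    """
    by_family = defaultdict(list)
    for name, family, seq in proteins:
        by_family[family].append((name, seq))

    results = {}
    for held_out, members in by_family.items():
        # Train only on sequences from the other families.
        training = [seq for _, fam, seq in proteins if fam != held_out]
        model = train_fn(training)
        for name, seq in members:
            results[name] = score_fn(model, seq)
    return results


# Toy stand-ins (assumptions): the "model" is the set of residues seen in training,
# and the score is the fraction of a sequence's residues covered by that set.
def toy_train(seqs):
    return set("".join(seqs))


def toy_score(model, seq):
    return sum(r in model for r in seq) / len(seq)


toy_proteins = [
    ("p1", "family_A", "VNVSI"),
    ("p2", "family_A", "VNITV"),
    ("p3", "family_B", "INVSG"),
]
scores = leave_one_family_out(toy_proteins, toy_train, toy_score)
```

Holding out whole families rather than single proteins keeps close homologs of a test protein out of the training set, which is the point of a cross-family test.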
BetaWrap Program • Histogram of protein scores for: • beta helices not in database (12 proteins) • non-beta helices in PDB (1346 proteins)
3D Pairwise Correlations • Stacking residues in adjacent beta-strands exhibit strong correlations. • Residues in the T2 turn have special correlations (asparagine ladder, aliphatic stacking). [Figure labels: B1, B2, T2, B3]
Question: how can we find these correlations when the correlated residues are a variable distance apart in sequence? [e.g., tailspike has a 63-residue turn]
Finding Candidate Wraps • Assume we have the correct location of a single T2 turn (fixed B2 & B3). • Generate the 5 best-scoring candidates for the next rung. [Figure labels: candidate rung, B3, T2, B2]
Scoring Candidate Wraps (rung-to-rung) Similar to probabilistic framework plus: • Pairwise probabilities taken from amphipathic beta (not beta helix) structures in PDB. • Additional stacking bonuses on internal pairs. • Incorporates distribution on turn lengths.
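The three ingredients of the rung-to-rung score can be combined as a sum of log terms. The sketch below is a minimal illustration under stated assumptions, not the published scoring function. Every table value, the log-odds form, the set of "internal" positions, and the rule that triggers a stacking bonus are hypothetical placeholders.

```python
import math

# Hypothetical toy tables; real values would be estimated from PDB structures.
PAIR_PROB = {("V", "V"): 0.04, ("I", "V"): 0.03, ("V", "I"): 0.03, ("N", "N"): 0.02}
BACKGROUND = {"V": 0.07, "I": 0.06, "N": 0.04, "G": 0.07, "S": 0.07}
TURN_LENGTH_PROB = {2: 0.5, 3: 0.2, 4: 0.1}  # assumed distribution on turn lengths
DEFAULT_PAIR_PROB = 0.002                    # assumed floor for unseen pairs
DEFAULT_BACKGROUND = 0.05                    # assumed floor for unseen residues
DEFAULT_TURN_PROB = 0.01                     # assumed floor for unseen turn lengths
STACKING_BONUS = 1.0                         # assumed bonus size
STACKING_RESIDUES = set("VILN")              # assumed residues that earn the bonus
INTERNAL_POSITIONS = {0, 2, 4}               # assumed inward-facing strand positions


def pair_log_odds(a, b):
    """Log-odds of seeing residues a and b stacked, relative to chance."""
    p_ab = PAIR_PROB.get((a, b), DEFAULT_PAIR_PROB)
    p_a = BACKGROUND.get(a, DEFAULT_BACKGROUND)
    p_b = BACKGROUND.get(b, DEFAULT_BACKGROUND)
    return math.log(p_ab / (p_a * p_b))


def rung_to_rung_score(strand_a, strand_b, turn_length):
    """Score two aligned strands from adjacent rungs plus the turn between them."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must be aligned to the same length")
    score = 0.0
    for i, (a, b) in enumerate(zip(strand_a, strand_b)):
        score += pair_log_odds(a, b)
        # Extra credit when an internal pair stacks two favored residues.
        if i in INTERNAL_POSITIONS and a in STACKING_RESIDUES and b in STACKING_RESIDUES:
            score += STACKING_BONUS
    # Turn-length term: common turn lengths are penalized less than rare ones.
    score += math.log(TURN_LENGTH_PROB.get(turn_length, DEFAULT_TURN_PROB))
    return score
```

With these toy tables, `rung_to_rung_score("VIV", "VIV", 2)` comes out higher than `rung_to_rung_score("GSG", "GSG", 2)`, and a rare turn length lowers the score of an otherwise identical pair of strands.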
Scoring Candidate Wraps (5 rungs) • Iterate out to 5 rungs, generating candidate wraps. • Score each wrap: - sum the rung-to-rung scores - apply the B1 correlations filter - screen for alpha-helical content
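The search described on the last two slides (keep the 5 best next rungs, iterate out to 5 rungs, sum the rung-to-rung scores) can be sketched as a small beam search. This is an illustration under assumptions, not the BetaWrap implementation: the gap range, the strand window, the `keep` pruning step, counting the starting rung among the 5, and the toy scoring function are all hypothetical, and the B1 filter and alpha-helix screen are omitted.

```python
import heapq

STRAND_LEN = 3             # assumed length of the aligned strand window
MIN_GAP, MAX_GAP = 10, 30  # assumed range of sequence offsets between rungs


def toy_rung_score(seq, i, j):
    """Hypothetical stand-in score: count identical aligned residues."""
    return sum(1.0 for k in range(STRAND_LEN) if seq[i + k] == seq[j + k])


def next_rung_candidates(seq, pos, score_fn, k=5):
    """Return the k best-scoring (score, position) candidates for the next rung."""
    cands = []
    for gap in range(MIN_GAP, MAX_GAP + 1):
        nxt = pos + gap
        if nxt + STRAND_LEN > len(seq):
            break  # candidate strand would run off the end of the sequence
        cands.append((score_fn(seq, pos, nxt), nxt))
    return heapq.nlargest(k, cands)


def best_wraps(seq, start, score_fn=toy_rung_score, rungs=5, k=5, keep=10):
    """Extend a wrap from `start` out to `rungs` rungs, summing rung-to-rung scores.

    Pruning to the `keep` best partial wraps per step is an assumption added
    here to bound the search; the slides do not specify it.
    """
    wraps = [(0.0, [start])]
    for _ in range(rungs - 1):
        extended = []
        for total, positions in wraps:
            for s, nxt in next_rung_candidates(seq, positions[-1], score_fn, k):
                extended.append((total + s, positions + [nxt]))
        wraps = heapq.nlargest(keep, extended, key=lambda w: w[0])
    return wraps  # list of (total_score, rung_positions), best first


# Toy usage: a synthetic sequence with a "VIV" strand every 12 residues.
toy_seq = ("VIV" + "G" * 9) * 6
toy_wraps = best_wraps(toy_seq, start=0)
```

Because each step only looks at offsets from the previous rung, the variable turn lengths are handled by the candidate enumeration rather than by a fixed alignment.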
Potential Beta Helices • Toxins: • Vacuolating cytotoxin from the human gastric pathogen H. pylori • Toxin B from the enterohemorrhagic E. coli strain O157:H7 • Allergens: • Antigen AMB A II, major allergen from A. artemisiifolia (ragweed) • Major pollen allergen CRY J II, from C. japonica (Japanese cedar) • Adhesins: • AIDA-I, involved in diffuse adherence of diarrheagenic E. coli • Other cell surface proteins: • Outer membrane protein B from Rickettsia japonica • Putative outer membrane protein F from Chlamydia trachomatis • Toxin-like outer membrane protein from Helicobacter pylori
The Problem • Given an amino acid residue subsequence, does it fold as a coiled coil? A beta helix? • Very difficult: • peptide synthesis (1-2 months) • X-ray crystallography, NMR (>1 year) • molecular dynamics • Our goal: predict folded structure based on a template of positive examples.
Collaborators Math / CS Mona Singh Ethan Wolf Phil Bradley Lenore Cowen Matt Menke David Wilson Theo Tonchev Biologists Peter S. Kim Jonathan King Andrea Cochran James Berger Mari Milla