
Query sequence MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE



Presentation Transcript


  1. Structure-Sequence alignment: “Structure is better preserved than sequence”
     • Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
     • Non-redundant templates of structures [figure: four template structures, each labeled “Me!”]

  2. How can we match a sequence and a structure?
     • Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
     • MVNGLILNGKTK------------------------AEKVFQYANDNGVDGEWTYTE (figure annotation: “trp (W): probably not here!”)
     • Sequence: similar sequences take this structure (but remember: sequence is less preserved than structure…)
     • Pair interaction: how well do amino acids get along? (Do positive charges hate positive charges? Maybe not…?)
     • Solvation: which amino acids are buried?
     • More: secondary structure prediction; secondary structure constraints (β-strands forming β-sheets…); etc.

  3. GenTHREADER
     • “An Efficient and Reliable Protein Fold Recognition Method for Genomic Sequences”, David T. Jones (1999)
     • “What a good presentation!”, B. Raveh (2003)

  4. GenTHREADER overview:
     • Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
     • Templates: for each template (in the Brookhaven PDB):
       • Construct a sequence profile
       • Align the profile with the query sequence
       • Calculate structural parameters (“to be continued…”)
       • Send the parameters to a well-trained neural network (as in PSIPRED…)
     • OUTPUT: match confidence & alignment

  5. STAGE 1: Building a profile for each template
     • Start with the sequence of the template peptide: “MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTC”
     • Run BLASTP on the OWL non-redundant protein sequence data bank, with this sequence as input.
     • Take all sequences with E-value < 0.01.
     • Align them using MULTAL, a multiple sequence alignment method.
     • Construct a sequence profile based on the BLOSUM 50 matrix.
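The filtering and profile steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the hits are assumed to be already aligned and of equal length, the function names are hypothetical, and the profile here is built from plain column frequencies with pseudocounts rather than from BLOSUM 50 as the slide states.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def filter_hits(hits, e_cutoff=0.01):
    """Keep the sequences of (sequence, e_value) hits below the E-value cutoff."""
    return [seq for seq, e_value in hits if e_value < e_cutoff]


def column_frequencies(aligned_seqs, pseudocount=1.0):
    """Per-column amino-acid frequencies of equal-length aligned sequences.

    Gap characters are ignored. This is a simplified stand-in for the
    BLOSUM 50 based profile mentioned on the slide (an assumption).
    """
    length = len(aligned_seqs[0])
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_seqs if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


# Toy data: the E-values and sequences are made up for illustration.
hits = [("ACD", 1e-5), ("ACE", 0.5), ("AGD", 0.001)]
kept = filter_hits(hits)
profile = column_frequencies(kept)
```

Each entry of `profile` is a dictionary that sums to 1 over the 20 amino acids, so a conserved column (here the first one) gives its residue a visibly higher frequency.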

  6. STAGE 2: Align sequence with a profile
     • Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
     • Values recorded from the alignment:
       • SCORE = ?
       • Length of query sequence = ?
       • Length of template profile = ?
       • Length of alignment itself = ?
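A minimal sketch of how the four values on this slide could be produced, assuming a plain global dynamic-programming alignment with a linear gap penalty. The scoring scheme, the gap value, and the function names are assumptions made for illustration; the slide does not specify the alignment algorithm used by GenTHREADER.

```python
def align_to_profile(query, profile, gap=-2.0):
    """Globally align `query` to `profile` (a list of dicts: residue -> score).

    Returns (score, pairs), where pairs lists the aligned
    (query_index, profile_index) positions.
    """
    n, m = len(query), len(profile)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + profile[j - 1].get(query[i - 1], 0.0)
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        match = score[i - 1][j - 1] + profile[j - 1].get(query[i - 1], 0.0)
        if score[i][j] == match:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


def alignment_features(query, profile, gap=-2.0):
    """Collect the four quantities listed on the slide."""
    total, pairs = align_to_profile(query, profile, gap)
    return {
        "score": total,
        "query_length": len(query),
        "template_length": len(profile),
        "alignment_length": len(pairs),
    }


# Toy profile with made-up scores for three template positions.
toy_profile = [{"A": 5, "C": -1}, {"C": 5, "A": -1}, {"D": 5}]
features = alignment_features("AD", toy_profile)
```

In the toy case the query `AD` skips the middle template position, so one gap penalty is paid and two residues end up aligned.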

  7. STAGE 3: calculate (some) structural parameters. In stage 2, the sequence was aligned to a profile of the structure. The aligned sequence is now imposed on the 3D structure of the template and used to calculate energy potentials.

  8. STAGE 3: structural parameters (cont.): E-Pair (pair interaction potential)
     • An energy potential for the probability of the interactions observed in this structure.
     • Distance and sequence separation between certain atoms of two different amino acids are measured (Cβ–Cβ, Cβ–N, Cβ–O, etc.).
     • Statistics of known structures were gathered and weighted.
     • The observed interactions are compared to the statistics.
     • An energy potential is calculated.
     • In essence: the smaller E-Pair, the better.
     [Figure: example interaction between aa 39 and aa 157]
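One common way to turn such statistics into an energy is the inverse Boltzmann relation, sketched below. This is an assumption made for illustration: the slide does not give the formula, the weighting step is omitted, and the counts, bin layout, and names are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)
TEMPERATURE = 298.0  # assumed temperature in kelvin
RT = R_KCAL * TEMPERATURE


def pair_energy(pair_bin_count, pair_total, all_bin_count, all_total):
    """Inverse Boltzmann energy for one residue-pair type in one distance bin.

    pair_bin_count / pair_total : how often this pair type falls in the bin
    all_bin_count / all_total   : how often any pair falls in the bin
    All counts are assumed to be non-zero.
    """
    f_pair = pair_bin_count / pair_total
    f_reference = all_bin_count / all_total
    return -RT * math.log(f_pair / f_reference)


def total_pair_energy(energies):
    """E-Pair for a model: the sum over all observed interactions."""
    return sum(energies)


# A pair seen twice as often as the reference is favourable (negative energy).
favourable = pair_energy(20, 100, 10, 100)
neutral = pair_energy(10, 100, 10, 100)
```

A pair type that occurs in a distance bin more often than pairs in general gets a negative energy, which matches the slide's summary that a smaller E-Pair is better.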

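  9. STAGE 3: structural parameters (cont.): E-Solv (solvation potential)
     • Degree of burial (DOB) for an amino acid: “the number of other Cβ atoms located within 10Å of the residue’s Cβ atom”
     • In general, hydrophobic amino acids like to be buried, safely away from water.
     • Hydrophilic amino acids might like the outside world better.
     • The DOB of each amino acid is calculated.
     • It is compared to its statistical occurrence in known structures.
     • ΔEsolv(AA, r) = −RT ln( f(AA, r) / f(r) ), where r is the degree of burial, f(AA, r) is the frequency with which amino acid AA occurs at burial r, and f(r) is the frequency of burial r over all amino acids.
     [Figure: a Cβ atom with a 10Å sphere around it and the neighbouring Cβ atoms]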
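The degree of burial and the ΔEsolv formula translate directly into code. The sketch below assumes Cβ coordinates are given as (x, y, z) tuples in ångströms and that RT is about 0.592 kcal/mol (298 K); the coordinates and frequencies in the example are made up.

```python
import math

RT = 0.592  # kcal/mol at roughly 298 K (assumed)


def degree_of_burial(cb_coords, index, radius=10.0):
    """Number of other C-beta atoms within `radius` angstroms of residue `index`."""
    center = cb_coords[index]
    return sum(
        1
        for k, other in enumerate(cb_coords)
        if k != index and math.dist(center, other) <= radius
    )


def solvation_energy(f_aa_r, f_r, rt=RT):
    """Delta E_solv(AA, r) = -RT ln( f(AA, r) / f(r) )."""
    return -rt * math.log(f_aa_r / f_r)


# Toy C-beta coordinates: residue 3 sits far from the others.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 9.0, 0.0), (20.0, 0.0, 0.0)]
burial = [degree_of_burial(coords, i) for i in range(len(coords))]

# An amino acid found at this burial twice as often as average is favoured.
energy = solvation_energy(0.2, 0.1)
```

Summing `solvation_energy` over all aligned residues would give a total E-Solv for the model, in the same spirit as E-Pair.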

  10. STAGE 4: send it all to the (trained) neural network. The output is a score between 0 and 1, which is translated to a confidence level (Low, Medium, High or Certain).
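As a rough sketch of this stage, the code below runs a tiny feed-forward network with a sigmoid output and maps the score to a confidence label. Everything specific here is an assumption: the network shape, the weights, and above all the score cutoffs are hypothetical placeholders, since the slides do not state them.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def network_score(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer, one sigmoid output unit: returns a score in (0, 1)."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# HYPOTHETICAL cutoffs, for illustration only; not taken from GenTHREADER.
HYPOTHETICAL_CUTOFFS = [(0.9, "CERTAIN"), (0.7, "HIGH"), (0.5, "MEDIUM")]


def confidence_label(score, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Translate a 0-1 score into a confidence level."""
    for cutoff, label in cutoffs:
        if score >= cutoff:
            return label
    return "LOW"


# Features as in stages 2 and 3: alignment score, three lengths, E-Pair, E-Solv.
# The values and the all-zero weights are placeholders.
example_features = [120.0, 45.0, 61.0, 40.0, -35.0, -4.0]
score = network_score(example_features, [[0.0] * 6, [0.0] * 6], [0.0, 0.0], [0.0, 0.0], 0.0)
label = confidence_label(score)
```

With all-zero weights the network outputs exactly 0.5; a trained network would of course use learned weights and cutoffs calibrated on known fold pairs.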

  11. See this page on the web

  12. Who trains the neural network?
     • Representatives were taken for different fold types in CATH (“T-Level”).
     • CAT numbers were used for comparing pairs.
     • 9169 chain pairs
     • 383 pairs shared a common domain fold (= should give a positive answer)
     • The network was trained with these pairs.
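Labelling a training pair by its CAT numbers can be sketched as follows, assuming each chain carries a CATH code written as `C.A.T.H` and that two chains share a fold when the first three levels agree. The example codes and function names are illustrative, not taken from the slides.

```python
def cat_number(cath_code):
    """First three levels (Class, Architecture, Topology) of a 'C.A.T.H' code."""
    return tuple(cath_code.split(".")[:3])


def share_fold(cath_a, cath_b):
    """True when two chains should give a positive answer (same CAT number)."""
    return cat_number(cath_a) == cat_number(cath_b)


# Class balance of the training set quoted on the slide.
positive_pairs = 383
total_pairs = 9169
positive_fraction = positive_pairs / total_pairs
```

Only about 4% of the pairs are positives, so the network sees far more non-matching pairs than matching ones during training.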

  13. Neural network – black box?

  14. Confidence assignment [figure: score ranges labeled LOW, MEDIUM, HIGH, CERTAIN]

  15. GenTHREADER: what to do with it?
     • Results on a ‘classic’ test set of 68 proteins:
       • High true-positive rate: 73.5% correctly recognized, 48.5% with CERTAIN.
       • Extremely reliable: every “CERTAIN” prediction was correct.
       • Fast automatic method.
       • For 22 of 68 proteins, the alignment is over 50% accurate.
     • Let’s go analyze the Mycoplasma genitalium genome with it!

  16. Whole Genome Analysis with GenTHREADER: Mycoplasma genitalium genome analysis, ONE DAY ONLY!

  17. ORF MG276 of Mycoplasma genitalium: spotting a remote homologue [figure: 1HGX template]
     • MG276 is an “Adenine Phospho-ribosyl-transferase” (but this information is not given to GenTHREADER).
     • 1HGX is a template of another Phospho-ribosyl-transferase.
     • It has only 10% sequence identity with our MG276!
     • It was found by GenTHREADER as a certain match.
     • E-Pair saved the situation!
     • But how do we know it’s true?

  18. Ligand binding site of 1HGX template [figure label: Substrate]

  19. ORF MG276 of Mycoplasma genitalium: supporting evidence for 1HGX as a template
     • Substrate binding sites are preserved.
     • Secondary structure prediction of MG276 is similar.
     • We cheated all along…

  20. ORF MG353 of Mycoplasma genitalium: an ORF with no known function
     • MG353: no homologues found in databases.
     • 1HUE is a template of a “Histone-like” protein.
     • Very low sequence similarity with our MG353.
     • It was found by GenTHREADER as a certain match.
     • Striking similarity in the DNA binding region despite overall low sequence similarity.

  21. GenTHREADER improvements (McGuffin, Jones, May 2003):
     • PSI-BLAST, PSIPRED (secondary structures), some more…
     • Some results:

  22. AB-INITIO FOLDING: ROSETTA (Simons et al 1997, 1999, Bystroff & Baker 1998, Bonneau et al 2001). Prediction of a protein fold from scratch?
     • Method I: physically simulate protein folding.
       • Problem: CPU time.
       • Practical for short peptides.
     • Method II: check the probability of all possible conformations.
       • Problem: infinite search space.
       • Solution: use mother nature to decrease the search space.
     • Sequence shown on the slide: APKFFRGGNWKMNGKRSLGELIHTLGDAKLSADTEVVCGI APSITEKVVFQETKAIADNKD WSKVEVHESRIYGGSVTNCK ELASQHDVDGFLVGGASLKPVDGFLHALAEGLGVDINAKH

  23. Decreasing the search space using elements from short peptides:
     • Take fragments of short peptides (3 to 9 residues long).
     • Join them together.
     • Keep the secondary structures constant.
     • “Play” with the angles of loop residues.
     • RESULT: 200,000 decoy structures
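The idea of building conformations out of short fragments can be sketched as a Monte Carlo fragment-insertion loop over backbone torsion angles. This is a toy illustration under loud assumptions: the fragment library, the scoring function, the starting angles, and the Metropolis acceptance rule are all hypothetical choices, not details given on the slides.

```python
import math
import random


def fragment_insertion(n_residues, fragment_library, score, n_steps=1000,
                       temperature=1.0, rng=None):
    """Toy fragment assembly over (phi, psi) angles.

    fragment_library maps a start position to a list of fragments, each a list
    of (phi, psi) tuples that fits inside the chain. Lower scores are better.
    """
    rng = rng or random.Random(0)
    conformation = [(-150.0, 150.0)] * n_residues  # extended starting chain
    current = score(conformation)
    positions = sorted(fragment_library)
    for _ in range(n_steps):
        start = rng.choice(positions)
        fragment = rng.choice(fragment_library[start])
        trial = (conformation[:start] + list(fragment)
                 + conformation[start + len(fragment):])
        trial_score = score(trial)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if (trial_score <= current
                or rng.random() < math.exp((current - trial_score) / temperature)):
            conformation, current = trial, trial_score
    return conformation, current


def helix_deviation(conformation):
    """Hypothetical score: total deviation from ideal helical angles."""
    return sum(abs(phi + 60.0) + abs(psi + 45.0) for phi, psi in conformation)


helix_fragment = [(-60.0, -45.0)] * 3
library = {0: [helix_fragment], 3: [helix_fragment]}
decoy, decoy_score = fragment_insertion(6, library, helix_deviation,
                                        n_steps=200, temperature=0.001)
```

Running the loop many times from different random seeds would yield many different decoys, which is how a large decoy set such as the 200,000 structures mentioned above could be generated in principle.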

  24. In addition: I-Sites prediction. 13 local-structure 3D motifs with sequence profiles:
     • Strong independence of motifs (fold-initiation sites?)
     • Complements secondary structure

  25. Find the correct fold for a given sequence (back to threading…)
     • P(sequence | structure):
       • Solvation
       • Secondary structure vs. amino acid (proline in helix, etc.)
       • Pair interaction
       • I-Sites prediction for this sequence (3D motifs): did not contribute to performance
       • Etc.
     • P(structure), sequence independent:
       • Secondary structure packing
       • Strand hydrogen bonding
       • Strand assembly in sheets
       • Structure compactness
       • Frequency of I-Sites 3D motifs
       • Etc.
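The two terms on this slide fit together through Bayes' rule. The relation below is the standard form; reading the slide's two bullet groups as the likelihood and the prior is an interpretation, and the energy-like score E is introduced here only for illustration.

```latex
P(\text{structure} \mid \text{sequence})
  = \frac{P(\text{sequence} \mid \text{structure})\; P(\text{structure})}
         {P(\text{sequence})}
  \;\propto\; P(\text{sequence} \mid \text{structure})\; P(\text{structure})

E(\text{structure}) = -\ln P(\text{sequence} \mid \text{structure}) - \ln P(\text{structure})
```

Because P(sequence) is the same for every candidate structure of a given sequence, decoys can be ranked by the product of the two terms alone, or equivalently by the lowest E.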

  26. RESULTS in CASP 4: Baker’s a winner… [figure: native structures vs. predicted models]

  27. We're done!
