
CSE182-L6




Presentation Transcript


  1. CSE182-L6 Protein structure basics Protein sequencing CSE182

  2. Announcements • Midterm 1: Nov 1, in class. • Assignment 2: Online, due October 20.

  3. Distinguishing between families

  4. Distinguishing between families • Assignment 2

  5. Profiles • Start with an alignment of strings of length m, over an alphabet A. • Build an |A| × m matrix F = (f_ki). • Each entry f_ki represents the frequency of symbol k in position i. [Figure: example profile column frequencies 0.71, 0.14, 0.28, 0.14]
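The construction on slide 5 can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `build_profile`, the toy DNA alignment, and the dict-of-lists layout for F are all assumptions made for the example.

```python
from collections import Counter

def build_profile(alignment, alphabet):
    """Return F as a dict where F[k][i] is the frequency of symbol k in column i."""
    m = len(alignment[0])
    if any(len(row) != m for row in alignment):
        raise ValueError("all aligned strings must have the same length m")
    n = len(alignment)
    profile = {k: [0.0] * m for k in alphabet}
    for i in range(m):
        # Count how often each symbol appears in column i of the alignment.
        counts = Counter(row[i] for row in alignment)
        for k in alphabet:
            profile[k][i] = counts[k] / n
    return profile

# Hypothetical toy alignment: 4 strings of length m = 4 over the alphabet ACGT.
alignment = ["ACGT", "ACGA", "TCGT", "ACTT"]
F = build_profile(alignment, "ACGT")
print(F["A"])  # frequencies of 'A' in each of the 4 columns
```

Each column of F sums to 1, since every aligned string contributes exactly one symbol per position.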

  6. Scoring Profiles [Figure: profile entry f_ki (symbol k, position i) combined with a scoring matrix s]
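The figure on slide 6 is not recoverable from the transcript, so the following is only one common way to combine a profile with a scoring matrix, stated as an assumption: the score of placing symbol a at profile position i is the frequency-weighted average of s(k, a) over all symbols k. The names `column_score`, `score_ungapped`, and the toy `match_mismatch` matrix are hypothetical.

```python
def column_score(profile, i, a, s):
    """Expected score of symbol a against profile column i: sum over k of f_ki * s(k, a)."""
    return sum(profile[k][i] * s(k, a) for k in profile)

def score_ungapped(profile, seq, s):
    """Score a sequence against the profile position by position, with no gaps."""
    return sum(column_score(profile, i, a, s) for i, a in enumerate(seq))

def match_mismatch(k, a):
    # Hypothetical toy scoring matrix: +1 for a match, -1 for a mismatch.
    return 1.0 if k == a else -1.0

# Hypothetical two-column profile over the alphabet {A, C}.
profile = {"A": [0.75, 0.0], "C": [0.25, 1.0]}
print(score_ungapped(profile, "AC", match_mismatch))
print(score_ungapped(profile, "CC", match_mismatch))
```

In practice s would be a substitution matrix such as BLOSUM or PAM rather than a match/mismatch function.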

  7. Psi-BLAST idea • Multiple alignments are important for capturing remote homology. • Profile-based scores are a natural way to handle this. • Q: What if the query is a single sequence? • A: Iterate: • Find homologs using BLAST on the query • Discard very similar homologs • Align, make a profile, search with the profile.

  8. Psi-BLAST speed • Two time-consuming steps: multiple alignment of homologs, and searching with profiles. • Does the keyword search idea work? • Pigeonhole principle again: if a profile of length m must score >= T, then some sub-profile of length l must score >= lT/m. Generate all l-mers that score at least lT/m, and search using an automaton. • Multiple alignment: use ungapped multiple alignments only.
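The l-mer generation step from slide 8 can be sketched as follows. This is a brute-force illustration under stated assumptions: the profile is represented as a list of per-position score dictionaries (`pssm`, a hypothetical layout), every window of length l is tried, and all |A|^l candidate l-mers are enumerated. The automaton search mentioned on the slide is not implemented here.

```python
from itertools import product

def high_scoring_lmers(pssm, alphabet, l, T):
    """Return (start, l-mer) pairs whose score against the window pssm[start:start+l]
    is at least the pigeonhole cutoff l*T/m."""
    m = len(pssm)
    cutoff = l * T / m
    hits = set()
    for start in range(m - l + 1):
        window = pssm[start:start + l]
        # Brute force: enumerate every possible l-mer over the alphabet.
        for lmer in product(alphabet, repeat=l):
            score = sum(col[a] for col, a in zip(window, lmer))
            if score >= cutoff:
                hits.add((start, "".join(lmer)))
    return hits

# Hypothetical profile of length m = 4 over {A, C}; pssm[i][a] is the score of a at position i.
pssm = [
    {"A": 2, "C": -1},
    {"A": -1, "C": 2},
    {"A": 2, "C": -1},
    {"A": -1, "C": 2},
]
hits = high_scoring_lmers(pssm, "AC", l=2, T=8)
print(sorted(hits))
```

With T = 8, m = 4, and l = 2 the cutoff is 4, so only the l-mers that match the best symbol at both positions of a window survive. The resulting keyword set is what a keyword-search automaton would then scan the database for.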

  9. Protein Domains • An important realization (in the last decade) is that proteins have a modular architecture of domains/folds. • Example: The zinc finger domain is a DNA-binding domain. • What is a domain? • Part of a sequence that can fold independently and is present in other sequences as well

  10. Domain review • What is a domain? • How are domains expressed? • Motifs (regular expressions and others) • Multiple alignments • Profiles • Profile HMMs

  11. Domain databases • Can you speed up HMM search?

  12. A structural view of proteins

  13. CS view of a protein • >sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine). • MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL
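The record on slide 13 is in FASTA format, so the "CS view" amounts to a header plus a string over the amino-acid alphabet. A minimal parsing sketch follows; the function name `parse_fasta_record` and the returned field names are assumptions for illustration, and the header split assumes the `db|accession|entry_name` identifier layout seen in this record.

```python
def parse_fasta_record(text):
    """Split a single FASTA record into identifier fields, description, and sequence."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header starting with '>'")
    header = lines[0][1:]
    # The identifier is everything up to the first space; the rest is free text.
    ident, _, description = header.partition(" ")
    db, accession, entry_name = ident.split("|")
    return {
        "db": db,
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
        "sequence": "".join(lines[1:]),
    }

record = parse_fasta_record("""
>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL
""")
print(record["accession"], record["entry_name"], len(record["sequence"]))
```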

  14. Protein structure basics

  15. Side chains determine amino-acid type • The residues may have different properties. • Aspartic acid (D) and glutamic acid (E) are acidic residues.

  16. Bond angles form structural constraints

  17. Various constraints determine 3D structure • Constraints • Structural constraints due to physicochemical properties • Constraints due to bond angles • H-bond formation • Surprisingly, a few conformations are seen over and over again.

  18. Alpha-helix • 3.6 residues per turn • H-bonds between residue i and residue i+4 stabilize the structure. • First discovered by Linus Pauling

  19. Beta-sheet • Each strand by itself has 2 residues per turn, and is not stable. • Adjacent strands hydrogen-bond to form stable beta-sheets, parallel or anti-parallel. • Beta-sheets have long-range interactions that stabilize the structure, while alpha-helices have local interactions.

  20. Domains • The basic structures (helix, strand, loop) combine to form complex 3D structures. • Certain combinations are popular. • Many sequences, but only a few folds

  21. 3D structure • Predicting tertiary structure is an important problem in bioinformatics. • Premise: Clues to structure can be found in the sequence. • While de novo tertiary structure prediction is hard, there are many intermediate and tractable goals. • The PDB (Protein Data Bank) is a compendium of structures.

  22. Searching structure databases • Threading and other 3D alignments can be used to align structures. • Database filtering is possible through geometric hashing.

  23. Trivia Quiz • What research won the Nobel prize in Chemistry in 2004? • In 2002?

  24. How are Proteins Sequenced? Mass Spec 101

  25. Nobel Citation 2002

  26. Nobel Citation, 2002

  27. Mass Spectrometry

  28. Sample Preparation • Enzymatic digestion (trypsin) + fractionation
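The trypsin digestion step on slide 28 can be simulated in silico. The sketch below assumes the commonly used cleavage rule, which is not stated on the slide: trypsin cuts on the C-terminal side of lysine (K) or arginine (R), except when the next residue is proline (P). Missed cleavages and fractionation are not modeled, and the function name `tryptic_digest` is hypothetical.

```python
def tryptic_digest(protein):
    """Split a protein string into peptides, cutting after K or R unless followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        at_end = i == len(protein) - 1
        if residue in "KR" and not at_end and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Whatever remains after the last cleavage site is the C-terminal peptide.
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

# First ten residues of the BPT1_BOVIN sequence shown on slide 13.
print(tryptic_digest("MKMSRLCLSV"))
```

Each resulting peptide is what a mass spectrometer would then measure, so the digest determines which masses are expected in the spectrum.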

  29. Single Stage MS • Mass spectrometry • LC-MS: 1 MS spectrum / second

  30. Tandem MS • Ionized parent peptide • Secondary fragmentation
