CSE182-L6 Protein structure basics Protein sequencing CSE182
Announcements
• Midterm 1: Nov 1, in class.
• Assignment 2: Online, due October 20.
Distinguishing between families Assignment 2
Profiles
• Start with an alignment of strings of length m over an alphabet A.
• Build an |A| × m matrix F = (f_ki).
• Each entry f_ki represents the frequency of symbol k in position i.
(Figure: example profile with entries such as 0.71, 0.14, 0.28, 0.14.)
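The construction above can be sketched in a few lines. This is a minimal illustration, not the course's code: the function name `build_profile`, the dict-of-lists layout for F, and the toy DNA alignment are all assumptions made for the example.

```python
from collections import Counter

def build_profile(alignment, alphabet):
    """Return F as a dict: F[k][i] = frequency of symbol k in column i."""
    m = len(alignment[0])
    if any(len(s) != m for s in alignment):
        raise ValueError("all aligned strings must have the same length m")
    profile = {k: [0.0] * m for k in alphabet}
    for i in range(m):
        # Count how often each symbol occurs in column i of the alignment.
        counts = Counter(s[i] for s in alignment)
        for k in alphabet:
            profile[k][i] = counts[k] / len(alignment)
    return profile

# Hypothetical toy alignment over the DNA alphabet (not from the slides).
alignment = ["ACGT", "ACGA", "ATGT", "CCGT"]
F = build_profile(alignment, "ACGT")
```

With this toy input, every column of F sums to 1 as long as the alignment contains only symbols from the alphabet.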
Scoring Profiles
(Figure labels: scoring matrix, position i, symbol k, profile entry f_ki, s.)
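The slide's formula did not survive extraction, so the sketch below uses one common definition as an assumption: the score of a string s against the profile is the sum, over positions i and symbols k, of f_ki times the scoring-matrix entry for (k, s_i). The names `score_profile`, `profile`, and `scoring`, and the two-letter alphabet, are hypothetical.

```python
def score_profile(profile, scoring, s):
    """Sum over positions i and symbols k of f_ki * scoring[k][s_i]."""
    m = len(next(iter(profile.values())))
    if len(s) != m:
        raise ValueError("string length must equal the profile length m")
    total = 0.0
    for i, c in enumerate(s):
        for k, freqs in profile.items():
            # Weight the substitution score of (k, c) by how often k occurs at i.
            total += freqs[i] * scoring[k][c]
    return total

# Hypothetical two-letter alphabet, profile, and match/mismatch scores.
profile = {"A": [1.0, 0.5], "B": [0.0, 0.5]}
scoring = {"A": {"A": 1, "B": -1}, "B": {"A": -1, "B": 1}}
score_aa = score_profile(profile, scoring, "AA")
score_ba = score_profile(profile, scoring, "BA")
```

Position 0 is all A in this toy profile, so a string starting with B is penalized, while position 1 is split evenly and contributes nothing either way.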
Psi-BLAST idea
• Multiple alignments are important for capturing remote homology.
• Profile-based scores are a natural way to handle this.
• Q: What if the query is a single sequence?
• A: Iterate:
  • Find homologs using BLAST on the query.
  • Discard very similar homologs.
  • Align, make a profile, and search with the profile.
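The iteration can be illustrated with a toy loop. This is not PSI-BLAST itself: it assumes equal-length ungapped sequences, replaces BLAST with a scan of a small list, and uses a made-up score (the summed frequency of the observed symbols). All function names, thresholds, and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def build_profile(seqs, alphabet):
    """profile[k][i] = frequency of symbol k in column i (ungapped)."""
    m = len(seqs[0])
    return {k: [sum(s[i] == k for s in seqs) / len(seqs) for i in range(m)]
            for k in alphabet}

def profile_score(profile, s):
    """Toy score (an assumption): summed frequency of the observed symbols."""
    return sum(profile[c][i] for i, c in enumerate(s))

def iterate_search(query, database, alphabet, min_score, max_identity, rounds=3):
    members = [query]
    for _ in range(rounds):
        profile = build_profile(members, alphabet)
        hits = [s for s in database if profile_score(profile, s) >= min_score]
        added = False
        for h in hits:
            # Discard hits that are very similar to a sequence already kept.
            if all(identity(h, x) <= max_identity for x in members):
                members.append(h)
                added = True
        if not added:
            break  # the profile has stopped finding new homologs
    return members

database = ["AAAAAB", "AAAABB", "AAABBB", "BBBBBB"]
one_round = iterate_search("AAAAAA", database, "AB", 3.5, 0.9, rounds=1)
three_rounds = iterate_search("AAAAAA", database, "AB", 3.5, 0.9, rounds=3)
```

In this toy run the first round misses AAABBB because it scores too low against the single query, and the second round, searching with the profile, picks it up.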
Psi-BLAST speed
• Two time-consuming steps:
  • Multiple alignment of homologs
  • Searching with profiles
• Does the keyword search idea work?
• Multiple alignment:
  • Use ungapped multiple alignments only.
• Pigeonhole principle again:
  • If a profile of length m must score >= T,
  • then some sub-profile of length l must score >= lT/m.
  • Generate all l-mers that score at least lT/m.
  • Search using an automaton.
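The l-mer generation step can be sketched as a brute-force enumeration. This assumes the profile has already been turned into a position-specific score table `pssm[i][c]` (a hypothetical representation), and it leaves out the automaton search entirely. Enumerating all |A|^l words is only practical for small l.

```python
from itertools import product

def seed_words(pssm, alphabet, l, T):
    """For each window of length l, list the l-mers scoring >= l*T/m.

    pssm[i][c] is the score of symbol c at position i; m = len(pssm).
    """
    m = len(pssm)
    cutoff = l * T / m
    seeds = {}
    for start in range(m - l + 1):
        window = pssm[start:start + l]
        seeds[start] = [
            "".join(word)
            for word in product(alphabet, repeat=l)
            if sum(col[c] for col, c in zip(window, word)) >= cutoff
        ]
    return seeds

# Hypothetical score table: A scores 2 and B scores -1 at every position.
pssm = [{"A": 2, "B": -1} for _ in range(4)]
seeds = seed_words(pssm, "AB", l=2, T=6)
```

Here m = 4, l = 2, and T = 6, so the cutoff is 3 and only the word AA survives in each window. When l divides m, the m/l non-overlapping windows cannot all score below lT/m if the full string scores at least T, which is the pigeonhole argument.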
Protein Domains
• An important realization (in the last decade) is that proteins have a modular architecture of domains/folds.
• Example: The zinc finger domain is a DNA-binding domain.
• What is a domain?
  • Part of a sequence that can fold independently and is present in other sequences as well.
Domain review
• What is a domain?
• How are domains expressed?
  • Motifs (regular expressions and others)
  • Multiple alignments
  • Profiles
  • Profile HMMs
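As an illustration of the regular-expression representation, here is a sketch that scans a sequence for a simplified zinc-finger-like motif. The pattern (two cysteines, a spacer, two histidines) is a deliberately simplified assumption for this example, not a curated database entry, and the test sequence is made up.

```python
import re

# Simplified, hypothetical motif: C, 2-4 residues, C, 12 residues,
# H, 3-5 residues, H.
MOTIF = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

def find_motif(sequence):
    """Return (start, matched_text) for the first motif hit, or None."""
    match = MOTIF.search(sequence)
    if match is None:
        return None
    return match.start(), match.group()

# Made-up sequence containing one occurrence of the motif.
sequence = "MKCPECGKAFSRSDELTRHIRIHTG"
hit = find_motif(sequence)
```

A regular expression accepts or rejects a sequence outright, which is one reason profiles and profile HMMs, which score partial matches, are listed as alternatives.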
Domain databases
Can you speed up HMM search?
CS view of a protein
>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL
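From this point of view a protein is a header line plus a string over a 20-letter alphabet. A minimal sketch of reading such a record follows; the parser name `parse_fasta` is hypothetical, the header is shortened, and the sequence is wrapped over two lines only to show that line breaks are ignored.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

record = """>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor
MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCK
ARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL
"""
header, seq = next(parse_fasta(record))
# The header fields are separated by "|": database, accession, entry name.
db, accession, rest = header.split("|", 2)
entry_name = rest.split()[0]
```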
Protein structure basics
Side chains determine amino-acid type
• The residues may have different properties.
• Aspartic acid (D) and glutamic acid (E) are acidic residues.
Various constraints determine 3D structure
• Constraints
  • Structural constraints due to physicochemical properties
  • Constraints due to bond angles
  • H-bond formation
• Surprisingly, a few conformations are seen over and over again.
Alpha-helix
• 3.6 residues per turn
• H-bonds between residue i and residue i+4 stabilize the structure.
• First discovered by Linus Pauling
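The figure of 3.6 residues per turn translates into a rotation of 360 / 3.6 = 100 degrees per residue. The short sketch below works this out for the first few residues; the function name `wheel_angles` is hypothetical.

```python
RESIDUES_PER_TURN = 3.6
# 360 / 3.6 = 100 degrees of rotation per residue (rounded to an integer
# to avoid floating-point noise).
DEGREES_PER_RESIDUE = round(360 / RESIDUES_PER_TURN)

def wheel_angles(n):
    """Angular position around the helix axis of residues 0 .. n-1."""
    return [(i * DEGREES_PER_RESIDUE) % 360 for i in range(n)]

angles = wheel_angles(19)
```

Residue 18 lands back at angle 0, since 18 residues make exactly 5 turns.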
Beta-sheet
• Each strand by itself has 2 residues per turn and is not stable.
• Adjacent strands hydrogen-bond to form stable beta-sheets, parallel or anti-parallel.
• Beta-sheets have long-range interactions that stabilize the structure, while alpha-helices have local interactions.
Domains
• The basic structures (helix, strand, loop) combine to form complex 3D structures.
• Certain combinations are popular.
• Many sequences, but only a few folds.
3D structure
• Predicting tertiary structure is an important problem in bioinformatics.
• Premise: Clues to structure can be found in the sequence.
• While de novo tertiary structure prediction is hard, there are many intermediate and tractable goals.
• The Protein Data Bank (PDB) is a compendium of structures.
Searching structure databases
• Threading and other 3D alignments can be used to align structures.
• Database filtering is possible through geometric hashing.
Trivia Quiz
• What research won the Nobel Prize in Chemistry in 2004?
• In 2002?
Nobel Citation 2002
Nobel Citation, 2002
Mass Spectrometry
Enzymatic Digestion (Trypsin) + Fractionation Sample Preparation
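Trypsin digestion can be simulated on a sequence string. The sketch below assumes the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The slide does not state the rule, and the function name and toy sequence are hypothetical.

```python
def trypsin_digest(protein):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal peptide that does not end in K or R.
        peptides.append(protein[start:])
    return peptides

# Made-up sequence: the R followed by P is not a cleavage site.
peptides = trypsin_digest("AAKRPGGRCC")
```

Real digests also contain missed cleavages, which this sketch ignores.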
Single Stage MS Mass Spectrometry LC-MS: 1 MS spectrum / second
Tandem MS Secondary Fragmentation Ionized parent peptide