630 likes | 761 Views
CAP5510 – Bioinformatics Protein Structures. Tamer Kahveci CISE Department University of Florida. What and Why?. Proteins fold into a three dimensional shape Structure can reveal functional information that we can not find from sequence Misfolding proteins can cause diseases
E N D
CAP5510 – BioinformaticsProtein Structures Tamer Kahveci CISE Department University of Florida
What and Why? • Proteins fold into a three dimensional shape • Structure can reveal functional information that we can not find from sequence • Misfolding proteins can cause diseases • Sickle cell anemia, mad cow disease • Used in drug design Hemoglobin Normal v.s. sickled blood cells E → V HIV protease inhibitor
Goals • Understand protein structures • Primary, secondary, tertiary • Learn how protein shapes are • determined • Predicted • Structure comparison (?)
A Protein Sequence >gi|22330039|ref|NP_683383.1| unknown protein; protein id: At1g45196.1 [Arabidopsis thaliana] MPSESSYKVHRPAKSGGSRRDSSPDSIIFTPESNLSLFSSASVSVDRCSSTSDAHDRDDSLISAWKEEFEVKKDDESQNL DSARSSFSVALRECQERRSRSEALAKKLDYQRTVSLDLSNVTSTSPRVVNVKRASVSTNKSSVFPSPGTPTYLHSMQKGW SSERVPLRSNGGRSPPNAGFLPLYSGRTVPSKWEDAERWIVSPLAKEGAARTSFGASHERRPKAKSGPLGPPGFAYYSLY SPAVPMVHGGNMGGLTASSPFSAGVLPETVSSRGSTTAAFPQRIDPSMARSVSIHGCSETLASSSQDDIHESMKDAATDA QAVSRRDMATQMSPEGSIRFSPERQCSFSPSSPSPLPISELLNAHSNRAEVKDLQVDEKVTVTRWSKKHRGLYHGNGSKM
R O H N C C OH H H Amino Acid Composition • Basic Amino AcidStructure: • The side chain, R,varies for each ofthe 20 amino acids Side chain Aminogroup Carboxylgroup
O O The Peptide Bond • Dehydration synthesis • Repeating backbone: N–C –C –N–C –C • Convention – start at amino terminus and proceed to carboxy terminus
Peptidyl polymers • A few amino acids in a chain are called a polypeptide. A protein is usually composed of 50 to 400+ amino acids. • We call the units of a protein amino acid residues. amidenitrogen carbonylcarbon
Side chain properties • Carbon does not make hydrogen bonds with water easily – hydrophobic • O and N are generally more likely than C to h-bond to water – hydrophilic • We group the amino acids into three general groups: • Hydrophobic • Charged (positive/basic & negative/acidic) • Polar
More Polar Amino Acids And then there’s…
Psi () – the angle of rotation about the C-C bond. Phi () – the angle of rotation about the N-C bond. The planar bond angles and bond lengths are fixed. Planarity of the Peptide Bond
Primary & Secondary Structure • Primary structure = the linear sequence of amino acids comprising a protein:AGVGTVPMTAYGNDIQYYGQVT… • Secondary structure • Regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every protein structure known: the -helix and the-sheet • The location of direction of these periodic, repeating structures is known as the secondary structure of the protein
The Alpha Helix 60°
Properties of the Alpha Helix • 60° • Hydrogen bondsbetween C=O ofresidue n, andNH of residuen+4 • 3.6 residues/turn • 1.5 Å/residue rise • 100°/residue turn
Properties of -helices • 4 – 40+ residues in length • Often amphipathic or “dual-natured” • Half hydrophobic and half hydrophilic • If we examine many -helices,we find trends… • Helix formers: Ala, Glu, Leu, Met • Helix breakers: Pro, Gly, Tyr, Ser
The beta strand (& sheet) 135° +135°
Properties of beta sheets • Formed of stretches of 5-10 residues in extended conformation • Parallel/aniparallel,contiguous/non-contiguous
Turns and Loops • Secondary structure elements are connected by regions of turns and loops • Turns – short regions of non-, non- conformation • Loops – larger stretches with no secondary structure. • Sequences vary much more than secondary structure regions
Levels of Protein Structure • Secondary structure elements combine to form tertiary structure • Quaternary structure occurs in multienzyme complexes
Protein Structure Example Beta Sheet Helix Loop ID: 12as 2 chains
Views of a Protein Wireframe Ball and stick
Views of a protein Spacefill Cartoon CPK colors Carbon = green, black, or grey Nitrogen = blue Oxygen = red Sulfur = yellow Hydrogen = white
Mostly Helical Folding Motifs • Four helical bundle: • Globin domain:
/ Motifs • / barrel:
Determining the Structure of a Protein Experimental Methods X-ray NMR As of August 2013, structure of > 85,000 proteins are determined
X-Ray Crystallography Discovery of X-rays (Wilhelm Conrad Röntgen, 1895) Crystals diffract X-rays in regular patterns (Max Von Laue, 1912) The first X-ray diffraction pattern from a protein crystal (Dorothy Hodgkin, 1934)
X-Ray Crystallography • Grow millions of protein crystals • Takes months • Expose to radiation beam • Analyze the image with computer • Average over many copies of images • PDB • Not all proteins can be crystallized!
NMR • Nuclear Magnetic Resonance • Nuclei of atoms vibrate when exposed to oscillating magnetic field • Detect vibrations by external sensors • Computes inter-atomic distances. • Requires complex analysis. NMR can be used for short sequences (<200 residues) • More than one model can be derived from NMR.
Determining the Structure of a Protein Computational Methods
The Protein Folding Problem • Central question of molecular biology:“Given a particular sequence of amino acid residues (primary structure), what will the secondary/tertiary/quaternary structure of the resulting protein be?” • Input: AAVIKYGCAL…Output: 11, 22…
Structure v.s. Sequence • Observation: A protein with the same sequence (under the same circumstances) yields the same shape. • Protein folds into a shape that minimizes the energy needed to stay in that shape. • Protein folds in ~10-15 seconds.
Chou-Fasman methods • Uses statistically obtained Chou-Fasman parameters. • For each amino acid has • P(a): alpha • P(b): beta • P(t): turn • f(): additional turn parameter.
C.-F. Alpha Helix Prediction (1) P(a) P(b) • Find P(a) for all letters • Find 6 contiguous letters, at least 4 of them have P(a) > 100 • Declare these regions as alpha helix
C.-F. Alpha Helix Prediction (2) P(a) P(b) • Extend in both directions until 4 consecutive letters with P(a) < 100 found
C.-F. Alpha Helix Prediction (3) P(a) P(b) • Find sum of P(a) (Sa) and sum of P(b) (Sb) in the extended region • If region is long enough ( >= 5 letters) and P(a) > P(b) then declare the extended region as alpha helix
C.-F. Beta Sheet Prediction • Same as alpha helix replace P(a) with P(b) • Resolving overlapping alpha helix & beta sheet • Compute sum of P(a) (Sa) and sum of P(b) (Sb) in the overlap. • If Sa > Sb => alpha helix • If Sb > Sa => beta sheet
C.-F. Turn Prediction P(a) P(b) P(t) f() • An amino acid is predicted as turn if all of the following holds: • f(i)*f(i+1)*f(i+2)*f(i+3) > 0.000075 • Avg(P(i+k)) > 100, for k=0, 1, 2, 3 • Sum(P(t)) > Sum(P(a)) and Sum(P(b)) for i+k, (k=0, 1, 2, 3)
Other Methods for SSE Prediction • Similarity searching • Predator • Markov chain • Neural networks • PHD • ~65% to 80% accuracy