BMI 731 Protein Structures and Related Database Searches
Protein DNA (Genotype) Biology … Protein…
A single amino acid substitution in a protein (glutamate to valine at position 6 of the hemoglobin β chain) causes sickle-cell disease…
Why do we care about structure? • In the factory of living cells, proteins are the workers, performing a variety of biological tasks. • Each protein has a particular 3-D structure that determines its function. • Protein structure is more conserved than protein sequence, and more closely related to function. • Sequence -> Structure -> Function
Structural Information • Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics (RCSB) • http://www.rcsb.org/pdb/ • > 15,000 protein structures • Also contains structures of protein/nucleic acid complexes, nucleic acids and carbohydrates • Most structures are determined by X-ray crystallography; other methods are NMR and electron microscopy (EM). Some structures are also theoretically predicted.
Protein? • Proteins are linear heteropolymers: one or more polypeptide chains • Building blocks: 20 standard amino acid residues (a few rare ones, such as selenocysteine, also occur) • Lengths range from a few tens to thousands of residues • The three-dimensional shapes (“folds”) they adopt vary enormously.
Basic measurements on structures… • Bond lengths • Bond angles • Dihedral (torsion) angles
Bond Length • The distance between bonded atoms is essentially constant • Depends on the “type” of the bond • Varies from about 1.0 Å (C-H) to 1.5 Å (C-C) • BOND LENGTH IS A FUNCTION OF THE POSITION OF TWO ATOMS.
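Because a bond length depends only on the positions of two atoms, it can be computed directly from Cartesian coordinates such as those stored in a PDB file. A minimal sketch, assuming coordinates in ångströms given as plain (x, y, z) tuples; `bond_length` is a hypothetical helper and the coordinates are made up for illustration.

```python
import math

def bond_length(a, b):
    """Distance in ångströms between two atoms given as (x, y, z) tuples."""
    return math.dist(a, b)

# Hypothetical coordinates for two bonded carbon atoms (Å).
c1 = (0.0, 0.0, 0.0)
c2 = (1.54, 0.0, 0.0)
print(f"{bond_length(c1, c2):.2f} Å")  # prints 1.54 Å
```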
Bond Angle… • Bond angles are determined by the chemical makeup of the atoms involved, and are essentially constant. • They depend on the type of atom and the number of electrons available for bonding. • Range from 100° to 180° • BOND ANGLE IS A FUNCTION OF THE POSITION OF THREE ATOMS.
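A bond angle needs three positions: the central atom B and its two bonded neighbours A and C. The sketch below computes it from the dot product of the vectors B→A and B→C, assuming plain (x, y, z) tuples; `bond_angle` is a hypothetical helper and the coordinates are illustrative, not taken from a real structure.

```python
import math

def bond_angle(a, b, c):
    """Angle A-B-C in degrees; B is the central atom bonded to both A and C."""
    ba = [ai - bi for ai, bi in zip(a, b)]   # vector B -> A
    bc = [ci - bi for ci, bi in zip(c, b)]   # vector B -> C
    cos_theta = sum(u * v for u, v in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against round-off outside [-1, 1]
    return math.degrees(math.acos(cos_theta))

# Hypothetical coordinates: an ideal tetrahedral (sp3) centre and a linear arrangement.
print(round(bond_angle((1, 1, 1), (0, 0, 0), (1, -1, -1)), 2))  # 109.47
print(round(bond_angle((1, 0, 0), (0, 0, 0), (-1, 0, 0)), 1))   # 180.0
```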
Dihedral Angles • These are usually variable • Range from 0° to 360° (equivalently -180° to 180°) in molecules • The most famous are φ, ψ and ω • DIHEDRAL ANGLES ARE A FUNCTION OF THE POSITION OF FOUR ATOMS. http://www.colby.edu/chemistry/OChem/DEMOS/dihedral.html
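A dihedral angle is the angle between the plane through A, B, C and the plane through B, C, D. The sketch below uses the common atan2 formulation, which returns values in (-180°, 180°] with the usual sign convention: positive when, looking along B→C, the front bond must turn clockwise to eclipse the rear bond. Taking the result modulo 360 gives the 0-360° scale used on the slide. `dihedral` and its helpers are hypothetical names; the coordinates are illustrative.

```python
import math

def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dihedral(a, b, c, d):
    """Dihedral angle A-B-C-D in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of planes ABC and BCD
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates for A, B, C; D is placed cis, trans and out of plane.
a, b, c = (1, 0, 0), (0, 0, 0), (0, 1, 0)
print(round(dihedral(a, b, c, (1, 1, 0)), 1))          # 0.0 (cis)
print(round(dihedral(a, b, c, (-1, 1, 0)), 1))         # 180.0 (trans)
print(round(dihedral(a, b, c, (0, 1, 1)), 1))          # -90.0
print(round(dihedral(a, b, c, (0, 1, 1)) % 360, 1))    # 270.0 on a 0-360° scale
```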
Dihedral Angles A torsion angle is defined by four atoms, A, B, C and D. When A, B, C and D are main-chain atoms (i.e. the carbonyl carbon, C1; the alpha carbon, C2 or C-alpha; and the amide nitrogen, N), there are THREE repeating torsion angles along the backbone chain, called phi (C-N-Cα-C), psi (N-Cα-C-N) and omega (Cα-C-N-Cα). http://bmbiris.bmb.uga.edu/wampler/tutorial/prot2.html
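Because phi, psi and omega repeat along the chain, the atom quadruple behind each angle can be listed mechanically. The sketch below assumes residues indexed from 0 and atoms labelled with the conventional PDB names N, CA and C; the (residue, atom) tuples and `backbone_torsions` are hypothetical. Here omega(i) is taken as the peptide bond following residue i, which is a convention choice (some tools attach it to the following residue). Each quadruple can be passed to any four-atom dihedral routine.

```python
def backbone_torsions(n_residues):
    """List (name, residue index, atom quadruple) for each backbone torsion.

    Atoms are labelled (residue_index, atom_name) with atom names N, CA, C:
      phi(i):   C(i-1) - N(i)  - CA(i)  - C(i)
      psi(i):   N(i)   - CA(i) - C(i)   - N(i+1)
      omega(i): CA(i)  - C(i)  - N(i+1) - CA(i+1)
    """
    torsions = []
    for i in range(n_residues):
        if i > 0:  # the first residue has no preceding C, so phi is undefined
            torsions.append(("phi", i, [(i - 1, "C"), (i, "N"), (i, "CA"), (i, "C")]))
        if i < n_residues - 1:  # the last residue has no following N
            torsions.append(("psi", i, [(i, "N"), (i, "CA"), (i, "C"), (i + 1, "N")]))
            torsions.append(("omega", i, [(i, "CA"), (i, "C"), (i + 1, "N"), (i + 1, "CA")]))
    return torsions

for name, residue, atoms in backbone_torsions(3):
    print(name, residue, atoms)
```

Note that a chain of n residues yields n-1 phi, n-1 psi and n-1 omega angles, since the chain ends lack the neighbouring atoms.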
Ramachandran / phi-psi plot http://www.biochem.ucl.ac.uk/~roman/procheck/manual/examples/plot_01.html
Levels of Structure… 1 - Primary structure 2 - Secondary structure 3 - Tertiary structure 4 - Quaternary structure
Primary structure… • This is simply the amino acid sequence of the polypeptide chain(s)
Secondary structure • Local organization of the protein backbone: α-helix, β-strand (which assemble into β-sheets), turns and interconnecting loops.
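Secondary structure is often recorded as one character per residue, and the helices and strands then have to be cut out as discrete secondary structure elements (SSEs). A minimal sketch, assuming a simplified three-state alphabet ('H' helix, 'E' strand, anything else turn/loop) such as a reduced DSSP assignment; `sse_segments`, the `min_length` filter and the example string are hypothetical.

```python
from itertools import groupby

def sse_segments(ss, min_length=1):
    """Split a per-residue secondary-structure string into SSE segments.

    Assumes one character per residue: 'H' = helix, 'E' = strand,
    anything else = turn/loop. Returns (type, start, end) tuples with
    0-based, inclusive residue indices.
    """
    segments = []
    pos = 0
    for code, run in groupby(ss):       # consecutive runs of the same code
        length = len(list(run))
        if code in "HE" and length >= min_length:
            segments.append((code, pos, pos + length - 1))
        pos += length
    return segments

print(sse_segments("CCHHHHHHCCEEEEECCEEEEC"))
# [('H', 2, 7), ('E', 10, 14), ('E', 17, 20)]
```

Segments like these are the units that SSE-based comparison methods such as VAST and LOCK represent as vectors.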
The α-helix • One of the most closely packed arrangements of residues. • 3.6 residues per turn • Pitch: 5.4 Å per turn
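The two helix parameters fix the per-residue geometry by simple arithmetic: 5.4 Å / 3.6 residues = 1.5 Å rise and 360° / 3.6 = 100° rotation per residue. A small sketch; `helix_length` is a hypothetical helper that treats the helix as ideal and uses n × rise as an approximate axial length.

```python
RESIDUES_PER_TURN = 3.6   # residues per turn of an alpha-helix (from the slide)
PITCH = 5.4               # Å advanced along the helix axis per turn (from the slide)

rise_per_residue = PITCH / RESIDUES_PER_TURN        # Å per residue along the axis
rotation_per_residue = 360.0 / RESIDUES_PER_TURN    # degrees per residue about the axis

def helix_length(n_residues):
    """Approximate axial length (Å) of an ideal alpha-helix of n residues."""
    return n_residues * rise_per_residue

print(f"rise per residue: {rise_per_residue:.1f} Å")             # 1.5 Å
print(f"rotation per residue: {rotation_per_residue:.0f}°")      # 100°
print(f"18-residue helix: about {helix_length(18):.1f} Å long")   # 27.0 Å
```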
The β-sheet • Backbone almost fully extended; loosely packed arrangement of residues.
Tertiary structure… • Packing of the secondary structure elements into a compact spatial unit • “Fold” or domain: this is the level to which structure prediction is currently possible.
Quaternary structure… • Assembly of homo- or heteromeric protein chains. • Usually the functional unit of a protein, especially for enzymes
Classification… • Class • Fold/Architecture • Superfamily
Databases of structural classification • SCOP • Murzin AG, Brenner SE, Hubbard T, Chothia C • Structural Classification of Proteins • Manual assembly by inspection • All nodes are annotated (e.g., all-α, α/β) • Structural similarity search using 3dSearch (Singh and Brutlag) • CATH • Dr. C.A. Orengo, Dr. A.D. Michie, etc. • Class-Architecture-Topology-Homologous superfamily • Manual classification at the Architecture level • Automated topology classification using the SSAP algorithm • No structural similarity search
Databases of structural classification • FSSP • L.L. Holm and C. Sander • Fully automated using the DALI algorithm (Holm and Sander) • No internal node annotations • Structural similarity search using DALI • Pclass • A. Singh, X. Liu, J. Chang, D. Brutlag • Fully automated using the LOCK and 3dSearch algorithms • All internal nodes automatically annotated with common terms • Java-based classification browser • Structural similarity search using 3dSearch
Why Structure Alignment? • For homologous proteins (similar ancestry), this provides the “gold standard” for sequence alignment—elucidates the common ancestry of the proteins. • For nonhomologous proteins, allows us to identify common substructures of interest. • Allows us to classify proteins into clusters, based on structural similarity.
How do we recognize structural similarities? • By eye (Alexei Murzin): SCOP, the gold standard for structure classification • Algorithmically: the growth of the PDB demands automated techniques for classification and fold detection
Algorithms for Structure Alignment • Distance-based methods • DALI (Holm and Sander): Aligning scalar distance plots • STRUCTAL (Gerstein and Levitt): Dynamic programming using pairwise inter-molecular distances • SSAP (Orengo and Taylor): Dynamic programming using intra-molecular vector distances • Vector-based methods • VAST (Bryant): Graph-theory-based secondary structure alignment • 3dSearch (Singh and Brutlag): Fast secondary structure index lookup • Both vector- and distance-based • LOCK (Singh and Brutlag): Hierarchically uses both secondary structure vectors and atomic distances
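To make "dynamic programming using pairwise inter-molecular distances" concrete: once two structures sit in a common frame, residue i of A and residue j of B can be scored by how close they are, and a standard global DP picks the best residue correspondence. This sketch shows only that DP step and assumes the structures are already superposed (STRUCTAL is usually described as alternating such DP passes with re-superposition). The score form m / (1 + d²/d0²) and the values m = 20, d0² = 5 Å² and gap = 10 are assumptions for illustration, not verified STRUCTAL parameters; `structal_like_dp` is a hypothetical name.

```python
import math

def structal_like_dp(coords_a, coords_b, m=20.0, d0_sq=5.0, gap=10.0):
    """One global dynamic-programming pass over inter-molecular distances.

    coords_a, coords_b: lists of (x, y, z) C-alpha positions, assumed to be
    already superposed in a common frame. Pairing residue i of A with residue
    j of B scores m / (1 + d_ij**2 / d0_sq); each gap costs `gap`.
    Returns (best score, aligned (i, j) residue pairs).
    """
    n, k = len(coords_a), len(coords_b)
    score = [[0.0] * (k + 1) for _ in range(n + 1)]
    move = [[None] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = -gap * i, "up"
    for j in range(1, k + 1):
        score[0][j], move[0][j] = -gap * j, "left"
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            d_sq = math.dist(coords_a[i - 1], coords_b[j - 1]) ** 2
            options = [
                (score[i - 1][j - 1] + m / (1.0 + d_sq / d0_sq), "diag"),  # pair i with j
                (score[i - 1][j] - gap, "up"),     # residue i of A left unmatched
                (score[i][j - 1] - gap, "left"),   # residue j of B left unmatched
            ]
            # max() returns the first best option, so ties prefer "diag".
            score[i][j], move[i][j] = max(options, key=lambda t: t[0])
    pairs, i, j = [], n, k
    while i > 0 or j > 0:            # trace back from the bottom-right cell
        if move[i][j] == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            i -= 1
        else:
            j -= 1
    return score[n][k], pairs[::-1]

# Hypothetical C-alpha traces: B has one extra, far-away residue inserted.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (50.0, 50.0, 50.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
print(structal_like_dp(a, b))  # (50.0, [(0, 0), (1, 2), (2, 3)])
```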
DALI • Based on aligning 2-D intra-molecular distance matrices • Computes the best subset of corresponding residues from the two proteins such that the similarity between the 2-D distance matrices is maximized. • Searches the space of possible residue alignments using a Monte Carlo algorithm
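The core idea of comparing 2-D distance matrices can be sketched in a few lines: build each protein's intra-molecular distance matrix, then score a proposed residue correspondence by how well the corresponding distances agree. The scoring form and the constants theta = 0.2 and alpha = 20 Å below are assumptions modelled on the elastic score described for DALI, not a verified reimplementation; the Monte Carlo search over correspondences is omitted, so the sketch only scores one given correspondence. All names and coordinates are hypothetical.

```python
import math

def distance_matrix(coords):
    """Intra-molecular distance matrix (Å) from a list of (x, y, z) C-alpha positions."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def elastic_score(dm_a, dm_b, pairs, theta=0.2, alpha=20.0):
    """Simplified DALI-style elastic similarity of two distance matrices.

    pairs: one-to-one correspondences (i, j), residue i of A <-> residue j of B.
    Each pair of correspondences gains up to theta when the two intra-molecular
    distances agree relative to their mean and loses score when they differ;
    the exponential envelope down-weights long distances.
    """
    total = 0.0
    for i, j in pairs:
        for k, l in pairs:
            if i == k:
                total += theta          # a residue compared with itself
                continue
            d_a, d_b = dm_a[i][k], dm_b[j][l]
            mean = (d_a + d_b) / 2.0
            total += (theta - abs(d_a - d_b) / mean) * math.exp(-(mean / alpha) ** 2)
    return total

# Hypothetical three-residue fragments; in B the last residue is displaced.
coords_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
coords_b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 2.0)]
pairs = [(0, 0), (1, 1), (2, 2)]
same = elastic_score(distance_matrix(coords_a), distance_matrix(coords_a), pairs)
moved = elastic_score(distance_matrix(coords_a), distance_matrix(coords_b), pairs)
print(same > moved)  # True
```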
VAST (Vector Alignment Search Tool) • Aligns only secondary structure elements (SSEs) • Represents each SSE as a vector • Finds all pairs of vectors from the two structures that are similar • Uses a graph-theory algorithm to find the maximal subset of similar vectors • The overall alignment score is based on the number of similar pairs of vectors between the two structures.
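The graph formulation can be sketched as follows: each node pairs an SSE of A with an SSE of B of the same type, and two nodes are joined when the relative geometry of the two SSEs in A (angle between their vectors, distance between their midpoints) matches that in B. A maximum clique is then a largest set of mutually consistent SSE pairings. This is a literal reading of the slide, not the real tool's scoring; the (type, start, end) representation, the tolerances `angle_tol` and `dist_tol`, and all function names are hypothetical, and the clique search is brute force.

```python
import math
from itertools import combinations

def _vector(sse):
    _, start, end = sse
    return tuple(e - s for s, e in zip(start, end))

def _midpoint(sse):
    _, start, end = sse
    return tuple((s + e) / 2.0 for s, e in zip(start, end))

def _angle(u, v):
    cos_t = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def _compatible(p, q, sses_a, sses_b, angle_tol, dist_tol):
    """Two SSE pairings are compatible if they use distinct SSEs and their
    relative geometry (inter-vector angle, midpoint distance) agrees."""
    (i, j), (k, l) = p, q
    if i == k or j == l:
        return False
    angle_a = _angle(_vector(sses_a[i]), _vector(sses_a[k]))
    angle_b = _angle(_vector(sses_b[j]), _vector(sses_b[l]))
    dist_a = math.dist(_midpoint(sses_a[i]), _midpoint(sses_a[k]))
    dist_b = math.dist(_midpoint(sses_b[j]), _midpoint(sses_b[l]))
    return abs(angle_a - angle_b) <= angle_tol and abs(dist_a - dist_b) <= dist_tol

def vast_like_match(sses_a, sses_b, angle_tol=30.0, dist_tol=3.0):
    """Largest set of mutually compatible SSE pairings (a maximum clique)."""
    # Graph nodes: pairings of same-type SSEs, one from each structure.
    nodes = [(i, j) for i, a in enumerate(sses_a)
             for j, b in enumerate(sses_b) if a[0] == b[0]]
    # Brute-force clique search, largest first: fine for a handful of SSEs,
    # but exponential in general.
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(_compatible(p, q, sses_a, sses_b, angle_tol, dist_tol)
                   for p, q in combinations(subset, 2)):
                return list(subset)
    return []

# Hypothetical SSEs as (type, start point, end point); B is A shifted by (1, 2, 3).
sses_a = [("H", (0, 0, 0), (0, 0, 10)), ("H", (10, 0, 0), (10, 0, 10)),
          ("E", (0, 10, 0), (5, 10, 0))]
shift = lambda p: tuple(x + d for x, d in zip(p, (1, 2, 3)))
sses_b = [(t, shift(s), shift(e)) for t, s, e in sses_a]
print(vast_like_match(sses_a, sses_b))  # [(0, 0), (1, 1), (2, 2)]
```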
LOCK • Define local secondary structures • Find an initial superposition by using dynamic programming (DP) to align secondary structure vectors. • Use a greedy algorithm to find nearest neighbors and minimize the RMSD between the Cα atoms of the query and target. • Find the core of aligned Cα atoms and minimize the RMSD between them.
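Minimizing the RMSD between matched Cα atoms, as in LOCK's last two steps, relies on finding the optimal rigid-body superposition. A minimal sketch using the Kabsch algorithm, a standard superposition method swapped in here for illustration rather than LOCK's own code; it assumes the atom correspondence is already known and that NumPy is available. `kabsch_rmsd` and the coordinates are hypothetical.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """Minimal RMSD between matched point sets p and q (N x 3) after the
    optimal rotation and translation (Kabsch algorithm)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)                  # centre both sets on their centroids
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                           # 3 x 3 covariance matrix
    u, s, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # -1 would mean a reflection; exclude it
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # proper rotation taking p onto q
    p_rot = p_c @ r.T
    return float(np.sqrt(((p_rot - q_c) ** 2).sum(axis=1).mean()))

# Hypothetical matched atoms: q is p rotated 90° about z and translated.
p = [[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]]
q = [[-y + 5, x + 5, z + 5] for x, y, z in p]
print(round(kabsch_rmsd(p, q), 6))  # 0.0
```

The determinant check matters because an unrestricted fit could "superpose" a structure onto its mirror image, which no physical rotation can do.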
Where is the data? • GenBank • DBs are equivalent
RefSeq (NCBI Reference Sequences): http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html • GenPept Database: http://inn.weizmann.ac.il/databanks/genpept.html • Swiss-Prot: http://www.expasy.org/sprot/ (statistics: http://www.expasy.org/sprot/relnotes/relstat.html) • PDB: http://www.rcsb.org/pdb/ • PIR International Protein Sequence Database: http://pir.georgetown.edu/pirwww/search/textpsd.shtml • NCBI Entrez Protein: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
A Flow chart for structure prediction [Figure: protein sequence → database similarity search → protein family, domain, cluster analysis; decision points: “Does sequence align with protein of known 3D structure?”, “Relationship to known structure?”, “Is there a predicted structure?”; outcomes: 3D comparative modeling, structural analysis, predicted three-dimensional structure, 3D analysis in laboratory]
Images… • 3-dimensional model showing the electron density in a molecule of buckminsterfullerene, an allotrope of carbon (C60).
Images… Computer generated image, showing 3-D structure of uteroglobin, a protein secreted in the uterus of mammals.
Images… (NMR… EPR…) A computer image of the charge density over the molecule chymosin, an important enzyme in cheese making. Overall negative charge is depicted as red, overall positive charge is shown in blue.
Thanks • Thanks to Selnur Erdal for preparing initial versions of these slides.