EECS 800 Research Seminar: Mining Biological Data • Instructor: Luke Huan • Fall 2006
Introduction • Protein • A sequence built from 20 amino acids • Adopts a stable 3D structure that can be measured experimentally [Figure: a peptide (Lys, Lys, Gly, Gly, Leu, Val, Ala, His) drawn in cartoon, space-filling, ribbon, and surface representations; atom legend: oxygen, nitrogen, carbon, sulfur]
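The "sequence from 20 amino acids" view can be made concrete with a minimal sketch that converts the three-letter residue names shown on this slide into the usual one-letter string. The function name `to_one_letter` is a hypothetical helper introduced only for illustration; the code table itself is the standard one for the 20 amino acids.

```python
# Standard three-letter to one-letter codes for the 20 amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues):
    """Convert a list of three-letter residue names to a one-letter string.

    Hypothetical helper for illustration; raises KeyError on unknown names.
    """
    return "".join(THREE_TO_ONE[name.capitalize()] for name in residues)


# Residue names as they appear on the slide.
peptide = ["Lys", "Lys", "Gly", "Gly", "Leu", "Val", "Ala", "His"]
sequence = to_one_letter(peptide)
print(sequence)  # prints "KKGGLVAH"
```

A plain dictionary lookup is enough here because the alphabet is fixed and small; non-standard residues would need explicit handling.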
Growth of Known Structures in the Protein Data Bank • Exponential growth of protein structures [Figure: number of structures per year, 1988 to 2005, y-axis up to 35,000; two series: the total number of known protein structures, and the newly characterized proteins in that year]
Protein Structure Space http://www.nigms.nih.gov/psi/
Structure Space is Described Hierarchically • From SCOP: Structural Classification of Proteins (http://scop.berkeley.edu/) • Class • Fold • Superfamily • Family • Protein domains
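One way to work with this hierarchy in code is through SCOP's concise classification string (sccs), which has the form `class.fold.superfamily.family`, for example `a.1.1.2`. The sketch below splits such a string into its levels and reports the deepest level two domains share. The names `ScopId`, `parse_sccs`, and `deepest_shared_level` are hypothetical, introduced only for this illustration, and the identifiers in the usage lines are examples rather than data from the slides.

```python
from typing import NamedTuple, Optional


class ScopId(NamedTuple):
    """Cumulative identifiers for the four upper SCOP levels (hypothetical type)."""
    scop_class: str
    fold: str
    superfamily: str
    family: str


def parse_sccs(sccs: str) -> ScopId:
    """Split an sccs string such as 'a.1.1.2' into cumulative level identifiers."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    cls, fold, superfamily, _family = parts
    return ScopId(cls, f"{cls}.{fold}", f"{cls}.{fold}.{superfamily}", sccs)


def deepest_shared_level(a: str, b: str) -> Optional[str]:
    """Return the name of the deepest level two domains share, or None."""
    x, y = parse_sccs(a), parse_sccs(b)
    shared = None
    for level in ScopId._fields:
        if getattr(x, level) != getattr(y, level):
            break
        shared = level
    return shared


print(parse_sccs("a.1.1.2"))
print(deepest_shared_level("a.1.1.2", "a.1.1.3"))  # prints "superfamily"
print(deepest_shared_level("a.1.1.2", "b.1.1.2"))  # prints "None"
```

Storing cumulative identifiers (`a.1` rather than just `1`) keeps a comparison at one level from matching domains that differ at a higher level.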
SCOP Statistics • 25,973 PDB entries (July 2005) • 70,859 domains
Protein Secondary Structure • α Helix
Protein Secondary Structure • β strands
Top Level of Structure Space: Structure Classes • There are four major classes: • α proteins • β proteins • α + β (mainly anti-parallel β strands) • α / β (mainly parallel β strands)
Protein Folds • A protein fold is the way secondary structure elements are arranged in the 3D structure.
Popular Folds • The eight most frequent SCOP folds
Superfamily and Family • Proteins within the same superfamily, and especially the same family, tend to have similar sequences and similar functions
The Nature of Protein Structure Data • The ball-and-stick model is an element-based structure representation • A structure is decomposed into a set of amino acids • Protein geometry, topology, and attributes are defined with respect to the amino acid set • Geometry is the coordinates of the amino acids • Topology is the physico-chemical interactions of the residues • Attributes are the physico-chemical properties of the residues
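The three components above (geometry, topology, attributes) map naturally onto a small data structure. The following is a minimal sketch under stated assumptions, not the representation used in the course: each residue is reduced to a single 3D point, topology is approximated by a distance cutoff (8.0 Å here, an arbitrary illustrative value) rather than by real physico-chemical interactions, and the coordinates and attribute values are made up. `Residue` and `contact_edges` are hypothetical names.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Residue:
    """One amino acid in the element-based representation (hypothetical type)."""
    index: int                      # position in the sequence
    name: str                       # three-letter residue name
    coord: tuple                    # geometry: one (x, y, z) point, in angstroms
    attributes: dict = field(default_factory=dict)  # physico-chemical properties


def contact_edges(residues, cutoff=8.0):
    """Topology as a list of residue-index pairs closer than `cutoff`.

    Assumption: spatial proximity stands in for an interaction; a real
    analysis would use a chemically motivated interaction definition.
    """
    edges = []
    for i, a in enumerate(residues):
        for b in residues[i + 1:]:
            if dist(a.coord, b.coord) <= cutoff:
                edges.append((a.index, b.index))
    return edges


# Made-up coordinates and attributes, for illustration only.
residues = [
    Residue(1, "Lys", (0.0, 0.0, 0.0), {"charge": 1}),
    Residue(2, "Gly", (3.8, 0.0, 0.0), {"charge": 0}),
    Residue(3, "Leu", (20.0, 0.0, 0.0), {"charge": 0}),
]
edges = contact_edges(residues)
print(edges)  # prints "[(1, 2)]"
```

Keeping geometry on the nodes and topology as an edge list makes the structure usable directly as a labeled graph, which is the form most graph-mining methods expect.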
Grand Challenges: Proteomics • Part of the biological system in a cell at the molecular level • Source: http://www.ircs.upenn.edu/modeling2001/
References • Christine Orengo, David Jones, and Janet Thornton (eds.), Bioinformatics: Genes, Proteins, and Computers, BIOS Scientific Publishers, 2003. ISBN 1-85996-054-5