Biology 4900 Biocomputing
Chapter 10 Protein Analysis and Proteomics
Composition of living organisms • 5 major components: • Proteins • Nucleic acids • Lipids (fats) • Water • Carbohydrates Pevsner, Bioinformatics and Functional Genomics, 2009
Roles of DNA and Proteins • If we think of constructing an organism like building a house, DNA would be the blueprint and proteins would be most of the construction materials • Protein functions include: • Structural roles (e.g., actin in the cytoskeleton) • Enzyme catalysts (e.g., trypsin, a serine protease) • Intra- and intercellular transporters • Molecular signaling • Cellular regulation (e.g., Nrf2) Pevsner, Bioinformatics and Functional Genomics, 2009
Amino Acids • Organic compounds with amino and carboxylate functional groups • Each amino acid (AA) has a unique side chain (R) attached to the alpha (α) carbon • Crystalline solids with high melting points • Highly soluble in water • Exist as dipolar, charged zwitterions (ionic form) • Exist as either L- or D- enantiomers • Almost without exception, biological organisms use only the L enantiomer Seager SL, Slabaugh MR, Chemistry for Today: General, Organic and Biochemistry, 7th Edition, 2011; Berg JM, Tymoczko JL, Stryer L, Biochemistry, 5th Edition, 2002
Formation of Peptides/Proteins • Proteins and polypeptides are biochemical compounds consisting of amino acids • Chains of amino acids bonded together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues • Proteins • Longer and more complex than polypeptides • Typically folded into a globular or fibrous form • Structure facilitates a biological function Peptide linkages Amino acid Protein Polypeptide
Proteins have different levels of structure • Primary (1°): Sequence of amino acids • Determines 3D structure • Secondary (2°): H-bonding interactions between AA residues begin to produce regular, identifiable structures • Alpha (α) helices • Beta (β) strands • Random coil • Tertiary (3°): Overall structure of single protein in 3 dimensions • Quaternary (4°): Assemblies of multiple polypeptides and/or proteins http://protein-pdb.com/2011/10/04/primary-protein-structure/
Protein Secondary Structure Seager SL, Slabaugh MR, Chemistry for Today: General, Organic and Biochemistry, 7th Edition, 2011
Proteins 2° Structure: The α-helix • Backbone N-H groups form H-bonds with C=O group four residues away in sequence • AA’s in an α helix arranged in a right-handed helix • Each amino acid residue is rotated 100° relative to previous residue in helix • Helix has 3.6 residues per turn http://simplygeology.wordpress.com/tag/s-waves/
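The helix numbers on this slide follow from simple arithmetic. A minimal Python sketch, using the 100° rotation per residue given above; the 1.5 Å rise per residue is a commonly quoted textbook value that is not stated on the slide and is treated here as an assumption:

```python
# Alpha-helix geometry derived from per-residue values.
ROTATION_PER_RESIDUE_DEG = 100.0   # from the slide
RISE_PER_RESIDUE_ANGSTROM = 1.5    # assumption: textbook value, not from the slide


def residues_per_turn(rotation_deg=ROTATION_PER_RESIDUE_DEG):
    """One full turn is 360 degrees, so divide by the rotation per residue."""
    return 360.0 / rotation_deg


def helix_pitch(rotation_deg=ROTATION_PER_RESIDUE_DEG,
                rise=RISE_PER_RESIDUE_ANGSTROM):
    """Distance travelled along the helix axis in one full turn (angstroms)."""
    return residues_per_turn(rotation_deg) * rise


def hbond_partner(i, offset=4):
    """Index of the residue four positions along the chain from residue i,
    i.e. the backbone H-bond partner described on the slide."""
    return i + offset


if __name__ == "__main__":
    print("residues per turn:", residues_per_turn())
    print("pitch (angstroms):", round(helix_pitch(), 2))
    print("residue 1 pairs with residue", hbond_partner(1))
```

Dividing 360° by 100° per residue gives the 3.6 residues per turn quoted on the slide.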
Proteins 2° Structure: The β-sheet • Beta (β) sheets formed by H-bond connected strands • β strands are elongated helices without helical H-bonds • β Sheets may be parallel or antiparallel http://www.chembio.uoguelph.ca/educmat/phy456/456lec01.htm
Proteins 2° Structure: Random Coils and Loops • Proteins typically contain regions lacking either sheet or helical structures. These regions may be classified as: • Random coils • Loops • Loops may perform important structural and functional roles, including: • Connecting β strands to form antiparallel sheets • Increasing flexibility (hinge motion) • Binding metal ions or other biomolecules to alter protein function http://www.chembio.uoguelph.ca/educmat/phy456/456lec01.htm
Proteins 3° Structure • Protein function determined by 3D shape • Tertiary structure results from residue interactions: • H-bonding • Disulfide Bridges • Salt Bridges • Hydrophobic Interactions Seager SL, Slabaugh MR, Chemistry for Today: General, Organic and Biochemistry, 7th Edition, 2011
Proteins 3° Structure • Polar and charged residues tend to be on surface of protein, exposed to water, while hydrophobic residues tend to be buried Seager SL, Slabaugh MR, Chemistry for Today: General, Organic and Biochemistry, 7th Edition, 2011
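One rough way to see this tendency in a sequence is to score each residue on a hydropathy scale and average over a sliding window. The sketch below uses the Kyte-Doolittle scale, which is not mentioned on the slides and is brought in here only as an illustrative assumption; positive averages suggest hydrophobic (more likely buried) stretches, negative averages suggest polar or charged (more likely exposed) stretches. The example sequence is made up.

```python
# Kyte-Doolittle hydropathy values (assumption: scale chosen for illustration,
# not taken from the slides). Positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean score over the whole sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)


def windowed_hydropathy(sequence, window=5):
    """Mean hydropathy for every window of the given width along the sequence."""
    seq = sequence.upper()
    return [gravy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    toy = "IIIIKKKK"  # hypothetical sequence: hydrophobic half, charged half
    for start, score in enumerate(windowed_hydropathy(toy, window=4), start=1):
        label = "hydrophobic" if score > 0 else "hydrophilic"
        print(f"window starting at residue {start}: {score:+.2f} ({label})")
```

A sequence-only score like this is a hint, not a structure prediction; actual burial depends on the folded 3D structure.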
Proteins 4° Structure • Functional proteins may contain two or more polypeptide chains held together by the same forces that control 3° structure: • H-bonding • Disulfide Bridges • Salt Bridges • Hydrophobic Interactions • Each chain is a subunit of structure • Each subunit has its own 1°, 2° and 3° structure Seager SL, Slabaugh MR, Chemistry for Today: General, Organic and Biochemistry, 7th Edition, 2011
Proteins are Large Macromolecules • Proteins are extremely large • MW of glucose is 180 u, compared with 65,000 u for hemoglobin • Proteins synthesized inside cells remain inside cells • The presence of intracellular proteins in blood or urine can be used to test for certain diseases Seager SL, Slabaugh MR, Chemistry for Today: General, Organic and Biochemistry, 7th Edition, 2011
Protein Functions • Catalytic function: • Enzymes are proteins that catalyze biological reactions • Structural function: • Most human structural materials (excluding bone mineral) are composed of proteins • Collagen (bundled helices) • 25-35% of total protein in body • Found in tendons, ligaments, skin, cornea, cartilage, bone, blood vessels, and gut • Keratin (bundled helices) • Chief constituent of hair, skin, fingernails http://www.imb-jena.de/~rake/Bioinformatics_WEB/proteins_classification.html
Protein Functions • Storage Function: • Storage of small molecules or ions • Ovalbumin • Main protein in egg whites • Can be broken down into amino acids for use by developing embryos • Ferritin • Globular complex of 24 protein subunits • Buffers iron concentration in cells Ovalbumin (chicken egg white) ferritin http://www.stagleys.demon.co.uk/explorers/genesandproteins/page6.html; http://ferritin.blogspot.com/
Protein Functions • Protective function: • Protection against external foreign substances • Antibodies (immunoglobulins) • Very large proteins • Combine with, and help destroy, viruses and bacteria • Blood clotting (coagulation) • Thrombin • Protease responsible for platelet aggregation and formation of fibrin Harris, L. J., Larson, S. B., Hasel, K. W., Day, J., Greenwood, A., McPherson, A. Nature 1992, 360, 369-372; http://courses.washington.edu/conj/immune/antibody.htm; http://www.colorado.edu/intphys/Class/IPHY3430-200/014blood.htm
Protein Functions • Regulatory Function: • Protein hormones • Insulin • Protein hormone that directs cells in the liver, muscle, and fat to take up glucose from the blood and store it as glycogen • Forms hexamer bound together by Zn Insulin http://en.wikipedia.org/wiki/File:InsulinHexamer.jpg; Seager SL, Slabaugh MR, Chemistry for Today: General, Organic and Biochemistry, 7th Edition, 2011
Protein Functions • Nerve impulse transmission: • Rhodopsin • Protein found in the rod cells of the retina • Converts light events into nerve impulses sent to the brain http://cherfan2010biology12assessment.wikispaces.com/The+Retina
Protein Functions • Movement function: • Proteins involved in muscle contraction • Myosin • Actin http://www.sigmaaldrich.com/life-science/metabolomics/enzyme-explorer/learning-center/structural-proteins/actin.html
Protein Functions • Transport function: • Transport ions or molecules throughout the body • Serum albumin: Transports fatty acids between fat and other tissues • Hemoglobin: Transports O2 from lungs to other tissues (e.g., muscles) • Transferrin: Transports iron in blood plasma Serum albumin hemoglobin transferrin http://en.wikipedia.org/ ; http://www.pdb.org/pdb/101/motm.do?momID=37
Protein Databases • NCBI RefSeq • UniProt (Swiss-Prot and TrEMBL, merged with PIR) (http://www.ebi.ac.uk/uniprot/) • Ensembl (http://useast.ensembl.org/index.html) • Protein Data Bank (PDB) Some of these databases have been consolidated over the years. Efforts are being made (e.g., by HUPO) to develop community standards for reporting protein data.
The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) http://www.psidev.info/ • HUPO organized into working groups that focus on different aspects of protein research • Gel Electrophoresis • Mass Spectrometry • Molecular Interactions • Protein Modifications • Proteomics Informatics • Sample Processing • Goals: Defining standards for proteomic data representation to facilitate the comparison, exchange, and verification of data • Controlled vocabularies • MIAPE: Minimum information about a proteomics experiment
Techniques to Identify Proteins Direct Protein Sequencing – Edman degradation • Residues are removed and identified one at a time from the N-terminus • Useful for identifying short sequences (<50 residues) from sample amounts of 1-10 picomoles http://en.wikibooks.org/wiki/Structural_Biochemistry/Proteins/Protein_sequence_determination_techniques; http://en.wikipedia.org/wiki/Edman_degradation
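Edman degradation works by removing and identifying one N-terminal residue per cycle, which is why read length is limited. A toy Python simulation of that cycle structure follows; the function names and the 50-cycle cap are illustrative assumptions, and the chemistry itself is not modeled:

```python
def edman_degradation(peptide, max_cycles=50):
    """Yield (cycle, residue) pairs, removing one N-terminal residue per cycle.

    max_cycles is an assumed practical cap on read length, not a fixed
    property of the method.
    """
    remaining = peptide
    for cycle in range(1, max_cycles + 1):
        if not remaining:
            break
        yield cycle, remaining[0]   # residue identified in this cycle
        remaining = remaining[1:]   # shortened peptide enters the next cycle


def read_sequence(peptide, max_cycles=50):
    """Sequence recovered after running up to max_cycles cycles."""
    return "".join(res for _, res in edman_degradation(peptide, max_cycles))


if __name__ == "__main__":
    for cycle, residue in edman_degradation("MKTAYIAK"):  # hypothetical peptide
        print(f"cycle {cycle}: {residue}")
```

A peptide longer than the cycle cap is only partially read, which mirrors why longer proteins are first cut into shorter peptides before sequencing.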
Techniques to Identify Proteins Mass Spectrometry • Proteins are digested into peptide fragments by enzymes • Fragments are passed through an LC column, then sprayed into the MS through a narrow, positively charged nozzle that ionizes them • Mass-to-charge ratios of the resulting ions are measured and used to determine the amino acid sequence • Unlike Edman degradation, MS does not have an absolute upper size limit for proteins, but larger proteins are computationally more difficult to sequence. http://www.magnet.fsu.edu/education/tutorials/tools/ionization_esi.html
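The digestion and mass-to-charge steps can be sketched in Python. This is a simplified illustration under stated assumptions: the enzyme is taken to be trypsin (cleaving after K or R unless the next residue is P), masses are standard monoisotopic residue masses, ions are protonated with charge z, and modifications and missed cleavages are ignored. The input sequence is made up.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # mass of the charge carrier


def tryptic_digest(seq):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def mz(peptide, z=1):
    """Mass-to-charge ratio of the peptide carrying z protons."""
    return (peptide_mass(peptide) + z * PROTON) / z


if __name__ == "__main__":
    for pep in tryptic_digest("MAKRPGKLLER"):  # hypothetical sequence
        print(f"{pep:8s} M={peptide_mass(pep):9.4f}  m/z(2+)={mz(pep, 2):9.4f}")
```

Real identification software compares measured m/z values against predicted ones like these for every protein in a database, which is part of why larger searches are computationally heavier.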
Outline: Protein analysis and proteomics • Perspectives on individual proteins: • Perspective 1: Protein families (domains and motifs) • Perspective 2: Physical properties (3D structure) • Perspective 3: Localization • Perspective 4: Function
Perspective 1: Protein domains and motifs Page 389
Definitions Signature: a protein category such as a domain or motif Domain: a region of a protein that can adopt a 3D structure (a fold) Examples: • zinc finger domain • immunoglobulin domain Family: a group of proteins that share a domain Motif (or fingerprint): A short, conserved region of a protein; typically 10 to 20 contiguous amino acid residues Pevsner, Bioinformatics and Functional Genomics, 2009
15 most common domains (human), with the number of proteins containing each:
• Zn finger, C2H2 type: 1093
• Immunoglobulin: 1032
• EGF-like: 471
• Zn-finger, RING: 458
• Homeobox: 417
• Pleckstrin-like: 405
• RNA-binding region RNP-1: 400
• SH3: 394
• Calcium-binding EF-hand: 392
• Fibronectin, type III: 300
• PDZ/DHR/GLGF: 280
• Small GTP-binding protein: 261
• BTB/POZ: 236
• bHLH: 226
• Cadherin: 226
Page 391 Source: Integr8 at EBI website
EBI Integr8 site • Go to the Integr8 site: http://www.ebi.ac.uk/proteome/ • Browse species; choose Homo sapiens. • Click “Proteome analysis” • Click on “Genomics Statistics” to obtain a variety of statistics, such as common repeats, domains, and average protein length
Integr8: AA Composition Source: Integr8 at EBI website (updated 7/09)
Analysis of full-length proteins (fragments excluded) • Average protein length: 412 +/- 548 amino acid residues • Size range: 4 - 34,942 amino acid residues Source: Integr8 at EBI website (updated 7/09)
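Summary statistics of this kind (amino acid composition, mean length with standard deviation, size range) are straightforward to compute from a set of sequences. A minimal Python sketch on a made-up three-protein "proteome"; the function names are hypothetical and this is not how Integr8 itself computes its figures:

```python
from collections import Counter
from statistics import mean, pstdev


def length_stats(sequences):
    """Return (mean, standard deviation, min, max) of protein lengths."""
    lengths = [len(s) for s in sequences]
    return mean(lengths), pstdev(lengths), min(lengths), max(lengths)


def aa_composition(sequences):
    """Percentage of each amino acid over all residues in the set."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


if __name__ == "__main__":
    toy_proteome = ["MKT", "MAAAK", "M"]  # hypothetical sequences
    avg, sd, shortest, longest = length_stats(toy_proteome)
    print(f"length: {avg:.1f} +/- {sd:.1f} (range {shortest} - {longest})")
    for aa, pct in aa_composition(toy_proteome).items():
        print(f"{aa}: {pct:.1f}%")
```

A standard deviation larger than the mean, as in the human figures above, indicates a strongly right-skewed distribution: most proteins are a few hundred residues long, with a small number of very long outliers.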
Definitions of a domain According to InterPro at EBI (http://www.ebi.ac.uk/interpro/): A domain is an independent structural unit, found alone or in conjunction with other domains or repeats. Domains are evolutionarily related. According to SMART (http://smart.embl-heidelberg.de): A domain is a conserved structural entity with distinctive secondary structure content and a hydrophobic core. Homologous domains with common functions usually show sequence similarities. Page 390
Varieties of protein domains • Extending along the length of a protein • Occupying a subset of a protein sequence • Occurring one or more times Pevsner, Bioinformatics and Functional Genomics, 2009
Example of a protein with domains: Methyl CpG binding protein 2 (MeCP2) MBD TRD The protein includes a methylated DNA binding domain (MBD) and a transcriptional repression domain (TRD). MeCP2 is a transcriptional repressor. Mutations in the gene encoding MeCP2 cause Rett Syndrome, a neurological disorder affecting girls primarily. Pevsner, Bioinformatics and Functional Genomics, 2009
Blastp search for MeCP2 (human) • The matching domains comprise a family and are homologous, even if the rest of each protein is quite different
Example of a multidomain protein: HIV-1 pol • Multi-domain proteins such as HIV-1 gag-pol are common • Pol (NP_789740), 995 amino acids long • Gag-Pol (NP_057849), 1435 amino acids • Cleaved into three proteins with distinct activities: • aspartyl protease • reverse transcriptase • integrase • We will explore HIV-1 pol through UniProt. Pevsner, Bioinformatics and Functional Genomics, 2009
www.uniprot.org • Three protein databases merged to form UniProt: • Swiss-Prot • TrEMBL (translated European Molecular Biology Lab) • Protein Information Resource (PIR) • You can search for information on your favorite protein there; a BLAST server is provided. Pevsner, Bioinformatics and Functional Genomics, 2009
ExPASy UniProt/Swiss-Prot • Go to ExPASy (http://www.expasy.ch/) • Enter search name or Swiss-Prot accession number. • Ex. Search for HIV-1 gag-pol
EMBL-EBI UniProt (TrEMBL, PIR, Swiss-Prot) • Go to EMBL-EBI • Enter search name or accession number. • Ex. Search for HIV-1 gag-pol • The search returns extensive results; select the UniProtKB entry
Results of Search, UniProtKB • Sequence • Secondary Structure • Link to PDB 3D Structure • Links to databases (Pfam, PROSITE)
Pfam Features Integrase Zinc binding domain Integrase core domain
Pfam Features: Domains
Pfam Features: Domains (calmodulin, O16305) • In-class exercise for students: • Search for EF hand (PF00036) • Select link to InterPro • Figure labels: calmodulin EF-hand-like domain; EF hand 1 (binding site) • Motifs are typically subsets of domains
Definition of a motif • Motif (or fingerprint): A short, conserved region of a protein (10 to 20 amino acids). • Simple motifs include (but are not limited to): • transmembrane domains • phosphorylation sites • calcium-binding sites • These do not imply homology when found in a group of proteins. • PROSITE (www.expasy.org/prosite) is a dictionary of motifs. • In PROSITE, a pattern is a qualitative motif description (a protein either matches a pattern, or not). • In contrast, a profile is a quantitative motif description. We will encounter profiles in Pfam, ProDom, SMART, and other databases. Pevsner, Bioinformatics and Functional Genomics, 2009
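Because a PROSITE pattern is qualitative (match or no match), it maps naturally onto a regular expression. The Python sketch below converts a common subset of PROSITE pattern syntax to a regex: `x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, and `<` / `>` for terminus anchors. The converter name is hypothetical and it does not cover every corner of the syntax. The N-glycosylation site pattern `N-{P}-[ST]-{P}` serves as the example; the test sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Assumption: handles only the common subset of the syntax described
    in the accompanying text.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate the residue specification from an optional (n) or (n,m).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"  # any residue except these
        else:
            token = core                     # literal residue or [allowed set]
        if low:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + token + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}")
    print(regex)
    print(re.findall(regex, "AANGSAANPTA"))  # hypothetical sequence
```

A profile, by contrast, assigns position-specific scores and returns a number rather than a yes/no answer, so it cannot be expressed as a plain regex like this.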
Perspective 2: Physical properties of proteins
Physical properties of proteins Many websites are available for the analysis of individual proteins. ExPASy is an excellent resource. The accuracy of these programs varies. Predictions based on primary amino acid sequence (such as molecular weight prediction) are likely to be more trustworthy. For many other properties (such as posttranslational modification of proteins by specific sugars), experimental evidence may be required rather than prediction algorithms. Pevsner, Bioinformatics and Functional Genomics, 2009
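Molecular weight is a good example of a property that follows directly from the primary sequence: sum the residue masses and add one water for the free termini. A minimal Python sketch using standard average residue masses; it assumes an unmodified, single-chain polypeptide and is not the implementation used by any particular ExPASy tool:

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is retained across the N- and C-termini


def molecular_weight(sequence):
    """Average molecular weight (Da) of an unmodified polypeptide chain."""
    if not sequence:
        return 0.0
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


if __name__ == "__main__":
    for seq in ("GG", "MKTAYIAK"):  # hypothetical sequences
        print(f"{seq}: {molecular_weight(seq):.2f} Da")
```

The estimate ignores posttranslational modifications, which is one reason properties such as glycosylation need experimental evidence rather than sequence arithmetic.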