The Protein Data Bank
The .pdb file format, and other resources for structural information
Topic 5
Chapters 10 & 11, Gu and Bourne, “Structural Bioinformatics”
PDB: Protein Data Bank http://www.rcsb.org/pdb
Experimental Methods: X-Ray and NMR
• X-ray crystallography
-- Needs crystals
-- No size limit (in principle)
-- Based on the scattering of X-rays by the electron clouds of the atoms
-- Quality metrics = resolution and R-factor
• Nuclear Magnetic Resonance spectroscopy (NMR)
-- Solution-based
-- Typical size limitation (< 50 kDa)
-- Produces multiple models
-- Not just for determining structure (also dynamics)
-- “Resolution”: root-mean-square deviation (RMSD)
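The RMSD mentioned for NMR ensembles can be sketched in a few lines. This is a minimal illustration, assuming the two models are already superimposed and list the same atoms in the same order; the function name `rmsd` and the toy coordinates are made up for the example.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points.

    Assumption: the models are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy coordinates (hypothetical): every atom is shifted by 1 Å along z.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model_1, model_2))  # 1.0
```

In practice the models would first be optimally superimposed, which this sketch deliberately leaves out.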
Important experimental quantities from X-ray
Quality: Resolution (in Å) and R-factor (values from 0 to 1).
Atom coordinates: Define the mean coordinates of the (heavy) atoms.
B-factors (aka temperature factors): Describe the apparent disorder about the mean. Disorder is spatial (crystal heterogeneity) and temporal (protein flexibility). In reality, however, B-factors in protein crystallography are NOT pure Debye-Waller factors (mobilities). Instead, they are most often best characterized as “fudge factors” used to fit the electron density maps.
Occupancies: Occasionally, a better fit to the electron density can be obtained by assuming that certain atoms can be in more than one location, due to alternate conformations.
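For orientation, the idealized case can be written down explicitly. If a B-factor were a pure isotropic Debye-Waller factor (which the slide stresses it usually is not), it would relate to the mean-square atomic displacement as below; the symbol u for the displacement is introduced here for illustration only.

```latex
B = 8\pi^2 \langle u^2 \rangle
\quad\Longrightarrow\quad
\sqrt{\langle u^2 \rangle} = \sqrt{\frac{B}{8\pi^2}}
% Example under this idealization: B = 20 \AA^2 gives
% \sqrt{20 / 78.96} \approx 0.50 \AA
```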
The .pdb file
HEADER OXIDOREDUCTASE 21-JUL-93 1SPD
TITLE AMYOTROPHIC LATERAL SCLEROSIS AND STRUCTURAL DEFECTS IN CU,ZN
TITLE 2 SUPEROXIDE DISMUTASE
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: SUPEROXIDE DISMUTASE;
COMPND 3 CHAIN: A, B;
COMPND 4 EC: 1.15.1.1;
COMPND 5 ENGINEERED: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE 3 ORGANISM_COMMON: HUMAN;
SOURCE 4 ORGANISM_TAXID: 9606
KEYWDS OXIDOREDUCTASE, SUPEROXIDE ACCEPTOR
EXPDTA X-RAY DIFFRACTION
AUTHOR H.E.PARGE,J.A.TAINER
REVDAT 4 29-FEB-12 1SPD 1 JRNL VERSN
REVDAT 3 24-FEB-09 1SPD 1 VERSN
REVDAT 2 01-APR-03 1SPD 1 JRNL
REVDAT 1 30-APR-94 1SPD 0
JRNL AUTH H.X.DENG,A.HENTATI,J.A.TAINER,Z.IQBAL,A.CAYABYAB,W.Y.HUNG,
JRNL AUTH 2 E.D.GETZOFF,P.HU,B.HERZFELDT,R.P.ROOS,C.WARNER,G.DENG,
JRNL AUTH 3 E.SORIANO,C.SMYTH,H.E.PARGE,A.AHMED,A.D.ROSES,R.A.HALLEWELL,
JRNL AUTH 4 M.A.PERICAK-VANCE,T.SIDDIQUE
JRNL TITL AMYOTROPHIC LATERAL SCLEROSIS AND STRUCTURAL DEFECTS IN
JRNL TITL 2 CU,ZN SUPEROXIDE DISMUTASE.
JRNL REF SCIENCE V. 261 1047 1993
JRNL REFN ISSN 0036-8075
JRNL PMID 8351519
REMARK 2
REMARK 2 RESOLUTION. 2.40 ANGSTROMS.
REMARK 3
REMARK 3 REFINEMENT.
REMARK 3 PROGRAM : PROLSQ, X-PLOR
REMARK 3 AUTHORS : KONNERT,HENDRICKSON,BRUNGER
REMARK 3
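The resolution shown in the REMARK 2 record of this header can be pulled out programmatically. The following is a rough sketch, assuming the record keeps the `REMARK 2 RESOLUTION. <value> ANGSTROMS.` wording seen above; the function name `parse_resolution` is hypothetical, and splitting on whitespace is a simplification rather than strict column parsing.

```python
def parse_resolution(pdb_lines):
    """Return the resolution in Å from a REMARK 2 record, or None if absent."""
    for line in pdb_lines:
        fields = line.split()
        if fields[:3] == ["REMARK", "2", "RESOLUTION."] and len(fields) >= 5:
            try:
                return float(fields[3])
            except ValueError:
                # Non-numeric text (e.g. entries without a resolution value)
                return None
    return None

header = [
    "HEADER    OXIDOREDUCTASE                          21-JUL-93   1SPD",
    "REMARK   2",
    "REMARK   2 RESOLUTION.    2.40 ANGSTROMS.",
]
print(parse_resolution(header))  # 2.4
```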
Resolution From Wikipedia: Resolution in terms of electron density is a measure of the resolvability in the electron density map of a molecule. In X-ray crystallography, resolution is the highest resolvable peak in the diffraction pattern.
Resolution Histogram
R-Factor
The R-factor (aka residual factor or agreement factor) is a measure of the disagreement between the observed structure-factor amplitudes and those computed from the model. Note that the structure factor F is related to the intensities from the diffraction pattern (the intensity is proportional to |F|^2).
$$R = \frac{\sum \left|\, |F_{obs}| - |F_{calc}| \,\right|}{\sum |F_{obs}|}$$
where the sums run over the measured reflections.
A similar quality criterion is Rfree, which is calculated from a subset (~10%) of reflections that were not included in the structure refinement.
R values:
0.6: Very bad
0.5: Bad
0.4: Recoverable
0.2: Good for protein
0.05: Good for small organic models
0.0: Perfect
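As a small worked example of the R-factor definition, the sketch below sums the amplitude differences over all reflections and divides by the summed observed amplitudes. The function name `r_factor` and the four amplitude values are invented for illustration and do not come from any real data set.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), summed over reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

# Hypothetical structure-factor amplitudes for four reflections.
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 60.0, 20.0, 30.0]
print(r_factor(f_obs, f_calc))  # 0.15
```

Rfree would use the same formula, restricted to the reflections held out of refinement.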
Common rules of thumb
A common acceptability threshold combines resolution and R-factor: a resolution of 2.0 Å or better (that is, a numerically lower value) and an R-factor of 0.20 or lower is a commonly used cutoff in structural bioinformatics analyses.
It is important to remember, though, that there is no such thing as a single structure. Proteins are best described by ensembles.
In the past, NMR structures were considered to be of lower quality than X-ray structures. However, they are increasingly accepted, especially since the environmental conditions (solution vs. crystal) have been argued to be more biologically relevant. Unfortunately, there is no magic number that can be used to assess NMR structure quality, or lack thereof.
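The resolution and R-factor rule of thumb translates directly into a filter. This is only a sketch: the entry names and their values are placeholders invented for the example, and the cutoffs are the 2.0 Å and 0.20 figures quoted above, exposed as parameters because they are a convention rather than a fixed standard.

```python
def passes_quality_threshold(resolution, r_factor,
                             max_resolution=2.0, max_r_factor=0.20):
    """True if both values are at or below the chosen cutoffs (lower is better)."""
    return resolution <= max_resolution and r_factor <= max_r_factor

# Hypothetical entries: name -> (resolution in Å, R-factor).
entries = {
    "entry_a": (1.8, 0.19),
    "entry_b": (2.4, 0.18),  # fails on resolution
    "entry_c": (1.5, 0.27),  # fails on R-factor
}
accepted = sorted(name for name, (res, r) in entries.items()
                  if passes_quality_threshold(res, r))
print(accepted)  # ['entry_a']
```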
REMARK 3
REMARK 3 presents information on the refinement program(s) used and related statistics. For non-diffraction studies, REMARK 3 is used to describe any refinement done, but its format is mostly free text.
REMARK 290: Symmetry operations
REMARK 290
REMARK 290 CRYSTALLOGRAPHIC SYMMETRY
REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: P 63
REMARK 290
REMARK 290 SYMOP SYMMETRY
REMARK 290 NNNMMM OPERATOR
REMARK 290 1555 X,Y,Z
REMARK 290 2555 -Y,X-Y,Z
REMARK 290 3555 -X+Y,-X,Z
REMARK 290 4555 -X,-Y,Z+1/2
REMARK 290 5555 Y,-X+Y,Z+1/2
REMARK 290 6555 X-Y,X,Z+1/2
REMARK 290
REMARK 290 WHERE NNN -> OPERATOR NUMBER
REMARK 290 MMM -> TRANSLATION VECTOR
REMARK 290
REMARK 290 CRYSTALLOGRAPHIC SYMMETRY TRANSFORMATIONS
REMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM
REMARK 290 RECORDS IN THIS ENTRY TO PRODUCE CRYSTALLOGRAPHICALLY
REMARK 290 RELATED MOLECULES.
REMARK 290 SMTRY1 1 1.000000 0.000000 0.000000 0.00000
REMARK 290 SMTRY2 1 0.000000 1.000000 0.000000 0.00000
REMARK 290 SMTRY3 1 0.000000 0.000000 1.000000 0.00000
REMARK 290 SMTRY1 2 -0.500000 -0.866025 0.000000 0.00000
REMARK 290 SMTRY2 2 0.866025 -0.500000 0.000000 0.00000
REMARK 290 SMTRY3 2 0.000000 0.000000 1.000000 0.00000
REMARK 290 SMTRY1 3 -0.500000 0.866025 0.000000 0.00000
REMARK 290 SMTRY2 3 -0.866025 -0.500000 0.000000 0.00000
REMARK 290 SMTRY3 3 0.000000 0.000000 1.000000 0.00000
REMARK 290 SMTRY1 4 -1.000000 0.000000 0.000000 0.00000
REMARK 290 SMTRY2 4 0.000000 -1.000000 0.000000 0.00000
REMARK 290 SMTRY3 4 0.000000 0.000000 1.000000 35.77500
REMARK 290 SMTRY1 5 0.500000 0.866025 0.000000 0.00000
REMARK 290 SMTRY2 5 -0.866025 0.500000 0.000000 0.00000
REMARK 290 SMTRY3 5 0.000000 0.000000 1.000000 35.77500
REMARK 290 SMTRY1 6 0.500000 -0.866025 0.000000 0.00000
REMARK 290 SMTRY2 6 0.866025 0.500000 0.000000 0.00000
REMARK 290 SMTRY3 6 0.000000 0.000000 1.000000 35.77500
REMARK 290
REMARK 290 REMARK: NULL
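Each SMTRY1/SMTRY2/SMTRY3 triple in REMARK 290 is one row-wise 3x3 rotation matrix plus a translation column, applied to the ATOM/HETATM coordinates. A minimal sketch of applying operator 4 from the listing follows; the function name `apply_symmetry` and the test point are illustrative assumptions.

```python
def apply_symmetry(rows, point):
    """Apply one symmetry operator to an (x, y, z) point.

    rows: three (r1, r2, r3, t) tuples, i.e. the SMTRY1..3 lines of one operator.
    """
    x, y, z = point
    return tuple(r1 * x + r2 * y + r3 * z + t for r1, r2, r3, t in rows)

# Operator 4 (-X,-Y,Z+1/2), copied from the SMTRY records above.
operator_4 = [
    (-1.0, 0.0, 0.0, 0.0),
    (0.0, -1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 35.775),
]
# Hypothetical atom position in Å.
print(apply_symmetry(operator_4, (1.0, 2.0, 3.0)))  # approximately (-1.0, -2.0, 38.775)
```

The 35.775 Å translation corresponds to the Z+1/2 term, which suggests a c-axis length of about 71.55 Å for this entry.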
Remarks contain all sorts of useful info
REMARK 500 GEOMETRY AND STEREOCHEMISTRY
REMARK 500 SUBTOPIC: COVALENT BOND ANGLES
REMARK 500
REMARK 500 THE STEREOCHEMICAL PARAMETERS OF THE FOLLOWING RESIDUES
REMARK 500 HAVE VALUES WHICH DEVIATE FROM EXPECTED VALUES BY MORE
REMARK 500 THAN 6*RMSD (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 500 IDENTIFIER; SSEQ=SEQUENCE NUMBER; I=INSERTION CODE).
REMARK 500
REMARK 500 STANDARD TABLE:
REMARK 500 FORMAT: (10X,I3,1X,A3,1X,A1,I4,A1,3(1X,A4,2X),12X,F5.1)
REMARK 500
REMARK 500 EXPECTED VALUES PROTEIN: ENGH AND HUBER, 1999
REMARK 500 EXPECTED VALUES NUCLEIC ACID: CLOWNEY ET AL 1996
REMARK 500
REMARK 500 M RES CSSEQI ATM1 ATM2 ATM3
REMARK 500 ALA A 1 CB - CA - C ANGL. DEV. = -9.3 DEGREES
REMARK 500 ALA A 1 CA - C - O ANGL. DEV. = 17.7 DEGREES
REMARK 500 THR A 2 N - CA - CB ANGL. DEV. = -15.8 DEGREES
REMARK 500 THR A 2 OG1 - CB - CG2 ANGL. DEV. = 15.0 DEGREES
REMARK 500 THR A 2 CA - CB - OG1 ANGL. DEV. = -13.1 DEGREES
REMARK 500 THR A 2 N - CA - C ANGL. DEV. = 17.4 DEGREES
REMARK 500 ALA A 1 CA - C - N ANGL. DEV. = -17.6 DEGREES
REMARK 500 LYS A 3 CD - CE - NZ ANGL. DEV. = -15.2 DEGREES
REMARK 500 ALA A 4 CA - C - N ANGL. DEV. = -13.8 DEGREES
REMARK 500 ALA A 4 O - C - N ANGL. DEV. = 19.6 DEGREES
REMARK 500 VAL A 5 C - N - CA ANGL. DEV. = -23.5 DEGREES
REMARK 500 CYS A 6 O - C - N ANGL. DEV. = 10.3 DEGREES
SEQRES
SEQRES 1 A 147 PRO LYS ALA LEU ILE VAL TYR GLY SER THR THR GLY ASN
SEQRES 2 A 147 THR GLU TYR THR ALA GLU THR ILE ALA ARG GLU LEU ALA
SEQRES 3 A 147 ASP ALA GLY TYR GLU VAL ASP SER ARG ASP ALA ALA SER
SEQRES 4 A 147 VAL GLU ALA GLY GLY LEU PHE GLU GLY PHE ASP LEU VAL
SEQRES 5 A 147 LEU LEU GLY CYS SER THR TRP ASN ASP ASP SER ILE GLU
SEQRES 6 A 147 LEU GLN ASP ASP PHE ILE PRO LEU PHE ASP SER LEU GLU
SEQRES 7 A 147 GLU THR GLY ALA GLN GLY ARG LYS VAL ALA CYS PHE GLY
SEQRES 8 A 147 CYS GLY ASP SER SER TYR GLU TYR PHE CYS GLY ALA VAL
SEQRES 9 A 147 ASP ALA ILE GLU GLU LYS LEU LYS ASN LEU GLY ALA GLU
SEQRES 10 A 147 ILE VAL GLN ASP GLY LEU ARG ILE ASP GLY ASP PRO ARG
SEQRES 11 A 147 ALA ALA ARG ASP ASP ILE VAL GLY TRP ALA HIS ASP VAL
SEQRES 12 A 147 ARG GLY ALA ILE
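SEQRES records list the full sequence as three-letter residue names, 13 per line, so converting a chain to a one-letter string is a simple lookup. The sketch below uses the first two records shown above; the function name `seqres_to_sequence` is hypothetical, unknown residue names are mapped to "X" by assumption, and whitespace splitting is a shortcut that would break for files with a blank chain ID.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def seqres_to_sequence(pdb_lines, chain_id):
    """Concatenate the SEQRES residues of one chain into a one-letter string."""
    residues = []
    for line in pdb_lines:
        fields = line.split()
        # fields: SEQRES, serial number, chain ID, chain length, residues...
        if len(fields) > 4 and fields[0] == "SEQRES" and fields[2] == chain_id:
            residues.extend(fields[4:])
    return "".join(THREE_TO_ONE.get(name, "X") for name in residues)

records = [
    "SEQRES 1 A 147 PRO LYS ALA LEU ILE VAL TYR GLY SER THR THR GLY ASN",
    "SEQRES 2 A 147 THR GLU TYR THR ALA GLU THR ILE ALA ARG GLU LEU ALA",
]
print(seqres_to_sequence(records, "A"))  # PKALIVYGSTTGNTEYTAETIARELA
```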
Other info regarding the protein
HET ACE A 0 3
HET ACE B 0 3
HET CU A 154 1
HET ZN A 155 1
HET CU B 154 1
HET ZN B 155 1
HETNAM ACE ACETYL GROUP
HETNAM CU COPPER (II) ION
HETNAM ZN ZINC ION
FORMUL 1 ACE 2(C2 H4 O)
FORMUL 3 CU 2(CU 2+)
FORMUL 4 ZN 2(ZN 2+)
HELIX 1 HA GLU A 133 THR A 137 1 5
HELIX 2 HB GLU B 133 THR B 137 1 5
SHEET 1 SA 9 ALA A 4 LYS A 9 0
SHEET 2 SA 9 GLN A 15 GLU A 21 -1 N PHE A 20 O ALA A 4
SHEET 3 SA 9 VAL A 29 LYS A 30 1 N LYS A 30 O GLU A 21
SHEET 4 SA 9 VAL A 94 ASP A 101 -1
SHEET 5 SA 9 GLY A 85 ALA A 89 1
SHEET 6 SA 9 GLY A 41 HIS A 48 -1 N HIS A 43 O VAL A 87
SHEET 7 SA 9 ARG A 115 HIS A 120 1 N VAL A 118 O HIS A 46
SHEET 8 SA 9 CYS A 146 GLY A 150 -1 N GLY A 147 O LEU A 117
SHEET 9 SA 9 ALA A 4 LYS A 9 1 N VAL A 7 O VAL A 148
SHEET 1 SB 9 ALA B 4 LYS B 9 0
SHEET 2 SB 9 GLN B 15 GLU B 21 -1 N PHE B 20 O ALA B 4
SHEET 3 SB 9 VAL B 29 LYS B 30 1 N LYS B 30 O GLU B 21
SHEET 4 SB 9 VAL B 94 ASP B 101 -1
SHEET 5 SB 9 GLY B 85 ALA B 89 1
SHEET 6 SB 9 GLY B 41 HIS B 48 -1 N HIS B 43 O VAL B 87
SHEET 7 SB 9 ARG B 115 HIS B 120 1 N VAL B 118 O HIS B 46
SHEET 8 SB 9 CYS B 146 GLY B 150 -1 N GLY B 147 O LEU B 117
SHEET 9 SB 9 ALA B 4 LYS B 9 1 N VAL B 7 O VAL B 148
SSBOND 1 CYS A 57 CYS A 146 1555 1555 2.10
SSBOND 2 CYS B 57 CYS B 146 1555 1555 2.06
...
Other info regarding the protein
...
LINK CH3 ACE A 0 N ALA A 1 1555 1555 1.49
LINK CH3 ACE B 0 N ALA B 1 1555 1555 1.58
LINK C ACE A 0 N ALA A 1 1555 1555 1.85
LINK CU CU A 154 NE2 HIS A 120 1555 1555 2.08
LINK CU CU A 154 ND1 HIS A 46 1555 1555 2.05
LINK CU CU A 154 NE2 HIS A 63 1555 1555 2.09
LINK CU CU A 154 NE2 HIS A 48 1555 1555 2.13
LINK ZN ZN A 155 ND1 HIS A 71 1555 1555 2.13
LINK ZN ZN A 155 ND1 HIS A 63 1555 1555 2.05
LINK ZN ZN A 155 ND1 HIS A 80 1555 1555 2.08
LINK ZN ZN A 155 OD2 ASP A 83 1555 1555 1.95
LINK C ACE B 0 N ALA B 1 1555 1555 1.80
LINK CU CU B 154 NE2 HIS B 120 1555 1555 2.02
LINK CU CU B 154 NE2 HIS B 63 1555 1555 2.15
LINK CU CU B 154 ND1 HIS B 46 1555 1555 2.15
LINK CU CU B 154 NE2 HIS B 48 1555 1555 2.13
LINK ZN ZN B 155 ND1 HIS B 63 1555 1555 2.20
LINK ZN ZN B 155 OD2 ASP B 83 1555 1555 2.00
LINK ZN ZN B 155 ND1 HIS B 80 1555 1555 2.08
LINK ZN ZN B 155 ND1 HIS B 71 1555 1555 2.16
SITE 1 CUA 4 HIS A 46 HIS A 48 HIS A 63 HIS A 120
SITE 1 ZNA 4 HIS A 63 HIS A 71 HIS A 80 ASP A 83
SITE 1 CUB 4 HIS B 46 HIS B 48 HIS B 63 HIS B 120
SITE 1 ZNB 4 HIS B 63 HIS B 71 HIS B 80 ASP B 83
SITE 1 AC1 4 HIS A 46 HIS A 48 HIS A 63 HIS A 120
SITE 1 AC2 4 HIS A 63 HIS A 71 HIS A 80 ASP A 83
SITE 1 AC3 4 HIS B 46 HIS B 48 HIS B 63 HIS B 120
SITE 1 AC4 4 HIS B 63 HIS B 71 HIS B 80 ASP B 83
...
ATOM Example
Columns, left to right: record type, atom number, atom name, residue name, chain ID, residue number, atom coordinates (x, y, z), occupancy, B-factor, element symbol.
ATOM 1 N PRO A 2 22.126 26.173 0.149 1.00 28.61 N
ATOM 2 CA PRO A 2 21.848 26.169 1.597 1.00 27.50 C
ATOM 3 C PRO A 2 20.582 25.363 1.875 1.00 26.69 C
ATOM 4 O PRO A 2 19.724 25.215 0.973 1.00 26.48 O
ATOM 5 CB PRO A 2 21.874 27.626 1.981 1.00 28.55 C
ATOM 6 CG PRO A 2 21.899 28.434 0.721 1.00 29.65 C
ATOM 7 CD PRO A 2 21.761 27.465 -0.440 1.00 28.77 C
ATOM 8 N LYS A 3 20.499 24.795 3.073 1.00 22.80 N
ATOM 9 CA LYS A 3 19.360 23.972 3.469 1.00 22.07 C
ATOM 10 C LYS A 3 18.610 24.700 4.597 1.00 18.49 C
ATOM 11 O LYS A 3 19.262 25.140 5.536 1.00 17.98 O
ATOM 12 CB LYS A 3 19.669 22.668 4.145 1.00 24.58 C
ATOM 13 CG LYS A 3 20.495 21.675 3.360 1.00 36.59 C
ATOM 14 CD LYS A 3 20.652 20.419 4.220 1.00 48.23 C
ATOM 15 CE LYS A 3 19.341 19.779 4.628 1.00 53.43 C
ATOM 16 NZ LYS A 3 19.502 19.003 5.891 1.00 57.07 N
ATOM 17 N ALA A 4 17.319 24.698 4.389 1.00 17.98 N
ATOM 18 CA ALA A 4 16.468 25.371 5.384 1.00 17.19 C
ATOM 19 C ALA A 4 15.446 24.391 5.938 1.00 14.46 C
ATOM 20 O ALA A 4 14.919 23.560 5.198 1.00 16.67 O
ATOM 21 CB ALA A 4 15.806 26.585 4.748 1.00 16.22 C
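In an actual .pdb file these ATOM fields sit in fixed character columns, so a parser slices by position instead of splitting on whitespace. The sketch below follows the column layout of the wwPDB format (serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, insertion code 27, x/y/z 31-54, occupancy 55-60, B-factor 61-66, element 77-78). The helper names `parse_atom_line` and `format_atom_line` are hypothetical, and the formatter exists only to build a correctly aligned sample line from the CA atom of PRO 2 listed above.

```python
def parse_atom_line(line):
    """Slice one ATOM/HETATM record by fixed columns (0-based Python slices)."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "alt_loc": line[16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21].strip(),
        "res_seq": int(line[22:26]),
        "i_code": line[26].strip(),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }

def format_atom_line(serial, atom_name, res_name, chain_id, res_seq,
                     x, y, z, occupancy, b_factor, element, i_code=""):
    """Build a column-aligned ATOM line (illustration helper, blank altLoc)."""
    return ("ATOM  {:>5} {:<4} {:>3} {:1}{:>4}{:1}   "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2}").format(
        serial, atom_name, res_name, chain_id, res_seq, i_code,
        x, y, z, occupancy, b_factor, element)

sample = format_atom_line(2, " CA", "PRO", "A", 2,
                          21.848, 26.169, 1.597, 1.00, 27.50, "C")
atom = parse_atom_line(sample)
print(atom["atom_name"], atom["res_name"], atom["chain_id"], atom["b_factor"])
```

Slicing by column keeps working when neighbouring fields run together without a space (for example large negative coordinates), which is where whitespace splitting tends to fail.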
Insertion Code
Things would be very simple if the amino acids in every chain were numbered in the obvious way, starting with 1. The problem with numbering started when people wanted to compare the 'same' proteins from different species. They found the following possibilities that gave rise to differences:
• More or fewer residues at either end.
• Extra residues at various places within the chain.
• Fewer residues at various places within the chain.
• Different amino acids at the same place.
For example, relative to an important reference, the Tyr and Phe in the next example must be “inserted” to maintain the sequence numbering elsewhere.
Reference: Xxx-Arg---------Asp-Xxx
Current:   Xxx-Arg-Tyr-Phe-Asp-Xxx
Insertion Code
ATOM 2518 CB ARG H 100 10.115 0.762 57.410 1.00 16.08 C
ATOM 2519 CG ARG H 100 10.970 0.968 58.664 1.00 14.49 C
ATOM 2520 CD ARG H 100 12.115 -0.023 58.757 1.00 17.17 C
ATOM 2521 NE ARG H 100 12.888 0.203 59.977 1.00 16.50 N
ATOM 2522 CZ ARG H 100 14.066 -0.354 60.234 1.00 17.58 C
ATOM 2523 NH1 ARG H 100 14.620 -1.175 59.353 1.00 13.62 N
ATOM 2524 NH2 ARG H 100 14.687 -0.088 61.380 1.00 17.77 N
ATOM 2525 N TYR H 100A 7.182 2.284 55.730 1.00 12.75 N
ATOM 2526 CA TYR H 100A 6.427 2.198 54.486 1.00 14.21 C
ATOM 2527 C TYR H 100A 6.376 3.604 53.886 1.00 14.53 C
ATOM 2528 O TYR H 100A 6.716 4.584 54.555 1.00 15.84 O
ATOM 2529 CB TYR H 100A 5.008 1.657 54.732 1.00 14.11 C
ATOM 2530 CG TYR H 100A 4.153 2.469 55.689 1.00 14.04 C
ATOM 2531 CD1 TYR H 100A 3.708 3.754 55.357 1.00 14.99 C
ATOM 2532 CD2 TYR H 100A 3.761 1.934 56.914 1.00 14.26 C
ATOM 2533 CE1 TYR H 100A 2.890 4.480 56.224 1.00 14.63 C
ATOM 2534 CE2 TYR H 100A 2.948 2.648 57.788 1.00 13.06 C
ATOM 2535 CZ TYR H 100A 2.513 3.916 57.440 1.00 17.73 C
ATOM 2536 OH TYR H 100A 1.690 4.600 58.311 1.00 16.51 O
ATOM 2537 N PHE H 100B 5.974 3.698 52.623 1.00 14.33 N
ATOM 2538 CA PHE H 100B 5.892 4.983 51.943 1.00 14.40 C
ATOM 2539 C PHE H 100B 4.477 5.207 51.440 1.00 14.15 C
ATOM 2540 O PHE H 100B 4.048 4.587 50.470 1.00 12.86 O
ATOM 2541 CB PHE H 100B 6.908 5.020 50.798 1.00 15.51 C
ATOM 2542 CG PHE H 100B 8.332 4.891 51.268 1.00 17.82 C
ATOM 2543 CD1 PHE H 100B 8.834 3.656 51.668 1.00 17.78 C
ATOM 2544 CD2 PHE H 100B 9.143 6.014 51.385 1.00 17.58 C
ATOM 2545 CE1 PHE H 100B 10.126 3.538 52.185 1.00 17.48 C
ATOM 2546 CE2 PHE H 100B 10.438 5.911 51.902 1.00 16.89 C
ATOM 2547 CZ PHE H 100B 10.928 4.669 52.302 1.00 15.82 C
ATOM 2548 N ASP H 101 3.762 6.107 52.113 1.00 16.17 N
ATOM 2549 CA ASP H 101 2.369 6.394 51.790 1.00 17.29 C
1JGU
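Because of insertion codes, a residue is identified by the pair (residue number, insertion code), so labels such as 100, 100A and 100B cannot be treated as plain integers. The sketch below splits such a label into a tuple that also works as a sort key; the function name `residue_key` is hypothetical, and the assumption that insertion codes follow alphabetical order matches this example but is not guaranteed in every entry, where the order in the file is what counts.

```python
import re

def residue_key(res_id):
    """Split a label like '100A' into (100, 'A'); no code gives (100, '')."""
    match = re.fullmatch(r"(-?\d+)([A-Za-z]?)", res_id.strip())
    if match is None:
        raise ValueError("unrecognised residue label: {!r}".format(res_id))
    return int(match.group(1)), match.group(2)

# Residue labels of chain H from the records above, deliberately shuffled.
labels = ["101", "100B", "100", "100A"]
print(sorted(labels, key=residue_key))  # ['100', '100A', '100B', '101']
```

Using the tuple as a dictionary key also keeps ARG 100, TYR 100A and PHE 100B apart, which a key built from the integer alone would merge.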
HETATM
HETATM 1110 N1 FMN 149 -4.033 35.956 18.202 1.00 11.52 N
HETATM 1111 C2 FMN 149 -3.099 36.573 18.873 1.00 12.48 C
HETATM 1112 O2 FMN 149 -2.376 37.334 18.335 1.00 16.26 O
HETATM 1113 N3 FMN 149 -3.088 36.481 20.290 1.00 11.60 N
HETATM 1114 C4 FMN 149 -3.984 35.714 21.017 1.00 15.21 C
HETATM 1115 O4 FMN 149 -3.735 35.647 22.242 1.00 18.11 O
HETATM 1116 C4A FMN 149 -4.990 34.991 20.369 1.00 16.27 C
HETATM 1117 N5 FMN 149 -5.880 34.239 20.865 1.00 16.15 N
HETATM 1118 C5A FMN 149 -6.779 33.619 20.162 1.00 12.95 C
HETATM 1119 C6 FMN 149 -7.711 32.852 20.939 1.00 16.60 C
HETATM 1120 C7 FMN 149 -8.654 32.226 20.220 1.00 17.33 C
HETATM 1121 C7M FMN 149 -9.808 31.422 20.926 1.00 22.85 C
HETATM 1122 C8 FMN 149 -8.811 32.254 18.783 1.00 17.98 C
HETATM 1123 C8M FMN 149 -9.911 31.545 18.096 1.00 21.26 C
HETATM 1124 C9 FMN 149 -7.877 33.022 18.058 1.00 18.01 C
HETATM 1125 C9A FMN 149 -6.878 33.683 18.696 1.00 14.21 C
HETATM 1126 N10 FMN 149 -5.844 34.525 18.119 1.00 13.81 N
HETATM 1127 C10 FMN 149 -4.943 35.171 18.860 1.00 12.49 C
HETATM 1128 C1' FMN 149 -5.976 34.383 16.651 1.00 10.38 C
HETATM 1129 C2' FMN 149 -5.162 33.164 16.170 1.00 14.33 C
HETATM 1130 O2' FMN 149 -3.801 33.135 16.617 1.00 14.75 O
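HETATM records share the ATOM column layout, so ligands and other hetero groups can be tallied by reading the record name and the residue-name columns (18-20). A minimal sketch follows, assuming column-aligned input; the function name `hetero_atom_counts` is hypothetical, and the sample lines re-space two FMN atoms and one protein atom from the listings in this section into fixed columns.

```python
def hetero_atom_counts(pdb_lines):
    """Count HETATM records per residue name (columns 18-20)."""
    counts = {}
    for line in pdb_lines:
        if line.startswith("HETATM"):
            res_name = line[17:20].strip()
            counts[res_name] = counts.get(res_name, 0) + 1
    return counts

lines = [
    "ATOM      1  N   PRO A   2      22.126  26.173   0.149  1.00 28.61           N",
    "HETATM 1110  N1  FMN   149      -4.033  35.956  18.202  1.00 11.52           N",
    "HETATM 1111  C2  FMN   149      -3.099  36.573  18.873  1.00 12.48           C",
]
print(hetero_atom_counts(lines))  # {'FMN': 2}
```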