Surabhi Agarwal. Bioinformatics and Protein Structural Analysis.
The molecular structures of proteins are complex and can be described at several levels. These structures can also be predicted from their amino acid sequences. Protein structure prediction is one of the most widely pursued fields of research in bioinformatics.
Master Layout (Part 1)
This animation consists of 2 parts:
Part 1: Protein Structural Databases
Part 2: Uses of Structural Databases
Part 1 covers:
• Different types of data and the organization of data in a structural database
• Searching the database for protein structures
Definitions of the components: Part 1 - Protein structural databases
• Query peptide: The unknown protein or peptide whose sequence is determined first and then used for further analysis. This sequence is compared with known protein sequences in existing databases.
• Protein sequence: The linear chain of amino acids that forms the structural unit of a protein. This sequence is unique to each protein and is also known as the primary structure of the protein.
• Sequence similarity: The degree of resemblance between two proteins, evaluated by aligning their amino acid sequences linearly.
• 3-D structural alignment: The process of superimposing two given protein structures. This can be done with suitable software by entering protein identifiers or atomic coordinates.
Definitions of the components: Part 1 - Protein structural databases
• Geometry of protein structure: The three-dimensional coordinates of the atoms of a protein and the angles between their bonds. These are essential for simulating the protein structure on a computer.
• Biology of protein structure: Information about the biological source of the protein and its metabolic roles within the cell and the organism.
• SCOP classification: SCOP stands for “Structural Classification of Proteins” and aims to provide a detailed description of the structural and evolutionary relationships between all proteins that have been structurally characterized. SCOP classification is done at four levels: Class, Fold, Superfamily and Family.
• CATH classification: CATH stands for “Class, Architecture, Topology and Homologous Superfamily” and provides a semi-automatic, hierarchical classification of protein domains. The levels of CATH classification are Class, Architecture, Topology and Homologous Superfamily.
Step 1: Protein Structure Database: Search
Protein Structural Database
Enter Protein ID or text query: Capsid
Optional Inputs:
• Structure Features: Macromolecule type, Number of Chains, Number of models, Molecular Weight, Secondary Structure Content, Secondary Structure Length, SCOP classification, CATH classification
• Biology: Source Organism, Expression Organism, Enzyme Classification, Biological Process, Cellular component
• Sequence Features: Sequence, Translated Nucleotide Sequence, Sequence Length, Sequence Motif
• Experiment: Experimental method, Resolution, Crystal Properties, Detectors used, Experimental Data Available
Values entered in this example: Number of Chains: 10; Source Organism: Retro Transcribing Viruses; Sequence Length: < 500; Experimental method: X-RAY CRYSTALLOGRAPHY
Search
http://www.pdb.org/pdb/search/advSearch.do
Step 1: Protein Structure Database: Search
Action: Schematic for database functioning
Description of the action: Follow the steps as shown in the animations. First show the basic layout of the database. Then input the text “Capsid” in the text box at the top of the page. For each of the 4 categories, when the drop-down link is clicked, announce the options as the mouse hovers over them. The drop-down link in the animation should look like a drop-down link on a web page. Re-create all images.
Audio narration: Protein structural databases provide a basic search box that takes an identifier for the protein. This identifier can be the protein name, a keyword, an ID, an author, etc. In this example, we take the case of viral capsid proteins. These databases also have advanced search features, which are optional but help make the query very specific. The options fall into 4 broad classes: Structural Features, Biology, Sequence Data and Experimental Details.
http://www.pdb.org/pdb/search/advSearch.do
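The combination of a text query with optional filters can be sketched in a few lines of Python. This is only an illustration of the idea, not the database's actual search interface: the `Entry` fields, the `search` function and the records marked as hypothetical are assumptions made for the example. Only the 1AUM title, organism and length come from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """One structure record; field names are illustrative, not a real schema."""
    pdb_id: str
    title: str
    source_organism: str
    sequence_length: int
    method: str


def search(entries: List[Entry], text: str,
           max_length: Optional[int] = None,
           method: Optional[str] = None) -> List[Entry]:
    """Return entries whose title contains `text` and that pass the optional filters."""
    hits = []
    for entry in entries:
        if text.lower() not in entry.title.lower():
            continue  # basic text query
        if max_length is not None and entry.sequence_length >= max_length:
            continue  # "Sequence Length < max_length" filter
        if method is not None and entry.method.lower() != method.lower():
            continue  # "Experimental method" filter
        hits.append(entry)
    return hits


# Records for illustration only; XXX1 and XXX2 are hypothetical.
ENTRIES = [
    Entry("1AUM", "HIV CAPSID C-TERMINAL DOMAIN (CAC146)",
          "Human immunodeficiency virus 1", 70, "X-RAY CRYSTALLOGRAPHY"),
    Entry("XXX1", "HYPOTHETICAL CAPSID PROTEIN",
          "Example virus", 650, "X-RAY CRYSTALLOGRAPHY"),
    Entry("XXX2", "HYPOTHETICAL KINASE",
          "Example organism", 300, "SOLUTION NMR"),
]

hits = search(ENTRIES, "Capsid", max_length=500, method="X-RAY CRYSTALLOGRAPHY")
print([entry.pdb_id for entry in hits])
```

Leaving the optional arguments as `None` mirrors the basic search box, while each extra argument narrows the result set in the same way an advanced search option does.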
Step 2.a: Protein Structure Database: Output
Protein Structural Database
Number of Hits: 67
Showing 1 to 4 of 67 (Next)
• HIV CAPSID C-TERMINAL DOMAIN (CAC146)
• X-RAY CRYSTAL STRUCTURE OF EQUINE INFECTIOUS ANEMIA VIRUS (EIAV) CAPSID PROTEIN P26
• ROUS SARCOMA VIRUS CAPSID PROTEIN: N-TERMINAL DOMAIN
• Structure of HIV1 protease and AKC4p_133a complex.
Action: Schematic for database functioning
Description of the action: Follow the steps as shown in the animations. Re-create all images. Show “67” next to the tab titled “Number of Hits”. Then show the figure under the 2nd horizontal line. Show a clicking effect on the 1st hit. This slide and the 8 that follow it are part of the same animated web page.
Audio narration: The search for the query protein returned 67 structures in the database that match the criteria given by the user in the search options. The first page of the results shows the titles of the hits. The user then selects the protein structure of interest to study in detail. Here we select the structure titled “HIV CAPSID C-TERMINAL DOMAIN (CAC146)” for further study.
http://www.pdb.org/pdb/explore/explore.do?structureId=1AUM
Step 2.b - Protein Structure Database: Output
Protein Structural Database
Tabs: Summary, Sequence data, Sequence similarity, 3D similarity, Methods, Geometry, Biology, Derived data (active tab: Summary)
1. 1AUM
2. Molecule: HIV CAPSID; Structure Weight: 7970.16; Type: polypeptide(L); Chains: A; Length: 70; Classification: Viral Protein
3. Scientific Name: Human immunodeficiency virus 1; Expression System: Escherichia coli bl21(de3)
4. “Structure of the carboxyl-terminal dimerization domain of the HIV-1 capsid protein”, Science, 1997
Action: Schematic for database functioning
Description of the action: Follow the steps as shown in the animations. Re-create all images. This slide and the 7 slides that follow it are part of the same web page. The mouse pointer should be shown clicking on each of the 8 tabs one by one, and the text below it changes accordingly. Always highlight the active tab with a different color, as is done on websites. As each of the four headings is narrated in the audio narration, that particular text must be highlighted in the animation.
Audio narration: The summary page shows the general information about the basic features of the protein. This includes:
1. Protein identifier
2. Molecule name, structure weight, polymer type, number of chains, length of the molecule and its classification
3. Source organism and expression organism
4. Journal, paper and author name
http://www.pdb.org/pdb/explore/explore.do?structureId=1AUM
Step 2.c - Protein Structure Database: Output
Protein Structural Database
Tabs: Summary, Sequence data, Sequence similarity, 3D similarity, Methods, Geometry, Biology, Derived data (active tab: Sequence data)
1. FASTA
>1AUM:A|PDBID|CHAIN|SEQUENCE
LDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPGATLEEMMTACQG
2. Chain Type: polypeptide(L)
3. Diagram labels: Domain of the protein; Sequence of amino acid residues and their positions; No assigned secondary structure; Hydrogen-bonded turn; Alpha helix; Cysteine residues; Di-sulphide bridge
Action: Schematic for database functioning
Description of the action: Follow the steps as shown in the animations. Re-create all images. This is a follow-up slide to slide #8, as described there.
Audio narration: The sequence data tab contains the information related to the amino acid sequence of the protein under consideration:
1. FASTA sequence for all chains in the polypeptide
2. Type of chain, such as polypeptide, glyco-peptide, lipo-peptide, etc.
3. Diagrammatic representation of the classification and secondary structure of this chain, assigning residues to helix, sheet or turn
http://www.pdb.org/pdb/explore/explore.do?structureId=1AUM
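As a small illustration of working with the FASTA record shown on this slide, the following Python sketch parses the header and sequence and locates the cysteine residues that take part in the di-sulphide bridge. The helper names `parse_fasta` and `residue_positions` are made up for this example, and positions are counted from 1 within the displayed sequence, which need not match the residue numbering in the deposited structure.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # drop the leading ">"
            records[header] = ""
        elif header is not None:
            records[header] += line    # sequences may span several lines
    return records


def residue_positions(sequence: str, residue: str) -> List[int]:
    """Return the 1-based positions of a one-letter residue code in the sequence."""
    return [i for i, aa in enumerate(sequence, start=1) if aa == residue]


FASTA_TEXT = """>1AUM:A|PDBID|CHAIN|SEQUENCE
LDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPGATLEEMMTACQG
"""

records = parse_fasta(FASTA_TEXT)
header, sequence = next(iter(records.items()))
pdb_id, chain = header.split("|")[0].split(":")

print(pdb_id, chain, len(sequence))      # identifier, chain and chain length
print(residue_positions(sequence, "C"))  # cysteine positions in this chain
```

The length computed here agrees with the “Length: 70” value on the summary tab.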
Step 2.d - Protein Structure Database: Output
Protein Structural Database
Tabs: Summary, Sequence data, Sequence similarity, 3D similarity, Methods, Geometry, Biology, Derived data (active tab: Sequence similarity)
• BLAST: perform a BLAST search with the sequence of the retrieved protein
• Table of clusters of similar proteins whose structures have been determined
Action: Schematic for database functioning
Description of the action: Follow the steps as shown in the animations. Re-create all images. This is a follow-up slide to slide #8, as described there.
Audio narration: The sequence similarity tab shows information from comparing the sequence with other sequences.
1. Option to perform a BLAST search.
2. A list of clusters of proteins is produced. These clusters are formed and ranked based on the resolution of the structures within them. The better the quality (resolution) of the cluster, the higher it is ranked. When the user clicks on a particular cluster, the component proteins within the cluster are displayed along with supporting information.
http://www.pdb.org/pdb/explore/explore.do?structureId=1AUM
Step 2.e - Protein Structure Database: Output
Protein Structural Database
Tabs: Summary, Sequence data, Sequence similarity, 3D similarity, Methods, Geometry, Biology, Derived data (active tab: 3D similarity)
Figure: HIV capsid alignment with GAG polyprotein. GAG POLYPROTEIN is colored blue and HIV CAPSID is colored orange.
Action: Schematic for database functioning
Description of the action: Follow the steps as shown in the animations. Re-create all images. This is a follow-up slide to slide #8, as described there.
Audio narration: The structural similarity tab shows information from comparing two structures. It establishes equivalences based on the 3D conformations of both proteins. The default visualization tool for PDB is Jmol. Structural alignment is covered in more detail in the second part of this animation.
http://www.pdb.org/pdb/workbench/showPrecalcAlignment.do?action=pw_fatcat&mol=1A8O.A&mol=1BAJ.A
Step 2.f - Protein Structure Database: Output
Protein Structural Database
Tabs: Summary, Sequence data, Sequence similarity, 3D similarity, Methods, Geometry, Biology, Derived data (active tab: Methods)
Action: Schematic for database functioning
Description of the action: All tables have to be re-drawn by the animator. Follow the steps as shown in the animations. This is a follow-up slide to slide #8, as described there.
Audio narration: This tab provides details of the methodology used in the experiments that determined the structure. This includes:
1. Crystallization methods, pH, temperature and other details of the experiment
2. Crystal data (space group, unit cell dimensions)
3. Diffraction source, diffraction protocol and diffraction detectors
4. Data related to resolution and refinement
5. Software, programs and computing utilized
A brief summary of this result is shown in this animation. For details visit http://www.pdb.org/pdb/explore/materialsAndMethods.do?structureId=1AUM#
http://www.pdb.org/pdb/explore/geometryDisplay.do?structureId=1AUM
Step 2.g - Protein Structure Database: Output
Protein Structural Database
Tabs: Summary, Sequence data, Sequence similarity, 3D similarity, Methods, Geometry, Biology, Derived data (active tab: Geometry)
Figure labels:
• Ramachandran map showing the residues that lie in the favored region (outlined in dark blue) and the permitted region (outlined in light blue). 67/68 residues lie in the favored region and none of the residues lie in the disallowed region.
• Values for Fold Deviation Score (FDS): for a specific reference value, the FDS is a multiple of the standard deviation.
• Plot of Fold Deviation Score: the x-axis has the residue positions and the y-axis has the FDS values.
• Bond lengths: the position, total number and range of the covalent bond lengths between two adjacent atoms in a protein molecule.
• Bond angles: the angle formed by 3 consecutive atoms in the native conformation of a protein, and their statistics.
• Dihedral angles: the angle between the 2 consecutive planes defined by 4 linearly bonded atoms, with their occurrence, positions and other statistics.
Action: Schematic for database functioning
Description of the action: All tables have to be re-drawn by the animator. Follow the steps as shown in the animations. This is a follow-up slide to slide #8, as described there.
Audio narration: The Geometry tab contains the spatial information about the molecule, so that it can be simulated in a virtual environment. This includes:
• Bond lengths: number of occurrences and their positions in the chains
• Bond angles: number of occurrences and their positions in the chains
• Dihedral angles: number of occurrences and their positions in the chains
• Ramachandran plot, Fold Deviation Scores and other structural details
http://www.pdb.org/pdb/explore/geometryDisplay.do?structureId=1AUM
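The three geometric quantities listed on this slide can all be computed from atomic coordinates. The sketch below is a minimal, standard-library illustration. The function names are chosen for this example and the coordinates are toy values, not atoms taken from 1AUM. The dihedral angle is the signed angle between the plane of atoms 1-2-3 and the plane of atoms 2-3-4, which is the quantity plotted for the backbone in a Ramachandran map.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_length(a: Vec, b: Vec) -> float:
    """Distance between two bonded atoms."""
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))


def bond_angle(a: Vec, b: Vec, c: Vec) -> float:
    """Angle in degrees at atom b, formed by the 3 consecutive atoms a-b-c."""
    u, v = _sub(a, b), _sub(c, b)
    cos_theta = _dot(u, v) / (math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v)))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees for 4 linearly bonded atoms."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, axis), _dot(b2, axis)
    v = (b0[0] - k0 * axis[0], b0[1] - k0 * axis[1], b0[2] - k0 * axis[2])
    w = (b2[0] - k2 * axis[0], b2[1] - k2 * axis[1], b2[2] - k2 * axis[2])
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not real atoms).
print(bond_length((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
print(bond_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```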
Step 2.h - Protein Structure Database: Output
Protein Structural Database
Tabs: Summary, Sequence data, Sequence similarity, 3D similarity, Methods, Geometry, Biology, Derived data (active tab: Biology)
Sections shown: Protein Details, Gene Details
Action: Schematic for database functioning
Description of the action: Follow the steps as shown in the animations. Re-create all images. This is a follow-up slide to slide #8, as described there.
Audio narration: The biology tab contains information about the significance of the molecule at the biological and cellular level. This includes:
1. Molecule type
2. Formula weight
3. Monomers and linkages
4. Source method
5. Ligands and prosthetic groups
6. Gene details and genome information
7. Keywords
http://www.pdb.org/pdb/explore/geometryDisplay.do?structureId=1AUM
Step 2.i - Protein Structure Database: Output
Protein Structural Database
Tabs: Summary, Sequence data, Sequence similarity, 3D similarity, Methods, Geometry, Biology, Derived data (active tab: Derived data)
Sections shown: CATH classification, PFAM classification, SCOP classification
Action: Schematic for database functioning
Description of the action: Follow the steps as shown in the animations. Re-create all images. This is a follow-up slide to slide #8, as described there.
Audio narration: The derived data tab provides data for the same protein from other resources, such as its SCOP, CATH and PFAM classification details. For more detailed analysis visit http://www.pdb.org/pdb/explore/derivedData.do?structureId=1AUM
http://www.pdb.org/pdb/explore/geometryDisplay.do?structureId=1AUM
Master Layout (Part 2)
This animation consists of 2 parts:
Part 1: Protein Structural Databases
Part 2: Uses of Structural Databases
Part 2 covers:
• Protein structural alignment
• Secondary structure prediction
• Functional annotation
Definitions of the components: Part 2 - Uses of structural databases
• Protein structural alignment: The geometry of two given protein structures can be compared by software tools that analyse their three-dimensional similarity to each other.
• Protein structure prediction: The likely secondary structures of peptides or proteins can be predicted from a given stretch of amino acid residues by using machine learning algorithms.
• Machine learning algorithms: Computer algorithms that are trained on a given classified dataset. During training, these programs adjust their parameters in such a way that they can then classify new data. The most widely used machine learning algorithms in bioinformatics include Artificial Neural Networks, Hidden Markov Models, Support Vector Machines, etc.
• Functional annotation: For novel proteins that are yet to be characterized, potential functions can be predicted by techniques such as homology modelling, which provide an initial insight into the protein’s properties.
Definitions of the components: Part 2 - Uses of structural databases
• Gene Ontology: Gene Ontology terms, also known as GO terms, are identifiers that represent a gene’s functional properties. They are categorized into three domains: “cellular component”, “molecular function” and “biological process”.
• Root Mean Square Deviation (RMSD): Quantification of the average distance between the atoms of the superimposed proteins (strictly, the square root of the mean squared distance). The higher the RMSD value, the lower the similarity.
• Protein structural alignment server: A web-based server that helps determine the structural similarity of two given proteins by superimposing them and calculating various comparative parameters. A large number of web-based servers are currently available for this task. A few examples are DALI (Distance Matrix Alignment), MAMMOTH (Matching Molecular Models Obtained from Theory), CE/CE-MC (Combinatorial Extension - Monte Carlo), SSAP (Sequential Structure Alignment Program), ProFit (Protein least-squares Fitting), etc.
Step 1: Structure Alignment - Input
Protein Structural Alignment Server (DALI)
Enter the first PDB ID and Chain (or upload a protein structure): 1A8O
Enter the second PDB ID and Chain (or upload a protein structure): 1BAJ
Submit
Running the Server…
3D Superimposition: non-aligned regions on superimposed structures
Action: Web-tool functioning
Description of the action: Follow the steps as shown in the animations. Re-create all images. Enter the 2 IDs in the text boxes. Follow this with a clicking effect on the “Submit” button. Show the action-in-progress effect as shown in the slide. Follow it with the two simple structures getting superimposed and highlight the non-aligned areas. Follow this with the actual output in the next slide.
Audio narration: Two given proteins can be structurally aligned to evaluate the similarity between them. The server requires as input two protein structures or their PDB IDs, which are then superimposed and aligned based on their 3D coordinates, bond angles and dihedral angles. A few of the servers available for this are DALI, MAMMOTH, CE/CE-MC, SSAP and ProFit.
Step 2: Structure Alignment - Output
Protein Structural Alignment Server (DALI)
Result for 1BAJ aligned with 1A8O:
P-value: 0.00e+00
Score: 190.92
RMSD: 0.75
%Id: 94.0%
Annotations:
• P-value: the probability of obtaining a match this good by chance. P-value < 0.05 indicates significant similarity between the two structures.
• Score: the raw score of the alignment, used to compare other similarity matches with the same proteins.
• RMSD: the average of the distances between the atoms in the superimposed proteins.
• %Id: the percentage of identical residues in the sequences of the alignment.
http://www.pdb.org/pdb/workbench/showPrecalcAlignment.do?action=pw_fatcat&mol=1A8O.A&mol=1BAJ.A
Step 2: Structure Alignment - Output
Action: Web-tool functioning
Description of the action: Follow the steps as shown in the animations. Mention the definitions of the results in the audio narration as well as in written format. Re-create all images.
Audio narration: The results are:
1. P-value: the probability of obtaining a match this good by chance. A P-value < 0.05 indicates significant similarity between the two structures.
2. Raw score: used to compare other similarity matches with the same proteins.
3. RMSD: a measure of the average distance between the atoms of the superimposed proteins.
4. Percentage sequence identity in the alignment.
http://www.pdb.org/pdb/workbench/showPrecalcAlignment.do?action=pw_fatcat&mol=1A8O.A&mol=1BAJ.A
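The RMSD reported in the results can be illustrated with a short Python sketch. It assumes the two structures have already been superimposed and that equivalent atoms are paired in the same order, which is the part an alignment server actually solves. The function name `rmsd` and the coordinates are illustrative and do not come from 1A8O or 1BAJ.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root mean square deviation between paired, already superimposed atoms."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # squared distance between one pair of equivalent atoms
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates: every atom of structure B is shifted by 1 unit along z.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]

print(rmsd(structure_a, structure_b))
print(rmsd(structure_a, structure_a))
```

Identical structures give an RMSD of zero, and the value grows as the paired atoms move further apart, which is why a higher RMSD means lower similarity.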
Step 3: Structure Prediction
Protein Structural Prediction Server
Enter the sequence of amino acids (primary structure of protein):
DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEVDVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLPKLDELRDEGKASSAKQRLKCASLQKFGERA
Predicted Secondary Structure (legend): Alpha Helix, Beta Sheets, Coils
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html
Step 3: Structure Prediction
Action: Web-tool functioning
Description of the action: Follow the steps as shown in the animations. Re-create all images.
Audio narration:
• Once the amino acid sequence of the protein is known, its secondary and tertiary structures can be predicted using prediction algorithms, which use information from previously characterized structures. In the secondary structure prediction:
• “h” represents Alpha Helix
• “e” represents Beta Sheets
• “c” represents Coils
• Since not all known proteins have been structurally characterized yet, this provides a useful bioinformatics analysis tool for researchers. Servers for structure prediction include GOR, HNN, PredictProtein, NNPredict and SSpro.
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html
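The per-residue output of a secondary structure predictor is a string of “h”, “e” and “c” characters, one per amino acid. The following sketch shows one way such a string could be summarized. The prediction string is hypothetical and is not the server's output for the sequence above, and the helper names are chosen for this example.

```python
from itertools import groupby
from typing import Dict, List, Tuple

LABELS = {"h": "alpha helix", "e": "beta sheet", "c": "coil"}


def composition(prediction: str) -> Dict[str, float]:
    """Fraction of residues assigned to each secondary structure state."""
    return {state: prediction.count(state) / len(prediction) for state in LABELS}


def segments(prediction: str) -> List[Tuple[str, int, int]]:
    """Contiguous runs as (state, start, end), with 1-based inclusive positions."""
    runs = []
    position = 1
    for state, group in groupby(prediction):
        length = len(list(group))
        runs.append((state, position, position + length - 1))
        position += length
    return runs


# Hypothetical prediction string for a 16-residue peptide.
prediction = "cchhhhhhcceeeecc"

for state, fraction in composition(prediction).items():
    print(f"{LABELS[state]}: {fraction:.1%}")
for state, start, end in segments(prediction):
    print(f"{LABELS[state]} from residue {start} to {end}")
```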
Step 4: Functional Annotation
Protein Functional Annotation Server
Enter the sequence of amino acids (primary structure of protein):
DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEVDVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLPKLDELRDEGKASSAKQRLKCASLQKFGERA
Functional Prediction (Description, GO Term, Probability):
Cellular Component
• Membrane, GO0432, 89%
• Intra-cellular organelle, GO0, 74%
Biological Functions
• C21 Steroid Hormone Metabolism, GO0189, 100%
• Vitamin Transport, GO0243, 97%
Molecular Functions
• Vitamin D Binding, GO0549, 100%
• Water Binding, GO0543, 97%
Action: Web-tool functioning
Description of the action: Follow the steps as shown in the animations. Re-create all images.
Audio narration: Given a particular amino acid sequence, the cellular components, molecular functions and biological processes associated with the sequence can be predicted using functional annotation servers. These are represented by a unique set of identifiers called “Gene Ontology terms” or “GO terms”. Each GO term has a name and an alphanumeric identifier, together with a definition with cited sources and a namespace indicating the domain to which it belongs. Servers for this include DbAli Annolite, PFP, ProteomeAnalyst, GOPET, SpearMint and ProKnow.
http://www.pdb.org/pdb/explore/remediatedSequence.do?structureId=1AO6, http://kiharalab.org/web/pfp.php
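A functional annotation result is essentially a list of (GO term, description, probability) rows, one list per GO domain. As a rough sketch of how such output might be post-processed, the code below keeps only predictions above a chosen probability cut-off. The `Prediction` class, the function name and the 0.90 threshold are assumptions for this example. The rows are a subset of the schematic values on the slide, and the short identifiers are the slide's placeholders, not real GO accessions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Prediction:
    namespace: str      # "cellular component", "molecular function" or "biological process"
    description: str
    go_id: str          # schematic identifier as shown on the slide
    probability: float  # between 0 and 1


def confident_predictions(predictions: List[Prediction],
                          threshold: float) -> Dict[str, List[Prediction]]:
    """Group predictions with probability >= threshold by namespace, best first."""
    grouped: Dict[str, List[Prediction]] = {}
    for p in sorted(predictions, key=lambda p: p.probability, reverse=True):
        if p.probability >= threshold:
            grouped.setdefault(p.namespace, []).append(p)
    return grouped


# Subset of the schematic values shown on the slide.
PREDICTIONS = [
    Prediction("cellular component", "Membrane", "GO0432", 0.89),
    Prediction("biological process", "C21 Steroid Hormone Metabolism", "GO0189", 1.00),
    Prediction("biological process", "Vitamin Transport", "GO0243", 0.97),
    Prediction("molecular function", "Vitamin D Binding", "GO0549", 1.00),
    Prediction("molecular function", "Water Binding", "GO0543", 0.97),
]

confident = confident_predictions(PREDICTIONS, threshold=0.90)
for namespace, rows in confident.items():
    for row in rows:
        print(f"{namespace}: {row.description} ({row.go_id}) {row.probability:.0%}")
```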
Interactivity option 1: Predict the 3-dimensional structure of Human Serum Albumin and cross-validate
Steps in the correct order:
1. Input the term “human serum albumin” in a structural database.
2. Click on the hit which matches your query.
3. Go to the “sequence details” tab and retrieve the FASTA sequence of the protein.
4. Go to the 3D structure details and save the actual coordinates and the 3D structure of the protein, derived from experimental details.
5. Predict the tertiary structure from the amino acid sequence and save the predicted structure coordinates.
6. Select a structural alignment tool and superimpose the predicted structure on the actual structure derived from the database.
7. Check the quality of the alignment. If the RMSD value is low, then the structural alignment is good and the structure prediction was correct.
Interactivity type: Arrange the steps in the order to be performed.
Options, boundary/limits and results: Remove the step number shown in “yellow” at the bottom of each tab. Show all the steps in mixed order. The user must click on the tabs in order. If the user clicks on a tab which is not in the right order, flash a message saying “try again”. All the tabs must be arranged in the right order.
Interactivity option 2.a - True/False - Questions
• GO stands for “Genetic Oncology”
• DALI is a server for Protein Structural Alignment
• SCOP is a classification scheme for Nucleic Acids
• p-value is one of the results from Structural Alignment
• In protein secondary structure, “e” stands for coil
• RMSD stands for “Root Mean Square Distance”
Buttons: TRUE, FALSE
Interactivity type: True or False
Options: Flash the questions one at a time. The user needs to press either the “Green tab” marked “TRUE” or the “Red tab” marked “FALSE”. If the answer is correct, flash “Tick”. If the answer is incorrect, flash “Cross”. For all questions whose answer is “False”, also show the correct statement as given in the next slide.
Results: Next slide
Interactivity option 2.b - True/False - Correct Answers
• GO stands for “Genetic Oncology”: FALSE. GO stands for “Gene Ontology”.
• DALI is a server for Protein Structural Alignment: TRUE.
• SCOP is a classification scheme for Nucleic Acids: FALSE. SCOP is a classification scheme for Proteins.
• p-value is one of the results from Structural Alignment: TRUE.
• In protein secondary structure, “e” stands for coil: FALSE. In protein secondary structure, “e” stands for beta sheets.
• RMSD stands for “Root Mean Square Distance”: FALSE. RMSD stands for “Root Mean Square Deviation”.
Interactivity type: True or False
Options: Flash the questions one at a time. The user needs to press either the “Green tab” marked “TRUE” or the “Red tab” marked “FALSE”. If the answer is correct, flash “Tick”. If the answer is incorrect, flash “Cross”.
Results: The questions are followed by their correct answers.
Interactivity option 2.c - True/False - Example
Questions shown:
• SCOP is a classification scheme for Nucleic Acids
• DALI is a server for Protein Structural Alignment
• GO stands for “Genetic Oncology”
Buttons: TRUE, FALSE
Feedback shown: The correct answer is “False”. GO stands for “Gene Ontology”. SCOP is a classification scheme for Proteins.
Interactivity type: True or False
Options: Flash the questions one at a time. The user needs to press either the “Green tab” marked “TRUE” or the “Red tab” marked “FALSE”. If the answer is correct, flash “Tick”. If the answer is incorrect, flash “Cross” and the correct answer as given in the previous slide.
Boundary/limits: This is an example slide to show the various cases of answers.
Questionnaire
1. Which is the server for Protein Structure Prediction?
Answers: a) ProtParam b) PeptideMass c) nnPREDICT d) DALI
2. Which is the server for Functional annotation of Proteins?
Answers: a) DALI b) GOR c) SSAP d) Proteome Analyst
3. Which amongst these is NOT an output of Functional annotation?
Answers: a) GO Term b) Source Organism c) Probability of annotation d) Description of Function
4. By default, PDB structures appear in which visualization tool?
Answers: a) VMD b) NAMD c) Jmol d) None of the above
5. PDB is primarily which Database?
Answers: a) Protein b) Nucleotide c) Gene d) None of the above
Links for further reading
Reference websites
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html
http://cubic.bioc.columbia.edu/predictprotein/
http://ekhidna.biocenter.helsinki.fi/dali_lite/start
http://kiharalab.org/web/pfp.php
http://pa.cs.ualberta.ca:8080/pa/index.html
http://www.ebi.ac.uk/Tools/clustalw2/index.html
http://www.pdb.org/pdb/home/home.do
http://expasy.org/sprot/
http://expasy.org/prosite/
http://webdocs.cs.ualberta.ca/~bioinfo/PA/
The following URLs are used for the animations
http://www.pdb.org/pdb/search/advSearch.do
http://www.pdb.org/pdb/explore/explore.do?structureId=1AUM
http://www.pdb.org/pdb/workbench/showPrecalcAlignment.do?action=pw_fatcat&mol=1A8O.A&mol=1BAJ.A
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html
http://www.pdb.org/pdb/explore/remediatedSequence.do?structureId=1AO6
http://kiharalab.org/web/pfp.php
Published literature
• Alexey G. Murzin, Steven E. Brenner, Tim Hubbard and Cyrus Chothia. SCOP: A Structural Classification of Proteins Database for the Investigation of Sequences and Structures. J. Mol. Biol. (1995) 247, 536-540.
• CA Orengo, AD Michie, S Jones, DT Jones, MB Swindells and JM Thornton. CATH - a hierarchic classification of protein domain structures. Structure 1997, Vol 5 No 8.
Books
• David Mount. Bioinformatics: Sequence and Genome Analysis.