Protein Structure Exercise Bioinformatics Tools and Databases Foothill College
Protein Sequences • From NCBI, search either proteins or genomes – using keywords etc. • From Genome, type in HIV-1 • What links do you have from there? • Choose NC_001802, then coding region • From that entry, save FASTA protein • Identify the gag-pol and env sequence
HIV-1 Gag-Pol AA Sequence >gi|28872819|ref|NP_057849.4| Gag-Pol; Gag-Pol polyprotein [Human immunodeficiency virus 1] MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQT GSEELRSLYNTVATLYCVHQRIEIKDTKEALDKIEEEQNKSKKKAQQAAADTGHSNQVSQNYPIVQNIQG QMVHQAISPRTLNAWVKVVEEKAFSPEVIPMFSALSEGATPQDLNTMLNTVGGHQAAMQMLKETINEEAA EWDRVHPVHAGPIAPGQMREPRGSDIAGTTSTLQEQIGWMTNNPPIPVGEIYKRWIILGLNKIVRMYSPT SILDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPAATLEEMMTAC QGVGGPGHKARVLAEAMSQVTNSATIMMQRGNFRNQRKIVKCFNCGKEGHTARNCRAPRKKGCWKCGKEG HQMKDCTERQANFLREDLAFLQGKAREFSSEQTRANSPTRRELQVWGRDNNSPSEAGADRQGTVSFNFPQ VTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAI GTVLVGPTPVNIIGRNLLTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEK EGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGD AYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQY MDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWT VNDIQKLVGKLNWASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSK DLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPI QKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNRGR QKVVTLTDTTNQKTELQAIYLALQDSGLEVNIVTDSQYALGIIQAQPDQSESELVNQIIEQLIKKEKVYL AWVPAHKGIGGNEQVDKLVSAGIRKVLFLDGIDKAQDEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKC QLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTI HTDNGSNFTGATVRAACWWAGIKQEFGIPYNPQSQGVVESMNKELKKIIGQVRDQAEHLKTAVQMAVFIH NFKRKGGIGGYSAGERIVDIIATDIQTKELQKQITKIQNFRVYYRDSRNPLWKGPAKLLWKGEGAVVIQD NSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED
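Once the FASTA protein file is saved, picking out the gag-pol and env records can be scripted. The following is a minimal sketch in plain Python, assuming the file has been read into a string; `parse_fasta` and `find_records` are hypothetical helper names, the Gag-Pol sequence is truncated to its first two lines, and the env record is a labeled placeholder, not real data.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_records(records, keyword):
    """Return the records whose header contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [(h, s) for h, s in records if keyword in h.lower()]


# Truncated Gag-Pol record from the slide, plus a placeholder env record
# (the env header and sequence are illustrative only, not real data).
example = """>gi|28872819|ref|NP_057849.4| Gag-Pol; Gag-Pol polyprotein [Human immunodeficiency virus 1]
MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQT
GSEELRSLYNTVATLYCVHQRIEIKDTKEALDKIEEEQNKSKKKAQQAAADTGHSNQVSQNYPIVQNIQG
>placeholder_env Env; placeholder record for illustration
AAAA
"""

records = parse_fasta(example)
for header, seq in find_records(records, "gag-pol"):
    print(header.split("|")[3], len(seq), seq[:10])
```

The same `find_records(records, "env")` call would pick out the envelope record from the real download.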
BLASTing PDB • Open two browsers • Open the URL to NCBI BLAST P • BLAST the PDB database with the amino acid sequence from gag-pol and then env • Go to http://us.expasy.org/tools/blast/ • BLAST the PDB database as above • What are the top structures at each site?
Digging Deeper into Sequence • From the expasy PDB BLAST return: • Choose another sequence (close or far) and do a Multiple Sequence Alignment • Choose BLOSUM or PAM matrices • View the alignments in HTML format
NCBI BLASTp of PDB • After doing the BLAST P of PDB: • Click on related structures to see more • Follow the PDB links to the MMDB • Hint: you can use some of these structures at VAST for structure comparisons • How can you display the structure? • RasMol, SPDBV, and Cn3D viewers
BLAST P – Other Data • From NCBI BLAST P – what are the conserved domains that are detected? • Click on each to find the Pfam entries • Show domain relatives (CDART) • (The next two images show results for gag-pol and env proteins – try both) • Path is from CDD to CDART – explore!
Conserved Domain Databases • NCBI contains a database of conserved domains. These are linked by sequence to BLAST and other tools. • Conserved domains represent “functional folds” in nature’s playbook. • You can compare your sequence by alignment (Pfam) to other protein folds. • Use CDART for graphical domain display.
ExPASy Proteomics Tools • http://us.expasy.org/tools/ • Protein identification and characterization • DNA -> Protein • Similarity searches, pattern and profile searches • Post translational modifications • Topology prediction • Primary structure analysis • Secondary structure prediction • Tertiary structure • Sequence alignment • Biological text analysis
ExPASy ScanProsite • Go to ExPASy ScanProsite • http://www.expasy.ch/tools/scanprosite/ • Enter either HIV sequence (gag-pol or env) into the search box • You can choose email data return here • What are the post translational modifications? Click on the references.
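To see what a pattern scan does under the hood, a PROSITE pattern can be converted to a regular expression and run over a sequence. This is a simplified sketch, not ScanProsite's own implementation: `prosite_to_regex` and `scan` are hypothetical helper names, only the common syntax elements are handled (`x`, `[..]`, `{..}`, repeat counts, and `<`/`>` anchors at the ends), and the test sequence is a toy string. The pattern used is the PROSITE N-glycosylation site, N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern to a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split off an optional repeat count such as x(2) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        # [..] alternatives and single letters are already valid regex.
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (1-based position, matched text) for every hit, overlaps included."""
    # The lookahead lets matches overlap, as pattern hits in proteins can.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


N_GLYCOSYLATION = "N-{P}-[ST]-{P}"
toy_sequence = "AANGSAANPTANATA"   # toy sequence, not from HIV-1
print(prosite_to_regex(N_GLYCOSYLATION))
print(scan(toy_sequence, N_GLYCOSYLATION))
```

Pasting the gag-pol or env sequence in place of `toy_sequence` gives candidate sites to compare against the ScanProsite results; a pattern hit is only a candidate and does not show that the modification actually occurs.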
PIR – Georgetown University • Go to http://pir.georgetown.edu/ • Choose the iProClass database http://pir.georgetown.edu/iproclass/ • Paste in the gag-pol sequence • Look at the BLAST hits • Try the links to domain display and pattern match. What do you see?
Pfam • Go to The Pfam Home page at: http://www.sanger.ac.uk/Software/Pfam • Choose Protein search • Enter the HIV-1 gag-pol sequence • The search may take 3 to 5 minutes • The page return will show protein families and conserved domains
SMART • Simple Modular Architecture Research Tool • Sequence analysis • Architecture analysis • Search with sequence or accession • Don’t forget to check a database: • Pfam • Signal peptides • Internal repeats
NCBI Structure Tools • http://www.ncbi.nlm.nih.gov/Structure/ • Modeling tools for the MMDB and PDB • MMDB – Molecular Modeling Data Base • PDB – Protein Data Bank • Search by keyword (HIV-1 or gag-pol) • Follow links in and out of MMDB / PDB • RasMol, Chime, Cn3D structure viewers
MMDB • Molecular Modeling Database • http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml • Contains weekly updates from PDB • “The structure database is considerably smaller than Entrez's protein or nucleotide databases, but a large fraction of all known protein sequences have homologs in this set”
Cn3D Structure Viewer • Structure viewer • PC, Unix / Linux, Mac OSX etc. • Helper application • Structure view / sequence view • Can align and show multiple sequences • Has a great online tutorial • (read carefully) and try it out! • Exports files as PNG for great presentations too!
VAST and VAST Search • Vector Alignment Search Tool • VAST Search is a service that allows searching for structural neighbors starting with a set of 3D-coordinates specified by the user. • Type in a structure code (PDB), view similar alignments, and click to import. • (Start by BLASTing the PDB database).
CDD and CDD Search • http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml • Enter a sequence, accession number, or search by keyword • CDD is linked from BLAST so you may enter it while doing sequence analysis • Where there are conserved domains there is often homology – or close cousins (UniGene)
The Protein Machine • http://www2.ebi.ac.uk/translate/ • For translating nucleotide sequences into protein in three different modes • You can choose the sense strand or complement or any reading frame • You can start and end at any position • You can select any translation table • Or enter an accession number
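The options listed above (strand, reading frame, start and end position) can be sketched in a few lines of Python. This is an illustrative sketch, not the EBI tool's code: `translate` and `reverse_complement` are hypothetical helper names, only the standard genetic code (NCBI translation table 1) is built in, and the input is a toy sequence.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=1, reverse=False, start=None, end=None):
    """Translate DNA in reading frame 1, 2, or 3 of the chosen strand.

    start and end are optional 1-based, inclusive positions on the input
    strand; they are applied before the strand and frame are chosen.
    Stop codons are shown as '*', unrecognized codons as 'X'.
    """
    dna = dna.upper().replace("U", "T")
    dna = dna[(start - 1 if start else 0):end]
    if reverse:
        dna = reverse_complement(dna)
    dna = dna[frame - 1:]
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


toy = "ATGGGTGCGAGATAA"   # toy sequence for illustration
for strand in (False, True):
    for frame in (1, 2, 3):
        label = ("-" if strand else "+") + str(frame)
        print(label, translate(toy, frame=frame, reverse=strand))
```

Supporting another translation table would mean swapping in a different `AMINO_ACIDS` string for the same codon order.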