
D2DBT9 - Genetic Analysis and Bioinformatics



Presentation Transcript


  1. D2DBT9 - Genetic Analysis and Bioinformatics: Bioinformatics of Proteins in One and Three Dimensions • Dr. Jaume Bacardit • jaume.bacardit@nottingham.ac.uk

  2. Learning outcomes • To gain practical experience in using protein-related web-based biological databases and extracting information from them • To gain practical experience in using public web-based protein structure prediction services • To gain basic knowledge of how to use protein visualisation tools • To gain basic practical experience of performing homology modelling

  3. The protein we are going to use today… • In most examples we are going to use the AXR4 protein from Arabidopsis thaliana: MAIITEEEEDPKTLNPPKNKPKDSDFTKSESTMKNPKPQSQNPFPFWFYFTVVVSLATII FISLSLFSSQNDPRSWFLSLPPALRQHYSNGRTIKVQVNSNESPIEVFVAESGSIHTETV VIVHGLGLSSFAFKEMIQSLGSKGIHSVAIDLPGNGFSDKSMVVIGGDREIGFVARVKEV YGLIQEKGVFWAFDQMIETGDLPYEEIIKLQNSKRRSFKAIELGSEETARVLGQVIDTLG LAPVHLVLHDSALGLASNWVSENWQSVRSVTLIDSSISPALPLWVLNVPGIREILLAFSF GFEKLVSFRCSKEMTLSDIDAHRILLKGRNGREAVVASLNKLNHSFDIAQWGNSDGINGI PMQVIWSSEASKEWSDEGQRVAKALPKAKFVTHSGSRWPQESKSGELADYISEFVSLLPK SIRRVAEEPIPEEVQKVLEEAKAGDDHDHHHGHGHAHAGYSDAYGLGEEWTTT

  4. Biological databases • UniProt • NCBI Entrez • Pfam

  5. UniProt • UniProt is a collaboration between the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR) • The main protein database • http://www.uniprot.org/

  6. Querying UniProt with a protein name (AXR4) • The results show its UniProt ID
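The slides run this search through the UniProt web form. The same AXR4 search can also be scripted by building a query URL. This is a minimal sketch that assumes the current UniProt REST interface at rest.uniprot.org, which post-dates these slides, and its `gene:` and `organism_id:` query fields. The helper name `uniprot_search_url` is hypothetical. The value 3702 is the NCBI taxonomy ID of Arabidopsis thaliana.

```python
from urllib.parse import urlencode

# Assumption: the current UniProt REST API, not the web form shown in the slides
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_search_url(gene, taxon_id, fmt="fasta"):
    """Build a UniProtKB search URL for one gene name in one organism (hypothetical helper)."""
    query = f"gene:{gene} AND organism_id:{taxon_id}"
    # urlencode percent-encodes ':' and turns spaces into '+'
    return UNIPROT_SEARCH + "?" + urlencode({"query": query, "format": fmt})


# 3702 is the NCBI taxonomy ID of Arabidopsis thaliana
url = uniprot_search_url("AXR4", 3702)
print(url)
```

Fetching the results, for example with `urllib.request.urlopen(url).read().decode()`, needs network access. For that reason the sketch stops at building the URL.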

  7. Included in the AXR4 page… • Protein annotation • Function, location, specificity, disruption phenotype • Gene Ontology • Sequence • Potential transmembrane regions • Bibliographic references • Cross-references to other databases • GenBank, PIR, KEGG, TAIR (Arabidopsis-specific)

  8. Scrolling down the AXR4 page… If we click here…

  9. Blasting the AXR4 sequence…

  10. BLAST returns these results • Now we select the closest homologs and press Align

  11. Aligning the homologs (ClustalW)

  12. ClustalW also generates phylogenetic trees
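A phylogenetic tree like the one ClustalW draws is built from pairwise distances between the aligned sequences. The sketch below shows that idea under simplifying assumptions. It computes p-distances, which are the fraction of mismatched residues over gap-free columns, and then clusters them with UPGMA (average linkage). This is a deliberately simpler method than ClustalW's: ClustalW uses neighbour-joining by default. The sequences and names are made up for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatches over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, seqs):
    """Cluster aligned sequences with UPGMA; return a Newick tree without branch lengths."""
    sizes = {name: 1 for name in names}  # cluster label -> number of sequences in it
    dist = {frozenset((a, b)): p_distance(seqs[i], seqs[j])
            for (i, a), (j, b) in combinations(enumerate(names), 2)}
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        del dist[pair]
        for c in sizes:
            # Average linkage: size-weighted mean of the two old distances
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Made-up toy alignment, not real AXR4 homologs
names = ["s1", "s2", "s3", "s4"]
seqs = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCAAAAA", "CCCCCAAAAC"]
tree = upgma(names, seqs)
print(tree)
```

On this toy input, the two close pairs are joined first, and the script prints `((s1,s2),(s3,s4));`.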

  13. UniProt is not the only source of protein information… • NCBI’s Entrez system returns this for AXR4

  14. Pfam: sequence-based detection of protein families

  15. Pfam returns three possible sequence motifs (but no significant results)

  16. Protein Data Bank (PDB) • Put your PDB ID here • Each entry in the PDB is identified by a 4-character code: a digit followed by three letters or digits (e.g. 2p31)
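Because PDB IDs follow a fixed 4-character pattern, a script can check an ID before downloading the entry. This is a minimal sketch. It assumes the RCSB file server's URL pattern `https://files.rcsb.org/download/<ID>.pdb` and handles only the classic 4-character IDs. The helper name `pdb_download_url` is hypothetical.

```python
import re

# A classic PDB ID: a digit 1-9 followed by three letters or digits (e.g. 2p31)
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def pdb_download_url(pdb_id):
    """Return the (assumed) RCSB download URL for an entry in legacy .pdb format."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    # IDs are case-insensitive; upper-case them for the file name
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


print(pdb_download_url("2p31"))
```

A gene name such as AXR4 is rejected, because a PDB ID has to start with a digit.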

  17. Entry 2p31 • Let’s click on “Display PDB file”

  18. PDB file for 2p31 • Sequence (SEQRES records) • Atomic coordinates of the amino acids (ATOM records)
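The coordinates in a PDB file sit in fixed-width ATOM records, so each field is read by column position rather than by splitting on spaces. This sketch uses the standard column layout. The demo lines are made up (they are not real 2p31 coordinates), and their atom-name alignment is simplified. The helper names `parse_atoms` and `residues_per_chain` are hypothetical.

```python
from collections import Counter

# Made-up lines in PDB fixed-column layout (not real 2p31 coordinates)
FMT = "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f"
demo = [FMT % (1, "N", "MET", "A", 1, 1.0, 2.0, 3.0),
        FMT % (2, "CA", "MET", "A", 1, 1.5, 2.5, 3.5),
        FMT % (3, "CA", "SER", "A", 2, 4.0, 5.0, 6.0),
        FMT % (4, "CA", "GLY", "B", 1, -7.25, 8.0, 9.0),
        "END"]


def parse_atoms(lines):
    """Yield (chain, res_num, res_name, atom_name, x, y, z) for each ATOM record."""
    for line in lines:
        if line.startswith("ATOM  "):
            yield (line[21],                 # column 22: chain ID
                   int(line[22:26]),         # columns 23-26: residue number
                   line[17:20].strip(),      # columns 18-20: residue name
                   line[12:16].strip(),      # columns 13-16: atom name
                   float(line[30:38]),       # columns 31-38: x
                   float(line[38:46]),       # columns 39-46: y
                   float(line[46:54]))       # columns 47-54: z


def residues_per_chain(lines):
    """Count residues per chain, using one C-alpha (CA) atom per residue."""
    return dict(Counter(chain for chain, _, _, atom, *_ in parse_atoms(lines) if atom == "CA"))


print(residues_per_chain(demo))
```

With a real file, pass `open("2p31.pdb")` instead of `demo`. The per-chain counts make it easy to see that 2p31 has more than one chain.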

  19. Other Biological Databases

  20. Prediction sites • Secondary structure prediction • Prediction of residues’ structural aspects • Tertiary structure prediction • Transmembrane prediction • Functional sites prediction • These servers perform computationally expensive calculations • They sometimes take a day or two (or more) to reply • Users are generally notified by email when the results are ready

  21. PSIPRED: Secondary Structure Prediction

  22. Results of PSIPRED…
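A secondary structure prediction like PSIPRED's is essentially one state per residue. This sketch assumes that per-residue string uses H (helix), E (strand) and C (coil), and turns it into helix and strand segments. The helper name `sse_segments` and the example string are hypothetical. Parsing PSIPRED's actual output files is left out.

```python
from itertools import groupby


def sse_segments(pred, min_len=1):
    """Turn a per-residue H/E/C string into (state, start, end) segments.

    Positions are 1-based and inclusive; coil runs are skipped and helix or
    strand runs shorter than min_len are dropped.
    """
    segments, pos = [], 1
    for state, run in groupby(pred):
        length = len(list(run))
        if state in ("H", "E") and length >= min_len:
            segments.append((state, pos, pos + length - 1))
        pos += length
    return segments


print(sse_segments("CCHHHHCCEEEC"))
```

The `min_len` filter is a simple way to ignore one- or two-residue "elements", which are rarely meaningful.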

  23. 3D structure prediction • 3D-Jury is a meta-server for 3D protein structure prediction (PSP)

  24. Results of 3D-Jury • Good source of templates

  25. Results of 3D-Jury (scrolling to the right)

  26. LOMETS • The quick server from the Zhang group • Zhang’s I-TASSER is the best publicly available PSP server • Unfortunately, it is very overloaded (for AXR4 it took 8 days to return a model) • LOMETS performs fold recognition (FR) using several locally installed programs • It builds homology models from the alignments obtained in the FR process • Another good source of distant templates

  27. LOMETS results

  28. mGenTHREADER prediction results • More templates!

  29. Other 3D PSP servers • FUGUE • 3D-JIGSAW • HHpred • SAM-T08 • ROBETTA (David Baker’s server; also heavily overloaded) • Results of CASP8 (to see how these servers perform)

  30. Infobiotic.net PSP server • Created here in Nottingham • It predicts a broad variety of residues’ structural aspects

  31. Results from the Infobiotic.net server

  32. Firestar: functional site prediction

  33. TMHMM: Transmembrane prediction
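TMHMM predicts transmembrane helices with a hidden Markov model. A much simpler way to see the same signal is a Kyte-Doolittle hydropathy scan, sketched below as a swapped-in technique rather than what TMHMM does. Each 19-residue window is averaged, and windows above 1.6 are flagged as candidate membrane-spanning segments. That window size and cutoff are the classic Kyte-Doolittle choices, and both should be treated as tunable assumptions. The helper name `hydrophobic_windows` is hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy scale for the 20 standard amino acids
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return 1-based start positions of windows whose mean hydropathy exceeds threshold.

    Whitespace is ignored, so the space-separated sequence from the slides can be
    pasted directly; non-standard letters (e.g. X) raise KeyError.
    """
    seq = "".join(seq.split()).upper()
    hits = []
    for i in range(len(seq) - window + 1):
        score = sum(KD[aa] for aa in seq[i:i + window]) / window
        if score > threshold:
            hits.append(i + 1)
    return hits


# A hydrophobic stretch taken from the AXR4 sequence above
print(hydrophobic_windows("FPFWFYFTVV VSLATIIFI"))
```

This 19-residue stretch of AXR4 averages about 2.1, which is above the cutoff, so the scan flags it.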

  34. PyMOL • One of the best protein visualisation tools • Free for educational use • You can ask for a licence at http://www.pymol.org/educational.html • I have a licence, so if you would like to use it on your personal computers, you can download it from http://www.cs.nott.ac.uk/~jqb/pymol-1_1edu1-bin-win32.zip • I also have the Linux and MacOS versions • Please do not distribute it

  35. Let’s download 2p31 and open it in PyMOL

  36. Controls are at the top right of the screen • The (all) control affects everything loaded into PyMOL • You can also control each loaded protein/selection individually. Right now there is only one protein (2p31) • There are five types of controls: Actions, Show, Hide, Label and Colour

  37. To change to a cartoon visualisation… • 2p31 → Hide → Everything • 2p31 → Show → Cartoon • 2p31 → Colour → Spectrum → Rainbow • Now click in the middle of the screen and drag the mouse; this is what you obtain…

  38. Visualising only chain A • As we saw on the PDB web site, this protein has two chains • To visualise only one of them, we have to create a selection • Type this at the PyMOL prompt: • PyMOL>select chainA, 2p31 and chain A • chainA is the label of the selection • Everything after the comma is the definition of the selection • We can select chains, residues and even atoms • Type “help selection” to see all possible options

  39. Visualising only chain A • All → Hide → Everything • chainA → Show → Cartoon • chainA → Color → Spectrum → Rainbow • chainA → Action → Zoom

  40. Showing the protein surface • chainA → Show → Surface • Then type: set transparency=0.5
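The menu clicks from the last few slides can also be replayed as a script through PyMOL's Python API (`cmd`). This sketch takes `cmd` as a parameter, so it can be read and tested without PyMOL installed. The function name `show_chain_surface` and its defaults are hypothetical. The spectrum call is roughly equivalent to the menu's rainbow colouring, not necessarily identical to it.

```python
def show_chain_surface(cmd, obj="2p31", chain="A", sel="chainA"):
    """Replay the slides' GUI steps through PyMOL's Python API (`cmd`)."""
    cmd.select(sel, f"{obj} and chain {chain}")   # select chainA, 2p31 and chain A
    cmd.hide("everything", "all")                  # All > Hide > Everything
    cmd.show("cartoon", sel)                       # chainA > Show > Cartoon
    cmd.spectrum("count", "rainbow", sel)          # rainbow by atom order, N- to C-terminus
    cmd.zoom(sel)                                  # chainA > Action > Zoom
    cmd.show("surface", sel)                       # chainA > Show > Surface
    cmd.set("transparency", 0.5)                   # set transparency=0.5
```

Inside PyMOL, a script would run `from pymol import cmd`, then `cmd.load("2p31.pdb")`, then `show_chain_surface(cmd)`. Passing `cmd` in, rather than importing it, is only a design choice that makes the steps testable.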

  41. Simple Homology Modelling • We are going to use Modeller • Free for academic use • http://salilab.org/modeller/9v6/modeller9v6.exe • Licence key: MODELIRANJE • 1st step: installing it • When choosing the destination path, choose c:\temp (in B08/B09) • Modeller is a very sophisticated tool with which you can control almost any aspect of the homology modelling process • Here we are only going to use the simplest options

  42. Chain we are going to model: T0388 (CASP target ID), LOC493869A, Homo sapiens • ENLYFQSMINSFYAFEVKDAKGRTVSLEKYKGKVSLVVNVASDCQLTDRNYLGLKELHKEFGPSHFSVLAFPCNQFGESEPRPSKEVESFARKNYGVTFPIFHKIKILGSEGEPAFRFLVDSSKKEPRWNFWKYLVNPEGQVVKFWRPEEPIEVIRPDIAALVRQVIIKKKEDL

  43. 1st step: BLAST against PDB

  44. Selecting the template • A perfect match exists, because the structure of this target has since been made public • We are going to ignore it and use chain A of protein 2p31 instead

  45. 2nd step: Creating an alignment • Modeller has a sophisticated alignment tool • It uses structural information from the template • It uses dynamic programming instead of BLAST’s approximate (heuristic) method • To create the alignment you need to: • Download the PDB file of the template • Put your sequence in PIR format (example) • Edit the alignment script to set the template and chain • Run Modeller: mod9v6.exe align.py
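Dynamic programming means that the best global alignment is built up from the best alignments of all shorter prefixes. That is the Needleman-Wunsch algorithm, sketched below with toy scores: match +1, mismatch -1, and a linear gap penalty of -2. This is not Modeller's align2d. According to the slide, align2d also uses structural information from the template, and a real aligner would use a substitution matrix. The function name and scores are illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Traceback from the bottom-right cell, preferring the diagonal move
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)
print(top)
print(bottom)
```

This prints a score of 4 with `GATTACA` over `GA-TACA`. That is six matches and one gap, which is the optimum under these scores. Unlike BLAST, the table is filled exhaustively, so the result is guaranteed optimal for the chosen scoring scheme.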

  46. PIR file (target.ali) • Just replace the sequence with your own • The last line of the sequence needs to end in * • Do not touch anything else in the file, or the alignment script will not work
      >P1;target
      sequence:target:::::::0.00: 0.00
      ENLYFQSMINSFYAFEVKDAKGRTVSLEKYKGKVSLVVNVASDCQLTDRNYLGLKELHKE
      FGPSHFSVLAFPCNQFGESEPRPSKEVESFARKNYGVTFPIFHKIKILGSEGEPAFRFLV
      DSSKKEPRWNFWKYLVNPEGQVVKFWRPEEPIEVIRPDIAALVRQVIIKKKEDL*
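Since only the sequence in the PIR file should change, the file can be generated from a plain sequence to avoid hand-editing mistakes. The two header lines below are copied from the slide's template, and the sequence is wrapped at 60 residues and terminated with `*`. The function name `write_pir` is hypothetical. Only the simple "sequence" entry type shown on the slide is covered.

```python
def write_pir(code, sequence, width=60):
    """Format a sequence as a PIR entry in the style of the slide's target.ali.

    Whitespace in the input is removed; the sequence is wrapped to `width`
    characters and the terminating '*' is appended after the last residue.
    """
    seq = "".join(sequence.split()).upper() + "*"
    lines = [f">P1;{code}", f"sequence:{code}:::::::0.00: 0.00"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


print(write_pir("target", "ENLYFQSMIN SFYAFEVKDA"), end="")
```

To produce the file that align.py reads, write the result out with `open("target.ali", "w").write(write_pir("target", my_sequence))`, where `my_sequence` is your own sequence.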

  47. align.py
      from modeller import *
      from modeller.automodel import *

      env = environ()
      aln = alignment(env)

      # Just change the values of these 2 lines to match your template
      template = '2p31'
      chain = 'A'
      tc = template + chain

      # Read only the chosen chain of the template structure
      mdl = model(env, file=template,
                  model_segment=('FIRST:' + chain, 'LAST:' + chain))
      aln.append_model(mdl, align_codes=tc, atom_files=template + '.pdb')
      # Add the target sequence from the PIR file
      aln.append(file='target.ali', align_codes='target')
      # Align the target against the template (align2d uses template structure information)
      aln.align2d()
      aln.write(file='target-' + tc + '.ali', alignment_format='PIR')
      aln.write(file='target-' + tc + '.pap', alignment_format='PAP')

  48. Results of the alignment • The alignment differs from the one produced by BLAST • Modeller has ignored the residues lacking structural information
      _aln.pos          10        20        30        40        50        60
      2p31A     -----Q----DFYDFKAVNIRGKLVSLEKYRGSVSLVVNVASECGFTDQHYRALQQLQRDLGPHHFNV
      target    ENLYFQSMINSFYAFEVKDAKGRTVSLEKYKGKVSLVVNVASDCQLTDRNYLGLKELHKEFGPSHFSV
      _consrvd       *     ** *      *  ****** * ********* *  **  *  *  *    ** ** *

      _aln.pos  70        80        90       100       110       120       130
      2p31A     LAFPCNQFGQQEPDSNKEIESFARRTYSVSFPMFSKIAVTGTGAHPAFKYLAQTSGKEPTWNFWKYLV
      target    LAFPCNQFGESEPRPSKEVESFARKNYGVTFPIFHKIKILGSEGEPAFRFLVDSSKKEPRWNFWKYLV
      _consrvd  *********  **   ** *****  * * ** * **   *    ***  *   * *** ********

      _aln.pos   140       150       160       170
      2p31A     APDGKVVGAWDPTVSVEEVRPQITALVR----------
      target    NPEGQVVKFWRPEEPIEVIRPDIAALVRQVIIKKKEDL
      _consrvd   * * **  * *    *  ** * ****
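A common first check on a template is its sequence identity to the target. This sketch counts identical residues over columns where neither sequence has a gap. That is one of several conventions; some tools divide by the length of the shorter sequence instead. The two aligned strings are the 2p31A and target rows from the alignment above, joined together, and the helper name `identity` is hypothetical.

```python
def identity(aln_a, aln_b):
    """Return (identical residues, aligned pairs) for two equal-length aligned strings."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    same = sum(x == y for x, y in pairs)
    return same, len(pairs)


template = ("-----Q----DFYDFKAVNIRGKLVSLEKYRGSVSLVVNVASECGFTDQHYRALQQLQRDLGPHHFNV"
            "LAFPCNQFGQQEPDSNKEIESFARRTYSVSFPMFSKIAVTGTGAHPAFKYLAQTSGKEPTWNFWKYLV"
            "APDGKVVGAWDPTVSVEEVRPQITALVR----------")
target = ("ENLYFQSMINSFYAFEVKDAKGRTVSLEKYKGKVSLVVNVASDCQLTDRNYLGLKELHKEFGPSHFSV"
          "LAFPCNQFGESEPRPSKEVESFARKNYGVTFPIFHKIKILGSEGEPAFRFLVDSSKKEPRWNFWKYLV"
          "NPEGQVVKFWRPEEPIEVIRPDIAALVRQVIIKKKEDL")

same, aligned = identity(template, target)
print(f"{same}/{aligned} = {100 * same / aligned:.1f}% identity")
```

For this alignment, the count agrees with the 88 asterisks in the `_consrvd` rows. It gives 88 identical residues over 155 aligned positions, about 56.8% identity.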

  49. Creating the model
      from modeller import *
      from modeller.automodel import *

      log.verbose()
      env = environ()

      template = '2p31'
      chain = 'A'
      tc = template + chain

      class MyModel(automodel):
          # Name the models <sequence>_<number><ext>, e.g. target_1.pdb
          def get_model_filename(self, sequence, id1, id2, file_ext):
              return sequence + '_' + str(id2) + file_ext

          def special_restraints(self, aln):
              # Place to add extra restraints via rsr; none are added here
              rsr = self.restraints

      a = MyModel(env, alnfile='target-' + tc + '.ali', knowns=tc,
                  sequence='target',
                  assess_methods=(assess.DOPE, assess.GA341))
      a.starting_model = 1
      a.ending_model = 5
      a.make()
  • Five models are created • Each of them can be slightly different • The models are assessed using two different criteria (DOPE and GA341)
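Once the five models exist, one of them has to be picked. A common approach is to take the model with the lowest (best) DOPE score. This sketch assumes that `a.outputs`, filled in by `a.make()`, is a list of dicts with `'name'`, `'failure'` and `'DOPE score'` keys, as in Modeller's own tutorials; check this against your Modeller version. The helper name `best_model` is hypothetical, and the scores in the demo are invented.

```python
def best_model(outputs, key="DOPE score"):
    """Return the name of the successfully built model with the lowest score for `key`.

    Assumes Modeller-style output records: dicts with 'name', 'failure'
    (None on success) and a numeric score under `key`.
    """
    ok = [m for m in outputs if m.get("failure") is None]
    if not ok:
        raise RuntimeError("no model was built successfully")
    return min(ok, key=lambda m: m[key])["name"]


# Invented records standing in for a.outputs
fake_outputs = [
    {"name": "target_1.pdb", "failure": None, "DOPE score": -18000.5},
    {"name": "target_2.pdb", "failure": None, "DOPE score": -18250.0},
    {"name": "target_3.pdb", "failure": "optimisation error", "DOPE score": -19000.0},
]
print(best_model(fake_outputs))
```

After `a.make()`, the call would be `best_model(a.outputs)`. Lower DOPE is better, whereas GA341 runs from 0 to 1 with higher values better, so the two criteria should not be sorted the same way.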
