D2DBT9 - Genetic Analysis and Bioinformatics
Bioinformatics of Proteins in One and Three Dimensions
Dr. Jaume Bacardit
jaume.bacardit@nottingham.ac.uk
Learning outcomes
• To gain practical experience of using protein-related web-based biological databases and extracting information from them
• To gain practical experience of using public web-based protein structure prediction services
• To gain basic knowledge of how to use protein visualisation tools
• To gain basic practical experience of performing homology modelling
Protein we are going to use today…
• In most examples we are going to use the AXR4 protein from Arabidopsis thaliana
MAIITEEEEDPKTLNPPKNKPKDSDFTKSESTMKNPKPQSQNPFPFWFYFTVVVSLATII
FISLSLFSSQNDPRSWFLSLPPALRQHYSNGRTIKVQVNSNESPIEVFVAESGSIHTETV
VIVHGLGLSSFAFKEMIQSLGSKGIHSVAIDLPGNGFSDKSMVVIGGDREIGFVARVKEV
YGLIQEKGVFWAFDQMIETGDLPYEEIIKLQNSKRRSFKAIELGSEETARVLGQVIDTLG
LAPVHLVLHDSALGLASNWVSENWQSVRSVTLIDSSISPALPLWVLNVPGIREILLAFSF
GFEKLVSFRCSKEMTLSDIDAHRILLKGRNGREAVVASLNKLNHSFDIAQWGNSDGINGI
PMQVIWSSEASKEWSDEGQRVAKALPKAKFVTHSGSRWPQESKSGELADYISEFVSLLPK
SIRRVAEEPIPEEVQKVLEEAKAGDDHDHHHGHGHAHAGYSDAYGLGEEWTTT
Biological databases
• UniProt
• NCBI Entrez
• Pfam
UniProt
• UniProt is a collaboration between the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR).
• The main protein database
• http://www.uniprot.org/
Querying UniProt with a protein Name: AXR4 Uniprot ID
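The same lookup can also be scripted. The following is a minimal sketch, assuming UniProt's REST search service at rest.uniprot.org and its `gene` and `organism_id` query fields, none of which appear in these slides. `build_uniprot_search_url` is a hypothetical helper: it only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumed endpoint: UniProt's REST search service (not part of the original slides).
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_uniprot_search_url(gene, taxon_id, fmt="tsv"):
    """Return a search URL for a gene name restricted to one organism."""
    params = {
        "query": f"gene:{gene} AND organism_id:{taxon_id}",
        "format": fmt,
        "fields": "accession,id,protein_name,organism_name,length",
    }
    return UNIPROT_SEARCH + "?" + urlencode(params)


# 3702 is the NCBI taxonomy ID of Arabidopsis thaliana.
url = build_uniprot_search_url("AXR4", 3702)
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should list the matching entries as tab-separated text, assuming the service is reachable.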
Included in the AXR4 page….
• Annotation of protein
  • Function, location, specificity, disruption phenotype
• Gene Ontology
• Sequence
• Transmembrane potential
• Bibliographic references
• Cross-references to other databases
  • GenBank, PIR, KEGG, TAIR (Arabidopsis-specific)
Scrolling down through the AXR4 page…. If we click here….
Returns these results • Now we select the closest homologs and press align
UniProt is not the only source of protein information…. • The NCBI’s Entrez system returns this for AXR4
Pfam returns three possible sequence motifs (but no significant results)
Protein Data Bank (PDB)
• Put your PDB ID here
• Each protein in the PDB is identified by a 4-character code (for example, 2p31)
Entry 2p31 • Let’s click on “Display PDB file”
PDB file for 2p31
• Sequence
• Atomic coordinates of the amino acids
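The coordinate section of a PDB file is plain text with fixed-width columns, so it can be read with a few string slices. The sketch below is an illustration rather than a full parser: `ca_atoms` is a hypothetical helper that assumes the standard ATOM record layout (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, x/y/z in 31-54) and keeps only the C-alpha atoms of one chain.

```python
def ca_atoms(pdb_lines, chain_id):
    """Return (residue name, residue number, x, y, z) for each C-alpha atom of a chain."""
    atoms = []
    for line in pdb_lines:
        # Coordinates live in ATOM records; skip everything else and truncated lines.
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        atom_name = line[12:16].strip()
        chain = line[21]
        if atom_name != "CA" or chain != chain_id:
            continue
        res_name = line[17:20]
        res_num = int(line[22:26])
        x = float(line[30:38])
        y = float(line[38:46])
        z = float(line[46:54])
        atoms.append((res_name, res_num, x, y, z))
    return atoms
```

With the downloaded file saved as 2p31.pdb, passing the open file handle and "A" to `ca_atoms` would return one tuple per residue of chain A that has a C-alpha atom.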
Prediction sites
• Secondary structure prediction
• Prediction of structural aspects of residues
• Tertiary structure prediction
• Transmembrane prediction
• Functional sites prediction
• These servers perform very complex calculations
• They sometimes take a day or two (or more) to reply
• Generally users are notified by email when the results are ready
3D structure prediction • 3D-Jury is a meta-server for 3D protein structure prediction (PSP)
Results of 3D-Jury • Good source of templates
LOMETS
• The quick server from the Zhang group
• Zhang’s I-TASSER is the best publicly available PSP server
• Unfortunately it is very overloaded (for AXR4 it took 8 days to return a model)
• LOMETS performs fold recognition (FR) using several locally installed programs
• Generates homology models from the alignments obtained in the FR process
• Another good source of distant templates
mGenTHREADER Prediction results • More templates !!
Other 3D PSP servers
• FUGUE
• 3D-JIGSAW
• HHpred
• SAM-T08
• ROBETTA (David Baker’s server; heavily overloaded too)
• Results of CASP8 (to see how these servers perform)
Infobiotic.net PSP server
• Created here in Nottingham
• It predicts a wide variety of structural aspects of residues
PyMOL
• One of the best protein visualisation tools
• Free for educational use
• You can ask for a license at http://www.pymol.org/educational.html
• I have a license, so if you would like to use it on your personal computers, you can download it from http://www.cs.nott.ac.uk/~jqb/pymol-1_1edu1-bin-win32.zip
• I also have the Linux and MacOS versions
• Please, do not distribute it
Controls are at the top right of the screen
• One control (all) affects everything loaded into PyMOL
• You can also control each loaded protein/selection individually. Right now there is only one protein (2p31)
• Five types of controls:
  • Actions, Show, Hide, Label and Colour
To change to a cartoon visualisation…
• 2p31 Hide Everything
• 2p31 Show Cartoon
• 2p31 Colour Spectrum Rainbow
• Now click on the middle of the screen, drag the mouse and this is what you obtain….
Visualising only chain A
• As we saw on the PDB web site, this protein has two chains
• To visualise only one of them, we have to create a selection
• You have to type this at the PyMOL prompt:
  • PyMOL>select chainA, 2p31 and chain A
• chainA is the label of the selection
• Everything after the comma is the definition of the selection
• We can select chains, residues and even atoms
• Type “help selection” to see all possible options
Visualising only chain A • All Hide Everything • chainA Show Cartoon • chainA Color Spectrum Rainbow • chainA Action Zoom
Showing the protein surface • chainA Show Surface • Type this: set transparency=0.5
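The menu clicks and typed commands for chain A can also be replayed from a script. This is a sketch under assumptions: it targets a recent PyMOL whose Python interface exposes `cmd.select`, `cmd.hide`, `cmd.show`, `cmd.spectrum`, `cmd.zoom` and `cmd.set`, and `show_chain_as_cartoon` is a hypothetical helper name. The `cmd` object is passed in as an argument so that the function can be read and tried without PyMOL installed.

```python
def show_chain_as_cartoon(cmd, obj="2p31", chain="A", selection_name="chainA"):
    """Replay the chain A visualisation steps through PyMOL's command interface."""
    # select chainA, 2p31 and chain A
    cmd.select(selection_name, f"{obj} and chain {chain}")
    # All > Hide > Everything
    cmd.hide("everything", "all")
    # chainA > Show > Cartoon
    cmd.show("cartoon", selection_name)
    # chainA > Color > Spectrum > Rainbow
    cmd.spectrum("count", "rainbow", selection_name)
    # chainA > Action > Zoom
    cmd.zoom(selection_name)
    # chainA > Show > Surface, then make the surface half transparent
    cmd.show("surface", selection_name)
    cmd.set("transparency", 0.5)
```

Inside a PyMOL session with 2p31 already loaded, this would be called as `show_chain_as_cartoon(cmd)` after `from pymol import cmd`.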
Simple Homology Modelling
• We are going to use Modeller
• Free for academic use
• http://salilab.org/modeller/9v6/modeller9v6.exe
• Licence key: MODELIRANJE
• 1st step: Installing it
  • When choosing the destination path, choose c:\temp (in B08/B09)
• Modeller is a very sophisticated tool in which you can control almost any aspect of the homology modelling process
• Here we are only going to use the simplest options
Chain we are going to model
• T0388 (CASP target ID): LOC493869A, Homo sapiens
ENLYFQSMINSFYAFEVKDAKGRTVSLEKYKGKVSLVVNVASDCQLTDRNYLGLKELHKEFGPSHFSVLAFPCNQFGESEPRPSKEVESFARKNYGVTFPIFHKIKILGSEGEPAFRFLVDSSKKEPRWNFWKYLVNPEGQVVKFWRPEEPIEVIRPDIAALVRQVIIKKKEDL
Selecting the template • The perfect match exists, because right now the structure for this target is already public • We are going to ignore it, and use chain A of protein 2p31 instead
2nd step: Creating an alignment
• Modeller has a sophisticated alignment tool
  • Uses structural information from the template
  • Dynamic programming instead of the approximate method of BLAST
• To create the alignment you need to:
  • Download the PDB file of the template
  • Put your sequence in PIR format (example)
  • Edit the alignment script to set the template and chain
  • Call Modeller: mod9v6.exe align.py
PIR file
• Just replace the sequence with your own
• The last line of the sequence needs to end in *
• Do not touch anything else in the file, or the alignment script will not work
>P1;target
sequence:target:::::::0.00: 0.00
ENLYFQSMINSFYAFEVKDAKGRTVSLEKYKGKVSLVVNVASDCQLTDRNYLGLKELHKE
FGPSHFSVLAFPCNQFGESEPRPSKEVESFARKNYGVTFPIFHKIKILGSEGEPAFRFLV
DSSKKEPRWNFWKYLVNPEGQVVKFWRPEEPIEVIRPDIAALVRQVIIKKKEDL*
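If you would rather generate the PIR file than edit it by hand, a short script can wrap the sequence and add the terminating asterisk. This is a minimal sketch: `to_pir` is a hypothetical helper, and the two header lines are copied from the example file on this slide rather than derived from the Modeller documentation.

```python
def to_pir(sequence, code="target", width=60):
    """Format a raw amino-acid sequence as a PIR entry like the one on the slide."""
    # Remove any whitespace or line breaks pasted along with the sequence.
    residues = "".join(sequence.split()).upper()
    # The sequence must be terminated by '*'.
    terminated = residues + "*"
    body = [terminated[i:i + width] for i in range(0, len(terminated), width)]
    header = [f">P1;{code}", f"sequence:{code}:::::::0.00: 0.00"]
    return "\n".join(header + body) + "\n"
```

Writing the returned string to target.ali gives a file with the name that the alignment script expects.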
Align.py
from modeller import *
from modeller.automodel import *

env = environ()
aln = alignment(env)
template = '2p31'   # PDB ID of the template
chain = 'A'         # chain of the template to use
tc = template + chain
# Read only the chosen chain of the template structure
mdl = model(env, file=template, model_segment=('FIRST:'+chain, 'LAST:'+chain))
aln.append_model(mdl, align_codes=tc, atom_files=template+'.pdb')
# Add the target sequence from the PIR file
aln.append(file='target.ali', align_codes='target')
# Align the target sequence to the template structure
aln.align2d()
aln.write(file='target-'+tc+'.ali', alignment_format='PIR')
aln.write(file='target-'+tc+'.pap', alignment_format='PAP')
• Just change the values of the template and chain lines to match your template
Results of the alignment
• Alignment is different from that produced by BLAST
• Modeller has ignored the residues lacking structural information
_aln.pos          10        20        30        40        50        60
2p31A     -----Q----DFYDFKAVNIRGKLVSLEKYRGSVSLVVNVASECGFTDQHYRALQQLQRDLGPHHFNV
target    ENLYFQSMINSFYAFEVKDAKGRTVSLEKYKGKVSLVVNVASDCQLTDRNYLGLKELHKEFGPSHFSV
_consrvd       *     ** *      *  ****** * ********* *  **  *  *  *    ** ** *

_aln.p    70        80        90       100       110       120       130
2p31A     LAFPCNQFGQQEPDSNKEIESFARRTYSVSFPMFSKIAVTGTGAHPAFKYLAQTSGKEPTWNFWKYLV
target    LAFPCNQFGESEPRPSKEVESFARKNYGVTFPIFHKIKILGSEGEPAFRFLVDSSKKEPRWNFWKYLV
_consrvd  *********  **   ** *****  * * ** * **   *    ***  *   * *** ********

_aln.pos     140       150       160       170
2p31A     APDGKVVGAWDPTVSVEEVRPQITALVR----------
target    NPEGQVVKFWRPEEPIEVIRPDIAALVRQVIIKKKEDL
_consrvd   * * **  * *    *  ** * ****
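The asterisks in the _consrvd row mark columns where template and target carry the same residue, so counting them gives a quick measure of how similar the two sequences are. The sketch below uses a hypothetical helper, `percent_identity`, and one common convention (identity over columns where neither sequence has a gap); other definitions of sequence identity exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Last block of the alignment above (positions 137 to 174).
template_block = "APDGKVVGAWDPTVSVEEVRPQITALVR----------"
target_block = "NPEGQVVKFWRPEEPIEVIRPDIAALVRQVIIKKKEDL"
print(percent_identity(template_block, target_block))
```

To score the whole alignment, concatenate the three blocks of each sequence before calling the function.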
Creating the model
from modeller import *
from modeller.automodel import *

log.verbose()
env = environ()
template = '2p31'
chain = 'A'
tc = template + chain

class MyModel(automodel):
    def get_model_filename(self, sequence, id1, id2, file_ext):
        # Name each model <sequence>_<model number><extension>
        return sequence + '_' + str(id2) + file_ext

    def special_restraints(self, aln):
        rsr = self.restraints   # no extra restraints are added here

a = MyModel(env, alnfile='target-'+tc+'.ali', knowns=tc, sequence='target',
            assess_methods=(assess.DOPE, assess.GA341))
a.starting_model = 1
a.ending_model = 5
a.make()
• 5 models are created
• Each of them can be slightly different
• Models are going to be assessed using 2 different criteria (DOPE and GA341)
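Once the five models are built, one of them has to be chosen. A minimal sketch, assuming that after `a.make()` the attribute `a.outputs` is a list of dictionaries with the keys 'name', 'failure' and 'DOPE score', as in the Modeller tutorial; `best_model` is a hypothetical helper, not part of Modeller. Lower DOPE scores are better.

```python
def best_model(outputs, key="DOPE score"):
    """Return the successfully built model with the lowest score under `key`."""
    # Models that failed to build carry a non-None 'failure' entry (assumed layout).
    built = [m for m in outputs if m.get("failure") is None]
    if not built:
        raise ValueError("no model was built successfully")
    return min(built, key=lambda m: m[key])
```

In the modelling script this would be called as `best_model(a.outputs)` after `a.make()`. For GA341, where higher values are better, `min` would need to be replaced by `max`.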