
Structure Prediction and Modeling of a Eukaryotic Member of the Major Facilitator Superfamily



Presentation Transcript


  1. Structure Prediction and Modeling of a Eukaryotic Member of the Major Facilitator Superfamily Gaurav Narale

  2. Major Facilitator Superfamily (MFS) • MEMBRANE TRANSPORT • Largest known family of secondary transporters, with more than 1000 members identified [1]. • Members use a solute gradient to drive the translocation of substrates such as ions, sugars, amino acids, peptides and other hydrophilic solutes [2]. • Typically 400-600 amino acids long. • 12 transmembrane α-helices, with both the N- and C-termini in the cytosol [3]. • Two six-helix halves connected by a central loop. • Found in all three domains of life.

  3. Identifying Templates and Targets • TEMPLATES - two known structures: • Lactose permease (LacY), E. coli • Glycerol-3-phosphate transporter (GlpT), E. coli • Sequence identity between the two is negligible (~9%). • Structural alignment with the CE algorithm nevertheless shows that they superimpose over most of their chain length (RMSD ≈ 3.7 Å). • 1st GOAL: find a eukaryotic member of the MFS with enough sequence identity to one of the known structures to allow a reasonable alignment.
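
The RMSD that CE reports is taken over matched C-alpha atoms after the two structures have been superimposed. A minimal sketch of just that final step, assuming the coordinates are already paired and superimposed (CE's own alignment and superposition search are not reproduced here, and the function name is hypothetical):

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the i-th atom of coords_a is matched to the i-th atom of
    coords_b and that both sets are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one matched pair of atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))  # 1.4142135623730951
```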

  4. Function and Mechanism of LacY and GlpT • Both use a solute gradient to drive translocation of their substrate: - LacY mediates the coupled transport of lactose and H+ - GlpT catalyzes the exchange of glycerol-3-phosphate for phosphate • Alternating-access model • Outward-facing conformation: exposed to the extracellular side. • Inward-facing conformation: exposed to the cytoplasm. • Ribbon representation • Amino-terminal domain (blue). • Carboxyl-terminal domain (green). • Bends and other irregularities in the α-helices appear as deviations from an ideally straight, continuous helical ribbon.

  5. Identifying Templates and Targets • Lactose permease (LacY) • Obtained the PDB file from the Protein Data Bank (1PV6) and extracted the amino acid sequence in FASTA format. www.rcsb.org/pdb • Searched for a TARGET with high sequence identity using NCBI BLAST (PSI-BLAST). www.ncbi.nlm.nih.gov 1. General search against all organisms: 2 iterations, threshold 0.005; hits were mainly bacterial proteins. 2. Saved the results as a profile (PSSM). 3. More sensitive search restricted to eukaryotes, using the original sequence together with the saved profile as input: 2 iterations, threshold 0.01. • Unable to identify a suitable target.
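
The three steps above are an iterative PSI-BLAST search. A sketch of how the same steps might be expressed with the BLAST+ `psiblast` program, assuming a local BLAST+ install with remote access to NCBI; the helper `psiblast_args` and the file names `1pv6.fasta` and `lacy.pssm` are hypothetical. BLAST+ does not accept `-query` together with `-in_pssm` (the checkpoint already carries the query), so step 3 is seeded from the profile alone. Check the flag combinations against `psiblast -help` for your version.

```python
def psiblast_args(query_fasta=None, in_pssm=None, db="nr", iterations=2,
                  inclusion_ethresh=0.005, out_pssm=None, entrez_query=None):
    """Build (but do not run) a psiblast command as an argument list."""
    # The query comes either from a FASTA file or from a saved checkpoint
    # PSSM; psiblast treats the two inputs as mutually exclusive.
    if (query_fasta is None) == (in_pssm is None):
        raise ValueError("give exactly one of query_fasta or in_pssm")
    args = ["psiblast", "-remote", "-db", db,
            "-num_iterations", str(iterations),
            "-inclusion_ethresh", str(inclusion_ethresh)]
    if query_fasta is not None:
        args += ["-query", query_fasta]
    else:
        args += ["-in_pssm", in_pssm]
    if out_pssm is not None:
        args += ["-out_pssm", out_pssm]          # save the profile for reuse
    if entrez_query is not None:
        args += ["-entrez_query", entrez_query]  # restrict the searched organisms
    return args


# Step 1: broad search seeded with the LacY sequence, saving the profile.
step1 = psiblast_args(query_fasta="1pv6.fasta", out_pssm="lacy.pssm")
# Step 3: profile-seeded search limited to eukaryotes at a looser threshold.
step3 = psiblast_args(in_pssm="lacy.pssm", inclusion_ethresh=0.01,
                      entrez_query="Eukaryota[Organism]")
# To run: subprocess.run(step1, check=True)  (needs BLAST+ and network access)
```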

  6. Identifying Templates and Targets • Glycerol-3-phosphate transporter (GlpT) • Obtained the PDB file from the Protein Data Bank (1PW4) and extracted the amino acid sequence in FASTA format. www.rcsb.org/pdb • Searched for a TARGET with high sequence identity using NCBI BLAST. www.ncbi.nlm.nih.gov 1. General search against all organisms: 2 iterations, threshold 0.005. 2. Obtained a suitable TARGET: glucose-6-phosphate translocase (Homo sapiens). 3. Used BLink to identify several eukaryotic "close targets" for use in multiple sequence alignments.
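
BLink is a web view of precomputed BLAST hits. As a stand-in for that step, here is a sketch of picking "close targets" from a BLAST+ tabular report (`-outfmt 6`, default 12 columns). The function name and the identity and E-value cut-offs are hypothetical, not values used in this study.

```python
def close_targets(tabular_lines, min_pident=30.0, max_evalue=1e-10):
    """Select distinct subjects from BLAST+ '-outfmt 6' output.

    Assumes the default columns: qaccver saccver pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        subject, pident, evalue = cols[1], float(cols[2]), float(cols[10])
        if pident < min_pident or evalue > max_evalue:
            continue
        # Keep the most significant (lowest E-value) hit per subject.
        if subject not in best or evalue < best[subject][1]:
            best[subject] = (pident, evalue)
    # Most significant subjects first.
    return sorted(best, key=lambda s: best[s][1])
```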

  7. Multiple sequence alignment • Template and target only (initial review) • Both templates, the target and the close targets • 15 proteins similar to the target, from different species, selected to obtain a better alignment • Template and target then extracted • Around 30% similarity between template and target • Well-distributed alignment
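
The slide quotes about 30% similarity. Similarity also credits conservative substitutions via a scoring matrix; the stricter figure, percent identity, can be read directly off the extracted pairwise alignment. A minimal sketch (the helper name is hypothetical) that counts only columns where both sequences have a residue, which is one of several conventions in use:

```python
def percent_identity(aligned_a, aligned_b, non_residues="-./"):
    """Percent identity of two gapped sequences of equal length.

    Columns where either sequence has a gap or chain-break symbol are
    ignored; the denominator is the number of residue-residue columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in non_residues or b in non_residues:
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("AC-DE", "ACQDF"))  # 75.0
```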

  8. Alignment using FUGUE

hs1pw4a (5)    fkpaphkarlpaaeidptYrrlrwqIflGIffGyaAYylVRkNFALAMpy
QUERY g6pt     -------------MAAQGYGYYRTVIFSAMFGGYSLYYFNRKTFSFVMPS

hs1pw4a (55)   L-veqgfsrgDLGfALSGISiAygfSkfimgsvSdrsnPrvfLPaGLilA
QUERY g6pt     LVEEIPLDKDDLGFITSSQSAAYAISKFVSGVLSDQMSARWLFSSGLLLV

hs1pw4a (104)  AavMlfMGfvpwATssiavMfvlLflCGwfQGmGwpPCgrTmvhwwsqke
QUERY g6pt     GLVNIFFAWSSTV----PVFAALWFLNGLAQGLGWPPCGKVLRKWFEPSQ

hs1pw4a (154)  rggivsVwncAhNvggGiPPllFllGmawfndwhAALYmPAfcAilvAlf
QUERY g6pt     FGTWWAILSTSMNLAGGLGPILATILAQSY-SWRSTLALSGALCVVVSFL

hs1pw4a (204)  AfamMrdTpqsCglppiee-----ykndtakqifmqyVlpnklLwyIAiA
QUERY g6pt     CLLLIHNEPADVGLRNLDPMPSEGKKGSLKEESTLQELLLSPYLWVLSTG

hs1pw4a (262)  NvfVyLLRYGiLDwSPtylkevKhfaldkSSwAYflYEyagipGTllCgw
QUERY g6pt     YLVVFGVKTCCTDWGQFFLIQEKGQSALVGSSYMSALEVGGLVGSIAAGY

hs1pw4a (312)  msdkv----------frgnrGaTGvfFMtlVtiaTivywmnpagNptvdm
QUERY g6pt     LSDRAMAKAGLSNYGNPRHGLLLFMMAGMTVSMYLFRVTVTSDSPKLWIL

hs1pw4a (352)  iCmivIGflIyGPvmLIglHAleLApkkAagtAagfTglfGylgGSvaAs
QUERY g6pt     VLGAVFGFSSYGPIALFGVIANESAPPNLCGTSHAIVGLMANVGGFL-AG

hs1pw4a (402)  aiVGytvdffgwdgGfmvMigGSilAvilLivVmigekrrheqllqelvp
QUERY g6pt     LPFSTIAKHYSWSTAFWVAEVICAASTAAFFLLRNIRTKMGRVSKKAE--

  9. MPSA - only template and target

P_1P4W         FKPAPHKARLPAAEIDPTYRRLRWQIFLGIFFGYAAYYLVRKNFALAMPYLVEQG-FSRG
GLUCOSE6HUMAN  -------------MAAQGYGYYRTVIFSAMFGGYSLYYFNRKTFSFVMPSLVEEIPLDKD

P_1P4W         DLGFALSGISIAYGFSKFIMGSVSDRSNPRVFLPAGLILAAAVMLFMGFVPWATSSIAVM
GLUCOSE6HUMAN  DLGFITSSQSAAYAISKFVSGVLSDQMSARWLFSSGLLLVGLVNIFFAWS----STVPVF

P_1P4W         FVLLFLCGWFQGMGWPPCGRTMVHWWSQKERGGIVSVWNCAHNVGGGIPPLLFLLGMAWF
GLUCOSE6HUMAN  AALWFLNGLAQGLGWPPCGKVLRKWFEPSQFGTWWAILSTSMNLAGGLGPILATI-LAQS

P_1P4W         NDWHAALYMPAFCAILVALFAFAMMRDTPQSCGLP-----PIEEYKNDTAKQIFMQYVLP
GLUCOSE6HUMAN  YSWRSTLALSGALCVVVSFLCLLLIHNEPADVGLRNLDPMPSEGKKGSLKEESTLQELLL

P_1P4W         NKLLWYIAIANVFVYLLRYGILDWSPTYLKEVKHFALDKSSWAYFLYEYAGIPGTLLCGW
GLUCOSE6HUMAN  SPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQEKGQSALVGSSYMSALEVGGLVGSIAAGY

P_1P4W         MSDKVFRGN--------RGATGVFFMTLVTIATIVYWMNPAGN--PTVDMICMIVIGFLI
GLUCOSE6HUMAN  LSDRAMAKAGLSNYGNPRHGLLLFMMAGMTVSMYLFRVTVTSDSPKLWILVLGAVFGFSS

P_1P4W         YGPVMLIGLHALELAPKKAAGTAAGFTGLFGYLGGSVAASAIVGYTVDFFGWDGGFMVMI
GLUCOSE6HUMAN  YGPIALFGVIANESAPPNLCGTSHAIVGLMANVGGFLAGLPFSTIAKHYSWSTAFWVAEV

P_1P4W         GGSILAVILLIVVMIGEKRRHEQLLQELVP
GLUCOSE6HUMAN  ICAASTAAFFLLRNIRTKMGRVSKKAE---

  10. Extracted template-target

P_1PW4        -------FKPAPHKARLPAAEIDPTYRRLRWQIFLGIFFGYAAYYLVRKNFALAMPYLVE
gi|2765461|e  --------------------MAAQGYGYYRTVIFSAMFGGYSLYYFNRKTFSFVMPSLVE

P_1PW4        QGFS---RGDLGFALSGISIAYGFSKFIMGSVSDRSNPRVFLPAGLILAAAVMLFMGFVP
gi|2765461|e  EIPLD--KDDLGFITSSQSAAYAISKFVSGVLSDQMSARWLFSSGLLLVGLVNIFFAWSS

P_1PW4        WATSS--IAVMFVLLFLCGWFQGMGWPPCGRTMVHWWSQKERGGIVSVWNCAHN--VGGG
gi|2765461|e  TVP------VFAALWFLNGLAQGLGWPPCGKVLRKWFEPSQFGTWWAILSTSMN--LAGG

P_1PW4        IPP-------LLFLLGMAWFN-----------DWHAALYMPAFCAILVALFAFAMMRDTP
gi|2765461|e  LGP-------ILATILAQSYS------------WRSTLALSGALCVVVSFLCLLLIHNEP

P_1PW4        QSCGLPPIEEYKNDT-------------------AKQIFMQYVLPNKLLWYIAIANVFVY
gi|2765461|e  ADVGLRNLDPMPSEG--------------KKGSLKEESTLQELLLSPYLWVLSTGYLVVF

P_1PW4        LLRYGILDWSPTYLKEVKHFALDK-SSWAYFLYEYAGIPGTLLCGWMSDKVFR-------
gi|2765461|e  GVKTCCTDWGQFFLIQEKGQSALV-GSSYMSALEVGGLVGSIAAGYLSDRAMAKAGLSNY

P_1PW4        -GNRGATGVFFMTLVTIATIVYWMNPAG---------------NPTVDMICMIVIGFLIY
gi|2765461|e  GNPRHGLLLFMMAGMTVSMYLFRVTVTSD-----------S--PKLWILVLGAVFGFSSY

P_1PW4        GP-VMLIGLHALELAPKKAAGTAAGFTGLFGYLGGSVAASAIVGYTVDF-FGWDGGFMVM
gi|2765461|e  GP-IALFGVIANESAPPNLCGTSHAIVGLMANVG-GFLAGLPFSTIAKH-YSWSTAFWVA

P_1PW4        IGGSILAVILLIVVMIGEKRRHEQLLQELVP-----------------------------
gi|2765461|e  EVICAASTAAFFLLRNIRTKMGRVSKKAE-------------------------------
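
MODELER reads the extracted pair as a PIR-format alignment, with the template entry pointing at the structure and the target entry marked as a plain sequence. A sketch of writing one entry; the helper `pir_entry` is hypothetical, the 10-field header follows the layout commonly shown in MODELLER examples (with `FIRST`/`LAST` to take the whole chain), and reusing the alignment code as the PDB file code (`1pw4`) is an assumption about your file naming. Gaps (`-`) and chain breaks (`/`) are copied through unchanged.

```python
def pir_entry(align_code, aligned_seq, structure=False, chain="A", width=75):
    """Format one entry of a MODELLER-style PIR alignment file."""
    if structure:
        # structureX:file code:start res:start chain:end res:end chain:name:source:resolution:R-factor
        fields = ["structureX", align_code, "FIRST", chain, "LAST", chain,
                  "", "", "", ""]
    else:
        fields = ["sequence", align_code] + [""] * 8
    body = aligned_seq + "*"  # '*' terminates the sequence
    lines = [f">P1;{align_code}", ":".join(fields)]
    # Wrap the sequence so no line exceeds `width` characters.
    lines += [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join(lines)


print(pir_entry("g6pt", "MAAQGYGYYRTV"))
```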

  11. Checking alignment in MODELER using the chk_align.top script (alignment columns ~210-270)

1PW4  MRDTPQSCGLPPIEEYKND/T-----AKQIFMQYVLPNKLLWYIAIANVFVYLLRYGILDWSPTYLKE
G6PT  IHNEPADVGLRNLDPMPSE-GKKGSLKEESTLQELLLSPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQ

Problem near chain break

1PW4  MRDTPQSCGLPPIEEYKND/----TAKQIFMQYVLPNKLLWYIAIANVFVYLLRYGILDWSPTYLKEV
G6PT  IHNEPADVGLRNLDPMPSEGKKGSLKEESTLQELLLSPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQE

  12. Modeler Runs • Used the extracted template-target alignment • Template sequence extracted from the structure using Insight • Residues missing from the structure appear as chain breaks • Parameters: • OUTPUT_CONTROL = 1 1 1 1 1 • STARTING_MODEL = 1 • ENDING_MODEL = 5 • LIBRARY_SCHEDULE = 4 • MD_LEVEL = 'refine_1'

  13. PROSA 2 runs • Used to evaluate the models • The models with the best MODELER scores were compared using PROSA • Z score used for the initial comparison • Energy graph used to locate the major violations
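
For reference, the ProSA Z score is generally described as the model's energy expressed in standard deviations relative to the energies of the same sequence in alternative conformations (the symbols below are ours, not the program's):

```latex
Z = \frac{E_{\mathrm{model}} - \bar{E}_{\mathrm{alt}}}{\sigma_{\mathrm{alt}}}
```

Here $\bar{E}_{\mathrm{alt}}$ and $\sigma_{\mathrm{alt}}$ are the mean and standard deviation of the alternative-conformation energies. A more negative Z places the model further below that background distribution, which is why a model Z close to the template's (-7.3 for 1PW4) is read as a good sign.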

  14. Model Selection Criteria • MODELER log file • Minimum energy • Number of violations • Number of really bad violations • Location of violations with respect to the alignment and the structure • PROSA 2 log file • Z score closest to that of the template • Peaks and troughs in the energy graph relative to the template
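
The slides do not say how the MODELER and PROSA criteria were weighed against each other. One simple, swapped-in way to combine two of them is a Pareto filter: keep every model that no other model beats on both energy and distance from the template's Z score. The class and function names are hypothetical; the numbers are the Loop Modeling Run 1 values reported later in this presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    ids: tuple       # (ID1, ID2) as reported in the MODELER log
    energy: float    # MODELER "Current energy"; lower is better
    z_score: float   # PROSA 2 Z score


def pareto_front(candidates, template_z):
    """Keep candidates not dominated on (energy, |Z - template_z|), both minimized."""
    def costs(c):
        return (c.energy, abs(c.z_score - template_z))

    def dominates(a, b):
        ca, cb = costs(a), costs(b)
        # a is at least as good on every criterion and strictly better on one.
        return all(x <= y for x, y in zip(ca, cb)) and ca != cb

    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]


run1 = [Candidate((1, 5), 192, -6.60), Candidate((3, 2), 387, -6.57),
        Candidate((4, 2), 363, -6.76), Candidate((5, 4), 242, -6.3)]
print([c.ids for c in pareto_front(run1, template_z=-7.3)])  # [(1, 5), (4, 2)]
```

Violation counts and their locations could be added as further entries in `costs()`; the filter only narrows the shortlist, and the final pick still needs the structural inspection described in these slides.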

  15. Adjusting the alignment • Structures obtained from MODELER compared in Insight • Alignment violations clearly visible • Criteria for modifying the alignment: • Unequal number of residues in loops • Unsatisfied structural similarity constraints • Residues violating the constraints generated by MODELER

  16. 1st run - adjustment in Insight

  17. Loop Modeling • Modeler Run 2 • Loop Modeling Run 1

  18. Loop modeling • Models generated from the adjusted alignment • 25 models obtained • Models selected on minimum energy and constraint violations • Parameters: • OUTPUT_CONTROL = 1 1 1 1 1 • STARTING_MODEL = 1 • ENDING_MODEL = 5 • LIBRARY_SCHEDULE = 2 • MD_LEVEL = 'refine_3' • DO_LOOPS = 1 • LOOP_ENDING_MODEL = 5 • LOOP_MD_LEVEL = 'refine_3'
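
The count of 25 follows from the parameters: 5 initial models, each with 5 loop-refined variants. A sketch of the resulting model pairs, assuming the loop models are numbered from 1 (LOOP_STARTING_MODEL is not shown on the slide) and that the (ID1, ID2) pairs in the logs index these combinations; both are assumptions.

```python
from itertools import product

# From the slide parameters.
STARTING_MODEL, ENDING_MODEL = 1, 5
# Assumed: loop models numbered 1..LOOP_ENDING_MODEL for every initial model.
LOOP_STARTING_MODEL, LOOP_ENDING_MODEL = 1, 5

pairs = list(product(range(STARTING_MODEL, ENDING_MODEL + 1),
                     range(LOOP_STARTING_MODEL, LOOP_ENDING_MODEL + 1)))
print(len(pairs))  # 25
```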

  19. Loop Modeling Run 1: Best 4 Models Picked
ID1, ID2   Current energy   PROSA Z score
1 5        192              -6.60
3 2        387              -6.57
4 2        363              -6.76
5 4        242              -6.3
Z score of template: -7.3

  20. Violations - MODELER log file
ID1, ID2: 1 5    Current energy: 192.1849
# RESTRAINT_GROUP NUM NUMVI NUMVP RMS_1 RMS_2 MOL.PDF S_i
25 Phi/Psi pair of dihedral restraints: 64 44 11 36.170 140.638 79.036 1.000
Feature 25 : Phi/Psi pair of dihedral restraints
List of the RVIOL violations larger than : 6.5000
# ICSR RESNO1/2 ATM1/2 INDATM1/2 FEAT restr viol rviol RESTR VIOL RVIOL
7 1360 45D 46K C N 368 370 -68.99 -70.20 30.80 2.20 -62.90 150.55 19.23
7 46K 46K N CA 370 371 109.62 140.40 -40.80
8 1361 46K 47D C N 377 379 173.18 54.50 123.21 12.43 -63.30 132.44 18.20
8 47D 47D N CA 379 380 7.79 40.90 -40.00
9 1362 47D 48D C N 385 387 -138.58 -63.30 76.02 11.52 -63.30 76.02 11.52
9 48D 48D N CA 387 388 -29.45 -40.00 -40.00
12 1369 103F 104A C N 811 813 -69.81 -68.20 21.24 1.77 -62.50 165.18 26.73
12 104A 104A N CA 813 814 124.12 145.30 -40.90
13 1370 104A 105A C N 816 818 -169.75 -62.50 107.58 21.02 -62.50 107.58 21.02
13 105A 105A N CA 818 819 -49.29 -40.90 -40.90
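
Comparing runs means reading restraint-group summary rows like the Phi/Psi row above. A small parser sketch, assuming the whitespace layout shown in these excerpts (group number, a name ending in a colon, then the seven columns NUM through S_i); the regular expression and function name are hypothetical, and the column meanings are left as the log labels them.

```python
import re

SUMMARY_ROW = re.compile(
    r"^\s*(?P<group>\d+)\s+(?P<name>[^:]+):\s+"
    r"(?P<num>\d+)\s+(?P<numvi>\d+)\s+(?P<numvp>\d+)\s+"
    r"(?P<rms_1>-?\d+\.\d+)\s+(?P<rms_2>-?\d+\.\d+)\s+"
    r"(?P<mol_pdf>-?\d+\.\d+)\s+(?P<s_i>-?\d+\.\d+)\s*$"
)
INT_FIELDS = ("num", "numvi", "numvp")
FLOAT_FIELDS = ("rms_1", "rms_2", "mol_pdf", "s_i")


def parse_summary(line):
    """Return a dict for one restraint-group summary row, or None otherwise."""
    match = SUMMARY_ROW.match(line)
    if match is None:
        return None  # header, feature title, or violation detail row
    row = {"group": int(match["group"]), "name": match["name"].strip()}
    row.update({key: int(match[key]) for key in INT_FIELDS})
    row.update({key: float(match[key]) for key in FLOAT_FIELDS})
    return row
```

Applied to the Phi/Psi rows of the two loop-model logs in this presentation, it gives NUMVI = 44 for the first loop model and 40 for the refined one.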

  21. 1st loop model - violations in Insight (labeled: residue 104, residue 46)

  22. Loop Model Run 1 - adjustment

  23. Loop Modeling 2 • Refinement of Loop Model 1 • Loop Modeling 2 • Modeler Run 3

  24. Loop Modeling Run 2: Best 5 Models
ID1, ID2   Current energy   PROSA Z score
5 1        237.4322         -5.82
3 1        222.2522         -6.27
1 1        195.7286         -6.32
2 4        226.8002         -6.09
2 2        198.0359         -6.15

  25. Violations - MODELER log file
ID1, ID2: 1 1    Current energy: 195.7286
# RESTRAINT_GROUP NUM NUMVI NUMVP RMS_1 RMS_2 MOL.PDF S_i
4 Stereochemical improper torsion pot: 156 1 2 1.943 1.943 16.723 1.000
25 Phi/Psi pair of dihedral restraints: 67 40 11 34.260 132.074 73.358 1.000
Feature 25 : Phi/Psi pair of dihedral restraints
List of the RVIOL violations larger than : 6.5000
# ICSR RESNO1/2 ATM1/2 INDATM1/2 FEAT restr viol rviol RESTR VIOL RVIOL
3 1430 45D 46K C N 368 370 -103.79 -118.00 33.92 1.76 -62.90 154.80 22.53
3 46K 46K N CA 370 371 169.89 139.10 -40.80
4 1431 46K 47D C N 377 379 -95.02 -70.90 59.16 2.00 -63.30 119.95 16.85
4 47D 47D N CA 379 380 -155.68 150.30 -40.00
5 1432 47D 48D C N 385 387 -63.33 -70.90 31.08 1.19 -63.30 160.16 19.77
5 48D 48D N CA 387 388 120.16 150.30 -40.00
9 1441 103F 104A C N 811 813 -122.41 -134.00 20.39 1.24 -62.50 166.47 30.50
9 104A 104A N CA 813 814 163.78 147.00 -40.90
10 1442 104A 105A C N 816 818 -64.90 -68.20 29.69 2.28 -62.50 156.71 25.57
10 105A 105A N CA 818 819 115.80 145.30 -40.90
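
The residues highlighted in Insight (46 and 104) are the ones listed in the RVIOL section of the log. A sketch that pulls those rows out automatically, assuming each listed restraint starts with a 15-field row (restraint index, ICSR, two residue labels, two atom names, two atom indices, then seven numbers ending in RVIOL) followed by a shorter continuation row, as in the excerpts on these slides. The function name is hypothetical; the 6.5 default mirrors the log's own cutoff.

```python
def large_violations(log_lines, cutoff=6.5):
    """Return (residue1, residue2, RVIOL) for each primary violation row."""
    found = []
    for line in log_lines:
        tokens = line.split()
        # Primary rows: 15 tokens, the first two being integers (index, ICSR).
        # Continuation rows, headers and summary rows fail this test.
        if len(tokens) != 15 or not (tokens[0].isdigit() and tokens[1].isdigit()):
            continue
        rviol = float(tokens[-1])  # last column, labeled RVIOL in the header
        if rviol > cutoff:
            found.append((tokens[2], tokens[3], rviol))
    return found
```

On the rows of either loop model it returns the pairs 45D-46K, 46K-47D, 47D-48D, 103F-104A and 104A-105A, the same two regions in both runs.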

  26. Loop Model Violation Sites

  27. Refinements in Final Model • Some regions could be realigned and refined further, taking their energy violations into account. • Other tools, such as PROCHECK, could be used alongside MODELER and PROSA for further insight into stereochemical and energy details. • Structural alignment of the model with other known transporter structures might also help.
