Structure Prediction and Modeling of a Eukaryotic Member of the Major Facilitator Superfamily Gaurav Narale
Major Facilitator Superfamily (MFS)
• Membrane transport
• Largest secondary transporter protein family known so far, with more than 1000 members identified.¹
• Uses a solute gradient to drive the translocation of substrates such as ions, sugars, amino acids, peptides and other hydrophilic solutes.²
• Typically 400-600 amino acids long.
• 12 transmembrane α-helices, with both the N- and C-termini in the cytosol.³
• Two six-helix halves connected by a central loop.
• Found in all three domains of life.
Identifying Templates and Targets
• Templates: two known structures
  • Lactose permease (LacY), E. coli
  • Glycerol-3-phosphate transporter (GlpT), E. coli
• Sequence identity between the two is negligible (~9%).
• Even so, structural alignment with the CE algorithm shows that they superimpose over most of their chain length (RMSD ~3.7 Å).
• First goal: find a eukaryotic member of the MFS with enough sequence identity to one of the known structures to allow a reasonable alignment.
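The RMSD quoted for the CE superposition is the root-mean-square distance between equivalent atoms (typically Cα) once the two chains are superimposed. The sketch below only evaluates that formula. It assumes the coordinates are already superimposed and paired residue by residue; CE itself also finds the equivalent residues and the optimal superposition, which this code does not attempt.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and that
    coords_a[i] is the atom equivalent to coords_b[i]
    (for example aligned C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom shifted by 1 Å along x gives an RMSD of 1 Å.
a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
b = [(x + 1.0, y, z) for x, y, z in a]
print(rmsd(a, b))  # 1.0
```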
Function and Mechanism of LacY and GlpT
• Both use a solute gradient to drive translocation of substrate:
  - LacY mediates the coupled transport of lactose and H+
  - GlpT catalyzes the exchange of glycerol-3-phosphate for phosphate
• Alternating-access model
  • Outward-facing conformation exposed to the extracellular side.
  • Inward-facing conformation exposed to the cytoplasm.
• Ribbon representation
  • Amino-terminal domain (blue).
  • Carboxyl-terminal domain (green).
  • Bends and other irregularities in the α-helices appear as deviations from an ideally straight, continuous helical ribbon.
Identifying Templates and Targets
• Lactose Permease (LacY)
• Downloaded the structure file (PDB ID 1PV6) from the Protein Data Bank (www.rcsb.org/pdb) and extracted the amino acid sequence in FASTA format.
• Searched for a TARGET with high sequence identity using NCBI BLAST (www.ncbi.nlm.nih.gov):
  1. General search against all organisms: 2 iterations, threshold 0.005; hits were mainly bacterial proteins.
  2. Saved the results as a profile (PSSM).
  3. More sensitive search using the original sequence together with the saved profile as input, restricted to eukaryotes: 2 iterations, threshold 0.01.
• Unable to identify a suitable target.
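Extracting the template sequence from a PDB file can be sketched as reading the Cα ATOM records of one chain and mapping three-letter residue names to one-letter codes. This is a minimal sketch assuming the fixed-column PDB format and the 20 standard residues (anything else becomes "X"); the function and parameter names are hypothetical. Because it reads ATOM records, residues missing from the coordinates are silently skipped; the SEQRES records would give the full construct instead.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_to_fasta(pdb_text, chain="A", name="template", width=60):
    """Build a FASTA record from the C-alpha ATOM records of one chain."""
    seq = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Fixed PDB columns: atom name 13-16, residue name 18-20,
        # chain 22, residue number 23-26, insertion code 27.
        if line[12:16].strip() != "CA" or line[21:22] != chain:
            continue
        res_id = line[22:27]  # residue number + insertion code
        if res_id in seen:  # skip alternate locations of the same residue
            continue
        seen.add(res_id)
        seq.append(THREE_TO_ONE.get(line[17:20], "X"))
    body = "".join(seq)
    wrapped = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([f">{name}"] + wrapped)
```

Usage: `pdb_to_fasta(open("1pv6.pdb").read(), chain="A", name="LacY")` (the file name is an assumption) returns a FASTA string that can be pasted into the BLAST query box.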
Identifying Templates and Targets
• Glycerol-3-Phosphate Transporter (GlpT)
• Downloaded the structure file (PDB ID 1PW4) from the Protein Data Bank (www.rcsb.org/pdb) and extracted the amino acid sequence in FASTA format.
• Searched for a TARGET with high sequence identity using NCBI BLAST (www.ncbi.nlm.nih.gov):
  1. General search against all organisms: 2 iterations, threshold 0.005.
  2. Obtained a suitable TARGET: glucose-6-phosphate translocase, Homo sapiens.
  3. Used BLink to identify several eukaryotic "close targets" for use in multiple sequence alignments.
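Screening BLAST hits for a usable target amounts to keeping hits under an E-value threshold and ranking the survivors. A minimal sketch, assuming the hits were exported in the 12-column tabular format (query, subject, % identity, length, mismatches, gap opens, qstart, qend, sstart, send, E-value, bit score). The 25% identity floor is a hypothetical cut-off, not a value from this work.

```python
def filter_hits(tabular_text, max_evalue=0.005, min_identity=25.0):
    """Return (subject, %identity, E-value, bit score) tuples, best bit score first.

    Assumes tab-separated BLAST tabular output with the 12 default columns;
    comment lines starting with '#' are ignored.
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject = fields[1]
        pident = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if evalue <= max_evalue and pident >= min_identity:
            hits.append((subject, pident, evalue, bitscore))
    return sorted(hits, key=lambda h: h[3], reverse=True)
```

Keeping the E-value cut-off equal to the search threshold (0.005) means the filter only adds the identity requirement on top of what the search already enforced.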
Multiple Sequence Alignment
• Only template and target: initial review
• Both templates, target and close targets
• 15 proteins similar to the target, selected from different species, to obtain a better alignment
• Only template and target extracted
• Around 30% similarity between template and target
• Well-distributed alignment
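A figure like "around 30%" depends on how it is counted. The sketch below computes strict percent identity over columns where both aligned rows have a residue; the slide's "similarity" may also count conservative substitutions, and other identity definitions (for example dividing by the shorter sequence length) give different numbers. Treating '/' as a non-residue is an assumption matching the chain-break character used later in the MODELER alignments.

```python
def percent_identity(row_a, row_b, gap_chars="-/."):
    """Percent identity of two aligned rows of equal length.

    Only columns where both rows contain a residue are counted;
    comparison is case-insensitive.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(row_a.upper(), row_b.upper())
        if a not in gap_chars and b not in gap_chars
    ]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


print(percent_identity("ACDE-", "acdf-"))  # 75.0
```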
Alignment using FUGUE

hs1pw4a (  5)  fkpaphkarlpaaeidptYrrlrwqIflGIffGyaAYylVRkNFALAMpy
QUERY g6pt     -------------MAAQGYGYYRTVIFSAMFGGYSLYYFNRKTFSFVMPS

hs1pw4a ( 55)  L-veqgfsrgDLGfALSGISiAygfSkfimgsvSdrsnPrvfLPaGLilA
QUERY g6pt     LVEEIPLDKDDLGFITSSQSAAYAISKFVSGVLSDQMSARWLFSSGLLLV

hs1pw4a (104)  AavMlfMGfvpwATssiavMfvlLflCGwfQGmGwpPCgrTmvhwwsqke
QUERY g6pt     GLVNIFFAWSSTV----PVFAALWFLNGLAQGLGWPPCGKVLRKWFEPSQ

hs1pw4a (154)  rggivsVwncAhNvggGiPPllFllGmawfndwhAALYmPAfcAilvAlf
QUERY g6pt     FGTWWAILSTSMNLAGGLGPILATILAQSY-SWRSTLALSGALCVVVSFL

hs1pw4a (204)  AfamMrdTpqsCglppiee-----ykndtakqifmqyVlpnklLwyIAiA
QUERY g6pt     CLLLIHNEPADVGLRNLDPMPSEGKKGSLKEESTLQELLLSPYLWVLSTG

hs1pw4a (262)  NvfVyLLRYGiLDwSPtylkevKhfaldkSSwAYflYEyagipGTllCgw
QUERY g6pt     YLVVFGVKTCCTDWGQFFLIQEKGQSALVGSSYMSALEVGGLVGSIAAGY

hs1pw4a (312)  msdkv----------frgnrGaTGvfFMtlVtiaTivywmnpagNptvdm
QUERY g6pt     LSDRAMAKAGLSNYGNPRHGLLLFMMAGMTVSMYLFRVTVTSDSPKLWIL

hs1pw4a (352)  iCmivIGflIyGPvmLIglHAleLApkkAagtAagfTglfGylgGSvaAs
QUERY g6pt     VLGAVFGFSSYGPIALFGVIANESAPPNLCGTSHAIVGLMANVGGFL-AG

hs1pw4a (402)  aiVGytvdffgwdgGfmvMigGSilAvilLivVmigekrrheqllqelvp
QUERY g6pt     LPFSTIAKHYSWSTAFWVAEVICAASTAAFFLLRNIRTKMGRVSKKAE--
MPSA alignment: only template and target

P_1PW4         FKPAPHKARLPAAEIDPTYRRLRWQIFLGIFFGYAAYYLVRKNFALAMPYLVEQG-FSRG
GLUCOSE6HUMAN  -------------MAAQGYGYYRTVIFSAMFGGYSLYYFNRKTFSFVMPSLVEEIPLDKD

P_1PW4         DLGFALSGISIAYGFSKFIMGSVSDRSNPRVFLPAGLILAAAVMLFMGFVPWATSSIAVM
GLUCOSE6HUMAN  DLGFITSSQSAAYAISKFVSGVLSDQMSARWLFSSGLLLVGLVNIFFAWS----STVPVF

P_1PW4         FVLLFLCGWFQGMGWPPCGRTMVHWWSQKERGGIVSVWNCAHNVGGGIPPLLFLLGMAWF
GLUCOSE6HUMAN  AALWFLNGLAQGLGWPPCGKVLRKWFEPSQFGTWWAILSTSMNLAGGLGPILATI-LAQS

P_1PW4         NDWHAALYMPAFCAILVALFAFAMMRDTPQSCGLP-----PIEEYKNDTAKQIFMQYVLP
GLUCOSE6HUMAN  YSWRSTLALSGALCVVVSFLCLLLIHNEPADVGLRNLDPMPSEGKKGSLKEESTLQELLL

P_1PW4         NKLLWYIAIANVFVYLLRYGILDWSPTYLKEVKHFALDKSSWAYFLYEYAGIPGTLLCGW
GLUCOSE6HUMAN  SPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQEKGQSALVGSSYMSALEVGGLVGSIAAGY

P_1PW4         MSDKVFRGN--------RGATGVFFMTLVTIATIVYWMNPAGN--PTVDMICMIVIGFLI
GLUCOSE6HUMAN  LSDRAMAKAGLSNYGNPRHGLLLFMMAGMTVSMYLFRVTVTSDSPKLWILVLGAVFGFSS

P_1PW4         YGPVMLIGLHALELAPKKAAGTAAGFTGLFGYLGGSVAASAIVGYTVDFFGWDGGFMVMI
GLUCOSE6HUMAN  YGPIALFGVIANESAPPNLCGTSHAIVGLMANVGGFLAGLPFSTIAKHYSWSTAFWVAEV

P_1PW4         GGSILAVILLIVVMIGEKRRHEQLLQELVP
GLUCOSE6HUMAN  ICAASTAAFFLLRNIRTKMGRVSKKAE---
Extracted template-target alignment

P_1PW4        -------FKPAPHKARLPAAEIDPTYRRLRWQIFLGIFFGYAAYYLVRKNFALAMPYLVE
gi|2765461|e  --------------------MAAQGYGYYRTVIFSAMFGGYSLYYFNRKTFSFVMPSLVE

P_1PW4        QGFS---RGDLGFALSGISIAYGFSKFIMGSVSDRSNPRVFLPAGLILAAAVMLFMGFVP
gi|2765461|e  EIPLD--KDDLGFITSSQSAAYAISKFVSGVLSDQMSARWLFSSGLLLVGLVNIFFAWSS

P_1PW4        WATSS--IAVMFVLLFLCGWFQGMGWPPCGRTMVHWWSQKERGGIVSVWNCAHN--VGGG
gi|2765461|e  TVP------VFAALWFLNGLAQGLGWPPCGKVLRKWFEPSQFGTWWAILSTSMN--LAGG

P_1PW4        IPP-------LLFLLGMAWFN-----------DWHAALYMPAFCAILVALFAFAMMRDTP
gi|2765461|e  LGP-------ILATILAQSYS------------WRSTLALSGALCVVVSFLCLLLIHNEP

P_1PW4        QSCGLPPIEEYKNDT-------------------AKQIFMQYVLPNKLLWYIAIANVFVY
gi|2765461|e  ADVGLRNLDPMPSEG--------------KKGSLKEESTLQELLLSPYLWVLSTGYLVVF

P_1PW4        LLRYGILDWSPTYLKEVKHFALDK-SSWAYFLYEYAGIPGTLLCGWMSDKVFR-------
gi|2765461|e  GVKTCCTDWGQFFLIQEKGQSALV-GSSYMSALEVGGLVGSIAAGYLSDRAMAKAGLSNY

P_1PW4        -GNRGATGVFFMTLVTIATIVYWMNPAG---------------NPTVDMICMIVIGFLIY
gi|2765461|e  GNPRHGLLLFMMAGMTVSMYLFRVTVTSD-----------S--PKLWILVLGAVFGFSSY

P_1PW4        GP-VMLIGLHALELAPKKAAGTAAGFTGLFGYLGGSVAASAIVGYTVDF-FGWDGGFMVM
gi|2765461|e  GP-IALFGVIANESAPPNLCGTSHAIVGLMANVG-GFLAGLPFSTIAKH-YSWSTAFWVA

P_1PW4        IGGSILAVILLIVVMIGEKRRHEQLLQELVP-----------------------------
gi|2765461|e  EVICAASTAAFFLLRNIRTKMGRVSKKAE-------------------------------
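Pulling two rows out of a larger multiple alignment leaves columns where both rows are gaps (the trailing all-gap columns in the extracted alignment are one example), because those columns were only needed for the other sequences. A minimal sketch of the extraction step, assuming the MSA is held as a dict of equal-length rows keyed by sequence name; the function name is hypothetical.

```python
def extract_pair(msa, name_a, name_b, gap_chars="-."):
    """Return the two named rows with columns that are gaps in BOTH removed.

    msa: dict mapping sequence name -> aligned row (all rows equal length).
    Columns where only one of the two rows has a gap are kept, since they
    still encode an insertion or deletion between the pair.
    """
    row_a, row_b = msa[name_a], msa[name_b]
    if len(row_a) != len(row_b):
        raise ValueError("MSA rows must have equal length")
    kept = [
        (a, b)
        for a, b in zip(row_a, row_b)
        if not (a in gap_chars and b in gap_chars)
    ]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


msa = {"template": "AC--DE", "target": "A-G-D-", "other": "ACGTDE"}
print(extract_pair(msa, "template", "target"))  # ('AC-DE', 'A-GD-')
```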
Checking the alignment in MODELER using the chk_align.top script

1PW4  MRDTPQSCGLPPIEEYKND/T-----AKQIFMQYVLPNKLLWYIAIANVFVYLLRYGILDWSPTYLKE
G6PT  IHNEPADVGLRNLDPMPSE-GKKGSLKEESTLQELLLSPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQ

Problem near chain break

1PW4  MRDTPQSCGLPPIEEYKND/----TAKQIFMQYVLPNKLLWYIAIANVFVYLLRYGILDWSPTYLKEV
G6PT  IHNEPADVGLRNLDPMPSEGKKGSLKEESTLQELLLSPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQE
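When hunting for a problem near a chain break, it helps to print what the target has opposite each '/' in the template row. This is a small helper sketch, not part of MODELER: it only reports the aligned neighbourhood of every break and makes no judgement about which placement MODELER accepts. The flank width is an arbitrary choice.

```python
def chain_break_context(template_row, target_row, flank=5):
    """Return (column, template_window, target_window) for each '/' in template_row."""
    if len(template_row) != len(target_row):
        raise ValueError("aligned rows must have equal length")
    contexts = []
    for col, ch in enumerate(template_row):
        if ch == "/":
            lo, hi = max(0, col - flank), col + flank + 1
            contexts.append((col, template_row[lo:hi], target_row[lo:hi]))
    return contexts


tmpl = "MRDTPQSCGLPPIEEYKND/T-----AKQIFMQYVL"
targ = "IHNEPADVGLRNLDPMPSE-GKKGSLKEESTLQELL"
for col, t_win, q_win in chain_break_context(tmpl, targ, flank=3):
    print(col, t_win, q_win)  # 19 KND/T-- PSE-GKK
```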
Modeler Runs
• Used the extracted template-target alignment.
• Template sequence extracted from the structure using Insight.
• Missing residues in the structure appear as chain breaks.
• Parameters:
  • OUTPUT_CONTROL = 1 1 1 1 1
  • STARTING_MODEL = 1
  • ENDING_MODEL = 5
  • LIBRARY_SCHEDULE = 4
  • MD_LEVEL = 'refine_1'
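MODELER reads the template-target alignment as a PIR file: a `>P1;code` line, a colon-separated description line (entry type, code, start residue, start chain, end residue, end chain, name, source, resolution, R-factor) and the aligned sequence terminated by `*`, with `/` marking chain breaks. The sketch below only formats such entries. All field values in the usage example are placeholders; for a real run the template's residue range and chain must match its coordinate file.

```python
def pir_entry(code, aligned_seq, kind="sequence", start="", start_chain="",
              end="", end_chain="", name="", source="", resolution="",
              rfactor="", width=75):
    """Format one PIR alignment entry (sketch of the MODELER input layout)."""
    header = ":".join([kind, code, str(start), start_chain, str(end),
                       end_chain, name, source, str(resolution), str(rfactor)])
    body = aligned_seq + "*"  # '*' terminates the sequence
    wrapped = "\n".join(body[i:i + width] for i in range(0, len(body), width))
    return f">P1;{code}\n{header}\n{wrapped}\n"


# Placeholder values only; '/' is kept where the template has a chain break.
alignment = (pir_entry("tmpl", "AC/D-E", kind="structureX", start=1,
                       start_chain="A", end=5, end_chain="A")
             + "\n"
             + pir_entry("targ", "ACGDKE"))
print(alignment)
```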
PROSA 2 Runs
• Used to evaluate the models.
• The models with the best MODELER scores were compared using PROSA.
• Z-score used for the initial comparison.
• Energy graph used to locate the major violations.
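The PROSA graph plots residue energies averaged over a sliding window, and stretches above zero usually point to problematic regions. A minimal sketch of comparing a model's profile with the template's, assuming per-residue energies for both have already been exported and mapped onto common aligned positions. The window half-width and margin are placeholders, not the values PROSA 2 uses.

```python
def window_average(values, half_width):
    """Centered moving average over 2*half_width + 1 points (shorter at the ends)."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def flag_peaks(model_energy, template_energy, half_width=5, margin=0.0):
    """Positions where the smoothed model profile is positive and above the template's."""
    if len(model_energy) != len(template_energy):
        raise ValueError("profiles must be mapped onto the same positions")
    m = window_average(model_energy, half_width)
    t = window_average(template_energy, half_width)
    return [i for i, (em, et) in enumerate(zip(m, t)) if em > 0 and em - et > margin]


# Toy profiles: the model has a positive bump at positions 3-5.
print(flag_peaks([0, 0, 0, 5, 5, 5, 0, 0], [0] * 8, half_width=0))  # [3, 4, 5]
```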
Model Selection Criteria
• MODELER log file
  • Minimum energy
  • Number of violations
  • Number of really bad violations
  • Location of violations with respect to the alignment and structure
• PROSA 2 log file
  • Z-score closest to that of the template
  • Peaks and troughs in the graph relative to the template
Adjusting the Alignment
• Compared the structures obtained from MODELER in Insight; alignment violations are clearly visible.
• Criteria for modifying the alignment:
  • Unequal number of residues in a loop
  • Unsatisfied structural similarity constraints
  • Residues violating the constraints generated by MODELER
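The first criterion, an unequal number of residues in a loop, can be checked directly on the alignment by counting residues in each row between the loop's boundary columns. A minimal sketch, assuming the loop boundaries are already known as alignment column indices (for example from the template's secondary structure); the function names are hypothetical.

```python
def residues_in_span(row, start, end, gap_chars="-/."):
    """Count residues in alignment columns start..end-1 of one row."""
    return sum(1 for ch in row[start:end] if ch not in gap_chars)


def unequal_loop(template_row, target_row, start, end):
    """Return (differs, n_template, n_target) for the loop spanning columns start..end-1."""
    n_template = residues_in_span(template_row, start, end)
    n_target = residues_in_span(target_row, start, end)
    return n_template != n_target, n_template, n_target


# Toy rows: the target has two extra residues in columns 2-3.
print(unequal_loop("AB--CD", "ABXYCD", 2, 4))  # (True, 0, 2)
```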
Loop Modeling
• Modeler Run 2
• Loop Modeling Run 1
Loop Modeling
• Generated models based on the adjusted alignment.
• 25 models obtained.
• Models selected based on minimum energy and constraint violations.
• Parameters:
  • OUTPUT_CONTROL = 1 1 1 1 1
  • STARTING_MODEL = 1
  • ENDING_MODEL = 5
  • LIBRARY_SCHEDULE = 2
  • MD_LEVEL = 'refine_3'
  • DO_LOOPS = 1
  • LOOP_ENDING_MODEL = 5
  • LOOP_MD_LEVEL = 'refine_3'
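The count of 25 models follows from the parameters: each of the 5 comparative models (STARTING_MODEL to ENDING_MODEL) gets its own set of loop-refined variants (up to LOOP_ENDING_MODEL). A minimal sketch of that arithmetic, assuming the loop-model numbering starts at 1 and that the "ID1, ID2" labels in the tables below denote the comparative-model and loop-model indices.

```python
STARTING_MODEL, ENDING_MODEL = 1, 5
LOOP_STARTING_MODEL = 1  # assumed starting index; not listed on the slide
LOOP_ENDING_MODEL = 5

# One (ID1, ID2) pair per generated loop model.
pairs = [
    (id1, id2)
    for id1 in range(STARTING_MODEL, ENDING_MODEL + 1)
    for id2 in range(LOOP_STARTING_MODEL, LOOP_ENDING_MODEL + 1)
]
print(len(pairs))  # 25
```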
Loop Modeling Run 1: Best 4 Models Picked
(PROSA Z-score of template: -7.3)

ID1, ID2   Current energy   PROSA Z-score
1, 5       192              -6.60
3, 2       387              -6.57
4, 2       363              -6.76
5, 4       242              -6.3
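The two headline criteria, minimum energy and a Z-score close to the template's, can rank the candidates differently. The sketch below ranks the Run 1 values listed above both ways and then combines them with a simple rank sum. The rank sum is only one illustrative way to merge the criteria, not the procedure used here, which also weighed the number and location of violations.

```python
TEMPLATE_Z = -7.3

# (ID1, ID2): (current energy, PROSA Z-score), from Loop Modeling Run 1
models = {
    (1, 5): (192, -6.60),
    (3, 2): (387, -6.57),
    (4, 2): (363, -6.76),
    (5, 4): (242, -6.3),
}

by_energy = sorted(models, key=lambda k: models[k][0])
by_z_gap = sorted(models, key=lambda k: abs(models[k][1] - TEMPLATE_Z))
# Illustrative combination: sum of the two rank positions (lower is better).
combined = sorted(models, key=lambda k: by_energy.index(k) + by_z_gap.index(k))

print(by_energy)  # [(1, 5), (5, 4), (4, 2), (3, 2)]
print(by_z_gap)   # [(4, 2), (1, 5), (3, 2), (5, 4)]
print(combined)   # [(1, 5), (4, 2), (5, 4), (3, 2)]
```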
Violations: MODELER log file
ID1, ID2: 1 5    Current energy: 192.1849

# RESTRAINT_GROUP NUM NUMVI NUMVP RMS_1 RMS_2 MOL.PDF S_i
-------------------------------------------------------------------------------
25 Phi/Psi pair of dihedral restraints: 64 44 11 36.170 140.638 79.036 1.000
-------------------------------------------------------------------------------

Feature 25: Phi/Psi pair of dihedral restraints
List of the RVIOL violations larger than: 6.5000

# ICSR RESNO1/2 ATM1/2 INDATM1/2 FEAT restr viol rviol RESTR VIOL RVIOL
7 1360 45D 46K C N 368 370 -68.99 -70.20 30.80 2.20 -62.90 150.55 19.23
7 46K 46K N CA 370 371 109.62 140.40 -40.80
8 1361 46K 47D C N 377 379 173.18 54.50 123.21 12.43 -63.30 132.44 18.20
8 47D 47D N CA 379 380 7.79 40.90 -40.00
9 1362 47D 48D C N 385 387 -138.58 -63.30 76.02 11.52 -63.30 76.02 11.52
9 48D 48D N CA 387 388 -29.45 -40.00 -40.00
12 1369 103F 104A C N 811 813 -69.81 -68.20 21.24 1.77 -62.50 165.18 26.73
12 104A 104A N CA 813 814 124.12 145.30 -40.90
13 1370 104A 105A C N 816 818 -169.75 -62.50 107.58 21.02 -62.50 107.58 21.02
13 105A 105A N CA 818 819 -49.29 -40.90 -40.90
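Scanning the violation list by hand gets tedious across 25 models. A minimal sketch that pulls the residue pairs and RVIOL values out of the phi/psi table, assuming (from the header above) that the first row of each pair has 15 fields with the C/N atom pair in fields 5 and 6 and RVIOL as the last field. This is our reading of the log layout, not a documented MODELER parser.

```python
def parse_rviol(log_text, threshold=6.5):
    """Return (residue1, residue2, RVIOL) for phi/psi rows with RVIOL > threshold."""
    hits = []
    for line in log_text.splitlines():
        fields = line.split()
        # First row of each pair: #, ICSR, RESNO1, RESNO2, C, N, INDATM1, INDATM2,
        # then FEAT restr viol rviol RESTR VIOL RVIOL (15 fields in total).
        if len(fields) == 15 and fields[4] == "C" and fields[5] == "N":
            rviol = float(fields[-1])
            if rviol > threshold:
                hits.append((fields[2], fields[3], rviol))
    return hits


log = """\
7 1360 45D 46K C N 368 370 -68.99 -70.20 30.80 2.20 -62.90 150.55 19.23
7 46K 46K N CA 370 371 109.62 140.40 -40.80
12 1369 103F 104A C N 811 813 -69.81 -68.20 21.24 1.77 -62.50 165.18 26.73
12 104A 104A N CA 813 814 124.12 145.30 -40.90
"""
print(parse_rviol(log))  # [('45D', '46K', 19.23), ('103F', '104A', 26.73)]
```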
1st loop model: violations viewed in Insight (residue 46 and residue 104)
Loop Modeling 2
• Refinement of Loop Model 1
• Loop Modeling 2
• Modeler Run 3
Loop Modeling Run 2: Best 5 Models

ID1, ID2   Current energy   PROSA Z-score
5, 1       237.4322         -5.82
3, 1       222.2522         -6.27
1, 1       195.7286         -6.32
2, 4       226.8002         -6.09
2, 2       198.0359         -6.15
Violations: MODELER log file
ID1, ID2: 1 1    Current energy: 195.7286

# RESTRAINT_GROUP NUM NUMVI NUMVP RMS_1 RMS_2 MOL.PDF S_i
-------------------------------------------------------------------------------
4 Stereochemical improper torsion pot: 156 1 2 1.943 1.943 16.723 1.000
25 Phi/Psi pair of dihedral restraints: 67 40 11 34.260 132.074 73.358 1.000
-------------------------------------------------------------------------------

Feature 25: Phi/Psi pair of dihedral restraints
List of the RVIOL violations larger than: 6.5000

# ICSR RESNO1/2 ATM1/2 INDATM1/2 FEAT restr viol rviol RESTR VIOL RVIOL
3 1430 45D 46K C N 368 370 -103.79 -118.00 33.92 1.76 -62.90 154.80 22.53
3 46K 46K N CA 370 371 169.89 139.10 -40.80
4 1431 46K 47D C N 377 379 -95.02 -70.90 59.16 2.00 -63.30 119.95 16.85
4 47D 47D N CA 379 380 -155.68 150.30 -40.00
5 1432 47D 48D C N 385 387 -63.33 -70.90 31.08 1.19 -63.30 160.16 19.77
5 48D 48D N CA 387 388 120.16 150.30 -40.00
9 1441 103F 104A C N 811 813 -122.41 -134.00 20.39 1.24 -62.50 166.47 30.50
9 104A 104A N CA 813 814 163.78 147.00 -40.90
10 1442 104A 105A C N 816 818 -64.90 -68.20 29.69 2.28 -62.50 156.71 25.57
10 105A 105A N CA 818 819 115.80 145.30 -40.90
Refinements in Final Model
• Some regions could be realigned and refined further, taking their energy violations into account.
• Other tools, such as PROCHECK, could be used alongside MODELER and PROSA to assess model quality in more detail.
• Structural alignment of the model with other known transporter structures might also help.