Ghent University Workshop June 15th 2001 Enzymes responsible for the degradation of hemicelluloses

Ghent University Workshop June 15th 2001 Enzymes responsible for the degradation of hemicelluloses. Endo-xylanases from family 11 : computer analysis of protein sequences reveals structural relationships Johan Wouters Xpr, Univ of Namur.



Presentation Transcript


  1. Ghent University Workshop, June 15th 2001. Enzymes responsible for the degradation of hemicelluloses. Endo-xylanases from family 11: computer analysis of protein sequences reveals structural relationships. Johan Wouters, Xpr, Univ. of Namur

  2. Family 11 endo-xylanases
     > Biodegradation of the xylan backbone is accomplished by a complex set of enzymes generically called endo-xylanases (E.C. 3.2.1.8), which are produced by fungi and bacteria (Biely, 1985).
     > A large number of endoxylanases have been described, purified, characterized and sequenced from different microorganisms (Sunna and Antranikian, 1997).
     > Based on sequence similarities and hydrophobic cluster analysis, they have been grouped in families 10 (or F) and 11 (or G) of the glycosyl hydrolases (Gilkes et al. 1991; Henrissat and Davies, 1997).

  3. > Family 11 xylanases have a small catalytic domain (around 20 kDa), with an all-β-strand sandwich fold (Himmel et al. 1997).
     > Endoxylanases have been found useful for several biotechnological applications, such as cellulose pulp biobleaching (Buchert et al. 1994), bread-making (Courtin et al. 1999) and saccharification of lignocellulosic biomass (Lee, 1997).
     These applications require enzymes capable of operating under different “extreme” conditions. Therefore, for the design and protein engineering of endoxylanases with different characteristics, a good knowledge of their sequence and structure is of great importance.

  4. Multiple alignments and Matchbox: why do we align?
     Databases: searching for homologous protein sequences by similarity
     Modeling: prediction of structural features by homology, by threading
     Bench: prediction of essential residues for rational site-directed mutagenesis; oligonucleotide design for cloning homologous genes by PCR

  5. Multiple alignments and Matchbox: most often used approach
     Databases: 1 hour. Handling several paste & go servers; rapidly going through the outputs; choosing the one that ‘looks better’
     Modeling: < 1 month. Choosing the best template; refining the model; validating it
     Bench: > 1 year

  6. Multiple alignments and Matchbox: the false positive problem
     1 hour hypothesizing on a non-significant alignment = 1 year making a mess before finding out that the hypothesis was wrong

  7. Multiple alignments and Matchbox: the Match box alignment program ‘Align’ (Depiereux et al., Comput. Appl. Biosci. 13(3):249-256 (1997))
     Performs simultaneous multiple alignment, giving a confidence score for each aligned position
     > Helps prediction of structurally conserved regions
     > Helps prediction of functional sites
     > Leads to the identification of targets for site-directed mutagenesis

  8. The Match box alignment program ‘Align’
     Sequences (index, length, name): 1, 153, 1sdy_A; 2, 153, 1spd_A; 3, 154, dfef; 4, 151, 2sod_O

               10        20        30        40        50        60        70
                +         +         +         +         +         +         +
     1 --vqavavlkgdagvsgvvkfeqaseSEpttvsyeiagnSpnaergfhihefgdatngcvsagphfnpfk
     2 -AtkavcvlkgdgpvqgiinfeqkesNGpvkvwgsikgl-teglhgfhvhefgdntagctsagphfnpls
     3 ATkkavavlkgtsnvegvvtltqeddG-pttvnvrisgl-apgkhgfhlhefgdttngcmstgphfnpdk
     4 -Atkavcvlkgdgpvqgtihfeakgd--tvvvtgsitgl-tegdhgfhvhqfgdntqgctsagphfnpls
         333333333333333555555777  77777777777 333311111111111331111111111133

     (Scores refer to a window of 9 residues)
     Determined from the analysis of 4900 test aligned positions
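As a rough companion to the alignment on this slide, the sketch below computes pairwise percent identity between the four aligned sequences. It is only an illustration: the function name `percent_identity` is made up here, it ignores the Match-Box scores entirely, and it assumes that every column where both sequences carry a residue is comparable (upper and lower case are treated alike).

```python
from itertools import combinations

# Aligned sequences copied from the slide ('-' marks a gap).
ALIGNED = {
    "1sdy_A": "--vqavavlkgdagvsgvvkfeqaseSEpttvsyeiagnSpnaergfhihefgdatngcvsagphfnpfk",
    "1spd_A": "-AtkavcvlkgdgpvqgiinfeqkesNGpvkvwgsikgl-teglhgfhvhefgdntagctsagphfnpls",
    "dfef":   "ATkkavavlkgtsnvegvvtltqeddG-pttvnvrisgl-apgkhgfhlhefgdttngcmstgphfnpdk",
    "2sod_O": "-Atkavcvlkgdgpvqgtihfeakgd--tvvvtgsitgl-tegdhgfhvhqfgdntqgctsagphfnpls",
}


def percent_identity(a, b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Print the identity of every pair of sequences in the alignment.
for name_a, name_b in combinations(ALIGNED, 2):
    value = percent_identity(ALIGNED[name_a], ALIGNED[name_b])
    print(f"{name_a} vs {name_b}: {value:.1f}% identity")
```

Identity computed this way says nothing about whether a given column is reliably aligned, which is the gap the per-position scores are meant to fill.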

  9. Sequences Analysis
     > 82 sequences of family 11 endo-xylanases (catalytic domains of the mature proteins) were included in the analysis (Sapag, de Ioannes, Eyzaguirre (Santiago, Chile))
     > Sequences were obtained from the literature and from public databases (SwissProt, GenBank and CAZy)
     > The sequences range in length from 175 to 233 residues: 36 sequences from bacteria, 43 from fungi, 2 from protozoa and 1 from insects.
     > Experimental physicochemical data (optimum T, pI, optimum pH) associated with the proteins were retrieved from the literature

  10. > Multiple sequence alignment using MATCHBOX. Conserved boxes define predicted Structurally Conserved Regions (pSCR). Residues contained in the boxes have a score below 5. For scores below 5 on more than 2 non-redundant sequences, a confidence rate over 90% (less than 10% false positives) can be expected, even when the percentage of identity is very low (10%).
      > Additional run of multiple alignment of the segments not contained in the boxes, using CLUSTAL (Higgins et al. 1992).
      > Thus, the final alignment (MATCHTAL) takes advantage of both approaches: MATCHBOX for the definition of the boxes (pSCRs), and CLUSTAL for the multiple alignment of the segments outside the boxes.
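The two-stage idea (boxes first, then the rest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the MATCHBOX or MATCHTAL code: it assumes the per-column scores are available as a string of digits with spaces for unscored columns, and the helper names `find_boxes` and `outside_boxes` are hypothetical. The second stage (realigning the outside segments with CLUSTAL) is not shown.

```python
def find_boxes(scores, threshold=5):
    """Return (start, end) column ranges, 0-based and end-exclusive,
    where every column has a numeric score below the threshold."""
    boxes = []
    start = None
    for i, ch in enumerate(scores):
        in_box = ch.isdigit() and int(ch) < threshold
        if in_box and start is None:
            start = i
        elif not in_box and start is not None:
            boxes.append((start, i))
            start = None
    if start is not None:
        boxes.append((start, len(scores)))
    return boxes


def outside_boxes(boxes, length):
    """Return the column ranges not covered by any box; these are the
    segments that would be handed to a second alignment program."""
    segments = []
    pos = 0
    for start, end in boxes:
        if start > pos:
            segments.append((pos, start))
        pos = end
    if pos < length:
        segments.append((pos, length))
    return segments


# Toy score line (made up for illustration): two unscored columns, then
# scores 3, 5 and 7, one unscored column, and further scores 7, 3 and 1.
toy_scores = "  " + "3333" + "555" + "7" + " " + "77" + "33" + "111"
toy_boxes = find_boxes(toy_scores)
toy_rest = outside_boxes(toy_boxes, len(toy_scores))
print("boxes:", toy_boxes)
print("segments outside boxes:", toy_rest)
```

Keeping the box coordinates fixed and only realigning the segments between them is what lets the final alignment retain the high-confidence columns unchanged.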

  11. [Alignment figure. Secondary structure elements shown: B2, A2, A3, B3, A5, B5; Asn/Asp position marked; Box 1] Similarity is higher in secondary structure elements; gaps are in the loops

  12. [Alignment figure. Box 2, Box 3, Box 4; elements shown: B6, cord, B9, B8, thumb, B7, A6, helix, B4, A4; two Glu residues marked] Cord and thumb regions conserved

  13. Multiple sequence alignment [Figure. Elements shown: B2, A2, A3, B3, A5, B5, B6, cord, B9, B8, thumb, B7, A6, helix, B4, A4; two E residues marked]

  14. > No similarities in the first 30 residues.
      > Residue 40 is a G (65/82).
      > Position 52 is aromatic: Y (63/82), W or F (14/82).
      > At position 55, W predominates (70/82).
      > 62 sequences have an Asp in position 57.
      > 49 G’s are observed in position 76 and 48 in position 77.
      > One of the sequences (Polyplastron multivesiculatum) starts only at residue 81, and lacks strands B1, B2 and part of A2, suggesting that B1 and B2 may not be necessary for enzymatic activity.
      > Strand A2 shows low similarity except at position 81, where 24 Y and 35 M are found. Six sequences lack A2 altogether.
      > Between B2 and A2, two long insertions (probably forming a loop) are found in the endoxylanases from Penicillium sp. and P. stipitis.
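Counts such as "W (70/82)" are per-column tallies over the alignment. A minimal sketch of how such tallies could be produced is given below; the four-sequence alignment is a made-up toy, not the xylanase data, and the function names are hypothetical.

```python
from collections import Counter


def column_counts(alignment, position):
    """Count residues at a 1-based alignment position, ignoring gaps."""
    column = (seq[position - 1].upper() for seq in alignment)
    return Counter(res for res in column if res != "-")


def conservation_report(alignment, position):
    """Format the most frequent residue in the style used on the slides."""
    counts = column_counts(alignment, position)
    residue, n = counts.most_common(1)[0]
    return f"Position {position}: {residue} ({n}/{len(alignment)})"


def count_in_class(alignment, position, residues):
    """Count sequences whose residue at the position belongs to a class,
    for example the aromatic residues 'YWF'."""
    counts = column_counts(alignment, position)
    return sum(counts[r] for r in residues)


# Hypothetical toy alignment used only to exercise the functions.
TOY = ["AWGY", "AWGF", "SW-Y", "AFGY"]
print(conservation_report(TOY, 2))
print("aromatic at position 2:", count_in_class(TOY, 2, "YWF"))
```

The denominator here is the total number of sequences, so a sequence with a gap at the position lowers the reported fraction, as it does in the counts quoted on the slide.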

  15. Topologies of family 11 endo-xylanases allow the separation of these enzymes into two groups.
      Exs 11 type I: endo-xylanases from T. reesei XYNI (1XYN), A. niger (1UKR), A. kawachii (1BK1), B. circulans (1BCX).
      Exs 11 type II: endo-xylanases from T. reesei XYNII (1XYP), T. lanuginosa (1YNA), T. harzianum (1XND), P. variotii (1PVX), B. agaradhaerens (1QH6), D. thermophilum XynB (1F5J), Streptomyces sp. S38.

  16. [Figure label: Glu 167]
      > In A3, a highly conserved F or Y (79 sequences) is found in position 89, and 75 W in position 93; in this last case, the few replacements are F or Y.
      > Residue 100 is in all cases a D (27) or N (65): this residue is related to the pH optimum of the enzyme.
      > At the end of B3, G109 is present in 79 sequences and allows a special twist to the chain (Törrönen and Rouvinen, 1997).
      > A5 is not present in P. stipitis.
      > Highly conserved segment YGW (152-154) (Box 1, B5). Y152 is strongly hydrogen bonded to E167; it seems to be of great functional importance, since its mutation to F (in B. circulans XLNA) leads to a totally inactive enzyme (Wakarchuk et al. 1994).

  17. [Figure label: Glu 167]
      > At position 150, 21 Cys are present, the highest number found at a particular location in all the sequences. Cys residues are not very common (0.7%). Twenty-seven of the sequences have no Cys and 14 have only one; therefore, at least half of the total possess no disulfide bridges.
      > The “cord” shows low similarity, except for Pro 185 (70/82).
      > B9 is a stretch of very low similarity and varies considerably in length, but it is followed by the third high-similarity box.
      > Between B9 and B8, two highly conserved residues are apparent: D204 (76/82) and G205 (80/82); this last residue is considered important in hairpin formation (Törrönen and Rouvinen, 1997).
      > Other conserved residues: Y208 (82/82), R215 (72/82) and N217 (69/82).
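The disulfide argument is simple arithmetic: a sequence with fewer than two Cys cannot form an intramolecular bridge, and 27 + 14 = 41 of 82 sequences are in that situation. The sketch below shows how such a summary could be computed from a set of sequences; the function name and the toy sequences are hypothetical, and only the 27, 14 and 82 figures come from the slide.

```python
def cysteine_summary(sequences):
    """Summarize Cys content for a list of (possibly gapped) sequences."""
    ungapped = [s.replace("-", "").upper() for s in sequences]
    cys_counts = [s.count("C") for s in ungapped]
    total_residues = sum(len(s) for s in ungapped)
    return {
        # Fraction of all residues that are Cys.
        "cys_fraction": sum(cys_counts) / total_residues,
        # Sequences with 0 or 1 Cys cannot hold an intramolecular disulfide.
        "no_bridge_possible": sum(1 for c in cys_counts if c < 2),
        "n_sequences": len(sequences),
    }


# Figures quoted on the slide: 27 sequences without Cys, 14 with one, 82 total.
slide_fraction_without_bridge = (27 + 14) / 82
print("fraction of sequences that cannot form a bridge:", slide_fraction_without_bridge)

# Hypothetical toy sequences used only to exercise the function.
toy_summary = cysteine_summary(["ACDC", "AAAA", "C-GA", "CCCC"])
print(toy_summary)
```

The count is a lower bound on bridge-free sequences: having two or more Cys makes a disulfide possible, not certain.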

  18. [Figure label: Glu 289]
      > The “thumb” is well conserved; it has a consensus sequence PSIXG, where X is almost any residue and the others show few and mainly homologous replacements. Pro 219 gives a twist to the strand at the beginning of the thumb.
      > B7 is also well conserved; residue 227 is a T in 77 sequences and 228 is F in 81 cases; a consensus sequence QYWSVR can be proposed for residues 230-235.
      > Box 4: H262 (79/82), W263 (80/82) and G270 (79/82).
      > B4 does not show high similarities, except for E289.
      > The carboxy terminus of the sequences is highly variable in length.

  19. Multiple alignments and Matchbox: the Match box alignment program ‘Explore’ (Depiereux et al., Comput. Appl. Biosci. 13(3):249-256 (1997))
      Performs principal component analysis
      > Helps in finding relations (distances) between sequence similarities
      > Allows phylogenetic analysis
      > Allows classification

  20. The Match box alignment program ‘Explore’
      Similarity matrix (DFK matrix):

      MAO     Hum A   Rat A   Bov A   Hum B   Rat B
      Hum A   1.0
      Rat A   0.97    1.0
      Bov A   0.98    0.97    1.0
      Hum B   0.94    0.89    0.93    1.0
      Rat B   0.95    0.92    0.93    0.98    1.0

      The similarity matrix is treated by principal coordinates analysis to produce a graphical representation of the similarity between the sequences. Sequences are represented in a two-dimensional space; the distances between them are correlated with their similarity.
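To make the principal coordinates step concrete, here is a minimal pure-Python sketch applied to the five-sequence matrix above. It is not the ‘Explore’ implementation. It assumes the similarity s is converted to a distance as d = 1 - s, uses classical double centering, and finds the two leading axes by shifted power iteration with deflation; all function names are hypothetical.

```python
import math

LABELS = ["Hum A", "Rat A", "Bov A", "Hum B", "Rat B"]
# Lower triangle of the similarity matrix shown on the slide.
LOWER = [
    [1.0],
    [0.97, 1.0],
    [0.98, 0.97, 1.0],
    [0.94, 0.89, 0.93, 1.0],
    [0.95, 0.92, 0.93, 0.98, 1.0],
]


def full_matrix(lower):
    """Expand a lower-triangular list of rows into a symmetric matrix."""
    n = len(lower)
    return [[lower[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]


def double_center(sim):
    """Gower matrix B = -1/2 J D^2 J, with d = 1 - s (an assumption)."""
    n = len(sim)
    d2 = [[(1.0 - sim[i][j]) ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d2]
    grand = sum(row) / n
    return [[-0.5 * (d2[i][j] - row[i] - row[j] + grand) for j in range(n)]
            for i in range(n)]


def top_axes(b, k=2, iters=5000):
    """Leading k principal coordinates of B by shifted power iteration."""
    n = len(b)
    shift = max(sum(abs(x) for x in r) for r in b)  # makes B + shift*I non-negative definite
    m = [[b[i][j] + (shift if i == j else 0.0) for j in range(n)] for i in range(n)]
    axes = []
    for a in range(k):
        v = [math.sin(i + 1 + a) for i in range(n)]  # arbitrary start vector
        for _ in range(iters):
            mean = sum(v) / n
            v = [x - mean for x in v]  # stay orthogonal to the constant vector
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        mu = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        lam = mu - shift  # eigenvalue of B itself
        axes.append([x * math.sqrt(max(lam, 0.0)) for x in v])
        # Deflate: remove the axis just found before searching for the next one.
        m = [[m[i][j] - mu * v[i] * v[j] for j in range(n)] for i in range(n)]
    return axes


AXES = top_axes(double_center(full_matrix(LOWER)))
for label, x, y in zip(LABELS, AXES[0], AXES[1]):
    print(f"{label}: axis 1 = {x:+.4f}, axis 2 = {y:+.4f}")
```

With this matrix the first axis separates the three A sequences from the two B sequences, which is the kind of grouping the two-dimensional plot is meant to reveal. The sign of each axis is arbitrary.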

  21. Sequences analysis (factor analysis)
      The similarity matrix is treated by principal coordinates analysis to produce a graphical representation of the similarity between the sequences. Sequences are represented in a two-dimensional space; the distances between them are correlated with their similarity.

  22. Sequences analysis (factor analysis) [Plot; point labels listed by group]
      A: Ce.fi.XYLD, Th.fu.TfxA, St.sp.S38, St.th.STII, St.li.XlnC, Ba.ci.1xnb, St.sp.EC3, St.li.XlnB, Ba.su.xylA, Ba.sp.xylA
      C: Di.th.xylB, Ru.fl.XynB, Ru.fl.XynD, Ca.sp.xyl, Ru.sp.Xyl1, Ru.fl.XynA
      B: Ba.ag.1qh6, Ru.al.XynA, Po.mu.POX, Po.mu.xylA, Cl.ac.xylB, Ba.sp.xylJ, Ba.sp.xylY, Cl.th.xylV, Cl.st.XylA, Cl.th.xylA, Cl.th.xylB, Cl.th.xylU, Ba.Pu.XYNA
      Ia: Tr.re.1xyp, Tr.re.XYII, Tr.vi.xIIA, Tr.vi.xIIB, Th.la.1yna, Pe.sp.XynA, Pa.va.1pvx, Ch.gr.CgXB, Aso.pi.xyl, As.or.XyG1, As.nid.X24, As.ka.xylB, Tr.ha.1xnd, As.nid.X22, Ma.gr.XY22, Co.ca.Xyl1, Ch.gr.CgXA, As.tu.xylB, As.ni.XynN
      Ib: Co.ca.Xyl2, Cla.pur.Xy, Co.sa.xyl, Sc.co.xylA, Hu.in.Xyl1, Ba.D3.BDX
      Ic: Ce.mi.XYLA, Ps.fl.XYLE, Ph.co.xyl
      II: As.ni.1ukr, Pe.pu.XynB, As.tu.XYLA, As.ni.Xyn5, As.ka.1bkl, As.aw.EXLA, Au.pu.XylA
      IIIa: Ne.fr.XY3A, Ne.fr.XY3B, Ne.fr.XYA1, Ne.fr.XYA2, Orpin.XynA
      IIIb: Fi.su.XyCA, Fi.su.XyCB, Pi.sp.XYA1, Pi.sp.XYA2, Ne.fr.xyl2
      IV: Co.ca.Xyl3, Pi.st.XynA
      Bacterial enzymes are in italic, and enzymes produced by protozoa or insects are underlined.

  23. Conclusions
      > 82 amino acid sequences (catalytic domains of the mature proteins) of family 11 endoxylanases have been aligned using the program MATCHBOX.
      > Four boxes defining segments of high similarity are detected; they correspond to regions of defined secondary structure: B5, B6, B8 and the carboxyl end of the alpha helix, respectively.
      > The two glutamates acting as catalytic residues are conserved in all sequences.

  24. Conclusions
      > A very good correlation is found between the presence (at position 100) of an asparagine in the so-called ‘alkaline’ xylanases, or an aspartic acid in those with a more acidic pH optimum.
      > Cysteine residues are not common in these sequences (0.7% of all residues), and disulfide bridges are not important in explaining the stability and activity of the thermophilic xylanases.
      > The alignment allows the classification of the enzymes into groups according to sequence similarity. Fungal and bacterial enzymes are found to form clusters of higher similarity.

  25. Solutions in structural bio-physico-chemistry: ligand-receptor docking; sequence analysis; protein structures; stereoelectronic properties of ligands. http://www.fundp.be/~jwouters/XPR/ Spin-off project DGTRE: Johan Wouters, Laboratoire de Chimie Moléculaire Structurale, Unité de Recherche en Biologie Moléculaire
