
GlycoCT —a unifying sequence format for carbohydrates

S. Herget, R. Ranzinger, K. Maass and C.-W. v.d. Lieth. Presented by Yingxin Guo.


Presentation Transcript


  1. GlycoCT—a unifying sequence format for carbohydrates
     S. Herget, R. Ranzinger, K. Maass and C.-W. v.d. Lieth
     Presented by Yingxin Guo

  2. An overview of the sequence formats used in glycobioinformatics

  3. Special structural features

  4. Uniqueness: a central requirement for encoding carbohydrate sequences
     • Why
       • Serves as a primary key in databases
       • Enables exact structure search
     • How
       • Apply strict sorting rules
       • Define a controlled vocabulary
       • Support encoding of uncertain linkages and unspecified monosaccharides
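To illustrate why strict sorting turns a structure into a usable primary key, here is a minimal Python sketch. The `Residue` class, the `canonical` function and the residue names are hypothetical; sorting serialized children lexically is a deliberately simplified stand-in for GlycoCT's hierarchical rules, and linkage positions are ignored entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Residue:
    """Hypothetical tree node: a residue name plus its child residues."""
    name: str
    children: list[Residue] = field(default_factory=list)


def canonical(res: Residue) -> str:
    """Serialize a tree so that the input order of siblings does not matter.

    Children are serialized recursively and then sorted lexically; this is a
    simplified stand-in for GlycoCT's hierarchical sorting rules.
    """
    parts = sorted(canonical(child) for child in res.children)
    return res.name + ("(" + ",".join(parts) + ")" if parts else "")


# The same branched structure, entered with its branches in different orders.
a = Residue("Man", [Residue("GlcNAc"), Residue("Gal", [Residue("Fuc")])])
b = Residue("Man", [Residue("Gal", [Residue("Fuc")]), Residue("GlcNAc")])

database = {canonical(a): "entry-1"}  # canonical string used as primary key
print(canonical(b))                   # Man(Gal(Fuc),GlcNAc)
print(database.get(canonical(b)))     # entry-1
```

Because both input orders serialize to the same string, a plain dictionary lookup already behaves as an exact structure search.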

  5. General idea of GlycoCT

  6. Basic monosaccharide namespace

  7. Basic residue (RES) entities in GlycoCT
     • Substituents and other entities
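The slides only name the entity types. As a rough sketch, the Python below parses RES lines in the shape GlycoCT condensed is understood to use, such as `1b:b-dglc-HEX-1:5` for a beta-D-glucopyranose basetype and `2s:n-acetyl` for a substituent. The regular expression, the `ResEntry` class and the field names returned by `split_basetype` are assumptions for illustration, not the published grammar.

```python
import re
from dataclasses import dataclass

# Assumed RES line shape: <index><kind>:<name>, where kind "b" marks a
# monosaccharide basetype and "s" a substituent.
RES_LINE = re.compile(r"^(\d+)([bs]):(\S+)$")


@dataclass
class ResEntry:
    index: int
    kind: str  # "basetype" or "substituent"
    name: str  # e.g. "b-dglc-HEX-1:5" or "n-acetyl"


def parse_res_line(line: str) -> ResEntry:
    m = RES_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a RES line: {line!r}")
    kind = "basetype" if m.group(2) == "b" else "substituent"
    return ResEntry(int(m.group(1)), kind, m.group(3))


def split_basetype(name: str) -> dict:
    """Split a basetype name into its dash- and bar-separated fields."""
    core, *modifications = name.split("|")
    fields = core.split("-")
    return {
        "anomer": fields[0],             # anomeric configuration, e.g. "a" or "b"
        "stems": fields[1:-2],           # one or more configuration+stem tokens
        "superclass": fields[-2],        # carbon-count class, e.g. "HEX" or "NON"
        "ring": fields[-1],              # ring closure positions, e.g. "1:5"
        "modifications": modifications,  # e.g. ["6:d"] for 6-deoxy
    }


print(parse_res_line("2s:n-acetyl"))
print(split_basetype("a-lgal-HEX-1:5|6:d"))
```

Keeping substituents such as `n-acetyl` as separate entries, rather than folding them into the monosaccharide name, is what lets a small basetype vocabulary cover many residue names.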

  8. Modeling the topology
     • Residue entities are modeled in the RES section.
     • Linkages are modeled in the LIN section.
     • Atom replacement schema
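A minimal sketch of reading the LIN section into a topology, assuming lines of the form `1:1d(2+1)2n` (linkage index, parent residue with its linkage-type code, parent and child positions, child residue with its code). The single-letter codes record which atom the bond replaces, per the atom replacement schema; `d` is commonly glossed as a replaced hydroxyl group (deoxy) and `o` as a retained hydroxyl oxygen, but the parser treats them as opaque letters. The regex, the `Linkage` class and reading `-1` as "unknown position" are assumptions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed LIN line shape: <index>:<parent><ptype>(<ppos>+<cpos>)<child><ctype>
# Alternative positions are separated by "|"; -1 is read as "unknown".
LIN_LINE = re.compile(r"^(\d+):(\d+)([a-z])\(([-\d|]+)\+([-\d|]+)\)(\d+)([a-z])$")


@dataclass
class Linkage:
    index: int
    parent: int
    parent_type: str       # atom-replacement code at the parent, e.g. "o", "d"
    parent_pos: list[int]
    child_pos: list[int]
    child: int
    child_type: str        # atom-replacement code at the child


def _positions(text: str) -> list[int]:
    return [int(p) for p in text.split("|")]


def parse_lin_line(line: str) -> Linkage:
    m = LIN_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a LIN line: {line!r}")
    idx, parent, ptype, ppos, cpos, child, ctype = m.groups()
    return Linkage(int(idx), int(parent), ptype, _positions(ppos),
                   _positions(cpos), int(child), ctype)


# Intended as GlcNAc(b1-4)GlcNAc, each N-acetyl group stored as a substituent.
links = [parse_lin_line(s) for s in ["1:1d(2+1)2n", "2:1o(4+1)3d", "3:3d(2+1)4n"]]

# The topology is an adjacency map from parent index to child indices.
children: dict[int, list[int]] = {}
for link in links:
    children.setdefault(link.parent, []).append(link.child)
print(children)  # {1: [2, 3], 3: [4]}
```

Separating residues (RES) from edges (LIN) means the same residue entries can be reused when linkages are uncertain, since only the LIN side needs alternative positions.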

  9. Encoding linkage

  10. Encoding Repeating units

  11. Encoding alternative units

  12. Encoding underdetermined units

  13. Sorting
     • Why
       • One central requirement is to generate a unique representation for every carbohydrate.
       • Sorting determines the order in which elements appear.
     • How
       • GlycoCT uses a set of hierarchical rules to define the ordering of residues, linkages and special structural features:
         • Residue comparison algorithm
         • Linkage comparison algorithm
         • Underdetermined subtree comparison algorithm
         • Alternative subtree comparison algorithm

  14. Residue comparison
     • Applied when multiple starting points exist.
     • Rules:
       • Number of child residues
       • Length of the longest branch
       • Number of terminal residues
       • Number of branching points
       • Lexical order
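Hierarchical rules map naturally onto a tuple sort key: tuples compare slot by slot, so a later rule only matters when every earlier one ties. The sketch below assumes the rules apply in the listed order, that "child residues" counts the whole subtree, and that larger counts sort first; none of these details is stated on the slide. The `Node` class and helper names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                                   # residue name, e.g. "b-dglc-HEX-1:5"
    children: list[Node] = field(default_factory=list)


def child_count(n: Node) -> int:
    """All residues below n (whole subtree, an assumption)."""
    return sum(1 + child_count(c) for c in n.children)


def longest_branch(n: Node) -> int:
    return max((1 + longest_branch(c) for c in n.children), default=0)


def terminal_count(n: Node) -> int:
    return 1 if not n.children else sum(terminal_count(c) for c in n.children)


def branch_points(n: Node) -> int:
    own = 1 if len(n.children) > 1 else 0
    return own + sum(branch_points(c) for c in n.children)


def residue_key(n: Node) -> tuple:
    # Each rule is consulted only when all earlier rules tie.
    # Negated counts put larger values first (assumed direction).
    return (-child_count(n), -longest_branch(n), -terminal_count(n),
            -branch_points(n), n.name)


x = Node("b-dgal-HEX-1:5", [Node("a-lgal-HEX-1:5|6:d")])
y = Node("b-dglc-HEX-1:5", [Node("b-dgal-HEX-1:5")])
z = Node("a-dman-HEX-1:5")
print([n.name for n in sorted([z, y, x], key=residue_key)])
```

Here `x` and `y` tie on every structural count, so lexical order of the residue names decides between them; `z` has no children and sorts last under the assumed direction.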

  15. Linkage comparison
     • Rules:
       • Number of bonds between parent and child residues
       • Atom linkage position at the parent residue
       • Atom linkage position at the child residue
       • Linkage type at the parent residue
       • Comparison of the child residues with the residue comparison algorithm
     • Determines the internal order of the RES and LIN sections
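The linkage rules can be sketched the same way, with one tuple slot per rule in the listed order. Ascending order in every slot is an assumption, and `child_rank` is a hypothetical placeholder: in a fuller implementation it would be the outcome of the residue comparison algorithm applied to the child residue.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Link:
    bonds: list[tuple[int, int]]  # (parent position, child position) per bond
    parent_type: str              # linkage type code at the parent, e.g. "o"
    child_rank: str               # hypothetical stand-in for residue comparison


def linkage_key(link: Link) -> tuple:
    # One tuple slot per rule, in the order listed on the slide.
    return (
        len(link.bonds),                  # number of bonds
        [p for p, _ in link.bonds],       # atom positions at the parent
        [c for _, c in link.bonds],       # atom positions at the child
        link.parent_type,                 # linkage type at the parent
        link.child_rank,                  # child residue comparison
    )


links = [
    Link([(6, 1)], "o", "b-dgal-HEX-1:5"),
    Link([(3, 1)], "o", "a-lgal-HEX-1:5|6:d"),
    Link([(3, 1)], "o", "a-dman-HEX-1:5"),
]
for link in sorted(links, key=linkage_key):
    print(link.bonds, link.child_rank)
```

The two linkages at parent position 3 tie on every positional rule, so only the child comparison separates them; reading the sorted list off in order is one way such a ranking could fix the numbering in the RES and LIN sections.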

  16. Underdetermined subtree & alternative subtree comparison
     • UND and ALT sections are encoded separately from the other topological features.
     • The rules of the residue and linkage comparison algorithms are applied within each UND and ALT to determine its internal order.
     • The reducing residues of UNDs and ALTs are compared using the residue comparison algorithm.
     • If two compared UNDs are identical, their parent residues and linkages (the linkages between the UND and the main graph) are compared.
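A sketch of ordering several underdetermined subtrees against each other, assuming each already carries precomputed comparison keys. The `Subtree` record, its fields and the placeholder keys such as `("sia",)` are hypothetical; comparing parent residues before linkages is an assumed order, and the internal ordering within each subtree is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Subtree:
    """Hypothetical UND/ALT record holding precomputed comparison keys."""
    reducing_key: tuple        # residue-comparison key of the reducing residue
    parent_keys: list[tuple]   # keys of the candidate parent residues
    link_keys: list[tuple]     # keys of the linkages to the main graph


def subtree_key(s: Subtree) -> tuple:
    # Reducing residues decide first; for identical ones, the attachment
    # (parent residues, then linkages) breaks the tie.
    return (s.reducing_key, sorted(s.parent_keys), sorted(s.link_keys))


und_a = Subtree(("sia",), [("gal", 2)], [(3, 2)])
und_b = Subtree(("sia",), [("gal", 1)], [(6, 2)])
und_c = Subtree(("fuc",), [("glcnac", 1)], [(6, 1)])
ordered = sorted([und_a, und_b, und_c], key=subtree_key)
print([(s.reducing_key, s.parent_keys) for s in ordered])
```

`und_a` and `und_b` have identical reducing residues, so their attachment to the main graph is what orders them, mirroring the tie-break described on the slide.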

  17. First application and results
     • All monosaccharides from CarbBank were translated into the naming scheme defined by GlycoCT.
     • The 1439 different names in CarbBank resulted in 474 different basetypes and 29 different substituents, reducing the number of distinct residues by 65%.
     • Two main reasons for the reduction:
       • The separation of monosaccharides into basetype and substituents
       • The unique encoding of monosaccharides
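The 65% figure can be checked directly, assuming it compares the 1439 CarbBank names with the combined count of GlycoCT basetypes and substituents (the slide does not spell out the basis, but the numbers agree).

```python
carbbank_names = 1439
basetypes, substituents = 474, 29

glycoct_entries = basetypes + substituents       # 503 distinct entries
reduction = 1 - glycoct_entries / carbbank_names
print(glycoct_entries, f"{reduction:.1%}")        # 503 65.0%
```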

  18. Conclusion
     • Offers a superset of the capabilities of all known sequence formats in glycobioinformatics
     • Supports structurally underdetermined sequences
     • The consistent naming scheme for monosaccharides is easy to maintain
