18th International Unicode Conference

18th International Unicode Conference. Documentum and UTF-8: Converting Content Management Software Product Line to Unicode. 27 April 2001, Donald Ziff. Agenda: • What is Documentum? • Documentum’s I18N Problem • How Unicode UTF-8 Saved the Day • Other Success Factors • Demo


Presentation Transcript


  1. 18th International Unicode Conference. Documentum and UTF-8: Converting Content Management Software Product Line to Unicode. 27 April 2001. Donald Ziff. Documentum Proprietary

  2. Agenda • What is Documentum? • Documentum’s I18N Problem • How Unicode UTF-8 Saved the Day • Other Success Factors • Demo

  3. About Documentum • Documentum: NASDAQ “DCTM” • The Leader in Web and Enterprise Content Management Solutions • Over $128M in revenue (1999); over 800 employees • 900+ Global 2000 customers with strong vertical focus • Over 25 offices in 10+ countries

  4. DCTM’s I18N Problem • Everyone agrees: we need I18N to fuel growth – especially in Asia • Asian-certified product much more important than multi-lingual • Although demand for multi-lingual is growing… • So why not I18N?

  5. I18N Perception Problems • Too Difficult – won’t fit into a development cycle • Too much Overhead – multiplies QA and Support • Not Sexy – no new functionality. Let’s look at these problems…

  6. “I18N is too difficult” Product Layers: • Server (built on RDBMS + Verity) • DMCL: Client Library (C++) • DFC: Foundation Classes (Java) • DTC: Desktop Client – Win32 end-user client • WDK: Web Development Kit • RightSite: Legacy Web-Server Integration • Web Publisher: Web Content Management App • Legacy clients: Workspace (Win32), Intranet

  7. History Lesson • Server v3.1.6.INT, created by consultants for the Japanese market, was expensive and time-consuming • 3.1.6.INT attempted to internationalize all the layers in the DCTM architecture at once • 4.0 was released without the I18N changes • When 4.1 followed, the deltas from 3.1.6 to 3.1.6.INT became hard to apply…

  8. “I18N requires too much overhead” • The DCTM server requires pharmaceutical-strength certification • Dimensions of certification: • 3 RDBMS platforms: Oracle, Sybase, SQL Server • 4 server OSes: NT, Solaris, HP-UX, AIX • The 3.1.6.INT architecture introduced new dimensions, leading us to…

  9. Certification Hell! • New certification dimensions: • 5 DCTM Server code-pages • 5 RDBMS code-pages • Market requires another dimension: • 5 Server OS localizations • 125 new × 12 old = 1500 certs! • Exaggeration, of course… But still…
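The count on this slide is a product over independent certification dimensions: 5 × 5 × 5 = 125 new combinations times the 3 × 4 = 12 existing RDBMS/OS combinations from the previous slide. A small Python sketch that enumerates the whole matrix confirms the figure; the dimension labels are placeholders, not Documentum's actual certification list.

```python
from itertools import product
from math import prod

# Dimension sizes from the slides; the labels are placeholders only.
old_dims = {"rdbms_platform": 3, "server_os": 4}
new_dims = {"server_code_page": 5, "rdbms_code_page": 5, "os_localization": 5}


def matrix_size(dims: dict[str, int]) -> int:
    """Number of configurations = product of the dimension sizes."""
    return prod(dims.values())


def enumerate_configs(dims: dict[str, int]):
    """Yield every configuration as a tuple of (dimension, index) pairs."""
    names = list(dims)
    for combo in product(*(range(n) for n in dims.values())):
        yield tuple(zip(names, combo))


old = matrix_size(old_dims)
new = matrix_size(new_dims)
total = sum(1 for _ in enumerate_configs({**old_dims, **new_dims}))
print(old, new, total)  # 12 125 1500
```

Collapsing any one dimension to a single value divides the total by that dimension's size, which is why standardizing on one internal code-page shrinks the matrix.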

  10. “I18N not sexy” • DCTM is a growth company; it needs sizzle as well as steak • I18N grows markets, but doesn’t add much to the marketing message • To be fair: new functionality is not just “sexy” – it is essential to DCTM’s continued survival • Other priorities will move to the top…

  11. DCTM’s I18N Requirements • Crucial need: support Asia from the main code-line. One binary for the world • Backward compatibility essential • Multi-lingual features would be a side-benefit. High on the wish list for a few key customers • I18N project must be scoped down to be achievable

  12. How UTF-8 Saved the Day • UTF-8 moves safely through the server because any byte that looks like ASCII actually is ASCII • Standardizing on UTF-8 as the only supported internal code-page cuts down the certification matrix
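The property behind "anything that looks like ASCII actually is" can be checked exhaustively: in UTF-8, every byte of a multi-byte sequence is 0x80 or above, so a byte below 0x80 is always a complete ASCII character. This Python sketch tests every Unicode scalar value (all code points except the surrogates U+D800 to U+DFFF, which UTF-8 cannot encode); the function name is our own.

```python
def utf8_ascii_safe(cp: int) -> bool:
    """Return True if the UTF-8 encoding of code point `cp` cannot be
    confused with ASCII: an ASCII code point encodes to exactly its own
    single byte, and any other code point uses only bytes >= 0x80."""
    encoded = chr(cp).encode("utf-8")
    if cp < 0x80:
        return encoded == bytes([cp])
    return all(b >= 0x80 for b in encoded)


# Every Unicode scalar value: all code points except the surrogates,
# which Python's UTF-8 codec refuses to encode.
scalars = (cp for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF)
ALL_SAFE = all(utf8_ascii_safe(cp) for cp in scalars)
print(ALL_SAFE)  # True
```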

  13. Lessons from Double-Byte Experiments • EUC-KR: 4.1 server works (basically) • SJIS: problems! Some double-byte characters have second bytes equal to ASCII \ ` | • Lessons: • Non-ASCII moves through the server safely • String handling need not be double-byte aware, if ASCII always means ASCII • Solution: UTF-8!
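The Shift-JIS failure is easy to reproduce. In SJIS the second (trail) byte of a double-byte character can fall in the ASCII range, so a byte-oriented scan for a backslash (0x5C), backquote (0x60) or vertical bar (0x7C) can match in the middle of a character. The Python sketch uses the kanji 表 (U+8868), which Shift-JIS encodes as 0x95 0x5C; `naive_backslash_scan` is a hypothetical stand-in for single-byte C string code.

```python
def naive_backslash_scan(data: bytes) -> list[int]:
    """Return the offsets of every 0x5C byte, the way a single-byte
    routine such as C's strchr(s, '\\') would find them."""
    return [i for i, b in enumerate(data) if b == 0x5C]


text = "表"  # U+8868
sjis = text.encode("shift_jis")
utf8 = text.encode("utf-8")

print(sjis, naive_backslash_scan(sjis))  # b'\x95\\' [1]  (false match)
print(utf8, naive_backslash_scan(utf8))  # b'\xe8\xa1\xa8' []

# Treating the false match as a real backslash splits the character in
# half; the orphaned lead byte no longer decodes.
try:
    sjis.split(b"\\")[0].decode("shift_jis")
except UnicodeDecodeError as err:
    print("corrupted:", err.reason)
```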

  14. UTF-8: ASCII is ASCII • No need for special string handling • Server 3.1.6.INT replaced all standard C string handling with calls to a 3rd-party library • With UTF-8, we stick with the standard routines – yacc and other legacy tools work fine • Greatly improved perception (and reality) of how difficult I18N would be • Now, it’s relatively low-impact
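Because no UTF-8 byte of a non-ASCII character can collide with an ASCII delimiter, a scanner written purely in terms of single ASCII bytes, in the style of a lex/yacc front end, tokenizes UTF-8 input correctly without knowing the encoding. The Python sketch is a hypothetical byte-level tokenizer for a made-up call syntax, not Documentum's actual grammar.

```python
def tokenize(src: bytes) -> list[bytes]:
    """Split on ASCII whitespace and emit '=', ',', '(' and ')' as
    single-byte tokens. Every other byte, including UTF-8 lead and
    continuation bytes (0x80-0xFF), is simply part of a word."""
    tokens: list[bytes] = []
    word = bytearray()
    for b in src:
        ch = bytes([b])
        if ch.isspace() or ch in b"=,()":
            if word:  # close the word in progress
                tokens.append(bytes(word))
                word.clear()
            if not ch.isspace():  # punctuation is its own token
                tokens.append(ch)
        else:
            word.extend(ch)
    if word:
        tokens.append(bytes(word))
    return tokens


src = "select(名前, 表示)".encode("utf-8")
print([t.decode("utf-8") for t in tokenize(src)])
# ['select', '(', '名前', ',', '表示', ')']
```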

  15. It’s UTF-8, dummy! • Use UTF-8 everywhere, cut down on certification dimensions • Provides safe character-handling for Asia • Even though multi-lingual is not a requirement • Easier to support

  16. Other Success Factors • Rely on RDBMS services to translate between RDBMS code-page and UTF-8 • Market research cut back on OS localization constraints • Transcoding infrastructure

  17. RDBMS transcodes to/from UTF-8 • Oracle and Sybase transcode automatically – SQL Server is a problem • No need for new transcoding calls between Server and RDBMS – lower impact • Upgrade customers have non-Unicode RDBMS – no need for them to convert • One less certification dimension!

  18. Cut back on Localized OS certs • Limit RDBMS for Asia – for 4.2, just Oracle • Localized OS certification not necessary for Europe

  19. Transcoding Infrastructure • Server must be aware of interface code-pages • Transcoding done at the interfaces • 3rd party transcoding used: Uniscape’s GlobalC
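Transcoding at the interfaces means the server core sees only UTF-8, and conversion happens once at each connection point whose peer uses a national code-page. A minimal Python sketch, assuming each interface knows its peer's code-page: the `Interface` class is hypothetical, and Python's built-in codecs stand in for the GlobalC library named on the slide, whose API is not shown here.

```python
class Interface:
    """Hypothetical connection point whose peer uses a national code-page.

    The server core keeps every string as UTF-8 bytes; conversion
    happens only here, at the boundary.
    """

    INTERNAL = "utf-8"

    def __init__(self, name: str, code_page: str) -> None:
        self.name = name
        self.code_page = code_page  # a Python codec name, e.g. "shift_jis"

    def inbound(self, raw: bytes) -> bytes:
        """Peer's code-page bytes -> internal UTF-8 bytes."""
        return raw.decode(self.code_page).encode(self.INTERNAL)

    def outbound(self, internal: bytes) -> bytes:
        """Internal UTF-8 bytes -> peer's code-page bytes (strict)."""
        return internal.decode(self.INTERNAL).encode(self.code_page)


# Usage: a legacy Japanese client and a legacy Korean client.
jp = Interface("legacy client (ja)", "shift_jis")
kr = Interface("legacy client (ko)", "euc_kr")

stored = jp.inbound("表示".encode("shift_jis"))
print(stored)                                   # b'\xe8\xa1\xa8\xe7\xa4\xba'
print(jp.outbound(stored).decode("shift_jis"))  # 表示

ko = kr.inbound("한국어".encode("euc_kr"))
try:
    jp.outbound(ko)  # Shift-JIS has no Hangul
except UnicodeEncodeError:
    print("cannot deliver Hangul to a Shift-JIS client")
```

Because the codecs are strict by default, text that a peer's code-page cannot represent raises `UnicodeEncodeError` instead of being silently mangled; an actual server would still need a policy for that case.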

  20. New I18N Architecture [Architecture diagram; component labels: Desktop Client, Custom WebApp, Web Publisher, Intranet Client, Administrator, WDK (Unicode), RightSite (NCS), WorkSpace, DFC (Unicode), Web Cache, ARP (NCS), DMCL 4.2 (UTF-8), DMCL ≤ 4.1 (NCS), e-Content Server (UTF-8), File System, Verity, RDBMS (Unicode). Legend: National Character Set (NCS), Unicode.]

  21. Demo • Demo – multilingual WDK • If there’s time, a quick look at localized Desktop Client (Win32 Client)

  22. Conclusion UTF-8 was a crucial technology in DCTM’s I18N strategy: • Provided an easy path for legacy C++ • Supported specific Asian languages consistently, minimizing certifications • Prepared infrastructure for multi-lingual requirements

More Related