Content Discovery in Regulated and Litigious Industries: The Pro-active Role of XML Paul Wlodarczyk VP Content Lifecycle Solutions XMetaL, a JustSystems company 9 November 2006
"We are drowning in information" • June 16, 2005: BofA, Brokerage Affiliates to Pay $1.5M E-mail Fine. Bank of America Corp. brokerage affiliates will pay the SEC $1.5 million to settle charges that they failed to preserve business e-mails. Between January 2001 and February 2004, the units did not ensure that their software retained e-mail, the SEC said. • INFOGLUT: You are here. You will stay here. • Failed Methods Turn Information Assets Into Information Liabilities. Source: Gartner
Managing Information as a Strategic Asset Delivers Value Across the Enterprise
• Scope: enterprise transactions, customers, employees, partners, information, products, organizations, financials and reports, managed across all content (e-mail, Web content, documents, databases, media)
• Value comes through efficiency and differentiation:
  • Process simplification: promote reuse and data quality
  • Compliance: transparency of information
  • "Infoglut": manage expanding volumes
  • Vendor consolidation: spend less on the same technology
  • M&A: reduce integration burdens
  • Enterprise agility: sense and respond
  • Continuous flow: real time, closed-loop analytics
  • Single view: consistent and holistic view across all channels; relationship management
  • Revenue optimization: support top-line growth through cross-sell/upsell; leverage global purchasing power
Source: Gartner
Questions we will answer today • What is enterprise information management (EIM)? • What are the issues driving convergence of data and documents? • What are the process, people, technology and content enablers for EIM? • What are new approaches to make content available to the enterprise for discovery?
Gartner: Defining Enterprise Information Management Enterprise information management (EIM) is an integrative discipline for structuring, describing and governing information assets regardless of organizational and technological boundaries to improve operational efficiency, promote transparency and enable business insight. Source: Gartner
Questions we will answer today • What is enterprise information management (EIM)? • What are the issues driving convergence of data and documents? • What are the process, people, technology and content enablers for EIM? • What are new approaches to make content available to the enterprise for discovery?
Structured vs. Unstructured Information • Business Transactions consist of data (structured information) • Business Decisions are often based on documents (unstructured information)
The Challenge of Structured / Unstructured Convergence: Complexity & Dynamics of Data / Document Convergence • Drivers: litigation, interoperability, discovery, process integration, regulation, infoglut • The World of Data (transactions): Data = Information Structured for Machine Processing • The World of Documents (decisions): Document = Information Presented for Human Processing
Contrasting Unstructured and Structured Content
Unstructured | Structured
Opaque | Self-describing
A snapshot in time | An audit trail
Passive | Active
Indexed & searched | Discovered
Mixes content & presentation | Separates content (meaning) from presentation (format)
Pushed through a deterministic workflow | Navigates through a dynamic process
Protected by applications | Protects & tracks itself
All or nothing (a file) | Fine-grained (objects)
Application-specific | Application-independent
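A small illustration of "opaque" versus "self-describing", using a hypothetical contract clause: with plain text a program has to guess which date is the expiry date, while with XML the element name states it. The element names (clause, signed, expires) are invented for this sketch and do not come from any standard vocabulary.

```python
import re
import xml.etree.ElementTree as ET

# A hypothetical contract clause, stored two ways.
OPAQUE = "Signed on 2006-11-09 by Acme Corp.; this agreement expires on 2007-12-31."

STRUCTURED = """<clause type="term">
  <party role="customer">Acme Corp.</party>
  <signed>2006-11-09</signed>
  <expires>2007-12-31</expires>
</clause>"""


def expiry_from_text(text):
    """Opaque content: the program must guess, here by taking the first date-shaped string."""
    match = re.search(r"\d{4}-\d{2}-\d{2}", text)
    return match.group(0) if match else None


def expiry_from_xml(xml_text):
    """Self-describing content: the markup states which value is the expiry date."""
    return ET.fromstring(xml_text).findtext("expires")


print(expiry_from_text(OPAQUE))      # picks up the signing date, not the expiry date
print(expiry_from_xml(STRUCTURED))   # reads the <expires> element directly
```

The regex guess is not a straw man: any heuristic over unmarked text breaks as soon as the text contains a second value of the same shape.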
Strategic Planning Assumption By 2009, organizations will spend on the order of $3 billion in the worldwide market on unstructured data management – at least half of what they spend on structured data management (0.8 probability).
Unstructured content creation in the enterprise
• Office documents (word processing, spreadsheet, e-mail)
  • Many decision documents (contracts, policies/procedures, proposals, forms) are still largely unstructured, with little or no semantic markup
• Content entry through enterprise applications
  • Exists as plain text or XHTML, e.g. in ERP, e-commerce, call center / CRM / customer support applications, and PLM (Product Lifecycle Management)
  • Little or no semantic markup
• Desktop publishing
  • Largely unstructured outside of high-tech / technical publications
  • Starting to move to XML because of localization (L10N) and multi-channel publishing
• User-generated content through the Web
  • Blogs, forms or wiki markup, with little or no semantic markup
• Rich media
  • E.g. e-learning and rich communications such as Flash, with little or no semantic markup
Content Must Be Described to Be Processed by Machines
[Diagram: a spectrum running from less structure (machine-inaccessible content that humans process) to more structure (machine-accessible content that machines process).]
• Content types: blobs, files and repositories, cells, database tables
• Metadata: from minimal metadata to hierarchy + metadata + references
• Applications: word processing, spreadsheet, e-mail, ECM, BCS, KM, business intelligence, data mining, security screening, transactions, information access
• Formats, standards and XML vocabularies: ASCII/Unicode, .doc/.xls/.ppt, ODF, Open XML Document Format, XML, DITA, XBRL, RSS, SOAP/WSDL, RDF, OWL, SQL, MPEG, JPEG, Flash, sign language
• Information: text, master data, repositories, paper, audio, calculations, illustrations, indexes, photographs, graphics, metadata
Source: Gartner plus JustSystems
Strategic Planning Assumption By 2009, separate and sometimes conflicting approaches to dealing with documents and databases will give way to enterprise information management programs that deal with all data as part of the organization's enterprise architecture strategy (0.7 probability).
Approaches to EIM • Reactive • Indexing and searching content post facto; data mining (e.g. Autonomy, ClearForest, Google) • Requires technology investment only • Proactive • Indexing content as it is created (XML, metadata, taxonomies, records management, etc.) • Requires investments in people, process, technology, and content
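The difference between the two approaches can be shown with a toy example. In this sketch the documents, the metadata keys (topic, product) and the function names are hypothetical; reactive search is reduced to a substring scan, and proactive indexing to an inverted index over metadata captured when each document was created.

```python
from collections import defaultdict

# Hypothetical documents; "meta" is assigned by authors or tools at creation time.
DOCS = {
    "d1": {"text": "Brake recall notice for model X", "meta": {"topic": "safety", "product": "X"}},
    "d2": {"text": "Quarterly sales summary", "meta": {"topic": "finance"}},
    "d3": {"text": "Memo: stopping distance test results, model X",
           "meta": {"topic": "safety", "product": "X"}},
}


def reactive_search(docs, keyword):
    """Reactive: scan the text after the fact; finds only literal keyword hits."""
    kw = keyword.lower()
    return sorted(doc_id for doc_id, doc in docs.items() if kw in doc["text"].lower())


def build_proactive_index(docs):
    """Proactive: build an inverted index from metadata captured at creation time."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for key, value in doc["meta"].items():
            index[(key, value)].add(doc_id)
    return index


index = build_proactive_index(DOCS)
print(reactive_search(DOCS, "brake"))       # only d1 contains the word "brake"
print(sorted(index[("topic", "safety")]))   # d1 and d3 were both tagged as safety
```

A keyword search for "brake" misses the stopping-distance memo, which is exactly the kind of document a safety-related discovery request would need; the metadata index finds it because it was classified when it was written.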
Questions we will answer today • What is enterprise information management (EIM)? • What are the issues driving convergence of data and documents? • What are the process, people, technology and content enablers for EIM? • What are new approaches to make content available to the enterprise for discovery?
Enablers to Proactive EIM • Process • Best methods for EIM need to be defined and propagated (e.g. Gartner model) • People • Information Architects to do the work • CWA and other ethnographic approaches to assure uptake and compliance • Content • Broader definition and adoption of standard XML vocabularies like DITA • Technology • Maturing of the XML ecosystem
Process: Proactive EIM Is a Comprehensive Program, Not Just Technology
Gartner's Essential Building Blocks for EIM: Vision, Strategy, Governance, Organization, Process, Enabling Infrastructure, Metrics
• Vision: How is information perceived and valued in the organization? Is it a by-product, a shareable resource or a source of differentiation?
• Strategy: How is information currently managed? Is it ad hoc, departmental, or is there an enterprise focus?
• Governance: What decision rights and controls exist for managing information as an asset, and who is involved?
• Organization: What information-centric roles exist and where are they located?
• Process: Are there practices (such as stewardship) and standards around the information lifecycle?
• Enabling infrastructure: How well do information management technologies support current and future needs?
• Metrics: How much is spent managing information? How much information is redundant? How much poor-quality information exists, and what impact does it have on the business?
Source: Gartner
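The Metrics question "How much information is redundant?" can be approximated for a text corpus. A minimal sketch, assuming "redundant" means paragraphs that are exact duplicates after whitespace and case normalization; near-duplicates and reworded copies are not detected. The function names and sample policy text are hypothetical.

```python
import hashlib
import re


def normalize(paragraph):
    """Collapse whitespace and case so trivial formatting differences don't hide duplicates."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def redundancy_ratio(paragraphs):
    """Share of non-empty paragraphs that repeat one seen earlier (exact match after normalizing)."""
    seen = set()          # store digests rather than full text to keep memory small
    duplicates = 0
    total = 0
    for p in paragraphs:
        norm = normalize(p)
        if not norm:
            continue
        total += 1
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return duplicates / total if total else 0.0


corpus = [
    "Employees must retain business e-mail for three years.",
    "Employees must  retain business e-mail for three years.",
    "Expense reports are due monthly.",
    "EMPLOYEES MUST RETAIN BUSINESS E-MAIL FOR THREE YEARS.",
]
print(redundancy_ratio(corpus))
```

Exact matching gives a lower bound; a real program would likely also need near-duplicate detection, which is out of scope for this sketch.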
Strategic Planning Assumption By 2007, information architects will establish the principles, governance processes, models and framework for improving the accuracy and integrity of information assets as part of an organization's commitment to enterprise information management (0.7 probability).
People: Information Architect Roles Contribute to EIM Success
• Information Architect (Enterprise Level, EIA)
  • Focus on strategic information requirements
  • Publish enterprise standards
  • Draft enterprise information models and meta models
  • Formalize principles
  • Establish governance
  • Develop Information Value Network Model
  • Who: enterprise planners and modelers
  • Methods of classification: modeling and frameworks (e.g. Gartner Enterprise Architecture, Zachman, FEAF, IEEE, OMG)
• Information Architect (BI or Application Level)
  • Create data models and meta models
  • Implement stewardship and quality objectives
  • Focus on integration
  • Oversee sourcing, profiling and transformation
  • Implement common business vocabularies
  • Who: data modelers, DBAs
  • Follow rigorous SDLC
  • Methods of classification: data models, process models, object models
• Information Architect (Web, Records Management or Content Level)
  • Work with multimedia tools
  • Content-driven, not metadata-driven
  • Navigation, personalization
  • XML DTD design, standards and forms creation
  • Create document and data retention schedules
  • Who: records management specialists; information science, library science or cognitive science backgrounds; portal
  • Methods of classification: taxonomies, ontologies, tagging
Source: Gartner
Strategic Planning Assumption The need to deliver business value from information assets will force Enterprise Information Architecture to mature as a discipline in 70% of Global 2000 organizations by 2008 (0.7 probability).
Strategic Planning Assumption Fully mature semantic reconciliation tools will not be available until 2011 (0.7 probability). By year-end 2009, 40 percent of a multinational company's data will be defined in some way by XML (0.7 probability). By year-end 2009, 75% of the Global 500's inter-application messaging infrastructure will be formatted in XML (0.7 probability).
Content: Business Drivers for XML Adoption • Support faster product cycles • Reuse content to accelerate time to market • Enable simultaneous product release in multiple markets • Reduce cost and improve efficiency • Automate publishing and translation processes • Meet regulatory and quality requirements • Enable content discovery for litigation support • Validate that content is accurate, consistent and complete to improve customer experience • Support personalized outputs • Serve local language and cultural needs
Content: DITA To The Rescue • A standardized framework for management and extensibility of XML document types • The Next Step in XML Manageability • Interoperability and tool independence • Reuse • Collaborative authoring • Originally developed by IBM • Published as an OASIS Specification in May 2005
DITA - Darwin Information Typing Architecture • Darwin: Allows natural evolution of document types through inheritance and specialization • Information Typing: Provides an information architecture for technical documents with base topic types of Concept, Task, and Reference • Architecture: A model that encapsulates best practices for both design and processes
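To illustrate the "Darwin" part: in DITA, every element carries a class attribute listing its type ancestry, so a generic processor that only knows the base topic types can still handle a specialized element. A minimal Python sketch; the topic content is made up, and the class values are written out explicitly because in practice they are normally supplied as defaults by the DITA DTD or schema. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny DITA concept topic with its class attributes spelled out.
CONCEPT = """<concept id="what-is-eim" class="- topic/topic concept/concept ">
  <title class="- topic/title ">What is EIM?</title>
  <conbody class="- topic/body concept/conbody ">
    <p class="- topic/p ">EIM governs information assets across boundaries.</p>
  </conbody>
</concept>"""


def ancestry(elem):
    """Return the type chain from a class attribute, base type first."""
    # "- topic/body concept/conbody " -> ["topic/body", "concept/conbody"]
    # The leading "-" marks a structural (not domain) specialization and is dropped.
    return elem.get("class", "").split()[1:]


def base_type(elem):
    """What a generic processor that knows only base topic types would see."""
    chain = ancestry(elem)
    return chain[0] if chain else None


root = ET.fromstring(CONCEPT)
for elem in root.iter():
    print(elem.tag, "->", base_type(elem))
```

This is why a new specialization can evolve without breaking existing tools: a processor that has never heard of conbody still treats it as a topic body.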
Topic Oriented Information Development • Information created and managed as modular chunks (topics) • Topics become the building blocks of your information products • Topic Characteristics* • Discrete units of information covering a specific subject with a specific intent • Small enough to promote reuse across multiple contexts and output media • Large enough to be easily authored and large enough to be readable and coherent • Organizable into a wide variety of structures from linear to networked *Source: CIDM, JoAnn Hackos
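Topic reuse is typically expressed through DITA maps, which assemble topics into deliverables by reference. A sketch with two hypothetical maps and topic file names that finds the topics both deliverables share; it only reads topicref href values and ignores keys, conrefs, and other DITA reuse mechanisms.

```python
import xml.etree.ElementTree as ET

# Two hypothetical DITA maps that assemble different deliverables from shared topics.
USER_GUIDE = """<map><title>User Guide</title>
  <topicref href="concepts/what-is-eim.dita"/>
  <topicref href="tasks/classify-a-record.dita"/>
</map>"""

SUPPORT_KB = """<map><title>Support Knowledge Base</title>
  <topicref href="tasks/classify-a-record.dita"/>
  <topicref href="reference/retention-schedule.dita"/>
</map>"""


def topic_hrefs(map_xml):
    """Collect every topic a map references, including nested topicrefs."""
    root = ET.fromstring(map_xml)
    return {ref.get("href") for ref in root.iter("topicref") if ref.get("href")}


shared = topic_hrefs(USER_GUIDE) & topic_hrefs(SUPPORT_KB)
print(sorted(shared))
```

The shared task topic is written and reviewed once, then published in both outputs; a correction to it reaches both deliverables on the next build.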
People, Process, Technology, and Content: The Enterprise with Self-Describing Content
Questions we will answer today • What is enterprise information management (EIM)? • What are the issues driving convergence of data and documents? • What are the process, people, technology and content enablers for EIM? • What are new approaches to make content available to the enterprise for discovery?
Strategic Planning Assumption Through 2010, organizations implementing both customer data integration and product information management MDM initiatives will link these efforts as part of an overall enterprise information management program (0.7 probability).
Example 1: Structuring Product Information • Structured content analysis for knowledge workers in product teams and the call center • XML editing embedded into enterprise applications (e.g. PLM, CRM) • XML/DITA for enterprise product-related publishing • Structured wikis and blogs for user-generated content (UGC)
[Diagram: XML and DITA content flows among Product Design, Info Dev, Support and the Contact Center Knowledge Base (known and new issues) through a CMS of topics (FAQs, procedures, specs, best practices, learning) with collaborative authoring; customers are served through publications, RSS notification and Web self-service (Web, phone, e-mail/chat), and contribute user-generated content as XML.]
Example 2: Structuring E-Commerce Content • Data/document convergence solutions for knowledge workers in marketing and e-commerce • XML editing embedded into e-commerce and e-merchandising applications • DITA/XML for enterprise publishing of marketing communications • Structured editor, wikis and blogs for UGC on retail sites (ActiveX, AJAX)
[Diagram: Product Marketing and Mar-Comm author XML/DITA content collaboratively in a CMS of topics (product catalog, features/benefits, specs, reviews, ad content), which feeds the e-commerce site, the merchandising system (e.g. ATG), third-party sites (retailers, communities) and RSS notification; customers make purchases and contribute reviews, blogs, forum posts and news.]
xfy: Display and Analyze Content Exposed through XML
[Diagram: an xfy view combining customer news, customer service history, press releases, sales history (2003 to 2005), delivery log and proposals with item, order, customer and shipping data. The XML engine (adaptive vocabulary, DOM tree, compound XML schema, XML object scripting) connects through adapters to Web services (SOAP, WSDL), XQuery, XML documents with defined schemas and document vocabularies, databases (SQL Server, Oracle, DB2, etc.) and business applications.]
Vendors' Attempts at EIM Through MDM • IBM Information On Demand • SAP NetWeaver • Oracle Fusion Middleware • Large vendors focus on master data management, which is only one part of an overall EIM program. Source: Gartner
Global Shipping and Logistics Company • Key issues: • HR policies and procedures (litigation is the driver) • Operations procedures: sharing best practices in operations worldwide (compliance, localization of practices, and language are key drivers) • Implementing ECM infrastructure • Implementing XML and topic-oriented authoring, review, and content management • Exploring knowledge management • Governance • Technologies • Content models, including DITA for self-describing content
Leading Tobacco Products company • Key issues: • Document discovery (consumer and regulatory litigation is key driver) • Knowledge management – sharing of R&D across units is a secondary factor • Implementing DITA / XML for R&D documents • Implementing topic-oriented content management • Implementing topic-oriented review / approval and workflow
Auto Manufacturer • Key issues: • Regulation / litigation (TREAD Act: Transportation Recall Enhancement, Accountability, and Documentation Act): discovery of all documents related to vehicle product safety issues, i.e. who knew what, and when • Compliance: getting employees to adhere to records management and content classification procedures • Issue: Office documents are not self-describing and need to be classified manually • Implementing EIM for product-related documents and records management • Considering XML as an aid to making content self-describing
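One way XML-based office formats can help with the classification problem: an Office Open XML package (.docx) stores document properties such as title, keywords and category as XML in docProps/core.xml, which a program can read without parsing the document body. Binary .doc files do not have this structure. The sketch is an illustration under assumptions: the rule that an empty category needs manual classification is a hypothetical policy, and make_sample_docx is a hypothetical helper that builds a stripped-down package holding only core.xml, not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(docx_bytes):
    """Return title, keywords and category from an Office Open XML package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as pkg:
        root = ET.fromstring(pkg.read("docProps/core.xml"))
    return {
        "title": root.findtext("dc:title", default="", namespaces=NS),
        "keywords": root.findtext("cp:keywords", default="", namespaces=NS),
        "category": root.findtext("cp:category", default="", namespaces=NS),
    }


def needs_manual_classification(props):
    """Hypothetical policy: flag documents whose author left the category empty."""
    return not props["category"].strip()


def make_sample_docx(category):
    """Hypothetical helper: an in-memory package containing only docProps/core.xml."""
    core = (
        '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}">'
        "<dc:title>Brake test memo</dc:title>"
        "<cp:keywords>brakes; model X</cp:keywords>"
        "<cp:category>{cat}</cp:category>"
        "</cp:coreProperties>"
    ).format(cp=NS["cp"], dc=NS["dc"], cat=category)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("docProps/core.xml", core)
    return buf.getvalue()


tagged = read_core_properties(make_sample_docx("Product Safety"))
untagged = read_core_properties(make_sample_docx(""))
print(tagged["category"], needs_manual_classification(tagged))
print(repr(untagged["category"]), needs_manual_classification(untagged))
```

Document properties only describe the file as a whole; classifying individual passages still requires semantic markup inside the content, which is where XML vocabularies such as DITA come in.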