This article discusses the challenges faced in building a strategic information integration infrastructure in complex and heterogeneous environments, with a focus on diverse data sources, unstructured data, and the need for integration. It also explores the concept of virtualization and grid computing as potential solutions.
Challenges in Building a Strategic Information Integration Infrastructure
Mukesh Mohania, IBM India Research Lab
The Integration Challenge
• Complex and heterogeneous environments
• Many different types of systems
• Many inter-related applications
• Escalating needs
• Variety, velocity, volume
• People are expensive
The world produces 250MB of information every year for every man, woman and child on earth.
The Challenge Continued…
• 60%+ of CEOs: need to do a better job capturing and understanding information rapidly in order to make swift business decisions.
• Only one third of CFOs believe that the information is easy to use, tailored, cost-effective or integrated.
• 30-50% of application design time is spent on copy management.
• 85% of information is unstructured.
• 42% of transactions are still paper-based.
• 30% of people's time is spent searching for relevant information.
• The average billion-dollar company has 48 disparate financial systems.
• 40% of IT budgets may be spent on integration.
• 79% of companies have more than two repositories, and 25% have more than 15.
[Figure labels: Customers, Employees, Trx., Partners, Orgs., Products, Financials; Web Content, e-Mails, Databases, Media, Reports, Documents]
Sources: IBM & Industry Studies, Customer Interviews, Forrester
Taikang Life Insurance
Background
• 4th largest Chinese insurance company
• 8,000 employees, 150,000 agents
• 3.5 million customers
• 28 branches, 170 sub-branches
• Data in DB2 UDB, Informix, Oracle, SQL Server, XML, e-mail, CRM and Portal applications
• Goals:
  • Up-to-the-minute status for executives
  • Increased employee productivity
  • Better customer service
[Slide labels: Business Challenge, Technical Challenge]
Taikang Integrated Information Platform Architecture
[Architecture diagram. Labels, in the order extracted:]
• ODS, Oracle, DB2/400, Informix, cache
• Channels: Phone, Fax, SMS, Email, Web, Store Front, Mail, Agents, Financial Planner
• Application Platform
• XML, SQL, Web Services
• Data Service
• Information Integration Platform
• Integrated Information Mapping (nicknames)
• Core Systems: Group & Banking, CSC, Personal Life, Financials
Challenges in Integrating Information
• Structured and unstructured data
• Diversity of data sources (content repos, pricing application, databases, …)
• Coming up with the model of how information fits together
• Understanding what info exists
• Finding related pieces
• Creating a common format
• Deciding how to access and transform data
• What should be materialized, what accessed in real-time, how maintained
• What pre-defined paths, what unplanned (navigation vs. search)
• Configuring the appropriate software
• Accessing information in the application
• Monitoring the system and understanding usage, problems, etc.
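The "materialized vs. accessed in real-time" trade-off listed above can be illustrated with a small sketch. Everything here is hypothetical and not taken from the slides: `CachedSource`, `fetch_prices`, and the `max_age_s` parameter are illustrative names, and the freshness rule (refresh once a snapshot reaches a maximum age) is only one possible maintenance policy.

```python
import time
from typing import Callable, List, Optional, Tuple

Row = Tuple[str, int]


class CachedSource:
    """Hypothetical wrapper: serves a materialized snapshot until it is too old.

    max_age_s = 0 degenerates to real-time access (every read hits the source);
    a large max_age_s behaves like a periodically refreshed materialized copy.
    """

    def __init__(self, fetch: Callable[[], List[Row]], max_age_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._max_age_s = max_age_s
        self._clock = clock
        self._snapshot: Optional[List[Row]] = None
        self._loaded_at = 0.0

    def read(self) -> List[Row]:
        now = self._clock()
        stale = self._snapshot is None or now - self._loaded_at >= self._max_age_s
        if stale:
            # Go back to the underlying source and re-materialize.
            self._snapshot = list(self._fetch())
            self._loaded_at = now
        return self._snapshot


# Demo with a fake clock so the behaviour is deterministic.
calls = {"n": 0}


def fetch_prices() -> List[Row]:
    """Stand-in for a pricing application; the value changes on every call."""
    calls["n"] += 1
    return [("policy-A", 100 + calls["n"])]


fake_now = [0.0]
materialized = CachedSource(fetch_prices, max_age_s=60.0, clock=lambda: fake_now[0])

first = materialized.read()    # initial load from the source
fake_now[0] = 30.0
second = materialized.read()   # still fresh: served from the snapshot
fake_now[0] = 61.0
third = materialized.read()    # snapshot too old: fetched again
```

The design choice the slide points at is visible in `max_age_s`: a larger value lowers load on the source at the cost of staleness, which is why the "how maintained" question has to be answered per data set.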
Virtualization: Grid Computing
• Virtual, collaborative organizations sharing apps and data in an open, heterogeneous environment
• A potentially vast aggregation of geographically dispersed computing resources
• Leverages Intranet, Extranet, and Internet implementations
• Lower TCO (Total Cost of Ownership)
[Diagram layers: Virtual Servers, Storage and Instruments; Grid Middleware; Distributed Physical Servers and Storage]
Data Virtualization for Information on the Grid
• A Grid should allow information to be:
  • Virtualized over heterogeneous, distributed data sources (location & heterogeneity transparency)
  • Accessed via open protocols
  • Autonomically administered
  • Dynamic
• Putting information on the Grid enables:
  • Access to any data resource in a standard way
  • Viewing a collection of data resources as a single integrated entity
  • Placing data so as to exploit available processing/storage for performance and scale
Lower TCO
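As a rough illustration of location and heterogeneity transparency, the sketch below lets a caller scan one logical name while the rows come from two different kinds of sources. It is a toy under stated assumptions, not the product architecture from the slides: `VirtualCatalog`, `sqlite_source`, `list_source`, and the sample rows are all invented for the example, with an in-memory SQLite database and a plain Python list standing in for heterogeneous back ends.

```python
import sqlite3
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Fetch = Callable[[], Iterator[Row]]


class VirtualCatalog:
    """Hypothetical catalog: one logical nickname, many physical sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, List[Fetch]] = {}

    def register(self, nickname: str, fetch: Fetch) -> None:
        self._sources.setdefault(nickname, []).append(fetch)

    def scan(self, nickname: str) -> Iterator[Row]:
        # The caller never learns where or how each row is stored.
        for fetch in self._sources[nickname]:
            yield from fetch()


def sqlite_source(conn: sqlite3.Connection, sql: str) -> Fetch:
    """Adapter for a relational source; rows come back as dicts."""
    def fetch() -> Iterator[Row]:
        cur = conn.execute(sql)
        cols = [d[0] for d in cur.description]
        for rec in cur:
            yield dict(zip(cols, rec))
    return fetch


def list_source(rows: Iterable[Row]) -> Fetch:
    """Adapter for a non-relational source (here just a list of dicts)."""
    materialized = [dict(r) for r in rows]

    def fetch() -> Iterator[Row]:
        yield from materialized
    return fetch


def build_demo_catalog() -> VirtualCatalog:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE policy_holder (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO policy_holder VALUES (?, ?)",
                   [(1, "Li"), (2, "Wang")])
    crm_rows = [{"id": 3, "name": "Zhang"}]  # made-up sample data

    catalog = VirtualCatalog()
    catalog.register("customers",
                     sqlite_source(db, "SELECT id, name FROM policy_holder"))
    catalog.register("customers", list_source(crm_rows))
    return catalog


names = sorted(str(row["name"]) for row in build_demo_catalog().scan("customers"))
```

A real federation layer would also push predicates down to each source and optimize across them; this sketch only shows the single-entity view.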
Distributed Data Management and Grid Computing
[Diagram labels: Tasks; Required Technologies; At Lower TCO (Total Cost of Ownership); Dynamic & Autonomic; Static & Manual; Enhance Current Technologies]
Data Virtualization: Grid Middleware for Integration & QoX
Middleware masks the dynamic nature of data sources and compute resources.
Distributed Data Management & Grid Computing
Transparent, Optimized, Integrated Access to Heterogeneous Data at Lower TCO
• Discover & Leverage Resources
  • System
  • Information
[Diagram labels, in the order extracted: OGSA, Web Services for QoS; Information Dissemination; Data Driven Application Parallelism; Services Oriented Architectures; Dynamic Federation; Data Placement; MPP Parallelism; Federated Query; Monolithic Application Architectures; OGSA; Data Replication; SMP Parallelism; Distributed Query; FTP, ETML; Data Movement; Open Standards; Transaction Parallelism; Query; Federation; Parallelism]