Data Services: Addressing the challenges of transformation to a knowledge-driven enterprise. Sri Gopalan Booz Allen Hamilton. Agenda. Challenges of transitioning to a knowledge-driven enterprise Facets of an effective Data Services solution An approach to realizing Data Services
Data Services: Addressing the challenges of transformation to a knowledge-driven enterprise
Sri Gopalan
Booz Allen Hamilton
Agenda
• Challenges of transitioning to a knowledge-driven enterprise
• Facets of an effective Data Services solution
• An approach to realizing Data Services
• The Way Ahead
• Questions and Comments
Challenges of transitioning to a knowledge-driven enterprise
• Technology research firm IDC determined that the world generated 161 billion gigabytes of digital information last year
• Data is contained in a multitude of unstructured (images, video, free text) and structured (RDBMS, XML, etc.) formats
• Policy requirements are growing, both from regulatory concerns (e.g. Sarbanes-Oxley, HIPAA) and from enterprise interests (e.g. security constraints)
• Organizations are struggling to get a handle on what information they have, how to search for it, and how to protect it
[Figure: volume of data creation over time]
The current production rate of digital information exceeds the ability to process it
Within many enterprises, there is no consistent way to discover, access, or share data
• Without a priori knowledge of where systems are, how to access them, and how to query them, users find it difficult to get all the information that they need
[Figure: Departments A through D holding data in databases, web applications, portals, email, ERP, and stand-alone apps, reached through JDBC, HTTP, XML, and proprietary interfaces]
Providing Business Context to Search
• The key element of search is to provide results relevant to the given business context
• While a consumer might make a request in his/her business context, the data providers may interpret that request in their own divergent business contexts
[Figure: a consumer says “I need a tank…”; Org. A and Org. B, each with its own apps, data, data format, and service interface, answer “I have scuba tanks” and “I have gas tanks”]
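The "tank" mismatch can be made concrete with a small sketch. It assumes that each provider attaches a business-context label to what it advertises and that the consumer states a context with the request; the `Listing` type, the context labels, and the third provider "Org. C" are hypothetical and are not part of the original slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    provider: str
    title: str
    context: str  # hypothetical business-context label attached by the provider


# Illustrative catalog; "Org. C" and every context label are assumptions.
CATALOG = [
    Listing("Org. A", "Scuba tank", "diving"),
    Listing("Org. B", "Gas tank", "automotive"),
    Listing("Org. C", "Battle tank", "defense"),
]


def search(term: str, context: str, catalog: list[Listing]) -> list[Listing]:
    """Match on the search term, then keep only listings in the consumer's context."""
    needle = term.lower()
    return [
        item for item in catalog
        if needle in item.title.lower() and item.context == context
    ]


# A bare keyword match returns every kind of tank.
all_hits = [item for item in CATALOG if "tank" in item.title.lower()]

# Adding the consumer's business context narrows the results.
relevant = search("tank", "defense", CATALOG)
print(len(all_hits), [item.title for item in relevant])
```

In practice the context would more likely be a taxonomy reference than a free-text label, but the filtering step is the same idea.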
“Web 2.0” technologies provide enhanced collaboration and spark community-building activities
• Mashups are a great example of re-purposing data, but they are still point-to-point and require a lot of redundant developer effort to create each one
• HousingMaps = Google Maps + Craigslist.com
• JobMaps = Google Maps + Indeed Job Search
Lessons Learned from Social Software techniques
• Leverage industry strengths
  • Use technologies and standards that are well supported by commercial and open-source tools in order to facilitate greater adoption
• Greatest common factor approach
  • Develop solutions that meet the requirements of the widest base of users, including those that may be technologically limited or resource constrained
• Evolve with the community
  • Develop solutions that are flexible and adaptable enough to change over time and incorporate community feedback and contributions
• Keep it Simple
  • While Data Services solutions may perform very complicated processes in the back end, keep their front-end interfaces as simple and easy to work with as possible
The importance of Metadata
• The main purpose of metadata, or data about data, is to speed up and enrich searching for resources
  • “What data services have information on recent financial filings?”
  • “Which data services are associated with HR data within an enterprise taxonomy?”
[Figure: types of metadata, arranged by expressiveness]
The Need for Data Discovery
• Data Discovery provides service consumer agents with a common facility to distribute a search for relevant information across data assets within the enterprise, including those that are known a priori and those that are unexpected
• Data Discovery exposes the essential metadata of a data resource (e.g. id, title, summary), not the data resource itself
• Potential usage scenarios:
  • A consumer can “subscribe” to a Data Discovery service to automatically receive streams of information about topics he/she is interested in from a variety of data providers he/she may or may not know about
  • Data providers, both small and large, can more directly advertise their information to interested service consumer agents that they may or may not know about
  • An analyst may request more metadata about a data resource before accessing it
Example Data Discovery Scenario
1. Consumer makes discovery request
2. Search Aggregator queries Service Discovery for relevant Search Services
3. Search Aggregator distributes request to relevant Search Services
4. Search Aggregator aggregates search results
5. Search Aggregator returns all search results
[Figure: a consumer, the Search Aggregator, Service Discovery (UDDI), and Search Services #1 to #4 in front of DB, video, image, and XML sources, with arrows numbered to match the steps above]
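The five steps of the discovery scenario can be sketched in a few lines. This is a minimal in-memory illustration, not an implementation of UDDI or of any real search service: `ServiceDiscovery`, `SearchAggregator`, `make_search_service`, the topic keys, and the sample records are all hypothetical names chosen for the example. The records carry only id, title, and summary, matching the point that discovery exposes metadata rather than the resource itself.

```python
from typing import Callable

Record = dict[str, str]                      # essential metadata: id, title, summary
SearchService = Callable[[str], list[Record]]


def make_search_service(records: list[Record]) -> SearchService:
    """Build a toy search service over a fixed list of metadata records."""
    def service(query: str) -> list[Record]:
        needle = query.lower()
        return [
            r for r in records
            if needle in r["title"].lower() or needle in r["summary"].lower()
        ]
    return service


class ServiceDiscovery:
    """Hypothetical in-memory stand-in for a registry such as UDDI."""

    def __init__(self) -> None:
        self._by_topic: dict[str, list[SearchService]] = {}

    def register(self, topic: str, service: SearchService) -> None:
        self._by_topic.setdefault(topic, []).append(service)

    def find(self, topic: str) -> list[SearchService]:
        return list(self._by_topic.get(topic, []))


class SearchAggregator:
    def __init__(self, discovery: ServiceDiscovery) -> None:
        self._discovery = discovery

    def discover(self, topic: str, query: str) -> list[Record]:
        """Step 1 is the call itself; steps 2 to 5 are marked below."""
        services = self._discovery.find(topic)           # step 2: look up services
        merged: dict[str, Record] = {}
        for service in services:                         # step 3: distribute request
            for record in service(query):                # step 4: aggregate results,
                merged.setdefault(record["id"], record)  #         dropping duplicate ids
        return list(merged.values())                     # step 5: return everything


registry = ServiceDiscovery()
registry.register("finance", make_search_service([
    {"id": "a1", "title": "Annual filing", "summary": "Full-year financial filing"},
]))
registry.register("finance", make_search_service([
    {"id": "a1", "title": "Annual filing", "summary": "Full-year financial filing"},
    {"id": "b2", "title": "Quarterly filing", "summary": "Q3 results"},
]))
registry.register("hr", make_search_service([
    {"id": "c3", "title": "Filing cabinet policy", "summary": "HR records"},
]))

results = SearchAggregator(registry).discover("finance", "filing")
print([r["id"] for r in results])
```

De-duplicating by id in step 4 is a design choice of this sketch; the slides say only that results are aggregated.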
The need for Data Access and Delivery
• Once a data resource of interest has been identified via Data Discovery, a service consumer might want to “access” or “deliver” that data resource for further processing
• Data Access and Delivery capabilities provide service consumer agents with a common facility to synchronously fetch a data resource or asynchronously route it to a pre-determined endpoint
• Potential usage scenarios:
  • A user at his/her workstation can directly “access” a data resource for detailed inspection
  • A field technician on the job site can use his/her mobile device to “deliver” a data resource to his/her computer at work to analyze later
  • Data providers can lower the cost of integration by supporting a common data retrieval interface that is well understood throughout the local enterprise and industry
Example Data Access & Delivery Scenario
1. Consumer makes data access request
2a. Retrieve Service returns requested information
2b. Retrieve Service forwards requested information to the Messaging Infrastructure
3a. Messaging Infrastructure routes requested information to service consumer
3b. Messaging Infrastructure routes requested information to service consumer receiver agent implementing a Callback Interface
[Figure: a consumer, Retrieve Service #1 backed by a DB, the Messaging Infrastructure, and a Callback Interface, with arrows numbered to match the steps above]
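The difference between the synchronous path (1, 2a) and the asynchronous path (1, 2b, 3b) can be sketched as follows. This is an illustration under stated assumptions: the class and method names (`RetrieveService.access`, `RetrieveService.deliver`, `MessagingInfrastructure.route_pending`), the endpoint name "work-pc", and the in-process queue standing in for real messaging middleware are all hypothetical.

```python
import queue
from typing import Callable

Callback = Callable[[bytes], None]


class MessagingInfrastructure:
    """Toy message router: queues payloads, then hands them to registered callbacks."""

    def __init__(self) -> None:
        self._pending = queue.Queue()
        self._receivers: dict[str, Callback] = {}

    def register(self, endpoint: str, callback: Callback) -> None:
        self._receivers[endpoint] = callback

    def send(self, endpoint: str, payload: bytes) -> None:
        self._pending.put((endpoint, payload))

    def route_pending(self) -> int:
        """Step 3b: route queued payloads to their callback endpoints."""
        delivered = 0
        while True:
            try:
                endpoint, payload = self._pending.get_nowait()
            except queue.Empty:
                return delivered
            self._receivers[endpoint](payload)
            delivered += 1


class RetrieveService:
    def __init__(self, store: dict[str, bytes], messaging: MessagingInfrastructure) -> None:
        self._store = store
        self._messaging = messaging

    def access(self, resource_id: str) -> bytes:
        """Step 2a: return the resource directly to the caller."""
        return self._store[resource_id]

    def deliver(self, resource_id: str, endpoint: str) -> None:
        """Step 2b: forward the resource to the messaging infrastructure."""
        self._messaging.send(endpoint, self._store[resource_id])


messaging = MessagingInfrastructure()
service = RetrieveService({"doc-1": b"site inspection report"}, messaging)

inbox: list[bytes] = []                      # the technician's computer at work
messaging.register("work-pc", inbox.append)

fetched = service.access("doc-1")            # synchronous "access"
service.deliver("doc-1", "work-pc")          # asynchronous "deliver"
routed = messaging.route_pending()
```

The caller of `deliver` gets nothing back; the payload arrives only when the messaging layer routes it, which is what lets the field technician hand off a resource and inspect it later.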
Major issues facing distributed information sharing
• Must support a number of interaction models
  • Request-response, subscribe-push, probe and match, authenticated and/or single use of data, etc.
• Must support a variety of metadata and content formats
  • Atom, Dublin Core, Images, Video, PDF, Open Document, etc.
• Different types of data lend themselves to being queried by different mechanisms
  • XML can be natively searched with XQuery
  • Images cannot be natively searched with XQuery
• Must be designed for controlled evolution
  • Do not want the addition of new features to alienate current users through constant upgrades or revisions
  • Discourage specification “lip service” by avoiding unbounded fields
Data Service Objectives
• Address the need to enable enterprise-wide data discovery and aggregation across any number of service implementations while offering end users relevant information
• Enable horizontal discovery, access, and consumption of data of relevance, regardless of physical location, data type, and/or technical implementation
• Support a variety of messaging patterns, security and policy requirements, and data needs
Profile-Based Approach to achieving Data Services
• Data Services specifications should focus on capturing the high-level process and use-case requirements (e.g. the need to search against metadata and content), rather than the low-level realizations of those features (e.g. XQuery vs. keyword search)
• Abstract Data Services interface focused on defining a high-level construct to capture intended behaviors that will be implemented by pluggable profiles
  • Inspired by token profiles within WS-Security
• Loosely coupled specification that enables service providers to add new capabilities without having to change the WSDL
• Enables service providers to implement only those profiles that satisfy their specific requirements
What are the profiles we need to consider?
• Context – What is the business context of the data service operation (search, retrieve)?
  • Ex. A set of taxonomy key-value pairs to search against a UDDI registry
• Metadata – What are the metadata formats that I would like to interact with?
  • Ex. Dublin Core Metadata Element Set, Atom 1.0, RSS
• Content – What are the content types that I would like to interact with?
  • Ex. PDF, Open Document, Open XML, JPEG, MPEG2
• Query – Given the type of metadata and/or content, how would I like to query for information?
  • Ex. Keyword search, XQuery request, SPARQL requests
The combination of different “profiles” can have measurable impact
Data Services Request:
  Metadata Profile: CriminalMetadata
  Query Profile: CriminalQL
    Find Where sex = “male” and race = “white” and height >= “5-09” and height <= “5-10”
  Content Profile: MugShotContent
  Query Profile: ImageMatch
• While “CriminalMetadata”, “MugShotContent”, “CriminalQL” and “ImageMatch” do not exist today, if they are introduced in the future they should not significantly alter the way we process requests for information
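One way to picture pluggable profiles is a registry that maps a profile name carried in the request to the code that implements it. The sketch below is an assumption-laden illustration, not the interface proposed in the slides: the request keys (`queryProfile`, `expression`), the profile names "Keyword" and "FieldEquals", and the registration decorator are all hypothetical. What it shows is that the request handler does not change when a new profile is introduced.

```python
from typing import Callable

Record = dict[str, str]
QueryProfile = Callable[[str, list[Record]], list[Record]]

# Hypothetical registry: profile name -> implementation.
QUERY_PROFILES: dict[str, QueryProfile] = {}


def query_profile(name: str) -> Callable[[QueryProfile], QueryProfile]:
    """Register a query profile under the name that requests will refer to."""
    def decorator(fn: QueryProfile) -> QueryProfile:
        QUERY_PROFILES[name] = fn
        return fn
    return decorator


@query_profile("Keyword")
def keyword_query(expression: str, records: list[Record]) -> list[Record]:
    words = expression.lower().split()
    return [r for r in records if all(w in r["title"].lower() for w in words)]


def handle_request(request: dict[str, str], records: list[Record]) -> list[Record]:
    """The abstract operation: dispatch on the profile named in the request."""
    name = request["queryProfile"]
    if name not in QUERY_PROFILES:
        raise ValueError(f"unsupported query profile: {name}")
    return QUERY_PROFILES[name](request["expression"], records)


RECORDS = [
    {"id": "1", "title": "Annual financial filing", "format": "pdf"},
    {"id": "2", "title": "Financial summary", "format": "xml"},
]

by_keyword = handle_request(
    {"queryProfile": "Keyword", "expression": "financial filing"}, RECORDS
)


# A profile introduced later; handle_request above is untouched.
@query_profile("FieldEquals")
def field_equals_query(expression: str, records: list[Record]) -> list[Record]:
    field, _, value = expression.partition("=")
    return [r for r in records if r.get(field.strip()) == value.strip()]


by_field = handle_request(
    {"queryProfile": "FieldEquals", "expression": "format = xml"}, RECORDS
)
```

A provider that supports only keyword search simply never registers the second profile, and a request naming it is rejected rather than silently misinterpreted.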
Encouraging collaboration with REST and/or SOAP
• SOAP is a protocol specification that defines a uniform way of passing XML-encoded data that abstracts the physical transport layer
• Representational State Transfer (REST) is a set of architectural principles that loosely describes any simple interface that uses XML over HTTP without an additional messaging layer such as SOAP
• SOAP and REST are two different approaches that serve different needs
• In many areas the provided functionality overlaps and causes a bit of contention
• The two approaches, if used properly, can be complementary and will help to meet the overall data services needs
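To make the contrast concrete, the sketch below expresses the same keyword search two ways: wrapped in a SOAP 1.1 envelope, and as a plain HTTP URL with a query string. Only the SOAP 1.1 envelope namespace is standard; the service namespace `urn:example:data-services`, the `Search` and `Keywords` element names, the `q` parameter, and the `example.org` URL are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:data-services"                    # hypothetical service namespace


def soap_search_request(keywords: str) -> str:
    """Wrap a search in a SOAP envelope; the transport is left unspecified."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    search = ET.SubElement(body, f"{{{SVC_NS}}}Search")
    ET.SubElement(search, f"{{{SVC_NS}}}Keywords").text = keywords
    return ET.tostring(envelope, encoding="unicode")


def rest_search_url(base_url: str, keywords: str) -> str:
    """Express the same search as a URL to be fetched with an HTTP GET."""
    return f"{base_url}?{urlencode({'q': keywords})}"


soap_message = soap_search_request("financial filings")
rest_url = rest_search_url("http://example.org/search", "financial filings")
print(soap_message)
print(rest_url)
```

The envelope leaves room for headers that carry security or routing information, while the URL form can be bookmarked, cached, and tried from a browser, which is roughly the trade-off the bullets describe.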
RESTful feeds may be appropriate for disparate content subscriptions Source: RSS--Promising Technology for Building Customer Relationships (http://www.mediathink.com/rss/rss_marketers2.asp)
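As an illustration of a feed-based subscription, the sketch below reads entries from an Atom 1.0 document using only the Python standard library. The feed content is made up for the example, and a real consumer would fetch the document over HTTP on a schedule rather than read it from a string. Each entry exposes the same essential metadata (id, title, summary) that Data Discovery is described as returning.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"   # Atom 1.0 namespace

# Made-up feed content standing in for the response to an HTTP GET.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example filings feed</title>
  <entry>
    <id>urn:example:filing:1</id>
    <title>Q1 filing</title>
    <summary>First quarter results</summary>
  </entry>
  <entry>
    <id>urn:example:filing:2</id>
    <title>Q2 filing</title>
    <summary>Second quarter results</summary>
  </entry>
</feed>"""


def read_entries(feed_xml: str) -> list[dict[str, str]]:
    """Extract id, title, and summary from every entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "id": entry.findtext(f"{ATOM}id", default=""),
            "title": entry.findtext(f"{ATOM}title", default=""),
            "summary": entry.findtext(f"{ATOM}summary", default=""),
        }
        for entry in root.findall(f"{ATOM}entry")
    ]


entries = read_entries(FEED)
print([e["title"] for e in entries])
```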
SOAP-based messages are better suited for complex requests and messaging patterns
[Figure: a Subscribe Service, Retrieve Services #1 to #4 in front of DB, video, image, and XML sources, and a Callback Interface; arrows labeled Subscribe, Scheduled Pull, and Notify]
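The subscribe, scheduled pull, and notify pattern named in the figure can be sketched as below. This is one plausible reading of the figure, not a description of an actual service: the `SubscribeService` class, its methods, the topic labels, and the sample items are hypothetical, and the schedule is simulated by calling `run_scheduled_pull` by hand rather than from a timer.

```python
from typing import Callable

Item = dict[str, str]
RetrieveService = Callable[[], list[Item]]
Callback = Callable[[Item], None]


class SubscribeService:
    """Pulls from retrieve services and notifies subscribers about new items."""

    def __init__(self, retrieve_services: list[RetrieveService]) -> None:
        self._retrieve_services = retrieve_services
        self._subscriptions: list[tuple[str, Callback]] = []
        self._seen_ids: set[str] = set()

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscriptions.append((topic, callback))

    def run_scheduled_pull(self) -> int:
        """One pull cycle; returns how many notifications were sent."""
        notified = 0
        for retrieve in self._retrieve_services:
            for item in retrieve():
                if item["id"] in self._seen_ids:
                    continue                      # already handled in an earlier cycle
                self._seen_ids.add(item["id"])
                for topic, callback in self._subscriptions:
                    if topic == item["topic"]:
                        callback(item)            # notify via the callback interface
                        notified += 1
        return notified


video_items = [{"id": "v1", "topic": "training", "title": "Safety video"}]
image_items = [{"id": "i1", "topic": "maps", "title": "Site photo"}]

service = SubscribeService([lambda: list(video_items), lambda: list(image_items)])
received: list[Item] = []
service.subscribe("training", received.append)

first = service.run_scheduled_pull()
video_items.append({"id": "v2", "topic": "training", "title": "Onboarding video"})
second = service.run_scheduled_pull()
```

The subscriber is notified once per new item on its topic, and items it did not subscribe to are pulled but never delivered.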
Supporting standards that may help to advance Data Services initiatives
• There is no existing set of standards that fully supports the functionality of a complete Data Services solution
OASIS Data Services Framework Technical Committee (OASIS DSF TC)
• Goals and objectives for the TC include:
  • Collect, analyze and document the requirements for data management and sharing in a networked environment where data services lie under different domains of ownership and stewardship
  • Aid architects in understanding the conceptual patterns of interaction pertaining to data oriented operations
  • Create an abstract specification normatively describing a framework of operations to manage and retrieve data in a services environment, across ownership and stewardship boundaries
  • Describe service patterns and interactions between a provider, consumer, and other resources and entities
OASIS Data Services Framework Technical Committee (OASIS DSF TC)
• Out of Scope Items:
  • Define a mapping of the functions and elements described in the specifications to any programming language, to any particular messaging middleware, or to specific network transports
  • Define new key query algorithms, metadata specifications, or content specifications
  • Define concepts or renderings for functions that are of wider applicability including but not limited to:
    • Addressing
    • Query frameworks
    • Routing
    • Reliable message exchange
Summary
• The need for a distributed discovery, aggregation, and access mechanism is becoming more and more important
• Any Data Services solution must account for a growing number of metadata specifications, content formats, and query mechanisms
• WS-Security demonstrates that a profile-based solution can meet the diverse needs of a community
• OASIS Data Services Framework TC will identify and fill the gaps to achieve a complete Data Services solution