
Amit Sheth CTO/SrVP, Voquette (voquette) [formerly Founder/CEO, Taalee, taalee]

Amit Sheth CTO/SrVP, Voquette (www.voquette.com) [formerly Founder/CEO, Taalee, www.taalee.com] Director, Large Scale Distributed Information Systems Lab, University Of Georgia (lsdis.cs.uga.edu) amit@sheth.org.


Presentation Transcript


  1. Amit Sheth CTO/SrVP, Voquette (www.voquette.com) [formerly Founder/CEO, Taalee, www.taalee.com] Director, Large Scale Distributed Information Systems Lab, University Of Georgia (lsdis.cs.uga.edu) amit@sheth.org Content Management, Metadata & Semantic Web Keynote Address, Net.ObjectDAYS 2001, Erfurt, Germany, September 11, 2001 Metadata Extraction is a patent-pending technology of Taalee, Inc. Semantic Engine and WorldModel are trademarks of Taalee, Inc.

  2. Enterprise Content Management – sample user requirements (from a large Financial Svcs Company) • “If a new bond comes into inventory, then we should get a message, an alert...and be able to refine to say that I only have California, Oregon and Washington clients...." • “In the month of July, I received 95 e-mails from my subscriptions. These e-mails included 61 that had 143 attachments that had 67 more attachments. In total therefore, I received almost 400 documents including 5 different types (HTML, PDF, Word, Rich Media, …). Even with this volume, I had subscribed to only 10 categories in the Equities area. There are a total of 26 Equity Subscription areas and a total of 166 categories to which a user can subscribe across all Product Areas.” Professional users of a traditional Content Management Product/Solution
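The first requirement amounts to an event-condition-action rule. The event is a new bond entering inventory, the condition is the user's refinement (here, client states), and the action is the alert. The sketch below shows only that rule shape. The `Bond` and `Subscription` classes, their fields, and the idea that a bond carries a single `state` (as a municipal bond might) are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass(frozen=True)
class Bond:
    bond_id: str
    state: str  # hypothetical: issuing state, e.g. for a municipal bond

@dataclass(frozen=True)
class Subscription:
    user: str
    # None means "alert me on every new bond"; otherwise only bonds from these states.
    states: Optional[FrozenSet[str]] = None

def on_new_bond(bond: Bond, subscriptions: List[Subscription]) -> List[str]:
    """Event: a bond enters inventory. Condition: the user's state refinement.
    Action: return the users who should receive an alert."""
    return [
        sub.user
        for sub in subscriptions
        if sub.states is None or bond.state in sub.states
    ]

subs = [
    Subscription("west_coast_fa", frozenset({"CA", "OR", "WA"})),
    Subscription("all_bonds_fa"),
]
print(on_new_bond(Bond("B-001", "OR"), subs))  # ['west_coast_fa', 'all_bonds_fa']
print(on_new_bond(Bond("B-002", "TX"), subs))  # ['all_bonds_fa']
```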

  3. Enterprise Content Management – sample user requirements (from a large Financial Svcs Company) • The real question is, "Which sales ideas may have significant relevance to my book of business?" For example, an earnings warning on an equity rated Hold or Lower and not owned by any of my clients may not be of high relevance to me. Ideally, a relevance analysis would: • Greatly reduce the volume of Product Area Ideas sent to every FA, hopefully to perhaps 10% to 20% or less of today's volume with ideas that are potentially actionable for that FA and his/her client • Result in FAs reading and evaluating the Product Area Ideas, taking appropriate actions, and generating sales because the Product Area Ideas would be relevant • Result in customer satisfaction because clients would understand FAs are paying attention to their needs and developing focused ideas Professional users of a traditional Content Management Product/Solution
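The example relevance judgment in this quote can be written as a filter against a financial advisor's (FA's) book of business. The rule is to drop an earnings warning on an equity rated Hold or lower that no client owns. This is a minimal sketch under stated assumptions. The rating scale, the `Idea` fields, and the tickers are hypothetical, and a real relevance engine would weigh many more signals than this single rule.

```python
from dataclasses import dataclass
from typing import List, Set

# Assumed rating scale, lowest to highest; the quote only names "Hold or Lower".
RATING_RANK = {"Sell": 0, "Underperform": 1, "Hold": 2, "Buy": 3, "Strong Buy": 4}

@dataclass(frozen=True)
class Idea:
    ticker: str
    kind: str    # e.g. "earnings warning"
    rating: str  # analyst rating of the equity

def is_relevant(idea: Idea, book: Set[str]) -> bool:
    """Drop an earnings warning on an equity rated Hold or lower that no client owns."""
    low_rated = RATING_RANK[idea.rating] <= RATING_RANK["Hold"]
    owned = idea.ticker in book
    return not (idea.kind == "earnings warning" and low_rated and not owned)

def filter_ideas(ideas: List[Idea], book: Set[str]) -> List[Idea]:
    return [idea for idea in ideas if is_relevant(idea, book)]

ideas = [
    Idea("AAA", "earnings warning", "Hold"),  # not owned, low rated: dropped
    Idea("BBB", "earnings warning", "Hold"),  # owned by a client: kept
    Idea("CCC", "earnings warning", "Buy"),   # rated above Hold: kept
    Idea("DDD", "upgrade", "Sell"),           # not an earnings warning: kept
]
book = {"BBB"}  # hypothetical tickers held by this FA's clients
print([i.ticker for i in filter_ideas(ideas, book)])  # ['BBB', 'CCC', 'DDD']
```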

  4. Enterprise Content Management – sample product requirements (from a large Financial Svcs Company) • “Content generation is a more complex and probably costly problem to solve ... we reportedly create about 9 million messages a month for field delivery. On average, this would mean 1,000 messages per month per ‘big user’ or perhaps only 500 to 600 per ‘little user’.…I strongly believe an analysis is in order of the nature and necessity of generated content, the establishment of content generation standards, the movement towards development and implementation of a relevance engine, …” Director (Product Management) of a large company that uses a leading Content Management Product

  5. New Enterprise Content Management Challenges • More variety and complexity • More formats (MPEG, PDF, MS Office, WM, Real, AVI, etc.) • More types (Docs, Images -> Audio, Video, Variety of text-structured, unstructured) • More sources (internal, extranet, internet, feeds) • Information Overload • Too much data, precious little information (Relevance) • Creating Value from Content • How to distribute the right content to the right people as needed? (Personalization -- book of business) • Customized delivery for different consumption options (mobile/desktop, devices) • Insight, Decision Making (Actionable)

  6. New Enterprise Content Management Technical Challenges • Aggregation • Feed handlers/Agents that understand content representation and media semantics • Push-pull, Web-DB-Files, Structured-Semi-structured-Unstructured data of different types • Homogenization and Enhancement • Enterprise-wide common view • Domain model, taxonomy/classification, metadata standards • Semantic Metadata – created automatically if possible • Semantic Applications • Search, personalization, directory, alerts, etc. using metadata and semantics (semantic association and correlation), for improved relevance, intelligent personalization, customization
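Homogenization into an enterprise-wide common view can be pictured as mapping each feed's own tags onto one shared schema and normalizing values such as dates along the way. This is a minimal sketch. The feed names, tag names, date formats, and target field names are all hypothetical; a real system would drive these mappings from the domain model and metadata standards mentioned above.

```python
from datetime import datetime
from typing import Dict

# Hypothetical per-feed mappings from each provider's tags to one common view.
TAG_MAP: Dict[str, Dict[str, str]] = {
    "feed_a": {"headline": "title", "pubdate": "posted_date", "src": "source"},
    "feed_b": {"Title": "title", "Date": "posted_date", "Provider": "source"},
}
# Each feed writes dates differently; normalize them to ISO 8601.
DATE_FORMATS = {"feed_a": "%Y-%m-%d", "feed_b": "%m/%d/%Y"}

def homogenize(feed: str, record: Dict[str, str]) -> Dict[str, object]:
    """Rename a feed record's tags to the common schema and normalize its date."""
    mapping = TAG_MAP[feed]
    common: Dict[str, object] = {}
    extras: Dict[str, str] = {}
    for tag, value in record.items():
        if tag in mapping:
            common[mapping[tag]] = value
        else:
            extras[tag] = value  # keep unmapped tags rather than silently drop them
    if "posted_date" in common:
        parsed = datetime.strptime(str(common["posted_date"]), DATE_FORMATS[feed])
        common["posted_date"] = parsed.date().isoformat()
    common["extras"] = extras
    return common

print(homogenize("feed_b", {"Title": "Q3 results", "Date": "11/20/2000",
                            "Provider": "iSyndicate", "Region": "EU"}))
```

Keeping unmapped tags in `extras` is one design choice. It lets later enhancement steps use them instead of losing information at ingestion time.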

  7. Semantics • “meaning or relationship of meanings, or relating to meaning” (Webster) • is concerned with the relationship between the linguistic symbols and their meaning or real-world objects • meaning and use of data (Information System) Example: Palm -> Company, Product, Technology, Tree Name, part of location (Palm Springs, Palm Beach) [Diagram layers: Semantics, Ontologies (Domain Models), Metamodels, Metadata, Content/Data]
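The "Palm" example is a word-sense problem: the same symbol maps to different real-world objects depending on context. One simple technique is to score each sense by how many of its cue words appear near the term. It is a crude relative of dictionary-overlap methods and is not the technique any particular product used. The cue-word lists below are hypothetical and purely illustrative.

```python
import re
from typing import Optional

# Hypothetical cue words for each sense of "Palm" listed on the slide.
SENSES = {
    "company": {"shares", "stock", "ceo", "earnings", "inc"},
    "product": {"handheld", "pda", "stylus", "device"},
    "technology": {"os", "operating", "software", "platform"},
    "tree": {"leaves", "tropical", "coconut", "frond"},
    "location": {"springs", "beach", "florida", "resort"},
}

def disambiguate(text: str) -> Optional[str]:
    """Pick the sense whose cue words overlap most with the surrounding text.

    Returns None when no cue word appears; ties go to the first sense listed.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(cues & words) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(disambiguate("Palm shares fell after the earnings report"))  # company
print(disambiguate("A resort in Palm Beach, Florida"))              # location
```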

  8. Semantics: The Next Step in the Web’s Evolution • “The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what the data means to process it. . . . Imagine what computers can understand when there is a vast tangle of interconnected terms and data that can automatically be followed.” (Tim Berners-Lee, Weaving the Web, 1999) • A Content Management-centric definition of Semantic Web: The concept that Web-accessible content can be organized and utilized semantically, rather than through syntactic and structural methods.

  9. Organizing Content Different and Related Objectives: Search, Browse, Summarization, Association/Relationships • Indexing • Clustering • Classification • Controlled Vocabulary, Reference Data/ Dictionary/Thesaurus • Metadata • Knowledge Base (Entities/Objects and Relationships)

  10. Traditional Text Categorization [Diagram: Customer Article Feed 4715 arrives with standard metadata (Feed Source: iSyndicate; Posted Date: 11/20/2000). Statistical/AI Techniques trained on a Customer Training Set classify the article and place it in a taxonomy (the Classification of Article 4715), which then drives Routing/Distribution.] Most traditional Content Management Products support categorization of unstructured content.
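The statistical categorization step can be illustrated with a multinomial naive Bayes classifier using add-one (Laplace) smoothing. This is a minimal sketch of one common statistical technique, not the method of any particular Content Management product. The two categories and four training sentences are hypothetical stand-ins for a customer training set.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents per category
        self.word_counts = defaultdict(Counter)  # word counts per category
        self.vocab = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                # Smoothed log likelihood; unseen words get count 0 + 1.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

# Hypothetical training set.
TRAINING = [
    ("quarterly earnings beat analyst estimates", "Earnings"),
    ("company reports record earnings and revenue", "Earnings"),
    ("shares begin trading after initial public offering", "IPOs"),
    ("ipo priced above range on first day of trading", "IPOs"),
]
nb = NaiveBayes()
nb.train(TRAINING)
print(nb.classify("earnings estimates raised"))    # Earnings
print(nb.classify("shares trading on first day"))  # IPOs
```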

  11. Voquette/Taalee’s Categorization & Automatic Metadata Creation [Diagram: Article Feed 4715 arrives with a standard metadata feed (Feed Source: iSyndicate; Posted Date: 11/20/2000). The Semantic Engine™ combines Knowledge-base & Statistical/AI Techniques, using the Taalee Training Set & KB and a Customer Training Set & KB, to classify the article, place it in a taxonomy (or map it to another taxonomy), and perform Automated Content Enrichment (ACE). The resulting semantic metadata adds Company Name: France Telecom, Equant; Ticker Symbol: FTE, ENT; Exchange: NYSE; Topic: Company News. The Classification of Article 4715 includes FTE and ENT (each with Company Analysis, Conference Calls, Earnings, Stock Analysis) and NYSE (Member Companies, Market News, IPOs). The catalog metadata drives Routing/Distribution and Precise Personalization/Syndication/Filtering.]
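The enrichment step differs from plain categorization because it adds facts that come from a knowledge base rather than from the article text. The company names, ticker symbols, and exchange below are taken from the slide. The KB layout, the field names, and the plain substring matching are simplifying assumptions; a production extractor would use far more robust entity recognition.

```python
from typing import Dict

# Hypothetical knowledge-base entries, using the values shown on the slide.
KB: Dict[str, Dict[str, str]] = {
    "France Telecom": {"ticker": "FTE", "exchange": "NYSE"},
    "Equant": {"ticker": "ENT", "exchange": "NYSE"},
}

def enrich(text: str, standard_metadata: Dict[str, str]) -> Dict[str, object]:
    """Add semantic metadata for every KB company named in the text.

    Plain substring matching stands in for a real entity extractor here.
    """
    meta: Dict[str, object] = dict(standard_metadata)
    found = [name for name in KB if name in text]
    if found:
        meta["company_names"] = found
        meta["ticker_symbols"] = [KB[name]["ticker"] for name in found]
        meta["exchanges"] = sorted({KB[name]["exchange"] for name in found})
    return meta

print(enrich("Sample text that mentions France Telecom and Equant.",
             {"feed_source": "iSyndicate", "posted_date": "11/20/2000"}))
```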

  12. Technologies for Organizing Content • Information Retrieval/Document Indexing • TF-IDF/statistical, Clustering, LSI • Statistical learning/AI: Machine learning, Bayesian, Markov Chains, Neural Network • Lexical, Natural language • Thesaurus, Reference data, Domain models (Ontology) • Information Extractors • Reasoning/Inferencing: Logic based, Knowledge-based, Rule processing. The most powerful solutions combine several of these, addressing more of the objectives.
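TF-IDF, the first statistical technique listed, weights a term higher when it is frequent in one document but rare across the collection. The sketch uses one common variant, tf = count / document length and idf = log(N / df). Other weighting variants exist. The three sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """Weight each term by term frequency times inverse document frequency.

    tf = count / doc length; idf = log(N / df). A term in every document gets 0.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df: Counter = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors

docs = ["cisco earnings rise", "lucent earnings fall", "cisco router launch"]
for vec in tf_idf(docs):
    print({t: round(w, 3) for t, w in vec.items()})
```

Here "rise" outweighs "earnings" in the first document because "earnings" also appears in a second document.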

  13. Ontology • Standardizes meaning, description, representation of involved concepts/terms/attributes • Captures the semantics involved via domain characteristics, resulting in semantic metadata • “Ontological Commitment” forms basis for knowledge sharing and reuse Ontology provides semantic underpinning.

  14. An Ontology [Diagram: Terms/Concepts (Attributes): site, latitude, longitude, eventDate, description, damage, damagePhoto, numberOfDeaths, magnitude, bodyWaveMagnitude, explosiveYield, conductedBy. Functional Dependencies (FDs): site => latitude, longitude. Hierarchies: Disaster has subclasses Natural Disaster (Volcano, Earthquake) and Man-made Disaster (NuclearTest). Domain Rules: magnitude > 0, magnitude < 10; bodyWaveMagnitude > 0, bodyWaveMagnitude < 10.]
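The three kinds of knowledge in this ontology, hierarchies, functional dependencies, and domain rules, can each be checked mechanically. This is a minimal sketch using the class names, the FD, and the value ranges from the slide. The dictionary encoding, the function names, and the sample instances are hypothetical. Attributes are not assigned to specific subclasses here because the diagram does not make that assignment recoverable.

```python
from typing import Dict, List, Optional

# Class hierarchy from the slide: child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "Disaster": None,
    "Natural Disaster": "Disaster",
    "Man-made Disaster": "Disaster",
    "Volcano": "Natural Disaster",
    "Earthquake": "Natural Disaster",
    "NuclearTest": "Man-made Disaster",
}

# Domain rules from the slide, as open range checks on attribute values.
RULES = {
    "magnitude": lambda v: 0 < v < 10,
    "bodyWaveMagnitude": lambda v: 0 < v < 10,
}

def is_a(cls: Optional[str], ancestor: str) -> bool:
    """Walk up the hierarchy to test whether cls is a kind of ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False

def rule_violations(instance: Dict[str, float]) -> List[str]:
    """Names of attributes whose values break a domain rule."""
    return [a for a, ok in RULES.items() if a in instance and not ok(instance[a])]

def fd_violations(instances: List[Dict]) -> List[str]:
    """Sites that break the functional dependency site => latitude, longitude."""
    first_seen: Dict[str, tuple] = {}
    bad = set()
    for inst in instances:
        coords = (inst["latitude"], inst["longitude"])
        if first_seen.setdefault(inst["site"], coords) != coords:
            bad.add(inst["site"])
    return sorted(bad)

print(is_a("Earthquake", "Disaster"))        # True
print(rule_violations({"magnitude": 12.0}))  # ['magnitude']
```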

  15. Controlled Vocabularies/ Classifications/Taxonomies/Ontologies • WordNet • Cyc • The Medical Subject Headings (MeSH): NLM's controlled vocabulary used for indexing articles, for cataloging books and other holdings, and for searching MeSH-indexed databases, including MEDLINE. MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts. Year 2000 MeSH includes more than 19,000 main headings, 110,000 Supplementary Concept Records (formerly Supplementary Chemical Records), and an entry vocabulary of over 300,000 terms.
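The point about MeSH providing "a consistent way to retrieve information that may use different terminology for the same concepts" comes down to an entry vocabulary. Many surface terms map to one main heading, and documents are indexed by that heading. The sketch shows the mechanism only. The few term pairs are illustrative examples, not an extract of the MeSH files, and the function names are hypothetical.

```python
from typing import Dict, List

# Illustrative entry vocabulary: several surface terms map to one main heading.
# These pairs are examples only, not an extract of the MeSH files.
ENTRY_TERMS: Dict[str, str] = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "tumor": "Neoplasms",
    "cancer": "Neoplasms",
}

def main_heading(term: str) -> str:
    """Map an entry term to its main heading; unknown terms pass through unchanged."""
    return ENTRY_TERMS.get(term.lower(), term)

def search(query: str, index: Dict[str, List[str]]) -> List[str]:
    """Documents are indexed by main heading, so any synonym retrieves them."""
    return index.get(main_heading(query), [])

index = {"Myocardial Infarction": ["doc-1", "doc-2"]}  # hypothetical index
print(search("Heart Attack", index))  # ['doc-1', 'doc-2']
```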

  16. Open Directory Project (ODP): Classification/Taxonomy & Directory

  17. Example 1 – Snapshots (“Jamal Anderson”) • Search for ‘Jamal Anderson’ in ‘Football’. • Click on the first result for Jamal Anderson. • View the original source HTML page. Verify that the source page contains no mention of Team name and League name; they were Taalee’s value-additions to the metadata to facilitate easier search. • View metadata. Note that Team name and League name are also included in the metadata.

  18. Example 2 – Snapshots (“Gary Sheffield”) • Search for ‘Gary Sheffield’ in ‘Baseball’. • Click on the first result for Gary Sheffield. • View the original source HTML page. Verify that the source page contains no mention of Team name and League name; they were Taalee’s value-additions to the metadata to facilitate easier search. • View metadata. Note that Team name and League name are also included in the metadata.
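Both demos rest on the same idea: a search restricted to a category matches on metadata, and the metadata carries facts the source page never states. The sketch below shows how that value-addition changes search results. The KB entry, the document fields, and both function names are hypothetical and illustrative. It is not Taalee's implementation.

```python
from typing import Dict, List

# Illustrative knowledge-base entry supplying team and league.
KB: Dict[str, Dict[str, str]] = {
    "Jamal Anderson": {"team": "Atlanta Falcons", "league": "NFL"},
}

def enrich(doc: Dict[str, str]) -> Dict[str, str]:
    """Copy the doc's metadata and add team/league from the KB when the player is known."""
    meta = dict(doc)
    meta.update(KB.get(doc.get("player", ""), {}))
    return meta

def search(index: List[Dict[str, str]], query: str, category: str) -> List[str]:
    """Match the query against every metadata field, but only within one category."""
    q = query.lower()
    return [
        d["id"]
        for d in index
        if d["category"] == category and any(q in v.lower() for v in d.values())
    ]

docs = [{"id": "page-1", "category": "Football", "player": "Jamal Anderson",
         "text": "hypothetical recap that names only the player"}]
print(search(docs, "Atlanta Falcons", "Football"))                       # []
print(search([enrich(d) for d in docs], "Atlanta Falcons", "Football"))  # ['page-1']
```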

  19. Intelligent Content = What You Asked for + What you need to know! Semantic Web – Intelligent Content (supported by Taalee Semantic Engine) [Diagram: a COMPANY linked to Related Stock News; Competition (COMPANIES in INDUSTRY with Competing PRODUCTS); COMPANIES in Same or Related INDUSTRY; Regulations (SEC, EPA) and Technology Products Impacting INDUSTRY or Filed By COMPANY; Industry News Important to INDUSTRY or COMPANY.]

  20. Semantic Application – Equity Dashboard • Automatic 3rd party content integration • Focused relevant content organized by topic (semantic categorization) • Related news not specifically asked for (Semantic Associations) • Automatic Content Aggregation from multiple content providers and feeds • Competitive research inferred automatically

  21. Metadata-centric Content Management Architecture [Diagram. Components: Internal Source 1 (Research), Internal Source 2, and External feeds/Web (e.g. Reuters), served by Extractor Agents 1, 2 and 3; the Semantic Engine; an ASP/Enterprise-hosted World Model and Knowledge Base; the Voquette Metabase; the Semantic Application; and Third-party Content Mgmt and Syndication, fed with XCM-compliant metadata, XML or other format. Flow: (1) A story on Cisco from Source 1 is passed on to add semantic associations. (2) The Semantic Engine consults the Knowledge Base for Cisco’s competition. (3) The Knowledge Base returns the result: Lucent is a competitor of Cisco. (4) A story on Lucent from external feeds is picked for publishing as “semantically related” to the Cisco story and passed on to the Dashboard.]
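Steps 2 to 4 of this flow can be sketched as a lookup in a small fact base followed by a filter over incoming stories. The single fact "Lucent is a competitor of Cisco" comes from the slide. Everything else is a hypothetical assumption: the triple encoding, treating competition as symmetric, the story dictionaries, and the `ExampleCo` placeholder.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge base of (subject, relation, object) facts;
# the one fact here is the example from the slide.
KB_FACTS: Set[Tuple[str, str, str]] = {("Lucent", "competitor_of", "Cisco")}

def competitors(company: str) -> Set[str]:
    """Steps 2-3: consult the KB. Competition is treated as symmetric (an assumption)."""
    result = set()
    for subj, rel, obj in KB_FACTS:
        if rel != "competitor_of":
            continue
        if obj == company:
            result.add(subj)
        elif subj == company:
            result.add(obj)
    return result

def semantically_related(story: Dict[str, str], feed: List[Dict[str, str]]) -> List[str]:
    """Step 4: pick feed stories about a competitor of the story's company."""
    rivals = competitors(story["company"])
    return [s["id"] for s in feed if s["company"] in rivals]

cisco_story = {"id": "cisco-1", "company": "Cisco"}
external_feed = [
    {"id": "reuters-7", "company": "Lucent"},
    {"id": "reuters-8", "company": "ExampleCo"},
]
print(semantically_related(cisco_story, external_feed))  # ['reuters-7']
```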

  22. Semantic Technology Features • Unstructured Text Content • Semi-Structured Content • Structured Content • Audio/Video Content with associated text (transcript, journalist notes) • Create a Customized "World Model" (Taxonomy Tree with customized domain attributes) • Automatically homogenize content feed tags • Automatically categorize unstructured text • Automatically create tags based on text itself • Create and maintain a Customized Knowledge Base for any domain • Automatically enhance content tags based on information beyond text • Build contextually relevant custom research applications • Contextual Search (an order of magnitude better than keyword-based search) • Support push or pull delivery/ingestion of content • Personalization/Alerts/Notifications • Real Time Indexing (stories indexed for search/personalization within a minute) • Provide the user with relevant information not explicitly asked for (Semantic Associations)

  23. Along with the evolution of metadata and semantic technologies enabling the next generation of the Web, Content Management has entered the next generation of Enhanced Content Management.

  24. Resources/References • RDF: www.w3.org/TR/REC-rdf-syntax/ • ICE: www.icestandard.org • Meta Object Facility (MOF) Specification, Version 1.3, September 27, 1999: http://cgi.omg.org/cgi-bin/doc?ad/99-09-05 • XML Metadata Interchange (XMI) Specification, Version 1.1, October 25, 1999: http://cgi.omg.org/cgi-bin/doc?ad/99-10-02, http://cgi.omg.org/cgi-bin/doc?ad/99-10-03 • DAML: www.daml.org • NEWSML: newsshowcase.reuters.com • PRISM: www.prismstandard.org/techdev/prismspec1.asp • RIXML: www.rixml.org • XCM: www.vignette.com • OIL: www.ontoknowledge.org/oil • SEMANTICWEB: www.semanticweb.org, business.semanticweb.org • VOICEXML: www.voicexml.org • MPEG7: www.darmstadt.gmd.de/mobile/MPEG7/ • Taalee: www.taalee.com • Applied Semantics: www.appliedsemantics.com • Ontoprise: www.ontoprise.com

  25. Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media, Amit Sheth & Wolfgang Klas, Eds., McGraw Hill, ISBN: 0-07-057735-8, 1998. • Information Brokering, Vipul Kashyap & Amit Sheth, Kluwer Academic Publishers, 2001. • Voquette Semantic Technology White Paper. • Mysteries of Metadata, Speaker – Amit Sheth, Workshop at Content World 2001. • Infoquilt Project, LSDIS lab. http://www.taalee.com http://lsdis.cs.uga.edu/~amit
