Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002)

Based on Keynote CONTENT- AND SEMANTIC-BASED INFORMATION RETRIEVAL @ SCI 2002. Amit Sheth, CTO, Semagix Inc.; Large Scale Distributed Information Systems (LSDIS) Lab


Presentation Transcript


  1. Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002). Based on Keynote CONTENT- AND SEMANTIC-BASED INFORMATION RETRIEVAL @ SCI 2002. Amit Sheth, CTO, Semagix Inc.; Large Scale Distributed Information Systems (LSDIS) Lab, University of Georgia; http://lsdis.cs.uga.edu. October 24, 2002. © Amit Sheth

  2. I am not selling any product here. It is interesting to note that SW = Software has moved to SW = Semantic Web.

  3. Fundamental Issue
     • Ontology Creation and maintenance
     • Human consensus + automatic KB (assertion) extraction
     • Automatic Semantic Annotation
     • Extremely fast computations exploiting semantic metadata
     • Especially named relationships

  4. Central Role of Metadata
     Stages: Produce, Aggregate, Catalog/Index, Integrate, Syndicate, Personalize, Interactive Marketing
     Questions: Where is the content? Whose is it? What is this content about? What other content is it related to? What is the right content for this user? What is the best way to monetize this interaction?
     Channels: Broadcast, Wireline, Wireless, Interactive TV
     Other labels: Semantic Metadata, Back End Applications
     "A Web content repository without metadata is like a library without an index." - Jack Jia, IWOV
     “Metadata increases content value in each step of content value chain.” - Amit Sheth

  5. A Metadata Classification (More Semantics for Relevance to tackle Information Overload!!)
     • User
     • Ontologies, Classifications, Domain Models
     • Domain Specific Metadata: area, population (Census), land-cover, relief (GIS), metadata concept descriptions from ontologies
     • Domain Independent (structural) Metadata: C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...
     • Direct Content Based Metadata: inverted lists, document vectors, LSI
     • Content Dependent Metadata: size, max colors, rows, columns...
     • Content Independent Metadata: creation-date, location, type-of-sensor...
     • Data (Heterogeneous Types/Media)
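The classification on this slide can be read as an ordered scale. A minimal Python sketch follows, assuming the layers run from raw data up to domain-specific metadata in increasing semantic richness (that ordering is an interpretation of the "More Semantics for Relevance" label, and the names `MetadataLevel`, `EXAMPLES`, and `most_semantic` are hypothetical). The example attributes are the ones named on the slide.

```python
from enum import IntEnum


class MetadataLevel(IntEnum):
    """Metadata layers from the slide; higher value = assumed richer semantics."""
    DATA = 0
    CONTENT_INDEPENDENT = 1
    CONTENT_DEPENDENT = 2
    DIRECT_CONTENT_BASED = 3
    DOMAIN_INDEPENDENT = 4
    DOMAIN_SPECIFIC = 5


# Example attributes taken from the slide, tagged with their layer.
EXAMPLES = {
    "creation-date": MetadataLevel.CONTENT_INDEPENDENT,
    "type-of-sensor": MetadataLevel.CONTENT_INDEPENDENT,
    "size": MetadataLevel.CONTENT_DEPENDENT,
    "max colors": MetadataLevel.CONTENT_DEPENDENT,
    "document vectors": MetadataLevel.DIRECT_CONTENT_BASED,
    "HTML/SGML Document Type Definitions": MetadataLevel.DOMAIN_INDEPENDENT,
    "population": MetadataLevel.DOMAIN_SPECIFIC,
    "land-cover": MetadataLevel.DOMAIN_SPECIFIC,
}


def most_semantic(attribute_names):
    """Return the attribute whose layer sits highest on the assumed scale."""
    return max(attribute_names, key=lambda name: EXAMPLES[name])


if __name__ == "__main__":
    print(most_semantic(["creation-date", "size", "population"]))
```

An `IntEnum` is used only so that layers can be compared directly; nothing in the talk prescribes this representation.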

  6. Semantic Metadata Extraction, Semantic Annotation
     Sources: WWW, Enterprise Repositories; Nexis, UPI, AP; Digital Videos; Data Stores; Feeds/Documents; Digital Maps; Digital Images; Digital Audios; ...
     Key challenge: Create/extract as much (semantic) metadata automatically as possible
     EXTRACTORS, METADATA

  7. Semantic Content Organization and Retrieval Engine (SCORE) technology
     • Automatically aggregates and extracts information from disparate sources and multiple formats
     • Automatically tags/annotates and categorizes content
     • Automatically creates relevant associations
     • Maps content topics and their relationships
     • Semantic query engine relates information and knowledge both internal and external to the organization into a single view

  8. Semagix Freedom Product Components

  9. Knowledge Sources Used
     JIVA
     Sources: Office of Foreign Assets Control (OFAC), Capital Advantage (CA), Federal Bureau of Investigation (FBI), The Interdisciplinary Center (ICT), Central Intelligence Agency (CIA), Federation of American Scientists (FAS), Data supplied from NASA (DPL), Hoover’s (H), ZDNet (ZD), Market Guide (MG)
     Entity Classes and Relationships populated by these knowledge sources:
     THING
       -event (ICT)
          terroristOrganization participated in terroristSponsoredEvent (ICT)
       -politicalOffice (CIA, CA)
          politicalOffice office(s) within govtOrganization
          politicalOffice associated with organization
       -watchList (OFAC, FBI, DPL)
          terroristOrganization appears on watchList (OFAC, FBI, DPL)
       -organization (OFAC, FBI, FAS, ICT, CA, CIA)
          organization appears on watchList
          organization memberOf suborganization
       -company
          company manufactures product (ZD)
          company identifiedBy tickerSymbol (H)
          companyPosition position in company (MG)
          company memberOf industry (H)
       -tickerSymbol (H)
          tickerSymbol memberOf exchange (H)
     PERSON (OFAC, FBI, DPL)
       -politician (OFAC, FBI, CIA, CA)
          politician associated with politicalOrganization
          politician held politicalOffice
          politician associated with politicalOffice
       -terrorist (OFAC, FBI, DPL)
          terrorist memberOf organization
          terrorist appears on watchList
       -companyExecutive (MG)
          companyExecutive holdsOffice companyPosition
       person has permanent address address (OFAC, FBI)
       person has dob (date of birth) (OFAC, FBI)
       person has pob (place of birth) (OFAC, FBI)
     PLACE
       -organization located in place (H, OFAC)
       -religiousAffiliation practiced in place (CIA)
       -company headquarters in city (H)
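The named relationships on this slide can be pictured as subject/relationship/object assertions tagged with the knowledge sources that populate them. The Python sketch below is only an illustration under that assumption: the `Assertion` type and both helper functions are hypothetical, not the Semagix data model, and only a handful of the slide's relationships are included.

```python
from typing import NamedTuple, Tuple


class Assertion(NamedTuple):
    """One class-level relationship from the slide plus its source codes."""
    subject: str
    relationship: str
    obj: str
    sources: Tuple[str, ...]


# A subset of the relationships listed on the slide (source codes as given there).
ASSERTIONS = [
    Assertion("terroristOrganization", "participated in", "terroristSponsoredEvent", ("ICT",)),
    Assertion("terroristOrganization", "appears on", "watchList", ("OFAC", "FBI", "DPL")),
    Assertion("company", "manufactures", "product", ("ZD",)),
    Assertion("company", "identifiedBy", "tickerSymbol", ("H",)),
    Assertion("company", "memberOf", "industry", ("H",)),
    Assertion("tickerSymbol", "memberOf", "exchange", ("H",)),
]


def relationships_from(subject):
    """All assertions whose subject is the given entity class."""
    return [a for a in ASSERTIONS if a.subject == subject]


def populated_by(source):
    """All assertions that the given knowledge source contributes to."""
    return [a for a in ASSERTIONS if source in a.sources]


if __name__ == "__main__":
    for a in relationships_from("company"):
        print(a.subject, a.relationship, a.obj, a.sources)
```

Keeping the source codes on each assertion mirrors the slide's annotations such as "(H)" or "(OFAC, FBI, DPL)", so a query can be restricted to relationships backed by a particular provider.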

  10. Automatic Categorization & Metadata Tagging (unstructured text): Video with Editorialized Text on the Web. Labels: Auto Categorization, Semantic Metadata

  11. Semantic Metadata Extraction/Annotation: Semi-structured source. Labels: Web Page, Enhanced Metadata Asset, Extraction Agent

  12. Syntax Metadata Semantic Metadata Semantic Content Enhancement Workflow

  13. Content Asset Index Evolution

  14. Semantic Application Example – Analyst Workbench
     • Automatic 3rd party content integration
     • Focused relevant content organized by topic (semantic categorization)
     • Related relevant content not explicitly asked for (semantic associations)
     • Automatic Content Aggregation from multiple content providers and feeds
     • Competitive research inferred automatically

  15. SEC Semantic Web – Intelligent Content Intelligent Content = What You Asked for + What you need to know! Related Stock News COMPANY Competition COMPANIES in INDUSTRY with Competing PRODUCTS COMPANIES in Same or Related INDUSTRY Regulations Technology Products Impacting INDUSTRY or Filed By COMPANY EPA Industry News Important to INDUSTRY or COMPANY

  16. Syntax Metadata Semantic Metadata Knowledge-based & Manual Associations Human-assisted inference Same entity led by

  17. Blended Semantic Browsing and Querying (Intelligence Analyst Workbench)

  18. Innovations that affect User Experience
     • BSBQ: Blended Semantic Browsing and Querying
        • Ability to query and browse relevant desired content in a highly contextual manner
     • Seamless access/processing of Content, Metadata and Knowledge
        • Ability to retrieve relevant content, view related metadata, access relevant knowledge and switch between all of the above, allowing the user to follow a train of thought
     • dACE: dynamic Automatic Content Enhancement
        • Ability to provide enhanced annotation features, allowing the user to retrieve relevant knowledge about significant pieces of content during content consumption
     • Semantic Engine APIs with XML output
        • Ability to create customized APIs for the Semantic Engine involving Semantic Associations with XML output to cater to any user application

  19. Boarding Gate Airport Airspace Interrogation Visionics AcSys Security Portal ARC AvSec Manager Data Management Data Mining Semagix Ontology Metabase Threat Scoring Check-in IPG Airport LEO Gov’t Watchlists News Media Web Info LexisNexis RiskWise Passenger Records Reservation Data Airline Data Airport Data Airline and Airport Data Future and Current Risks

  20. Sources Used

  21. Interrogation Kiosk – Unique Advantages of Semagix
     John Smith
     Semagix’s Semantic Technology enables flight authorities to:
     - take a quick look at the passenger’s history
     - check quickly if the passenger is on any official watchlist
     - interpret and understand passenger’s links to other organizations (possibly terrorist)
     - verify if the passenger has boarded the flight from a “high risk” region
     - verify if the passenger originally belongs to a “high risk” region
     - check if the passenger’s name has been mentioned in any news article along with the name of a known bad guy

  22. Threat Score Components
     John Smith
     appearsOn watchList: FBI
     Flight Country Check: 45, 0.15
     Person Country Check: 25, 0.15
     Nested Organizations Check: 75, 0.8
     Aggregate Link Analysis Score: 17.7
     LEXIS NEXIS ANNOTATION
       Action: Information about or related to the passenger returned by Lexis Nexis is enhanced by linking important entities to Semagix’s rich ontology
       Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, recognize entities in a piece of text and further automatically correlate it with other data in the ontology to present a clear picture of the passenger to the flight official
     LINK ANALYSIS
       Action: Semantic analysis of the various components (watchlist, Lexis Nexis, ontology search, metabase search, etc.) to come up with an aggregate threat score for the passenger
       Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, recognize entities in a piece of text, automatically correlate it with other data in the ontology, and search for relevant content to present an overall idea of the threat level of the passenger, allowing the flight official to take quick action
     ONTOLOGY SEARCH
       Action: Semagix’s rich ontology is searched for this name, and associated information such as position, aliases, and relationships (past or present) of this name to other organizations, watchlists, country, etc. is retrieved
       Ability Proven: Ability to automatically aggregate relevant rich domain knowledge about a passenger and automatically correlate it with other data in the ontology to present a visual association picture to the flight official
     METABASE SEARCH
       Action: Semagix’s rich metabase is searched for this name, and associated content stories mentioning the passenger’s name are retrieved
       Ability Proven: Ability to automatically aggregate and retrieve relevant content stories, field reports, etc. about the passenger that can be used by flight officials to determine if the passenger has any connections with known bad people or organizations
     WATCHLIST ANALYSIS
       Action: Semagix’s rich ontology is automatically searched for the possible appearance of this name on any of the watchlists
       Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, automatically correlate it, and rank the threat factors to indicate the threat level of the passenger on the watchlist front
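The slide lists a pair of numbers for each check but does not say how they are combined into the aggregate score. Purely as an illustration, the Python sketch below assumes the first number is a raw score and the second a weight, and combines them as a weighted sum; the names `CHECKS` and `weighted_sum` are hypothetical. Under that assumption the three listed checks give about 70.5, which does not match the 17.7 shown on the slide, so the product's actual formula is evidently different (or uses components not listed here).

```python
# Check name -> (first number, second number) exactly as listed on the slide.
# ASSUMPTION: first number is a raw score, second number is a weight.
CHECKS = {
    "Flight Country Check": (45, 0.15),
    "Person Country Check": (25, 0.15),
    "Nested Organizations Check": (75, 0.8),
}


def weighted_sum(checks):
    """Combine (score, weight) pairs as a plain weighted sum (assumed scheme)."""
    return sum(score * weight for score, weight in checks.values())


if __name__ == "__main__":
    total = weighted_sum(CHECKS)
    # Roughly 70.5 under this assumption; the slide reports 17.7 instead.
    print(f"weighted sum of listed checks: {total:.2f}")
```

The point of the sketch is only the shape of the computation: independent checks each contribute a value, and an aggregation step turns them into a single figure that a flight official can act on.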

  23. Query Comparison: Semagix vs. RDBMS

  24. Performance
     Queries per server per hour: > 1,980,000
     Query Response Time (light load): 1 - 10 ms
     Query Response Time (64 concurrent users):  65ms
     Incremental Index Update Frequency: 1 minute (near real-time)
     Population/update rate in an Ontology with 1 million entities/relationships: > 10,000 entities/relationships per hr.
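To put the hourly figures in more familiar units, the short Python calculation below converts them. The only inputs are the numbers on the slide; the variable names are illustrative, and the population estimate assumes the quoted rate is sustained for the whole load, which the slide does not claim.

```python
SECONDS_PER_HOUR = 3600

# Figures from the slide (both are stated as lower bounds, "> ...").
QUERIES_PER_SERVER_PER_HOUR = 1_980_000
POPULATION_RATE_PER_HOUR = 10_000
ONTOLOGY_SIZE = 1_000_000  # entities/relationships

# Hourly query throughput expressed per second.
queries_per_second = QUERIES_PER_SERVER_PER_HOUR / SECONDS_PER_HOUR

# ASSUMPTION: constant population rate from an empty ontology to 1 million items.
hours_to_populate = ONTOLOGY_SIZE / POPULATION_RATE_PER_HOUR

if __name__ == "__main__":
    print(f"at least {queries_per_second:.0f} queries per second per server")
    print(f"at most {hours_to_populate:.0f} hours to load {ONTOLOGY_SIZE:,} items")
```

So the throughput claim works out to more than 550 queries per second per server, and at the quoted minimum rate a million entities/relationships would take no more than about 100 hours to populate.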

  25. More at www.semagix.com and http://lsdis.cs.uga.edu/lib/presentations.html