A Semantic Web-Based Approach for Personalizing News Flavius Frasincar frasincar@ese.eur.nl Erasmus University Rotterdam * Joint work with Kim Schouten, Philip Ruijgrok, Jethro Borsje, Leonard Levering, and Frederik Hogenboom
Contents • Motivation • Hermes Framework: • News Classification • Knowledge Base Updating • News Querying • Results Presentation • Hermes News Portal: • An example • Evaluation • Conclusions • Future Work
Motivation • Large quantity of news on the Web: • Difficult to find the items of interest • News messages have a strong impact on stock prices • Limited annotation of RSS feeds: • Broad categories (business, cars, entertainment, etc.) • Google Finance shows direct news that pertains to a given portfolio: • Indirect news (e.g., about competitors of Google such as Microsoft) is not presented • Not possible to ask time-related queries about news
Motivation • Need for an intelligent system to personalize news • The world is changing: • It is important to have an up-to-date representation of the world in the system • News has a dual function: • Find the information of interest • Update our previous knowledge of the state of the world • Feedback loop: • The extracted information helps in the next iteration to refine the user's domain of interest
Hermes Framework • Input: • News items from RSS feeds • Domain ontology linked to a semantic lexicon (e.g., WordNet) • User query • Output: • News items as answers to the user query • Four phases: • News Classification • Relate news items to ontology concepts • Knowledge Base Updating • Update the knowledge base with news information • News Querying • Allow the user to express their concepts of interest and the temporal constraints • Results Presentation • Present the news items that match the user’s query
1. News Classification • Concept defined in the ontology (class or individual) • Multiple lexical representations for the same concept: • Ontology synonyms (e.g., New York→ “New York”, “Big Apple”) • Semantic lexicon synonyms (e.g., buy→“acquire”) • Concepts without subclasses or instances: • Semantic lexicon hyponyms (e.g., company→dot-com) • Look up ontology concepts in news items • A longer match supersedes a shorter match (“European Central Bank” supersedes “European”)
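The longest-match rule above can be sketched in Python. This is a minimal illustration, not the Hermes implementation: the lexicon contents, the concept names kb:European, kb:ECB and kb:NewYork, and the whitespace tokenization are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: lowercased lexical representation -> ontology concept.
LEXICON: Dict[str, str] = {
    "european": "kb:European",
    "european central bank": "kb:ECB",
    "new york": "kb:NewYork",
    "big apple": "kb:NewYork",
}


def find_hits(text: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (phrase, concept) hits, keeping only the longest match at each position."""
    tokens = text.lower().split()  # naive tokenization, assumed for brevity
    max_len = max(len(key.split()) for key in lexicon)
    hits: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so it supersedes shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                hits.append((phrase, lexicon[phrase]))
                i += n  # skip the tokens consumed by this match
                break
        else:
            i += 1  # no phrase starts here; move on
    return hits
```

With this lexicon, "European Central Bank" yields a single hit for kb:ECB and no separate hit for "European".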
1. News Classification 1.1 Tokenization (words, punctuation signs) 1.2 Sentence splitting (sentences) 1.3 Part-of-speech tagging (e.g., noun, verb, adj., etc.) 1.4 Morphological analysis (e.g., lemma “read” for “reading” as a verb) 1.5 Word sense disambiguation (e.g., Structural Semantic Interconnection (SSI) based on word context) 1.6 Adding “hits” between news items and the domain ontology
2. Knowledge Base Updating • Knowledge base updates are based on recognized events • Events have associated rules with: • Alternative patterns for event detection • Sequence of actions for knowledge base update • Before the knowledge base is updated, the discovered events need to be validated by the user • E.g., an event is kb:newCEO, which represents the appointment of a new CEO
2. Knowledge Base Updating 2.1 Event Rules Patterns Construction • Make use of lexico-semantic patterns • Lexico-semantic patterns are based on triples (Subject, Predicate, Object), where: • [type] stands for all knowledge base instances of the enclosed type, e.g., [kb:Company] represents all company instances (with all their associated lexical representations): • “IBM”, “International Business Machines”, etc. • “EBay”, “E-bay”, “Ebay”, etc. • Etc. • Terms without brackets represent individual knowledge base instances • $name represents a variable
2. Knowledge Base Updating 2.1 Event Rules Patterns Construction • Two types of patterns: • SP patterns: E.g., $c:[kb:Company] kb:GoesBankrupt matches “WorldCom goes bankrupt”, “WorldCom filed for Chapter 11”, etc. • SPO patterns: E.g., $p:[kb:Person] kb:BecomesCEO $c:[kb:Company] matches “Steve Ballmer appointed CEO of Microsoft”, “Steve Ballmer becomes new Chief Executive Officer of Microsoft”, etc.
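The SP and SPO patterns above can be approximated with plain string matching. The Python sketch below is only an illustration under stated assumptions: the INSTANCES and PREDICATES tables, the instance names, and the regular-expression matching are hypothetical stand-ins, and the actual system works on linguistically processed text rather than raw surface strings.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical knowledge base: class -> {instance: lexical representations}.
INSTANCES: Dict[str, Dict[str, List[str]]] = {
    "kb:Company": {"kb:Microsoft": ["Microsoft"], "kb:WorldCom": ["WorldCom"]},
    "kb:Person": {"kb:SteveBallmer": ["Steve Ballmer"]},
}
# Hypothetical lexical representations of the predicate concepts.
PREDICATES: Dict[str, List[str]] = {
    "kb:GoesBankrupt": ["goes bankrupt", "filed for Chapter 11"],
    "kb:BecomesCEO": ["appointed CEO of", "becomes new Chief Executive Officer of"],
}

Slot = Tuple[str, str]  # (variable name without "$", class), e.g. ("c", "kb:Company")


def _alternatives(phrases: List[str]) -> str:
    # Longest phrases first so a longer representation wins over a shorter one.
    return "|".join(re.escape(p) for p in sorted(phrases, key=len, reverse=True))


def _slot_regex(var: str, cls: str) -> str:
    phrases = [p for reps in INSTANCES[cls].values() for p in reps]
    return f"(?P<{var}>{_alternatives(phrases)})"


def match_pattern(sentence: str, subject: Slot, predicate: str,
                  obj: Optional[Slot] = None) -> Optional[Dict[str, str]]:
    """Match an SP (obj is None) or SPO pattern; return variable bindings or None."""
    slots = [subject] + ([obj] if obj else [])
    parts = [_slot_regex(*subject), "(?:" + _alternatives(PREDICATES[predicate]) + ")"]
    if obj:
        parts.append(_slot_regex(*obj))
    found = re.search(r"\s+".join(parts), sentence)
    if found is None:
        return None
    bindings: Dict[str, str] = {}
    for var, cls in slots:
        text = found.group(var)
        # Map the matched lexical representation back to its instance.
        bindings["$" + var] = next(
            inst for inst, reps in INSTANCES[cls].items() if text in reps)
    return bindings
```

For example, the SPO pattern $p:[kb:Person] kb:BecomesCEO $c:[kb:Company] corresponds to `match_pattern(sentence, ("p", "kb:Person"), "kb:BecomesCEO", ("c", "kb:Company"))`.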
2. Knowledge Base Updating 2.2 Event Rules Patterns Execution • Extract information from text • Assign ontology concepts to the variables 2.3 Event Validation • The knowledge extraction process is not flawless • User validates the extracted knowledge 2.4 Event Rules Actions Construction • Two types of actions: • Insert triples E.g., INSERT $c kb:hasCEO $p • Delete triples E.g., DELETE $c kb:hasCEO $p
2. Knowledge Base Updating 2.4 Event Rules Actions Construction (Cont’d) • Per event a sequence of actions is defined • The order of actions is important, e.g., for the event kb:newCEO two actions are defined: (1) DELETE $c kb:hasCEO $pp, followed by (2) INSERT $c kb:hasCEO $p • Unbound variables stand for anything and are not allowed in INSERT actions (e.g., $pp in the example) 2.5 Event Rules Actions Execution • Execute the actions associated with events in the order in which the events are found in the news • For each event, execute the associated actions in the given order
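The action semantics above (ordered execution, unbound variables acting as wildcards in DELETE and being rejected in INSERT) can be sketched in Python over an in-memory set of triples. The store representation, the function names and the instance kb:FormerCEO are assumptions for illustration; the portal itself issues updates against its ontology store.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Action = Tuple[str, str, str, str]  # (operation, subject, predicate, object)


def _resolve(term: str, bindings: Dict[str, str]) -> Optional[str]:
    """Bound value for a $variable, None if it is unbound, the term itself otherwise."""
    if term.startswith("$"):
        return bindings.get(term)
    return term


def apply_actions(store: Set[Triple], actions: List[Action],
                  bindings: Dict[str, str]) -> None:
    """Execute the actions of one event, in the given order, on the triple store."""
    for op, s, p, o in actions:
        pattern = tuple(_resolve(t, bindings) for t in (s, p, o))
        if op == "DELETE":
            # An unbound variable (None) stands for anything.
            matching = {t for t in store
                        if all(q is None or q == v for q, v in zip(pattern, t))}
            store.difference_update(matching)
        elif op == "INSERT":
            if None in pattern:
                raise ValueError("unbound variable in INSERT action")
            store.add(pattern)
        else:
            raise ValueError(f"unknown action: {op}")
```

Running the two kb:newCEO actions in the reverse order would first insert the new triple and then delete it again together with the old one, which is why the order matters.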
3. News Querying 3.1 Query Formulation • Present the domain knowledge as a directed labeled multi-graph with the additional constraint that arcs between the same two nodes are not allowed to share the same label (called the conceptual graph) • The user selects the concepts of interest in the conceptual graph (e.g., Google) • The user is able to add to the selection concepts related to the concepts of interest through specified relations (e.g., kb:hasCompetitors: kb:Microsoft, kb:eBay, and kb:Yahoo) • The selected concepts are presented in a separate graph (called the search graph)
3. News Querying 3.1 Query Formulation (Cont’d) • News items are timestamped • The user is able to specify that only news from a certain time interval should be retrieved • Time constraints: • Last hour • Last day • Last year • [2007-03-01T00:00:00.000+00:01, 2007-05-31T00:00:00.000+00:01 ] • [Future: order constraints (e.g., order by time)]
3. News Querying 3.2 Query Execution • Generate the query in a semantic query language: • Map concepts of interest to query restrictions (current: disjunctive queries) • Map temporal constraints to query restrictions • Execute the semantic query • The order of the relevant news items is not important here
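The mapping from selected concepts and time constraints to query restrictions can be sketched as a small Python query builder. The property names reuse the hermes: vocabulary of the SPARQL example later in the deck; the `?concept = ...` comparison form, the xsd:dateTime typing of the bounds, and the function name are assumptions of this sketch rather than the portal's generated query.

```python
from datetime import datetime, timezone
from typing import List, Optional


def build_query(concepts: List[str],
                start: Optional[datetime] = None,
                end: Optional[datetime] = None) -> str:
    """Build a disjunctive SPARQL query over concepts with optional time bounds."""
    if not concepts:
        raise ValueError("at least one concept of interest is required")
    lines = [
        "PREFIX hermes: <http://hermes-news.org/news.owl#>",
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>",
        "SELECT ?title WHERE {",
        "  ?news hermes:title ?title .",
        "  ?news hermes:time ?date .",
        "  ?news hermes:relatedTo ?concept .",
        # Concepts of interest become one disjunctive restriction.
        "  FILTER (" + " || ".join(f"?concept = {c}" for c in concepts) + ")",
    ]
    bounds = []
    if start is not None:
        bounds.append(f'?date > "{start.isoformat()}"^^xsd:dateTime')
    if end is not None:
        bounds.append(f'?date < "{end.isoformat()}"^^xsd:dateTime')
    if bounds:
        # Temporal constraints become a second, conjunctive restriction.
        lines.append("  FILTER (" + " && ".join(bounds) + ")")
    lines.append("}")
    return "\n".join(lines)
```

A call such as `build_query(["hermes:Google", "hermes:Yahoo"], start=datetime(2009, 2, 1, tzinfo=timezone.utc))` produces one disjunctive concept filter and one lower time bound.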
4. Results Presentation 4.1 News Sorting • Return news items that match a query • Sort the news items based on their relevance degree to the query • The relevance degree is determined empirically: • based on a weighted sum of the number of hits in title (higher weight) and body (lower weight) of the news item • News items that have the same relevance degree are sorted in descending timestamp order
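The sorting rule above can be sketched in Python. The weights 2.0 and 1.0 are placeholders chosen for illustration (the slides only say the title weight is higher and that the values were determined empirically), and the NewsItem fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

TITLE_WEIGHT = 2.0  # assumed value; only "higher than body" is given
BODY_WEIGHT = 1.0   # assumed value


@dataclass
class NewsItem:
    title: str
    timestamp: datetime
    title_hits: int  # number of concept hits in the title
    body_hits: int   # number of concept hits in the body


def relevance(item: NewsItem) -> float:
    """Weighted sum of the hits in the title and in the body."""
    return TITLE_WEIGHT * item.title_hits + BODY_WEIGHT * item.body_hits


def sort_news(items: List[NewsItem]) -> List[NewsItem]:
    """Most relevant first; ties are broken by the most recent timestamp."""
    return sorted(items, key=lambda it: (relevance(it), it.timestamp), reverse=True)
```

Sorting on the pair (relevance, timestamp) in reverse order gives descending relevance and, within equal relevance, descending timestamps.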
4. Results Presentation 4.2 News Presentation • Present the concepts involved in the query • For each news item, show a summary: • Title • Source • Date • The first few lines of the news item ([Future: snippet]) • Emphasize the hits (found concepts from the ontology) in the retrieved news items • Show the icons of the most important query concepts found in a news item: • importance is based on a weighted sum of the number of hits of a concept in the title (higher weight) and body (lower weight) of a news item
Hermes News Portal • Hermes News Portal (HNP) is an implementation of the Hermes framework • Implementation language: Java • Ontology representation language: OWL (e.g., cardinality restrictions, inverses, etc.) • Semantic lexicon: WordNet • Graph visualization: Prefuse (OWL2Prefuse) • Update language: SPARQL/Update • Query language: SPARQL extended with custom time functions (e.g., currentDate(), currentTime(), etc.) • Natural language processing: GATE
An Example • Query: Which are the news items about Google or one of its competitors from the past six months?
1. News Classification – News Item • “SAN FRANCISCO (Reuters) - Web search leader Google Inc. on Monday said it agreed to acquire top video entertainment site YouTube Inc. for $1.65 billion in stock, putting a lofty new value on consumer-generated media sites.” [October 9th, 2006 at 20:15:33 CET] • Three concepts are found in the news item: • kb:Google • kb:Buy • kb:YouTube • kb:Relation class instances store the hits between the news item and the found concepts (Semantic Web best practice recommendation for modeling N-ary relationships) • Hits highlighted in the text: Google, acquire, YouTube
2. Knowledge Base Updating – Rule Editor (Define Event Rule Patterns)
2. Knowledge Base Updating – Rule Editor (Event Validation)
2. Knowledge Base Updating – Rule Editor (Define Event Rule Actions)
3. News Querying – Search Graph • Legend: individuals, classes, selected concepts, concepts related to the selected node, concepts from keyword search
3. News Querying – SPARQL • SPARQL query:
PREFIX hermes: <http://hermes-news.org/news.owl#>
SELECT ?title
WHERE {
  ?news hermes:title ?title .
  ?news hermes:time ?date .
  ?news hermes:relation ?relation .
  ?news hermes:relatedTo ?concept .
  FILTER (
    ?concept hermes:relatedTo hermes:Google ||
    ?concept hermes:relatedTo hermes:Microsoft ||
    ?concept hermes:relatedTo hermes:Ebay ||
    ?concept hermes:relatedTo hermes:Yahoo
  )
  FILTER (
    ?date > "2009-02-01T00:00:00.000+00:01" &&
    ?date < "2009-07-31T00:00:00.000+00:01"
  )
}
3. News Querying- tSPARQL • Custom time functions:
3. News Querying – tSPARQL • tSPARQL query:
PREFIX hermes: <http://hermes-news.org/news.owl#>
SELECT ?title
WHERE {
  ?news hermes:title ?title .
  ?news hermes:time ?date .
  ?news hermes:relation ?relation .
  ?news hermes:relatedTo ?concept .
  FILTER (
    ?concept hermes:relatedTo hermes:Google ||
    ?concept hermes:relatedTo hermes:Microsoft ||
    ?concept hermes:relatedTo hermes:Ebay ||
    ?concept hermes:relatedTo hermes:Yahoo
  )
  FILTER (
    ?date > hermes:dateTime-substract(hermes:now(), P0Y6M) &&
    ?date < hermes:now()
  )
}
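The relative time window in the tSPARQL filter (now minus the duration P0Y6M) can be illustrated in Python. This is a sketch of the idea only: the function names are hypothetical, and clamping the day to the end of a shorter month is an assumption of the sketch, not a documented property of the custom time functions.

```python
import calendar
from datetime import datetime


def subtract_months(moment: datetime, months: int) -> datetime:
    """Return `moment` shifted back by a number of calendar months."""
    total = moment.year * 12 + (moment.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Assumption: clamp the day when the target month is shorter (e.g. 31 -> 28).
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def in_last_six_months(date: datetime, now: datetime) -> bool:
    """Mirror of the filter: date > now - 6 months and date < now."""
    return subtract_months(now, 6) < date < now
```

Because the window is computed relative to the current time, the same query keeps returning the latest six months of news without editing fixed dates.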
Evaluation • Test set: 200 news items from the Yahoo! business and technology news feed • Precision for concept identification: 86% • Recall for concept identification: 81% • Precision for event identification: 62% • Recall for event identification: 53% • Subsecond processing time per news item
Evaluation • Test users: 9 students following a Semantic Web course • Usability: build one query (the one from the presentation) using HNP and in SPARQL • Quantitative evaluation: • Measure the time it takes to build the query in the two approaches • Faster to build the news query using HNP • Qualitative evaluation: • Questionnaire • Easier to build the news query using HNP • HNP pros: graphical user interface, predefined time functionality, results explanation by highlighting the found concepts • HNP cons: the layout changes from conceptual graph to search graph, results are not ordered by time
Conclusions • Hermes Framework: presents news items that match the user interests • Hermes Framework: • News Classification • Knowledge Base Updating • News Querying • Results Presentation • Hermes News Portal (HNP): an implementation of the Hermes framework • HNP based on: • WordNet semantic lexicon, OWL ontology, (extended) SPARQL queries, Prefuse visualization, GATE natural language processing
Future Work • Limited query expressivity: • Add conjunction to queries (e.g., retrieve all news items that mention both Google and Yahoo!) • Add negation to queries (e.g., retrieve all news items that do not mention Google) • Add patterns to queries (e.g., retrieve all news items that refer to Google acquiring another company) • Add snippets and temporal ordering to query results presentation • Evaluate the tool outside the university lab • Evaluate the tool for another domain (e.g., politics instead of finance)