
Goal 2 Activities 4, 6, 7

Letizia Tanca, Politecnico di Milano. Goal 2, Activities 4, 6, 7. Goal 2: Knowledge Management (Polimi). Activity 4: Knowledge extraction from natural language actions (Polimi + IBM + Bari)


Presentation Transcript


  1. Letizia Tanca, Politecnico di Milano. Goal 2, Activities 4, 6, 7

  2. Goal 2: Knowledge Management (Polimi)
     • Activity 4: Knowledge extraction from natural language actions (Polimi + IBM + Bari)
     • Activity 6: Knowledge extraction, modeling and integration from semi-structured information sources, driven by domain ontologies (PoliMI)
     • Activity 7: Knowledge fusion, “tailoring” and dissemination for business model redesign (PoliMI)

  3. Context-aware Web Portal

  4. Contextual data analysis
     • At GialloRosso, the oenologist and the agronomist interact with the data related to harvesting and to wine ageing
       • the information they interact with depends on their role and on the workflow phase
       • The agronomist inserts information about the nature of natural phenomena
       • The agronomist and the oenologist ask for information related to the phase
     • At BiancoRosso, the sales manager:
       • analyzes sales data
       • at a different moment analyzes the market trends, then
       • reads similar information in natural language from the web
     • GialloRosso performs market analyses by accessing its own information combined with market information collected by its ally BiancoRosso

  5. GialloRosso Logical Schema

  6. BiancoRosso Logical Schema
     VINO(ID_Vino, nome, vinificazione, invecchiamento, denominazione, temperatura, min_temp, note)
     EVENTO(ID_Evento, nome, tipo, data, luogo)
     TRENDSETTER(ID_Trend, nome, professione)
     FONTE(ID_Fonte, nome, uri, tipo, rilevanza, provenienza, descrizione)
     DOCUMENTO(ID_Doc, riassunto, url, data, autore, titolo, argomento, descrittore, ID_Fonte)
     VALUTAZIONEMERITO(ID_valutazione, descrizione, giudizio, lingua)
     RISULTATORICERCA(ID_risultato, ID_vino, ID_evento, ID_trend, ID_fonte, ID_doc, posizione, ID_valutazione)

  7. Context-aware data tailoring

  8. Data tailoring via view composition

  9. Context Dimension Tree

  10. Some relevant areas

  11. At GialloRosso the oenologist and the agronomist interact with the data related to cultivation and to the cellar

  12. A portion of the CDT of our scenario

  13. Some contextual views
      C1 = <role=agronomist, *, phase=harvesting>
      C2 = <role=agronomist, *, phase=ageing>
      C3 = <role=oenologist, *, phase=harvesting>
      C4 = <role=oenologist, *, phase=ageing>

  14. Some contextual queries
      The agronomist during the harvesting phase (context C1) wants to collect all the available information coming from sensors:
        SELECT m.date_time, m.value, s.s_id, s.meas_unit FROM sensor s, measure_data m WHERE s.s_id = m.s_id;
      S/he obtains only the information from sensors placed in the vineyards (see Rel(C1)).
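A minimal sketch of how the tailoring in context C1 could behave, using Python's built-in sqlite3 module. The `location` column, the sample rows, and the `c1_*` view names are assumptions made for illustration; the slides show neither the GialloRosso schema nor the definition of Rel(C1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base tables; the 'location' column is an assumption.
    CREATE TABLE sensor (s_id INTEGER PRIMARY KEY, meas_unit TEXT, location TEXT);
    CREATE TABLE measure_data (s_id INTEGER, date_time TEXT, value REAL);

    -- Invented sample rows.
    INSERT INTO sensor VALUES (1, 'C', 'vineyard'), (2, '%', 'vineyard'), (3, 'C', 'cellar');
    INSERT INTO measure_data VALUES
        (1, '2009-09-01 08:00', 21.5),
        (2, '2009-09-01 08:00', 60.0),
        (3, '2009-09-01 08:00', 14.0);

    -- Assumed stand-in for Rel(C1): the agronomist during harvesting
    -- sees only sensors placed in the vineyards.
    CREATE VIEW c1_sensor AS SELECT * FROM sensor WHERE location = 'vineyard';
    CREATE VIEW c1_measure_data AS
        SELECT m.* FROM measure_data m JOIN c1_sensor s ON s.s_id = m.s_id;
""")

# The agronomist's query, evaluated against the tailored views
# instead of the base tables.
rows = conn.execute("""
    SELECT m.date_time, m.value, s.s_id, s.meas_unit
    FROM c1_sensor s, c1_measure_data m
    WHERE s.s_id = m.s_id
    ORDER BY s.s_id
""").fetchall()

for row in rows:
    print(row)
```

The query text itself is unchanged apart from the table names; the cellar sensor is filtered out by the views, which is the effect the slide attributes to the context.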

  15. Some contextual queries
      The oenologist during the harvesting phase (context C3) wants to collect all the available information about bottles of “Aglianico” wine:
        SELECT * FROM bottle b WHERE b.appellation = 'aglianico';
      But the query is out of context: in context C3, only information about vineyards and grapevines is available to the oenologist.

  16. Some more contextual queries
      The previous query makes sense in context C4, where the oenologist is in the ageing phase:
        SELECT * FROM bottle b WHERE b.appellation = 'aglianico';
      Here it produces a non-empty result.
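One way to picture the "out of context" check is a lookup from contexts to the relations visible in them. This is only a sketch: the `RELEVANT_AREAS` table and the relation names in it are hypothetical, loosely based on what the slides say about the oenologist's harvesting and ageing contexts.

```python
# Hypothetical mapping from (role, phase) to the relations visible in that context.
RELEVANT_AREAS = {
    ("oenologist", "harvesting"): {"vineyard", "grapevine"},  # harvesting context
    ("oenologist", "ageing"): {"bottle", "wine", "cellar"},   # ageing context
}


def in_context(role, phase, tables):
    """Return True if every table the query touches is visible in the context."""
    visible = RELEVANT_AREAS.get((role, phase), set())
    return set(tables) <= visible


# Tables referenced by the Aglianico bottle query.
query_tables = ["bottle"]

print(in_context("oenologist", "harvesting", query_tables))
print(in_context("oenologist", "ageing", query_tables))
```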

  17. At BiancoRosso:
      • the sales manager analyzes sales data
      • the oenologist analyzes wine features to design a new wine
      • then s/he reads similar information in natural language from the web
      • intensional queries are also performed

  18. Sales and promotions planning (Q1): sales and promotions planning for events and festivals
      • The sales manager of BiancoRosso wants to select the wines to promote for each event or festival
      • For each event or type of event he/she needs to identify the most related wines
      • Interesting wines for each event can be obtained by analyzing frequent rules in the form
        • EventType=value → Wine=value
        • E.g., EventType=“Summer party” → Wine=“White wine”, support=20%, confidence=36%
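The support and confidence figures attached to such a rule can be computed as follows. This is a sketch using the usual association-rule definitions (support = share of all records matching both sides, confidence = share of antecedent records that also match the consequent); the records are synthetic counts chosen only so that the result matches the percentages quoted for the "Summer party" rule.

```python
from fractions import Fraction

# Synthetic (event type, wine) records; the counts are invented.
records = (
    [("Summer party", "White wine")] * 9
    + [("Summer party", "Red wine")] * 16
    + [("Wine tasting", "Red wine")] * 20
)


def rule_metrics(records, event_type, wine):
    """Support and confidence of the rule EventType=event_type -> Wine=wine."""
    total = len(records)
    antecedent = sum(1 for e, _ in records if e == event_type)
    both = sum(1 for e, w in records if e == event_type and w == wine)
    support = Fraction(both, total)
    confidence = Fraction(both, antecedent) if antecedent else Fraction(0)
    return support, confidence


support, confidence = rule_metrics(records, "Summer party", "White wine")
print(f"support={float(support):.0%}, confidence={float(confidence):.0%}")
```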

  19. Sales and promotions planning (Q2): sales and promotions planning depending on time periods
      • The sales manager wants to plan specific promotions for each time period of the year
      • For each time period (e.g., month) the manager needs to select the most related wines
      • Interesting wines can be obtained by analyzing frequent rules in the form
        • Month=value → Wine=value
        • E.g., Month=“June” → Wine=“White wine”, support=20%, confidence=36%

  20. Design of wine (Q3): analysis of the main characteristics of wines
      • The oenologist of BiancoRosso wants to produce new wines
      • He/she needs to know the main characteristics of each wine to select the most interesting wines to produce
      • He/she obtains the characteristics of each wine by exploiting rules in the form
        • Wine=value → Characteristic=value
        • E.g., Wine=“White wine” → Characteristic=“Mainly drunk in a specific time period”, support=6%, confidence=100%

  21. Design of wine (Q4): identification of correlations between wines and time periods
      • The time period in which each wine is mainly consumed is useful to select the wines to produce
      • For each wine the oenologist wants to obtain the time period (e.g., month) in which the wine is mainly consumed
      • This allows selecting wines related to time periods not already covered by the wines currently produced by BiancoRosso
      • He/she uses rules in the form
        • Wine=value → Month=value
        • E.g., Wine=“White wine” → Month=“June”, support=20%, confidence=100%
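Rules of the form Wine=value → Month=value can be enumerated from (wine, month) pairs by counting and then filtering on minimum support and confidence. The sketch below assumes that input shape and uses invented records and thresholds; the slides do not say which mining algorithm or thresholds were actually used.

```python
from collections import Counter

# Invented (wine, month) records.
records = [
    ("White wine", "June"),
    ("White wine", "June"),
    ("Red wine", "December"),
    ("Red wine", "June"),
    ("Red wine", "December"),
]


def mine_rules(records, min_support, min_confidence):
    """Return rules Wine=w -> Month=m as (w, m, support, confidence) tuples."""
    total = len(records)
    pair_counts = Counter(records)
    wine_counts = Counter(wine for wine, _ in records)
    rules = []
    for (wine, month), count in pair_counts.items():
        support = count / total
        confidence = count / wine_counts[wine]
        if support >= min_support and confidence >= min_confidence:
            rules.append((wine, month, support, confidence))
    return sorted(rules)


rules = mine_rules(records, min_support=0.3, min_confidence=0.6)
for wine, month, support, confidence in rules:
    print(f"Wine={wine!r} -> Month={month!r} "
          f"support={support:.0%} confidence={confidence:.0%}")
```

With single-attribute antecedents and consequents, plain counting is enough; a general frequent-itemset miner is only needed for longer rules.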

  22. Design of wine (Q5): identification of correlations between wines and information sources
      • Once the oenologist has selected the new wines to be produced, he/she needs to identify the sources containing documents related to the selected wines
      • The oenologist identifies the sources containing information about the wines of his/her interest by exploiting the following rules
        • Wine=value → Source=value
        • E.g., Wine=“Montello e colli asolani cabernet superiore” → Source=“Gambero Rosso”, support=11%, confidence=100%

  23. DIESIRAE: A semantic search engine based on Natural Language Processing

  24. Knowledge Management

  25. Knowledge Indexing & Extraction: Goals
      • Domain model → Ontology (W3C OWL standard)
        • Describes the concepts of the domain
      • Domain vocabulary → Semantic Network
        • Describes the lemmas of the domain
      • Mapping model → Stochastic model
        • Second-order HMM-inspired model
        • Transition probabilities approximated by means of MaxEnt models
        • Solves mapping ambiguities
      • Queries:
        • Keyword-based (AND/OR; max probability/exhaustive)
        • Phrase-based (Disambiguated Word queries and Ontological queries)
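To give a feel for how a stochastic mapping model can resolve ambiguous lemmas, here is plain first-order Viterbi decoding over concepts. It is a deliberate simplification and not the model named on the slide: that one is second-order and takes its transition probabilities from MaxEnt models. The concept names and all probabilities below are made up.

```python
def viterbi(lemmas, states, start_p, trans_p, emit_p):
    """Return the most probable concept sequence for a lemma sequence."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(lemmas[0], 0.0), None) for s in states}]
    for t in range(1, len(lemmas)):
        column = {}
        for s in states:
            column[s] = max(
                (best[t - 1][p][0] * trans_p[p][s] * emit_p[s].get(lemmas[t], 0.0), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(lemmas) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]


# Hypothetical concepts: the lemma "fruit" is ambiguous between two of them.
states = ["FruitFlavour", "FruitCrop", "Taste"]
start_p = {"FruitFlavour": 0.4, "FruitCrop": 0.4, "Taste": 0.2}
trans_p = {
    "FruitFlavour": {"FruitFlavour": 0.1, "FruitCrop": 0.1, "Taste": 0.8},
    "FruitCrop": {"FruitFlavour": 0.3, "FruitCrop": 0.5, "Taste": 0.2},
    "Taste": {"FruitFlavour": 0.4, "FruitCrop": 0.3, "Taste": 0.3},
}
emit_p = {
    "FruitFlavour": {"fruit": 1.0},
    "FruitCrop": {"fruit": 1.0},
    "Taste": {"taste": 1.0},
}

path = viterbi(["fruit", "taste"], states, start_p, trans_p, emit_p)
print(path)
```

The following lemma "taste" is what tips "fruit" towards the flavour concept, since the two candidates are equally likely on their own.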

  26. Knowledge indexing & extraction: Functionalities (diagram phases: Training; Indexing, querying, and extending)

  27. Knowledge indexing & extraction: Information Extraction Engine (diagram phases: Training; Indexing, querying, and extending)
      • Linguistic Context Extractor:
        • Calls linguistic tools (Stanford Parser, FreeLing, JavaRAP, …)
        • words Wi → (lemmas Li, linguistic context information Ii)
      • MaxEnt Models:
        • Calculate HMM transition probabilities (taking into account the linguistic context info)
      • Extended Viterbi:
        • (Li, Ii) → concepts Ci
      • TF-IDF:
        • Document ranking, based on concept frequencies
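Ranking by concept frequencies can be sketched with a textbook TF-IDF weighting (term frequency times log of inverse document frequency) applied to concept labels instead of words. The exact TF-IDF variant used by the engine is not stated in the slides, and the documents and concept labels below are hypothetical.

```python
import math
from collections import Counter

# Hypothetical output of the concept-mapping step: concept labels per document.
docs = {
    "doc1": ["Wine", "Tannin", "Wine", "Taste"],
    "doc2": ["Wine", "Vineyard"],
    "doc3": ["Vineyard", "Harvest", "Vineyard"],
}


def tf_idf_scores(docs, query_concepts):
    """Score each document by summing the TF-IDF weights of the query concepts."""
    n_docs = len(docs)
    # Number of documents in which each concept occurs at least once.
    doc_freq = Counter(c for concepts in docs.values() for c in set(concepts))
    scores = {}
    for name, concepts in docs.items():
        counts = Counter(concepts)
        score = 0.0
        for concept in query_concepts:
            if counts[concept] == 0:
                continue
            tf = counts[concept] / len(concepts)
            idf = math.log(n_docs / doc_freq[concept])
            score += tf * idf
        scores[name] = score
    return scores


scores = tf_idf_scores(docs, ["Tannin", "Wine"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```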

  28. ART DECO Wine Domain Ontology

  29. Keyword-based queries
      • Sequence of isolated words
        • No linguistic structure
      • Exhaustive AND/OR keywords
        • No concept disambiguation
        • Searches for multiple tuples
        • Example: light wine → several meanings found…; country wine → search for instances…; taste wine → search for subclasses…
      • Max probability AND/OR keywords
        • Searches for a single tuple
        • Exploits the a-priori concept probabilities
        • Example: [light wine] → max probability meaning
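The difference between the two keyword modes can be sketched as follows: the exhaustive mode keeps every combination of candidate concepts, while the max-probability mode keeps one concept per keyword using a-priori probabilities. The `CANDIDATES` table, its concept names, and its probabilities are assumptions made for illustration.

```python
from itertools import product

# Hypothetical candidate concepts per lemma, with a-priori probabilities.
CANDIDATES = {
    "light": {"LightBody": 0.6, "LightColour": 0.3, "Sunlight": 0.1},
    "wine": {"Wine": 0.9, "WineColour": 0.1},
}


def exhaustive(keywords):
    """All concept tuples: one candidate concept per keyword, no disambiguation."""
    options = [sorted(CANDIDATES[k]) for k in keywords]
    return list(product(*options))


def max_probability(keywords):
    """A single tuple: the most probable a-priori concept for each keyword."""
    return tuple(max(CANDIDATES[k], key=CANDIDATES[k].get) for k in keywords)


print(exhaustive(["light", "wine"]))
print(max_probability(["light", "wine"]))
```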

  30. Phrase-based queries
      • Phrase
        • Linguistic structure
        • Context-based disambiguation
      • Disambiguated Word queries
        • Context used for concept disambiguation
        • Index the phrase (→ extract concepts)
        • Search for AND-ed concepts
        • Example: (fruit taste) → disambiguates fruit
      • Ontological queries
        • Context used to select the request to the ontology
        • Index the sentences
        • Select the request; search the ontology for the mapped concepts
        • Example: “type of tannins in wine” → instance list

  31. GialloRosso performs market analyses by accessing its own information combined with market information collected by its ally BiancoRosso

  32. The integration problem from the user's point of view (diagram labels: User, query, answer, APPLICATION, GLOBAL KNOWLEDGE INTERFACE, DATA SOURCE 1 (RDBMS), DATA SOURCE 2 (XML), DATA SOURCE 3 (WWW), DATA SOURCE 4 (Base station))

  33. Information integration in ART DECO

  34. Knowledge retrieval from the sources
      In order to integrate the two original sources, we define the following query to populate the ontology:
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
        PREFIX do: <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl#>
        SELECT ?w1 ?w2 ?wn1 ?wn2 ?wb ?bq ?dse ?dso ?sn
        FROM <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl>
        WHERE {
          ?w1 rdf:type do:WineInFarm .
          ?wb rdf:type do:WineBottle .
          ?wb do:containsWine ?w1 .
          ?wb do:bottleQuantity ?bq .
          ?w1 do:appellationInFarm ?wn1 .
          ?w2 do:appellationInDocument ?wn2 .
          ?w2 rdf:type do:WineInDocument .
          ?dse rdf:type do:DocSearch .
          ?dso rdf:type do:DocSource .
          ?dse do:searchWineID ?w2 .
          ?dse do:searchSrcID ?dso .
          ?dso do:docSrcName ?sn .
        }

  35. Query 1
      Quantity of bottles (in the GialloRosso DB) available for each wine cited by the web source “Percorsi di Vino” (stored in the BiancoRosso DB):
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
        PREFIX do: <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl#>
        SELECT ?wine_name (SUM(?bottle_quantity) AS ?total_quantity)
        FROM <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl>
        WHERE {
          ?w1 rdf:type do:WineInFarm .
          ?wb rdf:type do:WineBottle .
          ?wb do:containsWine ?w1 .
          ?wb do:bottleQuantity ?bottle_quantity .
          ?w1 do:appellationInFarm ?wn1 .
          ?w2 do:appellationInDocument ?wine_name .
          ?w2 rdf:type do:WineInDocument .
          ?dse rdf:type do:DocSearch .
          ?dso rdf:type do:DocSource .
          ?dse do:searchWineID ?w2 .
          ?dse do:searchSrcID ?dso .
          ?dso do:docSrcName ?source_name .
          FILTER regex(?source_name, "PercorsiDiVino")
          FILTER fn:contains(?wine_name, ?wn1)
        }
        GROUP BY ?wine_name ?source_name
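The logic of Query 1 (filter by source name, match document appellations against farm appellations by substring, then sum bottle quantities per cited wine) can be mirrored in a few lines of plain Python. This is only an illustration of the join and aggregation, not a SPARQL engine; the two lists are toy stand-ins for the sources and all their values are invented.

```python
from collections import defaultdict

# Toy stand-in for the GialloRosso data: (appellationInFarm, bottleQuantity).
farm_bottles = [
    ("Aglianico", 120),
    ("Aglianico", 30),
    ("Falanghina", 0),
]

# Toy stand-in for the BiancoRosso data: (appellationInDocument, docSrcName).
doc_citations = [
    ("Aglianico del Taburno", "PercorsiDiVino"),
    ("Falanghina", "Gambero Rosso"),
]


def bottles_per_cited_wine(source):
    """Total bottle quantity per wine name cited by the given source."""
    totals = defaultdict(int)
    for wine_name, source_name in doc_citations:
        # Plays the role of FILTER regex(?source_name, ...) for a literal pattern.
        if source not in source_name:
            continue
        for wn1, quantity in farm_bottles:
            # Plays the role of FILTER fn:contains(?wine_name, ?wn1).
            if wn1 in wine_name:
                totals[wine_name] += quantity
    return dict(totals)


result = bottles_per_cited_wine("PercorsiDiVino")
print(result)
```

The substring match is what lets a short farm appellation line up with a longer appellation found in a document, at the cost of possible false matches between similarly named wines.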

  36. Query 2
      Which sources (from BiancoRosso) cite wines of which we (GialloRosso) have at least one bottle available?
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
        PREFIX do: <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl#>
        SELECT ?wine_name ?source_name
        FROM <file:///home/lele/workspace/TIS-RewSparQL_progetto/ontologies/art-deco/wineDomain.owl>
        WHERE {
          ?w1 rdf:type do:WineInFarm .
          ?wb rdf:type do:WineBottle .
          ?wb do:containsWine ?w1 .
          ?wb do:bottleQuantity ?bottle_quantity .
          ?w1 do:appellationInFarm ?wn1 .
          ?w2 do:appellationInDocument ?wine_name .
          ?w2 rdf:type do:WineInDocument .
          ?dse rdf:type do:DocSearch .
          ?dso rdf:type do:DocSource .
          ?dse do:searchWineID ?w2 .
          ?dse do:searchSrcID ?dso .
          ?dso do:docSrcName ?source_name .
          FILTER (?bottle_quantity > 0)
          FILTER fn:contains(?wine_name, ?wn1)
        }
        GROUP BY ?wine_name ?source_name

  37. Q & A (If you see this slide we’ve not run out of time)

  38. Part 3 of the book
      • Ontology-based knowledge elicitation: an architecture (Chapter editors Licia Sbattella, Roberto Tedesco, Giorgio Orsi, Politecnico di Milano, Marcello Montedoro, IBM Italia)
      • Knowledge extraction from Natural Language (Chapter editors Licia Sbattella, Roberto Tedesco, Politecnico di Milano)
      • Knowledge extraction from event flows (Chapter editor Alberto Sillitti, Università di Bolzano)
      • Context-aware knowledge querying in a networked enterprise (Chapter editors Cristiana Bolchini, Elisa Quintarelli, Fabio A. Schreiber, Politecnico di Milano, Teresa Baldassare, Università di Bari)
      • On-the-fly and Context-Aware Integration of Heterogeneous Data Sources (Chapter editors Giorgio Orsi, Letizia Tanca, Politecnico di Milano)
      • A methodology for context-driven data-warehouse design (Chapter editors Cristiana Bolchini, Elisa Quintarelli, Letizia Tanca, Politecnico di Milano)
