Results for "Konferenz Aachen" (German for "conference Aachen")
• NRW KULTURsekretariat (relevance: 5.9%)
• Pfadfinderinnenschaft Sankt Georg (relevance: 5.7%)
• Konferenz der deutschsprachigen Mathematikfachschaften (relevance: 5.2%)
• Leonard Monheim (relevance: 5.1%)
• Andreas Kruse (relevance: 4.9%)
• Holzbau (relevance: 4.9%)
• Wolfgang Seifen (relevance: 4.9%)
• Feldpost der Belgier in Deutschland nach dem Ersten Weltkrieg 1918–1935 (relevance: 4.1%)
• Konferenz der Informatikfachschaften (relevance: 4.0%)
• UNESCO-Club (relevance: 3.7%)
• Kaiser/Riegraf-Gruppe (Heilbronn) (relevance: 3.7%)
• Niederländische Annexionspläne nach dem Zweiten Weltkrieg (relevance: 3.6%)
Find a page of a conference that is related to Aachen.
Limit the query to certain classes of result pages.
BTW 2007, Aachen
Source for Classes: WordNet Thesaurus
Diagram (concept tree excerpt): ROOT, entity, group, living_thing, thing, meeting, minority, person, congress, conference, entertainer, scientist, musician, actor, physicist, biologist
More than 81000 concepts.
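To make the idea of "classes of result pages" concrete, here is a minimal sketch of an is-a check over a small hand-written concept tree. The concept names come from the slide, but the parent links in `PARENT` are assumed for illustration (the flattened diagram does not preserve them), and the function names are hypothetical.

```python
# Toy excerpt of a concept hierarchy. The parent links are an assumption
# made for this sketch; they are not taken from WordNet itself.
PARENT = {
    "entity": "ROOT",
    "group": "ROOT",
    "living_thing": "entity",
    "thing": "entity",
    "meeting": "group",
    "conference": "meeting",
    "person": "living_thing",
    "entertainer": "person",
    "scientist": "person",
    "musician": "entertainer",
    "actor": "entertainer",
    "physicist": "scientist",
    "biologist": "scientist",
}

def ancestors(concept):
    """Return the chain from the concept itself up to the top of the tree."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def is_a(concept, cls):
    """True if cls is the concept itself or one of its ancestors."""
    return cls in ancestors(concept)

print(ancestors("physicist"))
print(is_a("physicist", "person"), is_a("conference", "person"))
```

With such a check, a query restricted to the class person would accept pages annotated as physicist and reject pages annotated as conference.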
Mapping Pages to Concepts
[Diagram: a page mapped to the concept city]
Automatic mapping with high quality.
Architecture
Components shown in the diagram:
• Wikipedia Pages (Wiki Markup)
• HTML
• Wikipedia Pages (XML)
• Concept Mapper
• Wikipedia Pages (Annotated XML)
• TopX Search Engine
Concept Mapping (1): Categories
Manually added category information in most pages.
Example: Albert Einstein
• Excellent_articles
• 1879_births
• Physics
• Swiss_physicists
Technically: exclude admin categories, shallow parsing of category labels, stemming, mapping heuristics.
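The category steps could look roughly like the sketch below. It is not the YAWN implementation: `ADMIN_CATEGORIES`, `KNOWN_CONCEPTS`, the preposition list, and the crude plural stripping are all assumptions standing in for the real admin list, WordNet, the shallow parser, and a proper stemmer.

```python
# Assumed inputs for this sketch (not taken from the paper).
ADMIN_CATEGORIES = {"Excellent_articles"}
KNOWN_CONCEPTS = {"physicist", "scientist", "person", "city", "king", "ruler"}
PREPOSITIONS = {"of", "in", "from", "by"}

def naive_stem(word):
    """Very rough plural stripping; a real system would use a proper stemmer."""
    word = word.lower()
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def head_noun(label):
    """Shallow parse: cut the label at the first preposition, keep the last word."""
    words = label.split("_")
    for i, word in enumerate(words):
        if word.lower() in PREPOSITIONS:
            words = words[:i]
            break
    return words[-1] if words else ""

def map_categories(categories):
    """Return the known concepts that the category labels map to, in order."""
    concepts = []
    for category in categories:
        if category in ADMIN_CATEGORIES:
            continue  # administrative category, carries no concept
        concept = naive_stem(head_noun(category))
        if concept in KNOWN_CONCEPTS and concept not in concepts:
            concepts.append(concept)
    return concepts

einstein = ["Excellent_articles", "1879_births", "Physics", "Swiss_physicists"]
print(map_categories(einstein))
```

For the Albert Einstein example only Swiss_physicists survives: the admin category is excluded, and the stems of 1879_births and Physics are not in the assumed concept set.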
Concept Mapping (2): Regular Structure
Regular structures (lists, tables, …) often indicate similar concepts.
Example: List of people, annotated as physicist
• Albert Einstein
• Max Planck
• Niels Bohr
• Werner Heisenberg
The same list entries as XPath expressions:
• /article[1]/…/list[3]/item[1]/link[1]
• /article[1]/…/list[3]/item[2]/link[1]
• /article[1]/…/list[3]/item[3]/link[1]
• /article[1]/…/list[3]/item[4]/link[1]
Technically: grouping of similar XPath expressions, find coherent annotations, frequency & confidence thresholds.
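One way to read "grouping of similar XPath expressions" is sketched below: links whose paths differ only in the item position fall into one group, and the dominant concept of a group is propagated to its unlabeled members if the group is large and coherent enough. The function names and the threshold values `min_items` and `min_confidence` are assumptions, not values from the paper.

```python
import re
from collections import Counter, defaultdict

def generalize(xpath):
    """Replace every item position by a wildcard so sibling entries share a key."""
    return re.sub(r"item\[\d+\]", "item[*]", xpath)

def propagate(links, min_items=3, min_confidence=0.7):
    """links: list of (xpath, concept or None).

    Returns {xpath: (concept, confidence)} for the unlabeled links of every
    group that passes the frequency and confidence thresholds.
    """
    groups = defaultdict(list)
    for xpath, concept in links:
        groups[generalize(xpath)].append((xpath, concept))

    result = {}
    for members in groups.values():
        if len(members) < min_items:
            continue  # frequency threshold: structure too small to trust
        counts = Counter(c for _, c in members if c is not None)
        if not counts:
            continue  # nothing to propagate
        concept, hits = counts.most_common(1)[0]
        confidence = hits / len(members)
        if confidence < min_confidence:
            continue  # annotations in this group are not coherent enough
        for xpath, existing in members:
            if existing is None:
                result[xpath] = (concept, confidence)
    return result

links = [
    ("/article[1]/…/list[3]/item[1]/link[1]", "physicist"),
    ("/article[1]/…/list[3]/item[2]/link[1]", "physicist"),
    ("/article[1]/…/list[3]/item[3]/link[1]", "physicist"),
    ("/article[1]/…/list[3]/item[4]/link[1]", None),
]
print(propagate(links))
```

Here three of the four list entries are already known to be physicists, so the fourth entry receives the same concept with confidence 0.75.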
Concept Mapping (3): Outlier Detection
Sometimes conflicting annotations of the same page.
Example categories: Kings_of_Spain, European_rulers
Diagram (concept tree excerpt): ROOT, entity, living_thing, thing, person, artifact, instrument, ruler, king; the node "ruler" appears twice, once marked with "?".
Solution: compatibility matrix for high-level concepts.
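A compatibility check of this kind might be sketched as follows. Everything here is an assumption for illustration: the mapping `HIGH_LEVEL` from candidate concepts to high-level concepts, the (empty) set of compatible pairs, and the majority vote used to decide which side is the outlier. The slide only states that a compatibility matrix over high-level concepts is used.

```python
from collections import Counter

# Assumed mapping from candidate annotations to high-level concepts.
HIGH_LEVEL = {
    "king": "person",
    "ruler (person)": "person",
    "ruler (instrument)": "artifact",
}

# Assumed symmetric compatibility matrix, stored as the set of unordered
# pairs of distinct high-level concepts that may annotate the same page.
COMPATIBLE_PAIRS = set()  # e.g. {frozenset({"person", "group"})}

def compatible(a, b):
    """A high-level concept is always compatible with itself."""
    return a == b or frozenset((a, b)) in COMPATIBLE_PAIRS

def drop_outliers(candidates):
    """Keep the candidates compatible with the most frequent high-level concept."""
    if not candidates:
        return []
    majority = Counter(HIGH_LEVEL[c] for c in candidates).most_common(1)[0][0]
    return [c for c in candidates if compatible(HIGH_LEVEL[c], majority)]

print(drop_outliers(["king", "ruler (person)", "ruler (instrument)"]))
```

In this toy run the instrument sense of "ruler" is dropped because it maps to artifact, which the assumed matrix treats as incompatible with person.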
YAWN: Annotated XML
Add concept tag(s) to articles:
  <city source="categories" confidence="1.0">
    <article>…</article>
  </city>
Add concept tag(s) to outgoing links:
  …
  <city source="lists" confidence="0.9">
    <link target="…">Saarbrücken</link>
  </city>
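The annotation format can be produced with Python's standard `xml.etree.ElementTree`, as in the sketch below. The helper names are hypothetical, and the link target is a made-up value because the slide elides it.

```python
import xml.etree.ElementTree as ET

def concept_element(concept, source, confidence):
    """Create an empty concept tag such as <city source="lists" confidence="0.9">."""
    attributes = {"source": source, "confidence": f"{confidence:.1f}"}
    return ET.Element(concept, attributes)

def annotate_article(article, concept, source, confidence):
    """Wrap a whole <article> element in a concept tag and return the wrapper."""
    wrapper = concept_element(concept, source, confidence)
    wrapper.append(article)
    return wrapper

def annotate_link(parent, link, concept, source, confidence):
    """Replace `link` inside `parent` by a concept tag that contains the link."""
    index = list(parent).index(link)
    wrapper = concept_element(concept, source, confidence)
    wrapper.tail, link.tail = link.tail, None  # keep trailing text in place
    parent.remove(link)
    wrapper.append(link)
    parent.insert(index, wrapper)
    return wrapper

# Hypothetical article with one outgoing link (the target value is invented).
article = ET.Element("article")
link = ET.SubElement(article, "link", {"target": "Saarbrücken"})
link.text = "Saarbrücken"

annotate_link(article, link, "city", "lists", 0.9)
annotated = annotate_article(article, "city", "categories", 1.0)
print(ET.tostring(annotated, encoding="unicode"))
```

Keeping `source` and `confidence` as attributes lets a later query stage filter annotations by where they came from and how reliable they are.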
Querying YAWN
Map concept queries to XPath expressions:
• "conferences in Aachen": //conference[contains(., "Aachen")]
• "scientists who won a Nobel Prize": //scientist[contains(., "Nobel prize")]
• "musicians who performed a song where 'space' occurs in the title": //musician[contains(.//song, "space")]
Not for end users! Needs a good user interface.
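The mapping from a concept query to XPath can be sketched in a few lines. Python's standard `xml.etree.ElementTree` supports only a subset of XPath without `contains()`, so `evaluate` below emulates the predicate by hand. The function names, the `<collection>` root, and the sample pages are assumptions for illustration; in the talk the queries are run by the TopX search engine.

```python
import xml.etree.ElementTree as ET

def to_xpath(concept, keyword):
    """Build the XPath string for 'pages of class concept that mention keyword'."""
    return f'//{concept}[contains(., "{keyword}")]'

def evaluate(root, concept, keyword):
    """Emulate //concept[contains(., keyword)] on an ElementTree document."""
    return [
        element
        for element in root.iter(concept)
        if keyword in "".join(element.itertext())
    ]

# Hypothetical annotated collection with two pages.
SAMPLE = """<collection>
  <conference source="categories" confidence="1.0">
    <article>BTW 2007 took place in Aachen.</article>
  </conference>
  <city source="categories" confidence="1.0">
    <article>Aachen is a city in NRW.</article>
  </city>
</collection>"""

root = ET.fromstring(SAMPLE)
hits = evaluate(root, "conference", "Aachen")
print(to_xpath("conference", "Aachen"))
print([hit.tag for hit in hits])
```

Both sample pages mention Aachen, but only the one annotated as conference is returned, which is the effect of limiting the query to a class of result pages. A real implementation would also need to escape quotes in the keyword.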
Left Overs and Summary
• XML Conversion
• Templates
• Preliminary evaluation
See paper.
Automated detection and annotation of concepts is useful for retrieval.
The Future: YAGO [WWW'07]
Diagram (knowledge graph excerpt): nodes area, state, city, Aachen, NRW; edges is_a, instance_of, located_in.
Querying the knowledge representation.
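Querying such a knowledge representation could look like the sketch below, which stores facts as triples and answers "instances of a class that stand in a relation to a target". The way the nodes are connected in `TRIPLES` is an assumption, since the flattened diagram only preserves the labels, and none of this reflects YAGO's actual interface.

```python
# Assumed facts: the node and edge labels come from the slide, the way
# they are connected is a guess made for this sketch.
TRIPLES = {
    ("city", "is_a", "area"),
    ("state", "is_a", "area"),
    ("Aachen", "instance_of", "city"),
    ("NRW", "instance_of", "state"),
    ("Aachen", "located_in", "NRW"),
}

def subclasses(cls):
    """Return cls together with every class that reaches it through is_a edges."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in TRIPLES:
            if predicate == "is_a" and obj == current and subject not in found:
                found.add(subject)
                frontier.append(subject)
    return found

def instances_of(cls):
    """All entities that are instances of cls or of one of its subclasses."""
    classes = subclasses(cls)
    return {s for s, p, o in TRIPLES if p == "instance_of" and o in classes}

def query(cls, relation, target):
    """Instances of cls that are connected to target by the given relation."""
    return {x for x in instances_of(cls) if (x, relation, target) in TRIPLES}

print(sorted(instances_of("area")))
print(query("city", "located_in", "NRW"))
```

Compared with the XPath queries over annotated pages, the class restriction here also follows is_a edges, so a query for area finds both the city and the state.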
Thank you!