
TEMPLATE-DRIVEN KNOWLEDGE MINING. KNOWLEDGEPROSPECTOR.NET

A framework that supports multiple languages and integrates with Knowledge.Net. Its document-analysis algorithm performs morphological and semantic analysis, optimizes the resulting knowledge graph, and saves the results.


Presentation Transcript


  1. Saint-Petersburg State University
     TEMPLATE-DRIVEN KNOWLEDGE MINING. KNOWLEDGEPROSPECTOR.NET
     Speaker: Alexey L. Smolyakov
     Project team (Knowledge.Net): Anton V. Novikov, Maxim V. Sigalin, Alexey L. Smolyakov, Dmitry G. Cherepanov
     Scientific adviser: Prof. Vladimir V. Safonov

  2. Project goals
     • Flexible framework
     • Supporting different languages
     • Integration with Knowledge.Net

  3. Algorithm
     • Getting documents and first-step text analysis
     • Morphological analysis of text blocks
     • Semantic analysis of entity sets using templates
     • Optimizing the resulting graph
     • Saving results
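The five stages can be wired together as a simple pipeline. This is a minimal sketch, not the KnowledgeProspector.NET implementation: every function name is hypothetical, morphology is reduced to lowercasing, the semantic stage uses one toy rule ("adjective followed by a word") driven by a hypothetical three-word lexicon instead of real templates, and the optimization stage is left as a placeholder.

```python
from dataclasses import dataclass, field

# Hypothetical mini-lexicon standing in for a real morphological dictionary.
ADJECTIVES = {"useful", "fast", "comfortable"}

@dataclass
class KnowledgeGraph:
    entities: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, relation, target) triples

def get_blocks(document: str) -> list:
    """Stage 1 (simplified): each non-empty line is one text block."""
    return [line.strip() for line in document.splitlines() if line.strip()]

def morphological_analysis(block: str) -> list:
    """Stage 2 (simplified): strip punctuation and lowercase every word."""
    words = (w.strip(".,;:!?()").lower() for w in block.split())
    return [w for w in words if w]

def semantic_analysis(words: list, graph: KnowledgeGraph) -> None:
    """Stage 3 (toy rule, not a real template): an adjective followed by another word."""
    for first, second in zip(words, words[1:]):
        if first in ADJECTIVES and second not in ADJECTIVES:
            graph.entities.update({first, second})
            graph.edges.add((first, "property-class", second))

def optimize(graph: KnowledgeGraph) -> KnowledgeGraph:
    """Stage 4 placeholder: a real implementation would prune redundant edges here."""
    return graph

def save(graph: KnowledgeGraph) -> str:
    """Stage 5 (simplified): serialize the edges as sorted text lines."""
    return "\n".join(f"{s} {r} {t}" for s, r, t in sorted(graph.edges))

def run_pipeline(document: str) -> str:
    graph = KnowledgeGraph()
    for block in get_blocks(document):
        semantic_analysis(morphological_analysis(block), graph)
    return save(optimize(graph))

print(run_pipeline("A useful document.\nA fast bus."))
```

Keeping each stage a separate function mirrors the slide's step list, so any stage can be replaced (for example, the toy rule by a template engine) without touching the others.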

  4. Getting documents and first-step text analysis
     • Getting documents from providers
     • Dividing each document into articles (plain text, list, table, etc.)
     • Dividing text into blocks
     Example document:
     «…Text format is a very flexible way to describe various types of information…» (text)
     «1) One 2) Two 3) Three» (list)
     «Country. Capital. England. London. Ukraine. Kyiv.» (table)
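A first-step splitter for the three article kinds in the example might look like the sketch below. The markup conventions are assumptions, not the project's: articles are separated by blank lines, list items start with "1)", "2)", and so on, and table cells are separated by "|". The helper names are hypothetical.

```python
import re

LIST_ITEM = re.compile(r"^\s*\d+\)\s+")  # assumed list marker: "1) ", "2) ", ...

def classify_article(paragraph: str) -> str:
    lines = [line for line in paragraph.splitlines() if line.strip()]
    if all(LIST_ITEM.match(line) for line in lines):
        return "list"
    if all("|" in line for line in lines):  # assumed table markup: "cell | cell"
        return "table"
    return "text"

def split_articles(document: str) -> list:
    """Blank lines separate articles; each article is tagged text, list, or table."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    return [(classify_article(p), p) for p in paragraphs]

def split_blocks(kind: str, article: str) -> list:
    """Blocks are sentences for text, items for lists, and cell rows for tables."""
    lines = [line for line in article.splitlines() if line.strip()]
    if kind == "list":
        return [LIST_ITEM.sub("", line).strip() for line in lines]
    if kind == "table":
        return [[cell.strip() for cell in line.split("|")] for line in lines]
    return [s for s in re.split(r"(?<=[.!?…])\s+", article) if s]

doc = """Text format is very flexible. It can describe many types of information.

1) One
2) Two
3) Three

Country | Capital
England | London
Ukraine | Kyiv"""

for kind, article in split_articles(doc):
    print(kind, split_blocks(kind, article))
```

Real documents from providers would need more robust detection (HTML tables, bullet characters), but the two-level split (document into articles, article into blocks) stays the same.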

  5. Morphological analysis of text blocks
     • Language recognition (Russian, English, …)
     • Morphological form recognition using dictionaries (MRD, XML, …)
     • Creating entities
     Example: Word(«Documents») has the current morphological form Noun, plural, and the base form «Document» (Noun, singular); the result is EntityClass(«Document»).
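The lookup from a word form to its base form and then to an entity can be sketched as follows. The dictionary here is a tiny hypothetical stand-in for MRD, language recognition is a crude script check (any Cyrillic letter means Russian), and the noun-to-class and adjective-to-property mapping is an assumption based on the slide examples («document» is a class, «useful» is a property).

```python
from dataclasses import dataclass

# Hypothetical stand-in for a morphological dictionary such as MRD:
# word form -> (base form, part of speech, number)
DICTIONARY = {
    "en": {
        "documents": ("document", "Noun", "plural"),
        "document": ("document", "Noun", "singular"),
        "useful": ("useful", "Adjective", None),
    },
    "ru": {
        "документы": ("документ", "Noun", "plural"),
    },
}

@dataclass(frozen=True)
class Entity:
    kind: str  # "class", "property", or "unknown"
    base: str

def recognize_language(word: str) -> str:
    """Crude script check: any Cyrillic letter means Russian, otherwise English."""
    return "ru" if any("\u0400" <= ch <= "\u04ff" for ch in word) else "en"

def analyze(word: str) -> Entity:
    entry = DICTIONARY[recognize_language(word)].get(word.lower())
    if entry is None:
        return Entity("unknown", word)
    base, pos, _number = entry
    # Assumed mapping: nouns become classes, adjectives become properties.
    kind = {"Noun": "class", "Adjective": "property"}.get(pos, "unknown")
    return Entity(kind, base)

print(analyze("Documents"))  # Entity(kind='class', base='document')
```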

  6. Morphological analysis > Entity types > "Simple" entities
     • Entity "separator". Example: «.,;:!?()[]{}…»
     • Entity "unknown"
     • Entity "changeable". Example: «good»
     • Entity "relationship". Example: «Planet Earth is LESS than the Sun»

  7. Morphological analysis > Entity types > "True" entities
     • Entity "class". Example: «document»
     • Entity "property". Example: «useful»
     • Entity "datatype":
       • Datetime
       • Integer
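The two families of entity types can be modeled as one enumeration plus a set marking the "true" entities that are allowed into the resulting graph. This is a sketch under assumptions: the lexicon is hypothetical, and digit strings are simply treated as Integer entities.

```python
from enum import Enum

class EntityType(Enum):
    # "Simple" entities (slide 6)
    SEPARATOR = "separator"
    UNKNOWN = "unknown"
    CHANGEABLE = "changeable"
    RELATIONSHIP = "relationship"
    # "True" entities (slide 7); only these are added to the resulting graph (slide 8)
    CLASS = "class"
    PROPERTY = "property"
    DATETIME = "datetime"
    INTEGER = "integer"

TRUE_ENTITIES = {EntityType.CLASS, EntityType.PROPERTY, EntityType.DATETIME, EntityType.INTEGER}
SEPARATORS = set(".,;:!?()[]{}…")

# Hypothetical lexicon; a real system would derive types from the morphological dictionary.
LEXICON = {
    "good": EntityType.CHANGEABLE,
    "less": EntityType.RELATIONSHIP,
    "document": EntityType.CLASS,
    "useful": EntityType.PROPERTY,
}

def classify_token(token: str) -> EntityType:
    if token in SEPARATORS:
        return EntityType.SEPARATOR
    if token.isdigit():
        return EntityType.INTEGER
    return LEXICON.get(token.lower(), EntityType.UNKNOWN)

tokens = ["useful", "document", ",", "2006", "qwerty"]
types = [classify_token(t) for t in tokens]
graph_candidates = [t for t, kind in zip(tokens, types) if kind in TRUE_ENTITIES]
print(graph_candidates)  # ['useful', 'document', '2006']
```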

  8. Semantic analysis > Goals
     • Creating relationships between entities
     • Creating new entities
     • Adding true entities into the resulting graph
     [Diagram: Class(«house»), Class(«building»), Property(«comfortable») and Property(«brick»), connected by one «subclass» and two «property-class» relationships]

  9. Semantic analysis > Relationship types
     • Relationship between property and class
     • Relationship "subclass"
     • Relationship "subproperty"
     • Relationship "equality"
     • Relationship between two classes
     • Relationship "conditional rule"
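A graph that stores these typed relationships can check the kinds of its arguments when an edge is added. In this sketch the signatures (which node kinds each binary relation connects) are assumptions, equality and conditional rules are stored unchecked, and the example edges are one assumed reading of the house/building diagram: house is a subclass of building, and both properties describe house.

```python
from enum import Enum

class Relation(Enum):
    PROPERTY_CLASS = "property-class"
    SUBCLASS = "subclass"
    SUBPROPERTY = "subproperty"
    EQUALITY = "equality"
    CLASS_CLASS = "class-class"
    CONDITIONAL_RULE = "conditional rule"

# Assumed argument kinds for the binary relations; equality and
# conditional rules are left unchecked in this sketch.
SIGNATURES = {
    Relation.PROPERTY_CLASS: ("property", "class"),
    Relation.SUBCLASS: ("class", "class"),
    Relation.SUBPROPERTY: ("property", "property"),
    Relation.CLASS_CLASS: ("class", "class"),
}

class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}         # entity name -> "class" or "property"
        self.relations = set()  # (Relation, tuple of entity names)

    def add(self, name, kind):
        self.nodes[name] = kind

    def relate(self, relation, *args):
        # Every argument must already be a node (KeyError otherwise).
        actual = tuple(self.nodes[a] for a in args)
        expected = SIGNATURES.get(relation)
        if expected is not None and actual != expected:
            raise ValueError(f"{relation.value} expects {expected}, got {actual}")
        self.relations.add((relation, args))

g = KnowledgeGraph()
for name, kind in [("house", "class"), ("building", "class"),
                   ("comfortable", "property"), ("brick", "property")]:
    g.add(name, kind)
g.relate(Relation.SUBCLASS, "house", "building")
g.relate(Relation.PROPERTY_CLASS, "comfortable", "house")
g.relate(Relation.PROPERTY_CLASS, "brick", "house")  # assumed reading of the diagram
```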

  10. Semantic analysis > Template description
      • Priority
      • Pattern
      • Handlers
      Example (the literals «а» and «значить» are Russian words, roughly "and" and "to mean"):
      <Template Priority="10000" Pattern="#E.P #E.C ,? а? значить #E.P">
        <Handler Name="PropertyRelationship" Arguments="0, 1" />
        <Handler Name="PropertyRelationship" Arguments="5, 1" />
        <Handler Name="ConditionalRule" Arguments="1, 0, 5" />
      </Template>
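A template like this can be loaded with Python's standard xml.etree.ElementTree. The sketch assumes that handler Arguments are zero-based indices of pattern elements (this fits the example: 0 and 5 are the two #E.P elements and 1 is #E.C) and that a bracketed group counts as one element; load_template is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

TEMPLATE_XML = """
<Template Priority="10000" Pattern="#E.P #E.C ,? а? значить #E.P">
  <Handler Name="PropertyRelationship" Arguments="0, 1" />
  <Handler Name="PropertyRelationship" Arguments="5, 1" />
  <Handler Name="ConditionalRule" Arguments="1, 0, 5" />
</Template>
"""

# One pattern element: a bracketed group with an optional quantifier, or any other token.
ELEMENT = re.compile(r"\[[^\]]*\][+*?]?|\S+")

def load_template(xml_text):
    root = ET.fromstring(xml_text.strip())
    elements = ELEMENT.findall(root.get("Pattern"))
    handlers = [
        (h.get("Name"), [int(i) for i in h.get("Arguments").split(",")])
        for h in root.findall("Handler")
    ]
    return int(root.get("Priority")), elements, handlers

priority, elements, handlers = load_template(TEMPLATE_XML)
for name, args in handlers:
    print(name, [elements[i] for i in args])
```

Printing the referenced elements is a quick way to check that every handler points at the pattern positions its author intended.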

  11. Semantic analysis > Pattern description
      • Logical operators: «&» (and), «|» (or), «^» (not)
      • Occurrence: not set (exactly once), «+», «*», «?»
      • Entity elements: #E.P, #E.C, #E.S, #E.U, #E.Int, #E.DateTime
      • Morphological elements: #M.Noun, #M.Adjective, #M.Verb, …
      • #W.Month, #W.Number, … (words holders)
      • #H.Class, … (clauses holders)
      Example: [#E.P #M.Adjective]+ [#E.C #M.Noun]
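The occurrence markers behave like regular-expression quantifiers over tokens, so a small backtracking matcher can illustrate them. This is a simplified sketch, not the project's matcher: it assumes a bracketed group means one token that must carry all listed tags, treats plain words as literals, returns one (begin, end) token span per element, and does not implement the «&», «|», «^» operators or holders. Tokens are hypothetical (word, tag set) pairs.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Quantifier -> (minimum, maximum) number of tokens an element may consume.
QUANT = {"": (1, 1), "?": (0, 1), "+": (1, float("inf")), "*": (0, float("inf"))}

@dataclass(frozen=True)
class Element:
    tags: frozenset          # e.g. {"E.P", "M.Adjective"}: all must hold for one token
    literal: Optional[str]   # a plain word such as "year", or None
    quant: str               # "", "?", "+", or "*"

def parse_pattern(pattern: str) -> List[Element]:
    elements = []
    for raw in re.findall(r"\[[^\]]*\][+*?]?|\S+", pattern):
        quant = raw[-1] if len(raw) > 1 and raw[-1] in "+*?" else ""
        body = raw[:-1] if quant else raw
        parts = body.strip("[]").split()
        tags = frozenset(p[1:] for p in parts if p.startswith("#"))
        words = [p.lower() for p in parts if not p.startswith("#")]
        elements.append(Element(tags, words[0] if words else None, quant))
    return elements

def satisfies(token: Tuple[str, set], element: Element) -> bool:
    word, tags = token
    return element.tags <= tags and (element.literal is None or word.lower() == element.literal)

def match_at(elements, tokens, start):
    """One (begin, end) span per element, trying the longest run first and backtracking."""
    def go(ei, ti):
        if ei == len(elements):
            return []
        element = elements[ei]
        run = 0
        while ti + run < len(tokens) and satisfies(tokens[ti + run], element):
            run += 1
        low, high = QUANT[element.quant]
        for n in range(int(min(run, high)), low - 1, -1):
            rest = go(ei + 1, ti + n)
            if rest is not None:
                return [(ti, ti + n)] + rest
        return None
    return go(0, start)

def search(pattern, tokens):
    elements = parse_pattern(pattern)
    for start in range(len(tokens)):
        spans = match_at(elements, tokens, start)
        if spans is not None:
            return spans
    return None

tokens = [("a", set()), ("useful", {"E.P", "M.Adjective"}), ("document", {"E.C", "M.Noun"})]
print(search("[#E.P #M.Adjective]+ [#E.C #M.Noun]", tokens))  # [(1, 2), (2, 3)]
```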

  12. Semantic analysis > Pattern description > Words holder
      <WordHolder Name="Month">
        <Item Word="JANUARY" Value="1" />
        <Item Word="FEBRUARY" Value="2" />
        <Item Word="MARCH" Value="3" />
        ...
      </WordHolder>
      Clauses holder
      <ClauseHolder Name="Class">
        <Item Pattern="[#E.P #M.Adjective]* #E.C" Index="1" />
        <Item Pattern="[#E.P #M.Adjective] , [#E.P #M.Adjective] #E.C" Index="2" />
      </ClauseHolder>
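A words holder is essentially a named lookup table, so it can be loaded into a dictionary with xml.etree.ElementTree. The sketch assumes case-insensitive matching (the holder lists words in upper case) and integer values; load_word_holder and holder_value are hypothetical names, and only the three months shown on the slide are included.

```python
import xml.etree.ElementTree as ET

WORD_HOLDER_XML = """<WordHolder Name="Month">
  <Item Word="JANUARY" Value="1" />
  <Item Word="FEBRUARY" Value="2" />
  <Item Word="MARCH" Value="3" />
</WordHolder>"""

def load_word_holder(xml_text):
    root = ET.fromstring(xml_text)
    items = {item.get("Word").upper(): int(item.get("Value")) for item in root.findall("Item")}
    return root.get("Name"), items

name, months = load_word_holder(WORD_HOLDER_XML)

def holder_value(word):
    """Case-insensitive lookup; None means the word is not in the holder."""
    return months.get(word.upper())

print(name, holder_value("February"))  # Month 2
```

With this lookup, a pattern element such as #W.Month matches any token for which holder_value returns a value, and the value itself is what a handler like CreateDateTime would consume.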

  13. Semantic analysis > Handlers
      • Replace
      • Create datetime entity
      • Create «property-class» relationship
      • Create «subclass» relationship
      • Create «subproperty» relationship
      • Create «conditional rule» relationship
      • Create «class-class» relationship

  14. Semantic analysis > Creating relationships
      Input: Property(«useful») Class(«document»)
      Template:
      <Template Priority="4" Pattern="[#E.P #M.Adjective]+ [#E.C #M.Noun]">
        <Handler Name="PropertyRelationship" Arguments="0, 1" />
      </Template>
      Result: Property(«useful») and Class(«document») connected by a «property-class» relationship
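Handlers can be dispatched by the Name attribute through a registry. In this sketch the registry, the decorator, and the function names are hypothetical; the matched words are assumed to be grouped per pattern element, and because «+» can match several properties, the PropertyRelationship handler links every matched property to every matched class.

```python
# Registry mapping a <Handler Name="..."> value to a Python function (hypothetical design).
HANDLERS = {}

def handler(name):
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register

@handler("PropertyRelationship")
def property_relationship(graph, groups, prop_index, class_index):
    # Link every property matched by one pattern element to every class matched by the other.
    for prop in groups[prop_index]:
        for cls in groups[class_index]:
            graph.add((prop, "property-class", cls))

def apply_handlers(handlers, groups, graph):
    for name, args in handlers:
        HANDLERS[name](graph, groups, *args)

# Words grouped by pattern element for "[#E.P #M.Adjective]+ [#E.C #M.Noun]"
groups = [["useful"], ["document"]]
graph = set()
apply_handlers([("PropertyRelationship", [0, 1])], groups, graph)
print(graph)  # {('useful', 'property-class', 'document')}
```

A registry keeps the template files declarative: adding a new handler means registering one more function, without changing the template loader.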

  15. Semantic analysis > Creating new entities
      Input: Integer(«7») Class(«December») Integer(«2006») Class(«Year»)
      Template:
      <Template Priority="11000" Pattern="#E.INT #W.Month #E.INT year">
        <Handler Name="Replace" From="0" Count="4">
          <CreateEntityHandler Name="CreateDateTime" Arguments="day=0, month=1, year=2" />
        </Handler>
      </Template>
      Result: Datetime(7.12.2006)
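The Replace handler with a nested CreateDateTime can be sketched as "collapse Count tokens starting at From into one new entity". Assumptions: From and the day/month/year arguments are indices into the matched tokens, tokens are hypothetical (type, text) pairs, and month names are resolved through an English month table standing in for the Month words holder.

```python
import datetime

MONTH_NAMES = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
               "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

def parse_arguments(text):
    """'day=0, month=1, year=2' -> {'day': 0, 'month': 1, 'year': 2}"""
    pairs = (part.split("=") for part in text.split(","))
    return {key.strip(): int(value) for key, value in pairs}

def create_datetime(matched, arguments):
    idx = parse_arguments(arguments)
    day = int(matched[idx["day"]][1])
    month = MONTHS[matched[idx["month"]][1].upper()]
    year = int(matched[idx["year"]][1])
    return ("Datetime", datetime.date(year, month, day))

def replace(tokens, start, count, make_entity):
    """Replace tokens[start:start+count] with the single entity make_entity builds."""
    matched = tokens[start:start + count]
    return tokens[:start] + [make_entity(matched)] + tokens[start + count:]

tokens = [("Integer", "7"), ("Class", "December"), ("Integer", "2006"), ("Class", "Year")]
result = replace(tokens, 0, 4, lambda m: create_datetime(m, "day=0, month=1, year=2"))
print(result)  # [('Datetime', datetime.date(2006, 12, 7))]
```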

  16. Optimizing the resulting graph
      • Removing redundant «subclass» relationships
      • Removing redundant relationships between properties and classes
      [Diagram: Class(«vehicle»), Class(«transport»), Class(«bus») and Property(«fast»), connected by three «subclass» and two «property-class» relationships]
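Both optimizations can be expressed as reachability checks over the subclass hierarchy. This is a sketch under assumptions: the hierarchy is acyclic, a property attached to a class also applies to its subclasses, and the example is one plausible reading of the diagram (bus is a subclass of transport, transport of vehicle, plus a redundant direct bus to vehicle edge; «fast» is attached to both transport and bus).

```python
def ancestors(cls, parents):
    """All classes reachable from cls via subclass edges (excluding cls itself)."""
    seen, stack = set(), list(parents.get(cls, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen

def optimize(subclass_edges, property_edges):
    parents = {}
    for child, parent in subclass_edges:
        parents.setdefault(child, set()).add(parent)
    # child -> parent is redundant if parent is also reachable through another parent.
    kept_subclass = {
        (c, p) for c, p in subclass_edges
        if not any(p in ancestors(q, parents) for q in parents[c] if q != p)
    }
    # property -> class is redundant if the property is already attached to an ancestor.
    kept_property = {
        (prop, cls) for prop, cls in property_edges
        if not any((prop, a) in property_edges for a in ancestors(cls, parents))
    }
    return kept_subclass, kept_property

subclass = {("bus", "transport"), ("transport", "vehicle"), ("bus", "vehicle")}
properties = {("fast", "transport"), ("fast", "bus")}
print(optimize(subclass, properties))
```

The subclass part is a transitive reduction of the hierarchy; the property part removes links that inheritance already implies, so the saved graph stays minimal without losing information under these assumptions.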

  17. Saving results
      • Saving acquired knowledge in the Knowledge.Net format
      • Saving to OWL
      • Saving (and loading) knowledge to and from the project's own binary file format
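For the OWL export, classes and «subclass» relationships map naturally onto owl:Class and rdfs:subClassOf in RDF/XML. The sketch covers only that part; the base namespace and the to_owl helper are hypothetical, and mapping «property-class» and conditional-rule relationships to OWL is left open, as the slide does not say how the project handles it.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/onto#"  # hypothetical ontology namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)
ET.register_namespace("owl", OWL)

def to_owl(classes, subclass_edges):
    root = ET.Element(f"{{{RDF}}}RDF")
    for name in sorted(classes):
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + name})
        for child, parent in sorted(subclass_edges):
            if child == name:
                ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + parent})
    return ET.tostring(root, encoding="unicode")

print(to_owl({"bus", "transport", "vehicle"}, {("bus", "transport"), ("transport", "vehicle")}))
```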

  18. Current project status
      • Developed a working prototype
      • Created test templates
      • Integrated the «MRD» dictionary (Russian and English)

  19. Plans
      • Support for creating «compound» entities (made up of several words, e.g. «creation of human hands»)
      • Functionality extension (adding new entities, relationships, templates, handlers, …)
      • A program for generating templates
      • Developing good examples

  20. Questions?
      Contact information:
      smlkvalex@mail.ru
      http://www.knowledge-net.ru
      http://polyhimnie.math.spbu.ru
