1 / 60

Mitsuru Ishizuka Dept. of Creative Informatics & Dept. of Info. and Communication Eng.

A Common Concept Description of Natural Language Texts as a Foundation of Semantic Computing on the Web. Mitsuru Ishizuka Dept. of Creative Informatics & Dept. of Info. and Communication Eng. School of Information Science and Technology. Semantic Computing Initiative.

tauret
Download Presentation

Mitsuru Ishizuka Dept. of Creative Informatics & Dept. of Info. and Communication Eng.

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Common Concept Descriptionof Natural Language Texts as a Foundation of Semantic Computingon the Web Mitsuru Ishizuka Dept. of Creative Informatics & Dept. of Info. and Communication Eng. School of Information Science and Technology

  2. Semantic Computing Initiative lay a foundation that allows computers to understand the semantic meaning of Web contents so that they canperform semantic computing on the Web. The aims of CDL are 1) to realize machine understandability of Web text contents, and 2) to overcome language barrier on the Web.

  3. Major Differences from Semantic Web Semantic Computing Initiative • Target of representation: Semantic concepts expressed in texts. • Universal vocabulary (+ additional specific vocabulary in a domain if necessary), and pre-defined relation set. • CDL.nl (richer than RDF) Semantic Web • Target of representation: Meta-data extracted from Web contents. • Domain-dependent ontologies (which cause the difficulty of wide inter-boundary usage) • RDF / OWL (description logic is hard for ordinary people to understand) Main body: Institute of Semantic Computing (ISeC) in Japan Int’l Standardization Activity: W3C Common Web Language(CWL)-XG Tim Berners-Lee says that: “the Data Web” is more adequate rather than “the Semantic Web”. (2007)

  4. Incubator Group Activity at W3Cfrom Oct. 2006 to March 2008

  5. 2nd Incubator Group at W3C from May 2008

  6. CDLs and Semantic Web Tim Berners-Lee(2007): The Semantic Web  The Data Web (more adequate)

  7. Another BroaderView of CDL Development • In 1960s – 1970s The foundation on the common representation and manipulation (retrieval) of Data. Database • In 2000s – 2010s The foundation of the common representation and manipulation of Semantic Information. Common Concept Base It is preferable that this is language independent; in other words, Computer Esperanto Language which is understandable by computers.

  8. Functions and Supporting Standards

  9. English Japanese Chinese UNL (Universal Networking Language) CDL(Concept Description Language) Pivot Language could be Computer Esperanto Language Minimal sufficient relations have been chosen to represent the surface-level concept meaning of texts. From Machine Translation Transfer method Pivot method

  10. UNL (Universal Networking Language) • The development started in1997 at the United Nations Univ. (Tokyo). The chief scientist has been Dr. Hiroshi Uchida. It is now continuously developed under the UNDL foundation. • The purpose is to let people in the world exchange and share textural info. on the Web beyond language barrier. • The design is based on the results of Machine Translation (especially, Pivot method) and Electric Dictionaries. • There have been activities wrt English, Japanese, Chinese, Spanish, French, Arabic, etc.

  11. The defining method of one unique sense of a word in UW (Patent of UN Univ.) • Defining category swallow(icl>bird) the bird “One swallow does not make a summer” swallow(icl>action) the action of swallowing “at one swallow” swallow(icl>quantity) the quantity “take a swallow of water” • Defining possible case relations spring(agt>thing,obj>wood)bending or dividing something spring(agt>thing,obj>mine))blasting something spring(agt>thing,obj>person,escaping (from) prison src>prison)) spring(agt>thing,gol>place)jumping up “to spring up” spring(agt>thing,gol>thing)jumping on “to spring on” spring(obj>liquid)gushing out “to spring out”

  12. UW(Universal Words) in UNL Universal Word uw{(equ>Universal Word)} adjective concept{(icl>uw)} uw(aoj>thing{,and>uw,ben>thing,cao>thing,cnt>uw,cob>thing,con>uw,coo>uw,dur>period,man> how,obj>thing,or>uw(aoj>thing),plc>thing,plf>thing,plt>thing,rsn>uw(aoj>thing),rsn>do,icl>adjective concept}) Achaean({icl>uw(}aoj>thing{)}) Afghan({icl>uw(}aoj>thing{)}) African({icl>uw(}aoj>thing{)}) African-American({icl>uw(}aoj>thing{)}) Ainu({icl>uw(}aoj>thing{)}) Alaskan({icl>uw(}aoj>thing{)}) Albanian({icl>uw(}aoj>thing{)}) Aleutian({icl>uw(}aoj>thing{)}) Alexandrian({icl>uw(}aoj>thing{)}) Algerian({icl>uw(}aoj>thing{)}) Altaic({icl>uw(}aoj>thing{)}) American({icl>uw(}aoj>thing{)}) Anglian({icl>uw(}aoj>thing{)}) Anglo-American({icl>uw(}aoj>thing{)}) Anglo-Catholic({icl>uw(}aoj>thing{)}) Anglo-French({icl>uw(}aoj>thing{)}) Anglo-Indian({icl>uw(}aoj>thing{)}) Anglo-Irish({icl>uw(}aoj>thing{)}) Anglo-Norman({icl>uw(}aoj>thing{)}) Arab({icl>uw(}aoj>thing{)}) Arab-Israeli({icl>uw(}aoj>thing{)}) Arabian({icl>uw(}aoj>thing{)}) Arabic({icl>uw(}aoj>thing{)}) 40,000 lexicons are open to public. The full vocabulary includes 200,000 lexicons as of 2007.

  13. CDLs • CDL.core • defines the basic format. • CDL.nl(or Common Web Language) • describes every possible concept expressed in natural language text in any languages. It provides a concept description scheme wrt word, phase, sentence and documents in any languages, based on the CDL concept model. • A basic vocabulary set including relation vocabulary is given. • CDL.jpn, CDL.eng, CDL.chi, CDL.spa, etc. • Articulate Japanese (明晰日本語) • defines a style of Japanese sentence expressions which suppress ambiguity referring to CDL.nl

  14. CDL.core • Basic Description Elements: EntityNode • Elementary Entity • Composite Entity Hyper-Node RelationLink • Attribute-Value pairscan be added to the entity (and the relation). • These are quasi-relations; if the value takes a complex entity, then we can treat it as a relation.

  15. Event#A01 tmp=‘past’ Event#B01 tmp=‘past’ agt agt report#a01 buy#b01 John# computer#b02 ral=‘def’ obj obj gol tim yesterday#b03 Alice# Representation with CDL.nl • <John reported to Alicethat he bought a computer yesterday.> • {#A01 Event tmp=‘past’; {#B01 Event tmp=‘past’; {#b01 buy;} {#b02 computer ral=‘def’;} {#b03 yesterday;} [#b01 agt John] [#b01 obj #b02] [#b01 tim #b03] } {#John John;} {#Alice Alice;} {#a01 report;} [#a01 agt #John] [#a01 gol #Alice] [#a01 obj #B01] }

  16. This set is not sufficient for representing every concept expressed in natural language texts. It cannot be used for every language due to its language (English) dependency. Semantic Role Labels in PropBank The focus is on Predicate-Argument Structure. • Arg0(prototypical agent) • Arg1(prototypical patient) • Arg2 (indirect object/benefactive/instrument/attribute/end state) • Arg3(start point/benefactive/instrument/attribute) • Arg4(end point) • Arg5( ) • TMP(time) • LOC(location) • DIR(direction) • MNR(manner) • PRP(purpose) • CAU(cause) • MOD(modal verb) • NEG(negative marker) • ADV(general-purpose modifier) • DIS (discourse particle and clause) • PRD(secondary predication) These are defined wrt each word sense. Ex) buy:: Arg0: buyer Arg1: thing bought Arg2: seller (bought-from) Arg3: price paid Arg4: benefactive (bought-for)

  17. CDL.nl Relations (1) • RELATION • ElementalReration要素関係 • FunctionalRelation機能的観点 [AgentRelation主体関係] • agt(agent:動作主) • cag(co-agent:並行動作主) • aoj(thing with attribute:属性主) • cao(co-thing with attribute:並行属性主) • ptn(partner:相手) [PatientRelation被行為体関係] • obj(affected thing:対象) • cob(affected co-thing:並行対象) • opl(affected place:場所対象) • ben(beneficiary:受益者) [InstrumentRelation道具関係] • ins(instrument:道具) • mat (material:材料) • met(method or means:方法) [PlaceRelation場所関係] • plc(place:場所) • plf(initial place:起点) • plt(final place:終点) • vip (intermediate place, via place:場所経由) [StateRelation状態関係] • sta(state:状態) • src(source, initial state:始状態) • gol(goal, final state:終状態) • vis(intermediate place or state:経由) [TimeRelation時間関係] • tim(time:時間) • tmf(initial time:始時間) • tmt(final time:終時間) • dur(duration:期間)

  18. CDL.nl Relations (2) [SceneRelation場面関係] • scn(scene:場面) • vic(via scene:場面経由) [CausalRelation原因関係] • con(condition:条件) • pur(purpose or objective:目的) • rsn(reason:理由) [OrderRelation順序関係] • coo(co-occurrence:同起) • seq(sequence:先行) [MannerRelation様態関係] • mal (qualitative manner:質的仕方) • mat(quantitative manner:量的仕方) • bas(basis for expressing a standard:基準) [ModificationRelation限定関係] • mod (modification:限定) • qua (quantity:量的限定) • pos(possessor:所有者) • cnt(content, namely:内容) • nam(name:名前) • per(proportion, rate or distribution:単位) • fmt (range/from to:範囲) • frm (origin:起源点) • to (destination:目的点) [LogicalRelation論理関係] • and(conjunction:連言的) • or (disjunction, alternative:選言的) • not (complement:補集合) [ConceptRelation概念関係] • equ (equivalent:同義) • icl (included / a kind of:上位) • tof (type-of:具体化) • pof(part-of:部分)

  19. CDL.nl Relations (3) • InterThingOrInterEventRelation間事物・間事象関係 [ConnectingRelation接続関係] • cau(causal:順接) • adv(adversative:逆接) • adt (aditive:添加) • cot (contrastive:対比) • par (parallel:同列) • att(attached:補足) [RefferingRelation参照関係] • rfi(reffered by identically:同一参照) • rfp (reffered by partially:部分参照) • rfw (reffered by wholly:包含参照) • AttensionRelation 注目関係 • ent (entry:主概念) • foc (focus:焦点) • qfo (question focus:質問焦点) • tpc (topic:トピック) • com (comment:コメント)

  20. Rough Correspondence between Semantic Relations of PropBand and CDL.nl (1) • Arg0(prototypical agent)agt(agent),cag(co-agent),aoj(thing with attribute),cao (co-thing with attribute) • Arg1(prototypical patient)obj (affected thing),cob(affected co-thing) • Arg2(indirect object/benefactive/instrument/attribute/end state) ---,ben(beneficiary),ins (instrument),mat (material), met (method or means), sta (state),gol (goal, final state) • Arg3(start point/benefactive/instrument/attribute) plf(initial place),ben (beneficiary),ins(instrument), mat(material),met (method or means),sta (state) • Arg4(end point)plt (final place),to (destination) • TMP(time)tim (time),tmf(initial time),tmf (final time),dur(duration) • LOC (location)plc(place) • DIR (direction)to (destination) • MNR(manner)mal(qualitative manner),mat(quantitative manner) • PRP (purpose)pur (purpose or objective) • CAU (cause)rsn(reason) • MOD(modal verb)an attribute in CDL.nl • NEG(negative marker)an attribute in CDL.nl • ADV (general-purpose modifier) mod (modification),qua (quantity),pos(possessor),cnt (content), nam(name),per(proportion, rate or distribution), fmt (range/from to), frm(origine) • DIS(discourse particle and clause)[inter-sentence relation] • PRD (secondary predication)[unique in English]

  21. Rough Correspondence between Semantic Relations of PropBand and CDL.nl (2) Other CDL.nl Relations • [AgentRelation] ptn(partner): an indispensable non-focused initiator of an action Ex) He competes with John. Mary collaborates with him. • [PatientRelation] opl(affected place): a place in focus affected by an event. Ex) He cut the paper in middle in the room. • [PlaceRelation] vip(intermediate place, via place) • [StateRelation] src(source, initial state) vis(instermediate place or state) • [SceneRelation] scn(scene): a scene where an event occurs, or state is true, or a thing exists. A scene is different from plc in that plc is the real place something happens, whereas scn is an abstract or metaphorical world. Ex) He won a prize in a contest. He played in the movie. vic(via scene) CDL.nl’s Relations other than Predicate-Argument Relations • [OrderRelation] coo (co-occurrence) seq(sequence) • [LogicalRelaion] and (conjunction) or (disjunction, atternative) not (complement) • [ConceptRelation] equ (equivalent) icl (included/a kind of) tof (type of) pof (part of) • [ConnectingRelaion] cau (causal) adv (adversative) adt (additive) cot (contrastive) par (parallel) att (attached) • [ReferringRelaion] rfi (referred by identically) rfp (referred by partially) rfw (referred by wholly) • [AttensionRelation] ent (entry) main (main element) in Connexor foc (focus) qfo (question focus) tpc (topic) com (comment)

  22. Rich Attributes in UNL and CDL.nl • Express subjectivity evaluation of the writer/speakerfor the sentence. • Ex.) tense, aspect, mood, etc. • Time with respect to speaker @past @present @future • Writer’s view on aspect of event @begin @complete @continue @custom @end @experience @progress @repeat @state • Writer’s view of reference @generic @def @indef @not @ordinal • Writer’s viewof emphasis, focus and topic @emphasis @entry @qfocus @theme @title @topic • Writer’s attitudes @affirmative @confirmation @exclamation @imperative @interrogative @invitation @politeness @respect @vocative • Writer’s View of reference @generic @def @indef @not @ordinal • Writer’s feeling and judgements @ability @get-benefit @give-benefit @conclusion @consequence @sufficient @grant @grant-not @although @discontented @expectation @wish @insistence @intention @want @will @need @obligation @obligation-not @should @unavoidable @certain @inevitable @may @possible @probable @rare @regret @unreal @admire @blame @contempt @regret @surprised @troublesome • Describing logical characters and properties of concepts @transitive @symmetric @identifiable @disjoint • Modifying attribute on aspect @just @soon @yet @not • Attribute for convention @passive @pl @angle_bracket @brace @double_parenthesis @double_quote @parenthesis @single_quote @square_bracket

  23. Discourse (Inter-sentence) Relations are missing in current CDL.nl Discourse Relations at ISO/TC37/SC4/TDG3 (34 types) • derivation • causes • conditional • inference • purpose • trigger • compromise • conflict • contrast • unconditional • comparison • disjunction • dissimilar • manner • otherwise • proportion • similar • strongComparison • detail • element • example • extraction • general-specific • minimum • part • process-step • Restatement • constraint • supplement • background • content • evaluation

  24. Surface Level Concept Description Deep Semantic Level Concept Description Levels • There are several choices for the deep semantic-level description depending on applications. On the other hand, a certain consensus has been made wrt “Concept Description” which is slightly below the surface level, through decades-long researches on NLP, machine translation and electric dictionaries. • Whereas a complete consensus has not been achieved yet regarding the Concept Description level and its description scheme, it is meaningful to set up a common concept description format as an international standard today.

  25. situation (discourse) temporal and causal relations, etc., and coreference composite concept/event (complex sentence) agent-patient relation, phrasal relation, etc. single event (single sentence) consisting of proposition and modality components composite entity predicate, case components, predicate-modification components, etc. elementary thing/entity corresponding to disambiguated word sense Hierarchical Construction of Concept Representation in CDL.nl

  26. Current Major Issues in CDL.nl • Semi-automatic Conversion from Text. (Text generation from CDL.nl is not so difficult.) • Semantic Retrieval of CDL Data (The design of a CDL Query Language (CDQL) and its processing mechanism) • Killer Application(s) -- information exchange and share beyond language barrier. -- semantic patent document retrieval. 2014/8/15 26

  27. Approaches for Generating CDL Data • Manual Coding & Editing • Even in this case, a graphical input editor is necessary. • Graphical Input & Editing (Hasida’s Semantic Authoring) • Some Manual Tagging to Text, then Conversion into CDL. • Semi-automatic Conversion from Text (1) • Graphical interface for selecting a right one among possible candidates. • Semi-automatic Conversion from Text (2) • Manual disambiguation of the word sense (a pull-down menu selection), then automatic conversion into CDL. • Full Automatic Conversion (ultimate goal) Our current approach An approach taken at UNDL foundation

  28. Semantic Authoring (by K. Hasida):A Graphical Input Approach Coarse Grain Fine Grain

  29. Semantic Parsing • Language processing is going through: • Syntactic parsing • Dependency parsing • Shallow semantic parsing • Semantic Role Labeling • Given a sentence: • The brave soldiers fought with their enemies for their country in the War • Assign predicate-argument roles to sentence elements. (Who did What to Whom, When, Where, Why, How, etc.) • [ARG0The brave soldiers] [rel fought] with [ARG1-with their enemies] for [ARG2-for their country] in [ARGM-loc the War] (in PropBank) • Corpora: PropBank, FrameNet, …

  30. Dependency Parser as a lower basisof our Semantic Analysis Connexor Machines Text Analyser • Syntactic Functions pcomp (prepositional complement) Ex) They are in that red car. phr (verb particle) Ex) She looked up the word in the dictionary. subj (subject) obj (object) comp (subject complement) Ex) John remains a boy. dat (indirect object) Ex) John gave her an apple. oc (object complement) Ex) John called him a fool. corpred (copredicative) Ex) John regards him as foolish. com (comitative) Ex) Drinking with you is nice. voc (vocative) Ex) John, come here! frq (frequency) qua (quantity) meta (clause adverbial) Ex) So far, he has been …. cla (clause initial adverbial) Ex) Under his guidance, they can .... ha (heuristic prepositional phrase attachment) Ex) escape trough ..., fight for ... det (determiner) neg (negator) not attr (attributive nominal) Ex) industrial editor mod (other postmodifer) Ex) … of … ad (attributive adverbial) Ex) So much for modern technology, cc (coordination) Ex) and • Dependency Functions close to the semantic role main (main element) agt(agent) : The agent by-phase in passive sentences. Ex) The dog was chased by the boy. ins(instrument) tmp (time) dur (duration) Ex) ...experience in the past 10 years. man(manner) loc (location) sou (source) Ex) ... move away from the street. goa (goal) Ex) ... shift to a full power. pth (path) Ex) ... travel from Tokyo to Beijing. cnt (contingency (purpose or reason)) Ex) ... unable to say why he was too .... cod (condition) qn (quantifier)

  31. Named Entity Recognition • In Connexor Machinese Text Analyser • + org (organization, company) • + loc(location) • + ind (individual) • + name (name) • + role (occupation, title) This info. is useful as a lexical feature for the semantic parsing.

  32. Conversion of text into CDL.nl through Shallow Semantic Parsing • Original text The records retrieved in answer to queries become information that can be used to make decisions . # • Separate each word with a ID The \ w37 records \ w38 retrieved \ w39 in \ w40 answer \ w41 to \ w42 queries \ w43 become \ w44 information \ w45 that \ w46 can \ w47 be \ w48 used \ w49 to \ w50 make \ w51 decisions \ w52 . \ w53 # • Connexor Analiser’s output det: ( w38 w37 ) subj: ( w44 w38 ) mod: ( w38 w39 ) loc: ( w39 w40 ) pcomp: ( w40 w41 ) mod: ( w41 w42 ) pcomp: ( w42 w43 ) main: ( w36 w44 ) comp: ( w44 w45 ) subj: ( w47 w46 ) v-ch: ( w48 w47 ) v-ch: ( w49 w48 ) mod: ( w45 w49 ) pm: ( w51 w50 ) cnt: ( w49 w51 ) obj: ( w51 w52 ) obj: ( w51 w53 ) # • Hand-coded partial CDL.nl relations obj(w39 w38) gol(w39 w41)pur(w39 w43) aoj(w44 w38) obj(w44 w45) obj(w49 w45)pur(w49 w51) obj(w51 w52)#

  33. Relation/Role Set Comparison • Propbank • describes how a verb relates to its arguments. • FrameNet • describes how to describe words with its arguments in a related common scenario. • Common disadvantages of FrameNet & Propbank role set: • The set covers only predicate-argument roles; they don’t consider any other types of relationships between entities. • CDL.nl • CDL relation set describes how words are correlated and what the meanings of their relationships are. • Advantages of CDL.nl relation set • Each relation in the set is pre-defined along with distinctive information from other similar relations. • It describes not only predicate-argument relations, but also those between each pair of entities there exists a meaningful relationship. Thus it has better coverage. • The set has been chosen so that every concept expressed in texts can be sufficiently encoded. • It is universal, i.e., language independent, and can be applied to any language.

  34. CDL Relation Set • Used to describes shallow semantic structure of text. • Relations have been chosen to be able to sufficiently represent the semantic concepts of texts, and are predefined. • The set of relations contains all relation types which are organized roughly intothree groups: • intra-event relations (22) • agt(agent), aoj(thing with attribute), cag(co-agent), cao(co-thing with attribute), ptn(partner), ….. • inter-entity relations (13) • and(conjunction), con(condition), seq(sequence), ….. • qualification relations (9) • mod(modification), pos(possessor), qua(quantity), …..

  35. nam Mod Obj Aoj And Agt Man Plc Gol Tim Pur Qua #rel 3128 2697 2069 1122 1046 788 446 395 321 289 269 nam Pos Scn Rsn Src Cnt Dur Bas Met Equ Nam Con 46 #rel 86 71 65 63 61 58 49 47 41 41 nam Ben Tmt Pof Frm Or Fmt Tmf Seq To Iof Cag #rel 27 25 24 23 21 20 19 17 12 11 10 nam Icl Via Coo Per Ins Plt Ptn Plf Cao Opl Cob 10 9 8 8 8 7 6 4 2 1 0 #rel Frequencies of CDL Relations • Data sparseness : • The whole number of relation:13487 • Relation type: 44 • Average num per relation: 306.5

  36. Feature spaces Combine information from different language processes: • syntactic analysis • tells the details of the word forms used in the text and the syntactic structures among words. • dependency analysis • A dependency relation specifies anasymmetric relationship between words, where oneword is a dependent of the other word, which iscalled its governor. • lexical construction • Lexical meaning contains two parts of information: word sense and semantic behavior which is all the semantic relationships the word may contain.

  37. Syntax and Dependency Features • Connexor Machinese Text Analyser • based on a functional dependency grammar • Syntax Features • Morphology features • Syntactic features • Dependency Features • Relation type • Dependency Path

  38. Identification of Entity Pair with a Semantic Relation Testing all possible pairs is not efficient. • Step 1: For each input sentence, generate a dependency tree that specifies the syntactic head in the sentence. • Step 2: Find a headNode set from the dependency tree. Each can be a headword of a head entity to govern a relation. We select nodes which have subtree, and omit those which cannot be headNodes by creating a head stoplist. • Step3: For each headNode, check its subtrees to find those that can be tail entities to the headNode. We create a tail stoplist containing those that cannot be root nodes of subtrees of tail entities. Repeat this process. • Step 4: A simple post-processing is applied to correct the boundaries within which the dependency tree does not show correct relationship. A dependency tree generated from Connexor Machinese Analyser Entity pairs [fought, (the brave soldiers)] [fought, (their enemies)] [fought, (their country)] [fought, (the War)] [soldiers, brave] [enemies, their] [country, their] ……..

  39. root Syntactic and Dependency-path features main: loc: pcomp: in fought subj: phr: War ha: det: for soldiers with attr: pcomp: the det: pcomp: brave country The enemies attr: Lexical features from WordNet, VerbNet and UNLKB. attr: their their Features for CDL Relation Recognition Some labels of Connexor Machinese Analyser: ha(prepositional phase attachment),phr(verb particle),pcomp(subject complement)

  40. Experimental Setting • Datasets: • Manual-annotated dataset (1700 sentences documents) • Contains 13487CDL.nl relations (44 types of relations) • We choose to train on top 36 relation types with large number of training instances. • Tools • SVM-light software • Connexor Machinese Text Analyser • Scheme • 10-fold cross validation • One-vs-all

  41. Results Kernel   Precision    Recall     F-value KS 80.10     86.35    83.11 KD85.43      83.57     84.49 KL73.98      82.75     78.12 KS+D87.1986.27     86.73 KS+D+L87.35      88.07     87.71 The table shows the performanceof the SVM using individual kernels incrementally. S: syntactic features D: dependency features L: lexical features

  42. Supporting Input Editor • Selection among possible candidates proposed by a computer analysis. (Like Japanese Input Front-end Processor.) • Graphical verification and editing.

  43. Semantic Search with CDL.nl beyond Keyword-based Search • Baseline: Combination of keywords • (The use of bi-gram, tri-gram,… does not lead to the improvement of performance.) • A pair of dependent words leads to a slight improvement. • Search using natural language queries is preferable in a sense. However, when the search result is unsatisfactory, it is not obvious for a user how to modify the query sentence. • The CDL.nl-based Searchallows a more specified search based on a set of words with named dependency relations, rather than with the simple (non-named) dependency. • It also allows a search with using more specific word concepts such as one modified by attributes and/or larger concept units than a single word. • It allows a search taking account of semantic relevancy, such as similarity between two words, a relation between words in a sentence, etc.

  44. Interface for CDL.nl Data Retrieval (Query) • Query by Natural Language • SQL-like Query Language: CDQL • Graphical Query Interface for CDL.nl data

  45. Approach to Implementing CDQL • 1st Step • It is not easy to implement it from scratch. • Thus we utilize SPARQL which is the query language for RDF data. • SPARQL is backed by Jena RDB. • Next Step • Maybe original implementation for the CDQL processing.

  46. CDL.nl Data Retrieval System

  47. CDL to RDF

  48. CDL.nl Data converted into RDF Graphical Form

  49. CDL Data Retrieval via SPARQL :: a simple case Query (the RealizationLabel of) a person to whom John reported.

  50. CDL.nl Data Retrieval via CDQL (an Extended SPARQL) Query:: What did John report.

More Related