CQL / ZING-SRW
MIYAZAWA Akira, National Institute of Informatics
Environment
• Used in the query element of SRW
• <SRW:query>CQL query</SRW:query>
• Written inside XML
• Belongs to the ISO 8777 (JIS X0803) family of query languages, but is a new language
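Because the query travels inside XML, characters such as < in a CQL relation have to be escaped. A minimal sketch, assuming the element is built as a plain string; the helper name srw_query_element is hypothetical, and the declaration of the SRW namespace prefix is left out:

```python
from xml.sax.saxutils import escape


def srw_query_element(cql: str) -> str:
    """Wrap a CQL query in an <SRW:query> element (hypothetical helper).

    escape() replaces &, < and > with XML entities, so a relation such as
    <= cannot be mistaken for markup.
    """
    return f"<SRW:query>{escape(cql)}</SRW:query>"


print(srw_query_element("temperature <= 100"))
# prints: <SRW:query>temperature &lt;= 100</SRW:query>
```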
Grammar (1)
  cql-query     ::= cql-query boolean search-clause | search-clause
  boolean       ::= "and" | "or" | "not" | prox
  search-clause ::= "(" cql-query ")" | [ index-name relation ] term
Grammar (2)
  index-name     ::= [ index-prefix "." ] index-base-name
  relation       ::= base-relation { "/" qualifier }
  base-relation  ::= order-relation | "=" | "exact" | "all" | "any" | "scr"
  qualifier      ::= "relevant" | "fuzzy" | "stem" | "phonetic"
  order-relation ::= "<" | ">" | "<=" | ">=" | "<>"
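The relation rule can be checked mechanically: a base relation followed by zero or more "/"-separated qualifiers. The sketch below assumes the relation arrives as a single string such as =/stem; the function name parse_relation is hypothetical.

```python
# Terminals copied from the grammar above.
BASE_RELATIONS = {"<", ">", "<=", ">=", "<>", "=", "exact", "all", "any", "scr"}
QUALIFIERS = {"relevant", "fuzzy", "stem", "phonetic"}


def parse_relation(text):
    """Split a relation into (base-relation, [qualifier, ...])."""
    base, *qualifiers = text.split("/")
    if base not in BASE_RELATIONS:
        raise ValueError(f"unknown base relation: {base!r}")
    for qualifier in qualifiers:
        if qualifier not in QUALIFIERS:
            raise ValueError(f"unknown qualifier: {qualifier!r}")
    return base, qualifiers


print(parse_relation("=/stem"))        # ('=', ['stem'])
print(parse_relation("any/relevant"))  # ('any', ['relevant'])
print(parse_relation("<="))            # ('<=', [])
```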
Grammar (3) (prox)
  prox            ::= "prox" [ "/" prox-qualifiers ]
  prox-qualifiers ::= [ prox-relation ] "/" [ distance ] "/" [ unit ] "/" ordering
                    | [ prox-relation ] "/" [ distance ] "/" unit
                    | [ prox-relation ] "/" distance
                    | prox-relation
  unit            ::= "word" | "sentence" | "paragraph" | "element"
  prox-relation   ::= order-relation | "="
  distance        ::= non-negative-integer
  ordering        ::= "ordered" | "unordered"
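Read this way, a prox operator is the keyword followed by up to four "/"-separated fields, any of which may be empty. A minimal sketch, assuming the operator is one whitespace-free token; parse_prox is a hypothetical name, omitted fields are returned as None rather than filled with defaults, and the check is slightly more lenient than the grammar (it accepts trailing empty fields):

```python
import re

PROX_RELATIONS = {"<", ">", "<=", ">=", "<>", "="}
UNITS = {"word", "sentence", "paragraph", "element"}
ORDERINGS = {"ordered", "unordered"}


def parse_prox(operator):
    """Parse e.g. 'prox/<=/3/word/ordered' into its four optional fields."""
    head, *fields = operator.split("/")
    if head != "prox" or len(fields) > 4:
        raise ValueError(f"not a prox operator: {operator!r}")
    fields += [""] * (4 - len(fields))  # pad the omitted trailing fields
    relation, distance, unit, ordering = fields
    if relation and relation not in PROX_RELATIONS:
        raise ValueError(f"bad prox-relation: {relation!r}")
    if distance and not re.fullmatch(r"[0-9]+", distance):
        raise ValueError(f"bad distance: {distance!r}")
    if unit and unit not in UNITS:
        raise ValueError(f"bad unit: {unit!r}")
    if ordering and ordering not in ORDERINGS:
        raise ValueError(f"bad ordering: {ordering!r}")
    return {
        "relation": relation or None,
        "distance": int(distance) if distance else None,
        "unit": unit or None,
        "ordering": ordering or None,
    }


print(parse_prox("prox/<=/3/word/ordered"))
```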
Grammar (4) (basic components)
  index-prefix    ::= identifier
  index-base-name ::= identifier
  identifier      ::= string
  term            ::= string | '"' string '"'
  string          ::= a character string (must be double-quoted if it contains a space or any of / = < > ( ) ")
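The quoting rule for strings is easy to apply when queries are generated by a program. The sketch below uses a hypothetical helper, quote_term. How a double quote inside a quoted term is escaped is not stated on the slide; a backslash is assumed here.

```python
# Characters that force double quoting, taken from the string rule above.
SPECIAL = set(' /=<>()"')


def quote_term(term):
    """Return the term as it should appear in a CQL query."""
    if any(ch in SPECIAL for ch in term):
        # Assumption: an embedded double quote is escaped with a backslash.
        return '"' + term.replace('"', '\\"') + '"'
    return term


print(quote_term("猫"))     # 猫
print(quote_term("犬 猫"))  # "犬 猫"
print(quote_term("a=b"))    # "a=b"
```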
Term and search-clause
(in the examples, 猫 means "cat", 犬 means "dog" and 吾輩 means "I")
term
  猫
  "犬 猫"
search-clause
  subject = 猫
  dc.title = 吾輩
  srw.resultSetName = 001
  temperature <= 100
Search-clause (2)
  title = "犬 猫"      (the words 犬 and 猫, in this order)
  title all "犬 猫"    (both of the words 犬 and 猫)
  title any "犬 猫"    (either of the words 犬 and 猫)
  title exact "犬 猫"  (the character string "犬 猫")
  title scr "犬 猫"    (server choice relation)
(The bare term "犬 猫" is equivalent to srw.serverChoice scr "犬 猫".)
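The differences between these relations can be made concrete with a toy matcher over whitespace-separated words. This is only a sketch under assumptions that the slide does not settle: "=" is treated as an adjacent phrase in the given order, exact compares the term with the whole field, and scr is left out because its meaning is chosen by the server. The function name matches is hypothetical.

```python
def matches(relation, term, field):
    """Toy evaluation of a base relation against one field value."""
    words = term.split()
    field_words = field.split()
    if relation == "=":
        # Assumption: the words must appear adjacent, in the given order.
        n = len(words)
        return any(field_words[i:i + n] == words
                   for i in range(len(field_words) - n + 1))
    if relation == "all":
        return all(w in field_words for w in words)
    if relation == "any":
        return any(w in field_words for w in words)
    if relation == "exact":
        # Assumption: the term is compared with the entire field.
        return field == term
    raise ValueError(f"unsupported relation: {relation!r}")


print(matches("=", "犬 猫", "犬 猫 物語"))    # True
print(matches("=", "犬 猫", "猫 と 犬"))      # False
print(matches("all", "犬 猫", "猫 と 犬"))    # True
print(matches("any", "犬 猫", "猫 だけ"))     # True
print(matches("exact", "犬 猫", "犬 猫 物語"))  # False
```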
Qualifiers (of Relations)
  title =/stem "these completed dinosaurs"
    (matches "The Complete Dinosaur")
  subject any/relevant "fish frog"
    (matches "tuna", "coelacanth", "toad", "amphibian", etc.)
  author all/fuzzy "kernaghan richie"
    (matches Kernighan & Ritchie's book)
  subject =/phonetic rose
    (matches "rows", "rhos", "roes")
The matching algorithm is implementation dependent.
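Pattern matching
  dinosaur*  (* matches any sequence of characters)
  ??動物     (each ? matches a single character; 動物 means "animal")
  ^動物      (a word that begins with 動物)
  動物^      (a word that ends with 動物)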
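One way to experiment with these patterns is to translate them into regular expressions and test them against single words. The sketch below follows the word-level reading of ^ given on the slide and assumes that an unanchored pattern must match the whole word; pattern_to_regex is a hypothetical helper, not part of CQL or SRW.

```python
import re


def pattern_to_regex(pattern):
    """Translate a CQL-style pattern into a regex for use with fullmatch()."""
    starts = pattern.startswith("^")
    ends = pattern.endswith("^") and pattern != "^"
    body = pattern[1 if starts else 0:len(pattern) - (1 if ends else 0)]
    parts = []
    for ch in body:
        if ch == "*":
            parts.append(".*")   # any sequence of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    core = "".join(parts)
    if starts and not ends:
        core = core + ".*"       # word begins with the body
    elif ends and not starts:
        core = ".*" + core       # word ends with the body
    return re.compile(core)


print(bool(pattern_to_regex("dinosaur*").fullmatch("dinosaurs")))  # True
print(bool(pattern_to_regex("??動物").fullmatch("野生動物")))       # True
print(bool(pattern_to_regex("^動物").fullmatch("動物園")))          # True
print(bool(pattern_to_regex("動物^").fullmatch("動物園")))          # False
```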
Boolean
  犬 or 猫
  author = 夏目漱石 and 猫
  title = 猫 not subject = *動物
  犬 or 猫 and 動物
    is the same as (犬 or 猫) and 動物 (evaluated left to right, no precedence)
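The left-to-right rule falls out naturally from a small recursive-descent parser for the cql-query and search-clause rules. The following is a sketch under simplifying assumptions: tokens are separated by whitespace or parentheses, relations are recognised by their base relation, and quoted terms contain no escaped quotes. The names parse, TOKEN, BOOLEANS and RELATIONS are hypothetical.

```python
import re

TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')
BOOLEANS = {"and", "or", "not"}
RELATIONS = {"=", "<", ">", "<=", ">=", "<>", "exact", "all", "any", "scr"}


def parse(query):
    """Parse a query into nested tuples: (boolean, left, right)."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        return tok

    def is_boolean(tok):
        return tok in BOOLEANS or (tok is not None and tok.split("/")[0] == "prox")

    def search_clause():
        if peek() == "(":
            take()
            node = cql_query()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return node
        first = take()
        nxt = peek()
        if nxt is not None and nxt.split("/")[0] in RELATIONS:
            relation = take()
            return (first, relation, take())  # index-name relation term
        return first                          # bare term

    def cql_query():
        node = search_clause()
        while is_boolean(peek()):
            op = take()
            node = (op, node, search_clause())  # fold to the left
        return node

    tree = cql_query()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return tree


print(parse("犬 or 猫 and 動物"))
# ('and', ('or', '犬', '猫'), '動物')
```

Parsing (犬 or 猫) and 動物 with this sketch yields the same tree, while 犬 or (猫 and 動物) yields a different one.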
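Prox (proximity operator)
  犬 prox/<=/3/word/ordered 猫
    (srw.serverChoice scr 犬 holds, and srw.serverChoice scr 猫 holds within 3 words after it)
• prox is treated as a boolean (it appears in the same position as and / or)
• Besides "within" (<=), the relations =, <, >, >= and <> are available
• Besides word, the units sentence, paragraph and element are available
• Besides ordered, unordered (the default) is available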
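A toy evaluation of prox over a list of words illustrates how the relation, distance and ordering interact. This is a sketch only: it assumes the unit is word, that the text is already split into words, and that distance is the difference between word positions (adjacent words are at distance 1), none of which is fixed by the slide. prox_match is a hypothetical name.

```python
import operator

COMPARE = {
    "<": operator.lt, ">": operator.gt, "<=": operator.le,
    ">=": operator.ge, "=": operator.eq, "<>": operator.ne,
}


def prox_match(words, left, right, relation, distance, ordered=False):
    """Return True if some occurrence pair of left/right satisfies the relation."""
    compare = COMPARE[relation]
    left_positions = [i for i, w in enumerate(words) if w == left]
    right_positions = [i for i, w in enumerate(words) if w == right]
    for i in left_positions:
        for j in right_positions:
            gap = j - i
            if gap == 0:
                continue              # same occurrence, not a pair
            if ordered and gap < 0:
                continue              # right term must come after left term
            if compare(abs(gap), distance):
                return True
    return False


words = "犬 は 猫 と 遊ぶ".split()
print(prox_match(words, "犬", "猫", "<=", 3, ordered=True))  # True
print(prox_match(words, "猫", "犬", "<=", 3, ordered=True))  # False
print(prox_match(words, "猫", "犬", "<=", 3))                # True
```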
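Problems (multilingual)
• The grammar assumes that a word is determined by syntax (delimited by spaces or other special characters), but the text being searched is not necessarily written that way (Japanese, Chinese, Thai, etc.), which is likely to cause problems.
• Shouldn't character be available as a unit for prox?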
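The segmentation issue is easy to reproduce: splitting on whitespace yields usable words for English but a single "word" for Japanese. The second half of the sketch shows what a character unit could measure. It is purely illustrative; char_distance is a hypothetical helper, and measuring from the start of each term is an assumption.

```python
english = "The Complete Dinosaur"
japanese = "吾輩は猫である"

print(english.split())   # ['The', 'Complete', 'Dinosaur']
print(japanese.split())  # ['吾輩は猫である'], one undivided "word"


def char_distance(text, left, right):
    """Distance in characters between the first occurrences of two terms.

    Returns None if either term is absent.
    """
    i, j = text.find(left), text.find(right)
    if i < 0 or j < 0:
        return None
    return abs(j - i)


print(char_distance(japanese, "吾輩", "猫"))  # 3
```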