Semantic Space Creation and Associative Search Methods for Document Databases of International Relations. The 7th IASTED International Conference on Internet and Multimedia Systems and Applications (IMSA 2003), Honolulu, Hawaii, USA, August 13–15, 2003.
Semantic Space Creation and Associative Search Methods for Document Databases of International Relations
The 7th IASTED International Conference on Internet and Multimedia Systems and Applications (IMSA 2003), Honolulu, Hawaii, USA, August 13–15, 2003
Shiori Sasaki*, Yasushi Kiyoki**, Taizo Yakushiji***
* Graduate School of Media and Governance, Keio University
** Faculty of Environmental Information, Keio University
*** Faculty of Law, Keio University
Outline of our study
• Study purposes
• Background of the research and our basic idea
• Outline of the Semantic Associative Search
• Outline of the space creation method
• Experiments
  Experiments 1-1, 1-2, 1-3: verification of the precision of the created space
  Experiment 2: examination of the feasibility of the created space
• Experimental results and analyses
• Conclusion
Study Purposes
1) To realize a semantic space for a search system that can be applied to the document analysis of International Relations (a methodological contribution to the study of International Relations)
2) To realize a mechanism that measures the semantic relations between technical terms and general words by integrating the space constructed from source A (a lexicon) with the space constructed from source B (a dictionary) (a contribution to database engineering)
Background of our study: Discourse analysis of International Relations (IR)
• Documents (discourse) databases on the WWW: official announcements of governments, policy statements, parliamentary papers, press briefings, activity reports of NGOs, announcements in the form of informal talks by politicians...
• Traditional methods for quantitative analysis (Content Analysis, Cognitive Map, etc.) give the researcher results WITHOUT semantic relations.
• Our basic idea: quantitative document analysis WITH semantic relations between words and documents.
Background of our study: Traditional methods of document analysis
● Content Analysis [1][2]: a method to analyze the contents of documents. With this method, we can measure the frequency of appearance of a word or an encoded sentence in document groups and calculate the level of correlation among words, codes and documents.
● Cognitive Map [3][4]: a method to analyze the logical routes of cognition of policy makers. In this method, we regard the logical structure of a document as a concept network in the mind of the speaker and calculate the main logical route and the highlighted concept.
Weakness: these methods cannot measure the semantic relations between words.
Weakness: these methods cannot measure the semantic relations between documents.
EX) Does ‘engagement’ mean ‘economic contract’ or ‘military involvement’? Does ‘development’ mean ‘economic development’ or ‘development of weapons’?
[1] Ole R. Holsti, “Content Analysis”, 1968.
[2] Takashi Inokuchi, Numerical Analysis of International Relations, 1970.
[3] Robert Axelrod ed., The Structure of Decision: The Cognitive Maps of Political Elites, 1976.
[4] Christer Jonsson ed., Cognitive Dynamics and International Politics, 1982.
Our Basic Idea
(1) Basic concept: Integration of the semantic spaces of special knowledge and general knowledge
(2) Implementation: Creation of an integrated semantic space from two matrices (a specific field matrix and a general knowledge matrix)
[Figure: the specific field matrix (technical terms) and the general knowledge matrix (definitions of words), each with words w as rows and features f as columns, are combined into an integrated matrix that represents integrated knowledge.]
Our basic idea
• Knowledge of a specific field (International Relations): the tacit knowledge of specialists, taken from a lexicon, gives the IR basic matrix (specific field matrix).
• General knowledge: a general dictionary gives the general words matrix (general knowledge matrix).
[Figure: both matrices have words w as rows and features f as columns.]
Outline of the Semantic Associative Search method based on the Mathematical Model of Meaning (MMM)
Step 1: Creation of the metadata space (MDS)
Step 2: Mapping information resources into the image space MDS
Step 3: Selecting the semantic subspace
Source: Kiyoki, Y., Kitagawa, T. and Hayama, T.: A metadatabase system for semantic image search by a mathematical model of meaning, ACM SIGMOD Record, vol. 23, no. 4, pp. 34-41, 1994.
Semantic Associative Search method based on the Mathematical Model of Meaning (MMM)
[Figure:
Step 1: a data matrix M for creating the semantic space (basic data as rows, feature words f_1, ..., f_n as columns) and the eigenvalue decomposition of M^T M.
Step 2: mapping of the media data matrix (items described by feature words) into the space.
Step 3: selection of a subspace according to the context words given by searchers (e.g. {C_1, C_2}).]
Source: Kiyoki, Y., Kitagawa, T. and Hayama, T.: A metadatabase system for semantic image search by a mathematical model of meaning, ACM SIGMOD Record, vol. 23, no. 4, pp. 34-41, 1994.
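Steps 1 and 2 can be illustrated with a minimal sketch. It assumes that the space is spanned by the eigenvectors of M^T M with non-zero eigenvalues and that an item is mapped by projecting its feature-word vector onto those axes. The function names, the tolerance and the toy matrix are hypothetical and are not taken from the slides.

```python
import numpy as np

def create_semantic_space(M, tol=1e-10):
    """Step 1 (sketch): eigen-decompose M^T M and keep the axes whose
    eigenvalues are non-zero. Returns (axes, eigenvalues), axes as columns."""
    corr = M.T @ M                              # symmetric n x n matrix
    eigvals, eigvecs = np.linalg.eigh(corr)     # eigh is for symmetric input
    keep = eigvals > tol                        # drop (numerically) zero eigenvalues
    kept_vals, kept_vecs = eigvals[keep], eigvecs[:, keep]
    order = np.argsort(kept_vals)[::-1]         # largest eigenvalue first
    return kept_vecs[:, order], kept_vals[order]

def map_to_space(feature_vector, axes):
    """Step 2 (sketch): coordinates of a feature-word vector on the axes."""
    return axes.T @ feature_vector

# Toy data matrix (hypothetical): 3 basic words x 3 feature words.
# The third row equals row 1 minus row 2, so the rank is 2.
M = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, -1.0]])

axes, eigvals = create_semantic_space(M)
item = np.array([1.0, 0.0, 0.0])                # an item tagged with feature word 1
coords = map_to_space(item, axes)
print(axes.shape, coords)
```

Because the toy matrix has rank 2, only two axes survive, which mirrors how a data matrix with dependent rows yields a space of lower dimension than its number of feature words.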
Context Recognition Mechanism in MMM
(1) The context, represented as a set of keywords, is given by a user.
(2) A subspace is selected according to the given context (context recognition).
(3) Each media data item is mapped onto the subspace, and the norm of the resulting vector (A1’) is calculated as the correlation value between the media data item and the context.
[Figure: data A1 is projected onto the semantic subspace selected for context1 = (sad, silent); the norm of the projected vector A1’ is the correlation of data A1 for the given context.]
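The three steps above can be sketched as follows. This is a simplified illustration, not the MMM definition: it assumes that the context vectors are summed in the space, that an axis is selected when its normalized absolute component exceeds a threshold, and that the correlation is the unweighted norm on the selected axes. The threshold value, the function names and the toy vectors are hypothetical.

```python
import numpy as np

def select_subspace(context_vectors, axes, threshold=0.5):
    """Return a boolean mask of the axes that the context weights strongly."""
    center = sum(axes.T @ c for c in context_vectors)   # summed context in the space
    max_abs = np.max(np.abs(center))
    if max_abs == 0:
        return np.zeros(axes.shape[1], dtype=bool)      # empty context: no axis
    return np.abs(center) / max_abs > threshold

def correlation(item_vector, axes, mask):
    """Norm of the item's projection onto the selected axes."""
    projected = (axes.T @ item_vector)[mask]
    return float(np.linalg.norm(projected))

# Toy space (hypothetical): three orthonormal axes given directly.
axes = np.eye(3)
context = [np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.2, 0.0])]
mask = select_subspace(context, axes)

corr_a = correlation(np.array([0.6, 0.8, 0.0]), axes, mask)
corr_b = correlation(np.array([0.0, 1.0, 0.0]), axes, mask)
print(mask, corr_a, corr_b)
```

Items can then be ranked by their correlation values, so the same item may rank differently under a different context.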
Outline of the Space Creation method: Process to create a data matrix for space creation
Step 1. Creation of an IR basic matrix by using a lexicon of the terms [1] (①)
Step 2. Creation of a general words matrix by using a general dictionary [2] (②)
Step 3. Integration of the IR basic matrix and the general words matrix
  3-1. Defining the technical terms by the general words (③)
  3-2. Relating the general words to the technical terms (④)
[1] Evans, Graham and Newnham, Jeffrey: Dictionary of International Relations, Penguin Books, 1998.
[2] Longman Dictionary of Contemporary English, Longman, 1987.
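One way to picture Step 3 is as a block matrix. The sketch below assumes a particular layout that the slides do not spell out: the IR basic matrix (①) and the definitions of technical terms by general words (③) form the upper blocks, and the relations of general words to technical terms (④) and the general words matrix (②) form the lower blocks. The function name and the toy sizes are hypothetical.

```python
import numpy as np

def integrate(ir_basic, definitions, relations, general):
    """Assemble an integrated matrix from four blocks (assumed layout):
         [ (1) IR basic     | (3) definitions   ]
         [ (4) relations    | (2) general words ]
    Rows: technical terms, then general words.
    Columns: IR feature words, then general feature words."""
    return np.block([[ir_basic, definitions],
                     [relations, general]])

# Toy blocks (hypothetical sizes): 2 technical terms, 3 general words.
ir_basic = np.array([[1, 0],
                     [0, 1]])
definitions = np.array([[1, 1, 0],       # term 1 defined by general words 1 and 2
                        [0, 0, 1]])
relations = np.array([[1, 0],            # general word 1 related to term 1
                      [0, 0],
                      [0, -1]])
general = np.eye(3, dtype=int)

integrated = integrate(ir_basic, definitions, relations, general)
print(integrated.shape)
```

With this layout, a general word and a technical term share columns, which is what lets a query in one vocabulary reach entries in the other.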
Extraction of vertical elements and horizontal elements of the IR basic matrix
1) Extraction of items in the index of the IR lexicon as ‘technical basic terms’ (vertical elements)
2) Extraction of related terms in the explanation of items as ‘related feature words’ (horizontal elements)
Index items (excerpt): ABC Weapons, ABM, Accidental war, Accommodation, ACP, Act of war, Action-reaction, Actor, Adjudication, Administered territory, Afghanistan, Agent-structure, Aggression, Aid, AIDS, Air power, Alien, Alliance, ...
Example entry: Accommodation: Term much beloved of crisis management theorists and practitioners of negotiational diplomacy. It refers to the process whereby actors in conflict agree to recognize some of the others’ claims while not sacrificing their basic interests. The source of conflict is ...
Example: IR basic matrix and general words matrix
[Figure: two matrices whose rows are basic terms (ABC Weapons, Accidental war, Accommodation, Actor, Arms Control, ..., Zero-Sum, Zionism, Zone of peace) and whose columns are feature words (ability, able, about, arms, ..., control, ..., force, power, remove, ..., threat, ..., war, weapon, ...).]
Entry values: 1: positive relation, -1: negative relation, 0: no relation
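A matrix with these entry values could be assembled as in the following sketch. The term and feature-word names come from the slide, but the individual relation values are made up for illustration and are not the authors' data. The function name is hypothetical.

```python
import numpy as np

def build_relation_matrix(terms, features, relations):
    """Build a terms x features matrix with entries 1 (positive relation),
    -1 (negative relation) and 0 (no relation)."""
    column = {feature: j for j, feature in enumerate(features)}
    matrix = np.zeros((len(terms), len(features)), dtype=int)
    for i, term in enumerate(terms):
        for feature, value in relations.get(term, {}).items():
            if value not in (1, -1):
                raise ValueError(f"unexpected relation value: {value}")
            matrix[i, column[feature]] = value
    return matrix

terms = ["Accidental war", "Arms Control"]
features = ["arms", "control", "threat", "war", "weapon"]
# Illustrative values only (assumed, not from the paper).
relations = {
    "Accidental war": {"war": 1, "threat": 1},
    "Arms Control": {"arms": 1, "control": 1, "weapon": 1, "war": -1},
}

ir_matrix = build_relation_matrix(terms, features, relations)
print(ir_matrix)
```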
Realization of Semantic Space for International Relations
• IR basic matrix (①) → IR space: 712 basic words, 712 feature words, 710-dimensional vectors
• General words matrix (②) → General words space: 2115 basic words, 2115 feature words
• Integrated matrix (①②③④) → Integrated space: 2115+712 basic words, 2861 feature words, 2846-dimensional vectors
Experiments to verify the precision and feasibility of the created space
Experiment 1: Comparison of correlations of words between the IR space and the integrated space
  Experiment 1-1: results in the integrated space
  Experiment 1-2: comparison of the IR space with the integrated space
  Experiment 1-3: comparison of the IR space with the integrated space
Experiment 2: Document retrieval in the integrated space
Experiment 1-1
Purpose of the experiment: To verify that IR technical terms can be retrieved by using general words as keywords.
Evaluation method: We selected the general words ‘trade,’ ‘environment’ and ‘human’ as query keywords and checked the top 10 results for each in the integrated space.
Experimental results: The IR technical terms related to the general-word keywords are ranked in the top 10.
Table: The retrieval results in the integrated space
Experiment 1-2
Purpose of the experiment: To verify that the integrated space realizes high-quality retrieval for IR terms.
Evaluation method: We selected the IR term ‘arms control’ as the query keyword and retrieved correlated words in the IR space and in the integrated space. Then we compared the top 30 retrieval results.
Experiment 1-2
Experimental results: For the query ‘arms control,’ words such as ‘weapon,’ ‘INF treaty,’ ‘START II,’ ‘START I,’ ‘weapons of mass destruction’ and ‘cruise missile,’ which are semantically close to ‘arms control,’ are included in the top 30 in the integrated space. In addition, words such as ‘tactical nuclear weapons’ and ‘CTBT,’ which are also semantically close to ‘arms control,’ are ranked high in the integrated space.
Experiment 1-3
Purpose of the experiment: To verify that the integrated space keeps the retrieval quality without breaking the fundamental structure of the IR space.
Evaluation method:
• We selected 15 IR technical terms, such as ‘weapons of mass destruction,’ ‘economic liberalism’ and ‘non-tariff barriers,’ as keywords from each issue area of IR: security, political economy, international organization, human rights, global environmental problems, theoretical perspectives and concepts.
• For these keywords, we retrieved words in the IR space as in Experiment 1-2 and fixed the top 10 words as the correct answers.
• In the integrated space, we measured the ratio of the correct answers ranked in the top 10 and in the top 20.
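The measure in the last bullet can be sketched as a simple overlap ratio. It assumes that the ratio is the number of correct answers found in the top k of the integrated-space ranking divided by the number of correct answers. The function name and the toy rankings are hypothetical.

```python
def correct_answer_ratio(correct_answers, ranking, k):
    """Fraction of the correct answers that appear in the top k of a ranking."""
    correct = set(correct_answers)
    if not correct:
        raise ValueError("correct_answers must not be empty")
    hits = sum(1 for word in ranking[:k] if word in correct)
    return hits / len(correct)

# Toy example (hypothetical words): the reference top 4 from one space,
# compared against a ranking of 6 words from another space.
reference_top = ["a", "b", "c", "d"]
integrated_ranking = ["a", "x", "b", "y", "c", "d"]

ratio_top4 = correct_answer_ratio(reference_top, integrated_ranking, k=4)
ratio_top6 = correct_answer_ratio(reference_top, integrated_ranking, k=6)
print(ratio_top4, ratio_top6)
```

In the experiment itself the reference list would be the top 10 words from the IR space, with k set to 10 and 20.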
Experiment 1-3
Experimental results: For more than half of the queries, the rate of correct answers included in the top 10 retrieval results in the integrated space was 70%. For all queries, 80% to 100% of the correct answers were included in the top 20.
Experiment 2: document retrieval in the integrated space (from keywords to documents with metadata)
Purpose of the experiment: To examine the feasibility of the Semantic Associative Search using the created space.
Evaluation method:
• We collected 40 documents concerning IR from the WWW as retrieval candidates and prepared three kinds of metadata sets for the documents: 1) only IR technical terms, 2) only general words, 3) both IR terms and general words.
• We also classified the query keywords into three kinds: 1) only IR terms, 2) only general words, 3) both IR terms and general words.
Table: Combinations of metadata and keywords for queries
Experiment 2: document retrieval in the integrated space (from keywords to documents with metadata)
Evaluation method: We selected the IR terms ‘conflict’ and ‘crisis’ and the general words ‘attack’ and ‘crash’ as query keywords and fixed eight documents about security as the correct answers in advance. We assigned them the IDs doc5, doc10, doc15, doc20, doc25, doc30, doc35 and doc40.
Examples of metadata: pattern 3: both IR terms and general words
Experiment 2: document retrieval in the integrated space
Experimental results
• The case in which documents with metadata of both IR terms and general words were retrieved by keywords of both IR terms and general words shows the best retrieval quality (IX).
• Even when the documents were given only IR-term metadata (III) or only general-word metadata (VI), they achieved a relatively high relevance ratio when retrieved by keyword sets of both IR terms and general words.
• In the case in which documents with only IR-term metadata were retrieved by general-word keywords (II) and the case in which documents with only general-word metadata were retrieved by IR-term keywords (IV), the relevance ratio was not so high, but at least 5 documents were ranked in the top 10.
Experimental results and analyses
• Experiment 1: The results show that the characterization, including both defining and relating, is made more appropriately in the integrated space than in the IR space alone. They also show that not only the IR terms but also the general words that reflect knowledge of the IR field can be retrieved by both general words and IR technical terms.
• Experiment 2: The results show that, in the integrated space, documents described by IR terms can be retrieved by using general words, and documents described by general words can be retrieved by using IR terms.
Conclusion
• We have presented a method for creating a semantic retrieval space for the document analysis of International Relations and have examined the precision and feasibility of the newly created space.
• With this method, we can realize Semantic Associative Search and analysis of documents dynamically according to our concern or viewpoint, and we can compute the semantic relations between words and documents as correlation values.
• This space creation method can be applied to other specific fields, provided that a lexicon of the field and a general dictionary such as Longman exist.
Future work
• We will extend this integrated semantic space with a mechanism for treating time-series document data, so as to analyze changes in the cognition, attitudes and values of various actors (countries, regions, organizations, governments, ethnic groups and individuals).
• We will apply this semantic space to data mining of web contents.