
Finding Associations in Collections of Text

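An overview of FACT, a system for knowledge discovery from text (KDT) that helps users access and understand large quantities of multimodal information by extracting implicit, previously unknown information. The slides cover the system architecture, its sources of information, the GUI, the association-finding task, the query language (constraints and BNF grammar), query execution, presentation of results, and an application of FACT to newswire data. FACT is presented as an improvement over plain database queries for discovery tasks.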

mildredw

Presentation Transcript


  1. Finding Associations in Collections of Text 99419-511 김유환

  2. Introduction • There is a need to develop tools that help users access and understand large quantities of multimodal information • Knowledge discovery: the nontrivial extraction of implicit, previously unknown, and potentially useful information from data • KDT (Knowledge Discovery from Text)

  3. The FACT System Architecture • Three sources of information • Knowledge Sources • Background Knowledge • unary and binary predicates over the keywords labeling the documents • Synonym dictionary (thesaurus) • GUI • Text Collections • Must either already be labeled with a set of keywords • Or must be fed through a text categorization system that augments documents with such keywords

  4. Associations • FACT focuses on the task of finding associations in collections of text. • r = {t1, …, tn} : collection of documents • R = {I1, …, Im} : set of keywords • t(A) = 1 : A is one of the keywords labeling t • (X) : the set of all documents ti that are labeled with (at least) all the keywords in X • X is called a σ-covering if |(X)| ≥ σ • W ⇒ B : an association over r • Of all documents that are labeled with the keywords in W, at least a proportion γ of them are also labeled with the keywords in B
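
The definitions on this slide can be sketched in a few lines of Python. This is a minimal illustration, not FACT's implementation: it writes σ (`sigma`) for the covering threshold and γ for the required proportion, and the function names `cover`, `is_sigma_covering`, and `confidence` as well as the toy documents are assumptions made for the example.

```python
from typing import FrozenSet, Iterable, List

Document = FrozenSet[str]  # a document is reduced to its set of keyword labels

# Toy collection r; the labels are invented for illustration.
DOCS: List[Document] = [
    frozenset({"Canada", "Iran", "crude"}),
    frozenset({"Canada", "Iran", "crude"}),
    frozenset({"Canada", "Iran", "ship"}),
    frozenset({"Canada", "wheat"}),
]


def cover(docs: List[Document], keywords: Iterable[str]) -> List[Document]:
    """Return (X): every document labeled with at least all keywords in X."""
    wanted = frozenset(keywords)
    return [d for d in docs if wanted <= d]


def is_sigma_covering(docs: List[Document], keywords: Iterable[str], sigma: int) -> bool:
    """X is a sigma-covering if |(X)| >= sigma."""
    return len(cover(docs, keywords)) >= sigma


def confidence(docs: List[Document], w: Iterable[str], b: Iterable[str]) -> float:
    """Proportion of the documents labeled with W that are also labeled with B."""
    lhs = cover(docs, w)
    if not lhs:
        return 0.0  # no document is labeled with W, so the proportion is undefined
    return len(cover(lhs, b)) / len(lhs)
```

With this toy collection, `confidence(DOCS, {"Canada", "Iran"}, {"crude"})` is 2/3, so the association {Canada, Iran} ⇒ {crude} would hold for a required proportion of 0.5 but not for 0.75.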

  5. The Query Language • An association-discovery query specifies: • What types of keywords are desired on the left-hand and right-hand sides of any found association • Predicates that any found association must satisfy • unary predicates • binary predicates: define relationships between keywords • Constraints on the size of the various components of the association • The language is defined by a BNF grammar

  6. The Query Language (2) Find : (5/0.5) c1:country, c2:country ⇒ t:topic Where : c1 ∈ G7, c2 ∈ {Arab League}, t ∉ ExportCommodities(c1) • At least half of the time, whenever a G7 country and an Arab League country label a document, the document is also labeled by some topic that is not an export commodity of the G7 country, and this occurs at least 5 times in the collection
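
To make the example query concrete, here is a hedged Python sketch that evaluates it by brute force over a toy collection. Everything in it is an assumption for illustration: the function names, the tiny background-knowledge sets (a made-up subset, not the CIA World FactBook data), and the reading of "(5/0.5)" as "at least 5 supporting documents, at least 0.5 of the documents matching the left-hand side". It does not reflect how FACT parses or executes queries.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple

# Toy background knowledge; invented values for illustration only.
G7 = {"Canada", "Japan"}
ARAB_LEAGUE = {"Egypt", "Kuwait"}
TOPICS = {"wheat", "crude", "cars"}
EXPORT_COMMODITIES = {"Canada": {"wheat"}, "Japan": {"cars"}}

# Toy labeled collection: 6 + 6 + 2 documents.
DOCS: List[FrozenSet[str]] = (
    [frozenset({"Canada", "Egypt", "crude"})] * 6
    + [frozenset({"Canada", "Egypt", "wheat"})] * 6
    + [frozenset({"Japan", "Kuwait", "crude"})] * 2
)


def count(docs: List[FrozenSet[str]], keywords: Set[str]) -> int:
    """Number of documents labeled with at least all the given keywords."""
    return sum(1 for d in docs if keywords <= d)


def run_query(docs, min_support: int = 5, min_confidence: float = 0.5
              ) -> List[Tuple[str, str, str, int, float]]:
    """Find c1:country, c2:country => t:topic under the Where-clause constraints."""
    results = []
    for c1, c2, t in product(sorted(G7), sorted(ARAB_LEAGUE), sorted(TOPICS)):
        if t in EXPORT_COMMODITIES.get(c1, set()):
            continue  # constraint: t must not be an export commodity of c1
        lhs = count(docs, {c1, c2})        # documents labeled with both countries
        both = count(docs, {c1, c2, t})    # ... that are also labeled with the topic
        if both >= min_support and both / lhs >= min_confidence:
            results.append((c1, c2, t, both, both / lhs))
    return results
```

On the toy data, `run_query(DOCS)` returns the single association (Canada, Egypt ⇒ crude), supported by 6 of the 12 documents that mention both countries. The equally frequent topic "wheat" is excluded only because of the background-knowledge constraint, which is the kind of filtering the Where clause expresses.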

  7. Query Execution • Prior knowledge • Every subset of a σ-cover is itself a σ-cover. • The set of candidate σ-covers is built incrementally, starting from singleton σ-covers and adding elements to a set so long as the set stays a σ-cover • Finding associations in the presence of constraints
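
The incremental construction described on this slide can be sketched as a level-wise search. The code below is an illustrative reading of the slide, assuming σ (`sigma`) is the minimum number of documents a keyword set must cover; the names `support` and `sigma_covers` and the toy data are hypothetical, and the handling of query constraints is left out.

```python
from typing import FrozenSet, List, Set

# Toy labeled collection; the keyword labels are invented for illustration.
DOCS: List[FrozenSet[str]] = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b"}),
]


def support(docs: List[FrozenSet[str]], keywords: FrozenSet[str]) -> int:
    """Number of documents labeled with at least all the given keywords."""
    return sum(1 for d in docs if keywords <= d)


def sigma_covers(docs: List[FrozenSet[str]], sigma: int) -> Set[FrozenSet[str]]:
    """Grow covers one keyword at a time, starting from singleton covers."""
    vocabulary = sorted(set().union(*docs)) if docs else []
    level = {
        frozenset([k]) for k in vocabulary
        if support(docs, frozenset([k])) >= sigma
    }
    found = set(level)
    while level:
        # Extend every cover of the current size by one more keyword.
        candidates = {s | {k} for s in level for k in vocabulary if k not in s}
        # Since every subset of a cover is a cover, a candidate can be
        # discarded without counting if any one-smaller subset is not a cover.
        level = {
            c for c in candidates
            if all(c - {k} in found for k in c) and support(docs, c) >= sigma
        }
        found |= level
    return found
```

For the toy collection with `sigma=2`, the search keeps {a}, {b}, {c}, {a, b}, and {a, c}. The set {b, c} covers only one document, so {a, b, c} is pruned without ever counting its documents.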

  8. Presentation of Associations • Provide a browsing tool that helps the user easily focus on the subset of results that are potentially relevant

  9. Applying FACT to Newswire Data • Reuters data • Background Knowledge : CIA World FactBook • Ran a series of queries using FACT and compared the CPU time and the number of associations found for each query • Results • The specification of background-knowledge constraints actually provides information that is exploited by the discovery algorithm, speeding up the association-discovery process

  10. Final Remarks • Better suited to discovery tasks than a plain database query • Presents the user with an easy-to-use graphical interface in which discovery tasks can be specified
