This presentation covers the methods DOE uses to update an ontology and to find relations among its concepts, focusing on statistical methods, a small amount of user feedback, and automation.
Updating methods and relations among concepts in DOE • Research Students: Chakravarthi S Velvadapu, Govind R Maddi, Ratnakar R Krishnama • Faculty Advisors: Dr. James Gil De Lamadrid, Dr. Sadanand Srivastava • CADIP’02 Conference • Sponsored by US Department of Defense
OVERVIEW • The system takes text documents as its input • Performs semantic analysis on these documents • Generates useful ontology • Represents it graphically
GOAL OF THE PROJECT To build an Ontology utilizing • Statistical methods • A small amount of user feedback • Automation
Architecture of DOE Text Document Pre-processing Normalization Latent Semantic Indexing (SVD) Document Ontology Graph Construction GUI Updating Methods
Pre-processing • Read-in text file • Extract meaningful terms • Count their frequencies
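The pre-processing step above can be sketched as follows. This is a minimal illustration, not the DOE implementation: the slides do not say how "meaningful" terms are selected, so the stop-word list `STOP_WORDS` and the function name `count_terms` are assumptions, and the text is passed as a string rather than read from a file.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slides do not define "meaningful" terms.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on"}

def count_terms(text):
    """Return a Counter of lowercase alphabetic terms, with stop words removed."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

freqs = count_terms("The cat sat on the mat. The cat slept.")
```

To process a file, the same function could be applied to the result of `open(path).read()`.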
Normalization • Calculate the weight of each term using $W_{i,k} = \dfrac{\text{frequency}_{i,k}}{\sum_{j=1}^{n_k} \text{frequency}_{j,k}}$
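Normalization (contd) • Calculate the normalized weight using $W_{i,k} = \dfrac{w(i,k)}{\sqrt{\sum_{j=1}^{n_k} w^2(j,k)}}$, where $w(i,k)$ is the unnormalized weight of term i from the previous step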
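The two normalization steps can be sketched with NumPy as below. The sketch assumes that k indexes a column of a terms-by-columns count matrix, that the sums run over all terms in that column, and that every column has at least one non-zero count; the function name `term_weights` is hypothetical.

```python
import numpy as np

def term_weights(freq):
    """freq: terms x columns matrix of raw counts (assumed layout).

    Returns (w, W): the frequency-share weights and their
    unit-length (L2-normalized) version, both computed per column.
    """
    freq = np.asarray(freq, dtype=float)
    w = freq / freq.sum(axis=0, keepdims=True)            # share of each term in its column
    norms = np.sqrt((w ** 2).sum(axis=0, keepdims=True))  # L2 norm of each column
    return w, w / norms

freq = np.array([[2, 0],
                 [1, 3],
                 [1, 1]])
w, W = term_weights(freq)
```

After this step every column of `w` sums to 1 and every column of `W` has unit Euclidean length.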
Build Term-Doc Matrix • Rows of the Term-Doc matrix contain the weights of each term in different concepts • Columns of the Term-Doc matrix contain the weights of different terms in each concept
Latent Semantic Indexing (LSI) • Statistical method representing documents by statistically independent concepts • Based on Singular Value Decomposition (SVD), a technique that decomposes a given matrix A into three components: U, S and V.
SVD • A is formed from LSI as follows: $A = U_s S_s V_s^T$ • $U_s$: derived from U by removing all but s columns • $S_s$: derived from S by removing all but the largest s singular values • $V_s^T$: derived from $V^T$ by removing all but the s corresponding rows
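SVD (contd) • [Figure: decomposition of A (m x n) into U (m x n), S (n x n) and $V^T$ (n x n), with the truncated factors $U_s$, $S_s$ and $V_s^T$ labeled]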
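A rank-s truncation of this kind can be sketched with `numpy.linalg.svd`. The function name `truncated_svd` and the example matrix are illustrative assumptions; with `full_matrices=False` and m ≥ n, NumPy returns factors with the m x n, n and n x n shapes listed on the slide.

```python
import numpy as np

def truncated_svd(A, s):
    """Keep the s largest singular values and the matching columns/rows."""
    U, sing, Vt = np.linalg.svd(A, full_matrices=False)  # singular values come sorted, largest first
    return U[:, :s], np.diag(sing[:s]), Vt[:s, :]

A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [1.0, 1.0, 0.0],
              [0.0, 0.5, 1.0]])
Us, Ss, Vst = truncated_svd(A, 2)
A_s = Us @ Ss @ Vst          # rank-2 approximation of A
```

Keeping all singular values (here s = 3) reproduces A up to rounding; smaller s trades accuracy for a more compact concept space.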
Document Ontology • Build Concept Nodes and Term Nodes using columns and rows of the term matrix (U).
Graph Construction • A bipartite graph is constructed with concept nodes and term nodes • A concept node is connected to all term nodes that belong to it. • A term node is connected to all concept nodes to which it belongs.
Graph Construction (contd) • [Figure: example bipartite graph with term nodes Term 1 to Term 5 and concept nodes Concept 1 and Concept 2]
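A bipartite graph like this can be stored as two adjacency maps. The slides do not state the rule that decides when a term "belongs" to a concept, so the sketch below assumes a simple cut-off on the absolute entries of the term matrix U; the function name `build_bipartite` and the threshold value are hypothetical.

```python
import numpy as np

def build_bipartite(U, terms, threshold=0.5):
    """Connect term i to concept j when |U[i, j]| >= threshold (assumed rule)."""
    concept_to_terms = {}
    term_to_concepts = {t: [] for t in terms}
    for j in range(U.shape[1]):
        name = f"Concept {j + 1}"
        members = [terms[i] for i in range(U.shape[0]) if abs(U[i, j]) >= threshold]
        concept_to_terms[name] = members
        for t in members:
            term_to_concepts[t].append(name)
    return concept_to_terms, term_to_concepts

terms = ["Term 1", "Term 2", "Term 3"]
U = np.array([[0.9, 0.1],
              [0.6, 0.7],
              [0.0, 0.8]])
c2t, t2c = build_bipartite(U, terms)
```

Keeping both maps makes the two lookups described for the GUI (terms of a concept, concepts of a term) direct dictionary accesses.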
Graphical User Interface (GUI)
GUI (contd) • GUI consists of • Concepts list • Terms list • Display for bipartite graph • Display for relations among concepts • Display for list of files in ontology
GUI • To view terms related to a concept, user selects that concept from concepts list • To view concepts related to a term, user selects that term from terms list
GUI – File Operations • New • Open • Save • saveAs • Close • Exit
GUI – Ontology Updates • Add • Delete • ChangeSVDThreshold • changeConcThreshold • ChangeDuplicateThreshold • foldInDoc • SVDUpdate • defaultBuild
GUI – Ontology Modifications • Rename: renames a selected concept • DelTerm: deletes a selected term • Undo: discards the last modification and returns to the previous state
Adding new documents • Investigated less expensive methods for adding new documents: • Fold-In • SVD update
Fold-In • A method to add new document(s) to an existing ontology • Uses the existing data in document addition process • Less expensive process than the regular build method
Fold-In (contd) • Two-step method • Step 1: Fold-In document vector • Compute the new document vector (V) using $\hat{d} = d^T U_k S_k^{-1}$, where d is the document vector to be added • Append $\hat{d}$ as a new column of $V_k^T$ (equivalently, a new row of $V_k$)
Folding-In document vector • [Figure: $A_k$ (m x (n+p)) with factors $U_k$ (m x k), $S_k$ (k x k) and $V_k^T$ (k x (n+p))]
Fold-In (contd) • Step 2: Fold-In term vector • Compute the new term vector (U) using $\hat{t} = t V_k S_k^{-1}$, where t is the term vector to be added • Append $\hat{t}$ to the rows of $U_k$
Folding-In term vector • [Figure: $A_k$ ((m+q) x n) with factors $U_k$ ((m+q) x k), $S_k$ (k x k) and $V_k^T$ (k x n)]
Fold-In (contd) • Using new document vector ( Vk ) and new term vector ( Uk ) • Rebuild concept nodes and term nodes • Reconstruct bipartite graph • Update GUI
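Both fold-in steps can be sketched in a few lines of NumPy. The function names and the example matrix are assumptions made for illustration. As a sanity check, the sketch folds in a document and a term that are already part of the matrix, which should reproduce their existing rows in $V_k$ and $U_k$.

```python
import numpy as np

def fold_in_document(d, Uk, Sk):
    """d: length-m term-weight vector of the new document. Returns d_hat (length k)."""
    return d @ Uk @ np.linalg.inv(Sk)

def fold_in_term(t, Vk, Sk):
    """t: length-n weight vector of the new term across documents. Returns t_hat (length k)."""
    return t @ Vk @ np.linalg.inv(Sk)

A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [1.0, 1.0, 0.0],
              [0.0, 0.5, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

d_hat = fold_in_document(A[:, 0], Uk, Sk)   # fold in the first existing document
t_hat = fold_in_term(A[0, :], Vk, Sk)       # fold in the first existing term
Vk_new = np.vstack([Vk, d_hat])             # new document becomes a new row of Vk
Uk_new = np.vstack([Uk, t_hat])             # new term becomes a new row of Uk
```

Fold-in only appends rows; it leaves $U_k$, $S_k$ and the existing rows of $V_k$ unchanged, which is why it is cheap.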
SVD Update • A method to add new document(s) to an existing ontology • Uses the existing data in document addition process • Less expensive process than the regular build method
SVD Update (contd) • Three-step method • Step 1: SVD Updating Documents • Let $D = [A_k \mid D_p]$, where $A_k$ is the original term-document matrix and $D_p$ is the new document vector to be added • $\mathrm{SVD}(D) = U_D \Sigma_D V_D^T$
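SVD Update (contd) • The SVD of D can also be computed as $U_D = U_k U_{UD}$ and $V_D = \begin{bmatrix} V_k & 0 \\ 0 & I_p \end{bmatrix} V_{UD}$, where $U_{UD}$ and $V_{UD}$ are the left and right singular vectors of the matrix $\mathit{UD} = [\Sigma_k \mid U_k^T D_p]$ (not to be confused with $U_D$) and $\Sigma_k$ is the diagonal matrix of the k retained singular values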
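The document-update step can be sketched as follows. The function name, the variable `F` for the small k x (k+p) matrix, and the example data are assumptions for illustration; singular values are passed as a 1-D array `sk` rather than a diagonal matrix.

```python
import numpy as np

def svd_update_documents(Uk, sk, Vk, Dp):
    """Update a rank-k SVD (Uk, sk, Vk) with p new document columns Dp (m x p)."""
    k, n, p = sk.size, Vk.shape[0], Dp.shape[1]
    F = np.hstack([np.diag(sk), Uk.T @ Dp])               # [Sigma_k | Uk^T Dp], k x (k+p)
    UF, sF, VFt = np.linalg.svd(F, full_matrices=False)
    block = np.zeros((n + p, k + p))                      # [[Vk, 0], [0, I_p]]
    block[:n, :k] = Vk
    block[n:, k:] = np.eye(p)
    return Uk @ UF, sF, block @ VFt.T

A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [1.0, 1.0, 0.0],
              [0.0, 0.5, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T
Ak = Uk @ np.diag(sk) @ Vk.T

Dp = np.array([[0.2], [0.4], [0.1], [0.3]])               # one new document
UD, sD, VD = svd_update_documents(Uk, sk, Vk, Dp)
```

Only the small matrix `F` is decomposed, which is where the saving over a full rebuild comes from. The product of the updated factors equals $[A_k \mid U_k U_k^T D_p]$, so the new document is represented by its projection onto the span of $U_k$.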
SVD Update (contd) • Step 2: SVD Updating Terms • Let $T = \begin{bmatrix} A_k \\ T_q \end{bmatrix}$, where $A_k$ is the original term-document matrix and $T_q$ is the new term vector to be added • $\mathrm{SVD}(T) = U_T \Sigma_T V_T^T$
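SVD Update (contd) • SVD(T) can also be computed as $U_T = \begin{bmatrix} U_k & 0 \\ 0 & I_q \end{bmatrix} U_{UT}$ and $V_T = V_k V_{UT}$, where $U_{UT}$ and $V_{UT}$ are the left and right singular vectors of the matrix $\mathit{UT} = \begin{bmatrix} \Sigma_k \\ T_q V_k \end{bmatrix}$ (not to be confused with $U_T$) and $\Sigma_k$ is the diagonal matrix of the k retained singular values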
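The term-update step mirrors the document update, with the new rows stacked below the existing matrix. The sketch below is illustrative only: the function name, the variable `H` for the small (k+q) x k matrix, and the example data are assumptions.

```python
import numpy as np

def svd_update_terms(Uk, sk, Vk, Tq):
    """Update a rank-k SVD (Uk, sk, Vk) with q new term rows Tq (q x n)."""
    k, m, q = sk.size, Uk.shape[0], Tq.shape[0]
    H = np.vstack([np.diag(sk), Tq @ Vk])                 # [Sigma_k ; Tq Vk], (k+q) x k
    UH, sH, VHt = np.linalg.svd(H, full_matrices=False)
    block = np.zeros((m + q, k + q))                      # [[Uk, 0], [0, I_q]]
    block[:m, :k] = Uk
    block[m:, k:] = np.eye(q)
    return block @ UH, sH, Vk @ VHt.T

A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [1.0, 1.0, 0.0],
              [0.0, 0.5, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T
Ak = Uk @ np.diag(sk) @ Vk.T

Tq = np.array([[0.3, 0.1, 0.6]])                          # one new term
UT, sT, VT = svd_update_terms(Uk, sk, Vk, Tq)
```

The product of the updated factors equals $A_k$ stacked on top of $T_q V_k V_k^T$, so the new term is represented by its projection onto the span of $V_k$.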
SVD Update (contd) • Step 3: Correction of term weights • Let $W = A_k + X_i Y_i^T$, where $X_i$ is an m x i matrix whose rows are either zero rows or rows of the i-th order identity matrix $I_i$, and $Y_i$ is an n x i matrix representing the differences between the old and new weights for each of the i terms • $\mathrm{SVD}(W) = U_W \Sigma_W V_W^T$
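SVD Update (contd) • SVD(W) can also be computed as $U_W = U_k U_Q$ and $V_W = V_k V_Q$, where $U_Q$ and $V_Q$ are the left and right singular vectors of $Q = \Sigma_k + U_k^T X_i Y_i^T V_k$ and $\Sigma_k$ is the diagonal matrix of the k retained singular values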
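The weight-correction step can be sketched in the same style. The function name and the example data are assumptions: here a single term (row index 1) has its weights changed, so `X` has one column that selects that row and `Y` holds the weight differences across the three documents.

```python
import numpy as np

def svd_correct_weights(Uk, sk, Vk, X, Y):
    """Update a rank-k SVD for W = A_k + X Y^T using only a k x k decomposition."""
    Q = np.diag(sk) + Uk.T @ X @ Y.T @ Vk                 # k x k
    UQ, sQ, VQt = np.linalg.svd(Q)
    return Uk @ UQ, sQ, Vk @ VQt.T, Q

A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [1.0, 1.0, 0.0],
              [0.0, 0.5, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T

X = np.array([[0.0], [1.0], [0.0], [0.0]])                # selects term row 1 (m x 1)
Y = np.array([[0.1], [0.0], [-0.2]])                      # new minus old weights (n x 1)
UW, sW, VW, Q = svd_correct_weights(Uk, sk, Vk, X, Y)
```

As with the other two steps, the product of the updated factors equals $U_k Q V_k^T$, which is the correction projected onto the existing rank-k subspaces.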
SVD Update (contd) • Using new document vector ( Vw ) and new term vector ( Uw ) • Rebuild concept nodes and term nodes • Reconstruct bipartite graph • Update GUI
Time Complexity • Update methods in descending order of time complexity: • Recomputing the SVD (default build) • SVD Update • Fold-In
Relations among concepts • Significance of V: • Rows of V represent documents • Columns of V represent concepts • [Figure: concept vector (V)]
Types of relations • Sub concepts • Sub-super concepts • Disjoint concepts • Overlapping concepts • Parallel concepts • Antagonistic concepts
Sub concepts • If the percentage of overlap is: • < threshold value: Disjoint • > (100 - threshold value): Sub-super • otherwise: Overlapping
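The threshold rule above can be sketched as a small function. The slides do not define how the percentage of overlap is measured, so this sketch assumes it is the share of the smaller concept's terms that also appear in the other concept, and that both term sets are non-empty; the function name and the default threshold of 20 are hypothetical.

```python
def classify_relation(terms_a, terms_b, threshold=20.0):
    """Classify two concepts by the percentage overlap of their term sets."""
    a, b = set(terms_a), set(terms_b)
    # Assumed measure: shared terms as a percentage of the smaller concept.
    overlap = 100.0 * len(a & b) / min(len(a), len(b))
    if overlap < threshold:
        return "disjoint"
    if overlap > 100.0 - threshold:
        return "sub-super"
    return "overlapping"

r1 = classify_relation({"x", "y"}, {"x", "y", "z"})   # all of the smaller set is shared
r2 = classify_relation({"a", "b"}, {"c", "d"})        # nothing shared
r3 = classify_relation({"a", "b"}, {"b", "c"})        # half shared
```

With this measure a concept whose terms are almost all contained in another is reported as sub-super, matching the "> 100 - threshold" branch.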