This system performs semantic analysis on text documents, generates an ontology, and represents it graphically. It utilizes statistical methods, user feedback, and automation.
Document Ontology Extractor (DOE) • Research Team: Govind R Maddi, Jun Zhao, Chakravarthi S Velvadapu • Faculty: Dr. Sadanand Srivastava, Dr. James Gil De Lamadrid • Joint Project of: University of Maryland, Baltimore County; Bowie State University • Sponsored by: Department of Defense
OVERVIEW • The system takes text documents as its input • Performs semantic analysis on these documents • Generates a useful ontology • Represents the ontology graphically
GOAL To build an Ontology utilizing • Statistical methods • A small amount of user feedback • Automation
Architecture of DOE • Text Document → Pre-processing → Normalization → Latent Semantic Indexing (SVD) → Document Ontology → Graph Construction → GUI
INPUT Text documents
Pre-processing • Read-in text file • Extract meaningful terms • Count their frequencies
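The three pre-processing steps above can be sketched as follows. This is a minimal illustration, not DOE's actual code: the slides do not say how "meaningful" terms are chosen, so the stop-word list, the minimum term length, the regular expression, and the function names are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slides do not specify DOE's filter.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"}


def count_terms(text):
    """Extract candidate terms from raw text and count their frequencies."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Assumption: a term is "meaningful" if it is not a stop word
    # and is longer than two characters.
    terms = [t for t in tokens if t not in STOP_WORDS and len(t) > 2]
    return Counter(terms)


def count_terms_in_file(path):
    """Read a text file and return its term frequencies."""
    with open(path, encoding="utf-8") as handle:
        return count_terms(handle.read())


if __name__ == "__main__":
    sample = "The ontology is built from the documents. Documents contain terms."
    print(count_terms(sample).most_common(3))
```

The resulting counter is the per-document frequency table that the normalization step consumes.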
Normalization • Calculate the weight of each term using $w_{i,k} = \dfrac{\mathrm{frequency}_{i,k}}{\sum_{j=1}^{n_k} \mathrm{frequency}_{j,k}}$
Normalization (contd) • Calculate the normalized weight using $W_{i,k} = \dfrac{w_{i,k}}{\sqrt{\sum_{j=1}^{n_k} w_{j,k}^{2}}}$, where $w_{i,k}$ is the term weight from the previous slide
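The two normalization steps can be sketched together. This sketch assumes that i indexes terms, k indexes documents, and the sums run over the terms of a single document; the function names are illustrative and do not come from DOE.

```python
import math


def term_weights(frequencies):
    """Weight of each term: its frequency divided by the document's total frequency."""
    total = sum(frequencies.values())
    if total == 0:
        return {term: 0.0 for term in frequencies}
    return {term: count / total for term, count in frequencies.items()}


def normalize(weights):
    """Divide each weight by the Euclidean length of the document's weight vector."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)
    return {term: w / norm for term, w in weights.items()}


if __name__ == "__main__":
    freqs = {"ontology": 3, "graph": 4}  # toy frequencies for one document
    weights = term_weights(freqs)
    print(weights)
    print(normalize(weights))
```

After normalization every document vector has unit length, so long and short documents contribute on the same scale to the term-document matrix.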
Latent Semantic Indexing(LSI) • Statistical method representing documents by statistically independent concepts • Based on Singular Value Decomposition (SVD)
Singular Value Decomposition (SVD) • A technique that decomposes a given matrix into three components – U, S and V.
SVD (contd) • An m x n term-document matrix A of rank r can be expressed as the product $A = U S V^T$ • U is the m x r term matrix • S is the r x r diagonal matrix • $V^T$ is the r x n document matrix
SVD (contd) • The diagonal of S contains the singular values of A in descending order.
SVD (contd) • LSI forms a rank-s approximation of A as follows: $A_s = U_s S_s V_s^T$ • $U_s$: derived from U by removing all but the first s columns • $S_s$: derived from S by removing all but the largest s singular values • $V_s^T$: derived from $V^T$ by removing all but the corresponding s rows
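The truncation step can be sketched in plain Python. The sketch assumes the three factors have already been produced by some SVD routine and that the singular values are sorted in descending order; the toy factors and the function names `truncate` and `reconstruct` are illustrative only.

```python
def truncate(U, singular_values, Vt, s):
    """Keep the first s columns of U, the s largest singular values, and the first s rows of Vt."""
    if not 1 <= s <= len(singular_values):
        raise ValueError("s must be between 1 and the number of singular values")
    U_s = [row[:s] for row in U]
    S_s = singular_values[:s]
    Vt_s = Vt[:s]
    return U_s, S_s, Vt_s


def reconstruct(U_s, S_s, Vt_s):
    """Multiply U_s * diag(S_s) * Vt_s to obtain the reduced matrix."""
    rows, cols = len(U_s), len(Vt_s[0])
    return [
        [sum(U_s[i][c] * S_s[c] * Vt_s[c][j] for c in range(len(S_s)))
         for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    # Toy factors chosen by hand so the arithmetic is easy to follow.
    U = [[1, 0], [0, 1]]
    S = [3, 1]
    Vt = [[1, 0], [0, 1]]
    print(reconstruct(*truncate(U, S, Vt, 1)))
```

Keeping only the largest singular values discards the weakest directions of the matrix, which is what lets LSI describe documents with a small number of concepts.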
SVD (contd) • [Figure: dimensions of the decomposition, with A (m x n), U (m x r), S (r x r), VT (r x n), and the reduced factors US, SS, VsT]
Document Ontology • Build Concept Nodes and Term Nodes using the document matrix (V) and term matrix (U).
Building concept nodes from term matrix(U) • A concept node contains information about • Concept name • Terms that belong to that concept • Respective weights of terms in that concept
Building concept nodes from term matrix(U) (contd) • Naming convention: • Generated automatically • A hyphenated string of the five most frequent terms in that concept
Building concept nodes from term matrix(U) (contd) • A concept node represents a document • Each column in U corresponds to a concept node
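A concept node can be sketched as a small record built from one column of U. Several details here are assumptions rather than DOE's documented behavior: every term with a non-zero entry is treated as belonging to the concept, the "most frequent" terms used for the name are taken to be those with the largest weights in the column, and the toy matrix and function name are hypothetical.

```python
def build_concept_nodes(U, term_names, name_size=5):
    """Build one concept node per column of the term matrix U (rows are terms)."""
    nodes = []
    n_concepts = len(U[0]) if U else 0
    for c in range(n_concepts):
        # Assumption: a term belongs to the concept if its weight is non-zero.
        weights = {term_names[i]: U[i][c] for i in range(len(U)) if U[i][c] != 0}
        ranked = sorted(weights, key=lambda term: weights[term], reverse=True)
        nodes.append({
            "name": "-".join(ranked[:name_size]),  # hyphenated top terms
            "terms": ranked,
            "weights": weights,
        })
    return nodes


if __name__ == "__main__":
    terms = ["ontology", "graph", "matrix"]
    U = [[0.9, 0.0], [0.4, 0.1], [0.0, 0.8]]  # toy 3-term, 2-concept matrix
    for node in build_concept_nodes(U, terms):
        print(node["name"], node["weights"])
```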
Building term nodes from term matrix(U) • A term node contains information about • Term name • Concepts to which it belongs • Its respective weight in each concept
Building term nodes from term matrix(U) (contd) • Naming convention: • Generated automatically • Simply named using the term name
Building term nodes from term matrix(U) (contd) • A term node represents a term • Each row in U corresponds to a term node
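A term node can be sketched the same way, reading U row by row instead of column by column. As an assumption, a term is linked to every concept in which its weight is non-zero; the concept names and the toy matrix below are placeholders, not DOE output.

```python
def build_term_nodes(U, term_names, concept_names):
    """Build one term node per row of the term matrix U (columns are concepts)."""
    nodes = []
    for i, row in enumerate(U):
        # Assumption: a term belongs to a concept if its weight there is non-zero.
        weights = {concept_names[c]: w for c, w in enumerate(row) if w != 0}
        nodes.append({
            "name": term_names[i],       # term nodes are named after the term
            "concepts": list(weights),
            "weights": weights,
        })
    return nodes


if __name__ == "__main__":
    terms = ["ontology", "graph", "matrix"]
    concepts = ["concept-1", "concept-2"]  # hypothetical concept names
    U = [[0.9, 0.0], [0.4, 0.1], [0.0, 0.8]]
    for node in build_term_nodes(U, terms, concepts):
        print(node["name"], node["weights"])
```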
Graph Construction • A bipartite graph is constructed with concept nodes and term nodes • A concept node is connected to all term nodes that belong to it. • A term node is connected to all concept nodes to which it belongs.
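The bipartite structure described above can be sketched with two adjacency maps, one per side of the graph. The membership data and the function name are hypothetical; the slides do not show how DOE stores its graph.

```python
from collections import defaultdict


def build_bipartite_graph(concept_terms):
    """Return (concept -> terms, term -> concepts) adjacency maps.

    Edges only ever join a concept to a term, never two nodes of the
    same kind, which is what makes the graph bipartite.
    """
    concept_to_terms = {c: set(terms) for c, terms in concept_terms.items()}
    term_to_concepts = defaultdict(set)
    for concept, terms in concept_to_terms.items():
        for term in terms:
            term_to_concepts[term].add(concept)
    return concept_to_terms, dict(term_to_concepts)


if __name__ == "__main__":
    # Hypothetical memberships, for illustration only.
    membership = {
        "Concept 1": ["Term 1", "Term 2"],
        "Concept 2": ["Term 2", "Term 3"],
    }
    c2t, t2c = build_bipartite_graph(membership)
    print(sorted(t2c["Term 2"]))
```

Storing both directions makes the two GUI lookups symmetric: selecting a concept reads one map and selecting a term reads the other.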
Graph Construction (contd) • [Figure: example bipartite graph with term nodes Term 1 through Term 5 and concept nodes Concept 1 and Concept 2]
Graphical User Interface (GUI)
GUI (contd) • GUI consists of • Concepts list • Terms list • Display for bipartite graph • Display for list of files in ontology
GUI • To view terms related to a concept, user selects that concept from concepts list • To view concepts related to a term, user selects that term from terms list
GUI (contd) • To view only terms related to a specific concept: • Select that concept from concepts list • Select checkbox “Display Selected Ones Only” • Result: • GUI displays ONLY relations between selected terms and concepts
GUI (contd) • To view only concepts related to a term: • Select that term from terms list • Select checkbox “Display Selected Ones Only” • Result: • GUI displays ONLY relations between selected terms and concepts
GUI (contd) • To highlight relationship between a term and a concept: • Select that term or concept from terms or concepts list • Click on line connecting term and concept
GUI – File Operations • New • Open • Save • saveAs • Close • Exit
GUI – Ontology Updates • Add • Delete • changeSVDThreshold • changeConcThreshold • foldInDoc • defaultBuild
GUI – Ontology Updates • Add: • Click on Add • Select the file to be added from the file chooser popup menu • Choose whether to build now or not • If yes, the document is added and displayed • If no, the GUI remains unchanged
GUI – Ontology Updates • Delete: • Click on Delete • Select the file to be deleted from the file chooser popup menu • Choose whether to build now or not • If yes, the document is deleted and the display is updated • If no, the GUI remains unchanged
GUI – Ontology Updates • changeSVDThreshold: • SVDThreshold controls how many of the largest singular values (s) are selected from S • Default value is 70%, i.e. only the singular values higher than 70% of the highest singular value are selected • The user can change this default value
GUI – Ontology Updates • changeConcThreshold: • Controls the number of terms related to a concept based upon term weight • Default value is 70% i.e. only the terms with weights higher than 70% of the highest term weight are selected • User can change this default value
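Both thresholds are described by the same rule: keep only the values that exceed a fraction of the largest one. A minimal sketch of that rule follows, assuming a strict "higher than" comparison as the slides word it; the function name and the sample numbers are made up for illustration.

```python
def select_above_fraction(values, threshold=0.70):
    """Keep the entries whose value is higher than threshold * (largest value).

    `values` maps a label (a singular-value index or a term name) to a number.
    """
    if not values:
        return {}
    cutoff = threshold * max(values.values())
    return {label: v for label, v in values.items() if v > cutoff}


if __name__ == "__main__":
    # SVDThreshold: decide how many singular values to keep (the s of LSI).
    singular_values = {0: 10.0, 1: 8.0, 2: 6.0, 3: 3.0}
    print(len(select_above_fraction(singular_values)))

    # ConcThreshold: decide which terms stay attached to one concept.
    term_weights = {"ontology": 0.9, "graph": 0.7, "matrix": 0.2}
    print(sorted(select_above_fraction(term_weights)))
```

Raising the threshold keeps fewer singular values or terms and gives a sparser graph; lowering it keeps more.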