OVERVIEW

Document Ontology Extractor (DOE). Research Team: Govind R Maddi, Jun Zhao, Chakravarthi S Velvadapu. Faculty: Dr. Sadanand Srivastava, Dr. James Gil De Lamadrid. Joint Project of University of Maryland, Baltimore County and Bowie State University. Sponsored by Department Of Defense.

Presentation Transcript


  1. Document Ontology Extractor (DOE) • Research Team: Govind R Maddi, Jun Zhao, Chakravarthi S Velvadapu • Faculty: Dr. Sadanand Srivastava, Dr. James Gil De Lamadrid • Joint Project of University of Maryland, Baltimore County and Bowie State University • Sponsored by Department Of Defense

  2. OVERVIEW • The system takes text documents as its input • Performs semantic analysis on these documents • Generates a useful ontology • Represents it graphically

  3. GOAL To build an Ontology utilizing • Statistical methods • A small amount of user feedback • Automation

  4. Architecture of DOE • Text Document → Pre-processing → Normalization → Latent Semantic Indexing (SVD) → Document Ontology → Graph Construction → GUI

  5. INPUT Text documents

  6. Pre-processing • Read-in text file • Extract meaningful terms • Count their frequencies
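The pre-processing steps above could be sketched roughly as follows. This is only an illustration: the slides do not say how "meaningful" terms are chosen, so the stop-word list `STOP_WORDS`, the minimum term length, and the function name `term_frequencies` are all assumptions, and a string stands in for the text file.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slides do not specify the real filter.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for"}

def term_frequencies(text):
    """Return a Counter mapping each retained term to its frequency."""
    # Lower-case the text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Assumption: a term is "meaningful" if it is not a stop word
    # and is longer than two characters.
    terms = [t for t in tokens if t not in STOP_WORDS and len(t) > 2]
    return Counter(terms)

# A string stands in for the contents of a text file read from disk.
sample = "The ontology of the document: an ontology is a graph of terms."
freqs = term_frequencies(sample)
```

Here `freqs` holds one count per retained term, which is the raw material for the term-document matrix used in the later steps.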

  7. Normalization • Calculate weight of each term using $w_{i,k} = \dfrac{\text{frequency}_{i,k}}{\sum_{j=1}^{n_k} \text{frequency}_{j,k}}$

  8. Normalization (contd) • Calculate normalized weight using $W_{i,k} = \dfrac{w_{i,k}}{\sqrt{\sum_{j=1}^{n_k} w_{j,k}^2}}$, where $w_{i,k}$ is the weight before normalization
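The two normalization steps can be sketched with NumPy as below. The sketch assumes that rows are terms, columns are documents, and the sums over j run over the terms of one document; the slides do not state this explicitly, and the name `normalize_weights` is made up for illustration.

```python
import numpy as np

def normalize_weights(freq):
    """freq: (terms x documents) array of raw counts.

    Returns a matrix whose columns have Euclidean length 1.
    """
    freq = np.asarray(freq, dtype=float)
    # Step 1: relative frequency of each term within its document.
    col_sums = freq.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0      # guard against empty documents
    w = freq / col_sums
    # Step 2: divide by the square root of the sum of squared weights.
    norms = np.sqrt((w ** 2).sum(axis=0, keepdims=True))
    norms[norms == 0] = 1.0
    return w / norms

# Toy counts: 3 terms (rows) in 2 documents (columns).
counts = np.array([[2, 0],
                   [1, 3],
                   [1, 1]])
W = normalize_weights(counts)
```

After this step every document column has unit length, so long documents do not dominate the decomposition simply because they contain more words.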

  9. Latent Semantic Indexing(LSI) • Statistical method representing documents by statistically independent concepts • Based on Singular Value Decomposition (SVD)

  10. Singular Value Decomposition (SVD) • A technique that decomposes a given matrix into three components – U, S and V.

  11. SVD (contd) • An m x n term-document matrix A, of rank r, can be expressed as the product: $A = U S V^T$ • U is an m x r term matrix • S is an r x r diagonal matrix • $V^T$ is an r x n document matrix

  12. SVD (contd) • The diagonal of S contains the singular values of A in descending order.
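A small NumPy sketch of the decomposition described on these slides follows. The matrix `A` is made-up toy data. Note that `numpy.linalg.svd` with `full_matrices=False` returns min(m, n) singular values rather than exactly rank-r factors; the toy matrix is chosen so the two coincide.

```python
import numpy as np

# Toy 4 x 3 term-document matrix (4 terms, 3 documents); values are made up.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [1.0, 1.0, 0.0],
              [0.0, 0.5, 1.0]])

# Reduced SVD: U is 4 x 3, sing holds 3 singular values, Vt is 3 x 3.
U, sing, Vt = np.linalg.svd(A, full_matrices=False)

# Place the singular values on the diagonal of S.
S = np.diag(sing)

# The product of the three factors reproduces A up to rounding error.
A_rebuilt = U @ S @ Vt
```

The array `sing` is already sorted from largest to smallest, which matches the ordering of the diagonal of S described above.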

  13. SVD (contd) • In LSI, a reduced approximation of A is formed as follows: $A_s = U_s S_s V_s^T$ • $U_s$: derived from U by removing all but the first s columns • $S_s$: derived from S by removing all but the largest s singular values • $V_s^T$: derived from $V^T$ by removing all but the s corresponding rows

  14. SVD (contd) • [Figure: block diagram of the decomposition. A (m x n), U (m x r), S (r x r), VT (r x n), with the sub-blocks US, SS and VsT.]
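The truncation step can be sketched as follows. The helper name `truncate_svd` and the toy matrix are assumptions made for illustration; the slicing simply keeps the first s columns of U, the s largest singular values, and the first s rows of VT.

```python
import numpy as np

def truncate_svd(A, s):
    """Return (U_s, S_s, Vt_s), keeping only the s largest singular values."""
    U, sing, Vt = np.linalg.svd(A, full_matrices=False)
    U_s = U[:, :s]              # first s columns of U
    S_s = np.diag(sing[:s])     # s largest singular values on the diagonal
    Vt_s = Vt[:s, :]            # first s rows of V^T
    return U_s, S_s, Vt_s

# Toy 4 x 3 term-document matrix; values are made up.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [1.0, 1.0, 0.0],
              [0.0, 0.5, 1.0]])

U_s, S_s, Vt_s = truncate_svd(A, 2)
A_s = U_s @ S_s @ Vt_s          # rank-2 approximation of A
```

`A_s` has the same shape as `A` but rank at most s, so it keeps the dominant term-document structure and discards the weakest components.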

  15. Document Ontology • Build Concept Nodes and Term Nodes using the document matrix (V) and term matrix (U).

  16. Building concept nodes from term matrix(U) • A concept node contains information about • Concept name • Terms that belong to that concept • Respective weights of terms in that concept

  17. Building concept nodes from term matrix(U) (contd) • Naming convention: • Generated automatically • A hyphenated string of the five most frequent terms in that concept

  18. Building concept nodes from term matrix(U) (contd) • A concept node represents a document • Each column in U corresponds to a concept node
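One way the concept nodes might be built from the columns of U is sketched below. Several details are assumptions, not taken from the slides: membership is decided by comparing the magnitude of each weight with 70% of the column maximum (borrowing the default of changeConcThreshold), and the name is formed from the highest-weighted terms as a stand-in for the "most frequent" terms. The function name `build_concept_nodes` and the toy data are hypothetical.

```python
import numpy as np

def build_concept_nodes(U, terms, threshold=0.7, name_len=5):
    """Build one concept node (a dict) per column of the term matrix U."""
    nodes = []
    for c in range(U.shape[1]):
        # Assumption: use magnitudes, since the signs of SVD columns are arbitrary.
        col = np.abs(U[:, c])
        # Keep the terms whose weight exceeds threshold * column maximum.
        keep = np.flatnonzero(col > threshold * col.max())
        # Order the kept terms from highest to lowest weight.
        keep = keep[np.argsort(-col[keep], kind="stable")]
        members = {terms[i]: float(col[i]) for i in keep}
        # Name: hyphenated string of up to name_len top terms.
        name = "-".join(list(members)[:name_len])
        nodes.append({"name": name, "terms": members})
    return nodes

# Toy term matrix: 5 terms (rows), 2 concepts (columns); values are made up.
terms = ["graph", "node", "edge", "matrix", "vector"]
U = np.array([[0.9, 0.1],
              [0.8, 0.0],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.0, 0.8]])
concepts = build_concept_nodes(U, terms)
```

Each resulting dict carries the three pieces of information listed for a concept node: its name, its member terms, and their weights.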

  19. Building term nodes from term matrix(U) • A term node contains information about • Term name • Concepts to which it belongs • Its respective weight in each concept

  20. Building term nodes from term matrix(U) (contd) • Naming convention: • Generated automatically • Simply named using the term name

  21. Building term nodes from term matrix(U) (contd) • A term node represents a term • Each row in U corresponds to a term node
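Term nodes can be sketched in the same spirit, reading U row by row. As before, the 70% membership rule, the use of weight magnitudes, the function name `build_term_nodes`, and the placeholder concept names are assumptions for illustration only.

```python
import numpy as np

def build_term_nodes(U, terms, concept_names, threshold=0.7):
    """Build one term node per row of U, keyed by the term name."""
    mag = np.abs(U)                           # assumption: compare magnitudes
    cutoffs = threshold * mag.max(axis=0)     # one cutoff per concept (column)
    nodes = {}
    for i, term in enumerate(terms):
        # Concepts this term belongs to, with its weight in each.
        member_of = {concept_names[c]: float(mag[i, c])
                     for c in range(U.shape[1])
                     if mag[i, c] > cutoffs[c]}
        nodes[term] = {"name": term, "concepts": member_of}
    return nodes

# Toy term matrix: 5 terms (rows), 2 concepts (columns); values are made up.
terms = ["graph", "node", "edge", "matrix", "vector"]
U = np.array([[0.9, 0.1],
              [0.8, 0.0],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.0, 0.8]])
term_nodes = build_term_nodes(U, terms, ["C0", "C1"])
```

A term can appear under several concepts if its weight clears the cutoff in more than one column; in this toy data each term ends up in exactly one.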

  22. Graph Construction • A bipartite graph is constructed with concept nodes and term nodes • A concept node is connected to all term nodes that belong to it. • A term node is connected to all concept nodes to which it belongs.

  23. Graph Construction (contd) • [Figure: example bipartite graph with term nodes Term 1 to Term 5 and concept nodes Concept 1 and Concept 2.]
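A bipartite graph of this kind can be held as two adjacency maps, one per side. The sketch below is a minimal illustration; the function name `build_bipartite_graph` is hypothetical, and the memberships are invented because the edges of the original figure are not recoverable from the transcript.

```python
from collections import defaultdict

def build_bipartite_graph(concept_terms):
    """Return (concept -> set of terms, term -> set of concepts)."""
    concept_to_terms = {c: set(ts) for c, ts in concept_terms.items()}
    term_to_concepts = defaultdict(set)
    for concept, ts in concept_to_terms.items():
        for t in ts:
            # Every edge is recorded from both sides.
            term_to_concepts[t].add(concept)
    return concept_to_terms, dict(term_to_concepts)

# Hypothetical memberships, not taken from the slides.
membership = {"Concept 1": ["Term 1", "Term 2", "Term 3"],
              "Concept 2": ["Term 3", "Term 4", "Term 5"]}
concept_to_terms, term_to_concepts = build_bipartite_graph(membership)
```

Keeping both directions makes the two lookups cheap: the terms of a selected concept and the concepts of a selected term.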

  24. Graphical User Interface (GUI)

  25. GUI (contd) • GUI consists of • Concepts list • Terms list • Display for bipartite graph • Display for list of files in ontology

  26. GUI • To view terms related to a concept, user selects that concept from concepts list • To view concepts related to a term, user selects that term from terms list

  27. GUI (contd) • To view only terms related to a specific concept: • Select that concept from concepts list • Select checkbox “Display Selected Ones Only” • Result: • GUI displays ONLY relations between selected terms and concepts

  28. GUI (contd) • To view only concepts related to a term: • Select that term from terms list • Select checkbox “Display Selected Ones Only” • Result: • GUI displays ONLY relations between selected terms and concepts

  29. GUI (contd) • To highlight relationship between a term and a concept: • Select that term or concept from terms or concepts list • Click on line connecting term and concept

  30. GUI – File Operations • New • Open • Save • saveAs • Close • Exit

  31. GUI – Ontology Updates • Add • Delete • changeSVDThreshold • changeConcThreshold • foldInDoc • defaultBuild

  32. GUI – Ontology Updates • Add: • Click on Add • Select file to be added from file chooser popup menu • Choose whether to build now or not • If yes, the document is added and displayed • If no, the GUI remains unchanged

  33. GUI – Ontology Updates • Delete: • Click on Delete • Select file to be deleted from file chooser popup menu • Choose whether to build now or not • If yes, the document is deleted and the updated ontology is displayed • If no, the GUI remains unchanged

  34. GUI – Ontology Updates • changeSVDThreshold: • SVDThreshold controls how many of the largest singular values (s) are selected from S • Default value is 70%, i.e., only the singular values higher than 70% of the highest singular value are selected • User can change this default value
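The threshold rule on this slide amounts to a simple count. The sketch below assumes a hypothetical helper `select_rank` and made-up singular values; it is not the tool's actual code.

```python
import numpy as np

def select_rank(singular_values, svd_threshold=0.70):
    """Count the singular values above svd_threshold * the largest one."""
    sv = np.asarray(singular_values, dtype=float)
    cutoff = svd_threshold * sv.max()
    return int(np.count_nonzero(sv > cutoff))

# Made-up singular values in descending order.
sing = [10.0, 8.0, 6.9, 3.0, 0.5]
s_default = select_rank(sing)          # cutoff is 70% of 10.0
s_looser = select_rank(sing, 0.25)     # a lower threshold keeps more values
```

Lowering the threshold raises s, which keeps more concepts in the ontology; raising it keeps only the strongest ones.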

  35. GUI – Ontology Updates • changeConcThreshold: • Controls the number of terms related to a concept, based on term weight • Default value is 70%, i.e., only the terms with weights higher than 70% of the highest term weight are selected • User can change this default value
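The effect of changing this threshold can be illustrated with a few made-up weights. The helper name `terms_above_threshold` and the weights are hypothetical.

```python
def terms_above_threshold(term_weights, conc_threshold=0.70):
    """Keep the terms whose weight exceeds conc_threshold * the highest weight."""
    cutoff = conc_threshold * max(term_weights.values())
    return {t: w for t, w in term_weights.items() if w > cutoff}

# Made-up weights of four terms within one concept.
weights = {"graph": 0.9, "node": 0.8, "edge": 0.6, "matrix": 0.1}

default_terms = terms_above_threshold(weights)         # default 70% cutoff
looser_terms = terms_above_threshold(weights, 0.50)    # user lowers it to 50%
```

With the default, only the two strongest terms stay attached to the concept; lowering the threshold to 50% also admits the third.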
