
Visual analytics for discovering entity relationship on text data

Hanbo Dai, Ee-Peng Lim, Hady Wirawan Lauw, HweeHwa Pang. Analysis scenario: a homeland security analyst looks for relationships between two terrorists in complex, large information sources, a task that needs user judgment.



  1. Visual analytics for discovering entity relationship on text data. Hanbo Dai, Ee-Peng Lim, Hady Wirawan Lauw, HweeHwa Pang

  2. Analysis scenario • A homeland security analyst • Finds out relationships between two terrorists in complex, large information sources • Needs user judgment [Figure: example network with nodes Mas Selamat, Jemaah Islamiah, Osama Bin Laden, Al-Qaeda, Justinus Andjarwirawan, Abu Latif; annotations "Born in Central Java" and "Was not directly connected"]

  3. Visual analytics system architecture

  4. Two TUBE (Text-Cube) instances for entity relationship discovery [Figure: T1 = <S1, B1, M1, D>, a one-dimensional TUBE over entities e0 to e4 (nodes), each cell with document evidence (e.g. {d1, d2, …}), a mask value (0/1), and measures (e.g. path_strength); T2 = <S2, B2, M2, D>, a two-dimensional TUBE over e0 to e4 on both dimensions (edges), each cell with document evidence (e.g. {d3, d4, …}), a mask value (0/1), and measures (e.g. strength)]

  5. ER-Explorer interface

  6. Visual analytical operations • Insert • Cluster • Delete

  7. Our tool helps to discover new relationships

  8. Conclusion • Interactive visual method to discover entities and relationships embedded in text data • ER-Explorer is equipped with the TUBE model and its operations • Our tool assisted analysts in finding relationships between two terrorists

  9. Backup slides

  10. Case study • Dataset: the hijacking of IC814 • Entities of type Person, Organization, Event, and GPE are extracted • Co-occurrence relationships are identified at the sentence level • Each sentence is treated as a document
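The sentence-level co-occurrence step in the case study can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes entity extraction has already produced a list of entity names per sentence, and the function name `cooccurrence_evidence` and the toy sentence ids are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_evidence(sentence_entities):
    """Map each unordered entity pair to the sentence ids where both occur.

    sentence_entities: dict of sentence id -> iterable of entity names.
    Each sentence is treated as one document, as in the case study.
    """
    evidence = defaultdict(set)
    for sent_id, entities in sentence_entities.items():
        # Sort so that each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(entities)), 2):
            evidence[(a, b)].add(sent_id)
    return dict(evidence)


# Hypothetical toy input: entity names found in each sentence.
sentences = {
    "d1": ["e0", "e1"],
    "d2": ["e1", "e2", "e0"],
    "d3": ["e3"],
}
pairs = cooccurrence_evidence(sentences)
```

Here `pairs[("e0", "e1")]` holds both `d1` and `d2`, while `e3` forms no pair because it never shares a sentence with another entity.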

  11. Text-Cube Model Represents Entities and Relationships
      • An entity is either a named entity or a conceptual entity.
      • An n-dimensional TUBE is a tuple T = <S, B, M, D>
        • S: Schema = {s1, s2, …, sn}; si denotes the list of entities of dimension i
        • B: Mask, with value 0 or 1 for each cell
        • M: Measures = {m1, m2, …, m|M|}; each measure mi is associated with a measure function mfi
        • D: Document collection
      • A TUBE T has |s1| × |s2| × … × |sn| cells
      • A cell c
        • Has document evidence, denoted Fd(c)
        • Is present if B(c) = 1, or hidden if B(c) = 0
        • Has measure value c.mj, computed by mfj(c)
        • Represents a co-occurrence relationship if Fd(c) is not empty
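One way to picture the tuple T = <S, B, M, D> is as a small data structure. The sketch below is an assumption-laden illustration, not the ER-Explorer implementation: the class name `Tube`, the choice to define Fd(c) as the documents mentioning every entity of the cell, and the default of hidden (0) for unset mask entries are all hypothetical. Measures (M) and conceptual entities are left out for brevity.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Tube:
    schema: list                               # S: one list of entities per dimension
    documents: dict                            # D: doc id -> set of entity names mentioned
    mask: dict = field(default_factory=dict)   # B: cell (tuple) -> 0 or 1

    def cells(self):
        """All |s1| x |s2| x ... x |sn| cells, one entity per dimension."""
        return list(product(*self.schema))

    def evidence(self, cell):
        """Fd(c), assumed here to be the documents mentioning every entity in the cell."""
        return {d for d, ents in self.documents.items()
                if all(e in ents for e in cell)}

    def is_present(self, cell):
        """A cell is present if B(c) = 1; unset cells are assumed hidden."""
        return self.mask.get(cell, 0) == 1


# Hypothetical two-dimensional instance over two entities.
t2 = Tube(
    schema=[["e0", "e1"], ["e0", "e1"]],
    documents={"d1": {"e0", "e1"}, "d2": {"e0"}},
)
t2.mask[("e0", "e1")] = 1
```

With this toy data, the cell `("e0", "e1")` has non-empty evidence, so it represents a co-occurrence relationship, and it is present because its mask value was set to 1.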

  12. Measure formulas

  13. Two TUBE Instances for entity relationship discovery
      • A discovery task is to find interesting paths between two entities, source (s) and target (t)
      • A path represents a chain of relationships
      • 1-dimensional TUBE instance: T1 = <S1, B1, M1, D>
        • S1 is initialized with all named entities
        • M1 = {path_strength}: the strength of the shortest path between s and t through an entity
      • 2-dimensional TUBE instance: T2 = <S2, B2, M2, D>
        • S2 is initialized with all named entities on both dimensions
        • M2 = {name_sim, strength, dom_entity}
          • name_sim: computed by edit distance
          • strength: computed by Jaccard coefficient or Dice coefficient
          • dom_entity: if ej is always present whenever ei appears, ej dominates ei
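The T2 measures named above can be sketched with standard definitions. The transcript does not give the exact formulas, so the following assumes that strength is computed over the sets of documents in which each entity appears, that dominance means one document set contains the other, and that name_sim starts from plain Levenshtein distance without normalization. All function names are hypothetical.

```python
def jaccard(docs_a, docs_b):
    """Jaccard coefficient of two document sets: |A & B| / |A | B|."""
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def dice(docs_a, docs_b):
    """Dice coefficient of two document sets: 2|A & B| / (|A| + |B|)."""
    total = len(docs_a) + len(docs_b)
    return 2 * len(docs_a & docs_b) / total if total else 0.0


def dominates(docs_j, docs_i):
    """True if e_j appears in every document where e_i appears."""
    return bool(docs_i) and docs_i <= docs_j


def edit_distance(a, b):
    """Levenshtein distance between two names, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]
```

For two entities whose document sets are `{"d1", "d2"}` and `{"d2", "d3"}`, the Jaccard coefficient is 1/3 and the Dice coefficient is 0.5; neither entity dominates the other.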

  14. Related Work
      • Social network visualization
        • Assumes entities and relations have been identified and verified, and can be studied without supporting documents
        • Uses only measures of graph structure, such as degree and centrality
      • Automatic path/subgraph finding algorithms
        • Users have little control over the relations and entities involved
        • Do not consider semantically identical entities

  15. Formal definition of entity • Entity e is defined as a named object or a set of other entities.

  16. Tube operations
      • Insert: add an entity to a dimension
      • Remove: remove an existing entity from a dimension
      • SelectCell: assign 0 or 1 to an entry (a cell in T) in Mask
      • Cluster: add a new conceptual entity representing a subset of entities to a dimension
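The four operations can be sketched on a stripped-down TUBE that keeps only the schema and the mask. This is an illustrative reading of the slide, not the authors' code: the class `SimpleTube`, the representation of a conceptual entity as a `(name, frozenset)` pair, and the cleanup of mask entries on removal are all assumptions.

```python
class SimpleTube:
    """Schema and mask only; measures and documents are omitted."""

    def __init__(self, schema):
        self.schema = [list(dim) for dim in schema]   # S
        self.mask = {}                                # B: cell tuple -> 0 or 1

    def insert(self, dim, entity):
        """Insert: add an entity to a dimension."""
        if entity not in self.schema[dim]:
            self.schema[dim].append(entity)

    def remove(self, dim, entity):
        """Remove: remove an existing entity from a dimension."""
        self.schema[dim].remove(entity)
        # Assumption: mask entries for cells using the removed entity are dropped.
        self.mask = {c: v for c, v in self.mask.items() if c[dim] != entity}

    def select_cell(self, cell, value):
        """SelectCell: assign 0 or 1 to a cell's entry in the mask."""
        if value not in (0, 1):
            raise ValueError("mask value must be 0 or 1")
        self.mask[cell] = value

    def cluster(self, dim, members, name):
        """Cluster: add a conceptual entity representing a subset of a dimension."""
        missing = set(members) - set(self.schema[dim])
        if missing:
            raise ValueError(f"not in dimension {dim}: {sorted(missing)}")
        concept = (name, frozenset(members))
        self.insert(dim, concept)
        return concept


# Hypothetical one-dimensional instance.
t1 = SimpleTube([["e0", "e1", "e2"]])
t1.select_cell(("e0",), 1)
group = t1.cluster(0, ["e1", "e2"], "group")
t1.remove(0, "e0")
```

Note that `cluster` only adds the conceptual entity; whether its members are then hidden is a separate SelectCell decision that the slide does not specify.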

  17. Visual Analytics Operations
      • Insert an entity
        • SelectCell in T1 and T2
        • Reveals all relationships this entity has with all entities in the network
      • Delete
        • Delete a named entity: SelectCell in T1
        • Delete a conceptual entity: Remove in T1 and T2
        • Delete a relationship (a cell): SelectCell in T2
      • Cluster
        • Cluster in T1 and T2
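The mapping from visual operations to TUBE operations listed on this slide can be written down as a lookup table. The dictionary below only restates the slide; the key names and the helper `plan` are hypothetical and are not part of ER-Explorer.

```python
# Each visual operation maps to the (TUBE operation, TUBE instance) steps it triggers.
VISUAL_TO_TUBE = {
    "insert_entity": [("SelectCell", "T1"), ("SelectCell", "T2")],
    "delete_named_entity": [("SelectCell", "T1")],
    "delete_conceptual_entity": [("Remove", "T1"), ("Remove", "T2")],
    "delete_relationship": [("SelectCell", "T2")],
    "cluster": [("Cluster", "T1"), ("Cluster", "T2")],
}


def plan(visual_op):
    """Return the TUBE-level steps for a visual operation."""
    return VISUAL_TO_TUBE[visual_op]


steps = plan("delete_conceptual_entity")
```

One point the table makes visible: deleting a named entity only hides it through the mask in T1, whereas deleting a conceptual entity removes it from the schema of both instances.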
