
CS511: Entity Search and Visualization


Presentation Transcript


  1. CS511: Entity Search and Visualization Nikolay Kojuharov Lauren Massa-Lochridge Hoa Nguyen Quoc Le

  2. Roadmap
     • Motivation
     • Problem Definition
     • Architecture
     • Implementation
     • Demo
     • Conclusion

  3. Background
     • DBLife:
       • Back-end: Given a number of documents (researchers' home pages, conference websites, mailing lists), identify entities and relationships and monitor them over time.
       • Front-end: Search for entities; show relationships; display useful findings.
     • DBLife needs improvements:
       • Automatic source discovery process
       • Better mention extraction
       • Inferring more metadata and relationships
       • Front-end improvement (search functionality, display)

  4. Benefits
     • Front-end improvement:
       • Testing, evaluation
       • Immediate use
       • User feedback
     • Goal:
       • Search: Given an ER graph as the logical view, search for entities using keyword search.
       • Make navigation “easier” and clearer through a novel method of displaying entities and relationships.

  5. Search Problems
     • Cannot find partial matches. E.g. no results for “AnHai”.
     • Cannot search for entities in context. E.g. no results for “data integration”.
     • Cannot use more advanced queries. E.g. boolean, regular expressions, proximity, approximate match, etc.
     • Returns multiple results for the same mention.
     • …

  6. Example: DBLife Search

  7. Search Solution
     • Integrated entity and text search
     • Contextual entity search over the ER model
     • Advanced query syntax
     • Example: search for “name:Fu data integration”
       Suggested authors:
         100% LiMin Fu
         59% Jack Fu
         59% Guangrui Fu
       Suggested web pages:
         85% http://anhai.cs.uiuc.edu\home\projects\aida.html
         22% www.cs.wisc.edu\~chenl\Stream_Data_Processing.html
         21% http://anhai.cs.uiuc.edu\home\thesis.html

  8. Problems (cont'd)
     • User Interface:
       • Relationships are unclear.
       • Relationship types and weights are not shown.
       • Browsing is inconvenient.
     • Solution: ER Visualization
       • Display entities and relationships in 2-D.
       • Easy to navigate while keeping focus.

  9. Example: User Interface

  10. Problem Definition
      • Entity-focused search: integrated, ranked keyword search over entities and web pages.
      • ER visualization: 2-D graph-style interface with entities as nodes and relationships as edges.

  11. Architecture (diagram; component labels: Indexing; Document Index; Entities, Relationship, External Sources; Search; GUI; Visualization)

  12. Implementation
      • XmlDBLPReader, XmlEntityReader: DBLP, DBLife XML → ER model
      • EntityGraph: ER model → graph
      • AuthorIndexer, TextIndexer: ER model → index files
      • AuthorSearch, TextSearch: index files → search results
      • SearchGUI: UI, graph layout, etc.

  13. Indexing
      • Two inverted file indexes: one for documents and one for entities
      • Document index: title*, URL, contents, etc.
      • Author index: name*, publications, co-authors
      * Highest importance
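
The idea of a per-field inverted index with one field weighted above the others can be sketched in plain Java. This is a minimal illustration using only the standard library, not the project's Lucene-based AuthorIndexer; the class name `TinyFieldIndex`, the boost value, and the sample records are assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy per-field inverted index (hypothetical; not Lucene's implementation). */
class TinyFieldIndex {
    // field -> term -> ids of the records containing that term in that field
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    // field -> boost; fields without an explicit boost count as 1.0
    private final Map<String, Double> boosts = new HashMap<>();

    void setBoost(String field, double boost) {
        boosts.put(field, boost);
    }

    /** Lowercases the text, splits it on non-word characters, and records each term. */
    void add(int docId, String field, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(term, t -> new TreeSet<>())
                    .add(docId);
        }
    }

    /** Per record, sums the boost of every field in which the term occurs. */
    Map<Integer, Double> search(String term) {
        Map<Integer, Double> scores = new TreeMap<>();
        String key = term.toLowerCase();
        for (Map.Entry<String, Map<String, Set<Integer>>> e : postings.entrySet()) {
            double boost = boosts.getOrDefault(e.getKey(), 1.0);
            Set<Integer> docs = e.getValue().getOrDefault(key, Collections.emptySet());
            for (int doc : docs) {
                scores.merge(doc, boost, Double::sum);
            }
        }
        return scores;
    }
}
```

With a boost of 2.0 on "name", a record that matches a term in its name field scores higher than one that matches only in its publications field, which mirrors the "highest importance" marking on the slide.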

  14. Query Parsing & Analysis
      • Users already familiar with keyword search are not required to learn special syntax or commands.
      • Queries are analyzed using StandardAnalyzer().
      • Lucene offers a variety of query parsers; the AuthorSearch demo queries across all fields in an index:
        Query query = MultiFieldQueryParser.parse(queryString, AuthorIndexer.FIELDS, analyzer);
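
What "query across all fields" means can be illustrated without Lucene. The sketch below is a standard-library approximation, not the behavior of MultiFieldQueryParser itself: a token written as field:term is routed to that field only, and a bare term is looked up in every field. The class name `MultiFieldExpander` and the field names are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that spreads unfielded query terms over all index fields. */
class MultiFieldExpander {
    /** Returns, for each field, the lowercased terms to look up in that field. */
    static Map<String, List<String>> expand(String query, String[] fields) {
        Map<String, List<String>> byField = new LinkedHashMap<>();
        for (String f : fields) {
            byField.put(f, new ArrayList<>());
        }
        for (String token : query.trim().toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            int colon = token.indexOf(':');
            String prefix = colon > 0 ? token.substring(0, colon) : null;
            if (prefix != null && byField.containsKey(prefix)) {
                // Explicit "field:term": restrict the term to that one field.
                String term = token.substring(colon + 1);
                if (!term.isEmpty()) byField.get(prefix).add(term);
            } else {
                // Bare term (or unknown prefix): search it in every field.
                for (String f : fields) {
                    byField.get(f).add(token);
                }
            }
        }
        return byField;
    }
}
```

For the query "name:Fu data integration" over the fields name, publications, and co-author, the name field receives fu, data, and integration, while the other two fields receive only data and integration.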

  15. Query Syntax
      • Boolean operators over terms or phrases: “data integration” OR “schemas”
      • Field-specific search: title:homepage AND “semantic web”
      • Wildcard search: dat?
      • Fuzzy search: roam~0.8
      • Proximity search: “object relational”~10
      • Range search: date:[anhai TO john]
      • Term boosting: professor^4 AND mining
      • Grouping: (anhai OR “an hai”) AND Doan
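
The wildcard entries in the list above follow the usual convention that ? stands for exactly one character and * for any run of characters. A small standard-library sketch of that matching rule, assuming case-insensitive comparison against a single term (the class name `WildcardTerm` is hypothetical, and this is not how Lucene evaluates wildcard queries internally):

```java
import java.util.regex.Pattern;

/** Hypothetical helper that checks a term against a ?/* wildcard expression. */
class WildcardTerm {
    /** Translates the wildcard expression into an equivalent regular expression. */
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');      // exactly one character
            } else if (c == '*') {
                regex.append(".*");     // zero or more characters
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** True when the whole term matches the wildcard expression. */
    static boolean matches(String wildcard, String term) {
        return toPattern(wildcard).matcher(term).matches();
    }
}
```

Under this rule dat? matches "data" and "date" but not "dat" or "database", and a* matches any term beginning with "a", such as "anhai".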

  16. Visualization
      • Built in Java using the Java2D graphics library.
      • Graph data structure: Node, Edge, TreeNode, Graph, Tree.
      • Loading and saving graph data: GraphReader and GraphWriter interfaces, XMLGraphReader, etc.
      • Filtering: select parts of the graph to display.
      • ItemRegistry: mapping between VisualItems and the original graph data.
      • Display: renders VisualItems to the screen, with graphic transforms, animation, and activity listeners.
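
One plausible form of the filtering step is to show only the entities within a few hops of the entity in focus. The sketch below assumes that reading and uses a breadth-first search over an undirected adjacency map; it relies only on the standard library and is not the Prefuse API. The class name `EntityGraphSketch` and the node labels are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical undirected entity graph with a hop-limited focus filter. */
class EntityGraphSketch {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    /** Adds an undirected relationship between two entities. */
    void addEdge(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    /** Returns the focus node plus every node at most maxHops edges away from it. */
    Set<String> within(String focus, int maxHops) {
        Set<String> visible = new LinkedHashSet<>();
        if (!adjacency.containsKey(focus)) return visible;

        Map<String, Integer> hops = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        hops.put(focus, 0);
        queue.add(focus);
        visible.add(focus);

        while (!queue.isEmpty()) {
            String node = queue.remove();
            int distance = hops.get(node);
            if (distance >= maxHops) continue; // do not expand beyond the limit
            for (String next : adjacency.get(node)) {
                if (hops.putIfAbsent(next, distance + 1) == null) {
                    visible.add(next);
                    queue.add(next);
                }
            }
        }
        return visible;
    }
}
```

A display layer would then render only the returned nodes and the edges between them, which keeps the view centered on the entity the user selected.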

  17. Tools
      • The system is implemented in Java with Java Swing (GUI), SAXBuilder (XML), Tomcat (web server), and:
        • Lucene: a high-performance, full-featured text search engine library written entirely in Java, suitable for nearly any application, especially cross-platform ones.
        • Prefuse: a user interface toolkit for building highly interactive visualizations of structured and unstructured entity-relationship data.

  18. Project Demo
      • Evaluation: not formally done at this point. Let users judge its coolness themselves.
      • Examples:
        • “anhai”
        • “a* schemas”
        • “name:schemas”
        • “name:a* co-author:a*”

  19. Limitations
      • Explanation is not quite intuitive.
      • Lacks a specific query language.
      • Multiple relationships between two entities.
      • Formal evaluation of the system.
      • Integrating the system with Web Services.

  20. Future Work
      • Experiment with and compare hit accuracy and quality for alternative indexing, query analysis, and query parsing methods:
        • Span-term query for “mentions”.
        • Query filtering: sequences of one or more QueryParser-based filters.
