
Pixel Visualization of keyword search results in large email databases.

Jay Koven, Fall 2013. Research overview. The problem: both criminal and civil investigations are being overwhelmed with information in the cyber age. New techniques are needed to handle the overload.


Presentation Transcript


  1. Pixel Visualization of keyword search results in large email databases. Jay Koven Fall 2013

  2. Research Overview • The problem: Both criminal and civil investigations are being overwhelmed with information in the cyber age. • New techniques are needed to handle the overload • Visualization of data can provide solutions

  3. The Investigative Problem • Datasets are rapidly growing in size for all types of investigation • National Security • Criminal • Civil • The Datasets • Most investigations focus on communications • Emails are the largest portion of these communications • Chats, IM, phone logs and other social communication channels are also becoming important.

  4. Related Research • Jigsaw • Open-source investigative toolkit being developed at Georgia Tech. • Focus on entity relationships and time relationships • Views are traditional

  5. Related Research continued • Daniel Keim • Pixel oriented display visualization • Large amounts of data can be viewed at once • Alternative display methodologies • Personal mailbox analysis

  6. Related Research continued • Other Visual Email Analysis Techniques • EmailTime (SFU Vancouver) • Plots email relationships over time by sender or by threads • Run on Enron dataset • Not sure why • Thread Arcs (IBM) • Traces a single thread using arcs to show trends • Interactive, highlights individuals, can highlight attributes • Used to analyze trends • Graphs and maps • Show relationships but not very useful for ultra-large datasets

  7. Related Research continued • Chris North - Use of Large Displays • Not specific to email but useful thoughts • W. Bradford Paley - TextArc • Relationships of words in a concordance • Images behind my proposal

  8. My proposed research • Pixel Visualization of Large Email Datasets • Search by Keywords • Multiple displays of returned email sets • Entity - Entity • Entity - Keyword • Keyword - Time • Entity - Time • Interaction to Refine Search • Add / Remove Keywords • Add / Remove Entities • Limit time frame • Interaction to Drill Down to actual messages • By Subject • By Message Content
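As a rough illustration of the Keyword - Time display listed above, the following sketch counts keyword hits per month so that each (keyword, month) cell could be mapped to one pixel. The slides do not specify a data model, so the `emails` dictionaries, the monthly bucketing, and the function name `keyword_time_grid` are all assumptions made for this example.

```python
from collections import Counter
from datetime import date


def keyword_time_grid(emails, keywords):
    """Count emails containing each keyword per month.

    Hypothetical helper: one grid cell per (keyword, month), which a
    pixel display could render as one colored pixel.
    """
    # Months that actually occur in the result set, in chronological order.
    months = sorted({(e["date"].year, e["date"].month) for e in emails})
    counts = Counter()
    for e in emails:
        text = e["body"].lower()
        month = (e["date"].year, e["date"].month)
        for kw in keywords:
            # Simple substring match; a real search index would differ.
            if kw.lower() in text:
                counts[(kw, month)] += 1
    # Rows are keywords, columns are months.
    grid = [[counts[(kw, m)] for m in months] for kw in keywords]
    return months, grid


# Made-up sample messages, for illustration only.
emails = [
    {"date": date(2001, 5, 3), "body": "Please review the contract draft."},
    {"date": date(2001, 5, 9), "body": "Contract signed; audit next week."},
    {"date": date(2001, 6, 1), "body": "Audit results attached."},
]
months, grid = keyword_time_grid(emails, ["contract", "audit"])
```

Adding or removing a keyword, or limiting the time frame, would amount to recomputing this grid on a filtered input, which is one plausible way to support the refinement interactions listed on the slide.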

  9. Key issues to be solved for investigative visualization of emails • Relative weights of emails must be calculated against some standard • Visualizations should minimize the distance of related emails between points to show important clusters around entities, keywords and time.
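The slide leaves open what "standard" the weights should be measured against. One simple possibility, shown here purely as an assumption, is to count keyword occurrences per email and normalize by the highest count in the result set, so every weight falls between 0 and 1. The function name `relative_weights` is hypothetical.

```python
def relative_weights(bodies, keywords):
    """Weight each email by keyword occurrences, relative to the top email.

    Assumption: the "standard" is the highest raw score in the result set.
    """
    raw = []
    for body in bodies:
        text = body.lower()
        # Raw score: total occurrences of all keywords in the message.
        raw.append(sum(text.count(kw.lower()) for kw in keywords))
    top = max(raw, default=0)
    if top == 0:
        # No keyword appears anywhere; every weight is zero.
        return [0.0 for _ in raw]
    return [r / top for r in raw]


# Made-up message bodies, for illustration only.
weights = relative_weights(
    ["audit the audit contract", "contract only", "hello"],
    ["audit", "contract"],
)
```

Other baselines, such as corpus-wide term frequencies, would fit the same interface; the choice affects how strongly a few keyword-heavy messages dominate the display.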

  10. My proposal - “Document Galaxy” • Basic idea is to treat documents as stars in a circular galaxy • Place relevant data points, such as entities, around the outside with associated weights. • Place documents inside the galaxy based on relative “attraction” to outside points. • Possible to have multiple outside rings to add additional attributes to calculations • User interacts with outside rings to add / remove / move attraction points. • User can explore contents of inner points and clusters to derive information about document content. • Colors of documents can be used to show additional attributes
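The slides do not give a placement rule for the galaxy. A minimal sketch, assuming each document sits at the attraction-weighted average of the outside points (similar in spirit to RadViz-style layouts), could look like the following. The names `ring_positions` and `place_document`, and the sample entities, are hypothetical.

```python
import math


def ring_positions(names, radius=1.0):
    """Space the attraction points evenly around a circle (the outer ring)."""
    n = len(names)
    return {
        name: (
            radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n),
        )
        for i, name in enumerate(names)
    }


def place_document(attractions, anchors):
    """Place a document at the weighted average of its attraction points.

    Assumption: attractions are non-negative weights keyed by anchor name.
    A document with no attraction at all is placed at the center.
    """
    total = sum(attractions.values())
    if total == 0:
        return (0.0, 0.0)
    x = sum(w * anchors[a][0] for a, w in attractions.items()) / total
    y = sum(w * anchors[a][1] for a, w in attractions.items()) / total
    return (x, y)


# Made-up entities on the outer ring, for illustration only.
anchors = ring_positions(["alice", "bob", "carol", "dave"])
# Attracted only to alice: lands on alice's position on the ring.
doc_a = place_document({"alice": 3, "bob": 0, "carol": 0, "dave": 0}, anchors)
# Equally attracted to alice and carol (opposite sides): lands near the center.
doc_b = place_document({"alice": 1, "carol": 1}, anchors)
```

Under this rule, documents tied to a single entity drift toward the rim and documents shared among several entities cluster between them. Moving or removing an attraction point only requires recomputing the positions, which matches the interaction described on the slide.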

  11. Might look something like this

  12. What use is this? • Might make a good lead-in tool for Jigsaw, reducing the size of the document set to be explored • Separate tool for exploring e-discovery datasets
