Pixel Visualization of keyword search results in large email databases. Jay Koven Fall 2013
Research Overview
• The problem: both criminal and civil investigations are being overwhelmed with information in the cyber age.
• New techniques are needed to handle the overload.
• Visualization of data can provide solutions.
The Investigative Problem
• Datasets are rapidly growing in size for all types of investigation
  • National security
  • Criminal
  • Civil
• The datasets
  • Most investigations focus on communications
  • Emails are the largest portion of these communications
  • Chats, IM, phone logs, and other social communication channels are also becoming important
Related Research
• Jigsaw
  • Open-source investigative toolkit being developed at Georgia Tech
  • Focus on entity relationships and time relationships
  • Views are traditional
Related Research continued
• Daniel Keim
  • Pixel-oriented display visualization
  • Large amounts of data can be viewed at once
  • Alternative display methodologies
  • Personal mailbox analysis
Related Research continued
• Other visual email analysis techniques
  • EmailTime (SFU, Vancouver)
    • Plots email relationships over time by sender or by thread
    • Run on the Enron dataset
    • Not sure why
  • Thread Arcs (IBM)
    • Traces a single thread using arcs to show trends
    • Interactive; highlights individuals and can highlight attributes
    • Used to analyze trends
  • Graphs and maps
    • Show relationships but are not very useful for ultra-large datasets
Related Research continued
• Chris North - Use of Large Displays
  • Not specific to email but useful thoughts
• W. Bradford Paley - TextArc
  • Relationships of words in a concordance
  • Images behind my proposal
My proposed research
• Pixel Visualization of Large Email Datasets
  • Search by keywords
  • Multiple displays of returned email sets
    • Entity - Entity
    • Entity - Keyword
    • Keyword - Time
    • Entity - Time
  • Interaction to refine search
    • Add / remove keywords
    • Add / remove entities
    • Limit time frame
  • Interaction to drill down to actual messages
    • By subject
    • By message content
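As a rough illustration of the Keyword - Time display, the sketch below bins matching emails into a grid with one row per keyword and one column per time bin, then scales the counts to pixel intensities. The slides do not specify a data model or a scaling rule, so the `(sent_date, body)` tuple format, the function names, and the linear 0-255 scaling are all assumptions made for this example.

```python
from datetime import date

def keyword_time_grid(emails, keywords, start, end, n_bins):
    """Count emails per (keyword, time bin).

    `emails` is assumed to be a list of (sent_date, body) tuples;
    this layout is hypothetical, not taken from the slides.
    """
    span = (end - start).days + 1
    grid = [[0] * n_bins for _ in keywords]
    for sent, body in emails:
        if not (start <= sent <= end):
            continue  # outside the selected time frame
        col = (sent - start).days * n_bins // span
        words = set(body.lower().split())
        for row, kw in enumerate(keywords):
            if kw.lower() in words:
                grid[row][col] += 1
    return grid

def to_intensity(grid):
    """Scale counts linearly to 0-255 so each cell can be drawn as one pixel."""
    peak = max((c for row in grid for c in row), default=0)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    return [[round(255 * c / peak) for c in row] for row in grid]

emails = [
    (date(2001, 1, 5), "merger talks"),
    (date(2001, 1, 20), "merger update"),
    (date(2001, 1, 25), "merger lunch"),
    (date(2001, 2, 10), "lunch plans"),
]
grid = keyword_time_grid(emails, ["merger", "lunch"],
                         date(2001, 1, 1), date(2001, 2, 28), n_bins=2)
pixels = to_intensity(grid)
```

Adding or removing a keyword, or limiting the time frame, then amounts to calling `keyword_time_grid` again with different arguments.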
Key issues to be solved for investigative visualization of emails
• Relative weights of emails must be calculated against some standard
• Visualizations should minimize the distance between related emails so that important clusters around entities, keywords, and time stand out
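The slides leave the "standard" for relative weights open. One possible choice, shown here purely as an assumption, is to compare how often a keyword occurs in a single email with how often it occurs across the whole returned set: a weight above 1 means the email mentions the keyword more densely than average. The function name and the whitespace tokenizer are hypothetical.

```python
def relative_weights(emails, keyword):
    """Weight each email by its keyword rate relative to the corpus-wide rate.

    Assumed baseline: (keyword hits in all emails) / (tokens in all emails).
    """
    keyword = keyword.lower()
    token_lists = [body.lower().split() for body in emails]
    total_tokens = sum(len(tokens) for tokens in token_lists)
    total_hits = sum(tokens.count(keyword) for tokens in token_lists)
    if total_tokens == 0 or total_hits == 0:
        return [0.0] * len(emails)  # keyword never appears: nothing to weight
    baseline = total_hits / total_tokens
    return [
        (tokens.count(keyword) / len(tokens)) / baseline if tokens else 0.0
        for tokens in token_lists
    ]

weights = relative_weights(
    ["merger merger talks", "lunch on friday", "merger update"],
    "merger",
)
```

Other baselines (per-sender rates, per-month rates) would slot into the same structure by changing how `baseline` is computed.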
My proposal - “Document Galaxy”
• Basic idea is to treat documents as stars in a circular galaxy
• Place relevant data points, such as entities, around the outside with associated weights
• Place documents inside the galaxy based on their relative “attraction” to the outside points
• Possible to have multiple outside rings to add additional attributes to the calculations
• User interacts with the outside rings to add / remove / move attraction points
• User can explore the contents of inner points and clusters to derive information about document content
• Colors of documents can be used to show additional attributes
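The placement rule is not spelled out in the slides. A minimal sketch, assuming a RadViz-style weighted average (a substitute technique, not necessarily the one intended): attraction points are spaced evenly around a single outer ring, and each document lands at the attraction-weighted mean of their positions. All names and the sample weights below are hypothetical.

```python
import math

def anchor_positions(names, radius=1.0):
    """Space the attraction points evenly around the outer ring."""
    n = len(names)
    return {
        name: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, name in enumerate(names)
    }

def place_document(attractions, anchors):
    """Return the attraction-weighted mean of the anchor positions.

    `attractions` maps anchor name -> non-negative weight. A document
    with no attraction to any anchor is placed at the centre.
    """
    total = sum(attractions.get(name, 0.0) for name in anchors)
    if total == 0:
        return (0.0, 0.0)
    x = sum(attractions.get(name, 0.0) * pos[0] for name, pos in anchors.items()) / total
    y = sum(attractions.get(name, 0.0) * pos[1] for name, pos in anchors.items()) / total
    return (x, y)

anchors = anchor_positions(["alice", "bob", "contract", "2001-Q4"])
docs = {
    "msg-1": {"alice": 1.0},               # pulled fully toward alice
    "msg-2": {"alice": 3.0, "bob": 1.0},   # mostly alice, some bob
    "msg-3": {"alice": 1.0, "contract": 1.0},
}
layout = {doc_id: place_document(w, anchors) for doc_id, w in docs.items()}
```

Documents that share strong attraction to the same points end up near each other, which is what produces visible clusters. Moving or removing an attraction point means recomputing `anchors` and re-running the placement. One known limitation of this averaging rule is that equal pulls from opposite sides cancel out, so `msg-3` lands at the centre just like a document with no attraction at all.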
What use is this?
• Might make a good lead-in tool for Jigsaw, reducing the size of the document set to be explored
• Separate tool for exploring e-discovery datasets