Modeling Temporal Intention in Resource Sharing. Hany M. SalahEldeen & Michael L. Nelson. Old Dominion University. Department of Computer Science Web Science and Digital Libraries Lab. WADL 2013. Hany SalahEldeen & Michael Nelson Modeling Temporal Intention. WADL2013.
All tweets are equal… but some are more equal than others
Preliminary research questions: How long would these last? And if lost, is there a backup somewhere? Is this what the author intended?
Historical integrity: since tweets are considered the first draft of history, the historical integrity of the tweets could be compromised.
People rely on social media for the most up-to-date information
The life cycle of a social post
Links to tweets
What the reader receives:
• Same state the author intended
• The resource has disappeared
• The resource has changed
Resource’s possibilities
What the reader receives:
• Same state the author intended
• The resource has disappeared
• The resource has changed (a bigger problem, since the reader might not know)
We could lose the linked resource
Or the resource could change
The attack on the embassy was in February 2013
Why do we want to detect the author’s temporal intention?
• Match: the author’s intention and convey the intended information.
• Notify:
  • the author that the resource is prone to change.
  • the reader that the resource has changed.
• Preserve: the resource by pushing snapshots into the archive automatically.
• Retrieve: the closest archived version to maintain consistency.
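The "Retrieve" goal above can be illustrated with a minimal sketch. It assumes the capture times of a resource are already available as a list of datetimes; the function name `closest_snapshot` and the sample dates are hypothetical and are not taken from the talk.

```python
from datetime import datetime

def closest_snapshot(snapshots, target):
    """Return the capture time nearest to `target`, or None if there are no captures."""
    if not snapshots:
        return None
    # Compare absolute time distance, so captures before and after count equally.
    return min(snapshots, key=lambda ts: abs(ts - target))

# Hypothetical capture times for one shared URI.
captures = [
    datetime(2013, 1, 10, 8, 0),
    datetime(2013, 2, 14, 12, 30),
    datetime(2013, 6, 1, 0, 0),
]
tweet_time = datetime(2013, 2, 20, 9, 0)
print(closest_snapshot(captures, tweet_time))
```

Whether "closest" should mean nearest in either direction, or nearest capture before the tweet, is a design choice the slides do not settle; this sketch uses either direction.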
Our investigation angles
• The state of the archived content
• The age of the shared resource
• The states of the resource:
  • Missing from the live web
  • Changed from what the author intended to share
• Detect the author’s intention and collect a dataset
• Model this intention
• Create a time-based navigation tool to match the predicted intention
Estimating web archiving coverage
• Goal: estimate how much of the public web is present in the public archives, and how many copies are available.
• Action: collect 4 datasets from 4 different sources:
  • Search engine indices
  • Bit.ly
  • DMOZ
  • Delicious
• Results: [table not preserved; table courtesy of Ahmed AlSum, JCDL 2011]
• Publications: How much of the web is archived? JCDL '11
The timeline of the resource
Timestamps accumulation
Six socially significant events
• From Twitter, websites, books:
  • The Egyptian revolution
• From Twitter only:
  • Stanford’s SNAP dataset:
    • Iranian elections
    • H1N1 virus outbreak
    • Michael Jackson’s death
    • Obama’s Nobel Peace Prize
  • Twitter API:
    • The Syrian uprising
Resources missing & archived
Revisiting after a year…
Measured vs. predicted
Interesting phenomenon: reappearance on the live web and disappearance from the archives
Reappearance and disappearance predictions
Temporal Intention Relevancy Model (TIRM)
• Between t_tweet and t_click:
  • The linked resource could have:
    • Changed
    • Not changed
  • The tweet and the linked resource could be:
    • Still relevant
    • No longer relevant
Resource is changed but relevant
• The resource changed
• But it is still relevant
• Intention: need the current version of the resource at any time
Resource is changed and not relevant
• The resource changed
• And it is no longer relevant
• Intention: need the past version of the resource at any time
Resource is not changed and relevant
• The resource is not changed
• And it is relevant
• Intention: need the past version of the resource at any time
Resource is not changed and not relevant
• The resource is not changed
• But it is not relevant
• Intention: I am not sure which version of the resource I need
Relevancy and intention mapping
• Changed, relevant: Current
• Changed, not relevant: Past
• Not changed, relevant: Past
• Not changed, not relevant: Not sure
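The four TIRM cases can be written as a small lookup. This is only a sketch of the mapping stated on the preceding slides; the names `Intention` and `tirm_intention` are illustrative and do not come from the talk.

```python
from enum import Enum

class Intention(Enum):
    CURRENT = "current"
    PAST = "past"
    NOT_SURE = "not sure"

def tirm_intention(changed: bool, relevant: bool) -> Intention:
    """Map the (changed, relevant) state of a shared resource to the intended version."""
    if changed and relevant:
        # The resource changed but still matches the tweet: serve the live version.
        return Intention.CURRENT
    if not changed and not relevant:
        # Unchanged yet not relevant: the slides leave this case undecided.
        return Intention.NOT_SURE
    # Changed and no longer relevant, or unchanged and relevant: serve the past version.
    return Intention.PAST

for changed in (True, False):
    for relevant in (True, False):
        print(changed, relevant, tirm_intention(changed, relevant).value)
```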
Feature extraction
• For each tweet we perform:
  • Link analysis
  • Social media mining
  • Archival existence
  • Sentiment analysis
  • Content similarity
  • Entity identification
Modeling and classification using Mechanical Turk
• To remove confusion, we removed the close calls, leaving 898 instances
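One way to picture removing close calls is to drop every tweet whose winning label leads the runner-up by too few worker votes. The slides do not give the number of workers per tweet, the label names, or the cutoff, so `min_margin`, the labels, and the sample votes below are all assumptions made for illustration.

```python
from collections import Counter

def filter_close_calls(votes_by_tweet, min_margin=2):
    """Keep tweets whose top label beats the runner-up by at least `min_margin` votes."""
    kept = {}
    for tweet_id, votes in votes_by_tweet.items():
        if not votes:
            continue  # no judgments collected for this tweet
        ranked = Counter(votes).most_common()
        top_label, top_count = ranked[0]
        runner_up_count = ranked[1][1] if len(ranked) > 1 else 0
        if top_count - runner_up_count >= min_margin:
            kept[tweet_id] = top_label
    return kept

# Hypothetical worker votes: t1 is a clear majority, t2 is a close call.
votes = {
    "t1": ["current", "current", "current", "current", "past"],
    "t2": ["current", "current", "current", "past", "past"],
}
print(filter_close_calls(votes))
```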
The trained classifier
• From the feature extraction phase we extracted 39 different features to train the classifier.
• Using 10-fold cross-validation, the cost-sensitive classifier based on random forests gave the highest success rate: 90.32%
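The evaluation procedure named above, 10-fold cross-validation, can be sketched in plain Python. The learner here is a deliberately trivial majority-label stand-in, not the cost-sensitive random forest from the slide, and the data is made up; only the fold logic is the point.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k folds whose sizes differ by at most one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(fit, predict, features, labels, k=10, seed=0):
    """Return accuracy over all samples, each predicted by a model that never saw it."""
    correct = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in held_out:
            if predict(model, features[i]) == labels[i]:
                correct += 1
    return correct / len(labels)

# Stand-in learner (assumption): always predicts the most common training label.
def fit_majority(train_features, train_labels):
    return max(set(train_labels), key=train_labels.count)

def predict_majority(model, feature_row):
    return model

# Made-up data: 20 samples with one dummy feature each.
features = [[0] for _ in range(20)]
labels = ["past"] * 15 + ["current"] * 5
print(cross_validate(fit_majority, predict_majority, features, labels))
```

Swapping `fit_majority` and `predict_majority` for a real classifier leaves the fold handling unchanged.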
Testing the model
TimeLord Navigator
Thanks! Hany SalahEldeen, Web Science & Digital Libraries, Old Dominion University. Email: hany@cs.odu.edu, @hanysalaheldeen
TimeLord Navigator demo: www.cnn.com, www.bbc.com
Evaluation
Actual vs. estimated dates