Presented By: Enas Kanan. Advisor: Dr. Amer Badarneh

Presentation Transcript


  1. Big Data, the Crowd and Me. Presented By: Enas Kanan. Advisor: Dr. Amer Badarneh

  2. Outline • Introduction • Twitter • Analysis • Training Classifiers • The judges • Big Data boundary objects • Conclusion

  3. Introduction: Big data refers to the rising flood of digital data from many sources, including the Web, biological and industrial sensors, video, e-mail, and social network communications.

  4. Twitter: a microblogging service that enables users to post messages (“tweets”) of up to 140 characters and supports a variety of communicative practices.

  5. Analysis: working on an analysis of portions of the Twitter feed. How do people identify tweets that are interesting enough?

  6. identify tweets

  7. Training Classifiers: #1 Sample the Twitter data. Grab a small chunk of the public English-language Twitter feed to serve as a labeled training set. The remaining large sample can then act as a test set for the trained classifiers.

  8. Training Classifiers: #2 Label the tweets. This involves picking an existing labeling scheme or designing a new one, and developing a way to present the tweet to a worker and collect the label, along with any other information deemed necessary to assess the label’s potential veracity (for example, the worker’s level of Twitter experience).

  9. Training Classifiers: #3 Analyze the data. Ensure data quality by looking at the labels the workers produce.

  10. Training Classifiers: #4 Add a secondary data source. Bring a secondary data source into the picture to help interpret the first one.

  11. Training Classifiers: #5 Reflect on the results. Evaluate the results in a way that is convincing to the research community.
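Steps #1 to #3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the talk: the function names, the 0.6 agreement threshold, and the sample tweets and labels are all made up. It sets aside a small random chunk for labeling, takes the majority label per tweet, and flags tweets on which the workers disagree.

```python
import random
from collections import Counter


def split_sample(tweets, train_size, seed=0):
    """Randomly set aside a small chunk for labeling; the rest acts as the test set."""
    rng = random.Random(seed)
    shuffled = list(tweets)
    rng.shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


def majority_label(labels):
    """Return (most common label, share of workers who chose it) for a non-empty list."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


def flag_low_agreement(worker_labels, threshold=0.6):
    """Return ids of tweets whose top label has less than `threshold` agreement.

    The threshold is an assumed value chosen for illustration only.
    """
    return [
        tweet_id
        for tweet_id, labels in worker_labels.items()
        if majority_label(labels)[1] < threshold
    ]


if __name__ == "__main__":
    tweets = [f"tweet {i}" for i in range(10)]  # made-up stand-ins for real tweets
    train, test = split_sample(tweets, train_size=3)
    print(len(train), "to label;", len(test), "held out")

    # Hypothetical labels collected from crowd workers, keyed by tweet id.
    worker_labels = {
        "t1": ["interesting", "interesting", "not interesting"],
        "t2": ["interesting", "not interesting"],
    }
    print(flag_low_agreement(worker_labels))
```

Flagged tweets are candidates for a closer look: low agreement may point to an ambiguous tweet, an unclear labeling scheme, or workers who lack Twitter experience.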

  12. The judges: What if they weren’t? A #FF (follow Friday) wasn’t recognized for what it was (a standard Twitter convention). The ability to quickly scan a Twitter feed was not uniformly held by our crowd workforce.

  13. The tweets: Were we giving the judges sufficient exogenous information to judge the tweets? We presented the tweets as they appear in most Twitter clients (a profile picture and name, plus the tweet itself). Perhaps it would help to give the judges the number of followers in addition to a profile name and photo. After all, these weren’t the people they normally followed.

  14. The Label: There’s just so much data that we can all afford to throw quite a bit of it away. We can throw away data until we find what we’re looking for.

  15. Big Data boundary objects • Human computation • Statistics • Data visualization and manipulation • Identifying ancillary datasets • Privacy

  16. Human computation: Learning how to use human computation in its many variations is important for dealing with the numerous tasks that are required to use Big Data effectively.

  17. Statistics: A data scientist needs to know how to ask questions of the data to see what a spike means and whether it represents a meaningful trend or an anomaly in the data.
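One simple way to start asking such a question is to measure how far a spike sits from recent history. The sketch below uses a z-score, which is an assumption made here for illustration and not a technique named in the talk; the function name and the daily counts are made up. A large value only says the count is unusual relative to the past; deciding whether it is a meaningful trend still takes further investigation.

```python
import math
from statistics import mean, stdev


def spike_z_score(history, today):
    """Number of sample standard deviations by which `today` differs from the history mean.

    `history` needs at least two values so that a sample standard deviation exists.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Flat history: any deviation at all is infinitely unusual.
        return 0.0 if today == mu else math.copysign(math.inf, today - mu)
    return (today - mu) / sigma


if __name__ == "__main__":
    daily_counts = [100, 110, 90, 105, 95]  # made-up daily tweet counts for a hashtag
    for today in (102, 180):
        z = spike_z_score(daily_counts, today)
        print(f"count={today} z={z:.2f}")
```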

  18. Data visualization: Most data visualization is not interactive in a useful way: although you can manipulate the presentation, you cannot change the underlying data (for example, to compare different algorithms for cleaning the data).

  19. Identifying ancillary datasets: Knowing how to interpret them and which ones to trust, understanding the ways in which they compromise privacy, and forming partnerships that will give you access to them may seem like an atheoretical skill.

  20. Privacy: Big Data reminds us that privacy problems are far from solved, and that there’s an enormous gap between theory and practice. Some of these problems are explored in boyd and Crawford’s 2011 Big Data provocations paper under the rubric of ethics.

  21. Conclusion: If you want to read Big Data, you have to read the Big World.

  22. Thank You
