
"Stories" in data and the roles of crowdsourcing – views of a Web miner

"Stories" in data and the roles of crowdsourcing – views of a Web miner. Bettina Berendt Dept . of Computer Science KU Leuven, Belgium http://people.cs.kuleuven.be/~bettina.berendt / Thanks to: Ilija Subašić, Markus Luczak-Rösch, and Laura Dr ă gan. A story. Story structure.


Presentation Transcript


  1. "Stories" in data and the roles of crowdsourcing – views of a Web miner Bettina Berendt Dept. of Computer Science KU Leuven, Belgium http://people.cs.kuleuven.be/~bettina.berendt/ Thanks to: Ilija Subašić, Markus Luczak-Rösch, and Laura Drăgan

  2. A story

  3. Story structure

  4. One case of provenance

  5. Another case of provenance

  6. Formalizing provenance: a high-level view

  7. Challenge 1: Many voices

  8. Challenge 2

  9. Challenge 3: Subjectivity

  10. The STORIES Tool

  11. Uncover (1)

  12. Uncover (2)

  13. Scan (over time)

  14. Uncover

  15. Zoom

  16. Search: formulating ad-hoc concepts

  17. Track (2)

  18. Textual summarization

  19. Challenge 4

  20. Crowd-sourcing the truth? Wikipedia (here: the Gaza Flotilla Raid)

  21. Challenge 5

  22. Challenge 5: Vagueness (reprise). Challenge 4: More specifically

  23. The "live crowdsourcing activity" • Goal: crowdsource data citation metadata • Motivation 1 / possible extension • Motivation 2 / case study

  24. http://prov.usewod.org

  25. The data: Datasets; Publications; [People]

  26. The datasets. Preloaded: USEWOD datasets; DBpedia; SWDF; Bio2RDF; LinkedGeoData; BioPortal; OpenBioMed

  27. The datasets. Preloaded: Generic (!); Versions/releases; References

  28. The datasets. Add new: Name*; Version; Release date; URL

  29. The publications. Preloaded: USEWOD workshop papers

  30. The publications. Add new: Title*; Authors; Year; URL
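The two "Add new" forms on slides 28 and 30 can be sketched as simple records. This is a minimal illustration, not the tool's actual schema: the class and field names are assumptions, and the only constraint taken from the slides is that the starred fields (Name, Title) are required.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical record for the dataset form; only `name` is required."""
    name: str
    version: Optional[str] = None
    release_date: Optional[str] = None  # free text: the slides give no date format
    url: Optional[str] = None

    def __post_init__(self) -> None:
        # The starred field on the slide must not be empty.
        if not self.name.strip():
            raise ValueError("dataset name is required")


@dataclass
class PublicationRecord:
    """Hypothetical record for the publication form; only `title` is required."""
    title: str
    authors: List[str] = field(default_factory=list)
    year: Optional[int] = None
    url: Optional[str] = None

    def __post_init__(self) -> None:
        # The starred field on the slide must not be empty.
        if not self.title.strip():
            raise ValueError("publication title is required")
```

Keeping every optional field nullable matches the later lesson about minimising data input: a contributor can register an entry with a single value and fill in the rest afterwards.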

  31. The data

  32. The task: Capture which dataset is used in which publication and how

  33. Data representation. Datasets, Publications, Connections between them. schema.org, prov:Entity, ?

  34. Data representation. Datasets, Publications, Connections between them. schema.org, prov:Entity, prov:Derivation
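One way to read the representation on slides 33 and 34 is that datasets and publications are typed as `prov:Entity` (plus a schema.org type) and each connection becomes a `prov:Derivation`. The sketch below builds such triples in plain Python. The PROV-O terms (`prov:Entity`, `prov:wasDerivedFrom`, `prov:qualifiedDerivation`, `prov:Derivation`, `prov:entity`) and the schema.org types (`Dataset`, `ScholarlyArticle`) are real vocabulary terms, but the `http://example.org/` URIs, the function names, and the choice to derive the publication from the dataset are assumptions made for illustration; the slides do not spell out the exact modelling.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

PROV = "http://www.w3.org/ns/prov#"
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base URI, not the tool's real namespace


def usage_triples(publication_id: str, dataset_id: str) -> List[Triple]:
    """Describe 'publication uses dataset' as a qualified PROV derivation."""
    pub = BASE + "publication/" + publication_id
    ds = BASE + "dataset/" + dataset_id
    derivation = BASE + "derivation/" + publication_id + "-" + dataset_id
    return [
        (pub, RDF_TYPE, PROV + "Entity"),
        (pub, RDF_TYPE, SCHEMA + "ScholarlyArticle"),
        (ds, RDF_TYPE, PROV + "Entity"),
        (ds, RDF_TYPE, SCHEMA + "Dataset"),
        # Assumed direction: the publication is derived from the dataset.
        (pub, PROV + "wasDerivedFrom", ds),
        # The qualified form gives the connection its own node to annotate.
        (pub, PROV + "qualifiedDerivation", derivation),
        (derivation, RDF_TYPE, PROV + "Derivation"),
        (derivation, PROV + "entity", ds),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
```

The qualified derivation node is what makes it possible to say how a dataset was used, since a more specific type can be attached to that node instead of the generic `prov:Derivation`.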

  35. The task: Capture which dataset is used in which publication and how

  36. Connections: Publication – Publication; Publication – Dataset; Dataset – Publication; Dataset – Dataset

  37. Connections: Publication – Publication: citation

  38. Connections: Publication – Dataset, Dataset – Publication: mentions, describes, evaluates, analyses, compares

  39. Connections: Dataset – Dataset: extends, includes, overlaps, transformation of, generalisation of

  40. Data representation: Subclasses of prov:Derivation (inverse of Publication-DS)
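The connection types listed on slides 37 to 39 can be written down as a small lookup that checks whether a relation is permitted between two kinds of entity. This is a sketch under assumptions: the relation names are copied from the slides, but the constants, the function names, and the decision to treat Publication – Dataset and Dataset – Publication as one unordered pair are illustrative choices rather than the vocabulary's actual definition.

```python
from typing import Dict, FrozenSet, Tuple

PUBLICATION = "publication"
DATASET = "dataset"

# Keys are alphabetically sorted pairs, so both directions share one entry.
ALLOWED: Dict[Tuple[str, ...], FrozenSet[str]] = {
    (PUBLICATION, PUBLICATION): frozenset({"citation"}),
    (DATASET, PUBLICATION): frozenset(
        {"mentions", "describes", "evaluates", "analyses", "compares"}
    ),
    (DATASET, DATASET): frozenset(
        {"extends", "includes", "overlaps", "transformation of", "generalisation of"}
    ),
}


def allowed_relations(source_kind: str, target_kind: str) -> FrozenSet[str]:
    """Return the relation names permitted between two entity kinds."""
    key = tuple(sorted((source_kind, target_kind)))
    if key not in ALLOWED:
        raise ValueError(f"unknown entity kinds: {source_kind!r}, {target_kind!r}")
    return ALLOWED[key]


def is_valid_connection(source_kind: str, relation: str, target_kind: str) -> bool:
    """Check one proposed connection against the slide vocabulary."""
    return relation in allowed_relations(source_kind, target_kind)
```

A check like this could run on form submission, which would speak to the observation that data is dirty even when it comes from experts.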

  41. The task: Capture which dataset is used in which publication and how

  42. Data representation

  43. Data representation

  44. Bundles

  45. Live crowdsourcing activity 2014: outcomes

  46. Lessons learned: Data is dirty, even coming from experts. Focus on the task; make everything else simpler; minimise data input.

  47. Questionnaire results: Inconclusive results on the suitability of the vocabulary, but interesting answers to "What questions would this information answer for you?": "What are popular datasets?" "Which datasets are facilitators for research on X?" "What publications are related through a dataset (but don't mention each other)?"
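Two of the questions quoted on slide 47 can be answered directly from the collected connections. The sketch below assumes the data is available as plain tuples, `(publication, relation, dataset)` for usages and `(citing, cited)` for citations; that layout and the function names are hypothetical. The question about "research on X" is left out because it needs topic information that the slides do not describe.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Usage = Tuple[str, str, str]  # (publication, relation, dataset)
Citation = Tuple[str, str]    # (citing publication, cited publication)


def _publications_by_dataset(usages: Iterable[Usage]) -> Dict[str, Set[str]]:
    """Group the distinct publications connected to each dataset."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for publication, _relation, dataset in usages:
        grouped[dataset].add(publication)
    return grouped


def dataset_popularity(usages: Iterable[Usage]) -> List[Tuple[str, int]]:
    """'What are popular datasets?': count distinct publications per dataset."""
    grouped = _publications_by_dataset(usages)
    counts = [(dataset, len(pubs)) for dataset, pubs in grouped.items()]
    # Most-used first; ties broken by name so the order is deterministic.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


def related_without_citation(
    usages: Iterable[Usage], citations: Iterable[Citation]
) -> Set[Tuple[str, str]]:
    """Pairs of publications sharing a dataset but not citing each other."""
    cited_pairs = {frozenset(pair) for pair in citations}
    related: Set[Tuple[str, str]] = set()
    for pubs in _publications_by_dataset(usages).values():
        for a, b in combinations(sorted(pubs), 2):
            if frozenset((a, b)) not in cited_pairs:
                related.add((a, b))
    return related
```

Counting distinct publications rather than raw connections keeps a paper that both mentions and evaluates the same dataset from being counted twice.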

  48. Outlook (1): Dimensions of crowdsourcing: What is outsourced; Who is the crowd; How is the task designed; How are the results validated; How can the process be optimised [Quinn & Bederson, 2012]
