A presentation to the NSF
DIGGING INTO DATA
Cascades, Islands, or Streams? Time, Topic and Scholarly Activities in Humanities and Social Science Research
Who is DIGGING INTO DATA
THELWALL, HOLMBERG, LARIVIÈRE, SUGIMOTO, DING, MILOJEVIĆ
What DIGGING INTO DATA
What DIGGING INTO DATA
Time: 1743-2011
Dissertations: 2,307,555
Subjects: 166
Schools: 1,490
Countries: 66
What DIGGING INTO DATA
Time: 1900-2011
Field               Articles      References
Medicine            14,698,810    380,058,817
Social Science       4,228,702     77,908,552
Arts & Humanities    3,151,986     26,180,296
Natural Science     14,853,029    335,144,498
What DIGGING INTO DATA
Time: 2007-2012
Articles: 744,584
Broad subject areas: 7
Matching ISI records: ~50%
What DIGGING INTO DATA
Time: 2010-current
Tweets: 100,000 per month
Subjects: 11
Generalist journals: 4
Scientists and science journalists: 350
What DIGGING INTO DATA
Time: 2006-2012
Videos: 1,202
Views on TED: 620,406,446
Views on YouTube: 111,681,275
Comments on YouTube: 414,311
Why are we DIGGING INTO DATA
• Integrate several datasets representing a broad range of scholarly activities
• Use methodological and data triangulation to explore the lifecycle of topics within and across a range of scholarly activities
• Develop transparent tools and techniques to enable future predictive analyses
Show me the DIGGING INTO DATA
Show me the DIGGING INTO DATA
H = Hedges: lowered certainty (“perhaps”)
B = Boosters: heightened certainty (“absolutely”)
SM = Self-mentions: self-references (“the author”)
AM = Attitude markers: author-text positions (“admittedly”)
EM = Engagement markers: reader positions (“should”)
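As a rough illustration of how the five categories in this legend could be tallied in a text, here is a minimal Python sketch. It is not the project's actual method: the `MARKERS` word lists and the `count_markers` function are hypothetical. Only the first phrase in each list comes from the legend; the second is an assumed extra example.

```python
import re
from collections import Counter

# Hypothetical mini word lists. The first phrase per category is the example
# from the legend; the second is an assumed addition for illustration only.
MARKERS = {
    "H": ["perhaps", "possibly"],           # hedges
    "B": ["absolutely", "clearly"],         # boosters
    "SM": ["the author", "we"],             # self-mentions
    "AM": ["admittedly", "surprisingly"],   # attitude markers
    "EM": ["should", "consider"],           # engagement markers
}


def count_markers(text):
    """Count whole-word occurrences of each category's phrases in text."""
    lowered = text.lower()
    counts = Counter()
    for category, phrases in MARKERS.items():
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            counts[category] += len(re.findall(pattern, lowered))
    return counts


sample = "Perhaps the author should absolutely revise this; admittedly, we agree."
print(dict(count_markers(sample)))
```

A real analysis would need much larger, validated word lists and some handling of context, since a word such as "should" is not always an engagement marker.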
Keep on DIGGING INTO DATA
Comments DIGGING INTO DATA
Analyzing sentiment DIGGING INTO DATA
• We are developing the sentiment analysis software SentiStrength for the texts in the project
• The program will classify the sentiment of texts based upon lexicons of words (e.g., “good”, “bad”) plus special rules for negation, booster words (e.g., “very”), etc.
• The lexicon will be customised for different genres (e.g., “flawed”, “incomplete” for academic texts; “dull”, “inspiring” for videos)
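The general idea of lexicon-based scoring with negation and booster rules can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not SentiStrength's implementation: the word weights, the `NEGATIONS` and `BOOSTERS` sets, and the `score_text` function are all hypothetical, and the two genre lexicons reuse only the example words from the slide.

```python
import re

# Hypothetical weights; SentiStrength's real lexicon and scales are not shown here.
BASE = {"good": 2, "bad": -2}
ACADEMIC = {**BASE, "flawed": -2, "incomplete": -1}   # genre: academic texts
VIDEO = {**BASE, "dull": -2, "inspiring": 3}          # genre: video comments

NEGATIONS = {"not", "never", "no"}       # assumed negation words
BOOSTERS = {"very": 1, "extremely": 2}   # assumed booster words and strengths


def score_text(text, lexicon):
    """Return (positive_total, negative_total) for text under a given lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive, negative = 0, 0
    boost, negate = 0, False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        if token in BOOSTERS:
            boost += BOOSTERS[token]
            continue
        strength = lexicon.get(token, 0)
        if strength:
            # Boosters increase the magnitude of the sentiment word that follows.
            strength += boost if strength > 0 else -boost
            # Negation flips the polarity of the sentiment word that follows.
            if negate:
                strength = -strength
            if strength > 0:
                positive += strength
            else:
                negative += strength
        # Modifiers apply only to the next non-modifier token, then reset.
        boost, negate = 0, False
    return positive, negative


print(score_text("not very good", ACADEMIC))
print(score_text("very inspiring talk", VIDEO))
```

Passing the lexicon as a parameter mirrors the genre customisation described on the slide: the same word can be scored under one genre's lexicon and ignored under another's.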
Lead-lag analysis DIGGING INTO DATA
After DIGGING INTO DATA Scott Weingart
Towards a new model DIGGING INTO DATA
Dimensions: Formal vs. Informal; Published vs. Unpublished
Processes: Production; Dissemination
Roles: Producer; Prosumer; Consumer
Genres: Book Review; Curated DB; Multimedia; Article; Slideshow; Blog; Conf. paper; Report; Email; Draft; Tweet
Questions about DIGGING INTO DATA
Cassidy R. Sugimoto (PI)
Assistant Professor
School of Library and Information Science
Indiana University Bloomington
sugimoto@indiana.edu