Network Analytics meets Text Mining for Social Media Analysis

Dr. Rosaria Silipo

Social Media Data: Water, Water Everywhere, and not a drop to drink

Social Media Data: Water, Water Everywhere, and not a drop to drink
• What companies do with it:
  • Download and keep
  • Topic [Shift] Detection (email content routing, detecting market interest shifts, clinical studies, querying unstructured DBs, ...)
  • Sentiment Analysis (marketing, polls, elections, ...)
  • Connection Analysis (influencers, risk analysis, ...)
  • ...

Social Media Data: Water, Water Everywhere, and not a drop to drink
• The Analysis Tools:
  • Web Crawlers
  • Visual Exploration
  • Topic Detection (Text Mining, NLP, Ontologies)
  • Sentiment Score (Text Mining, NLP)
  • Influence Score (Network Analytics)
  • Find Groups (Predictive Analytics)

Case Study Example: Slashdot Data
• Basic Numbers:
  • 24532 users
  • 491 threads with
    • 15 to 843 responses
    • 12 to 507 users
  • 113505 posts
  • 60 main topics
• Selected Topic: Politics
[Figure labels: Post, Comments]

Case Study Example: Slashdot
• Very rich data sources about customers!
• We want to establish:
  • How users feel about the discussed topic (Sentiment Analysis)
  • Whether it matters how users feel (Network Analytics)
  • A more general abstraction of the results (Clustering)

Sentiment Analysis (workflow annotations)
• Remove anonymous users, group by PostID
• Words tagging with the MPQA Corpus: positive words, negative words
• BoW, Entity Filter, Word Frequency, attitude calculation by document
• Total attitude by user
• User bins
• Word cloud for selected users
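The steps listed on this slide can be sketched in a few lines of Python. This is only an illustration, not the KNIME workflow itself: the two tiny word sets stand in for the MPQA lexicon, the attitude formula (positive count minus negative count), the mean over a user's posts, and the anonymous user name `Anonymous Coward` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Stand-in lexicons (assumption); the slides use the MPQA Subjectivity Lexicon.
POSITIVE = {"good", "great", "freedom", "support"}
NEGATIVE = {"bad", "corrupt", "fail", "hate"}
ANONYMOUS = "Anonymous Coward"  # assumed name of the anonymous account


def attitude(text):
    """Bag-of-words attitude of one post: positive hits minus negative hits."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return positive - negative


def user_attitudes(posts):
    """posts: iterable of (user, text). Returns {user: mean attitude per post}."""
    per_user = defaultdict(list)
    for user, text in posts:
        if user == ANONYMOUS:  # remove anonymous users
            continue
        per_user[user].append(attitude(text))
    return {user: sum(vals) / len(vals) for user, vals in per_user.items()}


def bin_user(score):
    """Map a user's attitude score to one of three bins."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    ("alice", "Great support, good law"),
    ("bob", "corrupt and bad"),
    (ANONYMOUS, "hate"),
]
scores = user_attitudes(posts)
bins = {user: bin_user(score) for user, score in scores.items()}
```

Like the approach described in the Notes slide, this sketch does no sentence splitting and ignores negation, so "not good" still counts as positive.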

Slashdot – Text Mining • Most Negative User: pNutz

Slashdot – Text Mining • Most Positive User: dada21

Slashdot – Sentiment Analysis
• 16016 positive users
• 7107 negative users
• Most positive user: dada21 (2838 positive / 1725 negative words)
• Most negative user: pNutz (43 positive / 109 negative words)
• Which topics do positive users have in common?
  • Government
  • People
  • Law/s
  • Money
  • Market
  • Parties

Network Creation
[Figure: example network with nodes User1, User2, User3, User4, User5, User6]

Topic Graphs

Topic Graph: NASA

Topic Graph: Sci-Fi

Hubs & Authorities
• Hubs = Followers
• Authorities = Leaders
Workflow annotations:
• Filtering anonymous users and creating network
• Centrality index to define hub weight and authority weight
• Users with hub and authority weights and other features
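Hub and authority weights are commonly computed with the HITS iteration, sketched below in plain Python. The slides do not say how the KNIME node computes them or how edges are defined, so treat this as an assumption: here an edge `(a, b)` means user `a` replied to user `b`, and the user names are made up.

```python
def hits(edges, iterations=50):
    """Return (hub, authority) dicts for a directed edge list [(src, dst), ...]."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub weights of the users pointing at the node.
        new_auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        norm = sum(v * v for v in new_auth.values()) ** 0.5 or 1.0
        auth = {node: v / norm for node, v in new_auth.items()}
        # Hub update: sum of authority weights of the users the node points at.
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = sum(v * v for v in new_hub.values()) ** 0.5 or 1.0
        hub = {node: v / norm for node, v in new_hub.items()}
    return hub, auth


# Hypothetical reply network: three users reply to "lead", and u1 also replies to u2.
edges = [("u1", "lead"), ("u2", "lead"), ("u3", "lead"), ("u1", "u2")]
hub, auth = hits(edges)
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
```

In this toy network the user everyone replies to ends up with the largest authority weight, and the user who replies most ends up with the largest hub weight, which matches the "leaders" and "followers" reading on the slide.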

Hubs & Authorities
[Figure; labeled users: dada21, Carl Bialik from the WSJ, Tube Steak, Doc Ruby, pNutz, 99BottlesOfBeerInMyF]

KNIME: Bringing it all together
• Network Analysis: users with hub and authority weights and other features
• Text Analysis: user bins (positive, negative, neutral)

[Figure; labeled users: dada21, Carl Bialik from the WSJ, Tube Steak, Catbeller, Doc Ruby, WebHosting Guy, 99BottlesOfBeerInMyF, pNutz]

What we have found ...
• The positive leaders
• The neutral leaders
• The negative leaders
• The inactive users
• What identifies each group?
• How do I identify a new user?
• How do I handle each user?

Why Clustering?
• No a priori knowledge (not even on a subset of users)
• Prediction and interpretation capabilities required
• k-Means algorithm

Re-sampling the Training Set k = 10
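For readers unfamiliar with k-Means, here is a minimal plain-Python sketch of the algorithm. It is not the KNIME node: the feature layout (hub weight, authority weight, attitude) and the toy rows are assumptions, and the example uses k = 2 only because the toy data has six rows, whereas the slide uses k = 10.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # assignments no longer change
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical users as (hub weight, authority weight, attitude) rows.
users = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0),
]
centroids, labels = kmeans(users, k=2)
```

With real data the features would normally be normalized first, so that no single column dominates the distance.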

The k-Means Clusters

The k-Means Clusters
[Figure labels: Superfans, Neutral users, Fans, Negative users]

Additional Discoveries
• There are only very few real leaders! Authority and hub scores identify active participants rather than leaders.
• Superfans can be found in cluster_3.
• Negative and (sigh!) active users are collected in cluster_1.
• Neutral users are usually inactive (cluster_2, cluster_7, and cluster_8).
• Positive users with different degrees of activity are scattered across the remaining clusters.

The Operational Workflow
• Cluster Extraction
• Pre-processing
• Assignment of new data
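Assigning a new user to an existing cluster can be sketched as nearest-centroid lookup after applying the same pre-processing used in training. The min-max normalization, the two-column feature layout, and the centroid values below are assumptions for illustration; the slides do not specify the pre-processing.

```python
def fit_min_max(rows):
    """Learn per-column minima and maxima from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize(row, mins, maxs):
    """Scale a row to [0, 1] per column using the training minima and maxima."""
    return tuple(
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, lo, hi in zip(row, mins, maxs)
    )


def assign(row, centroids):
    """Index of the closest centroid (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(row, centroids[i]))


# Hypothetical training rows and centroids extracted earlier (normalized space).
training_rows = [(0, 10), (10, 30)]
centroids = [(0.0, 0.0), (1.0, 1.0)]

mins, maxs = fit_min_max(training_rows)
cluster_of_active_user = assign(normalize((9, 28), mins, maxs), centroids)
cluster_of_quiet_user = assign(normalize((1, 12), mins, maxs), centroids)
```

The important design point is that the minima and maxima come from the training set, not from the new data, so new users are placed in the same feature space as the extracted clusters.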

Notes
• MPQA Corpus: publicly available Subjectivity Lexicon (http://www.cs.pitt.edu/mpqa/lexicons.html)
• User Characterization is Sum -> Mean
• NLP: no sentence splitting, no negation identification.
• For a more refined syntax-based sentiment analysis -> "External Tool" node

External Tool Node
• The "External Tool" node executes any external program from the command line
• Writes input data to an input file
• Calls the tool to run on the input file with command-line options and to write its results to an output file
• Reads the output file and presents the data at the output port
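The file-in, file-out pattern described on this slide can be mimicked in Python with `subprocess`. This is a sketch of the pattern, not of the KNIME node: the CSV file format, the convention of passing the input and output paths as the last two arguments, and the stand-in tool (a one-line Python script that upper-cases its input) are all assumptions.

```python
import csv
import subprocess
import sys
import tempfile
from pathlib import Path


def run_external_tool(rows, command):
    """Write rows to a file, run `command <input> <output>`, read the result back."""
    with tempfile.TemporaryDirectory() as tmp:
        in_path = Path(tmp) / "input.csv"
        out_path = Path(tmp) / "output.csv"
        # 1. Write input data to an input file.
        with in_path.open("w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        # 2. Call the tool on the input file; fail loudly on a non-zero exit code.
        subprocess.run(command + [str(in_path), str(out_path)], check=True)
        # 3. Read the output file and return its rows.
        with out_path.open(newline="") as handle:
            return list(csv.reader(handle))


# Stand-in "tool" (hypothetical): copies the input file to the output file in upper case.
TOOL_SCRIPT = (
    "import sys\n"
    "text = open(sys.argv[1]).read()\n"
    "open(sys.argv[2], 'w').write(text.upper())\n"
)

result = run_external_tool(
    [["user", "text"], ["pnutz", "bad"]],
    [sys.executable, "-c", TOOL_SCRIPT],
)
```

Because the tool is launched once per call and must finish on its own, this pattern only works with programs that can run non-interactively.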

Alternative Sentiment Analysis
• No free, non-interactive command-line tools for sentiment analysis were found
• SentiStrength v2.2 (still interactive)
• External Tool and Generic Web Service Client

Web Crawling Workflow
• Community Web Crawler Node
• XML Parsing Nodes

Next Steps
• Integrate topic information
• Integrate user demographic and behavioural information
• Discover [time series] patterns for early detection of negative users and superfans
• Try other techniques, maybe even on manually segmented data, to discover new user segments

Where do I find more?
• Whitepaper: rosariasilipo@yahoo.com
• Complete Workflows + Data: www.knime.com
  • textmining
  • networkmining
  • combinedanalysis
  • (note: the above 3 process huge data and require 16 GB of memory)
  • clustering
• Open Source Software: KNIME, www.knime.com

Next Appointment
• User Day US Boston (free)
• October 22nd 2013, 10:00 - 17:00
• Microsoft New England R&D Center (NERD)
• One Memorial Drive, Suite 100, Cambridge
• http://www.knime.com/user-day-boston-2013

Hands-on Session • 1. Download KNIME from www.knime.com

Hands-on Session
• 2. Install Extensions
  • Help -> Install New Software
  • Select: KNIME & Extensions
  • In KNIME Labs Extensions, select:
    • KNIME Network Mining
    • KNIME Textprocessing

Hands-on Session
• 3. Get workflows and Slashdot data
  • Get workflows from USB stick (KNIMEBoston2013.zip):
    • Text Mining
    • Network Analytics
    • Text and Network Mining
    • Social Media Clustering
  • Slashdot Raw Data is included in the downloaded workflows
  • A smaller data set, Slashdot Reduced Data, is available for lower memory requirements
  • Both data sets are available from the USB stick

Hands-on Session • 4. Import Workflows

Hands-on Session
• Memory Increase in knime.ini (the -Xmx line sets the maximum Java heap size):
    -startup
    plugins/org.eclipse.equinox.launcher_1.2.0.v20110502.jar
    --launcher.library
    plugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.100.v20110502
    -vmargs
    -Xmx2G
    -XX:MaxPermSize=256m
    -server
    -Dsun.java2d.d3d=false
    -Dosgi.classloader.lock=classname
    -XX:+UnlockDiagnosticVMOptions
    -XX:+UnsyncloadClass
    -Dknime.enable.fastload=true
    -Djava.library.path=C:\Users\rosy\Documents\R\win-library\2.15\rJava\jri\x64

Hands-on Session
• 5. Improve Workflows: Text Mining
[Workflow sections: Data Preprocessing, Data Reading, Scoring and Tag Cloud, Tagging Words, Reading Tag Corpus, BoW]

Hands-on Session
• 6. Improve Workflows: Network Analytics
[Workflow sections: Visualize Network, Create Network Object, Data Reading and Preprocessing, Clean up Network]

