SOCIAL MEDIA MINING what is it GOOD for? and when is it good enough? Nick Buckley SoShall Consulting. asc Funky Data 25th September 2012. The Plan. What is Social Media Mining? [SMM] How do Market Researchers tend to think about it? Nuts & Bolts – practical outcomes
SOCIAL MEDIA MINING: what is it GOOD for? and when is it good enough? Nick Buckley, SoShall Consulting. asc Funky Data, 25th September 2012
The Plan
• What is Social Media Mining? [SMM]
• How do Market Researchers tend to think about it?
• Nuts & Bolts – practical outcomes
• Challenges and Constraints
• [How] Do these make Researchers re-think the ‘place’ of SMM?
• Where will it go from here?
BUT:
• Assumption of a vendor/researcher distinction – even if in house
• No naming or comparing of vendors/applications
• Difficult to judge where to pitch the basics – too familiar vs. too abstract
What exactly are we talking about? What they say: Definition* of social media monitoring: “Social Media Monitoring (SMM) means the identification, observation, and analysis of user-generated social media content for the purpose of market research.” * http://www.social-media-monitoring.org
What are we talking about? Social Media...
• Newsgroups
• Blogs/Microblogs
• News sites
• Video sites
• Forums
• Review sites: Professional & Consumer
• Public Communities
• Client sites
What’s in a word? For some time marketing and PR professionals have been monitoring Social Media to capture headline ‘buzz’ in real time, and to detect sudden changes requiring a response. But collecting and counting this content is only the beginning of a process which can add value via many techniques… including integration with other sources such as market research data. GfK NOP currently prefers “Mining”. User generated content in social media lays down a rich seam of activity, opinion, thought and information… mess, echoes and ‘whimsy’.
Sony brand damage was driven by PlayStation breach (2011) [Charts: Sony buzz and sentiment this year; Sony buzz and sentiment in April; PlayStation buzz and sentiment]
Market Researchers believe that SMM can also give clients a window on other dimensions of online conversations. SMM provides insights into:
Category Dynamics
• Consumer needs
• Problems and issues consumers discuss
• Product usage discussions
• New product entries & trends in purchase intention
Corporate
• Corporate mentions related to reputation
• Crises
• Social issues
Brand/Product
• Brand/sub-brand mentions, brand “buzz”
• Number of positive vs. negative sentiments for each brand – including customer service
• Brand content analysis, what’s being said about brand
• Advertising noticed most and related discussion – launch tracking
• Source of mentions (specific sites) and the most influential sites
Competition
• All the above for preference & competition
Market Researchers are fitting SMM into different places within method or process
As a precursor to traditional Market Research
• Refining hypotheses for research design
• Prioritising criteria – identifying new ones
• Defining or qualifying the competitive set
• Identifying niche respondents for small-scale studies
As a successor to traditional Market Research
• Tracking the impact of implemented findings
• Monitoring for events which may create discontinuities in this
• Low intensity/low detail follow-up
As a companion to traditional Market Research
• Compare and contrast – e.g. unconditioned
• Add granularity to satisfaction drivers
• Complement reach
• Interpolate lengthy studies
So can SMM research stand alone? Is there a hierarchy, within these hybrid uses, of ‘best fit’? Does the story change if you get longitudinal with a category? To what extent do some of these uses assume that the data can be treated like conventional MR data? In any case – should it be treated and analysed thus?
But inevitably they think about comparison with surveys…
• You can ‘ask a new question’ without having to issue a new questionnaire*
• Unconditioned by participant awareness of a research process, often more emotive than considered survey responses
• Low cost - under certain circumstances
• Spontaneously generated content - unconstrained by research frame
• Offers insight into active social media users
• Potentially global
• Very immediate

• Not necessarily representative of the general population
• Difficult to weight back to general population, as demographic data is sparse
• Automated sentiment analysis only as good as the algorithms [and these vary greatly]
• Automated harvesting can capture a lot of ‘noise’ for certain words or brands
• No guarantee of sufficient data
• Costs rise when we use supplementary analysis to overcome some of these issues
*within certain technical limitations
Different approaches for different client needs. For example - Precision Extraction vs ‘Trawl & Filter’
[Diagram with two axes]
• Post processing: from “Accept raw data output from application” to “More post processing, applied to data by GfK - to reduce noise and refine sentiment attribution”
• Data volumes: from “Lower data volumes from targeted & compound search terms” to “Higher data volumes from simple search terms”
[Approaches placed on the diagram]
• Quantitative - Brand tracking and integration with traditional research
• Indicative Qual e.g. using trends and volumes to guide focus of analysis
• Crude mention & mood tracking
• Exploratory Qual – more complex collection. Manually manageable volumes and ‘tuning’
The raw material - Results from search terms
• SMM applications extract results from wholesale supplies of data, conducting searches defined by “search terms”
• A search term can be anything from a simple and distinctive brand or product name, to a complex expression configured to capture discussions about a category or concept
• Search terms combine words or phrases
  • via logical instructions such as AND, OR, NOT
  • by employing functions such as WITHIN to detect words in a certain proximity to each other
  • with brackets that can dictate the sequence in which instructions are applied, e.g. “word1” AND ( “word2” OR “word3” )
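The way such search terms combine can be sketched in a few lines of Python. This is an illustration only, not any vendor's query syntax: the helper names (tokens, has, all_of, any_of, negate, within) and the sample posts are assumptions made for the example.

```python
import re

def tokens(text):
    """Split a post into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

# Each helper returns a predicate that takes a token list and returns a bool.
def has(word):
    return lambda toks: word.lower() in toks

def all_of(*preds):   # logical AND
    return lambda toks: all(p(toks) for p in preds)

def any_of(*preds):   # logical OR
    return lambda toks: any(p(toks) for p in preds)

def negate(pred):     # logical NOT
    return lambda toks: not pred(toks)

def within(a, b, distance):
    """WITHIN: true when a and b occur at most `distance` tokens apart."""
    def pred(toks):
        pos_a = [i for i, t in enumerate(toks) if t == a.lower()]
        pos_b = [i for i, t in enumerate(toks) if t == b.lower()]
        return any(abs(i - j) <= distance for i in pos_a for j in pos_b)
    return pred

# "playstation" AND ( "breach" OR "hack" ) AND NOT "giveaway"
query = all_of(has("playstation"),
               any_of(has("breach"), has("hack")),
               negate(has("giveaway")))

# Hypothetical posts, invented for illustration.
posts = [
    "PlayStation breach is all over the news",
    "Win a PlayStation in our giveaway, no hack needed",
    "My new phone is great",
]
matches = [p for p in posts if query(tokens(p))]
print(matches)
```

Nesting the helper calls plays the role of the brackets: the inner any_of is evaluated as a unit before it is combined with the outer all_of.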
Typical SMM application offers a dashboard view of data returned by these search terms – and the facility to export the underlying data
Analyses
• Whatever the Search Terms define – here is what can be measured about the results returned… in combination or in isolation
• Channels: “where on the web is it being talked about… twitter, blogs, forums, comments?”
• Volume: “how much is it talked about, and how is this changing over time?”
• Verbatims: drill-down to individual posts, in their own words – “what do people actually say?”
• People: “who is talking about it?” That may be by influence – according to various proprietary indices – or by demographics [to be used with caution]
• Themes: “what other words and phrases are most regularly associated with it?”
• Location: “where in the world is it being talked about?”
• Sentiment: Across all of these variables is superimposed automatically generated “Sentiment” analysis – positive, negative or neutral language associated with the subject of the posts…
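As a rough illustration of how Volume, Channels and Sentiment can be combined once the underlying data has been exported, here is a minimal Python sketch. The field names (date, channel, sentiment), the sample rows and the net_sentiment measure are all assumptions; a real export will have its own columns and a real application its own indices.

```python
from collections import Counter

# Hypothetical export rows, invented for illustration.
posts = [
    {"date": "2012-09-03", "channel": "twitter", "sentiment": "negative"},
    {"date": "2012-09-03", "channel": "twitter", "sentiment": "neutral"},
    {"date": "2012-09-03", "channel": "forums", "sentiment": "positive"},
    {"date": "2012-09-04", "channel": "blogs", "sentiment": "positive"},
    {"date": "2012-09-04", "channel": "twitter", "sentiment": "negative"},
]

# Volume over time, volume by channel, and sentiment split by channel.
volume_by_day = Counter(p["date"] for p in posts)
volume_by_channel = Counter(p["channel"] for p in posts)
sentiment_by_channel = Counter((p["channel"], p["sentiment"]) for p in posts)

def net_sentiment(rows):
    """(positive - negative) / total posts; 0.0 when there are no rows."""
    if not rows:
        return 0.0
    counts = Counter(r["sentiment"] for r in rows)
    return (counts["positive"] - counts["negative"]) / len(rows)

twitter = [p for p in posts if p["channel"] == "twitter"]
print(volume_by_day)
print(volume_by_channel)
print(net_sentiment(posts), net_sentiment(twitter))
```

With these made-up rows the overall net sentiment is balanced while Twitter alone is net negative, which is the kind of difference a Channels + Sentiment combination is meant to surface.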
Combinations of these basics tell different types of story
• Brand A’s new ad was mainly discussed on Forums when it was being shot by a famous pop star, but was mainly discussed on Twitter when it was being aired. [Volume + Channels]
• Automotive brand X is associated mainly with topics around performance, whereas brand Y is associated with comfort and style. Both enjoy roughly the same level of positive sentiment overall. [Themes + Sentiment]
• Beverage brand N enjoyed a bigger ‘spike’ in its mentions when news of a future big game at a sponsored venue was announced, than it got from a tournament sponsorship that was live at the time. [Volume vs. Offline Schedule]
• Some ‘general’ social Forum sites enjoy bigger concentrations of discussion of a particular topic than specialist Forums dedicated to that same topic! [Channels + Themes + People]
Examples of outcomes from SMM studies • Consumers don’t always talk about the product features that you highlight. • Differentiate ‘trade press’ buzz from real engagement. • Places where naturally occurring discussion of a category offers an opportunity for brands to ‘intercept’ rather than try to create competing social media conversations. • ‘The world’ can sometimes throw up more interesting stories about you than you could hope to generate for yourself… but not always with the connotations you would like. • Focus on the right social media channels at the right time.
There are many forces which erode this nice model… Accuracy? Reach? Relevance? [Reach image from titletrack.com]
Accuracy • Is the searched-for phrase even in the returned “snippet”? • Is it ‘real content’ – or is it • Navigation? • Ticker or title content? • Ad Content? • Various species of spam [overlaps with ‘Relevance’]? • Is meta-data about the poster • Present? • Reliable? • Understanding this, apart from making your own manual checks, is about understanding your 3rd party vendor and, often, their ‘wholesale data suppliers’ in turn.
Reach
“[T]here are known knowns; there are things we know that we know. There are known unknowns; that is to say there are things that we now know we don't know. But there are also unknown unknowns – there are things we do not know we don't know.” Donald Rumsfeld
• Are these results from scrutiny of the entire [English speaking] social web? No
• Are they results from a very large, sometimes stated, number of social sources? Yes
• Could this range be skewed relative to the subject under scrutiny? Yes
• Where it’s Twitter data – is it from the whole of Twitter? Maybe
• Is historical data always on the same basis as current data, or data gathered since the search was defined? Not always
• Do we always have a good idea of what the ‘Reach’ is? No
Relevance
Even when the application has collected exactly what we asked for, and it is legitimate content, with some nice useful data about the poster… it might not be relevant:
• “Cats are great company.”
• “#EMT Bolt one cool cat!”
• “Also, the Cat is a great resort”
• “I love my aunt Cat!”
• “I think Cat Stark is worse than any Lanister.”
• “I think this hurricane was a scam cooked up by the fat cats in Big Grocer.”
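The six posts above make a handy test case. The Python sketch below compares a simple search for “cat” with a compound one that also requires a context word. The context list is an assumption invented for this example, and a real study would need to tune it by hand.

```python
import re

posts = [
    "Cats are great company.",
    "#EMT Bolt one cool cat!",
    "Also, the Cat is a great resort",
    "I love my aunt Cat!",
    "I think Cat Stark is worse than any Lanister.",
    "I think this hurricane was a scam cooked up by the fat cats in Big Grocer.",
]

# Simple search term: the word "cat" or "cats", in any case.
CAT = re.compile(r"\bcats?\b", re.IGNORECASE)

# Hypothetical context words suggesting the post is about pet cats.
CONTEXT = {"company", "pet", "pets", "kitten", "vet"}

def mentions_cat(post):
    return bool(CAT.search(post))

def looks_relevant(post):
    """Compound term: mentions a cat AND contains at least one context word."""
    words = set(re.findall(r"[a-z]+", post.lower()))
    return mentions_cat(post) and bool(words & CONTEXT)

naive = [p for p in posts if mentions_cat(p)]
refined = [p for p in posts if looks_relevant(p)]
print(len(naive), len(refined))
print(refined)
```

The simple term returns all six posts; the compound term keeps only the first. That is the trade-off between higher volumes from simple terms and lower, cleaner volumes from targeted ones, and a context list this crude would also discard plenty of genuinely relevant posts.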
… put another way Oh s**t! I forgot it’s still the internet.
Other challenges include… However , commencing too early public smoking facts will just overstress your pet ; quite a fresh pet will not learn everything from services. Just after he has ended up perched for some a few moments, supply him with the particular take care of, plus for instance in advance of, make sure you compliment the pup. When dog house teaching your dog, continue to keep the dog house in the vicinity of the spot where you as well as the canine are usually conversing.
And I haven’t mentioned automated Sentiment Analysis yet! Irony – really? Slang/Dialect/Register Multiple meanings – “50 strong” Adjacent subjects – “My beautiful FIAT next to a BMW”
To Recap • SMM tools make it very easy to “Super Google” certain Brands, people, objects and even categories or concepts – quickly generating tables and charts. • But underneath there’s a complex story about accuracy, reach and relevance… which you only really see when you drill down… and which you only really understand by getting inside the provider’s systems and sources. • The fact that this isn’t blazoned across all dashboards, is about the fact that many solution providers started out somewhere else… with monitoring. It’s not that they should have anticipated our needs. • Sentiment analysis is only part of this story – it doesn’t define it.
Relationships matter as much as technology
[Diagram: parties in the chain]
• Social Media Content
• Wholesalers? [FEEDS]
• 3rd Party System [e.g. SaaS], run by a 3rd party organisation, the “Vendor”
• Dashboard-wielding MR Agency
• Clients
[Diagram: flows between the parties]
• “Results”
• Reports [inc post hoc analysis]
• Modified searches
• Customise Engine
• Customise Feeds
• Topic-specific feedback
• Queries and more refined requirements
Natural Language Processing [NLP] to the rescue? Definition: “Specifically, it is the process of a computer extracting meaningful information from natural language input and/or producing natural language output”* Many SMM applications now claim some level of NLP. Although this may legitimately be contrasted with simpler analysis of vocabulary combinations, and with probabilistic methods, it sometimes means little. It may only mean that some rules of language have been ‘attended to’ in what is still essentially a pattern-matching exercise. *Warschauer, M., & Healey, D. (1998). Computers and language learning: An overview
But clearly sophisticated NLP can make a big difference
• Improved Accuracy – including filtering out of unstructured spam
• More tools available to achieve/check Relevance
• Much-improved Sentiment Analysis
Trends:
• there’s more NLP – not just in social media analysis,
• there’s more commercially affordable NLP and it keeps getting better,
• some of it is even helpfully self-auditing.
Significantly, when NLP is set to retain only high-confidence classifications, volumes of results are dramatically reduced.
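The effect of a confidence cut-off on volume can be seen with a toy example in Python. The posts, labels and confidence scores below are invented for illustration, and retain is a hypothetical helper rather than a feature of any particular application.

```python
# Each tuple is (text, predicted sentiment, classifier confidence).
# All values are made up for this sketch.
classified = [
    ("Love the new handset", "positive", 0.96),
    ("Worst customer service ever", "negative", 0.93),
    ("Oh great, another update", "positive", 0.55),
    ("The team is 50 strong", "positive", 0.48),
    ("My beautiful FIAT next to a BMW", "positive", 0.61),
    ("Delivery arrived on Tuesday", "neutral", 0.82),
]

def retain(results, min_confidence):
    """Keep only classifications at or above the confidence threshold."""
    return [r for r in results if r[2] >= min_confidence]

for threshold in (0.0, 0.6, 0.9):
    kept = retain(classified, threshold)
    print(f"threshold {threshold:.1f}: {len(kept)} of {len(classified)} posts kept")
```

In this made-up set, raising the threshold to 0.9 leaves only two of six posts. The ironic and ambiguous ones are the first to go, which is the price paid for cleaner sentiment figures.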
Barking up the wrong Tree? Researchers’ instincts have been to use, and so judge, SMM like survey data. But “what is good”, the ancient philosophers would tell us, is really about function and purpose. I think we’ve now learned enough about SMM to stop and ask… “what was it we were trying to do?”
Remind me what we are trying to do? • Use the social web as a proxy for the population? • Understand how the social web is responding – for the benefit of those solely interested in this sub-set of the population as a channel or marketplace? • Access particular niches which are more concentrated online than off? • Detect significant events? • Measure shifts and changes? • Make rough comparisons? • Discover new insights, themes and connections?
Different client needs indicate different SMM approaches. For example - Precision Extraction vs ‘Trawl & Filter’
[Diagram with two axes]
• Post processing: from “Accept raw data output from application” to “More post processing, applied to data by MR agency - to reduce noise and refine sentiment attribution”
• Data volumes: from “Lower data volumes from targeted & compound search terms” to “Higher data volumes from simple search terms”
[Approaches placed on the diagram, with verdicts]
• Quantitative - Brand tracking and integration with traditional research: Not radical enough!
• Indicative Qual e.g. using trends and volumes to guide focus of analysis: Sensible
• Crude mention & mood tracking
• Exploratory Qual – more complex collection. Manually manageable volumes and ‘tuning’: Too much like hard work?
Rather than wait for NLP utopia… Settle, for now, on:
• SMM as a powerful and novel Qual exploration tool
• Big number crunching, on single terms, that takes a “hyena” approach, i.e.
  • Accept all* occurrences of a brand or product name in posts as an indication of significance… even the ‘trending’ spam and the adverts and the competitions…
  • Look for pure correlations between words/phrases and other words/phrases…
  • Or between trends in these numbers and classes of offline events – such as sales, complaints and other behaviours… with a view to predicting, explaining or causing such events in the future.
*Except for the most obvious duplication errors such as over-indexing
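One simple way to look for a relationship between a trend in mentions and a class of offline events is a lagged Pearson correlation. The Python sketch below assumes weekly counts. The figures are invented, lagged is a hypothetical helper, and a high correlation here would only suggest an association worth investigating, not a cause.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)

def lagged(mentions, events, lag):
    """Correlate mentions in week t with events in week t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(mentions, events)
    return pearson(mentions[:-lag], events[lag:])

# Hypothetical weekly series: raw brand mentions and offline complaints.
mentions = [120, 150, 400, 180, 160, 390, 170, 150]
complaints = [10, 12, 11, 35, 14, 13, 33, 15]

for lag in (0, 1, 2):
    print(f"lag {lag} weeks: r = {lagged(mentions, complaints, lag):.2f}")
```

In this made-up series the spikes in mentions line up with complaints one week later, so the correlation is much stronger at a lag of one week than at zero. Trying a few lags is a cheap first check on whether social volume might lead an offline measure.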
Talking Points How will commercial SMM applications and services with the best accuracy, reach and relevance capabilities be recognised, validated and promoted? If you’re a researcher and you want to use this stuff, for the first time, tomorrow… what must be done? Fortunately – there’s enough to learn by “super-googling”, browsing and crude trend tracking to keep us going… and learning… for some time to come. Is that, whilst pragmatic, enough of an ambition?
Babita Earle Digital Strategy Director GfK NOP Tel: 020 7890 9467 E: babita.earle@gfk.com Dr Nick Buckley SoShall Consulting Tel: 07958 516967 t: @grimbold E: nick@soshall.net