Best of UseR! 2011. A personal & biased view with an emphasis on data visualisation. Andy Pryke, Andy@the-data-mine.co.uk. Birmingham R User Meeting, 20th March 2012. My Bias… I work in commercial data mining, data analysis and data visualisation
Best of UseR! 2011
• A personal & biased view with an emphasis on data visualisation
• Andy Pryke
• Andy@the-data-mine.co.uk
• Birmingham R User Meeting
• 20th March 2012
My Bias…
• I work in commercial data mining, data analysis and data visualisation
• Background in computing and artificial intelligence
• Use R to write programs which analyse data
Using Google Visualisation API from R
• Speaker: Markus Gesmann, Lloyds
• Motivation: Display statistics about publications on a website
• 18 different charts are available through the Google API
• Requires internet access; charts are viewed in a web browser
• Data is embedded in HTML, with a call to Google's JavaScript visualisation API
• Using rApache you can mix HTML & R (a bit like Sweave)
• Can update the data & look of a chart from R by modifying the object returned by the plotting method
Google Visualisation API - Code
    install.packages("googleVis")
    library("googleVis")
    demo("googleVis")            # run the main demo
    demo(package="googleVis")    # list all demos in the package
    # Example from demo:
    require(datasets)
    states <- data.frame(state.name, state.x77)
    GeoStates <- gvisGeoChart(states, "state.name", "Illiteracy",
                              options=list(region="US", displayMode="regions",
                                           resolution="provinces",
                                           width=600, height=400))
    plot(GeoStates)              # opens the chart in a web browser
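The point that the data is embedded in the HTML can be checked by looking inside the object that gvisGeoChart returns. The sketch below is an illustration rather than anything from the talk: it assumes the googleVis package is installed (it is skipped otherwise), and the variable names geo and chart_html are made up for the example.

```r
# Sketch only: skipped when googleVis is not installed.
chart_html <- NULL
if (requireNamespace("googleVis", quietly = TRUE)) {
  states <- data.frame(state.name, state.x77)
  geo <- googleVis::gvisGeoChart(
    states, locationvar = "state.name", colorvar = "Illiteracy",
    options = list(region = "US", displayMode = "regions",
                   resolution = "provinces", width = 600, height = 400)
  )
  # A gvis object is a list; geo$html$chart holds the JavaScript and <div>
  # fragments that make up the chart, including the data itself.
  chart_html <- paste(unlist(geo$html$chart), collapse = "\n")
  # Show the start of the generated markup.
  cat(substr(chart_html, 1, 300), "\n")
  # To save the fragment for embedding in another page:
  # cat(chart_html, file = "geo_states.html")
}
```

Building the object needs no internet connection; only displaying the chart in a browser does, because the browser has to load Google's JavaScript.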
Google Visualisation API – More info
• Use at Lloyds: http://lloyds.com/stats
• Video demo: http://goo.gl/zfQdG
More Information…
• In use on Lloyds website: http://lloyds.com/stats
• Original Slides: http://web.warwick.ac.uk/statsdept/user-2011/TalkSlides/Contributed/16Aug_0950_Kaleid_Ib_2-Gesmann.pdf (includes a good list of other interesting packages)
Nomograms for visualising relationships between three variables
• Jonathan Rougier, Dept Mathematics, Univ. Bristol
• Kate Milner, Crossroads Veterinary Centre, Buckinghamshire
How to Use R, in a Moroccan Marketplace, to Improve the Life of Donkeys
• It's hard to weigh donkeys in North Africa, but useful to know their weight when prescribing drugs.
• 1) Measure the weight, height, girth, body condition, age and gender of donkeys.
• 2) Use R to create a predictive model of weight
• 3) Create a nomographic model which can be used by vets on the ground
How Heavy is that Donkey?
• Initial Model – Complex!
    sqrt(Weight) ~ BCSis + Gender + Age + log(HeartGirth) + log(Height) +
                   log(HeartGirth):log(Height) +
                   BCSis:log(HeartGirth) + Gender:log(HeartGirth) + Age:log(HeartGirth) +
                   BCSis:log(Height) + Gender:log(Height) + Age:log(Height)
How Heavy is that Donkey?
• Use stepAIC in the MASS package to simplify the model…
• Final Model:
    sqrt(Weight) ~ BCSis + Age + log(HeartGirth) + log(Height)
• Still hard to use in a dusty marketplace though…
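The simplification step can be sketched with simulated data, since the donkey measurements themselves are not part of these slides. Everything about the data frame below (sample size, value ranges, coefficients, the numeric BCS column standing in for BCSis) is an assumption made for illustration; only the shape of the model formula and the use of MASS::stepAIC follow the talk.

```r
library(MASS)  # provides stepAIC

set.seed(1)
n <- 200
# Hypothetical donkeys: all values below are simulated, not real measurements.
donkeys <- data.frame(
  BCS        = sample(seq(1, 5, by = 0.5), n, replace = TRUE),
  Gender     = factor(sample(c("female", "male"), n, replace = TRUE)),
  Age        = factor(sample(c("<2", "2-5", ">5"), n, replace = TRUE)),
  HeartGirth = runif(n, 90, 130),   # cm
  Height     = runif(n, 90, 115)    # cm
)
# Simulated weights depend only on BCS, girth and height (made-up coefficients).
donkeys$Weight <- (-40 + 0.3 * donkeys$BCS + 8 * log(donkeys$HeartGirth) +
                     3 * log(donkeys$Height) + rnorm(n, sd = 0.3))^2

# Full model with the same structure as the initial model on the slide.
full <- lm(sqrt(Weight) ~ (BCS + Gender + Age) * (log(HeartGirth) + log(Height)) +
             log(HeartGirth):log(Height), data = donkeys)

# Backward selection by AIC; trace = FALSE suppresses the step-by-step log.
simplified <- stepAIC(full, trace = FALSE)
print(formula(simplified))
```

With no scope argument, stepAIC searches backward from the full model, dropping a term only when doing so lowers the AIC.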
Solution - Nomograms
• “Graphical representation of formula allowing calculations to be made using paper and a ruler”
• Published in books & on charts to make complex calculations possible before calculators & computers
• Ron Doerfler, 2009, The Lost Art of Nomography, The UMAP Journal, 30(4), pp. 457-493.
• http://myreckonings.com/wordpress/wp-content/uploads/JournalArticle/The Lost Art of Nomography.pdf
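How a ruler can evaluate a formula is easiest to see for a two-variable case. The sketch below works through the geometry of a parallel-scale nomogram for sqrt(weight) = a + b*log(girth) + c*log(height). The coefficients are invented for the example and the model is cut down to two predictors, so this is not the nomogram from the talk.

```r
# Hypothetical coefficients for sqrt(weight) = a + b*log(girth) + c*log(height).
a <- -40; b <- 8; c_h <- 3

# Tick positions on the two outer vertical scales (drawn at x = 0 and x = 2),
# both measured in the same vertical unit.
girth_pos  <- function(girth)  b * log(girth)
height_pos <- function(height) c_h * log(height)

# A ruler laid between the outer scales crosses the middle axis (x = 1)
# at the average of the two tick positions.
ruler_crossing <- function(girth, height) {
  (girth_pos(girth) + height_pos(height)) / 2
}

# The middle scale is labelled so that position p carries the weight (a + 2p)^2.
weight_label <- function(p) (a + 2 * p)^2

read_nomogram <- function(girth, height) {
  weight_label(ruler_crossing(girth, height))
}

# Reading the chart gives the same answer as evaluating the formula directly.
direct <- (a + b * log(110) + c_h * log(100))^2
print(c(nomogram = read_nomogram(110, 100), formula = direct))
```

The transformations (log, square root) are absorbed into where the tick marks are drawn, which is why the user only needs a straight edge.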
More information…
• Jonty's home page, with links to slides & code: http://www.maths.bris.ac.uk/~MAZJCR/#pres
• Presentation Slides: http://www.maths.bris.ac.uk/~MAZJCR/jontyUseR.pdf
• The Design package also has a nomogram() function. It is no longer on CRAN, but old versions are available.
Easy interactive ggplots
• Speaker: Richie Cotton
• Clever use of the packages ggplot2 and gWidgetstcltk together, allowing clear and simple code for interactive control of charts
• Example data: Chromium exposure of welders. Took air concentrations & urine samples (pre/post exposure)
More Information…
• Links at: http://www.bitly.com/jV1NBn
• Code linked directly from http://4dpiecharts.com/2011/08/17/user2011-easy-interactive-ggplots-talk/
• See also: package gWidgets, which wraps 5 UI toolkits
Predicting Personality from Social Network Data
• Speaker: Daniel Chapsky, Hampshire College
• This was quite a fast talk, but one of my favourite pieces of work, so apologies if I've misinterpreted anything!
• The "Big 5" theory of personality is that 5 dimensions can predict attitudes, views and behaviour
• This work attempts to build a model which predicts someone's "Big 5" values from Online Social Network (OSN) data
Predicting Personality - Data
• 615 respondents
• 100-question open source personality test, "IPIP NEO"
• Data from last.fm, Netflix, etc., e.g. genres listened to
• Distance from home town to current residence: liberality correlates with amount of moving around
• Mean income, education level
• Race inferred from surname
• Data was continuous
• Missing data was inferred using Gibbs sampling
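The distance feature can be illustrated with a great-circle calculation. The talk lists the geosphere package for this kind of work; the sketch below instead uses a hand-written haversine formula in base R so that it runs without extra packages. The function name, the Earth radius of 6371 km and the example coordinates (approximate positions of Birmingham and London) are assumptions for the example, not details from the talk.

```r
# Great-circle distance in km between two points given in decimal degrees.
# haversine_km is a hypothetical helper, not a function from geosphere.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(h) slightly above 1.
  2 * radius_km * asin(pmin(1, sqrt(h)))
}

# Illustrative "home town" and "current residence" (approximate coordinates).
home    <- c(lat = 52.4862, lon = -1.8904)  # Birmingham
current <- c(lat = 51.5074, lon = -0.1278)  # London

moved_km <- haversine_km(home["lat"], home["lon"], current["lat"], current["lon"])
print(unname(moved_km))
```

The function is vectorised, so whole columns of home and current coordinates can be passed in one call to build the feature for every respondent.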
Predicting Personality – Model
• Continuous Bayesian networks (discrete networks need more data)
• - Often weaker prediction than black box methods
• + Clear semantics
• + Works with limited evidence
• + Hybrid network
Predicting Personality – Packages
• Database connectivity - RMySQL
• Web scraping / API connection - RCurl, RJSONIO, XML
• Inference through mashups - psych, geosphere
• Data cleaning - plyr, reshape2, BayesTree, mice, tm, mvoutlier
• Bayesian network construction - bnlearn, pcalg
• Parallelization of optimization - foreach, snow
• Graphics - latticist, bnlearn, ggplot2
Agreeableness = 42.4
    - 1.26 (Sex.Missing)
    - 2.47 (Sex.Male)
    - 25.99 (Home.Teen.Prop)
    - 0.63 (Movie.Dystopia-Political)
    - 0.49 (Movie.Action-thriller)
    + 6.51 (Wall.Status.Ratio)
    + 0.08 (Conscientiousness)
    - 0.29 (Neuroticism)
R² = 0.46
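As a worked example of reading this equation, the sketch below evaluates it for one made-up person. The coefficients are those on the slide (with hyphens in the names replaced by dots so they are valid R names). The input values and their scales are assumptions, since the slides do not say how each predictor was measured.

```r
intercept <- 42.4
coefs <- c(
  Sex.Missing              = -1.26,
  Sex.Male                 = -2.47,
  Home.Teen.Prop           = -25.99,
  Movie.Dystopia.Political = -0.63,
  Movie.Action.Thriller    = -0.49,
  Wall.Status.Ratio        =  6.51,
  Conscientiousness        =  0.08,
  Neuroticism              = -0.29
)

# x is a named numeric vector; predictors left out are treated as zero.
predict_agreeableness <- function(x) {
  stopifnot(all(names(x) %in% names(coefs)))
  intercept + sum(coefs[names(x)] * x)
}

# Hypothetical person: male, conscientiousness 60, neuroticism 50.
person <- c(Sex.Male = 1, Conscientiousness = 60, Neuroticism = 50)
score <- predict_agreeableness(person)
print(score)  # 42.4 - 2.47 + 0.08*60 - 0.29*50 = 30.23
```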
More Information
• Original Slides: http://web.warwick.ac.uk/statsdept/user-2011/TalkSlides/Contributed/17Aug_1115_FocusIII_5-DataMining_2-Chapsky.pdf