Virtual Knowledge Studio (VKS) Information Studies Webometrics: A Quick Introduction Prof. Mike Thelwall Statistical Cybermetrics Research Group, University of Wolverhampton, UK
Overview • What is Webometrics? • Gathering, processing and analysing large scale data from the web (web pages, hyperlinks, blogs, Web 2.0) for many purposes that include online communication, although primarily for scientific communication • What can Webometrics offer other researchers? • Software to gather data from web sites, search engines, social network sites and blogs; methods to extract useful patterns • Collaboration with other social scientists on their problems (e.g., jokes, UN initiatives, research dissemination, politics, media) http://cybermetrics.wlv.ac.uk
Example: Identifying and tracking public science concerns in blogs Over 100,000 Blogs and other sources tracked daily via RSS feeds Objective: to identify and track public concerns about science E.g., “Schiavo” identified and tracked as potential public science concern
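As a rough illustration of the RSS-based tracking described above, the sketch below parses one feed and keeps the items that mention a keyword. The sample feed, the function name `matching_items` and the keyword are all hypothetical; the slides do not describe the group's actual software, so this is a minimal sketch using only the Python standard library.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example blog</title>
<item>
  <title>Stem cell research vote</title>
  <pubDate>Mon, 21 Mar 2005 09:30:00 GMT</pubDate>
  <description>Thoughts on the latest debate.</description>
</item>
<item>
  <title>Weekend recipes</title>
  <pubDate>Tue, 22 Mar 2005 18:00:00 GMT</pubDate>
  <description>Nothing about science here.</description>
</item>
</channel></rss>"""

def matching_items(feed_xml, keyword):
    """Return (date, title) for each feed item whose title or description mentions keyword."""
    needle = keyword.lower()
    results = []
    for item in ET.fromstring(feed_xml).iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        published = item.findtext("pubDate", default="")
        if needle in (title + " " + description).lower() and published:
            # RSS 2.0 dates use the RFC 822 format, which email.utils can parse.
            results.append((parsedate_to_datetime(published).date(), title))
    return results

hits = matching_items(SAMPLE_FEED, "stem cell")
for day, title in hits:
    print(day.isoformat(), title)
```

Running the same filter daily over many feeds would give dated keyword matches that can then be counted per day.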
Example: Analysis of the accuracy of search engine results Live Search results analysis
Example: Hyperlinks to UK universities correlate strongly with their research productivity The reason for the strong correlation is the quantity of Web publication, not its quality This is different to citation analysis
Example: Links between EU universities [Figure: network diagram of normalised linking between universities in Austria, Switzerland, Belgium, Germany, France, Spain, NL, UK, Norway, Italy, Poland, Finland and Sweden, with the smallest countries removed; annotation: "Geopolitical connected"]
Virtual Knowledge Studio (VKS) Information Studies Exploiting the Web as a Social Science Resource:Link analysis Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK
1. Link Analysis • A new source of information about: • relationships between people, organisations and information • the impact of information and ideas • But: • Results should be interpreted with care
Link Analysis: Motivation • Individual hyperlinks reflect concrete creation reasons such as connections between web page contents or creators • Counts of large numbers of hyperlinks may reflect wider underlying social processes • Links may reflect phenomena that have previously been difficult to study • E.g. informal scholarly communication • Inter-organisational relationships
Choosing the correct “counting level” • It is typically best to count interlinking web sites rather than interlinking web pages [Diagram: several links between pages (Page1, Page3, Page4, Page5) on www.scit.wlv.ac.uk and www.oii.ox.ac.uk count as one link]
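The site-level counting idea can be sketched in a few lines. This is only an illustration: it treats the full hostname as the "site", which is one of several possible heuristics, and the page URLs below are hypothetical (the original diagram does not show which page belongs to which site).

```python
from urllib.parse import urlsplit

def site_of(url):
    """Return the lower-cased hostname of a URL, used here as a stand-in for 'web site'."""
    return (urlsplit(url).hostname or "").lower()

def count_site_links(page_links):
    """Collapse page-to-page links into a set of distinct site-to-site links."""
    site_pairs = set()
    for source_url, target_url in page_links:
        source, target = site_of(source_url), site_of(target_url)
        if source and target and source != target:  # ignore links within one site
            site_pairs.add((source, target))
    return site_pairs

# Hypothetical page-level links between the two sites named on the slide.
page_links = [
    ("http://www.scit.wlv.ac.uk/page1.html", "http://www.oii.ox.ac.uk/page4.html"),
    ("http://www.scit.wlv.ac.uk/page1.html", "http://www.oii.ox.ac.uk/page5.html"),
    ("http://www.scit.wlv.ac.uk/page3.html", "http://www.oii.ox.ac.uk/page4.html"),
    ("http://www.scit.wlv.ac.uk/page1.html", "http://www.scit.wlv.ac.uk/page3.html"),
]

site_links = count_site_links(page_links)
print(len(page_links), "page-level links collapse to", len(site_links), "site-level link(s)")
```

Counting at the site level stops one heavily replicated link (for example in a navigation bar) from dominating the totals.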
Links to UK universities against their research productivity The reason for the strong correlation is the quantity of Web publication, not its quality This is different to citation analysis
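A correlation of this kind can be checked with a rank correlation. The sketch below uses Spearman's coefficient as an assumption (the slide does not say which statistic was used), and the inlink counts and research scores are invented purely for illustration.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical figures for five universities (not real data).
inlink_counts = [12000, 3400, 800, 15000, 2100]
research_scores = [95.0, 40.0, 12.0, 88.0, 30.0]

rho = spearman(inlink_counts, research_scores)
print(f"Spearman rho = {rho:.2f}")
```

A high coefficient only shows that the two rankings agree; as the slide notes, it says nothing about why.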
Can map patterns of international communication • Counts of links between EU universities on Swedish-language pages are represented by arrow thickness.
Counts of links between EU universities on French-language pages are represented by arrow thickness.
Most links are only loosely related to research • 90% of links between UK university sites relate to scholarly activity, including teaching and research • But less than 1% are equivalent to citations • Link counts do not measure research but are a natural by-product of scholarly activity • Use link counts to track an aspect of communication
Larger network diagrams… • The next slide is a (Kamada-Kawai) network of the interlinking of the “top” 5 universities in AEAN countries (Asia and Europe) with arrows representing at least 100 links and universities not connected removed. (Research with Han Woo Park)
Larger complex network diagrams 2 • The next slide is similar to the previous slide but • uses a different layout (Fruchterman-Reingold) • Line widths are varied - proportional to (co)link counts • Node widths are varied - proportional to (co)link counts (Research with Mahmood Enayat)
Link Impact Reports • Standardised comparative analysis of the link impact of a web site • Example audit: • http://cybermetrics.wlv.ac.uk/audit/101/ • Similar reports can be created for non-link impact (citation impact) • http://cybermetrics.wlv.ac.uk/audit/books/
References • Thelwall, M. (2002). Evidence for the existence of geographic trends in university web site interlinking, Journal of Documentation, 58(5), 563-574. • Thelwall, M. (2002). Conceptualizing documentation on the Web: an evaluation of different heuristic-based models for counting links between university web sites, Journal of the American Society for Information Science and Technology, 53(12), 995-1005. [counting methods] • Thelwall, M. (2006). Interpreting social science link analysis research: A theoretical framework. Journal of the American Society for Information Science and Technology, 57(1), 60-68. • Thelwall, M. (2004). Link analysis: An information science approach. San Diego: Academic Press.
Appendix: Types of link count • Direct link counts • Co-inlink counts • B and C are co-inlinked • Co-outlink counts • D and E are co-outlinked [Diagram: link graph of sites A to F]
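The three count types can be computed from a list of (source, target) links. In the sketch below the edge list is hypothetical, since the original diagram is not recoverable; it is chosen only so that B and C come out co-inlinked and D and E co-outlinked, as the slide states.

```python
from collections import Counter
from itertools import combinations

def co_inlink_counts(links):
    """For each pair of sites, count how many other sites link to both of them."""
    targets_by_source = {}
    for source, target in links:
        if source != target:  # skip self-links
            targets_by_source.setdefault(source, set()).add(target)
    counts = Counter()
    for targets in targets_by_source.values():
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return counts

def co_outlink_counts(links):
    """For each pair of sites, count how many other sites both of them link to."""
    # Reversing every link turns shared targets into shared sources.
    return co_inlink_counts([(target, source) for source, target in links])

# Hypothetical links: A links to B and C; D and E both link to F.
links = [("A", "B"), ("A", "C"), ("D", "F"), ("E", "F")]

direct = Counter(links)               # direct link counts per (source, target)
co_in = co_inlink_counts(links)       # pairs sharing an inlinking site
co_out = co_outlink_counts(links)     # pairs sharing an outlink target
print(dict(co_in), dict(co_out))
```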
Appendix 2: Linking patterns vary enormously by discipline • No evidence of a significant geographic trend • Disciplinary differences in the extent of interlinking: e.g., History Web use is very low, Chemistry is very high • Individual research projects can have an enormous impact upon individual departments • E.g. Arts web sites are often for specific exhibitions or for digital media projects • Links not frequent enough to reliably reveal patterns of interdisciplinarity
Virtual Knowledge Studio (VKS) Information Studies MySpace members an example of large scale analysis of information about individuals Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK
MySpace members data • a random sample of 15,043 members • a systematic sample of 7,627 members who joined on July 3, 2006 • 403 members from July 3, 2006 • excluded: music sites, ex-members • excluded (normally): members with 0 or 1 friends • All information on home pages was automatically downloaded and harvested by SocSciBot 4 -> Excel
Days since last access: members use MySpace once or frequently [Chart: all members]
Days since last access: members use MySpace once or frequently [Chart: July 3 members]
gender factors • female users more likely to be “here for” friendship and male users more likely to be “here for” dating (but only a minority) • males and females both preferred to have more female friends and Top 8 friends • females preferred a greater proportion of female Top 8 friends • women make the best friends! (403 data set)
who swears most? • for US MySpace home pages: • male = more likely to contain strong swearing • for UK MySpace home pages: • male = more likely to contain moderate swearing • no difference in strong swearing - possibly more strong swearing in female home pages in the younger age groups • apparent reversal in gendered strong swearing in the UK for young people (July 3, 2006 members, extended collection)
percentage of profiles containing swearing (typical sample size 20-148 for non-web swearing research)
Emphatic adverb/adjective OR Adverbial booster OR Premodifying intensifying negative adjective (36% of swearing) • and we r guna go to town again n make a ryt fuckin nyt of it again lol • see look i'm fucking commenting u back • lol and stop fucking tickleing me!! • Thanks for the party last night it was fucking good and you are great hosts. • That 50's rock and roll weekender was fucking mint! • Fuckin my space, my arse • 1/2 d ppl cudnt even speak fuckin english! • yeah so me and sarah broke up and everythings fucking shit
Personal insult referring to defined entity (28% of swearing) • tehe i am sorry.. i m such a sleep deprived twat alot of the time! lol • Maxy is the soundest cunt in the world!!!! • 3rd? i thought i was your main man number one? Fucker • write bak cunt xxx • You Godless bastard! • You evil cunt! Haha • CHEEKY LITTLE CUNT ! • lucky fuck
Idiomatic set phrase OR Figurative extension of literal meaning (23%, mostly male) • think am gonna get him an album or summet fuck nows • got another copy of the reaction CD (will had fucked the last one lol) • qu'est ce que fuck? • what the fuck pubehead whos pete and why is this necicery mate • Heh long story.. cant be fucked to explain :D
Geography of MySpace Friends • Tobias Escher’s 2007 presentation… • Tobias is also helping to develop a Perl module WWW::Myspace for automatic extraction of information
conclusion • quantitative data can shed light on some aspects of social networking
other references • Thelwall, M. (2008). Social networks, gender and friending: An analysis of MySpace member profiles, Journal of the American Society for Information Science and Technology, 59(8), 1321-1330. • Thelwall, M. (2008). Fk yea I swear: Cursing and gender in a corpus of MySpace pages, Corpora, 3(1), 83-107.
Virtual Knowledge Studio (VKS) Information Studies Blog searching: New Insights into Customers/Citizens/Voters? Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK http://cybermetrics.wlv.ac.uk/
Background • There are millions of bloggers • Automatically tracking bloggers’ postings may give insights into public opinion • Hindsight fallacy
Blog tracking companies • IBM • WebFountain: qualitative and quantitative analysis of the Web, intranet data, and other sources • Nielsen BuzzMetrics • BlogPulse • “Monitor, measure and leverage consumer-generated media” • Market Sentinel • supplier of blog and web monitoring services • identifies the sources that companies should monitor to take business decisions
Tracking Debates in Blogs Case study: Public science debates
Blog keyword searches • Technorati: “Searches weblogs by keyword and for links” • Example search: Nokia • Blogdigger • Example search: stem cell research • IceRocket • Allows advanced searches • Allows genuine date range search (Google only allows “last updated” date range searches)
Track evolution over time • What is changing about interest in Stem cell research/GM food? • Are experts good at identifying changes in public interest? • How can experts be sure/can they be supported with quantitative information? • Can blogs be used to generate time series reflecting changes in “public interest”?
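One simple way to turn blog posts into such a time series is to compute, for each day, the share of posts that mention a keyword. The sketch below assumes posts are already available as (date, text) pairs; the sample posts and the function name `daily_keyword_share` are hypothetical.

```python
from collections import defaultdict
from datetime import date

def daily_keyword_share(posts, keyword):
    """Return {day: fraction of that day's posts mentioning keyword}, ordered by day."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    needle = keyword.lower()
    for day, text in posts:
        totals[day] += 1
        if needle in text.lower():
            matches[day] += 1
    # Dividing by the daily total corrects for days on which more blogs were crawled.
    return {day: matches[day] / totals[day] for day in sorted(totals)}

# Hypothetical posts used only for illustration.
posts = [
    (date(2006, 2, 1), "Stem cell research funding debate"),
    (date(2006, 2, 1), "My holiday photos"),
    (date(2006, 2, 2), "More on stem cell research ethics"),
    (date(2006, 2, 2), "Why stem cell research matters"),
    (date(2006, 2, 3), "Football results"),
]

series = daily_keyword_share(posts, "stem cell research")
for day, share in series.items():
    print(day.isoformat(), f"{share:.0%}")
```

Plotting the shares over time gives a graph in which a sudden spike suggests a change in interest that an expert could then investigate.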
Free debate evolution graphs • Solves the trend identification problem? • BlogPulse offers free automatic blog searches and keyword-generated click-search graphs • Example searches: • Stem cell research • Kasabian • Mobile phone radiation • cartoons AND (denmark or danish)