Weblogs for Research(ers)
Anjo Anjewierden
Human Computer Studies laboratory, Faculty of Science, University of Amsterdam
http://anjo.blogs.com
Many thanks to Lilia Efimova, Rogier Brussee, Robert de Hoog, Stephanie Hendrick and the blogosphere in general
Weblog research
What is a weblog (1)?
• Most common descriptive definition: a weblog is
  • a personal journal,
  • updated regularly,
  • published on the internet,
  • with posts (entries) appearing in reverse chronological order
What is a weblog (2)?
• Weblogs are social as they encourage others to participate using two mechanisms:
  • Posts have an explicit point of reference called a permalink
    • Permalinks make it possible for people to link to each other’s posts: share and discuss
  • Readers, possibly without a weblog, are invited to join as all posts have a comment link
Anatomy of Weblogs
• For example: my weblog
Weblog Research is about …
• Humans who share findings, thoughts, ideas and sometimes feelings in their weblogs
• Computers which make it possible to create and read weblogs, and to comment and link
• Studies which analyse why and how people blog, about what, and to whom
• Laboratory: weblog researchers need a stable environment in which to conduct their research
Do we want to research weblogs …
• Blog (short for weblog, we-blog) was Merriam-Webster's word of the year for 2004
  • To blog, blogger, blogging, blogosphere, etc.
• Communications of the ACM (CACM) carried a special issue on weblogs (December 2004)
• Unfiltered and public
  • For the first time we get access to a large body of material on a particular person, written by that same person
• Research relevance
  • Social studies, Knowledge Management (for professional weblogs), education, linguistics …
  • Even the term Semantic Blogging (combining Semantic Web and blogging) has been coined
• Compare Digital Cities research by Beckers / Van den Bersselaar (at SWI)
BlogTrace the Laboratory (1)
• Weblogs are represented as HTML pages
  • Complex layout, difficult to find the posts
  • Manual research is extremely labour intensive
• There is a serious lack of tools that support weblog research
BlogTrace the Laboratory (2)
• BlogTrace spider makes data collection and research a lot easier
  • Automatically extracts posts from the HTML
  • Generates the link structure of the weblog and represents it as RDF/OWL
  • Generates an RSS feed that contains all posts for a weblog
• Implemented using induction algorithms, which learn which parts of a page are posts and which are layout
Ontologies used in BlogTrace
• DC: Dublin Core (names, dates, descriptions)
• FOAF: Friend of a Friend (documents, people)
• RSS 1.0: RDF Site Summary (representation of full posts)
• Link ontology, for example a link (href in HTML) becomes:
  • Link link:sourceDocument <http://…/>;
  • Link link:targetDocument <http://…/>;
  • Link link:anchorText "interesting site";
  • Etc.
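To make the link ontology concrete, here is a minimal sketch of how the links on one page could be turned into statements using the three properties named on the slide. It is not the BlogTrace spider: `LinkCollector` and `links_to_turtle` are hypothetical names, the example URLs are made up, and the namespace URI behind the `link:` prefix is not given in the slides, so the output is a Turtle fragment without prefix declarations.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (target URL, anchor text) pairs from <a href="..."> elements."""

    def __init__(self, source_url):
        super().__init__()
        self.source_url = source_url
        self.links = []      # finished (target, anchor_text) pairs
        self._href = None    # href of the <a> element currently open, if any
        self._text = []      # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they appear on.
                self._href = urljoin(self.source_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def links_to_turtle(source_url, html):
    """Render every link found in `html` as one blank node of type link:Link."""
    collector = LinkCollector(source_url)
    collector.feed(html)
    blocks = []
    for target, anchor in collector.links:
        escaped = anchor.replace("\\", "\\\\").replace('"', '\\"')
        blocks.append(
            "[] a link:Link;\n"
            f"   link:sourceDocument <{source_url}>;\n"
            f"   link:targetDocument <{target}>;\n"
            f'   link:anchorText "{escaped}".'
        )
    return "\n\n".join(blocks)
```

Calling `links_to_turtle("http://example.org/blog/", page_html)` returns one block per hyperlink; finding which part of the page is a post, the hard part BlogTrace solves by induction, is outside this sketch.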
Weblogs can now be studied …
• Even using Semantic Web technology (RDF/OWL)

link:WeblogPostLink
    rdfs:subClassOf link:SimpleLink;
    rdfs:comment "A WeblogPostLink is a SimpleLink if and only if both the source and the target documents are weblog posts (RSS items).";
    rdfs:label "WeblogPostLink";
    owl:intersectionOf (
        link:SimpleLink
        [ a owl:Restriction;
          owl:onProperty link:sourceDocument;
          owl:someValuesFrom rss:item ]
        [ a owl:Restriction;
          owl:onProperty link:targetDocument;
          owl:someValuesFrom rss:item ]
    ).

link:WeblogPostLink
    rdfs:subClassOf link:Link;
    rdfs:comment "A WeblogPostLink is a Link if and only if both the source and the target documents are weblog posts (RSS items)";
    owl:intersectionOf (
        link:Link
        [ a owl:Restriction;
          owl:onProperty link:sourceDocument;
          owl:someValuesFrom rss:item ]
        [ a owl:Restriction;
          owl:onProperty link:targetDocument;
          owl:someValuesFrom rss:item ]
    ).
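Read procedurally, the definition above says: a link is a WeblogPostLink when both its source and its target are weblog posts. The sketch below applies that test to plain data rather than running an OWL reasoner; `Link`, `weblog_post_links` and the example URLs are hypothetical, and "is a weblog post" is approximated by membership in a set of known post permalinks.

```python
from typing import Iterable, List, NamedTuple


class Link(NamedTuple):
    source: str   # URL of the document that contains the link
    target: str   # URL the link points to


def weblog_post_links(links: Iterable[Link], post_urls: Iterable[str]) -> List[Link]:
    """Keep only the links whose source AND target are known weblog posts."""
    posts = set(post_urls)
    return [link for link in links if link.source in posts and link.target in posts]


# Illustrative data only.
posts = {"http://a.example/post/1", "http://b.example/post/7"}
links = [
    Link("http://a.example/post/1", "http://b.example/post/7"),     # post -> post
    Link("http://a.example/post/1", "http://news.example/story"),   # post -> non-post
]
post_to_post = weblog_post_links(links, posts)
```

Only the first link survives the filter, which is the subset of the link structure relevant for studying conversations between posts.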
Some Weblog Research Questions
• Weblog communities
  • Do they exist?
  • How can they be defined and found?
  • What is the social structure?
  • What are the conventions in the community?
• Text analysis of weblogs
  • What do people blog about (terms, topics)?
  • Do they share terminology?
  • Can personal conceptualisations be extracted?
• Conversations
  • Can linked weblog posts be seen as conversations?
  • Can we identify when there is a “knowledge flow”?
Implementations and Papers
• Weblog communities:
  • Visual Settlements
    • Graphically displays weblog community linkage based on a “weblog is a city” metaphor
    • Community determined by “Virtual Settlements” paper (Efimova & Hendrick, 2005)
• Text analysis of weblogs:
  • Sigmund (Anjewierden, Brussee & Efimova, 2004)
    • Co-occurrence based statistical algorithm that identifies concepts and their relations for a weblog
• Conversations:
  • Knowledge flows (Anjewierden, De Hoog, Brussee & Efimova, 2005)
    • Hypothesis: chance of a knowledge flow is greater when the sender and receiver share conceptualisations
Visual Settlements
• Idea
  • Can we compress a weblog to a single picture?
  • Such that we can use the picture to compare it to other weblogs in a community
  • And, of course, learn something …
• Inspiration
  • Maps in general
  • Books by Edward Tufte on “Information Design”
    • The Visual Display of Quantitative Information (1983)
    • Envisioning Information (1990)
    • Beautiful Evidence (2005; forthcoming)
  • (Discovered Tufte through reading blogs)
Anatomy of Visual Settlements
• Without links in the community (house)
• I link to someone (I’m at work)
• Someone links to me (I’m in the park)
• Size: number of words in the post
• Layout: if I link to earlier posts, the linked posts are placed close together
• Time: early posts in the center, later posts radiate outwards
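The legend above can be read as a small set of rules per post. The sketch below is one possible reading, not the Visual Settlements implementation: `Post`, `settlement_symbols` and `symbol_size` are hypothetical names, and the slide does not say what is drawn for a post that both links out and is linked to, so returning both symbols is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    text: str
    links_out: int = 0   # links from this post to other posts in the community
    links_in: int = 0    # links from other posts in the community to this post


def settlement_symbols(post: Post) -> List[str]:
    """Map a post to legend symbols: 'work', 'park', or 'house' when unlinked."""
    symbols = []
    if post.links_out > 0:
        symbols.append("work")    # "I link to someone"
    if post.links_in > 0:
        symbols.append("park")    # "Someone links to me"
    if not symbols:
        symbols.append("house")   # no links within the community
    return symbols


def symbol_size(post: Post) -> int:
    """Size of the symbol: the number of whitespace-separated words in the post."""
    return len(post.text.split())
```

Placement (closeness of linked posts, early posts in the center) would need a layout algorithm and is deliberately left out.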
Sigmund
• Idea
  • Using co-occurrence to determine whether terms are related
  • Related terms might point to conceptualisations of the blogger
  • And, these conceptualisations might be shared by other bloggers
• Supported by
  • Tools that are part of my regular research on methods to support ontology development from documents
  • In particular: term extraction and named entity recognition
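As an illustration of the co-occurrence idea only, the sketch below counts how often two terms appear in the same post and treats pairs above a threshold as related. This is a stand-in, not Sigmund's statistical algorithm, which the slides do not specify; `cooccurrence_counts`, `related_terms` and the `min_posts` threshold are assumptions, and each post is assumed to be already reduced to a list of extracted terms.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def cooccurrence_counts(posts: Iterable[Iterable[str]]) -> Counter:
    """For each unordered pair of terms, count the posts in which both occur."""
    pairs = Counter()
    for terms in posts:
        # Sorting gives every pair a canonical (a, b) order with a < b.
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


def related_terms(posts: Iterable[Iterable[str]], min_posts: int = 2) -> List[Tuple[str, str]]:
    """Pairs of terms that co-occur in at least `min_posts` posts."""
    counts = cooccurrence_counts(posts)
    return sorted(pair for pair, n in counts.items() if n >= min_posts)


# Illustrative term lists, one per post.
example_posts = [
    ["weblog", "community", "link"],
    ["weblog", "community"],
    ["link", "rdf"],
]
pairs = related_terms(example_posts)
```

A real system would replace the raw count with a statistical test so that very frequent terms do not appear related to everything.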
Making a Difference
• Idea
  • In a community of bloggers it is likely that terminology is shared
  • Finding the shared terms is interesting (see Sigmund)
  • But a blogger is a person and not a web page
  • So, what makes them different?
• Implementation
  • Run Sigmund on all blogs in a community
  • Find terms that are common for a particular blog and not common for others in the community
  • Example: Making a Difference post
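The second implementation step can be sketched as a filter over per-blog term counts. The slides do not define "common", so the `min_count` threshold below is an assumption, as are the function name `distinctive_terms` and the illustrative blog names and counts.

```python
from collections import Counter
from typing import Dict, List


def distinctive_terms(term_counts: Dict[str, Counter], blog: str, min_count: int = 3) -> List[str]:
    """Terms that are common in `blog` but not common in any other blog.

    A term counts as 'common' in a blog when it occurs at least `min_count` times there.
    """
    own = term_counts[blog]
    others = [counts for name, counts in term_counts.items() if name != blog]
    return sorted(
        term
        for term, n in own.items()
        if n >= min_count and all(counts[term] < min_count for counts in others)
    )


# Illustrative counts for a two-blog community.
community = {
    "blog_a": Counter({"ontology": 5, "weblog": 7, "map": 3}),
    "blog_b": Counter({"weblog": 9, "knowledge": 4, "ontology": 1}),
}
only_a = distinctive_terms(community, "blog_a")
only_b = distinctive_terms(community, "blog_b")
```

Here "weblog" is dropped for both blogs because it is common across the community, which is exactly the shared terminology this step is meant to factor out.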
Knowledge Flows
• Idea and Motivation
  • When a blogger links to a post by another blogger, could it be a “knowledge flow”?
  • Motivated by potential use as a knowledge management tool
• Implementation
  • Use Sigmund’s co-occurrence algorithm
  • Term overlap in linked posts is the main metric
  • Make a distinction between shared and agreed terms (used by both bloggers) and private terms (used by only one of the bloggers)
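The overlap metric and the shared/private distinction can be sketched with set operations on the terms of two linked posts. The slides do not give the formula, so the Jaccard ratio used for `overlap` is an assumption, as are the function name `term_overlap` and the dictionary keys; the terms of each post stand in for the vocabulary of its blogger.

```python
from typing import Dict, Iterable


def term_overlap(linking_terms: Iterable[str], linked_terms: Iterable[str]) -> Dict[str, object]:
    """Split the terms of two linked posts into shared and private sets.

    `overlap` is the Jaccard ratio |shared| / |all terms|, an assumed metric.
    """
    a, b = set(linking_terms), set(linked_terms)
    shared = a & b
    union = a | b
    return {
        "shared": shared,            # used in both posts
        "private_linking": a - b,    # used only in the post that links
        "private_linked": b - a,     # used only in the post that is linked to
        "overlap": len(shared) / len(union) if union else 0.0,
    }


# Illustrative term sets.
result = term_overlap(["weblog", "rdf", "link"], ["weblog", "link", "community"])
```

Under the hypothesis stated earlier in the talk, a higher overlap between linked posts would make a knowledge flow more likely.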
Weblogs for Researchers
• Experiment (Metis project)
  • Six researchers (previously non-bloggers) started a weblog to get hands-on experience
  • Two gave up rather early
  • One thinks about underpants when blogging
  • Three (including myself) continued after the experiment finished
• Evaluation
  • Posts are not emails (everybody can read them!)
  • Posts are not academic papers
  • Developing a blogging style (how and about what you blog) is difficult and different for everybody
Conclusions (1)
• Blogging as a tool for researchers
  • Try it!
  • Works for me, both reading and writing
  • By sharing ideas on your blog, you may get help!
Conclusions (2)
• Enormous amount of data (paradise for someone like me)
• Tempting to continue my own weblog research
• If others have better ideas than I have, and some do, I gladly return to my role of supporting others in their weblog research