Web Analyst: Software for Solving Real-World Problems by Understanding Virtual Communities.
PI: Sun-Ki Chai (Dept. of Sociology) • Co-PIs: David Chin (Dept. of Information & Computer Sciences), Scott Robertson (Dept. of Information & Computer Sciences), Mooweon Rhee (Dept. of Management and Industrial Relations), Min-Sun Kim (Dept. of Speech Communications), Jang Hyun Kim (Dept. of Speech Communications) • Research Assistants: Kar-Hai Chu, Aaron Herres & Dong-Wan Kang • United States Patent # 7499965 by Sun-Ki Chai • Current research supported by the Air Force Office of Scientific Research and the Office of Naval Research
Why the Need for Better Web Analysis Software? • The WWW is becoming the location for much of the world’s cultural, political and economic activity. • Information on the web is mostly publicly available and can be collected with little cost, time, or intrusion. Still, no general-purpose software tool exists for systematically collecting and analyzing web data to answer real-world questions about human attitudes and behavior. To be effective, such a tool must be based upon cutting-edge social science theories and methods for locating, analyzing, and presenting relevant and accurate information.
What Kind of General Solutions Will This Software Provide? • Who is most interested in your product or issue? • How do they feel about this product or issue? • What other interests do members of this community have? • Who are the most influential and powerful people in this community? • What are the characteristics that make them so? • How do we associate an online community with a particular real-world location and group? • What will the future behaviors of that group be?
Web-Mining Software that Understands the Social Nature of the Web • Integrates a wide range of validated social science theories on social networks, language, attitudes, culture, and behavior. • Downloads and stores a full and customizable range of content, link, geographical, and traffic data. • Includes specialized collection and analysis for forums and blogs. • A variety of interfaces allows customization of crawl, analysis, visualization, and output.
Starting Out: Locating the Right Virtual Community • User enters a few “seed” sites to start the exploration. • Control exactly how many sites to look for and how deep to go into each site, or select one of our pre-made profiles.
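The "how many sites" and "how deep" controls map naturally onto a bounded breadth-first search outward from the seeds. The sketch below is an illustration under assumptions, not Web Analyst's actual crawler: `crawl`, `get_links`, `max_sites` and `max_depth` are hypothetical names, and a caller-supplied link function (here a toy in-memory graph) stands in for real page fetching.

```python
from collections import deque

def crawl(seeds, get_links, max_sites=100, max_depth=2):
    """Breadth-first exploration outward from a list of seed sites.

    get_links: hypothetical callable returning the URLs a page links to.
    max_sites: stop once this many distinct sites have been collected.
    max_depth: do not follow links more than this many hops from a seed.
    """
    seeds = list(dict.fromkeys(seeds))      # drop duplicate seeds, keep order
    seen = set(seeds)                       # every URL ever queued
    frontier = deque((url, 0) for url in seeds)
    community = []                          # sites in the order they were reached
    while frontier and len(community) < max_sites:
        url, depth = frontier.popleft()
        community.append(url)
        if depth >= max_depth:
            continue                        # record the site but do not expand it
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return community
```

Breadth-first order means that when `max_sites` cuts the search short, the sites kept are the ones closest to the seeds, which is usually what a "community around these seeds" should mean.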
The System at Work: Building a Virtual Community • Our interface provides real-time feedback as the system explores the web, including a visual map and a listing of the virtual community as it grows. • Users can halt processing at any time, save its state, and resume at a later point.
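Halting and resuming only requires persisting the part of the exploration that is not yet finished. A minimal sketch, assuming a breadth-first crawl whose state is a list of collected sites plus a queue of (url, depth) pairs; `checkpoint` and `resume` are hypothetical names, and JSON is just one convenient storage choice.

```python
import json
from collections import deque

def checkpoint(community, frontier):
    """Serialize crawl progress to a JSON string that can be written to disk.

    community: list of sites already collected.
    frontier: iterable of (url, depth) pairs still waiting to be explored.
    """
    return json.dumps({
        "community": list(community),
        "frontier": [[url, depth] for url, depth in frontier],
    })

def resume(text):
    """Rebuild (community, frontier) from a string produced by checkpoint()."""
    state = json.loads(text)
    frontier = deque((url, depth) for url, depth in state["frontier"])
    return state["community"], frontier
```

The set of already-queued URLs does not need to be stored separately: it can be rebuilt on resume as the union of the collected sites and the URLs still in the frontier.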
Specialized Exploration: The Forum Analyzer • Forum • A message board or online discussion site • Forum Analyzer • Measures activity level • Estimates the strength of community • Detects opinion and sentiment
Community Metrics in Forums • Web metrics • Standard web traffic statistics and data • Community metrics • Member interaction and communication structure • Response time, mean reply depth, active participation rate
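The slide names the community metrics but not their formulas. One plausible set of definitions, sketched under assumptions: posts form reply trees (each reply points at its parent), times are in seconds, and "active participation rate" means the share of registered members who posted at all. `Post` and `forum_metrics` are hypothetical names, not the tool's API.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

@dataclass
class Post:
    post_id: int
    parent_id: Optional[int]  # None for the post that opens a thread
    author: str
    time: float               # posting time in seconds

def forum_metrics(posts, member_count):
    """Compute three illustrative community metrics for a non-empty list of posts."""
    by_id = {p.post_id: p for p in posts}

    # Response time: delay between each reply and the post it answers.
    delays = [p.time - by_id[p.parent_id].time
              for p in posts if p.parent_id is not None]

    # Reply depth: hops from a post back up to the post that opened its thread.
    def depth(p):
        hops = 0
        while p.parent_id is not None:
            p = by_id[p.parent_id]
            hops += 1
        return hops

    # Active participation: share of registered members who posted at least once.
    authors = {p.author for p in posts}
    return {
        "mean_response_time": mean(delays) if delays else None,
        "mean_reply_depth": mean(depth(p) for p in posts),
        "active_participation_rate": len(authors) / member_count,
    }
```

For a thread where "ann" opens at t=0, "bob" replies at 60 s, "ann" answers bob at 180 s and "cy" replies to the opener at 300 s, in a forum of 10 members, this gives a mean response time of 160 s, a mean reply depth of 1, and a participation rate of 0.3.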
Content Analysis: Contrast Between Different Kinds of Forums • Pronoun usage • I, we, they • Emotions • negative emotion, anxiety, sadness, anger
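Pronoun and emotion measures like these are commonly computed by counting words against category dictionaries; the categories listed resemble those in LIWC, though the slide does not say which dictionary the tool uses. In the sketch below the word lists are tiny hypothetical stand-ins for a validated dictionary, "negative emotion" is assumed to be the union of the three specific emotion categories, and `category_rates` is an illustrative name.

```python
import re
from collections import Counter

# Tiny illustrative word lists; a real analysis would use a validated
# dictionary with far larger categories.
CATEGORIES = {
    "i": {"i", "me", "my", "mine"},
    "we": {"we", "us", "our", "ours"},
    "they": {"they", "them", "their", "theirs"},
    "anxiety": {"worried", "afraid", "nervous"},
    "sadness": {"sad", "cry", "lonely"},
    "anger": {"angry", "hate", "furious"},
}

def category_rates(text):
    """Return, for each category, the fraction of words in text that belong to it."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for w in words:
        for name, lexicon in CATEGORIES.items():
            if w in lexicon:
                counts[name] += 1
    total = len(words) or 1  # avoid dividing by zero on empty text
    rates = {name: counts[name] / total for name in CATEGORIES}
    # Assumed aggregate: negative emotion = anxiety + sadness + anger words.
    rates["neg_emotion"] = sum(counts[c] for c in ("anxiety", "sadness", "anger")) / total
    return rates
```

Using rates rather than raw counts lets forums with very different posting volumes be compared directly, for example a high "we" rate in one forum against a high "they" rate in another.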
Analysis of Member Communication Networks in Forums • [Figure: centralized vs. distributed member networks]
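One standard way to put a number on "centralized vs. distributed" is Freeman's degree centralization, which compares every member's number of contacts with that of the best-connected member: for an undirected network of n members, C = Σ(d_max − d_i) / ((n − 1)(n − 2)). A sketch, assuming an edge links two members whenever one replied to the other; `degree_centralization` is a hypothetical name, and members who only reply to themselves are left out.

```python
def degree_centralization(edges):
    """Freeman degree centralization of an undirected network.

    edges: iterable of (member_a, member_b) pairs, e.g. "a replied to b".
    Returns 1.0 for a perfect star (fully centralized) and 0.0 when every
    member has the same number of contacts (fully distributed).
    """
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-replies
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 3:
        return 0.0  # the normalizing term is zero for fewer than three members
    degrees = [len(s) for s in neighbors.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))
```

A forum where one moderator answers everyone scores near 1, while a forum where members talk to each other evenly scores near 0.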
What the Technology Provides • Human behavior data based on established social science theories and metrics. • These measures can be freely composed, weighted, and added together in an intuitive way. • Content and network information are combined seamlessly to provide more accurate answers.
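The slide does not say how metrics are combined. A minimal sketch of "composed, weighted, and added together", assuming each metric is first min-max rescaled across the communities being compared so that metrics in different units can be summed; `composite_scores` and the metric names in the example are hypothetical.

```python
def composite_scores(metrics, weights):
    """Combine several metrics into one weighted score per community.

    metrics: {community: {metric_name: value}}
    weights: {metric_name: weight}; metrics without a weight are ignored.
    Each weighted metric is min-max rescaled to [0, 1] across communities
    first, so metrics measured in different units can be added together.
    """
    scores = {c: 0.0 for c in metrics}
    for name, w in weights.items():
        values = [m[name] for m in metrics.values()]
        lo, hi = min(values), max(values)
        span = hi - lo
        for c, m in metrics.items():
            scaled = (m[name] - lo) / span if span else 0.0  # constant metric adds nothing
            scores[c] += w * scaled
    return scores
```

For example, with `{"A": {"activity": 10, "sentiment": 0.2}, "B": {"activity": 30, "sentiment": 0.6}}` and equal weights of 0.5, community A scores 0.0 and B scores 1.0. Min-max scaling is only one choice; z-scores would be less sensitive to a single extreme community.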
Features to be Implemented • Integration of specialized blog analysis features in partnership with ASU. • Specialized front-end modules for specific fields of inquiry, e.g. product marketing, predicting political opinion trends, violence and risk assessment. • Systematic comparison of data obtained from virtual communities to that from traditional surveys and experiments. • Integration with UH cultural change and behavioral modeling software
END www.manoa.hawaii.edu/ccpv
Sample Partial Output (2)
%URL http://boycotts.org/
%TITLE Boycott Action News
%CONTACT Co-Op America, 1612 K Street N.W., 600, Washington, DC 20006 US, (301) 881-4900, patr@crcwnet.com
%DESCRIPTION Co-op America's "tell it like it is" information about corporations that need to be made more accountable.
%CATEGORIES Society / Issues / Business / Corporate Accountability, News / Current Events / Business and Economy / Business and Society / Business Ethics
%RELATED: http://www.ceres.org/, http://www.bts.gov/ntda/oai/, http://www.betterworld.com/BWZ/9604/welcome.htm/, http://www.asyousow.org/, http://shareholderaction.org/, http://adbusters.org/campaigns/corporate/flash.html/, http://www.censorshipkills.com/, http://www.bitc.org.uk/ . . .
%WORDFREQ: 221,"the", 147,"to", 120,"and", 109,"of", 74,"boycott", 62,"in", 59,"a", 55,"by", 55,"that", 40,"products", 34,"contact", 34,"workers", 34,"www", 33,"information", 33,"is", 32,"has", 32,"web", 31,"site", 31,"target", 30,"company", 30,"its", 30,"made", 29,"phone", 27,"org", 26,"com", 25,"for", 25,"organizers", 24,"with", 23,"consumer", 23,"from", 23,"not", 22,"action", 22,"email", 20,"consumers", 20,"world", 19,"as", 19,"being", 19,"have", 19,"on", 18,"allegations", 18,"boycotted", 18,"fax", 18,"organizer", 17,"inc", 17,"requested", 17,"tobacco" . . .
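The %WORDFREQ field appears to list `count,"word"` pairs in descending order of frequency. A sketch that produces a line in that shape; tokenizing on runs of letters and digits is an assumption, though it would be consistent with URL fragments such as "www", "org" and "com" appearing in the sample. `wordfreq_line` and its `top` parameter are hypothetical names.

```python
import re
from collections import Counter

def wordfreq_line(text, top=50):
    """Format the most frequent words of a page as a %WORDFREQ-style line."""
    words = re.findall(r"[a-z0-9]+", text.lower())  # assumed tokenization
    pairs = Counter(words).most_common(top)          # [(word, count), ...], highest first
    return "%WORDFREQ: " + ", ".join(f'{count},"{word}"' for word, count in pairs)
```

Stop words such as "the" and "to" dominate the raw counts, as in the sample above; the topical words ("boycott", "products", "workers") are usually the more informative part of the list.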
What Kind of General Solutions Will This Software Provide? • Locate the virtual community that best represents a social group of interest to the user. • Who are the people on the internet who are most interested in a particular product or issue? • Find out the ideas and sentiments most prevalent within a community, and predict how these will change over time. • What other interests do members of this community have? • In what ways is this community united or divided? • Identify the most powerful and influential actors in a community and the characteristics that make them so. • Who are the opinion-makers I should look at first? • Predict the future behaviors of social groups from their online presence and identify emerging political and cultural movements. • Which political groups will translate their opinions into open conflict with the government? • Provide a kind of "reverse search engine" that generates the most important identifying characteristics of a community.
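A "reverse search engine" needs some way to pick out the terms that characterize a community. One simple option, shown here as an assumption rather than the tool's method, ranks words by the add-one smoothed log ratio of their relative frequency in the community's text versus a background corpus; `distinctive_terms` is a hypothetical name, and log-likelihood or tf-idf weighting would be reasonable alternatives.

```python
import math
from collections import Counter

def distinctive_terms(community_words, background_words, top=5):
    """Rank words over-represented in a community relative to a background corpus."""
    comm = Counter(community_words)
    back = Counter(background_words)
    vocab = set(comm) | set(back)
    # Add-one smoothing: every vocabulary word gets one extra count in each corpus,
    # so words absent from the background do not cause division by zero.
    n_comm = sum(comm.values()) + len(vocab)
    n_back = sum(back.values()) + len(vocab)

    def score(w):
        return math.log((comm[w] + 1) / n_comm) - math.log((back[w] + 1) / n_back)

    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(comm, key=lambda w: (-score(w), w))
    return ranked[:top]
```

Common words like "the" score low because they are frequent everywhere, so the top of the list tends to be the vocabulary that sets the community apart.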
Web-Mining Software that Understands the Social Nature of the Web • Integrates a wide range of validated social science theories on social networks, language, attitudes and culture, and behavior to identify and analyze the websites most relevant to the user. • Downloads and stores a full range of content, link, geographical, and visit data on these sites. • Simple analysis for first-time users, and a power-user interface that allows full customization of crawl, analysis, and output. • Includes specialized tools that recognize forums and blogs and perform enhanced data collection and analysis on them. • Real-time visual and data feedback as the software explores and analyzes sites. • Generated information is collected in data files that are easily integrated with popular third-party software for further analysis.
Features to be Implemented • Integration of specialized blog analysis features in partnership with ASU. • Wizard interface that allows the user to specify a research question directly, then configures the crawl, analysis, and data formatting to best answer this question. • Implementation in both desktop and web application modes. • Specialized front-end modules for specific fields of inquiry, e.g. product marketing, predicting political opinion trends, violence and risk assessment. • Systematic comparison of data obtained from virtual communities to that from traditional surveys and experiments. • Integration with UH cultural change and behavioral modeling software to create forecasting systems that can automatically collect the data they need to make their predictions. • Any requests?
Comments • Avoid embedded video, so may have to play it stand-alone (mpeg-1, wmv file). • Went 40 minutes; part was discussion, but may need to pare things down. • Less text on first three slides . . . more audience participation. • Multiple people talking . . . OK. • What would you take out? Audience needs to know what it can do, but not how it works. • I in particular need to be succinct: 7 minutes approx. • Can eat into your Q&A, but this is not good. • Questions coming up: how do you differentiate a troll from an opinion leader? • More about what it does rather than how it does it . . . • Spare slides for technical questions . . .