(Tacitly) Collaborative Question Answering Utilizing Web Trails
Tomek Strzalkowski & Sharon G. Small
ILS Institute, SUNY Albany
LAANCOR, May 22, 2010
LREC QA workshop
Collaboration
• Working together
• Efficiency, sharing vs. groupthink
• Tacit collaboration
• Professional analysts → COLLANE system
• Information sharing
• Why and when
• Collaborative filtering
• Sharing insight and experience
Outline
• Introduction
• Collaborative Knowledge Layer
• Web Trails
• Exploratory Episodes
• Experiments
• Data Collection
• Results
• Collaborative Sharing
• Conclusions
• Future Research
Sharing on the Internet?
• Internet users leave behind trails of their work
• What they asked
• What links they tried
• What worked and what didn't
• Capture this exploratory knowledge
• Utilize this knowledge for subsequent users
• Tacitly enables collaborative question answering
• Improved efficiency and accuracy
Collaborative Knowledge Layer
• Captures exploration paths (Web Trails)
• Supplies meaning to the underlying data
• May clarify/alter the originally intended meaning
• Hypothesis: the CKL may be utilized to
• Improve interactive QA
• Support tacit collaboration
• Current experiments
• Capturing web exploration trails
• Computing degree of trail overlap
Collaborative Space
[diagram: collaborative space]
Web Trails
• A web trail consists of individual exploratory moves:
• Entering a search query
• Typing text into an input box
• Responses from the browser
• Offers accepted or ignored
• Files saved
• Items viewed
• Links clicked through, etc.
• Returns to the search box
• Trails contain optimal paths leading to specific outcomes
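A trail of this kind can be modeled as a timestamped sequence of typed moves. The following is a minimal sketch, not the COLLANE implementation; the `Move`/`WebTrail` names and move kinds are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Move:
    kind: str        # e.g. "query", "click", "copy", "view", "ignore"
    target: str      # the query string, URL, or data-item ID
    timestamp: float = 0.0

@dataclass
class WebTrail:
    user: str
    moves: list = field(default_factory=list)

    def record(self, kind, target, timestamp=0.0):
        """Append one exploratory move to the trail."""
        self.moves.append(Move(kind, target, timestamp))

# A user asks a question, follows a link, and copies a data item
trail = WebTrail(user="A")
trail.record("query", "artificial reefs Florida tires")
trail.record("click", "doc_12")
trail.record("copy", "doc_12")
```

Keeping each move typed makes it straightforward later to restrict comparisons to a single move kind (e.g. copies only), as the experiments below do.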
Exploratory Episodes
• Discovered overlapping subsequences of web trails
• Common portions of exploratory web trails from multiple network users
• May begin with a single user's web trail
• Shared with new users who appear to be pursuing a compatible task
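One way to discover such overlapping subsequences is a longest-common-subsequence style comparison over move identifiers. This sketch uses Python's `difflib`; the slides do not specify the matching algorithm, so treat this as an illustrative assumption:

```python
from difflib import SequenceMatcher

def exploratory_episodes(trail_a, trail_b, min_len=2):
    """Return common contiguous subsequences of two trails
    (candidate Exploratory Episodes) of at least min_len moves."""
    m = SequenceMatcher(a=trail_a, b=trail_b, autojunk=False)
    return [trail_a[blk.a:blk.a + blk.size]
            for blk in m.get_matching_blocks()
            if blk.size >= min_len]

# Two users reach the same question/data-item sequence by different routes
a = ["Q1", "D2", "D3", "Q2", "D5"]
b = ["Q9", "D2", "D3", "Q2", "D7"]
print(exploratory_episodes(a, b))  # [['D2', 'D3', 'Q2']]
```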
[diagram: network of nodes A, B, C, D, E, F, G, K, M, Q, T; the Exploratory Episode A-B-Q-D-G helps a new user get from M to G]
Experiment
• Evaluate degree of web trail overlap
• 11 research problems defined
• Generated 100 short queries for each research problem description
• Used Google to retrieve the top 500 results for each query
• ~500MB per topic
• Filtered out duplicates, commercial and offensive content, etc.
• 2GB corpus of web-mined text
Experiment Setup
• 4–6 analysts per research topic
• 2.5 hours per topic
• Utilized two fully functional QA systems:
• HITIQA – analytical QA system developed under the AQUAINT program at SUNY Albany
• COLLANE – collaborative extension of the HITIQA system developed under the CASE program at SUNY Albany
• Analyst's objective: find sufficient information for a 3-page report on the assigned topic
Example topic: artificial reefs
Many countries are creating artificial reefs near their shores to foster sea life. In Florida a reef made of old tires caused a serious environmental problem. Please write a report on artificial reefs and their effects. Give some reasons as to why artificial reefs are created. Identify those built in the United States and around the world. Describe the types of artificial reefs created, the materials used and the sizes of the structures. Identify the types of man-made reefs that have been successful (success defined as an increase in sea life without negative environmental consequences). Identify those types that have been disasters. Explain the impact an artificial reef has on the environment and ecology. Discuss the EPA's (Environmental Protection Agency) policy on artificial reefs. Include in your report any additional related information about this topic.
What is COLLANE?
• An analytic tool that exploits the strengths of collaborative work
• Collaborative environment
• Analysts work in teams, synchronously and asynchronously
• Information sharing on an as-needed basis
Collaborating via COLLANE
• A team of users works on a task
• Each user has their own working space
• A combined answer space is created out of individual contributions
• Users interact with the system via question answering and visual interfaces
• The system observes and facilitates
• Shares relevant information found by others → tacit collaboration
• Users interact with each other
• Exchange tips and data items via a chat facility → open collaboration
COLLANE/HITIQA user interface
[screenshot]
Key Tracked Events
• Questions asked
• Data items copied
• Data items ignored
• System offers accepted/rejected
• Displaying text
• Words searched in the user interface
• All dialogue between user and system
• Bringing up full document source
• Passages viewed
• Time spent
Experimental Results
• Aligned episodes on common data items
• Only considered user copies as the indicator
• Used document-level overlap
• Ignored potential content overlap between different documents
• Hence a lower bound on episode overlap
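A document-level overlap of this kind can be computed over the sets of copied document IDs. The slides do not give the exact formula, so the overlap coefficient (intersection over the smaller set) used here is an assumption:

```python
def copy_overlap(copies_a, copies_b):
    """Fraction of copied documents shared between two episodes.
    Document-level only: content overlap between different documents
    is ignored, so this is a lower bound on true episode overlap."""
    a, b = set(copies_a), set(copies_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Two analysts copied 3 of the same 4 documents
print(copy_overlap({"d1", "d2", "d3", "d4"}, {"d2", "d3", "d4", "d9"}))  # 0.75
```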
[chart: Artificial Reefs example — episode pairs A–G and E–H show 60–75% overlap]
Experimental Results
• 95 Exploratory Episodes
• EEs grouped by degree of overlap
• 60% or higher → may be shared?
• 40% or lower → divergent?
• Goal: find an overlap threshold that maximizes information sharing while minimizing rejection
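The grouping above amounts to bucketing episode pairs by their overlap score. A small sketch, with the 60%/40% cutoffs from the slides as tunable parameters:

```python
def classify_episode(overlap, share_at=0.60, diverge_at=0.40):
    """Bucket an episode pair by overlap score.
    Thresholds are the 60%/40% values from the slides, kept
    adjustable so the sharing/rejection trade-off can be tuned."""
    if overlap >= share_at:
        return "share"       # candidate for tacit sharing
    if overlap <= diverge_at:
        return "divergent"   # users likely pursuing different tasks
    return "uncertain"       # between the two thresholds

print(classify_episode(0.70))  # share
```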
Some topics appear more suitable for information sharing and tacit collaboration
At a 50% episode-overlap threshold, more than half of all episodes are candidates for sharing
Collaborative Sharing Objective
• Leverage exploratory knowledge
• Use the experience and judgment of users who faced the same or a similar problem
• Provide superior accuracy and responsiveness to subsequent users
• Similar to relevance feedback in IR
• Community-based rather than single-user judgment
Utilizing User B's trail
• Offer D4–D7 to User D after D3 is copied
• Avoids 2 fruitless questions (Q2 & Q4)
• Surfaces an extra potentially relevant data item (D7)
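The sharing step in this example can be sketched as: once a new user's copies match items in a prior trail, offer the data items that the prior user copied afterwards, skipping the questions that led nowhere. This is an illustrative reconstruction; the trail encoding ("Q…" for questions, "D…" for data items) is an assumption:

```python
def offer_from_trail(prior_trail, current_copies, min_prefix=1):
    """If the current user's copies overlap a prior user's trail,
    offer the data items that user copied afterwards (questions
    in between that yielded no copies are simply skipped)."""
    copied = [m for m in prior_trail if m.startswith("D")]
    matched = [d for d in copied if d in current_copies]
    if len(matched) < min_prefix:
        return []            # not enough overlap to justify an offer
    last = max(copied.index(d) for d in matched)
    return copied[last + 1:]

# User B's trail; User D has just copied D3
b_trail = ["Q1", "D3", "Q2", "D4", "D5", "D6", "Q4", "D7"]
print(offer_from_trail(b_trail, {"D3"}))  # ['D4', 'D5', 'D6', 'D7']
```

Offering D4–D7 directly spares User D the dead-end questions Q2 and Q4 on B's route.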
Conclusion
• Users searching for information in a networked environment leave behind exploratory trails that can be captured
• Exploratory Episodes can be compared for overlap by the data items copied
• Many users searching for the same or highly related information are likely to follow similar routes through the data
• When a user's trail overlaps an EE above a threshold, they may benefit from tacit information sharing
Future Research
• Evaluate overlap using semantic equivalence of the data items copied
• Distill Exploratory Episodes into shareable knowledge elements
• Expand overlap metrics
• Question similarity
• Items ignored, etc.
• Evaluate frequency of acceptance of offered material
• At varying thresholds