
Implications of Web 2.0 on Information Research

Implications of Web 2.0 on Information Research. Wen-Lian Hsu Academia Sinica, Taiwan 中央研究院資訊所 許聞廉 hsu@iis.sinica.edu.tw. Outline. What is Web 2.0? Web 2.0 and Research Human-based Computation Folksonomy (Social Tagging) Academic Data Analysis GIO-Info Conclusion. What is Web 2.0?.



  1. Implications of Web 2.0 on Information Research Wen-Lian Hsu Academia Sinica, Taiwan 中央研究院資訊所 許聞廉 hsu@iis.sinica.edu.tw

  2. Outline • What is Web 2.0? • Web 2.0 and Research • Human-based Computation • Folksonomy (Social Tagging) • Academic Data Analysis • GIO-Info • Conclusion

  3. What is Web 2.0? • Web 2.0 Conference (October 2004) • Tim O'Reilly • The Web As a Platform • Harnessing Collective Intelligence • Data is the Next Intel Inside • End of the Software Release Cycle • Lightweight Programming Models • Software Above the Level of a Single Device • Rich User Experiences

  4. Key Web 2.0 services/applications • Blogs • Wikis • Tagging and social bookmarking • Multimedia sharing • RSS and syndication • Podcasting • P2P

  5. Social Bookmarking Source: http://funp.com/push/

  6. Source: http://digg.com/ Source: http://www.hemidemi.com/

  7. Blog Social bookmark adsense Content comments Source: http://carol.bluecircus.net/

  8. Skype Source: S.A Baset, H. Schulzrinne (September 14, 2004). An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol. Technical Report. Columbia University.

  9. Wikipedia

  10. Second Life

  11. Symbiosis is the Key • Blog • Social bookmark

  12. The Web Changes in Several Dimensions • Dynamics • Heterogeneity • Collaboration • Composition • Socialization

  13. Current Research Activities • Information Retrieval on Blogs • NTCIR-7 CLIRB (Cross-Lingual Information Retrieval for Blog) • Question Answering on Blogs • TREC 2007 QA Track • Question Answering on Wikipedia • QA@CLEF 2007 • CLEF 2006 WiQA • given a Wikipedia page, locate information snippets in Wikipedia • PASCAL Ontology Learning Challenge • Ontology construction • Ontology extension • Ontology population • Concept naming • LinkKDD2006, Textlink2007, MRDM2007

  14. International Competition • 1st of 9 in the NTCIR5 2005 CLQA Chinese Question Answering Contest (44.5%) • 1st of 13 in the WS CityU closed track of the SIGHAN 2006 Word Segmentation Contest (97.2%) • 2nd of 10 in the WS CKIP closed track of the SIGHAN 2006 Word Segmentation Contest (95.7%) • 2nd of 8 in the NER CityU closed track of the SIGHAN 2006 Named Entity Recognition Contest (88%) • 1st place in the NTCIR6 2006 CLQA Chinese Question Answering Contest (55.3%) • 1st place in the NTCIR6 2006 CLQA English-Chinese Question Answering Contest (34%)

  15. Factoid Questions • PERSON: 請問芬蘭第一位女總統為誰? (Who is Finland's first woman president?) • LOCATION: 請問狂牛症最早起源於何國? (In which country did mad cow disease first originate?) • ORGANIZATION: 請問收購南韓三星汽車的外國廠商為何? (Which foreign corporation bought South Korea's Samsung Motors?) • TIME • NUMBER • ARTIFACT

  16. IASL QA Architecture [Diagram. Stages: Question Processing, Passage Retrieval, Answer Extraction, Answer Ranking. Component labels: Mencius, SVM, ME, Filter, InfoMap, AutoTag, Lucene. Data labels: documents, word index, char index, Answers]

  17. Chinese Question Taxonomy for NTCIR CLQA Factoid Question Answering

  18. Knowledge Representation of Chinese Questions • Chinese Question: 2004年奧運在哪一個城市舉行? (In which city were the Olympics held in 2004?) • [5 Time]:[3 Organization]:[7 Q_Location]:([9 LocationRelatedEvent])

  19. QC by SVM
     • Two types of features are used for Chinese question classification (CQC)
     • Syntactic features
       - Bag-of-Words: character-based bigram (CB), word-based bigram (WB)
       - Part-of-Speech (POS): AUTOTAG, a POS tagger developed by CKIP, Academia Sinica
     • Semantic features
       - HowNet Senses: HowNet Main Definition (HNMD), HowNet Definition (HND)
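The two bag-of-words feature types on this slide can be illustrated with a minimal sketch. It only shows how character-based and word-based bigrams might be counted before they are passed to a classifier; the function names are hypothetical, and the word segmentation in the example is written by hand as an assumption rather than produced by AUTOTAG.

```python
from collections import Counter


def char_bigrams(text):
    """Character-based bigrams (CB): every pair of adjacent characters."""
    chars = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])]


def word_bigrams(words):
    """Word-based bigrams (WB): every pair of adjacent segmented words."""
    return list(zip(words, words[1:]))


question = "2004年奧運在哪一個城市舉行"
# Hand-written segmentation, assumed here only for illustration.
segmented = ["2004年", "奧運", "在", "哪", "一", "個", "城市", "舉行"]

cb_features = Counter(char_bigrams(question))
wb_features = Counter(word_bigrams(segmented))
```

Character bigrams need no segmenter, which is why they are a common baseline for Chinese text; word bigrams depend on the quality of the segmentation.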

  20. Question Classification Accuracy

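  21. Answer Extraction
     • Retrieved passages
       廿一世紀美國總統 (US president of the 21st century)
       總統父子檔美國第二對 (the second father-and-son pair of US presidents)
       美國總統性事錄 (a chronicle of US presidents' sex scandals)
       翻開美國總統傳訊史 (looking back at the history of US presidents being subpoenaed)
       美國總統匆忙赴晚宴 (the US president hurries to a dinner banquet)
       陸文斯基瘋狂愛上美國總統 (Lewinsky falls madly in love with the US president)
       美國總統大選選舉人票分析 (an analysis of electoral votes in the US presidential election)
       前越南總統阮文紹病逝美國 (former Vietnamese president Nguyen Van Thieu dies of illness in the US)
       美國總統柯林頓表示 (US President Clinton said)
     • Answer Extraction: Mencius, Filter
     • Extracted candidates: 陸文斯基 (Lewinsky), 阮文紹 (Nguyen Van Thieu), 柯林頓 (Clinton)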

  22. Templates generated by local alignment
     • Template: LOC OCC PER (contains only NEs)
       ..因/Cbb/O 台中縣/Nc/LOC 議長/Na/OCC 顏清標/Nb/PER 涉嫌/VK/O..
       ..清朝/Nd/O 台灣/Nc/LOC 巡撫/Na/OCC 劉銘傳/Nb/PER 所/D/O..
     • Template: LOC Na OCC Nb (template contains POS tags)
       被/P/O 大陸/Nc/LOC 國家/Na/O 主席/Na/OCC 江澤民/Nb/O 形容為/VG/O
       ../COMMA/O 香港/Nc/LOC 行政/Na/O 長官/Na/OCC 董建華/Nb/PER 近日..
       ..俄羅斯/Nc/LOC 男子/Na/O 選手/Na/OCC 史莫契柯夫/Nb/O 在/P/O..
     • Template: 由 N 所長 PER 擔任 ("PER serves as director of N"; template contains partial POS tags and words)
       由/P/O 建業/Nc/O 所長/Na/OCC 張龍憲/Nb/PER 擔任/VG/O
       由/P/O 安侯/Nb/O 所長/Na/OCC 魏忠華/Nb/PER 擔任/VG/O
     • Template: P Nc - 舉行 ("held at Nc"; template with gap ‘-’)
       在/P/O 卡達首都/Nc/LOC 多哈/D/PER,LOC 舉行/VC/O
       於/P/O 國父紀念館/Nc/ORG - 舉行/VC/O
       在/P/O 國父紀念館/Nc/ORG 廣場/Nc/O 舉行/VC/O
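To make "local alignment" concrete, here is a minimal sketch that aligns two label sequences with a Smith-Waterman style dynamic program and reads the shared region off as a template. The slides do not specify the scoring scheme or how tokens are labelled, so the scores (`match`, `mismatch`, `gap`) and the choice of using the NE label when present and the POS tag otherwise are assumptions made for illustration.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best locally aligned region of two label sequences.

    Positions where the sequences disagree or skip are written as '-'.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the local score drops to zero.
    i, j = best_pos
    template = []
    while i > 0 and j > 0 and score[i][j] > 0:
        same = a[i - 1] == b[j - 1]
        diag = score[i - 1][j - 1] + (match if same else mismatch)
        if score[i][j] == diag:
            template.append(a[i - 1] if same else "-")
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            template.append("-")
            i -= 1
        else:
            template.append("-")
            j -= 1
    return template[::-1]


# Labels for the first two passages on the slide (NE label if any, else POS tag).
first = ["Cbb", "LOC", "OCC", "PER", "VK"]
second = ["Nd", "LOC", "OCC", "PER", "D"]
template = local_align(first, second)
```

With these inputs the shared region is the `LOC OCC PER` template shown on the slide; the surrounding tokens differ and are left out of the local alignment.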

  23. Answer Extraction from Template
     • Question: 誰是台灣國防部長? (Who is Taiwan's Minister of National Defense?)
     • Q-Type: PERSON
     • Q-KEYWORD: 台灣 國防部長
     • Tagged Passages
       前任/A/O 美國/Nc/LOC 國防部長/Na/OCC 溫柏格/Nb/PER 認為/VE/O ,/COMMACATEGORY/O
       美國/Nc/LOC 國防部長/Na/OCC 柯恩/Nb/PER 今天/Nd/O 表示/VE/O ,/COMMA/O 華府/Nc/ORG,LOC 當局/Na/O 正/D/O 設法/VF/O 釐清/VC/O 台灣/Nc/LOC
       【/PAR/O 路透/Nb/ORG 東京/Nc/LOC 十九日/Nd/TIME 電/VC/ART 】/PAREN/O 台灣/Nc/LOC 國防部長/Na/OCC 唐飛/Nb/PER 昨天/Nd/O
     • Template matching and relation building
       Template: LOC OCC PER
       Relation: 美國 (US), 國防部長 (Minister of National Defense), 溫柏格 (Weinberger), 柯恩 (Cohen)
       Relation: 台灣 (Taiwan), 國防部長 (Minister of National Defense), 唐飛 (Tang Fei)
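The template-matching step on this slide can be sketched as follows. This is an illustrative approximation, not the IASL implementation: it assumes tokens are written as `word/POS/NE`, supports only templates made of NE labels (no POS tags, words, or gaps), and uses hypothetical function names.

```python
def parse(passage):
    """Split 'word/POS/NE' tokens; a token may carry several NE labels (e.g. 'ORG,LOC')."""
    tokens = []
    for item in passage.split():
        word, pos, ne = item.rsplit("/", 2)
        tokens.append((word, pos, ne.split(",")))
    return tokens


def match_template(tokens, template):
    """Yield the words of every contiguous window whose NE labels match the template."""
    size = len(template)
    for start in range(len(tokens) - size + 1):
        window = tokens[start:start + size]
        if all(label in ne for (_, _, ne), label in zip(window, template)):
            yield tuple(word for word, _, _ in window)


passages = [
    "前任/A/O 美國/Nc/LOC 國防部長/Na/OCC 溫柏格/Nb/PER 認為/VE/O",
    "美國/Nc/LOC 國防部長/Na/OCC 柯恩/Nb/PER 今天/Nd/O 表示/VE/O",
    "台灣/Nc/LOC 國防部長/Na/OCC 唐飛/Nb/PER 昨天/Nd/O",
]
template = ["LOC", "OCC", "PER"]

relations = [m for p in passages for m in match_template(parse(p), template)]

# Keep the PER slot of relations whose other slots match the question keywords.
keywords = {"台灣", "國防部長"}
answers = [per for loc, occ, per in relations if {loc, occ} == keywords]
```

Filtering the relations by the question keywords is what separates 唐飛 from the two US defense secretaries that match the same template.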

  24. Answer Extraction from Template
     • Question: 黛安娜王妃的死亡車禍事故發生在哪裡? (Where did the car accident that killed Princess Diana take place?)
     • Q-TYPE: LOCATION
     • Q-KEYWORD: 黛安娜 王妃 死亡 車禍 事故 發生
     • Tagged Passages
       .. 則/D/O 把/P/O 英國/Nc/LOC 黛安娜/Nb/PER 王妃/Na/O 的/DE/O 巴黎/Nc/LOC 死亡/VH/O 車禍/Na/O ,/COMMA/O 搬上/VC/O 舞台/Na/O ..
       .. 英國/Nc/LOC 王妃/Na/O 黛安娜/Nb/PER 離開/VC/O 人世/Nc/O 四個多月/Nd/TIME ..
     • Template matching and relation building
       Template: PER Na DE LOC - Na
       Template: LOC Na PER - VC
       Relation: 黛安娜/PER, 王妃/Na, 巴黎/LOC, 車禍/Na
       Relation: 英國/LOC, 黛安娜/PER, 王妃/Na, 離開/VC

  25. Answer Ranking • Features are combined as a weighted sum • Answer Ranking Features • IR Score • Answer Frequency (voting) • QFocus adjacency: “美國總統[布希]表示” (US President [Bush] said), “前往[惠氏藥廠]參觀” (went to visit [Wyeth Pharmaceuticals]) • Question Term and Answer Term (QAT) Co-occurrence • Answer Template
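A weighted-sum combination of the ranking features can be sketched in a few lines. The feature names mirror the slide, but the weights and the feature values below are made-up numbers for illustration only; the slides do not give the actual weights or how they were tuned.

```python
# Hypothetical weights; the real system's values are not given in the slides.
WEIGHTS = {
    "ir_score": 0.3,
    "frequency": 0.2,
    "qfocus_adjacency": 0.2,
    "qat_cooccurrence": 0.2,
    "template_match": 0.1,
}


def score(features, weights=WEIGHTS):
    """Weighted sum of feature values; missing features count as 0."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


def rank(candidates, weights=WEIGHTS):
    """Sort (answer, features) pairs from highest to lowest score."""
    return sorted(candidates, key=lambda item: score(item[1], weights), reverse=True)


# Illustrative feature values, each assumed to be normalized to [0, 1].
candidates = [
    ("柯恩", {"ir_score": 0.9, "frequency": 0.5, "qfocus_adjacency": 1.0,
              "qat_cooccurrence": 0.5}),
    ("唐飛", {"ir_score": 0.8, "frequency": 0.5, "qfocus_adjacency": 1.0,
              "qat_cooccurrence": 1.0, "template_match": 1.0}),
]
ranked = rank(candidates)
```

Because the combination is linear, each weight states directly how much one unit of a feature is worth, which keeps the ranking easy to inspect and tune.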

  26. Web 2.0 and Research • Human-based Computation • Folksonomy (Social Tagging) • Academic Data Analysis • GIO-Info

  27. Human-based Computation

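  28. Human-based Computation • Social Search • wayfinding tools informed by human judgment • CAPTCHA • reversed Turing test (in a Turing test a human questions the system; here the system questions the user) • Interactive Genetic Algorithm (IGA) • a genetic algorithm informed by human judgment • the fitness function's results are supplied by humans • Example: sketching a criminal's portrait. The system generates suspect portraits with a GA, the eyewitness scores which ones look more like the criminal, and the process repeats until the portrait comes close to the criminal's appearance.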
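The interactive genetic algorithm described above can be sketched as an ordinary GA whose fitness function is a callback to a person. In this minimal sketch the human is replaced by a stand-in function, `simulated_witness`, so that the code runs unattended; the genome encoding (five integer "face parameters"), the population size, and the operators are all assumptions chosen for illustration.

```python
import random

TARGET = [3, 7, 1, 8, 4]  # stands in for the face the witness remembers


def simulated_witness(genome):
    """Stand-in for a human rating: higher means 'looks more like the suspect'."""
    return -sum(abs(g - t) for g, t in zip(genome, TARGET))


def evolve(rate, genome_length=5, population_size=8, generations=20, seed=0):
    """Run a small GA in which `rate` plays the role of the human judge."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 9) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # The human's scores decide which candidates survive.
        ranked = sorted(population, key=rate, reverse=True)
        parents = ranked[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]                      # crossover
            child[rng.randrange(genome_length)] = rng.randint(0, 9)  # mutation
            children.append(child)
        population = parents + children
    return max(population, key=rate)


best = evolve(simulated_witness)
```

In a real interactive setting `rate` would display each candidate and wait for the witness's score, so population size and generation count have to stay small to avoid tiring the person.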

  29. CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart • A CAPTCHA is a type of challenge-response test used in computing to determine whether the user is human. (Source: Wikipedia) Source: http://recaptcha.net/

  30. [Diagram: several blogs, each serving a CAPTCHA; labels: Recognized text, Unrecognized text]

  31. The ESP Game • A two-player game • The goal is to guess what your partner is typing for each image • Once you both type the same word(s), you score points Source: http://www.espgame.org/
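The agreement rule of the game can be sketched as follows: both players type guesses for the same image, and the first word that appears in both players' lists becomes the image's label. This is a simplified illustration with a hypothetical function name; it assumes the players type in alternation and ignores scoring and timing.

```python
from itertools import zip_longest


def first_agreement(guesses_a, guesses_b):
    """Return the first word typed by both players, or None if they never agree."""
    seen_a, seen_b = set(), set()
    for a, b in zip_longest(guesses_a, guesses_b):
        if a is not None:
            a = a.strip().lower()
            seen_a.add(a)
            if a in seen_b:
                return a
        if b is not None:
            b = b.strip().lower()
            seen_b.add(b)
            if b in seen_a:
                return b
    return None


label = first_agreement(["dog", "puppy", "grass"], ["animal", "Puppy", "park"])
```

Because the two players cannot communicate, a word they both produce independently is likely to describe the image, which is what makes the agreed word usable as a label.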

  32. The Phetch Game Play as a describer

  33. The Phetch Game Play as a seeker Phetch

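  34. How about a game for describing idioms? • 罄竹難書 (too many misdeeds to record): has done too many bad things • 虎頭蛇尾 (a tiger's head and a snake's tail): lacks the perseverance to finish things • ……… • Candidate idioms: 高抬貴手 (please be lenient), 不動如山 (immovable as a mountain), 罄竹難書, 如沐春風 (like bathing in a spring breeze) • Description: 壞事做太多 (has done too many bad things)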

  35. Folksonomy (Social Tagging)

  36. Folksonomy (Social Tagging) • Also known as social tagging, collaborative tagging, social classification, social indexing • Folksonomy is the practice and method of collaboratively creating and managing tags to annotate and categorize content. Wikipedia

  37. del.icio.us Tags: Descriptive words applied by users to links. Tags are searchable My Tags: Words I’ve used to describe links in a way that makes sense to me

  38. Semantic Web Source: Tim Berners-Lee

  39. Using Folksonomy to Help Semantic Web • Top-down Semantic Annotation • Approach • Define an ontology first • Use the ontology to add semantic markups to web resources. • The semantics is provided by the ontology which is shared among different web agents and applications. • Problem • Negotiation • Evolution (hard to maintain) • High Barrier (background) Source: Xian Wu, Lei Zhang, Yong Yu. “Exploring Social Annotations for the Semantic Web”

  40. Using Folksonomy to Help Semantic Web • Bottom-up approach with social tagging • Advantage • No common ontology or dictionary is needed • Easy to access • Sensitive to information drift • Disadvantage • Ambiguity Problem: For example, “XP” can refer to either “Extreme Programming” or “Windows XP”. • Group Synonymy Problem: two seemingly different annotations may bear the same meaning. Source: Xian Wu, Lei Zhang, Yong Yu. “Exploring Social Annotations for the Semantic Web”
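One simple way to look for synonymous or ambiguous tags is to compare the sets of resources each tag has been applied to. The sketch below uses Jaccard overlap for this; it is a deliberately simple stand-in and not the method of Wu, Zhang, and Yu. The bookmark triples are invented sample data.

```python
from collections import defaultdict


def resources_by_tag(bookmarks):
    """Map each tag to the set of URLs it annotates."""
    index = defaultdict(set)
    for _user, url, tag in bookmarks:
        index[tag].add(url)
    return index


def jaccard(a, b):
    """Overlap of two sets: |a ∩ b| / |a ∪ b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented (user, url, tag) triples for illustration.
bookmarks = [
    ("u1", "url1", "xp"), ("u2", "url1", "extreme-programming"),
    ("u1", "url2", "xp"), ("u3", "url2", "extreme-programming"),
    ("u3", "url3", "xp"), ("u4", "url3", "windows"),
]
index = resources_by_tag(bookmarks)

xp_vs_ep = jaccard(index["xp"], index["extreme-programming"])
xp_vs_windows = jaccard(index["xp"], index["windows"])
ep_vs_windows = jaccard(index["extreme-programming"], index["windows"])
```

In this toy data "xp" overlaps with both "extreme-programming" and "windows", while those two never co-occur, which hints at the ambiguity described on the slide; high overlap between two differently spelled tags would hint at synonymy.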

  41. Or Folksonomy is the Solution? • Ontology is Overrated • Classification of the web has failed • Classification itself is filled with bias and error • Tagging is the solution Source: http://www.shirky.com/writings/ontology_overrated.html

  42. Academic Data Analysis

  43. Academic Data Analysis • Users participate and interact with data and people • Add My Library, Tag (e.g., CiteULike, BibSonomy) • Add Comments, Rating, Recommendation (e.g., TechLens) • Domain Focus Groups (e.g., Botanicus, arXiv) • e-Lib, Lib 2.0: Web 2.0 concepts are being added to applications, so search platforms provide open APIs for collecting more data • Google Scholar • Windows Live Academic Search • PubMed • CiteSeer • Citation index • Papers, journals/conferences, authors

  44. An Example • Let’s use TechLens as an example to imagine what research on IR/NLP can do. • Authors • Readers • Papers

  45. The Terminology • Entities: Alfred V Aho • References: Aho, A. V.; Alfred Aho; AV Aho • Links: (Alfred Aho, John Hopcroft, Jeffrey Ullman); (AV Aho, BW Kernighan, PJ Weinberger) • Entity Groups: G1 (Programming Languages), G2 (Databases), G3 (Algorithms)
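The gap between references and entities on this slide can be illustrated with a minimal sketch that groups author name strings by a crude key: surname plus first initial. This is only a first blocking step, written under that assumption; it would wrongly merge different people who share a surname and initial, which is where the co-author links and entity groups on the slide would come in. The function name is hypothetical.

```python
from collections import defaultdict


def name_key(reference):
    """Reduce an author string to (surname, first initial), both lowercased."""
    ref = reference.replace(".", " ").strip()
    if "," in ref:
        # "Surname, Given" form
        surname, given = [part.strip() for part in ref.split(",", 1)]
    else:
        # "Given Surname" form
        parts = ref.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    return surname.lower(), given[:1].lower()


references = ["Alfred V Aho", "Aho, A. V.", "Alfred Aho", "AV Aho",
              "John Hopcroft", "BW Kernighan"]

groups = defaultdict(list)
for ref in references:
    groups[name_key(ref)].append(ref)
```

All four spellings of Aho's name fall under the same key, while Hopcroft and Kernighan each get their own.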

  46. Imagine how we can make use of them [Diagram labels: Papers, Reference Extraction, Entity Resolution, Authors, Rating, Comments, Readers]

  47. New Research Topics • Given these changes, a key emerging challenge for data mining is dealing with richly structured, heterogeneous datasets and finding the patterns behind them, etc. • Several research areas focus on these problems, such as • (Social) Network Analysis • Link Mining • PASCAL Ontology Learning Challenge • …

  48. Society • Nodes: individuals (Authors, Readers) • Links: social relationships (family, work, friendship, belonging, etc.) • S. Milgram (1967) Six Degrees of Separation, Science John Guare • Social networks: many individuals with diverse social interactions between them. Source: www.cs.uiuc.edu/~hanj
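The idea of degrees of separation in a network of nodes and links can be sketched with a breadth-first search that counts the fewest links between two individuals. The toy graph below is an assumption built from the two co-author groups named on the terminology slide; it is not real bibliographic data.

```python
from collections import deque


def degrees_of_separation(graph, start, goal):
    """Fewest links between two nodes, or None if they are not connected."""
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor == goal:
                return distance + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


def build_graph(groups):
    """Link every pair of people who appear in the same group."""
    graph = {}
    for group in groups:
        for person in group:
            graph.setdefault(person, set()).update(p for p in group if p != person)
    return graph


coauthor_groups = [
    ["Aho", "Hopcroft", "Ullman"],
    ["Aho", "Kernighan", "Weinberger"],
]
graph = build_graph(coauthor_groups)
hops = degrees_of_separation(graph, "Hopcroft", "Kernighan")
```

Breadth-first search visits nodes in order of distance, so the first time the goal is reached the count is the shortest possible.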
