Trends in Web Search and its relevance to Digital Libraries

Min-Yen Kan, Web IR NLP Group (WING), National University of Singapore. Tips on web searching: visualize results, then come up with multiple queries; use multiple search engines; use advanced search operators (inurl:, site:, “phrasal search”).

Presentation Transcript


  1. Trends in Web Search and its relevance to Digital Libraries • Min-Yen Kan • Web IR NLP Group (WING) • National University of Singapore

  2. Tips on Web Searching • Visualize results, then come up with multiple queries • Use multiple search engines • Advanced Search • inurl:, site: • “Phrasal search” • But that’s just general search… • Federated resources / Niche search engines • 26 Sep 2008
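
The advanced-search operators named on this slide (inurl:, site:, and quoted phrasal search) can be combined in a single query string. The following is a minimal sketch, not any search engine's API: the helper name `build_query` and the example terms are hypothetical, and operator support varies between engines.

```python
def build_query(terms, phrase=None, site=None, inurl=None):
    """Compose a query string from plain terms plus optional operators.

    All parameter names here are illustrative assumptions.
    """
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')      # phrasal search: match the exact word sequence
    if site:
        parts.append(f"site:{site}")     # restrict results to one domain
    if inurl:
        parts.append(f"inurl:{inurl}")   # require a token to appear in the URL
    return " ".join(parts)


query = build_query(
    ["digital", "libraries"],
    phrase="web search",
    site="nus.edu.sg",
    inurl="wing",
)
print(query)  # digital libraries "web search" site:nus.edu.sg inurl:wing
```

Generating several such variants and comparing the result lists is one way to follow the "come up with multiple queries" tip.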

  3. Site- and Task-specific resources • Site Prestige: know what others think and do • Google PageRank (Link structure), Alexa (Traffic) • Google Trends / Insight (Queries) • Social Searching (Web 2.0): the voice of the reader / critic • (Bookmarks / Tags) Del.icio.us, Citeulike.org, Bibsonomy.org • (News) Digg / Slashdot • (Blogs) Google Blog, Technorati • People Search: finding public information on a person • Spock (web), Zabasearch (US only) • LinkedIn, Facebook • Must validate your sources • http://labs.digg.com/arc/

  4. Expert Search: find people who will advocate on your behalf • What do they want? • Scholar: • Active? → Check their recent articles • Names common? → Define area of interest • Compare against peers • Download vs. citation counts • Patent search: • Referenced by: (citation count; different from Scholar) • Identifying webfaced advocates: • Blog search, PageRank → Impact • http://flickr.com/photos/phauly/ • How do machines do it? • Expert search task as benchmark test • Download web pages to analyze • Needed to deal with spam pages • Used PageRank to assess prestige
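
The slide mentions using PageRank to assess prestige from link structure. The following is a simplified power-iteration sketch of that idea, not the production algorithm of any search engine and not the method used in the expert search benchmark; the toy link graph, the page names, and the damping value of 0.85 are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate prestige scores from a link graph.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of page -> score; scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (random jump).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: two blogs and a lab page all point at one expert.
toy_web = {
    "expert_homepage": ["lab_page"],
    "lab_page": ["expert_homepage"],
    "blog_a": ["expert_homepage"],
    "blog_b": ["expert_homepage", "lab_page"],
}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the page with the most incoming links ranks highest, which is the sense in which inbound links act as votes of prestige.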

  5. Problem or opportunity? The game has fundamentally changed • Revenue from print is continually declining • Students and researchers rely on the internet • Researchers want archiving rights – freedom of academic information • Characteristics: • Not zero-sum content • Distribution is now largely the role of search engines → Necessitates a new role for the publisher and a new revenue model • Will classic models work? Advertising, Subscription, Transactional & Bundling • Variants? Versioning (Varian), Moving window (JSTOR) • http://flickr.com/photos/danielbroche/

  6. Forecasting • Content is becoming free • MIT / Stanford opening up textbooks • Open access archiving → long term: content will not be the primary revenue source • eBook revenue hasn’t lived up to its promise yet… • Device gap: iPhone and next-generation devices → Revenue may be further down the pipe • + Academic publishers • Connect to libraries and federations at institution level • Individual customers are secondary • Trusted source • Expertise in copyediting, typesetting, project management, distribution, social networking • Many individual web publishers are rediscovering the same problems → Consultancy model → Win-win partnerships with individual authors

  7. Web Trends • Social Content • Wisdom of masses: Crowdsourcing • Rich Media • Open Source / Access • Paradigmatic change • Classifieds → Craigslist • POTS → Skype • CD store → iTunes • Publishers → ?? • http://www.informationarchitects.jp/slash/iA_WebTrends_2007_2_1024_768.gif

  8. Where is research going? • [Diagram labels: Server centric, User centric] • Search API usage • Browser as computer • Web page structure, mining text data • Modeling web users at tasks: Exploring / Fact-finding • Personalization, recommending • Social networks • Understanding opinion • Query and log analysis • http://flickr.com/photos/alisdair/

  9. Webfaced pop quiz – which is which? • WING@NUS • American Statistical Society • World Scientific • Springer • Courtesy: http://pagerank.si/

  10. Forecast: Know your strengths • Get advocates • Make it easy to get individuals to insist that their institution buy your materials • Know who is accessing (not necessarily buying) your content • Content revenue will continue to decline • Find an economic model that works for you • Work as partners in content creation • Be savvy on trends • Be visible: do “white hat” Search Engine Optimization (SEO) • Make your abstracts indexable by others • + Academic publishers • Connect to libraries and federations at institution level • Individual customers are secondary • Trusted source • Expertise in copyediting, typesetting, project management, distribution, social networking • Many individual web publishers are rediscovering the same problems → Consultancy model → Win-win partnerships with individual authors

  11. Trends in Digital Libraries >> WING @ NUS • Expanding types of information in search • Automated tools for DLs • Usability in E-books and online media • User modeling • Personalization, annotation and relation to other user tasks • http://flickr.com/photos/pathfinderlinden

  12. Scholarly Digital Libraries • ForeCite: our scholarly DL • Data Cleaning • Slide and Document Alignment • Searching in the OPAC • Math Information Retrieval

  13. ForeCite: Beyond the document as an item • [Diagram labels: Server, Client] • A user-centric DL framework • Put author / reader functionality together • Tagging, correction, annotation and viewing • Automatic tools: keyphrases and sentence classification • For use online and offline; organizes local PDF files for you • Only need your web browser

  14. Data Cleaning • Addresses: • Dongwon Lee, 110 E. Foster Ave. #410, State College, PA, 16802 • LEE Dong, 110 East Foster Avenue Apartment 410, Univ. Park, PA 16802-2343 • Products: • Honda Fit vs. Honda Jazz • Apple iPod Nano 4GB vs. 4GB iPod nano 4GB • Idea: use web as additional context for disambiguation and clustering • Placed 3rd in Web People Search Task (WEPS 2007) • Search results: • “Jeffrey D. Ullman” 384,000 pages; “Jeffrey D. Ullman” + “aho” 174,000 pages (45%) • “J. Ullman” 124,000 pages; “J. Ullman” + “aho” 41,000 pages (33%) • “Shimon Ullman” 27,300 pages; “Shimon Ullman” + “aho” 66 pages (0%)
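
The slide does not state how its 45%, 33% and 0% figures were computed. Dividing each "name + aho" hit count by the hit count for the name alone reproduces them, so the sketch below assumes that ratio. The function name `cooccurrence_percent` is hypothetical; the counts are the ones quoted on the slide.

```python
# (pages matching the name, pages matching the name together with "aho")
hit_counts = {
    "Jeffrey D. Ullman": (384_000, 174_000),
    "J. Ullman": (124_000, 41_000),
    "Shimon Ullman": (27_300, 66),
}


def cooccurrence_percent(name_hits, joint_hits):
    """Share of a name's pages that also mention the context term, in percent."""
    return 100.0 * joint_hits / name_hits


for name, (name_hits, joint_hits) in hit_counts.items():
    pct = cooccurrence_percent(name_hits, joint_hits)
    print(f"{name}: {pct:.0f}%")
# Jeffrey D. Ullman: 45%
# J. Ullman: 33%
# Shimon Ullman: 0%
```

Under this reading, "J. Ullman" co-occurs with "aho" about as often as "Jeffrey D. Ullman" does, which suggests the two strings name the same person, while "Shimon Ullman" almost never does and is likely someone else.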

  15. Slides and their relationship to documents • [Figure panels: Document in focus; Slides in focus]

  16. Searching in Libraries • http://linc.comp.nus.edu.sg

  17. Symbolic Information Search • How do users want to search math materials? • Our answer: Text-to-Expression Linking • Resolve text keywords to expressions • e.g., “Pythagorean Theorem” → “a² + b² = c²” or “x² + y² = z²” • Reduce the need for expression input • Solves the notational variation problem • Not quite right…
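
Text-to-expression linking, as described on this slide, resolves a text keyword to one or more notational variants of the same expression. The sketch below illustrates the idea with a hand-written lookup table; it is not the WING group's implementation, and the table contents, the caret notation for exponents, and the function name `resolve_expressions` are assumptions.

```python
# Hypothetical keyword table: one keyword maps to several notational variants.
KEYWORD_TO_EXPRESSIONS = {
    "pythagorean theorem": ["a^2 + b^2 = c^2", "x^2 + y^2 = z^2"],
}


def resolve_expressions(query, table=KEYWORD_TO_EXPRESSIONS):
    """Return every expression whose keyword occurs in the text query."""
    # Lowercase and collapse whitespace so matching ignores case and spacing.
    normalized = " ".join(query.lower().split())
    matches = []
    for keyword, expressions in table.items():
        if keyword in normalized:
            matches.extend(expressions)
    return matches


print(resolve_expressions("What is the Pythagorean Theorem?"))
# ['a^2 + b^2 = c^2', 'x^2 + y^2 = z^2']
```

Returning all variants for one keyword is what lets the user avoid typing an expression and sidesteps the choice between a, b, c and x, y, z.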

  18. Conclusions • Consider us your research WING! • Trade data and problems for solutions and interns • Meanwhile: • Use better search strategies • Practice white hat SEO • Identify webfaced advocates

  19. References • Kahin and Varian (2000) Internet Publishing and Beyond • Towle et al. (2007) Electronic Books in the 2003-2005 Period, Pub Res Q 23:95-104 • Photo Credits: Flickr Creative Commons Search • Thanks to all of you for listening & my fellow WING group members

  21. Abstract • I will present trends in current academic research on web search and digital libraries, and discuss their relevance to publishers and their economic model. With respect to the web, I will cover how search engines are starting to specialize and use click-through and ad data to improve relevance ranking. With respect to digital library research, I discuss my group's research at NUS on advancing the state of the art in scholarly digital libraries. I cover advances in how we deal with data cleaning issues, and slide and equation retrieval and alignment.
