
Information retrieval practice


kalyca

Presentation Transcript


  1. Information retrieval practice I. IR applications • Digital libraries • Citation analysis • Web searching II. Semantic web • Metadata • Explanation • Peer-to-peer IR III. IR careers

  2. I. IR applications Digital libraries came from the government An NSF-ARPA-NASA Joint Initiative awarded ~$50 million to fund research projects developing new technologies for DLs in the mid to late 1990s They defined DLs as a technological phenomenon The working definition: “storehouses of digital information available through the net” Goal: to advance the collection, organization and storage of digital information Also: make it available for searching, retrieval and processing via networks - in user-friendly ways

  3. I. IR applications Examples UC Berkeley DL Project elib.cs.berkeley.edu/ UC Santa Barbara Alexandria DL Project alexandria.sdc.ucsb.edu/ UIUC DL Initiative: Federating Repositories of Scientific Literature dli.grainger.uiuc.edu University of Michigan DL Project www.si.umich.edu/UMDL/ Stanford University DL Project dbpubs.stanford.edu:8091/diglib/ Carnegie Mellon Informedia Video DL www.informedia.cs.cmu.edu/

  4. I. IR applications Examples UC Berkeley Digital Library project elib.cs.berkeley.edu/ Goal: to develop technologies for intelligent access to massive, distributed non-text collections Photographs, satellite images, maps, full text and “multivalent” documents Provides public access to large environmental datasets (reports, image collections, maps, sensor data, GIS…) Social goal: to support “collaborative knowledge” work of distributed users

  5. I. IR applications UC Santa Barbara Alexandria Digital Library Project alexandria.sdc.ucsb.edu/ Goal: users access and manipulate geographically-referenced information in a distributed set of digital collections Generic user query: “What information is there about some phenomenon at a particular set of places?” “Distributed” means DL components may be across the net, as well as coexisting on a single desktop “Geographic access” through geographic locations, as well as by other spatial characteristics of the information

  6. I. IR applications U of Illinois at Urbana-Champaign Digital Library Initiative: Federating Repositories of Scientific Literature dli.grainger.uiuc.edu Developing technologies and infrastructure to easily search technical documents on the net The DLI Testbed uses document structure to provide federated search across publisher collections This means building repositories (organized collections) of indexed multiple-source collections These are federated (merged and mapped) by searching the material via multiple views of a single virtual collection
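
One way to picture the federation described in slide 6 is a thin layer that sends a single query to several independently held repositories and merges the hits into one virtual result list. The sketch below is illustrative only: the repository names, document titles, and the `federated_search` helper are hypothetical, and it does not claim to reproduce the DLI Testbed's actual mechanism.

```python
from typing import Dict, List, Tuple

# Hypothetical repositories: each maps a document title to its text.
REPOSITORIES: Dict[str, Dict[str, str]] = {
    "publisher_a": {
        "Paper A1": "federated search over physics journals",
        "Paper A2": "structured markup for engineering articles",
    },
    "publisher_b": {
        "Paper B1": "search interfaces for scientific literature",
    },
}


def search_repository(docs: Dict[str, str], term: str) -> List[str]:
    """Return titles of documents whose text contains the term as a word."""
    term = term.lower()
    return sorted(t for t, text in docs.items() if term in text.lower().split())


def federated_search(term: str) -> List[Tuple[str, str]]:
    """Query every repository and merge hits into one (source, title) list."""
    merged: List[Tuple[str, str]] = []
    for source in sorted(REPOSITORIES):
        for title in search_repository(REPOSITORIES[source], term):
            merged.append((source, title))
    return merged


print(federated_search("search"))
```

The caller sees one list even though the documents live in separate collections, which is the "single virtual collection" idea in miniature.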

  7. I. IR applications University of Michigan Digital Library Project www.si.umich.edu/UMDL/ This DL takes advantage of decentralization (geographic, administrative), rapid evolution, and flexibility of the web Goal: to provide an infrastructure that lets patrons and publishers work together within a single library Traditional emphasis on service and organized content This should occur despite the fact that the underlying structure is volatile, administratively decentralized, and constantly evolving “Agent” based IA supports large scale “information commerce” in a heterogeneous environment

  8. I. IR applications Stanford University Digital Libraries Project dbpubs.stanford.edu:8091/diglib/ Goal: to provide access to a vast array of Web topics Focus on interoperability of networked information sources The heart of the DL is the “InfoBus” protocol testbed Provides uniform access to heterogeneous information services and sources through “proxies” that translate between InfoBus and native protocols Experimenting with several different types of search interfaces

  9. I. IR applications Carnegie Mellon’s Informedia Digital Video Library www.informedia.cs.cmu.edu/ Goal: to improve search and discovery of digital videos An on-line digital video library with thousands of hours of edited video from public TV, news, special events Intelligent, automatic mechanisms allow full-content and knowledge-based search and retrieval via desktop computer and metropolitan area networks This will involve the integrated application of speech, language and image understanding technologies for efficient creation and exploration

  10. I. IR applications Search engines have their roots in information retrieval Simply becoming more powerful and amassing larger indexes is not enough Breakthroughs will come as researchers gain a deeper understanding of how we search for, evaluate, and use information Ex: tracking long-term interests so engines can refine their handling of new information requests Ex: using data mining techniques to discover patterns in search queries Mostafa, J. (2005). Seeking better web searches. Scientific American. January 24.

  11. I. IR applications How they work Content is continually identified, acquired, and indexed The system counts relevant words and establishes their importance using various statistical techniques A highly efficient data structure (tree) is generated from the relevant terms, which associates those terms with specific Web pages We submit a query and the completed tree (index) is searched, not individual Web pages Different engines use different methods to rank results 4.bp.blogspot.com/_3m65KNJ6jVQ/Sh6wslHReaI/AAAAAAAAA6E/N0awjSsV5hE/s320/search-engine-optimizing.jpg
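
The indexing and lookup steps in slide 11 can be sketched as a small inverted index. This is a minimal illustration under stated assumptions: the pages, the stopword list, and the term-frequency ranking are invented for the example, and a Python dict stands in for the tree structure the slide mentions. Real engines use far richer statistics and ranking methods.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical pages standing in for crawled Web content.
PAGES = {
    "page1": "digital libraries store digital information",
    "page2": "search engines index web pages",
    "page3": "libraries and search engines both support retrieval",
}
STOPWORDS = {"and", "both", "the", "a", "of"}


def build_index(pages: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Map each non-stopword term to {page_id: term frequency}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for page_id, text in pages.items():
        counts = Counter(w for w in text.lower().split() if w not in STOPWORDS)
        for term, freq in counts.items():
            index[term][page_id] = freq
    return index


def search(index: Dict[str, Dict[str, int]], query: str) -> List[str]:
    """Rank pages by the summed frequency of the query terms they contain."""
    scores: Counter = Counter()
    for term in query.lower().split():
        for page_id, freq in index.get(term, {}).items():
            scores[page_id] += freq
    # Highest score first; ties broken by page id for a stable order.
    return [p for p, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


index = build_index(PAGES)
print(search(index, "digital libraries"))
```

Note that `search` never touches the page text itself: the query is answered entirely from the index, as the slide describes.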

  12. I. IR applications A study of the evolution of interdisciplinarity in library and information science: Using three bibliometric methods The authors use direct citation, bibliographic coupling, and co-authorship analysis to determine whether LIS has become more or less interdisciplinary They find that we tend to cite ourselves although we draw on five other disciplines ~ Why do you cite literature from another discipline? ~ Why does it matter if LIS becomes more (or less) interdisciplinary?

  13. I. IR applications Uses three bibliometric methods to study changes in LIS interdisciplinarity between 1978 and 2007 Direct citation, bibliographic coupling, co-authorship analysis Three data sets were extracted from the same group of articles in 10 selected LIS journals References in all articles References in articles with bibliographic coupling links Authors’ institutional affiliations in co-authored articles Chang, Y.W. and Huang, M.H. (2012). A study of the evolution of interdisciplinarity in library and information science: Using three bibliometric methods. Journal of the American Society for Information Science and Technology, 63(1), 22-33

  14. I. IR applications Interdisciplinarity: scientific output or activity using knowledge, methods, and tools from two or more disciplines The amount of interdisciplinary literature and centers has increased Seen in citation behavior and co-authorship, which involves information transfer across disciplines Analyze the citations from different disciplines, the co-authorship by researchers from different disciplines, and the publishing of works within other disciplines

  15. I. IR applications Bibliographic coupling: two articles with at least one common reference Articles without bibliographic coupling were excluded Co-authorship: one type of social network indicating collaboration across disciplines Brillouin index was used to measure interdisciplinarity The greater the degree of the interdisciplinarity of a discipline, the higher the index value Sample: n=1,536 articles 27,678 references for citation analysis; 8,906 for bibliographic coupling; 1,536 author affiliations
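
The two measures in slide 15 can be made concrete with a short sketch. It assumes the standard form of the Brillouin index, H = (ln N! - Σ ln n_i!) / N, where n_i is the number of items falling in discipline i and N is their total; the slides do not state which logarithm base the study used, so the natural log here is an assumption. The article and reference identifiers are made up.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def coupled_pairs(refs: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return article pairs that share at least one cited reference."""
    return [
        (a, b)
        for a, b in combinations(sorted(refs), 2)
        if refs[a] & refs[b]
    ]


def brillouin_index(counts: List[int]) -> float:
    """Brillouin index over per-discipline counts (natural log assumed)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # math.lgamma(n + 1) equals ln(n!), which avoids huge factorials.
    log_fact_sum = sum(math.lgamma(n + 1) for n in counts)
    return (math.lgamma(total + 1) - log_fact_sum) / total


# Hypothetical articles and the references each one cites.
references = {
    "art1": {"r1", "r2"},
    "art2": {"r2", "r3"},
    "art3": {"r4"},
}
print(coupled_pairs(references))

# Ten citations concentrated in one discipline versus spread over two.
print(brillouin_index([10]), brillouin_index([9, 1]), brillouin_index([5, 5]))
```

The index is zero when every citation comes from one discipline and grows as citations spread more evenly, which matches the slide's reading that a higher value means greater interdisciplinarity.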

  16. I. IR applications Questions From which disciplines do LIS citations originate? From which disciplines do LIS articles with a bibliographic coupling originate? From which disciplines do the co-authors originate? Are there differences among the disciplines seen through (a) direct citation (b) bibliographic coupling links, and (c) from which the co-authors originate? Which disciplines have had the greatest impact on LIS? Has the degree of interdisciplinarity within LIS increased over time?

  17. I. IR applications Finding: We most frequently cite publications in our own discipline LIS was influenced by other disciplines, but the influence was much less than that of LIS itself The top 5 disciplines accounted for 75% of citations: general science, business/management, CS, and Soc Half of all co-authors are affiliated with LIS institutes It is more difficult for researchers to seek a partner from other fields than to cite literature in other fields The degree of interdisciplinarity within LIS has increased, particularly in co-authorship

  18. I. IR applications Direct citations are distributed across 30 disciplines, but co-authors are distributed across 25 disciplines The percentage of contribution to interdisciplinarity attributable to LIS shows a decreasing tendency based on the results of direct citation and co-authorship analysis It shows an increasing tendency based on those of bibliographic coupling analysis Implication: each method has its own strengths and provides its own insights into various aspects of interdisciplinarity No single bibliometric method can reveal all aspects of interdisciplinarity due to its multifaceted nature

  19. I. IR applications Ordinary search engine users carrying out complex search tasks The authors describe a study of web searching based on a set of simple and complex tasks They found that the two types of tasks had very different characteristics and recommended that search engine developers use these findings to improve contextual help ~ Does this make sense to you? Are your complex searches that different? ~ Do you want search engines to offer contextual help?

  20. I. IR applications Problem: the amount of web information overburdens users and impacts their experience Complex search is neither well-supported by current search engines nor well-researched Microsoft: many queries yield terrible satisfaction and only 25% are successful Assumption: we increasingly use search engines to make decisions that require aggregated information Goal: to find out what makes complex search distinct from simple search Singer, G., Norbisrath, U. and Lewandowski, D. (2012). Ordinary search engine users carrying out complex search tasks. Journal of Information Science, pre-print

  21. I. IR applications Complex search scenario Users often face a situation in which there is no simple answer to their information need Complex searching has subjective (open-ended, ambiguous) and objective (sub task) components Lab experiment Sample: n=56 carrying out 12 tasks (7 simple, 5 complex) Search results were analyzed manually, compared with a sample solution and rated correct, having correct elements, not correct, user did not supply a solution

  22. I. IR applications Findings The average task time over the complex tasks was almost four times higher than the average over the simple ones The complex search tasks were carried out in a larger number of sessions than for simple ones Read time was three times as high for complex search tasks in comparison to simple tasks More queries were used and twice as many pages were examined There were 149 successfully carried out and 52 wrongly carried out complex tasks

  23. I. IR applications For complex search tasks, all time-based measurements are higher than for simple search tasks The larger average number of sessions in complex search tasks can be an indicator of users’ having difficulty in completing the tasks Long queries or questions may indicate uncertainty about the information need, which may be an indicator for search engines to offer support Suggestion: search engine operators put more emphasis on the fact that complex search tasks have significantly different characteristics than simple ones

  24. I. IR applications Belkin and Croft describe information filtering as the processes involved in delivering information to people It is designed for unstructured and semi-structured textual and non-textual data Typically large amounts of data that are streaming into a repository Can be pushed and can be the result of a DB search Involves individual or group profiles Involves the removal of a subset of data from the stream with the person seeing what is left Belkin and Croft (1992). Information filtering and information retrieval: two sides of the same coin? CACM 35(12), pp.29-38.
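
The profile-based filtering in slide 24 can be sketched as a generator that passes along only the stream items matching a person's profile, so the person sees what is left. The profiles, the incoming items, and the simple word-overlap test are assumptions made for illustration; they are not taken from Belkin and Croft.

```python
from typing import Dict, Iterable, Iterator, Set

# Hypothetical profiles: terms each person has said they care about.
PROFILES: Dict[str, Set[str]] = {
    "ana": {"retrieval", "metadata"},
    "ben": {"video"},
}


def filter_stream(items: Iterable[str], profile: Set[str]) -> Iterator[str]:
    """Yield only the incoming items that share a word with the profile."""
    for item in items:
        if profile & set(item.lower().split()):
            yield item


incoming = [
    "New metadata standard announced",
    "Video archive opens",
    "Weather update",
]
print(list(filter_stream(incoming, PROFILES["ana"])))
```

Because the filter consumes any iterable, the same function works on a finite list or on items arriving continuously, which fits the slide's picture of data streaming into a repository.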

  25. I. IR applications A fine example of information filtering Gary Larson is all over it! static.flickr.com/46/143296774_97691615cb.jpg

  26. I. IR applications An example of a filtering system cmc.dsv.su.se/select/information-filtering03.gif

  27. I. IR applications PICS: another filtering scheme www.w3.org/pub/WWW/PICS/iacwcv2.htm [Diagram labels: Content; Service A label; Service B label; Publisher’s label; Parent selects rating method; Label reading software; Child using the net]

  28. I. IR applications The filtering process Submit URL → Filter reviews request (For this user? At this time? This type of site? This type of file?) → Is this site allowed? Yes: see page No: see denial page
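
The decision flow in slide 28 can be read as a sequence of checks on the user, the time, the type of site, and the type of file, ending in either the page or a denial page. The sketch below is one hypothetical way to encode that flow: the `Policy` fields, the example user, and the choice to leave users without a policy unfiltered are all assumptions, not details from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Policy:
    """Hypothetical per-user rules; every field name here is illustrative."""
    start_hour: int = 8
    end_hour: int = 20  # exclusive
    blocked_site_types: Set[str] = field(default_factory=set)
    blocked_file_types: Set[str] = field(default_factory=set)


POLICIES: Dict[str, Policy] = {
    "child": Policy(blocked_site_types={"gambling"}, blocked_file_types={".exe"}),
}


def review_request(user: str, hour: int, site_type: str, url: str) -> str:
    """Run the four checks in order and report which page the user sees."""
    policy = POLICIES.get(user)
    if policy is None:
        return "see page"  # assumption: users without a policy are unfiltered
    if not policy.start_hour <= hour < policy.end_hour:
        return "see denial page"
    if site_type in policy.blocked_site_types:
        return "see denial page"
    if any(url.lower().endswith(ext) for ext in policy.blocked_file_types):
        return "see denial page"
    return "see page"


print(review_request("child", 10, "education", "http://example.org/index.html"))
print(review_request("child", 22, "education", "http://example.org/index.html"))
```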

  29. Information retrieval practice I. IR applications • Digital libraries • Citation analysis • Web searching II. Semantic web • Metadata • Explanation • Peer-to-peer IR III. IR careers

  30. II. Semantic web Goal: to transform the Web from a linked document repository into a distributed knowledge base and application platform To make web content more accessible to automated processes To transform the existing web into a set of connected applications forming a consistent logical web of data Tool: ontology languages capture knowledge that enables applications to understand web accessible resources and to use them more intelligently Horrocks, I. (2007). Semantic web: the story so far. Proceedings of the 2007 International Cross-disciplinary Conference on Web accessibility (W4A), 120-125.

  31. II. Semantic web For example: OWL (Web Ontology Language) Used for ontology development in fields as diverse as geography, geology, astronomy, agriculture, defense and the life sciences Can handle problems of ambiguity that make searching frustrating Based on XML Adding semantic annotations that are defined in shared ontologies “Individuals” that have “properties” and are related to each other

  32. II. Semantic web What is it good for? The web can reach its full potential only if it becomes a place where data can be shared and processed by automated tools as well as by people For the web to scale, programs must be able to share and process data even when these programs have been designed totally independently It is a vision: having data defined and linked in a way that it can be used by machines … for automation, integration and reuse of data across various applications on the web W3C. (2001). Semantic Web: Introduction. www.w3.org/2001/sw/

  33. II. Semantic web It will allow us to realize the full potential of the web Machines move beyond the presentation of files to understanding and manipulating the meaning of files Agents acting for us will be able to carry out tasks based on the semantics of the files The ultimate goal of the semantic web: To design enabling technologies to support machine facilitated global knowledge exchange Structured machine-to-machine communication Berners-Lee, T., Hendler, J., & Lassila, O. (2001, May 17). The Semantic Web. Scientific American, 501. www2005.org/keynotes/

  34. II. Semantic web Machines will need access to structured collections of information Sets of inference rules to automate decision making and choose among courses of action A form of decentralized knowledge representation The semantic web adds logic to the web Will make use of ontologies, XML and RDF Also taxonomies of objects and relations among them Creating semantic tags to express that certain objects have certain properties with certain values getsemantic.com/img/getsemantic_400.png
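
Slide 34's combination of structured facts, a taxonomy, and inference rules can be illustrated with subject-predicate-object triples in the spirit of RDF. The sketch below uses plain Python tuples rather than an RDF library, and its facts and two rules (subclass transitivity and instances inheriting superclasses) are hypothetical examples chosen for brevity.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts: a tiny taxonomy plus one individual.
FACTS: Set[Triple] = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("rex", "type", "Dog"),
}


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Apply the two rules repeatedly until no new triples appear."""
    known = set(facts)
    while True:
        new: Set[Triple] = set()
        for s, p, o in known:
            if p not in ("subClassOf", "type"):
                continue
            for s2, p2, o2 in known:
                # If X is (a subclass or an instance) of C and C is a
                # subclass of D, then X stands in the same relation to D.
                if p2 == "subClassOf" and s2 == o:
                    new.add((s, p, o2))
        if new <= known:
            return known
        known |= new


closure = infer(FACTS)
print(("rex", "type", "Animal") in closure)
```

Nothing in the original facts says that rex is an animal; a program reaches that conclusion only because the taxonomy and the rules are machine-readable, which is the kind of automation the slide describes.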

  35. II. Semantic web An example of an ontology www.wiwi.hu-berlin.de/~berendt/lehre/2003s/kaw/Student_project/Ontology.gif

  36. II. Semantic web Personal Agent-based information transactions management Semantic web Digital libraries Collaborative research Business applications

  37. II. Semantic web Business applications: Highfleet.com: ontology-based deductive DBs and Semantic Federation of Legacy Databases www.highfleet.com Ontology: content and ontology management www.ontology.com Semafora Systems: industrial products for semantic software www.semafora-systems.com/index.php?id=135 Sandpiper Software: Visual Ontology Modeler is modeling and data integration software www.sandsoft.com/products.html

  38. II. Semantic web The architecture of the semantic web www.w3c.rl.ac.uk/.../ Dublin_Main/slide11-0.html

  39. II. Semantic web There are three main visions of the semantic web Universal library: retrieval supported by semantic metadata Knowledge navigator: enabling agent-based transactions A global knowledge base for agents to act upon Federated knowledge bases: heterogeneous data sources for specific uses Knowledge-based support for domain-specific applications Evaluated from rhetorical, theoretical, and pragmatic perspectives Marshall, C.C. and Shipman, F.M. (2003). Which semantic web? Proceedings of the Fourteenth ACM Conference on Hypertext and Hypermedia, 57-66

  40. II. Semantic web There is a convergence of DLs, museums and archives as memory organizations Technical and social overlap between semantic web and DLs It is critical to help memory organizations develop as key players in supporting the semantic web For example, there is a Semantic Web Advanced Development collaboration with the DSpace project DSpace is a joint project of MIT Libraries and HP Its goal is to develop a flexible digital archive designed to capture and distribute intellectual output

  41. II. Semantic web Towards a conceptual framework for user-driven semantic metadata interoperability in digital libraries: A social constructivist approach The authors argue that authoritative metadata schemes will not be efficient in a semantic web environment They suggest that semantic metadata interoperability will best be achieved through a collaborative and user-driven approach ~ Why is the attempt to develop a single metadata standard misguided? ~ What are the advantages and drawbacks of crowdsourcing these standards?

  42. II. Semantic web Metadata schemes tend to be hierarchical and authoritative supporting an expert controlled approach Ex: MARC, Dublin Core, MODS, DDC, and LC classification and Subject Headings Problem: do not take into account the diversity of cultural, linguistic and local perspectives of library users Change: folksonomies are in widespread use in popular web applications such as Wikipedia, Flickr, LibraryThing, Delicious, and Technorati and the use of Web 2.0 services Alemu, G., Stevens, B., Ross, P. (2012). Towards a conceptual framework for user-driven semantic metadata interoperability in digital libraries: A social constructivist approach, New Library World, 113(1), 38-54.

  43. II. Semantic web Libraries are trying to integrate user-driven and social media approaches, but there is no activity in the metadata domain Question: are standards driven metadata schemes and user driven social media approaches incompatible? Problem: an absence of theoretical metadata frameworks to understand the use of Web 2.0 services Suggestion: use a social constructivist approach when archiving information objects that require metadata This would reflect the diversity of perspectives held by users and could lead to semantic interoperability

  44. II. Semantic web Assumption: current metadata practices are authoritative and hierarchical and stem from a foundationalist objectivist ontological viewpoint There is a single correct solution because truth and meaning reside in objects independent of us Seen in controlled vocabularies, authority names, taxonomies, and classification systems Fixed categories updated at wide intervals mean the scheme will be outdated and missing context in time Challenge: Does the world make sense or do we make sense of the world?

  45. II. Semantic web Constructivism assumes meaningful reality is contingent on human practices and is developed and transmitted within social contexts There are many ways to structure the world, and many meanings or perspectives for any event or concept Social constructivist knowledge focuses on consensus and shared and negotiated meaning Recognizing the existence of multiple interpretations of an object impacts semantic metadata interoperability It accounts for differences in the interpretations of digital objects (information resources) among individuals, groups, countries and geographic regions

  46. II. Semantic web Assumption: the digital world is a radical break as a single object can be categorized in multiple ways Ontological classification fails when libraries deal with large collections because entities can be categorized into various branches in the taxonomic hierarchy A way out: involving users in the creation of metadata provides a richer metadata environment where diverse views are also reflected Should not try to force interoperability solutions around a single standard Fostering an approach that encourages diversity is more attuned to actual practice

  47. II. Semantic web Explanation in the Semantic Web: a survey of the state of the art The authors review literature about and discuss the importance of explanation in various web services They argue that the extent to which the semantic web succeeds will depend on the extent to which developers build in the ability of a service to explain its actions to users ~ How important is explanation to you when you use a web service? ~ How will having explanation available improve your web experience?

  48. II. Semantic web Semantic web applications use interconnected distributed data and inferential capabilities to compute results Explanation adds transparency to the results and enables user trust in the process The need for understanding why a system has failed to meet requirements has led to explanation facilities Early adopters built explanation into expert systems because without it, these systems had credibility problems, especially in safety critical domains Hasan, R. and Gandon, F. (2012). Explanation in the Semantic Web: a survey of the state of the art. Research Report 7974, Project-Team Wimmics

  49. II. Semantic web Semantic web apps cause a shift from answering queries by retrieving stored information to using inference Apps must provide explanation capabilities showing how results were obtained Factors Collaboration: interaction and sharing of knowledge between agents dedicated to solving a problem Managing provenance, trust and reputation Heterogeneous and loosely-coupled agents or systems components must be able to discover new services by accessing service descriptions

  50. II. Semantic web Autonomy: agent can autonomously decide which services to call and compose, which sources to use How to enhance a query with background knowledge, to provide more useful answers to user questions Use of ontologies: provide support for heterogeneous and distributed data integration and resolve inconsistencies of data from multiple sources Important for describing domain knowledge, problem areas, and user preferences for agents with reasoning capabilities Common vocabularies among multiple agents allowing description of agent communication protocols
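
The explanation capability discussed in slides 48 to 50 amounts to keeping a record of how each inferred result was obtained. The sketch below is a minimal, hypothetical illustration: it derives class membership from a subclass fact and stores the premises behind each derived fact so they can be shown to a user. The facts, the single rule, and the helper names are assumptions, not the design of any system surveyed by Hasan and Gandon.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str]

# Hypothetical asserted facts.
BASE_FACTS: List[Fact] = [
    ("rex", "type", "Dog"),
    ("Dog", "subClassOf", "Mammal"),
]


def derive_with_support(facts: List[Fact]) -> Dict[Fact, List[Fact]]:
    """Map every known fact to the premises that justify it.

    Asserted facts have an empty premise list; derived facts record the
    instance fact and the subclass fact they were inferred from.
    """
    support: Dict[Fact, List[Fact]] = {f: [] for f in facts}
    changed = True
    while changed:
        changed = False
        for inst in list(support):
            for cls in list(support):
                if inst[1] == "type" and cls[1] == "subClassOf" and inst[2] == cls[0]:
                    derived = (inst[0], "type", cls[2])
                    if derived not in support:
                        support[derived] = [inst, cls]
                        changed = True
    return support


def explain(fact: Fact, support: Dict[Fact, List[Fact]]) -> str:
    """Produce a one-line, human-readable justification for a fact."""
    premises = support[fact]
    if not premises:
        return f"{fact} was asserted"
    return f"{fact} follows from {premises[0]} and {premises[1]}"


support = derive_with_support(BASE_FACTS)
print(explain(("rex", "type", "Mammal"), support))
```

Storing the premises at derivation time is what makes the result transparent: the system can answer "why?" without re-running the inference.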
