Searching
Overview – providing a framework
A bit of history
tefkos@rutgers.edu; http://comminfo.rutgers.edu/~tefko/
Tefko Saracevic
Central ideas
• Searching is a complex, interactive process aimed at finding & retrieving relevant information
  • this, of course, raises a number of questions as to what we mean by:
    • information
    • relevant
    • finding
    • retrieving
    • interaction
• Modern searching has deep roots in historical attempts to deal with the information explosion
ToC
1. Basics – definitions
2. Complexity – elements involved
3. Searching & search interaction
4. Professional changes in searching
5. A bit of history
6. Conclusions
1. Basics
A few definitions
We all know them, but sometimes we should think about them
Information
Generally:
• “Information” has many meanings
  • depending on context
  • but it is universally well understood
  • it is a primitive concept – one does not have to explain it; other concepts & definitions are then built upon it
  • but there are many definitions on the Web
• From “informare” (Latin): to fashion, shape, or create, to give form to
In the context of searching: several layers or strata
• Narrow: information as a property of the message (text, record, document, image …)
• Broader: as a property of cognition; it affects or changes the state of a mind
• Broadest: also connected to the expansive social context or horizon, such as culture, work, task, or problem-at-hand
We must consider information not only as “message” but in its cognitive & contextual sense
Oh well…
“Information is a difference that makes a difference.” Gregory Bateson
"Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" T. S. Eliot
"With so much information now online, it is exceptionally easy to simply dive in and drown." Alfred Glossbrenner
"The stone age was marked by man's clever use of crude tools; the information age, to date, has been marked by man's crude use of clever tools." Source Unknown
Relevant, relevance
Generally, Merriam-Webster (2005):
• “Relevant: having significant and demonstrable bearing on the matter at hand.”
• “Relevance: the ability (as of an information retrieval system) to retrieve material that satisfies the needs of the user.”
• SYNONYMS: pertinent, useful, of utility, germane, material, applicable, appropriate …
• Relevance always has a connection: “to”
In the context of searching: several layers or strata:
• Information which is connected with a user’s (searcher’s, inquirer’s …) information need
• AND cognitive state as related to a given task or problem at hand
• & given affective state – motivation, intention …
Finding, findability
Generally:
• To realize, understand, or locate something especially by studying or observing
• To make a special effort to gather something together or summon something up
• To discover something or somebody after a search
In the context of searching: information has to be findable
Findability (Morville, 2005):
• The quality of being locatable or navigable.
• The degree to which a particular object is easy to discover or locate.
• The degree to which a system or environment supports navigation & retrieval.
Information retrieval (IR)
Generally:
• “Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).” Manning, Raghavan, & Schütze (2008)
• Other definitions on the Web
In the context of searching: searching of & retrieval from
• abstracting & indexing databases & services
• specialized databases & sites
• search engines
• directories, portals
• digital libraries
• OPACs
• reference resources
• and the like …
All can be labeled as IR systems – that is what they do
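To make the quoted definition concrete, here is a minimal sketch of retrieval over a tiny collection of unstructured text. It is an illustration only, not how any of the systems listed above work: the three sample documents, the function names `tokenize` and `search`, and the scoring rule (counting how often query terms occur) are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, documents):
    """Return (doc_id, score) pairs for documents that share terms with the query.

    The score is simply the number of times query terms occur in the document;
    results are ordered best first, ties broken by document id.
    """
    query_terms = tokenize(query)
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            results.append((doc_id, score))
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


# Hypothetical mini-collection, invented for this sketch.
documents = {
    "d1": "Relevance is a basic notion in information retrieval.",
    "d2": "Library automation and the MARC record format.",
    "d3": "Searching online databases for relevant information.",
}

hits = search("information retrieval", documents)
print(hits)
```

Note that plain term matching does not connect "relevant" with "relevance"; gaps of this kind are one reason query formulation and vocabulary choice matter so much in searching.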
2. Complexity
Elements involved in searching
Searching is… (repeated)
… a complex process involving interaction & feedback between and among PEOPLE, INFORMATION, & TECHNOLOGY
[Diagram labels: People, Information, Technology, Searching]
People
Users
• Generally:
  • people who access & use an information system
• In information retrieval (IR):
  • people with an information need that may be satisfied by a search of an IR system
• End users:
  • people who use an IR system directly to retrieve information
Professional searchers - you
• Experts with knowledge & competencies for performing effective & efficient searches in a variety of sources, systems & media
• searches may be done on behalf of people, institutions, tasks
  • mediated searching
• searchers must follow ethical guidelines
Information
Content
Objects that potentially may convey information as:
• texts, documents, images, recordings, sites …
  • often referred to as records
• So far the majority are texts & documents
  • even images & recordings are mostly labeled (tagged) with texts
• Many systems collect them as to sources
  • e.g. journals, areas …
Organization
Ways & means by which the objects are organized to facilitate access & searching
• vocabularies (free, controlled), indexes, fields, abstracts, summaries, classification, clustering, links, sites …
• a great many are now created automatically, e.g. terms extracted from texts
• many types of organization exist, more on the horizon
• essential for searching
Technology: two components or layers
Hardware & software
• A variety of information & communication technologies
  • most importantly, includes networks
• Software: many applications available
  • most are taken as given by users & searchers
Systems
• Systems that handle information objects by: identifying, collecting, organizing, storing, managing, providing access …
• & provide capabilities to search, retrieve, navigate, browse
• We label them information retrieval (IR) systems
Again, the two are different things but are closely connected. Professional searchers need to know how to use both.
IN THIS COURSE WE DEAL ONLY WITH SYSTEMS!
3. Searching & search interactions
Kinds, components, dynamics
Interaction with information
There is no such thing as not to interact (yes, a double negative)
General
• We concentrate on behavior of people in the use of information embedded in systems, services, networks, and devices
• More broadly & recently this also includes cooperative activities among dispersed people and resources
Various interactions
• Human-information interaction
• Human-human information interaction
• User-searcher interaction
• Human-computer interaction
Human-information interaction
General
• Information interaction is the process that people use to communicate & act reciprocally with an information system, particularly in relation to its content
• It is a dynamic process mostly mediated by technology
  • involving feedback
  • reiteration; reciprocal action
  • evaluation
Context of searching
• How & why people access information is highly dependent on the context of their interaction
• this context is influenced by a range of factors such as:
  • the time, place, and history of interaction
  • the tasks motivating the interaction
  • the technical possibilities of the information systems
(from Information Interaction in Context, 2008)
Components in human-information interaction: a reiterative process with feedback
[Figure not reproduced; surviving label: Context]
Human-information interaction: components defined
[Figure not reproduced; surviving labels: Feedback, Reiteration]
Role of searchers - you
There is much more to searching than searching
• Not only to do the searching, but also (or in addition) to assist, lead, instruct, help a user in:
  • defining, specifying the problem or task at hand
    • particularly in terms of informational aspects & resources
  • articulating the information need – diagnosis
    • guiding it from possibly visceral, to expressed, to searchable
  • formulating question(s) – clarifying, defining concepts
  • translating into query(ies); choosing variations
  • evaluating responses; eliciting feedback to steer reiteration
  • guiding toward further action & resources on their own
Human-human inf. interaction
General
• Communication between, or joint activity involving, two or more people with a goal of obtaining or exchanging information
  • reciprocal action
  • goal directed
• Most people still get most information from other people
Context of searching
• User-searcher interaction
  • part of mediated searching
  • searchers acting on behalf of other people or institutions
• On the part of the searcher it involves user modeling:
  • determining users’ inf. needs & requirements, & their characteristics as related to effective searches
  • predated by the reference interview
Human-computer interaction (HCI)
General
• HCI is the study & practice of interaction between people (users) and computers
  • relationship between humans and computers
• HCI is concerned with the design, evaluation & implementation of interactive computing systems for human use and with the study of major phenomena surrounding them. (Association for Computing Machinery, Special Interest Group on Computer-Human Interaction (SIGCHI))
Context of searching
• Study & practice of using computers, particularly interfaces, in searching for & retrieval of information
  • often concentrates on using particular information systems, interfaces & algorithms
  • evaluation of the effectiveness & efficiency of interactions – algorithms, systems, interfaces
Reiteration in searching
[Figure not reproduced; copy from Hembrook et al. 2005]
4. Professional changes
Dramatic shifts & evolving directions
Mediated searching
• Of interest in librarianship for over a century
  • reference became a major component of library practice
• With the advent of information & communication technology, mediated online searching became a major professional & research activity
  • even the mainstream of many information centers
  • publications, conferences, inf. industry oriented to it
• Searching, meaning mediated searching, became a big deal
  • well, we have been teaching it for decades
Information industry
• Starting from the early 1960’s an information industry developed, dealing with computerized abstracting & indexing services & databases available for searching
  • earliest ones were government sponsored (e.g. Medline), then transformed within professional organizations (e.g. Chemical Abstracts), then private industry (e.g. Dialog)
• By the 1970’s the inf. industry became strong & global
• Most databases & services from the inf. industry were oriented toward professional searchers
  • who then offered searching to users in their institutions, companies, public – mediated searching
Changes due to search engines
• But search engines have radically changed the way people search for information
  • mediated searching, including reference questions, has declined drastically over the last decade
  • users became end users – searching for themselves
  • end user searching of search engines exploded globally
• Reference questions have drastically fallen off
  • between 1995 & 2006 reference transactions declined 54% in ARL libraries (source: Assoc. Research Libraries statistics)
• Mediated searching followed – it is done much less than a decade ago
General changes in library use
• Libraries have added a great many digital resources
  • including digital resources & databases for end user searching
• As a result today's users have changed their use of libraries
  • virtual use is skyrocketing while physical use is plummeting
  • users don’t vote anymore with their feet but with their fingers
  • electronic transactions are growing rapidly
  • physically users are not in the library, but library use is going up & up & up (again see ARL statistics)
• We do not have statistics on how many searches are done on databases available in libraries, but it must be a LOT!
Oh, well…
“Many years ago, the esteemed Barbara Quint offered an estimate that Google answered as many reference queries in half an hour as all the reference librarians in the world did in 7 years.” Abram, S. (2008), Searcher, 16(8)
I have no idea of the source of the statistics, or if they are right at all, but it seems OK
“While they [users] may be absent they are not inactive. Networked electronic resources via library portals and the Internet have provided users with benefits that go far beyond anything available when physical use was the only alternative.” Martell, C. (2008), The Journal of Academic Librarianship, 34(5)
Web & changes in inf. industry
• The Web changed the architecture & orientation of many databases & changed the inf. industry in a big, big way
  • old databases restructured significantly, e.g. Web of Science
  • new databases emerged, some very large, e.g. Scopus
  • aggregators or publishers of journals became databases for searching, e.g. EBSCOhost, Wilson
  • they went with great gusto after end users
  • and with it after a much bigger & different market
• Now libraries & inf. centers buy time-based licenses from databases for access by their users
  • e.g. RUL provides access to close to 300 databases in every field
Changes for searchers - you
• Searchers are now also involved with licenses, library Web systems, & access provision, plus:
• New orientations & services emerged & are still being developed, refined (as already mentioned in previous lecture):
  • knowledge navigation – supporting the user in locating and retrieving relevant information in the global information environment
  • cooperative searching – with users & projects
  • source recommendation – acting as recommenders
  • source evaluation – assessing value, quality & suitability
  • impact investigation – search for evaluative data of use in assessing outputs & impacts of research, institutions, researchers …
  • user assistance and training – incl. information literacy
• But no matter what, you still have to master searching
5. A bit of history
A short chronology rather than history
Antecedents
• Europe before WWII:
  • strong documentation movement
  • Universal Decimal Classification, indexing of scientific literature, utilitarian integration of technology & technique toward social goals
• In the US right after WWII: concern about the information explosion, particularly in science
• Vannevar Bush’s classic article “As We May Think” in the Atlantic Monthly in 1945 stirred imagination & funding
  • problem: “the massive task of making more accessible a bewildering store of knowledge.”
  • solution: use of new technology; suggested a machine named “Memex” as an idealized model
• Technological imperative became a norm for solving inf. explosion problems – followed to this day
Beginnings
• National Science Foundation (NSF) Act of 1950 (& later amendments) mandated support for scientific & technical information (STI) for effective use
  • from the start in the 1950s to this day NSF supports research & development in this area, including digital libraries
  • now through the Division of Information & Intelligent Systems (IIS)
  • sparked involvement of many fields; many projects were funded
• Other government agencies got involved
  • e.g. National Institutes of Health in supporting mechanization of the National Library of Medicine to Medline & now MedlinePlus
• Other governments, first in Europe & the USSR, and later globally, started supporting similar activities
Information as strategic resource
• Key idea in providing support for STI activities from the end of the Second World War to this date:
  • effective dissemination of information considered of strategic value for progress in science & technology
• Spread to all other fields & human endeavors
• Bedrock of the information industry
• Searching fits right in there:
  • affected importance & increase of online searching as a professional activity
  • affected spread of searching to the wide populace
Idea of information as strategic resource
Affected evolution of the information age
• global economy's shift in focus away from the production of physical goods (as exemplified by the industrial age) and towards the manipulation of information
And information society
• in which the creation, distribution, diffusion, use, integration and manipulation of information is a significant economic, political, and cultural activity
Information science
Information retrieval
• 1951: Calvin Mooers coined the term “information retrieval” (IR) to label a burgeoning activity
  • by the mid 1950’s computerized IR systems emerged & later proliferated fast in many fields, even outside of science, & globally
  • among others, their searching became a professional activity
• Societies and conferences related to problems of IR and broader issues of information science proliferated globally
  • e.g. the very influential 1958 International Conference on Scientific Information (with really great Proceedings)
IR research
• From the 1960’s onwards Gerard Salton & his students in computer science pioneered research into advanced IR methods
  • addressed the technical or system side of IR
  • a great many good results over decades
  • but it took decades before results were applied commercially
  • today all vendors & search engines use them
• IR research continues to this day internationally
  • particularly under TREC (Text Retrieval Conference)
  • and reported by the Special Interest Group on IR (SIGIR)
• Research and IR are still closely connected
  • source of advances, but now also proprietary
Research (cont.)
• The 1970s & 80s also saw the emergence of research dealing with the human (user) side of IR
  • addressed users, use of information & IR systems
  • basic notions, such as relevance
• From the 1990’s till the present, growing research in these areas:
  • interaction in IR, or human-computer interaction
  • human information behavior (Wilson, 2000)
  • information seeking & searching (Bates, 2002)
• Human and system sides of research do not mesh well
  • still & unfortunately
Onto the real world
• The 1960s saw computer applications for IR blossoming
  • also library automation emerged, incl. MARC (go to RUL then ERIC to retrieve the report)
• Late 1960’s: Medline, the online version of MEDLARS (National Library of Medicine) came out
  • this was online way before the Internet & the Web, through commercial time-sharing networks such as Tymnet & Telenet
• Professional searching became firmly established
  • grew at a high rate
  • most access for users was through mediated searching
  • but end user searching grew slowly
Onto the real world (cont.)
• Early 1970’s: Dialog and ORBIT established – large commercial online vendors
  • Dialog, after a number of changes in owners, is still in business; ORBIT later merged with other vendors & disappeared
  • they provided online access to an ever growing number of databases – became information supermarkets
  • later joined by a number of other, more specialized vendors
  • e.g. LexisNexis, STN, EBSCOhost, CSA, etc.
  • or new giants, such as Scopus (already mentioned)
• Magazines, such as Information Today & Searcher, dutifully record & comment on what is going on in the information industry & the profession
The Net
• The Internet first went live in 1969 as ARPANET, an inter-university net
  • in 1983 the TCP/IP protocol was adopted, free & still in use globally today – i.e. the present Internet was born
  • in 1986 NSFnet was created, broadening reach significantly
  • in 1995 NSF pulled out & opened it to broad public & commercial use
  • Internet infrastructure is now provided commercially
• By the 1980s it became a force
  • by the 1990’s it took over the world
• The Internet has a colorful history (from the Internet Society)
  • timeline shows rapid growth & development
WWW
• In 1991 Tim Berners-Lee invented the World Wide Web – a hypermedia initiative for global information sharing
  • in 1993 the first widely used Web browser, Mosaic, was developed by Marc Andreessen; it later became Netscape
  • it popularized the Web
• WWW became the fastest growing & spreading technology in history
• Search engines
  • Yahoo launched in 1994 & Google in 1998
  • affected searching enormously
  • today over 3000 search engines in over 150 countries
  • but a few large ones dominate in every market, e.g. Baidu in China
Digital libraries
• Emerged in the mid 1990s
• Since then involved
  • massive research & development programs
  • e.g. the National Science Digital Library (NSDL)
  • massive investments by libraries
  • changed the library landscape
  • particularly as to access & searching
  • for most libraries the digital library portion of the budget skyrocketed
• Brought together IR & libraries
• Today a vast international presence
  • many institutions in addition to libraries involved
  • e.g. museums, societies, professional organizations
Digital libraries and searching
• Major resource for searchers
  • large variety of texts, images, sounds digitized all over the world
  • rich source of many (and many unusual) resources not found through databases or the Web
• At the same time a major headache for searching
  • search mechanisms not well developed & integrated
  • federated searching (covering multiple databases at once) still in infancy
  • e.g. RUL has close to 300 databases (see under Research resources – Indexes & databases) yet almost all have to be searched individually
  • at RUL federated searching through Searchlight can be done on 8 databases only
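The idea behind federated searching (one query sent to multiple databases at once, with the results merged) can be sketched in a few lines. This is a toy illustration under stated assumptions, not a description of Searchlight or any real product: the source names, the sample records, and the helper functions `make_source` and `federated_search` are all hypothetical.

```python
def make_source(records):
    """Build a stand-in 'database': a function that returns titles containing the query."""
    def search_fn(query):
        q = query.lower()
        return [title for title in records if q in title.lower()]
    return search_fn


def federated_search(query, sources):
    """Send one query to every source and merge the hits.

    Duplicate titles (compared case-insensitively) are kept only once,
    credited to the first source that returned them.
    """
    merged = []
    seen_titles = set()
    for source_name, search_fn in sources.items():
        for title in search_fn(query):
            key = title.lower()
            if key not in seen_titles:
                seen_titles.add(key)
                merged.append((title, source_name))
    return merged


# Hypothetical sources, invented for this sketch.
sources = {
    "database_a": make_source(["Relevance in information science", "Library automation"]),
    "database_b": make_source(["Relevance in Information Science", "Measuring relevance feedback"]),
}

results = federated_search("relevance", sources)
for title, source_name in results:
    print(f"{title}  [{source_name}]")
```

Even this toy version hints at why integration is hard in practice: real databases differ in query syntax, fields, and record formats, so deduplicating and ranking merged results takes far more than a case-insensitive title match.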
6. Conclusions
A few parting thoughts on changes
New world for searching
• Everybody is a searcher now
  • searching is a mass sport
  • whoever has a computer or other communication device also searches
  • however, few do it well
  • even fewer can assess how well they are doing
  • horror stories abound
• Search engines are constantly enlarging & refining their reach, coverage, specialization (e.g. Google Scholar)
• But still the major flaw: the Web is value neutral
  • diamonds & rubbish, true & untrue, good & evil are all equal
Opening for searchers - YOU (and libraries & information centers)
• New opportunities & challenges
• They are providing value added services
  • and could do even more
• Connecting in different ways with users
• Their basic worth:
  • TRUST – that is where ethics play a major role
  • PROFESSIONAL COMPETENCE – that is where your lifelong education plays a major role
• This whole course is just a beginning
Searching is human