Information Science 2005. Tefko Saracevic, PhD, School of Communication, Information and Library Studies, Rutgers University, New Brunswick, New Jersey, USA. http://www.scils.rutgers.edu/~tefko © Tefko Saracevic
Information science: a short definition • “the science dealing with the efficient collection, storage, and retrieval of information” (Webster)
Organization of presentation • Big picture – problems, solutions, social place • Structure – main areas in research & practice • Technology – information retrieval, the largest part • Information – representation; bibliometrics • People – users, use, seeking, context • Digital libraries – whose are they anyhow? • Paradigm shift – distancing of areas • Conclusions – big questions for the future
Scope • Evolution and state of the field in the last decade of the 20th century and the first decade of the 21st
The big picture: problems addressed • A bit of history: Vannevar Bush (1945) • Defined the problem as “... the massive task of making more accessible of a bewildering store of knowledge.” • The problem is still with us & growing
… and the solution • Bush suggested a machine: “Memex ... association of ideas ... duplicate mental processes artificially.” • A technological fix to the problem • Still with us: technology as the determinant
At the base of information science: the problem • Trying to control content in the • Information explosion • exponential growth of information artifacts, if not of information itself • PLUS, today, the • Communication explosion • exponential growth of the means and ways by which information is communicated, transmitted, accessed, used
Technological solution, BUT … • Applying technology to solving problems of effective use of information • BUT: from a HUMAN & SOCIAL, and not only a TECHNOLOGICAL, perspective
Or, as a symbolic model: People • Information • Technology
Problems & solutions: SOCIAL CONTEXT • Professional practice AND scientific inquiry related to: effective communication of knowledge records (‘literature’) among humans, in the context of social, organizational, & individual need for and use of information • Taking advantage of modern information technology
Or, as White & McCain put it: “modeling the world of publications with a practical goal of being able to deliver their content to inquirers [users] on demand.”
Elaboration • Knowledge records = texts, sounds, images, multimedia, web ... ‘literature’ in given domains • content-bearing structures – central to information science • Communication = human-computer-literature interface • information science studies the interface between people & literatures • Information need, seeking, and use = raison d'être • Effectiveness = relevance, utility
General characteristics • Interdisciplinarity – relations with a number of fields, some more predominant than others • Technological imperative – the driving force, as in many modern fields • Information society – social context and role in its evolution, shared with many fields
Structure: composition of the field • Like many fields, information science has different areas of concentration & specialization • They change and evolve over time • grow closer, grow apart • ignore each other, more or less
Most importantly, different areas… • receive more or less funding & emphasis • producing great imbalances in work & progress • attract different audiences & fields • this includes • vastly different levels of support for research and • huge commercial investments & applications
How to view structure? • By decomposing areas & efforts in research & practice, emphasizing Technology, or Information, or People
Part 3. Technology • Identified with information retrieval (IR) • by far the biggest effort and investment • international & global • commercial interest large & growing
Information Retrieval – definition & objective • “IR: ... intellectual aspects of description of information, ... search, ... & systems, machines...” (Calvin Mooers, 1951) • How to provide users with relevant information effectively? For that objective: 1. How to organize information intellectually? 2. How to specify the search & interaction intellectually? 3. What techniques & systems to use effectively?
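The three questions on this slide (organizing information, specifying the search, choosing techniques) can be made concrete with one classic textbook technique: an inverted index that maps terms to the documents containing them, queried with TF-IDF term weighting. This is a minimal sketch under stated assumptions, not a method from this presentation: the naive tokenizer, the toy documents, and the names `build_index` and `search` are hypothetical, and working IR systems add stemming, stop words, length normalization, and much more.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Organize the collection: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query, k=3):
    """Score each document by the summed TF-IDF weight of matching query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term occurs in no document
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical toy collection.
docs = {
    "d1": "information retrieval systems",
    "d2": "library collections and information services",
    "d3": "retrieval evaluation with relevance judgments",
}
index = build_index(docs)
print(search(index, len(docs), "retrieval relevance"))  # d3 first, then d1
```

Here `d3` ranks first because it matches both query terms, and the rarer term "relevance" carries a higher IDF weight than "retrieval".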
Streams in IR research & development • 1. Information science: • Services, users, use • Human-computer interaction • Cognitive aspects • 2. Computer science: • Algorithms, techniques • Systems aspects • 3. Information industry: • Products, services, Web • Market aspects • Problem: • relative isolation – discussed later
Contemporary IR research • Now mostly done within computer science • e.g. the Special Interest Group on Information Retrieval of the Association for Computing Machinery (ACM SIGIR) • Spread globally • e.g. major IR research communities emerged in China, Korea, Singapore • Branched outside of information science – “everybody does information retrieval” • data mining, machine learning, natural language processing, artificial intelligence, computer graphics …
Text REtrieval Conference (TREC) • Started in 1992, now probably ending • “support research within the IR community by providing the infrastructure necessary for large-scale evaluation” • Methods • provides large test beds, queries, relevance judgments, comparative analyses • essentially using the Cranfield methodology of the 1960s • organized around tracks • various topics, changing over the years
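Cranfield-style evaluation, as used in TREC, compares a system's ranked output for each query against human relevance judgments. A minimal sketch of standard measures computed from such judgments follows: precision and recall at a cutoff, average precision, and its mean over topics. The function names, topic ids, and toy data are hypothetical, and this is not the official TREC scoring software.

```python
def precision_recall(ranked, relevant, k):
    """Precision and recall of the top-k ranked documents against judged-relevant ones."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant document appears.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """Mean average precision over all judged topics (both dicts keyed by topic id)."""
    ap_values = [average_precision(runs.get(t, []), rel) for t, rel in qrels.items()]
    return sum(ap_values) / len(ap_values)


# Hypothetical ranked run and relevance judgments for one topic.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d3", "d2", "d9"}
print(precision_recall(ranked, relevant, 2))  # → (0.5, 0.3333333333333333)
print(average_precision(ranked, relevant))    # → 0.5
```

Average precision rewards placing relevant documents early: `d3` at rank 1 contributes 1/1, `d2` at rank 4 contributes 2/4, and the never-retrieved `d9` contributes nothing, giving (1 + 0.5) / 3.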
TREC impact • International – big impact on creating research communities • Annual conferences • report & exchange results, foster cooperation • Results • mostly in reports, available at http://trec.nist.gov/ • overviews provided as well • but only a fraction published in journals or books
TREC tracks 2004: 103 groups from 21 countries • Genomics, with 4 subtracks • HARD (High Accuracy Retrieval from Documents) • Novelty (new, nonredundant information) • Question answering • Robust (improving poorly performing topics) • Terabyte (very large collections) • Web track • Previous tracks: ad hoc (1992-1999), routing (1992-1997), interactive (1994-2002), filtering (1995-2002), cross-language (1997-2002), speech (1997-2000), Spanish (1994-1996), video (2000-2001), Chinese (1996-1997), query (1998-2000), and a few more that ran for two years only
Broadening of IR – ever changing, ever new areas added • Cross-language IR (CLIR) • Natural language processing (NLP IR) • Music IR (MIR) • Image, video, multimedia retrieval • Spoken language retrieval • IR for bioinformatics and genomics • Summarization; text extraction • Question answering • Many forms of human-computer interaction • XML IR • Web IR; Web search engines • Database (DB) and IR integration – structured and unstructured data
Commercial IR • Search engines based on IR • But they added many elaborations & significant innovations • dealing with HUGE numbers of pages fast • countering spamming & page rank games – adversarial IR • a never-ending combat of algorithms • Spread & impact worldwide • about 2000 engines in over 160 countries • English was dominant, but not anymore
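The "page rank games" on this slide target link-based ranking. As an illustration only, here is a minimal power-iteration sketch of PageRank over a toy link graph, assuming a damping factor of 0.85 (the commonly cited value) and assuming pages with no outlinks spread their score evenly over all pages. Commercial engines use proprietary ranking that goes far beyond this; the page names and graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """PageRank by power iteration.

    links maps each page to the list of pages it links to. Pages with no
    outlinks ("dangling" pages) spread their score evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share of (1 - damping).
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: distribute its damped score uniformly.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical graph: s1 and s2 exist only to link to "spam" (a tiny link farm).
ranks = pagerank({"a": ["b"], "b": ["a"], "s1": ["spam"], "s2": ["spam"], "spam": []})
print({page: round(score, 3) for page, score in sorted(ranks.items())})
```

In the toy graph, `s1` and `s2` carry no content and exist only to point at `spam`, yet that alone lifts `spam` above them; adversarial IR is the effort to detect and discount such manufactured links.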
Commercial IR: brave new world • Large investments & a new economic sector • hope for big profits, as yet questionable • Leading to proprietary, secret IR • also aggressive hiring of the best talent • new commercial research centers in different countries (e.g. Microsoft in China) • Academic research funding is changing • brain drain from academe
IR successfully effected: • Emergence & growth of the INFORMATION INDUSTRY • Evolution of IS as a PROFESSION & SCIENCE • Many APPLICATIONS in many fields • including on the Web – search engines • Improvements in HUMAN-COMPUTER INTERACTION • Evolution of INTERDISCIPLINARITY • IR has a long, proud history
Part 4. Information • Several areas of investigation • as a basic phenomenon – not much progress • measures such as Shannon's not successful • concentrated on manifestations and effects • information representation • a large area connected with IR, librarianship • metadata • bibliometrics • structures of literature • Covered in separate lectures
Part 5. People • Professional services • in organizations – moving toward knowledge management, competitive intelligence • in industry – vendors, aggregators, Internet • Research • user & use studies • interaction studies • broadening to information seeking studies, social context, collaboration • relevance studies • social informatics
User & use studies • Oldest area • covers many topics, methods, orientations • many studies related to IR • e.g. searching, multitasking, browsing, navigation • Branching into Web use studies • quantitative & qualitative studies • emergence of webmetrics
Interaction • The traditional IR model concentrates on matching, not on the user side & interaction • Several interaction models suggested • Ingwersen’s cognitive, Belkin’s episode, Saracevic’s stratified models • experiments & confirmation hard to come by • Considered key to providing • a basis for better design • an understanding of the use of systems • Web interactions are a major new area
Information seeking • Concentrates on the broader context, not only IR or interaction: people as they move through life & work • Based on the concept of social construction of information • Most active area, particularly in Europe, with annual conferences
Information seeking: a sampling of theories & models • Why people seek information: • Taylor’s stages of information need • Dervin’s Sense-Making – gap, bridge • Belkin’s Anomalous State of Knowledge • Chatman’s life in the round – information poverty • How people seek information: • Wilson’s general model of information seeking • Bates’ berrypicking – acts in searching • Kuhlthau’s information search process • Chang’s browsing model • Benoit’s communicative action (Habermas)
Part 7. Paradigm split: technology vs. people • A split, from the early 1980s to date, into two orientations • System-centered • algorithms, TREC • continue the traditional IR model • Human- (user-) centered • cognitive, situational, user studies • interaction models, some started in TREC • These became almost separate universes – one based in computer science, the other in information science & librarianship
Critiques, cultures • A number of critiques (e.g. Dervin & Nilan) of the isolated systems approach • calls for user-centered approaches, designs & evaluation • But user-centered studies did not deliver very useful design pointers or guides • Very different cultures: • computer science has its own, more science- & technology-oriented • information science more humanities-oriented • C. P. Snow’s two cultures
Human vs. system • Human (user) side: • often highly critical, even one-sided • mantra of “implications for design” • but does not deliver concretely • System side: • mostly ignores the user side & its studies • ‘tell us what to do & we will’ • The issue is NOT an H or S approach • even less H vs. S • but how H AND S can work together • a major challenge for the future
Reconciliation? • Several efforts to provide human-centered design • but more discussion than real application • Integration of information seeking and information retrieval in context (Ingwersen & Järvelin) • Research & development toward • using search context, improving user search experiences & search quality • machine learning, incorporating semantics
Funding • Most funding goes toward the systems side & computer science • most support (a very large %) is for system work • In the digital age, support is for the digital • True globally
Digital libraries: a LARGE & growing area • “Hot” area in R&D • a number of large grants & projects in the US, the European Union, & other countries up to now • will it continue? It is not growing • but “DIGITAL” big & “libraries” small • “Hot” area in practice • building digital collections, hybrid libraries • many projects throughout the world • growing at a high rate
Technical problems • Substantial: larger & more complex than anticipated • representing, storing & retrieving library objects • particularly those originally designed to be printed & then digitized • operationally managing large collections – issues of scale • dealing with diverse & distributed collections • interoperability • assuring preservation & persistence • incorporating rights management
Digital Library Initiatives in the US (DLI) • Research consortia under the National Science Foundation • DLI 1: 1994-98, 3 agencies, $24M, six large projects • DLI 2: 1999-2006, 8 agencies, $60+M, 77 large & small projects in various categories • ‘digital library’ left undefined so as to cover many topics & stretch ideas • not constrained by practice
European Union • DELOS Network of Excellence on Digital Libraries • many projects throughout the European Union • heavily technological • many meetings, workshops • resembles the DLIs in the US • well funded, long range
Research issues • understanding objects in DLs • representing in many formats • non-textual materials • metadata, cataloging, indexing • conversion, digitization • organizing large collections • federated searching over distributed (various) collections • managing collections, scaling • preservation, archiving • interoperability, standardization • accessing, using
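Federated searching over distributed collections has to merge ranked results whose scores are not on a common scale. One simple baseline, sketched here under stated assumptions, min-max normalizes each collection's scores to [0, 1] and keeps a document's best normalized score when several collections return it. The collection names, documents, and functions are hypothetical; real federated systems also weigh how relevant each collection is as a whole, and per-collection normalization is only a rough approximation.

```python
def normalize(results):
    """Min-max normalize one collection's (doc_id, score) list to [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If every score is equal, treat all results as equally (maximally) good.
    return [(doc, (score - lo) / span if span else 1.0) for doc, score in results]


def federated_merge(result_lists, k=10):
    """Merge ranked lists from several collections into one ranked list.

    Scores are normalized per collection; a document returned by more than
    one collection keeps its highest normalized score.
    """
    merged = {}
    for results in result_lists.values():
        for doc, score in normalize(results):
            merged[doc] = max(score, merged.get(doc, 0.0))
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical collections whose engines score on very different scales.
results = federated_merge({
    "catalog": [("book1", 12.0), ("book2", 8.0), ("book3", 4.0)],
    "repository": [("paper9", 0.75), ("book2", 0.5), ("paper4", 0.25)],
})
print(results)
# → [('book1', 1.0), ('paper9', 1.0), ('book2', 0.5), ('book3', 0.0), ('paper4', 0.0)]
```

Without normalization, the catalog's raw scores (4 to 12) would bury every repository result; after it, each collection's best hit competes on equal terms, and `book2`, returned by both, appears once.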
DL projects in practice • Heavily oriented toward a variety of institutions – primarily libraries • but also museums, professional societies, specific domains, etc. • Main orientation: institutional missions, contexts, finances • sustainability, preservation in the real world • managing growth, rights, access
Agendas • The DL research agenda is mostly set from the top down • from funding agencies to projects • imprint of the computer science community's interest & vision • DL practice agendas are mostly set from the bottom up • from institutions, including many libraries • imprint of institutional missions, interests & vision • providing access to specialized materials and collections from one or more institutions that are otherwise not accessible • covering a domain in an integral way with a range of sources
Connection? • DL research & DL practice are presently conducted • mostly independently of each other • minimally informing each other • & with slight or no connection • Parallel universes with little connection & interaction
Conclusions: IS contributions • IS affected the handling of information in society • Developed an organized body of knowledge & professional competencies • Applied interdisciplinarity • IR reached a mature stage • IR penetrated many fields & human activities • Stressed the HUMAN in human-computer interaction
Challenges • Adjust to the growing & changing social & organizational role of information & the related information infrastructure • Play a positive role in the globalization of information • Respond to the technological imperative in human terms • Respond to the shift from an information explosion to a communication explosion, bringing its own experience to solutions, particularly for the INTERNET • Join the competition with quality • Join DIGITAL with LIBRARIES
Juncture • IS is at a critical juncture in its evolution • Many fields, groups ... moving into information • big competition • entrance of powerful players • fight for stakes • To be a major player, IS needs to progress in its: • research & development • professional competencies • educational efforts • interdisciplinary relations • Reexamination is necessary