
On retrieval system theory

Stephen Robertson, Microsoft Research Cambridge. 4 July 2011. ISKO Conference, UCL.


Presentation Transcript


  1. On retrieval system theory
     Stephen Robertson, Microsoft Research Cambridge
     4 July 2011, ISKO Conference, UCL

  2. [By way of background]
     In 1967, I finished my first degree and started on the MSc in Information Science at City U.
     I still have a few of the books I bought then …
     … of which one is Brian’s On Retrieval System Theory
     This must have impressed me …
     … because I went on to make Retrieval System Theory my entire career
     … and also because of my use of the word On
     (the title of this talk should really have been: On ‘On retrieval system theory’)

  3. [By way of background]
     On completing my MSc, I joined the Aslib Research Department
     • then headed by Brian
     Started a part-time PhD with Bertie Brookes here at UCL
     then got a research fellowship to hold at UCL
     • coincidentally, Brian moved to UCL at exactly the same time
     Thus Brian was my HoD for 5 + 5 years, from 1968 to 1978

  4. The book
     The phrase information retrieval was coined in 1950 by Calvin Mooers …
     … and Brian’s book On Retrieval System Theory came out in 1961 (his third book in 3 years)
     This talk is in some sense a retrospective review of and commentary on ORST

  5. Chapters
     • The Scope of this Study
     • The Analysis of Retrieval Systems
     • The Description of Documents
     • Descriptor Languages
     • Structural Models
     • File Organisation and Coding
     • Search Procedures
     • The Automation of Storage and Retrieval
     • Purpose, Parameters and Performance
     • The Terminology of Retrieval

  6. On theory in IR
     “There is as yet no unified theory of retrieval systems” (from introduction)
     [a statement I have made many times myself]
     “… and a good deal of retrieval practice is still an empirical art, unsullied by theory”
     • suggests that there may one day be one
     (but maybe we should expect IR to be a mongrel field)

  7. Science versus technology
     Technology: a discourse or treatise on an art or arts; the scientific study of the practical or industrial arts; practical arts collectively – Shorter OED
     Science: understanding how the world is
     greatest achievement of a scientist: a law
     Technology: changing the world
     ditto of a technologist: a way to do the impossible

  8. On theory in IR
     Theory: theoretical generalisation, principles
     diagram to represent the components of the IR system
     use of ‘tool subjects’ from Communication Sciences
     “retrieval is a form of human communication”
     classification of CS
     • but use with caution

  9. Communication sciences
     • Mathematical foundations
     • Probability theory, Information theory, Game theory, Numerical analysis, Mathematical logic
     • Communication processes
     • Non-living systems
     • Data processing, Computation, Automata
     • Electrical communications
     • Servo-mechanisms and control
     • Living systems
     • Linguistics
     • Neurophysiology, Experimental psychology
     • Group behaviour
     • Dynamics of large systems
     • Linear programming

  10. Emphasis
      3 chapters on (essentially) indexing
      • doc description, languages, structure
      against one on search
      We might see the intervening half-century as
      • de-emphasising the former (now we just throw in the entire document plus whatever else we can find out about it, use NL, ignore structure)
      • developing the latter

  11. On automation
      Must remind ourselves: automation ≠ computers
      punched cards, plus “experimentally at least” paper tape, mag tape / drums / disks, photographic film
      automation = storage media?? (Google = a very large bank of disks!)
      in the context, medium is a major factor
      Automation is necessary and desirable
      depends on abstraction
      Suggested abstractions are interesting
      • if (inevitably) a little dated

  12. On automation: abstractions
      Original text v. Manipulable text
      original text is a physical object
      • not machine readable
      manipulable text requires a series of transformations
      • reduction to “informative statements”
      • selection of some of these
      • standardised representation
      • filing for search
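
The four transformations named on this slide (reduction, selection, standardised representation, filing for search) can be sketched as a small Python pipeline. This is only an illustration under stated assumptions, not the procedure described in ORST. An "informative statement" is approximated by a sentence, selection is a simple length threshold, and the file is a modern inverted index. All function names, the stopword list and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stopword list, not taken from the source.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def reduce_to_statements(text):
    """Step 1 (reduction): approximate 'informative statements' by sentences."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def select_statements(statements, min_words=3):
    """Step 2 (selection): keep statements with at least min_words words.

    The threshold is an assumed stand-in for a real selection rule.
    """
    return [s for s in statements if len(s.split()) >= min_words]


def standardise(statement):
    """Step 3 (standardised representation): lower-cased, de-duplicated terms."""
    words = re.findall(r"[a-z]+", statement.lower())
    return sorted({w for w in words if w not in STOPWORDS})


def file_for_search(doc_id, term_lists, index):
    """Step 4 (filing): record each term against the document in an inverted index."""
    for terms in term_lists:
        for term in terms:
            index[term].add(doc_id)


def make_manipulable(doc_id, text, index):
    """Run all four steps for one document."""
    statements = select_statements(reduce_to_statements(text))
    file_for_search(doc_id, [standardise(s) for s in statements], index)


index = defaultdict(set)
make_manipulable("d1", "Punched cards store descriptors. Yes.", index)
make_manipulable("d2", "Magnetic tape can store documents.", index)
```

With these two sample documents, looking up `index["store"]` returns both `d1` and `d2`, while the one-word statement "Yes" is dropped at the selection step and never reaches the file.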

  13. On automation: abstractions
      Search queries also need transformation
      but humans – librarians – are expected to be needed
      “The requester cannot be expected to translate his own question”
      Nevertheless, many of these steps are potentially mechanisable
      analysis is based on existing systems
      • may be “unnecessarily complicated”
      • hint that HP Luhn’s statistical approaches might provide an alternative

  14. On statistics
      Contrast ORST with a contemporary view: ICTIR (International Conference on the Theory of Information Retrieval)
      Rather little overlap between them
      One connection: statistics
      ICTIR 2009:
      • quantum models
      • [statistical] language modelling
      • regression
      • divergence from randomness
      • belief models
      … and pervasive notions from Machine Learning
      ORST:
      • scattered suggestions about the power of statistics
      • references to HP Luhn, mention of ME Maron

  15. A mongrel field?
      Should we look for a unified theory of IR?
      … or accept a diversity of sources of theory?
      A field of engineering or technology should look for a variety of theoretical insights and ways of thinking about the phenomena under investigation
      (I think Brian would agree)
      There are nine and sixty ways of constructing tribal lays,
      And every single one of them is right!
      – Rudyard Kipling

  16. Brian’s subsequent work
      Characterised by a broadening view
      as indicated by subsequent book titles:
      • Information Retrieval Techniques
      • Information Systems
      • Information Science in Theory and Practice
      … in contrast with the field of information retrieval, which has become more narrowly focussed
      … perhaps too much so (as would be argued in the Information Seeking world)
      Brian never lost sight of users
