
CS 430: Information Discovery



Presentation Transcript


  1. CS 430: Information Discovery Lecture 22 Non-Textual Materials 2

  2. Course Administration

  3. Multimedia 3: Geospatial Information Example: Alexandria Digital Library at the University of California, Santa Barbara • Funded by the NSF Digital Libraries Initiative since 1994. • Collections include any data referenced by a geographical footprint: terrestrial maps, aerial and satellite photographs, astronomical maps, databases, related textual information. • Program of research with practical implementation at the university's map library

  4. Alexandria User Interface

  5. Alexandria: Information Discovery • Metadata for information discovery • Coverage: geographical area covered, such as the city of Santa Barbara or the Pacific Ocean. • Scope: varieties of information, such as topographical features, political boundaries, or population density. • Latitude and longitude provide basic metadata for maps and for geographical features.
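The coverage metadata above supports discovery by geographic footprint: a query is a bounding box, and a record matches when its footprint overlaps the box. A minimal sketch of that overlap test follows; the record names and coordinates are illustrative assumptions, not Alexandria's actual data model.

```python
# Sketch of footprint-based discovery: each record carries a bounding
# box (min_lon, min_lat, max_lon, max_lat), and a query box retrieves
# every record whose box overlaps it. Records here are made up.

def boxes_overlap(a, b):
    """True when two (min_lon, min_lat, max_lon, max_lat) boxes intersect."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

records = [
    ("Santa Barbara street map", (-119.85, 34.36, -119.63, 34.46)),
    ("Great Lakes shipping chart", (-92.1, 41.3, -76.0, 49.0)),
]

query = (-120.0, 34.0, -119.0, 35.0)   # roughly Santa Barbara County
hits = [name for name, box in records if boxes_overlap(box, query)]
```

Real geospatial catalogs must also handle footprints that cross the 180° meridian, which this simple test ignores.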

  6. Gazetteer • Gazetteer: a database and a set of procedures that translate between representations of geospatial references: • place names, geographic features, coordinates • postal codes, census tracts • Search engine tailored to the peculiarities of searching for place names. • Research is making steady progress on feature extraction, using automatic programs to identify objects in aerial photographs or printed maps -- a topic for long-term research.
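The translation procedures can be sketched as lookups in both directions: name to coordinates, and coordinates back to the nearest named place. The toy entries and function names below are assumptions for illustration, not the Alexandria gazetteer's actual schema.

```python
# Minimal gazetteer sketch: a table of (feature type, point) entries,
# with forward and reverse lookup procedures. Data is illustrative.

GAZETTEER = {
    "santa barbara": ("populated place", (-119.70, 34.42)),
    "pacific ocean": ("sea", (-160.0, 0.0)),
}

def name_to_point(name):
    """Forward lookup: place name -> (lon, lat), or None if unknown."""
    entry = GAZETTEER.get(name.strip().lower())
    return entry[1] if entry else None

def nearest_name(lon, lat):
    """Reverse lookup: crude nearest-neighbour over the stored points."""
    return min(GAZETTEER,
               key=lambda n: (GAZETTEER[n][1][0] - lon) ** 2 +
                             (GAZETTEER[n][1][1] - lat) ** 2)
```

A production gazetteer would add variant spellings, feature-type filters, and the fuzzy name matching the slide alludes to.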

  7. Alexandria: Computer Systems and User Interfaces • Computer systems • Digitized maps and geospatial information -- large files • Wavelets provide multi-level decomposition of image • -> first level is a small coarse image • -> extra levels provide greater detail • User interfaces • Small size of computer displays • Slow performance of Internet in delivering large files • -> retain state throughout a session
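The multi-level wavelet idea can be sketched with one level of a 2-D Haar-style decomposition: the coarse sub-band is a small half-resolution average that can be delivered first, and the detail sub-bands restore full resolution later. This is a sketch of the general technique, not the Alexandria system's actual codec.

```python
import numpy as np

# One level of a 2-D Haar decomposition. The coarse sub-band is a
# quarter-size average of each 2x2 pixel block; the three detail
# sub-bands hold exactly what is needed to rebuild the original.

def haar_level(img):
    a = img[0::2, 0::2]; b = img[0::2, 1::2]   # the four pixels of
    c = img[1::2, 0::2]; d = img[1::2, 1::2]   # each 2x2 block
    coarse = (a + b + c + d) / 4.0             # small, coarse image (sent first)
    h = (a + b - c - d) / 4.0                  # horizontal detail
    v = (a - b + c - d) / 4.0                  # vertical detail
    g = (a - b - c + d) / 4.0                  # diagonal detail
    return coarse, (h, v, g)

def reconstruct(coarse, details):
    """Invert haar_level exactly: extra levels restore full detail."""
    h, v, g = details
    out = np.empty((coarse.shape[0] * 2, coarse.shape[1] * 2))
    out[0::2, 0::2] = coarse + h + v + g
    out[0::2, 1::2] = coarse + h - v - g
    out[1::2, 0::2] = coarse - h + v - g
    out[1::2, 1::2] = coarse - h - v + g
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
coarse, details = haar_level(img)
```

Applying `haar_level` recursively to the coarse sub-band yields the multi-level pyramid the slide describes: each extra level delivered over the network adds detail.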

  8. Collection-Level Metadata Collection-level metadata is used to describe a group of items. For example, one record might describe all the images in a photographic collection. Note: There is a way to add collection-level metadata records to Dublin Core. However, a collection is not a document-like object.

  9. Collection-Level Metadata

  10. Example 4: Collections -- Finding Aids and the EAD Finding aid • A list, inventory, index or other textual document created by an archive, library or museum to describe holdings. • May provide fuller information than is normally contained within a catalog record or be less specific. • Does not necessarily have a detailed record for every item. The Encoded Archival Description (EAD) • A format (XML DTD) used to encode electronic versions of finding aids. • Heavily structured -- much of the information is derived from hierarchical relationships.

  11. Automatic Creation of Surrogates for Non-textual Materials Discovery of non-textual materials usually requires surrogates • How far can these surrogates be created automatically? • Automatically created surrogates are much less expensive than manually created, but have high error rates. • If surrogates have high rates of error, is it possible to have effective information discovery?

  12. Example 5: Informedia Digital Video Library Collections: Segments of video programs, e.g., TV and radio news and documentary broadcasts. Cable Network News, British Open University, WQED television. Segmentation: Automatically broken into short segments of video, such as the individual items in a news broadcast. Size: More than 4,000 hours, 2 terabytes. Objective: Research into automatic methods for organizing and retrieving information from video. Funding: NSF, DARPA, NASA and others. Principal investigator: Howard Wactlar (Carnegie Mellon University).

  13. Informedia Digital Video Library History • Carnegie Mellon has broad research programs in speech recognition, image recognition, and natural language processing. • 1994. Basic mock-up demonstrated the general concept of a system using speech recognition to build an index from a sound track matched against spoken queries. (DARPA funded.) • 1994-1998. Informedia developed the concept of multi-modal information discovery with a series of user interface experiments. (NSF/DARPA/NASA Digital Libraries Initiative.) • 1998 - . Continued research and commercial spin-off (which failed).

  14. The Challenge A video sequence is awkward for information discovery: • Textual methods of information retrieval cannot be applied • Browsing requires the user to view the sequence. Fast skimming is difficult. • Computing requirements are demanding (MPEG-1 requires 1.2 Mbits/sec). Surrogates are required

  15. Multi-Modal Information Discovery • The multi-modal approach to information retrieval • Computer programs to analyze video materials for clues • e.g., changes of scene • methods from artificial intelligence, e.g., speech recognition, natural language processing, image recognition. • analysis of video track, sound track, closed captioning if present, any other information. • Each mode gives imperfect information. Therefore use • many approaches and combine the evidence.
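The evidence-combination step can be sketched as a weighted fusion of per-segment confidence scores from each mode. The modes, weights, and scores below are illustrative assumptions, not Informedia's actual fusion method.

```python
# Sketch of multi-modal evidence combination: each analyser (speech,
# OCR, face detection, ...) gives an imperfect confidence per video
# segment; a weighted sum ranks segments across all modes.

def fuse(scores_by_mode, weights):
    """Return segments ranked by weighted combined score, best first."""
    segments = set().union(*(s.keys() for s in scores_by_mode.values()))
    return sorted(
        ((sum(weights[m] * scores_by_mode[m].get(seg, 0.0)
              for m in scores_by_mode), seg)
         for seg in segments),
        reverse=True)

# Toy scores: the speech track favours seg1, but OCR and face
# evidence together push seg2 ahead.
scores = {
    "speech": {"seg1": 0.9, "seg2": 0.2},
    "ocr":    {"seg1": 0.1, "seg2": 0.8},
    "faces":  {"seg2": 0.9},
}
weights = {"speech": 0.5, "ocr": 0.3, "faces": 0.2}
ranking = fuse(scores, weights)
```

The point of the sketch matches the slide: no single mode is reliable, but their combined evidence can still produce a useful ranking.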

  16. Informedia Library Creation [Diagram: Text, Audio and Video streams pass through Segmentation; Speech recognition, Image extraction and Natural language interpretation then produce Segments with derived metadata.]

  17. Informedia: Information Discovery [Diagram: The User queries via natural language and browses via multimedia surrogates; Requested segments and metadata are returned from the Segments with derived metadata.]

  18. Text Extraction Source • Sound track: Automatic speech recognition using the Sphinx II and III recognition systems. (Unrestricted vocabulary, speaker independent, multi-lingual, background sounds.) Error rates of 25% and up. • Closed captions: Digitally encoded text. (Not on all video; often inaccurate.) • Text on screen: Can be extracted by image recognition and optical character recognition. (Matches speaker with name.) Query • Spoken query: Automatic speech recognition, using the same system as is used to index the sound track. • Typed query: Entered by the user.

  19. Image Understanding Informedia has developed specialized tools for various aspects of image understanding • scene break detection • segmentation • icon selection • image similarity matching • camera motion and object tracking • video-OCR (recognize text on screen) • face detection and association

  20. Multimodal Metadata Extraction

  21. An Evaluation Experiment Test corpus: • 602 news stories from CNN, etc. Average length 672 words. • Manually transcribed to obtain accurate text. • Speech recognition using Sphinx II (50.7% error rate). • Errors introduced artificially to give error rates from 0% to 80%. • Relative precision and recall (using a vector ranking) were used as measures of retrieval performance. As the word error rate increased from 0% to 50%: • Relative precision fell from 80% to 65% • Relative recall fell from 90% to 80%
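The artificial error-injection step can be sketched as random word substitution at a controlled rate, so that retrieval can be measured across a range of word error rates. The corruption scheme and vocabulary below are illustrative assumptions, not the experiment's actual procedure.

```python
import random

# Sketch of controlled error injection: corrupt a manual transcript by
# replacing each word, with probability `rate`, by a random word from
# a noise vocabulary. A seeded RNG makes runs repeatable.

def corrupt(words, rate, vocab, rng):
    """Return a copy of `words` with ~rate of them substituted."""
    return [rng.choice(vocab) if rng.random() < rate else w
            for w in words]

rng = random.Random(0)
story = "the city council approved the new harbour plan".split()
noisy = corrupt(story, 0.5, ["noise", "error", "word"], rng)
error_rate = sum(a != b for a, b in zip(story, noisy)) / len(story)
```

Running retrieval over corpora corrupted at 0%, 10%, ..., 80% and plotting relative precision and recall against the injected rate reproduces the shape of the measurement the slide reports.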

  22. Speech recognition and retrieval performance

  23. User Interface Concepts Users need a variety of ways to search and browse, depending on the task being carried out and their preferred style of working • Visual icons • One-line headlines • Film strip views • Video skims • Transcript following of audio track • Collages • Semantic zooming • Results set • Named faces • Skimming

  24. Thumbnails, Filmstrips and Video Skims Thumbnail: • A single image that illustrates the content of a video Filmstrip: • A sequence of thumbnails that illustrate the flow of a video segment Video skim: • A short video that summarizes the contents of a longer sequence, by combining shorter sequences of video and sound that provide an overview of the full sequence

  25. Creating a Filmstrip Separate video sequence into shots • Use techniques from image recognition to identify dramatic changes in scene. Frames with similar color characteristics are assumed to be part of a single shot. Choose a sample frame • Default is to select the middle frame from the shot. • If camera motion, select frame where motion ends. User feedback: • Frames are tied to time sequence.
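The shot-separation heuristic above can be sketched as a colour-histogram comparison: frames whose histograms differ sharply from the previous frame start a new shot. The threshold, histogram parameters, and toy "frames" below are assumptions for illustration, not Informedia's actual detector.

```python
import numpy as np

# Sketch of shot-break detection by colour similarity: frames with
# similar colour histograms are assumed to belong to one shot; a large
# histogram change marks a break.

def histogram(frame, bins=8):
    """Normalized intensity histogram of one frame (toy greyscale)."""
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def shot_breaks(frames, threshold=0.5):
    """Indices where a new shot begins (L1 histogram distance test)."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(frames))
            if np.abs(hists[i] - hists[i - 1]).sum() > threshold]

dark = np.full((4, 4), 10)     # stand-ins for frames of one dark shot
light = np.full((4, 4), 240)   # a very different, bright scene
frames = [dark, dark, light, light]
breaks = shot_breaks(frames)   # the scene changes at index 2
```

Given the detected breaks, the default filmstrip frame is the middle frame of each shot, exactly as the slide describes.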

  26. Creating Video Skims Static: • Precomputed based on video and audio phrases • Fixed compression, e.g., one minute skim of 10 minute sequence Dynamic: • After a query, skim is created to emphasize context of the hit • Variable compression selected by user • Adjustable during playback

  27. Limits to Scalability Informedia has demonstrated effective information discovery with moderately large collections Problems with increased scale: • Technical -- storage, bandwidth, etc. • Diversity of content -- difficult to tune heuristics • User interfaces -- complexity of browsing grows with scale

  28. Lessons Learned • Searching and browsing must be considered integrated parts of a single information discovery process. • Data (content and metadata), computing systems (e.g., search engines), and user interfaces must be designed together. • Multi-modal methods compensate for incomplete or error-prone data.
