Alternative Retrieval Techniques: What do you do when the documents are not English text?
Lecture Objectives • Understand the problems of dealing with multimedia data • Know some solutions to those problems, especially for image data
Multimedia “documents” • Text • Hypertext • Still Image • Video • Sound • Graphics • Hypermedia
Metadata • Data about data (documents) • Compare a Library Catalogue • author names, titles, subject, keywords • useful to search metadata rather than documents
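Searching metadata rather than the documents themselves can be sketched as below. This is a minimal illustration, not a real catalogue system: the `CatalogueRecord` type, the `search_metadata` helper and the two sample records are all hypothetical, and matching is plain case-insensitive substring search.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueRecord:
    """Hypothetical catalogue entry: data about a document, not the document itself."""
    doc_id: str
    author: str
    title: str
    subject: str
    keywords: list = field(default_factory=list)


def search_metadata(records, term):
    """Return the ids of records whose metadata mentions the term (case-insensitive)."""
    term = term.lower()
    hits = []
    for record in records:
        searchable = [record.author, record.title, record.subject, *record.keywords]
        if any(term in value.lower() for value in searchable):
            hits.append(record.doc_id)
    return hits


# Invented sample records, for illustration only.
catalogue = [
    CatalogueRecord("img-001", "A. Smith", "Orchard at harvest", "Agriculture", ["apples", "fruit"]),
    CatalogueRecord("img-002", "B. Jones", "Stadium crowd", "Sport", ["gridiron football"]),
]

apple_hits = search_metadata(catalogue, "apples")
sport_hits = search_metadata(catalogue, "sport")
```

The image content is never inspected here: retrieval quality depends entirely on what the cataloguer wrote down.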
Conventional Approaches to General Image Retrieval • Manual tagging of images with text descriptions • Costly, not very practical • Rich content of images means that they may be indexed in very different ways
Problems of Image Indexing • What is the concept of an image? • Easy to extract low level features from images • colour, texture, shape • Hard to get from this to “Apples” let alone “Gridiron Football” • A picture may be interpreted in lots of ways
Text Indexing • The words (terms) used indicate the concept/topic of the text • Most texts are fairly unambiguous
Querying Image vs Text • Match query text vs extracted index from document • Generally users don’t have an example image • SCUD launchers • Which interpretation of the image are the users interested in?
Solutions • Manual Indexing • Index by related text • Lay out images according to low level features and provide a browsing interface • Query by low level features • Combinations
Example Low Level Feature Global Colour Histograms
Global Colour Indexing • Identify a number of buckets in which to sort the available colours (e.g. red, green and blue, or up to ten or so colours) • Allocate each pixel in an image to a bucket and count the number of pixels in each bucket • Use the figures produced (bucket id plus count, normalised for image size and resolution) as the index key for each image
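The three steps above can be sketched as follows. This is one possible reading, under stated assumptions: the bucket centres in `BUCKETS`, the nearest-centre assignment rule and the function names are illustrative choices, not taken from the lecture, and an image is represented simply as a list of RGB tuples.

```python
from collections import Counter

# Assumed bucket centres in RGB; the lecture only asks for "a number of buckets".
BUCKETS = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}


def nearest_bucket(pixel):
    """Allocate a pixel to the bucket with the closest centre (squared RGB distance)."""
    return min(
        BUCKETS,
        key=lambda name: sum((p - c) ** 2 for p, c in zip(pixel, BUCKETS[name])),
    )


def global_colour_histogram(pixels):
    """Return bucket id -> fraction of pixels, so the key is independent of image size."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    counts = Counter(nearest_bucket(p) for p in pixels)
    return {name: counts[name] / len(pixels) for name in BUCKETS}


# A tiny invented four-pixel "image": two reddish, one greenish, one bluish pixel.
image = [(250, 10, 10), (240, 30, 20), (10, 200, 30), (5, 5, 220)]
index_key = global_colour_histogram(image)
```

Dividing by the pixel count is what makes the key comparable across images of different size and resolution.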
Global Colour Indexing • Advantages • Computationally fairly efficient and tractable at present • Comprehensible to users • Disadvantages • Users would prefer to search with keywords of objects or moods • Apparently similar images can have very different colour histograms, e.g. because of lighting
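To make the lighting problem concrete, here is a hedged sketch of querying by colour histogram. The L1 (sum of absolute differences) distance is an assumed choice, since the lecture does not name a similarity measure, and the three histograms are invented values, not measurements from real images.

```python
def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms (0 means identical)."""
    buckets = set(h1) | set(h2)
    return sum(abs(h1.get(b, 0.0) - h2.get(b, 0.0)) for b in buckets)


def rank_by_colour(query, index):
    """Return image ids ordered from most to least similar to the query histogram."""
    return sorted(index, key=lambda image_id: histogram_distance(query, index[image_id]))


# Invented index: the same apple in daylight and at dusk, plus an unrelated sunset.
index = {
    "apple_daylight": {"red": 0.6, "green": 0.3, "black": 0.1},
    "apple_dusk": {"red": 0.2, "green": 0.1, "black": 0.7},
    "sunset": {"red": 0.6, "green": 0.2, "black": 0.2},
}

query = index["apple_daylight"]
ranking = rank_by_colour(query, index)
```

With these made-up values the unrelated sunset ranks above the dusk photograph of the same apple, which is the disadvantage noted above: colour distribution follows lighting, not subject.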
Video Indexing • Background vs. Foreground • Finding Objects/People • Extracting Key Frames • Identifying Events for subsequent manual indexing
Conclusions • Image Retrieval • hard to identify concept • hard to match/express queries • Global Colour Indexing • Useful, simple • Similar problems for other non-text media