Multi-modal Information Systems. Khurshid Ahmad, Chair in Computer Science, Trinity College Dublin, Ireland
Computation and its neural basis Information processing in the brain is characterised by the concurrent interaction of two or more brain areas: • Picture naming involves the interaction between the vision and speech areas, and each is apparently stimulated when the other modality is present; • Numerosity and its articulation – understanding what a graph of numbers tells us – involves the interaction between the auditory areas and the spatial-attention areas of the brain.
Computation and its neural basis Information processing in the brain relies on a conceptual organisation of the world around us – what there is, or ontology. I have developed a text-based ontology method that has been used successfully for: • Terminology extraction; • Sentiment analysis; • Knowledge management; • Automatic annotation of images and video sequences.
Computation and its neural basis Much of the computing in the brain operates on sporadic, multi-modal data streams; much of modern computing relies on the discrete, serial processing of uni-modal data.
Computation and its neural basis • Adaptive image annotation: I am working with the Trinity Institute of Molecular Medicine to annotate images of animal cells in motion, and with the National Gallery of Ireland to annotate fine-art images with archival material; there is also the possibility of annotating images of containerised goods at ports of entry. • Sentiment analysis: the computation of ‘sentiments’ related to the behaviour of stakeholders from free text, and the time-series correlation of the sentiment with indexes of prices, volumes, and ‘goodwill’. This work is in conjunction with the School of Business and the Irish Stock Exchange.
Computation and its neural basis • I am working on a neural simulation of multi-modal information enhancement and suppression within a self-organising framework; • I am also studying non-stochastic and unstable time series using wavelets and fuzzy logic.
Image and Collateral Texts • A key problem for the evolving semantic web and the creation of large data repositories is the indexing and efficient retrieval of images – both still and moving – and the identification of key objects and events in those images. • The visual features of an image – colour distribution, shapes, edges, texture – under-constrain the image, so an image cannot be described using visual features alone. • Typically, image retrieval systems therefore index images with both visual features and associated keywords, and retrieve them using the two in combination (a minimal retrieval sketch follows).
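The combination of visual features and keywords can be illustrated with a small sketch: each image is indexed by a visual feature vector and a keyword set, and a query is scored against both. The toy index, the similarity measures, and the mixing weight below are illustrative assumptions, not the method of any particular system.

```python
import numpy as np

# Illustrative index: a visual feature vector plus a keyword set per image.
INDEX = {
    "img_001": (np.array([0.1, 0.8, 0.3]), {"cell", "migration", "nucleus"}),
    "img_002": (np.array([0.7, 0.2, 0.9]), {"painting", "hercules", "hydra"}),
}

def cosine(a, b):
    """Cosine similarity between two visual feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def keyword_overlap(query_terms, image_terms):
    """Jaccard overlap between query keywords and image keywords."""
    if not query_terms or not image_terms:
        return 0.0
    return len(query_terms & image_terms) / len(query_terms | image_terms)

def retrieve(query_vector, query_terms, alpha=0.5, top_k=5):
    """Rank images by a weighted blend of visual and keyword similarity.
    alpha is an assumed mixing weight, chosen only for illustration."""
    scored = []
    for name, (features, terms) in INDEX.items():
        score = alpha * cosine(query_vector, features) \
              + (1 - alpha) * keyword_overlap(set(query_terms), terms)
        scored.append((score, name))
    return sorted(scored, reverse=True)[:top_k]

print(retrieve(np.array([0.1, 0.7, 0.2]), ["cell"]))
```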
Introduction to Image Annotation (“Cell” query results) • Why image annotation? Consider a dataset of 20,000 medical images from which one would like to view all those containing cells. • Visual query: a low-level feature or an exemplar image similar enough to the desired images. • Text query: a linguistic description of the context of the image.
Image and Collateral Texts (Diagram.) An image and its collateral texts: closely collateral texts and broadly collateral texts, drawn from the title of the text, references to the figure in the main body of the text, the figure caption, and other texts cited in the paper.
Annotating an Image • Keywords: specialist terms; tags (“folksonomy”); • Descriptions: authoritative/non-authoritative; • Systems of concepts? Classification systems; ontology. Example Flickr tags: Hercules, Hydra, Duplicate Content, Duplicate Content Penalization. Flickr – Photo Sharing: http://www.flickr.com/ Steve Project: http://www.steve.museum – an “experiment in social tagging of art museum collections”.
Image and Collateral Texts Semantic similarity? “Syntactic” similarity? (Image: Gustave Moreau, Hercules and the Lernaean Hydra, c. 1876, Art Institute of Chicago.) A. Jaimes & S. Chang. A conceptual framework for indexing visual information at multiple levels. In IS&T/SPIE Internet Imaging, 2000. Erwin Panofsky. Studies in Iconology. Harper & Row, New York, 1962.
Trinity Multi-modal Systems • Multi-modal Information Systems: to develop a system that learns to segment images, to annotate images with keywords, and to illustrate keywords with images; • A joint feasibility study between Trinity Computer Science and the Trinity Molecular Medicine Laboratory; • The Computer Science team is led by Prof. Khurshid Ahmad and includes Dr Chaoxin Zheng, Dr Jacob Martin and Dr Ann Devitt.
Trinity Multi-modal Systems: a neural computing solution to automatic annotation and illustration • The Trinity system is based on an earlier system that learnt to associate 9,000 keywords with 1,000 images (9 keywords per image on average). • Once trained, the system can retrieve images given keywords, using full and partial matches (see the sketch below). (Figure: query term → matched text → retrieved image.) K. Ahmad, B. Vrusias, and M. Zhu. ‘Visualising an Image Collection?’ In (Eds.) Ebad Banisi et al., Proceedings of the 9th International Conference on Information Visualisation (London, 6-8 July 2005). Los Alamitos: IEEE Computer Society Press. pp. 268-274. (ISBN 0-7695-2397-8).
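Full and partial keyword matching can be illustrated with a minimal sketch. The scoring rule here (fraction of query terms found among an image's keywords) and the toy index are assumptions for illustration, not the matching rule of the published system.

```python
# Minimal sketch of keyword-based image retrieval with full and partial matches.
INDEX = {
    "img_001": {"cell", "migration", "nucleus"},
    "img_002": {"cell", "membrane"},
    "img_003": {"painting", "hercules"},
}

def retrieve_by_keywords(query_terms, full_match_only=False):
    """Return (score, image) pairs; score = fraction of query terms matched."""
    query = {t.lower() for t in query_terms}
    results = []
    for image, keywords in INDEX.items():
        matched = query & keywords
        score = len(matched) / len(query)
        if full_match_only and score < 1.0:
            continue          # full match: every query term must be present
        if score > 0:
            results.append((score, image))
    return sorted(results, reverse=True)

print(retrieve_by_keywords(["cell", "migration"]))          # partial matches allowed
print(retrieve_by_keywords(["cell", "migration"], True))    # full matches only
```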
Trinity Multi-modal Systems: a neural computing solution to automatic annotation and illustration (Figure: query image → matched image → retrieved text.)
Indexing and Annotation • Human annotation is tedious and slow, and cannot cope with the huge volume of images generated by advanced image-acquisition techniques such as high-content screening, used in biological and medical research. • There is a need to automate the process of annotating or indexing images in laboratories, at customs check posts, in art galleries, or on the internet.
How to Annotate Images (training/learning, similarity) • People are trained with knowledge in a specified domain and become experts, so that they can annotate images using their expertise – and a lot of other analysis can be done at the same time. (Cartoon: “Remember! It is a cell” – “Ha, I know this is a cell”.)
Automatic Image Annotation • Training set: this is the basis of most systems. Without a training dataset, it is like asking somebody to do a job without giving any education or training. • Similarity: a new, unseen situation has something in common with the training set. • Learning: exploring the association between images and their associated descriptions (a minimal annotation sketch follows).
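One simple way to combine the three ingredients – a training set, a similarity measure, and learned associations – is nearest-neighbour keyword transfer: an unseen image inherits the keywords of its most similar training images. This is a generic illustration, not the Trinity system's learning rule; the feature vectors and the parameter k are assumptions.

```python
import numpy as np

# Illustrative training set: visual feature vectors paired with keyword annotations.
TRAIN_FEATURES = np.array([[0.1, 0.8, 0.3],
                           [0.2, 0.7, 0.4],
                           [0.9, 0.1, 0.6]])
TRAIN_KEYWORDS = [{"cell", "nucleus"},
                  {"cell", "migration"},
                  {"painting", "hydra"}]

def annotate(query_features, k=2):
    """Transfer keywords from the k most similar training images (Euclidean distance)."""
    distances = np.linalg.norm(TRAIN_FEATURES - query_features, axis=1)
    nearest = np.argsort(distances)[:k]
    keywords = set()
    for idx in nearest:
        keywords |= TRAIN_KEYWORDS[idx]
    return keywords

print(annotate(np.array([0.15, 0.75, 0.35])))   # expected: {'cell', 'nucleus', 'migration'}
```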
词图 CITU (C2) System • What is in the system? • A user-friendly and efficient interface to collect training data; • A modern image-analysis toolbox to process images and extract features for similarity measures, and a text-processing component that extracts linguistic features; • A state-of-the-art cross-modal system, based on neural computing, to learn the associations between image features and textual features; • A database that acts as the communicator between the different modules. (词 = words; 图 = images)
词图 CITU (C2) System • What can the system do? • Automatically analyse images – image segmentation and colour, texture, and shape analysis; • Automatically process the text documents associated with images – frequency analysis and collocations – to extract key terms and key features; • Automatically learn the association between image features and textual features – once trained, the system will automatically generate keywords for images or retrieve images for textual queries (a cross-modal learning sketch follows below). • C. Zheng, K. Ahmad, A. Long, Y. Volkov, A. Davies, D. Kelleher 2007. Hierarchical SOMs: segmentation of cell migration images. International Symposium on Neural Networks, Nanjing, China, June 3-7. • C. Zheng, A. Long, Y. Volkov, A. Davies, D. Kelleher, K. Ahmad 2007. A cross-modal system for cell migration image annotation and retrieval. International Joint Conference on Neural Networks, Orlando, Aug. 11-17. • C. Zheng, D. Kelleher, K. Ahmad 2008. A semi-automatic indexing system for cell migration images. 2008 World Congress of Computational Intelligence, Hong Kong, June 1-6.
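The cited papers use hierarchical SOMs; the sketch below is a single, flat self-organising map trained on concatenated image and text feature vectors, written from scratch so that no external library is assumed. At annotation time only the image half of the vector is used to find the best-matching unit, and the text half of that unit's weights is read off as the predicted keyword profile. The dimensions, grid size, learning rate, and epochs are illustrative assumptions, not the settings of the CITU system.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM, TXT_DIM = 3, 4                   # assumed feature dimensions
GRID = 5                                  # 5 x 5 map, an illustrative size
weights = rng.random((GRID, GRID, IMG_DIM + TXT_DIM))

def best_matching_unit(vector, dims):
    """Find the map node whose weights (restricted to `dims`) are closest to `vector`."""
    dist = np.linalg.norm(weights[..., dims] - vector, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train(samples, epochs=50, lr=0.3, radius=2.0):
    """Standard SOM update on concatenated [image | text] feature vectors."""
    global weights
    coords = np.stack(np.meshgrid(np.arange(GRID), np.arange(GRID), indexing="ij"), axis=-1)
    for epoch in range(epochs):
        decay = np.exp(-epoch / epochs)
        for sample in samples:
            bmu = np.array(best_matching_unit(sample, slice(None)))
            grid_dist = np.linalg.norm(coords - bmu, axis=-1)
            influence = np.exp(-(grid_dist ** 2) / (2 * (radius * decay) ** 2))
            weights += (lr * decay) * influence[..., None] * (sample - weights)

def annotate(image_features):
    """Cross-modal lookup: match on the image half, read the text half of the winner."""
    node = best_matching_unit(image_features, slice(0, IMG_DIM))
    return weights[node][IMG_DIM:]

# Toy training data: two clusters of images with distinct keyword profiles.
samples = np.array([
    [0.1, 0.8, 0.3, 1, 1, 0, 0],     # "cell"-like images, keywords 1 and 2 present
    [0.2, 0.7, 0.4, 1, 1, 0, 0],
    [0.9, 0.1, 0.6, 0, 0, 1, 1],     # "painting"-like images, keywords 3 and 4 present
    [0.8, 0.2, 0.5, 0, 0, 1, 1],
])
train(samples)
print(np.round(annotate(np.array([0.15, 0.75, 0.35])), 2))  # should resemble [1, 1, 0, 0]
```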
Architecture of CITU (C2) (Architecture diagram.) Manual annotation supplies image content and free text to the database. The image-analysis branch performs image pre-processing, image segmentation, and feature extraction to produce image features; the language-processing branch performs frequency analysis, collocation analysis, and feature extraction on the free text to produce linguistic features. The database holds the image and linguistic features, and the cross-modal learning module learns the cross-modal associations between them.
Automatic Image Annotation (Pipeline diagram.) The user supplies image content; image analysis (pre-processing, segmentation, feature extraction) produces an image feature, which is passed via the database to the cross-modal learning module, which returns the associated linguistic feature.
Image Retrieval (Pipeline diagram.) The user supplies a free-text query; language processing (frequency analysis, collocations, feature extraction) produces a linguistic feature, which is passed via the database to the cross-modal learning module, which returns the associated image feature and hence the matching images.
Image Feature Extraction • Multiscale analysis: a wavelet transform is used to decompose the image into different scales. • Moment extraction: Zernike moments are extracted from each scale as features and passed to the database (see the sketch below). Notation: I, image; A, approximation signal; H, horizontal signal; V, vertical signal; D, diagonal signal.
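A minimal sketch of this pipeline, assuming the PyWavelets and mahotas libraries and a Haar wavelet; the actual wavelet family, decomposition depth, and Zernike radius/degree used in the original system are not stated here, so those choices are illustrative.

```python
import numpy as np
import pywt                      # PyWavelets: 2-D discrete wavelet transform
import mahotas                   # mahotas: Zernike moment features

def extract_features(image, levels=2, radius=16, degree=8):
    """Decompose an image with a 2-D wavelet transform and compute Zernike
    moments of each sub-band (A, H, V, D) at every scale.
    The Haar wavelet, number of levels, and Zernike radius/degree are
    illustrative choices, not the settings of the original system."""
    features = []
    approx = np.asarray(image, dtype=float)
    for _ in range(levels):
        approx, (horizontal, vertical, diagonal) = pywt.dwt2(approx, "haar")
        for band in (approx, horizontal, vertical, diagonal):
            moments = mahotas.features.zernike_moments(band, radius, degree=degree)
            features.extend(moments)
    return np.array(features)    # feature vector to be stored in the database

# Usage on a toy 64x64 "image"
toy_image = np.random.default_rng(0).random((64, 64))
print(extract_features(toy_image).shape)
```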