A Self-organizing Semantic Map for Information Retrieval

A Self-organizing Semantic Map for Information Retrieval Xia Lin, Dagobert Soergel, Gary Marchioninipresented by Yi-Ting

Outline • Introduction • Kohonen’s feature map • A Self-Organizing Semantic Map for AI Literature • A prototype system • Future research

Introduction • Discoveries of new learning algorithms in artificial neural networks. • An application of unsupervised learning method,Kohonen’s feature map algorithm. • Doyle’s classic article “Semantic road maps for literature searchers”.(1961) • Three most distinguishing characteristics of Kohonen’s feature map： • frequencies and distributions of underlying input data. • Understanding of the computer’s role. • Projection.

Kohonen’s feature map • A unsupervised learning methoh. • A self-organizing learning algorithm which presumably produces feature maps similar to those occurring in the brain. • Each Input object is represented by a N-dimensional vector, called a feature. • Maps input objects onto of a two-dimensional grid.

Kohonen’s feature map

Kohonen’s feature map • Each node in the grid is assigned a N-dimensional vector. • Learning process： • Select an input vector randomly. • Find the node whose weights closest to the input vector. • Adjust the weights of the winning node. • Adjust the weights of the nodes close to the winning node.

Kohonen’s feature map • A projection of the input space onto the two-dimensional grid. • Two properties of such a feature map： • The feature map preserves the distance relationships between the input data as faithfully as possible. • The feature map allocates different numbers of nodes to inputs based on their occurrence frequencies.

Kohonen’s feature map • Two basic procedure：selecting a winning node、updating weights. • The winning node is selected based on the Euclidian distance.Let X(t)={X1(t),X2(t),…..,XN(t)} be the input vector selected at time t be weights for node k at time t. the winning node s is selected so that：

Kohonen’s feature map • weights of s and the weights of the node in a defined neighborhood are adjusted by where is a gain term ( ) that decreases in time and converges to 0. • two control mechanisms are imposed： • To shrink the neighborhood of a node gradually over time. • Adaptive gain parameter

Kohonen’s feature map • gain parameter : • Kohonen initially defined it over geographic neighborhoods. • A more recent version adapts the Gaussian function. • The Gaussian function is supposed to describe a more natural mapping so as to help the algorithm converge in a more stable manner.

A Self-Organizing Semantic Map for AI Literature • Kononen’s self-organizing map algorithm is applied to a collection of documents from the LISA database. • All 140 titles indexed by the descriptor “Artificial Intelligence”. And 25 words were retained. • A document vector contains 1 in a given column if the corresponding word occurs in the document title and 0 otherwise. • The document vectors are used as input to train a feature map of 25 features and 140 nodes.

A Self-Organizing Semantic Map for AI Literature

A Self-Organizing Semantic Map for AI Literature • Fig.3. is the semantic map of documents obtained after 2500 training cycles. • The map contains very rich information： • These numbers collectively reveal the distribution of documents on the map. • The map is divided into concept areas.

A Self-Organizing Semantic Map for AI Literature • The areas to which a node belongs is determined as follows： • Way 1- compare the node to every unit vector and assign to the node the unit vector. • Way 2- compare each init vector to every node and label the winning node with the word corresponding to the unit. • When two words fall into the same area, the areas are merged.

A Self-Organizing Semantic Map for AI Literature • The size of the areas corresponds to the frequencies of occurrence of the words. • The relationship between frequencies of word co-occurrence and neighbor areas is also apparent. • Other properties of the map are particularly related to the nature of the document space. • The final weight patterns could be a good associative indexing scheme for the collection. • Related to descriptors in a thesaurus.

A Self-Organizing Semantic Map for AI Literature • Comparing the self-organizing semantic map with Doyle’s semantic road map. • Doyle’s map was basically a demonstration, rather than a practical implementation. • The self-organizing map reveals the frequencies and distributions of underlying data. • The self-organizing map allows much flesxibility.

A prototype system • A prototype system in HyperCard has been designed using a semantic map as an interface. • The system is designed to accept downloaded bibliographical data and automatically add links among the data for easy browsing.

The first card of the system contains a self-organizing semantic map

These selections will cause the system to display titles associated with the selected nodes

To see the full record

A prototype system • The design goal is to conceptualized an information retrieval approach which uses traditional search techniques as information filters and the semantic map as a browsing aid. • To enhance post-processing the retrieved set. • Provided the user an environment by the simple, high recall-oriented.

Future research • Several promising features： • It maps a high dimension document space to a two-dimensional map while maintaining as faithfully as possible document inter-relationships. • It visualizes the underlying structure of a documents space. • It results in an indexing scheme that show the potential to compensate for incompleteness of a simple document representation.

Future research • Dimension reduction： • Latent semantic indexing (LSI). • What are the dimensions for the document space? • Automated generation of dimensions from word stem frequencies as appearing in titles. • The document space can be defined assigned descriptors as dimensions. (descriptor assignment) • The extended vector model to define a document space.

Future research • Information visualization： • Crouch (1986) provided an overview of the graphical techniques used for visual display for IR. • Clustering, multidimensional scaling, iconic representation, and his own component scale drawing. • Most of these techniques were not appropriate in an interactive environment. • The self-organizing semantic map can be considered as another technique for visual display of information for IR.

Future research • The human-generated semantic map： • The properties inherited from Kohonen’s feature map algorithm are promising for the purpose of IR. • But it is not clear yet what properties we may have if we apply some other learning algorithms to produce the semantic map. • Desire to have a semantic map that is independent of any algorithms.

Future research • A further comparison of the maps generated by the algorithm and by people is to use the maps in a retrieval setting. • Through a series of investigations, they should better understand the construction and the properties of the self-organizing semantic map.

A Self-organizing Semantic Map for Information Retrieval

A Self-organizing Semantic Map for Information Retrieval

Presentation Transcript

Information Retrieval and the Semantic Web

Information Visualization with Self-Organizing Maps

Self-Organizing Map (SOM)

Self-organizing map (SOM)

Building a Semantic Map

Keeping Found Things Found: Organizing Information For Retrieval

OPTIMIZING THE SELF-ORGANIZING MAP FOR PATTERN AND IMAGE CLASSIFICATION

ConSOM: A conceptional self-organizing map model for text clustering

Visualization and Navigation of Document Information Spaces Using a Self-Organizing Map

Self-organizing map for symbolic data

A principal components analysis self-organizing map

Self-Organizing Map (SOM) = Kohonen Map

Self-organizing map

ARTIFICIAL NEURAL NETWORKS – Self-Organizing Map (SOM)

Towards the Self-Organizing Feature Map

Information Visualization with Self-Organizing Maps

A self-organizing map for adaptive processing of structured data

Self-organizing map for cluster analysis of a breast cancer database

The Self-Organizing Map and Applications

iCluster: a Self-Organising Overlay Network for P2P Information Retrieval