Jean-Charles LAMIREL, Jieh HSIANG Liu WJ

Using a Background Neural Model in a Digital Library Jean-Charles LAMIREL, Jieh HSIANG Liu WJ LORIA, Nancy, France

The CORTEX team Research areas: Biologically inspired models for intelligent information management Applications: • Autonomous robotics and on-board intelligence • Numerical (vs. symbolic) classification • Information retrieval and discovery

The CORTEX information retrieval and discovery activity • Main themes of research • Interface for personalized access to information • Intelligent multimedia data mining • Web - documentary database interaction • Collaborations • ORPAILLEUR INRIA team, INIST, La Villette, NSC Taiwan, industry... • European projects: SCHOLNET, EISCTES

Some examples of application • Adaptive environment for assistance to investigation on the Web • Multi-topographic navigation MultiSOM • For multimedia data mining • For data mining on full text (patents) • Numerical-symbolic collaboration

Presentation summary • Introduction: • Basic set of functionalities for information discovery • Limitations of the classical methods for information discovery • The MultiSOM model + Butterfly application: • Basic behaviour • Extensions • Management of textual information lamirel@loria.fr

Basic set of functionalities for information discovery • Synthetic view of the studied domain = • Distribution of the thematic indicators of the domain • Highlighting of regularities / weak signals • Management of several types of synthesis • Interactivity = • Dynamic data mixture / type of need • Choice of meta-orientation of investigation • Setting of the granularity level of the analysis • Multimedia

Managing different kinds of queries for discovery • Exploratory (no goal): « What are the contents of the database? » • Thematic (general orientation): « Images of space conquest » • Connotative (hidden goal, indirect research): « Impressive images of human technology » • Precise: « Images of Armstrong's moonwalk, July 1969 »

Limitations of the classical methods for information discovery • Overall view of the studied domain = • Noise • Complex interpretation (hidden information) • Local views necessarily independent • Weak signals difficult to highlight • No interactivity = • Passive classification • Predefined ways to access information

Neural methods for information cartography • Topographic learning (SOM) = • classification • projection • Multi-viewpoint modelling capabilities (MultiSOM) • Intuitive self-organization of information • Active maps (IR + Navigation) • Low human intervention during construction • Multimedia capabilities

Butterfly museum application • Different kinds of query • Query by keywords • Query by example • Different kinds of criteria • Colour (automatic) • Shape (manual) • Texture (manual) • Problems • Hand-made classifications • Combination of results coming from different criteria • Example description: Yellow = very strong, Red = not, Edge = strong, Spot = middle, …

Butterfly application automation [Diagram labels: query by keywords; query by example; adding new individuals; user interface; viewpoint classifications; global and/or cross-viewpoint classifications; combination of results; validation of insertion or classification recalculation]

Basic topographic map building • Data description: • Document (image) = index vector, e.g. vector of characteristics • Weighting of the characteristics' modalities (very strong = 1, …) • Optional IDF weighting (weak-signal detection) [Figure labels: weighted description; IDF; texture]
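
The data description step can be illustrated with a minimal sketch. It assumes each image is stored as a mapping from characteristic to modality label. Only "very strong = 1" comes from the slide; the other numeric weights, the function names and the log(N / df) form of the IDF are illustrative assumptions, not the authors' exact settings.

```python
import math

# Only "very strong" = 1 is given on the slide; the other values are assumed.
MODALITY_WEIGHT = {"very strong": 1.0, "strong": 0.75, "middle": 0.5, "not": 0.0}

def index_vector(description, characteristics):
    """Turn {characteristic: modality} into a weighted index vector."""
    return [MODALITY_WEIGHT[description.get(c, "not")] for c in characteristics]

def idf_weights(vectors):
    """Assumed classical IDF, log(N / df): rare characteristics weigh more."""
    n = len(vectors)
    weights = []
    for j in range(len(vectors[0])):
        df = sum(1 for v in vectors if v[j] > 0)  # images showing characteristic j
        weights.append(math.log(n / df) if df else 0.0)
    return weights

def apply_idf(vectors):
    """Multiply every component by the IDF weight of its characteristic."""
    w = idf_weights(vectors)
    return [[x * wj for x, wj in zip(v, w)] for v in vectors]

characteristics = ["yellow", "red", "edge", "spot"]
images = [
    {"yellow": "very strong", "edge": "strong", "spot": "middle"},
    {"yellow": "strong", "red": "middle"},
    {"yellow": "middle", "spot": "very strong"},
]
vectors = [index_vector(d, characteristics) for d in images]
weighted = apply_idf(vectors)
```

In this toy collection every image is yellow, so the IDF weight of "yellow" drops to zero, while the rarer "red" and "edge" characteristics are emphasized. That is the weak-signal effect the optional weighting is meant to produce.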

Basic topographic map building • Map predefined parameters settings: • Number of neurons • Structure : eg 2D grid with square neighbourhood • Competitive learning:

Competitive learning • Current data (image) at time T • Selection of the winning neuron • Influence on the neighbourhood
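
The three stages of competitive learning can be sketched as follows, using the standard Kohonen update on a 2D grid with a square neighbourhood. The function names, the fixed learning rate and the fixed radius are assumptions made for illustration; in practice both parameters usually decrease during training.

```python
import random

def make_map(rows, cols, dim, seed=0):
    """2D grid of neurons, each holding a random profile of length dim."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]

def winner(som, x):
    """Grid position of the neuron whose profile is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(som):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_step(som, x, lr=0.5, radius=1):
    """Move the winner and its square neighbourhood towards x."""
    wr, wc = winner(som, x)
    for r, row in enumerate(som):
        for c, w in enumerate(row):
            # Square neighbourhood: Chebyshev distance on the grid.
            if max(abs(r - wr), abs(c - wc)) <= radius:
                for k in range(len(w)):
                    w[k] += lr * (x[k] - w[k])
    return wr, wc

som = make_map(4, 4, 3)
position = train_step(som, [1.0, 0.0, 0.5])  # one image presented at time T
```

Repeating `train_step` over all images, for several passes, yields the basic topographic map.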

Map labelization and zoning • Map labelization: • Based on the best components of the profiles • Class or member-oriented • One single method is not sufficient => Gives an overview of the detected themes • Map zoning: • Based on the SOM topographic properties • Based on the best components of the class profiles => Gives an overview of the weights of the themes
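
A minimal sketch of class-oriented labelization, assuming each neuron profile is a list of weights aligned with a list of characteristic names. The function names and the simple count of neurons per dominant label are assumptions; the count is only a rough stand-in for zoning, since it ignores whether the neurons are contiguous on the map.

```python
from collections import Counter

def label_neuron(profile, characteristics, top_k=2):
    """Names of the top_k strongest (non-zero) components of a profile."""
    ranked = sorted(zip(profile, characteristics), key=lambda p: -p[0])
    return [name for weight, name in ranked[:top_k] if weight > 0]

def theme_weights(som, characteristics):
    """Number of neurons whose dominant component is each characteristic."""
    counts = Counter()
    for row in som:
        for profile in row:
            labels = label_neuron(profile, characteristics, top_k=1)
            if labels:
                counts[labels[0]] += 1
    return counts

characteristics = ["yellow", "red", "edge", "spot"]
som = [
    [[0.9, 0.0, 0.6, 0.1], [0.8, 0.1, 0.0, 0.2]],
    [[0.0, 0.7, 0.1, 0.0], [0.0, 0.0, 0.0, 0.0]],
]
labels = label_neuron(som[0][0], characteristics)
weights = theme_weights(som, characteristics)
```

A member-oriented variant would rank the components of the images assigned to a neuron rather than the neuron's own profile, which is one reason a single method is not sufficient.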

Multimedia thematic cartography of « Butterfly » [Figure labels: color viewpoint; theme « Yellow »; theme « Green »; central sub. image description; list of theme members]

The MultiSOM model [Figure labels: Viewpoint 1; Viewpoint 2; basic map (core classification); on-line generalizations]

Map on-line generalization • Goal: • Synthesize the map contents by decreasing the number of neurons (classes) • Constraints: • Preserve the map topographic properties • No classification re-computation • Method: • Exploitation of the neighbourhood relations on the map
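
One way to exploit neighbourhood relations without re-running the classification is sketched below. It assumes that each neuron of the generalized map takes the average profile of a 2 x 2 block of neighbouring neurons of the original map, so an M x N grid becomes an (M-1) x (N-1) grid. The block size and the averaging rule are assumptions; the slide does not give the exact procedure.

```python
def generalize(som):
    """Build a coarser map by averaging each 2 x 2 block of neighbours."""
    rows, cols = len(som), len(som[0])
    dim = len(som[0][0])
    coarse = []
    for r in range(rows - 1):
        new_row = []
        for c in range(cols - 1):
            block = [som[r][c], som[r][c + 1], som[r + 1][c], som[r + 1][c + 1]]
            new_row.append([sum(w[k] for w in block) / 4 for k in range(dim)])
        coarse.append(new_row)
    return coarse

# Toy 3 x 3 map with one-dimensional profiles 0..8.
som = [[[float(3 * r + c)] for c in range(3)] for r in range(3)]
coarse = generalize(som)
```

Because each new neuron is built only from adjacent neurons, neighbours on the coarse map remain neighbours of the same region of the original map, and no learning step is repeated.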

Map on-line generalization

Semantic viewpoints • Subspace of the description space • Can be a field, a subset of keywords, ... • Possible overlapping sets • Concurrent or complementary viewpoints => Examples: indexer keywords, title keywords, authors, …, visual characteristics, sounds => Butterflies: color, shape, texture, …
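
Since a viewpoint is a subspace of the description space, it can be sketched as a named subset of characteristics onto which every description is projected. The viewpoint contents and the names below are hypothetical examples built from the butterfly criteria.

```python
# Hypothetical viewpoints for the butterfly collection; sets may overlap.
VIEWPOINTS = {
    "color": ["yellow", "red"],
    "texture": ["edge", "spot", "vein"],
}

def project(description, viewpoint):
    """Restrict a {characteristic: weight} description to one viewpoint."""
    return [description.get(name, 0.0) for name in VIEWPOINTS[viewpoint]]

butterfly = {"yellow": 1.0, "spot": 0.5}
color_vector = project(butterfly, "color")
texture_vector = project(butterfly, "texture")
```

Each projected vector would then feed the map dedicated to that viewpoint, so one image is classified once per viewpoint.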

Inter-map communication • Goal: • Cope with the limitations of a global map • Allow communication between viewpoints • Constraints: • Interpretable behaviour • Method: • Re-projected data = transmitter neurons • Two steps: 1) Activation of a source map (directly or through a query) 2) Transmission to target maps

Inter-map communication

Inter-map communication • A function: • Two modes: • Possibilistic (weak thematic relations across viewpoints) • Probabilistic (measure of the similarity between themes) => g = class belonging degree
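
The transmission function itself is not reproduced in this text, so the following is only a sketch under stated assumptions. Each data item belongs to exactly one neuron on every map, g is taken as the activity of the source neuron the item belongs to, the possibilistic mode keeps the maximum of g over the items of a target neuron, and the probabilistic mode takes the mean. All names are hypothetical.

```python
def transmit(source_activity, source_class, target_class, mode="possibilistic"):
    """Propagate activity from a source map to a target map through shared data.

    source_activity: {source_neuron: activity}
    source_class, target_class: {data_id: neuron} on each map
    """
    received = {}
    for data_id, target_neuron in target_class.items():
        g = source_activity.get(source_class[data_id], 0.0)  # assumed form of g
        received.setdefault(target_neuron, []).append(g)
    if mode == "possibilistic":
        return {t: max(gs) for t, gs in received.items()}
    return {t: sum(gs) / len(gs) for t, gs in received.items()}

# Toy example: the "yellow" class of the color map is activated.
color_class = {"b1": "yellow", "b2": "yellow", "b3": "red"}
texture_class = {"b1": "spots", "b2": "edges", "b3": "edges"}
activity = {"yellow": 1.0}

weak = transmit(activity, color_class, texture_class, "possibilistic")
strict = transmit(activity, color_class, texture_class, "probabilistic")
```

With these assumptions the possibilistic mode lights up every texture class that shares at least one butterfly with the yellow class, while the probabilistic mode grades each class by the share of its members that are yellow.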

Activity coherency [Figure: two panels, strong focalization and weak focalization]

Inter-map communication: butterflies • Question: Are there regularities in the textures of yellow butterflies? • Response: YES, spots and edges [Figure labels: color map; texture map]

Compliance with IR operations • Question: Are there butterflies with spots AND veins? [Figure: two map states, Response = YES and Response = NO]

Remaining problems (to be solved) • Validation of the automatic classification results by the experts • Testing of different result-merging methods • Testing the use of prototype features in classification • Realization of a Web interface for the maps • Comparing the maps' built-in result combination mechanism with an external combination mechanism • Testing the maps' capabilities for helping to add new individuals • Introducing textual data and combining it with images

Use of color prototypes [Figure labels: theme « Yellow »; yellow color viewpoint]

Experimentation on patents (texts) Goal: Intelligent technological survey = Full-text analysis of the patents • Domain of oil engineering • Provide answers to questions like: 1. “What are the relationships between patentees?” 2. “On which specific technology does a patentee work? What are the advantages of this specific technology? For which use?”

Basic experimental protocol [Diagram labels: patents database; DILIB reformatting; patents in XML format, structured by viewpoints; viewpoints definition; nominal groups extraction; validated multi-indexes; MicroNOMAD MultiSOM; interactive maps for analysis]

Nominal groups extraction 1) Lexicographic analysis (compound terms) 2) Normalization, e.g. “oil fabrication” and “oil engineering” => “oil engineering” • Results:

Patents reindexing Selected Viewpoints: title, use, advantages and patentees

Example of dynamic analysis [Figure labels: Title (Components); Use; Patentees; Advantages] DYNAMIC DEDUCTION: Patentee « TONEN CORP. » is a specialist in the lubrication of the « automatic transmission ». It mainly produces oils based on « organo-molybdenum compound », which have the specific property of a « friction coefficient stable on a wide range of temperature »

Classical methods (AK-means) [Figure labels: classes map; class description; annotation: hidden link!]

Conclusion • Different viewpoints yield complementary results: • Ex: Indexer keywords = Closed themes, Title keywords = Open themes, ... • Detection of indexing inconsistencies • Projection of the thematic pertinence of a query • Bilateral synergy: images <=> textual information • Very rich and flexible inter-map communication mechanism: • Cross analysis between viewpoints, dynamics • No limitation regarding the type and number of viewpoints

Perspectives • Sophisticated 2D mapping, 3D mapping • Pure image mosaic navigation • Automation of communication between viewpoints • Interaction with Galois lattice: map zoning and generalization, rule mapping, lattice entry points selection • Applications: • La Villette: interactive browsing through museum collection, setting up of exhibitions • INIST: Cartography of the Web (EISCTES EEC Project)

3) Combining Symbolic and Numeric Techniques for DL Contents Classification and Analysis Jean-Charles LAMIREL, Yannick TOUSSAINT (Orpailleur)

Introduction • Combining numerical and symbolic methods: • MicroNOMAD Self Organizing Maps (SOM) • Basic SOM topographic properties • MicroNOMAD multi-map communication process • Lattice • Formal properties and symbolic deduction • Hierarchical structure and inheritance of properties • Study of projection of SOM over lattice • Making explicit formal properties on the map • Map intelligent zoning and labelization

Galois lattice • Symbolic hierarchical method: ({i1, i2}, {p1, p2, p3}) • Partial order defined by the subsumption relation over the set of formal concepts: $(I_1, P_1) \le (I_2, P_2) \Leftrightarrow I_1 \subseteq I_2$, $(I_1, P_1) \le (I_2, P_2) \Leftrightarrow P_1 \supseteq P_2$, $\forall\, I_1, I_2$ there is a unique meet and join. • Inheritance of properties • Extraction of association rules: Search Engine $\rightarrow$ {Web, IR}

I = {i1, i2, i3, i4}, P = {AI, Robots, Search Engine, Web, IR} • i1 = {Web, IR} • i2 = {Web, IR} • i3 = {Web, IR, Search Engine} • i4 = {AI, Robots} Formal concepts: • ({i1, i2, i3, i4}, $\emptyset$) • ({i1, i2, i3}, {Web, IR}) • ({i4}, {AI, Robots}) • ({i3}, {Search Engine, Web, IR}) • ($\emptyset$, {AI, Robots, Search Engine, Web, IR}) Rule: R1 = Search Engine $\rightarrow$ {Web, IR}
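
The formal concepts and the association rule of this small context can be checked with a brute-force sketch. It closes every subset of items, which is only practical for toy contexts; the function names are illustrative and do not come from the presentation.

```python
from itertools import combinations

# Context taken from the slide: item -> set of properties.
CONTEXT = {
    "i1": {"Web", "IR"},
    "i2": {"Web", "IR"},
    "i3": {"Web", "IR", "Search Engine"},
    "i4": {"AI", "Robots"},
}
ALL_PROPS = set().union(*CONTEXT.values())

def intent(items):
    """Properties shared by every item (all properties for the empty set)."""
    props = set(ALL_PROPS)
    for i in items:
        props &= CONTEXT[i]
    return props

def extent(props):
    """Items that have every property in props."""
    return {i for i, p in CONTEXT.items() if props <= p}

def concepts():
    """All (extent, intent) pairs, found by closing every subset of items."""
    found = set()
    names = sorted(CONTEXT)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            p = intent(subset)
            found.add((frozenset(extent(p)), frozenset(p)))
    return found

def rule_holds(premise, conclusion):
    """premise -> conclusion holds if every item with premise has conclusion."""
    return all(conclusion <= CONTEXT[i] for i in extent(premise))

lattice = concepts()
r1 = rule_holds({"Search Engine"}, {"Web", "IR"})
```

For this context `rule_holds({"Search Engine"}, {"Web", "IR"})` evaluates to `True`, which corresponds to rule R1, while the converse rule from {Web} to {Search Engine} does not hold because i1 and i2 lack that property.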

Complementarity of approaches • Kohonen SOM • Complex weighting scheme • Difficulty for precise interpretation • Good illustrative power (topographic structure) • Good synthesis capabilities • Non linearity • Lattice • High number of classes • Memory and time consuming • Hierarchical structure • Rule extraction • Incrementality

3-step methodology • Projection • Grouping • Agglomeration

Conclusion • The cosine method appears to be the best of those tested • Good accuracy • Well-balanced agglomeration • Agglomeration preserves closed areas on SOM • Other projection and agglomeration methods have to be tested • Preservation of partial order and inheritance

Perspectives • Evaluation on a large corpus, with experts • Rule management • Class quality evaluation • Class labelization • Deduction validation on communicating maps (lattice extensions) • Implementation of an operational prototype

Other approaches • Multi-classifier cooperation (PhD) • SVM • Stigmergy • Genetic • Neural maps • On-line learning of the user's behaviour, intelligent relevance feedback

Annexes • Topographic inconsistencies • Area computation • Inter-map communication • Activity coherency

Topographic inconsistencies [Figure: three panels showing no, weak and strong inconsistencies]

Topographic inconsistencies [Figure labels: global; strong; neuron neighbourhood]

Area computation [Algorithm figure; only the keywords WHILE, SO AS, IN, DO, END DO survive]
