
Jean-Charles LAMIREL, Jieh HSIANG Liu WJ

Using a Background Neural Model in a Digital Library. Jean-Charles LAMIREL, Jieh HSIANG Liu WJ. LORIA, Nancy, France. The CORTEX team. Research areas: biological-like models for intelligent information management. Applications: autonomous robotics and on-board intelligence.


Presentation Transcript


  1. Using a Background Neural Model in a Digital Library Jean-Charles LAMIREL, Jieh HSIANG Liu WJ LORIA, Nancy, France

  2. The CORTEX team • Research areas: Biological-like models for intelligent information management • Applications: • Autonomous robotics and on-board intelligence • Numerical classification (vs. symbolic) • Information retrieval and discovery

  3. The CORTEX information retrieval and discovery activity • Main themes of research Interface for personalized access to information Intelligent multimedia data mining Web - Documentary database interaction • Collaborations • ORPAILLEUR INRIA team, INIST, LaVillette, NSC Taiwan, industry... • European projects: SCHOLNET, EISCTES

  4. Some examples of application • Adaptive environment for assistance to investigation on the Web • Multi-topographic navigation MultiSOM • For multimedia data mining • For data mining on full text (patents) • Numerical-symbolic collaboration

  5. Presentation summary • Introduction: • Basic set of functionalities for information discovery • Limitations of the classical methods for information discovery • The MultiSOM model + Butterfly application: • Basic behaviour • Extensions • Management of textual information lamirel@loria.fr

  6. Basic set of functionalities for information discovery • Synthetic view of the studied domain = • Distribution of the thematic indicators of the domain • Highlighting of regularities / weak signals • Management of several types of synthesis • Interactivity = • Dynamic data mixture / type of need • Choice of meta-orientation of investigation • Setting of the granularity level of the analysis • Multimedia

  7. Managing different kinds of queries for discovery • Exploratory (no goal): « What are the contents of the database? » • Thematic (general orientation): « Images of space conquest » • Connotative (hidden goal, indirect research): « Impressive images of human technology » • Precise: « Images of Armstrong's moonwalk, July 69 »

  8. Limitations of the classical methods for information discovery • Overall view of the studied domain = • Noise • Complex interpretation (hidden information) • Local views necessarily independent • Weak signals difficult to highlight • No interactivity = • Passive classification • Predefined ways to access information

  9. Neural methods for information cartography • Topographic learning (SOM) = • classification • projection • Multi-viewpoint modelling capabilities (MultiSOM) • Intuitive self-organization of information • Active maps (IR + Navigation) • Low human intervention during construction • Multimedia capabilities

  10. Butterfly museum application • Different kinds of query • Query by keywords • Query by example • Different kinds of criteria • Colour (automatic) • Shape (manual) • Texture (manual) • Problems • Hand-made classifications • Combination of results coming from different criteria • Example description: Yellow = very strong, Red = not, Edge = strong, Spot = middle, …

  11. Butterfly application automation [Diagram labels: Query by keywords; Query by example; Adding new individuals; User interface; Viewpoint classifications; Global and/or cross-viewpoint classifications; Combination of results; Validation of insertion or classification recalculation]

  12. Basic topographic map building • Data description: • Document (image) = index vector, e.g. vector of characteristics • Weighting of the characteristics' modalities (very strong = 1, …) • Optional IDF weighting (weak signal detection) [Figure labels: weighted description; IDF; texture]
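The data description step can be sketched as follows. This is only an illustration: the slides give "very strong = 1" and the modality names, while the other numeric weights, the characteristic list, and the three sample images are assumptions made for the example. The IDF used here is the common log(N / document frequency) form, which may differ from the one used in the actual system.

```python
import math

# Only "very strong = 1" comes from the slides; the other values are assumed.
MODALITY_WEIGHT = {"very strong": 1.0, "strong": 0.75, "middle": 0.5, "not": 0.0}

def index_vector(description, characteristics):
    """Turn a {characteristic: modality} description into a weighted vector."""
    return [MODALITY_WEIGHT[description.get(c, "not")] for c in characteristics]

def idf_weights(vectors):
    """One IDF weight per characteristic: log(N / number of images using it)."""
    n_docs = len(vectors)
    weights = []
    for j in range(len(vectors[0])):
        df = sum(1 for v in vectors if v[j] > 0)
        weights.append(math.log(n_docs / df) if df else 0.0)
    return weights

# Hypothetical characteristics and image descriptions.
characteristics = ["yellow", "red", "edge", "spot"]
images = [
    {"yellow": "very strong", "edge": "strong", "spot": "middle"},
    {"yellow": "strong", "red": "middle"},
    {"yellow": "middle", "spot": "very strong"},
]
vectors = [index_vector(d, characteristics) for d in images]
idf = idf_weights(vectors)
# A characteristic present in every image gets weight 0, rare ones are boosted.
weighted = [[x * w for x, w in zip(v, idf)] for v in vectors]
```

With IDF weighting, a characteristic shared by all images contributes nothing, which is what lets rare characteristics (weak signals) stand out.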

  13. Basic topographic map building • Map predefined parameter settings: • Number of neurons • Structure: e.g. 2D grid with square neighbourhood • Competitive learning:

  14. Competitive learning • Current data (image) at time T • Selection of the winning neuron • Influence on the neighbourhood
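The three steps of competitive learning (present a data item, select the winning neuron, update its neighbourhood) can be sketched as a generic Kohonen-style update on a 2D grid with a square neighbourhood. The Gaussian neighbourhood factor, the linear decay schedule, and all parameter defaults are assumptions chosen for the sketch, not values taken from the presentation.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights,
               key=lambda pos: sum((w - v) ** 2 for w, v in zip(weights[pos], x)))

def som_step(weights, x, lr, radius):
    """One competitive-learning step: pick the winner, pull its neighbours to x."""
    winner = best_matching_unit(weights, x)
    for pos, w in weights.items():
        # Chebyshev distance on the grid gives a square neighbourhood.
        d = max(abs(pos[0] - winner[0]), abs(pos[1] - winner[1]))
        if d <= radius:
            h = math.exp(-(d * d) / (2.0 * radius * radius))
            for j in range(len(w)):
                w[j] += lr * h * (x[j] - w[j])
    return winner

def train_som(data, rows, cols, epochs=20, lr0=0.5, radius0=1.5, seed=0):
    """Train a rows x cols map; learning rate and radius shrink over time."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        for x in data:
            som_step(weights, x, lr0 * decay, max(radius0 * decay, 0.5))
    return weights
```

Calling `train_som(vectors, 3, 3)` on a list of index vectors returns a dictionary mapping each grid position to its learned profile.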

  15. Map labelling and zoning • Map labelling: • Based on the best components of the profiles • Class or member-oriented • One single method is not sufficient => Gives an overview of the detected themes • Map zoning: • Based on the SOM topographic properties • Based on the best components of the class profiles => Gives an overview of the weights of the themes
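A minimal sketch of labelling a class by the best components of its profile, in a class-oriented and a member-oriented variant. The function names, the choice of the top `k` components, and the use of a plain mean over members are assumptions made for illustration; the slides do not specify the exact selection rule.

```python
def label_from_profile(profile, names, k=2):
    """Class-oriented: keep the k strongest non-zero components of the profile."""
    ranked = sorted(range(len(profile)), key=lambda j: profile[j], reverse=True)
    return [names[j] for j in ranked[:k] if profile[j] > 0]

def label_from_members(members, names, k=2):
    """Member-oriented: label from the mean description of the class members."""
    mean = [sum(column) / len(members) for column in zip(*members)]
    return label_from_profile(mean, names, k)

# Hypothetical characteristic names and values.
names = ["yellow", "red", "edge", "spot"]
class_label = label_from_profile([0.9, 0.1, 0.6, 0.0], names)
member_label = label_from_members([[1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.5]], names)
```

The two variants can disagree when a neuron profile has drifted away from the data it currently holds, which is one reason a single method may not be sufficient.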

  16. Multimedia thematic cartography of « Butterfly » [Figure labels: colour viewpoint; theme « Yellow »; theme « Green »; central sub. image description; list of theme members]

  17. The MultiSOM model [Figure labels: basic map (core classification); on-line generalizations; viewpoint 1; viewpoint 2]

  18. Map on-line generalization • Goal: • Synthesize the map contents by decreasing the number of neurons (classes) • Constraints: • Preserve the map topographic properties • No classification re-computation • Method: • Exploitation of the neighbourhood relations on the map
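One way to exploit neighbourhood relations without recomputing the classification is to build each neuron of the coarser map as the average of a square block of four adjacent neurons. The sketch below assumes that scheme (an n x m map becomes an (n-1) x (m-1) map); the slides state the goal and constraints but not the exact formula, so this is an illustration rather than the presented method.

```python
def generalize(grid):
    """Derive a coarser map: each new neuron averages a 2 x 2 block of neighbours.

    grid is a rows x cols list of profile vectors; the result has one row and
    one column fewer, and neighbouring new neurons share source neurons, so
    the topographic ordering of the original map is carried over.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(rows - 1):
        new_row = []
        for c in range(cols - 1):
            block = [grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]]
            new_row.append([sum(values) / 4.0 for values in zip(*block)])
        coarse.append(new_row)
    return coarse

# Hypothetical 3 x 3 map with one-dimensional profiles.
base_map = [[[0.0], [2.0], [4.0]],
            [[2.0], [4.0], [6.0]],
            [[4.0], [6.0], [8.0]]]
level_1 = generalize(base_map)
```

Applying `generalize` repeatedly yields successive generalization levels from the same core classification.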

  19. Map on-line generalization

  20. Semantic viewpoints • Subspace of the description space • Can be a field, a subset of keywords, ... • Possible overlapping sets • Concurrent or complementary viewpoints =>Examples: indexer keywords, title keywords, authors, … , visual characteristics, sounds =>Butterflies: color, shape, texture, …
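A viewpoint as a subspace of the description space can be sketched as a projection of a full description onto a subset of its characteristics. The viewpoint names follow the butterfly example; the characteristic lists and the sample description are hypothetical.

```python
# Hypothetical viewpoint definitions; the sets may overlap in general.
VIEWPOINTS = {
    "colour": ["yellow", "red", "green"],
    "texture": ["spot", "edge", "vein"],
}

def project_on_viewpoint(description, viewpoint):
    """Keep only the characteristics that belong to the chosen viewpoint."""
    return [description.get(feature, 0.0) for feature in VIEWPOINTS[viewpoint]]

butterfly = {"yellow": 1.0, "edge": 0.75, "spot": 0.5}
colour_vector = project_on_viewpoint(butterfly, "colour")
texture_vector = project_on_viewpoint(butterfly, "texture")
```

Each viewpoint then gets its own map, trained only on the projected vectors.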

  21. Inter-map communication • Goal: • Cope with the limitations of a global map • Allow communication between viewpoints • Constraints: • Interpretable behaviour • Method: • Re-projected data = transmitter neurons • Two steps: 1) Activation of a source map (directly or through a query) 2) Transmission to target maps

  22. Inter-map communication

  23. Inter-map communication • A function: • Two modes: • Possibilistic (weak thematic relations over viewpoints) • Probabilistic (measure of the themes' similarities) => g = class belonging degree
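The transmission function itself is not preserved in the transcript. The sketch below shows one plausible reading, under stated assumptions: each data item carries the activity of its source neuron, scaled by its class belonging degree g, to the target neuron that holds it; the possibilistic mode keeps the maximum of these signals and the probabilistic mode their mean. All names and sample values are hypothetical.

```python
def transmit(source_activity, source_members, target_members, mode="possibilistic"):
    """Propagate activity from a source map to a target map through shared data.

    source_activity: {source_neuron: activity in [0, 1]}
    source_members:  {data_id: (source_neuron, g)}, g = class belonging degree
    target_members:  {target_neuron: [data_id, ...]}
    """
    target_activity = {}
    for neuron, data_ids in target_members.items():
        signals = [source_activity.get(source_members[d][0], 0.0) * source_members[d][1]
                   for d in data_ids]
        if not signals:
            target_activity[neuron] = 0.0
        elif mode == "possibilistic":
            target_activity[neuron] = max(signals)   # any link is enough
        else:
            target_activity[neuron] = sum(signals) / len(signals)  # average link
    return target_activity

# Hypothetical colour map (source) and texture map (target).
colour_activity = {"yellow": 1.0}
colour_members = {"b1": ("yellow", 0.9), "b2": ("yellow", 0.6), "b3": ("green", 0.8)}
texture_members = {"spots": ["b1", "b3"], "veins": ["b2"], "plain": ["b3"]}
possibilistic = transmit(colour_activity, colour_members, texture_members)
probabilistic = transmit(colour_activity, colour_members, texture_members, "probabilistic")
```

In this reading the possibilistic mode highlights a target class as soon as one of its members is strongly related to the source theme, while the probabilistic mode reflects how much of the class shares that theme.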

  24. Activity coherency STRONG FOCALIZATION WEAK FOCALIZATION

  25. Inter-map communication • Question: Regularities in textures of yellow butterflies? • Response: YES, spots and edges [Figure labels: colour map; texture map; butterflies]

  26. Compliance with IR operations • Question: Are there butterflies with spots AND veins? [Figure labels: Response = YES; Response = NO]

  27. Remaining problems (to be solved) • Validation of the automatic classification results by the experts • Testing of different result merging methods • Test the use of prototype features in classification* • Realization of a Web interface for the maps • Compare the maps' built-in result combination mechanism with an external combination mechanism • Test map capabilities for helping to add new individuals • Introduce textual data and combine it with images

  28. Use of colour prototypes [Figure labels: colour viewpoint; theme « Yellow »; yellow]

  29. Experimentation on patents (texts) • Goal: Intelligent technological survey = Full text analysis of the patents • Domain of oil engineering • Provide answers to questions like: 1. “What are the relationships between patentees?” 2. “On which specific technology does a patentee work? What are the advantages of this specific technology? For which use?”

  30. Basic experimental protocol [Diagram labels: Patents database; DILIB reformatting; Viewpoints definition; Patents in XML format structured by viewpoints; Nominal groups extraction; Validated multi-indexes; MicroNOMAD MultiSOM; Interactive maps for analysis]

  31. Nominal groups extraction 1) Lexicographic analysis (compound terms) 2) Normalization: e.g. “oil fabrication” and “oil engineering” => “oil engineering” • Results:

  32. Patents reindexing Selected Viewpoints: title, use, advantages and patentees

  33. Example of dynamic analysis [Figure labels: Title (Components); Use; Patentees; Advantages] • DYNAMIC DEDUCTION: Patentee « TONEN CORP. » is a specialist in lubrication of the « automatic transmission ». It mainly produces oils based on « organo-molybdenum compound », which have the specific property of having a « friction coefficient stable on a wide range of temperature »

  34. Classical methods (AK-means) [Figure labels: classes map; class description; hidden link!]

  35. Conclusion • Different viewpoints yield complementary results: • Ex: Indexer keywords = Closed themes, Title keywords = Open themes, ... • Detection of indexation inconsistencies • Projection of thematic pertinence of a query • Bilateral synergy: images <=> textual information • Very rich and flexible inter-map communication mechanism: • Cross analysis between viewpoints, dynamics • No limitation regarding viewpoints type and number

  36. Perspectives • Sophisticated 2D mapping, 3D mapping • Pure image mosaic navigation • Automation of communication between viewpoints • Interaction with Galois lattice: map zoning and generalization, rule mapping, lattice entry point selection • Applications: • La Villette: interactive browsing through museum collection, setting up of exhibitions • INIST: Cartography of the Web (EISCTES EEC Project)

  37. 3) Combining Symbolic and Numeric Techniques for DL Contents Classification and Analysis Jean-Charles LAMIREL, Yannick TOUSSAINT (Orpailleur)

  38. Introduction • Combining numerical and symbolic methods: • MicroNOMAD Self Organizing Maps (SOM) • Basic SOM topographic properties • MicroNOMAD multi-map communication process • Lattice • Formal properties and symbolic deduction • Hierarchical structure and inheritance of properties • Study of projection of SOM over lattice • Making explicit formal properties on the map • Intelligent map zoning and labelling

  39. Galois lattice • Symbolic hierarchical method: formal concept ({i1, i2}, {p1, p2, p3}) • Partial order defined by the subsumption relation over the set of formal concepts: $(I_1, P_1) \leq (I_2, P_2) \Leftrightarrow I_1 \subseteq I_2$, $(I_1, P_1) \leq (I_2, P_2) \Leftrightarrow P_1 \supseteq P_2$, $\forall\, I_1, I_2$ there is a unique meet and join. • Inheritance of properties • Extraction of association rules: Search Engine $\rightarrow$ {Web, IR}

  40. I = {i1, i2, i3, i4}, P = {AI, Robots, Search Engine, Web, IR} • i1 = {Web, IR} • i2 = {Web, IR} • i3 = {Web, IR, Search Engine} • i4 = {AI, Robots} • Formal concepts: ({i1, i2, i3, i4}, $\emptyset$); ({i1, i2, i3}, {Web, IR}); ({i3}, {Search Engine, Web, IR}); ({i4}, {AI, Robots}); ($\emptyset$, {AI, Robots, Search Engine, Web, IR}) • R1 = Search Engine $\rightarrow$ {Web, IR}
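The formal concepts of this small context can be computed directly from the object descriptions. The sketch below is a brute-force enumeration (closing every subset of objects), which is fine for four objects but not how a lattice would be built for a real corpus; the function names are chosen for the example.

```python
from itertools import combinations

# Context taken from the slide: each object with its set of properties.
context = {
    "i1": {"Web", "IR"},
    "i2": {"Web", "IR"},
    "i3": {"Web", "IR", "Search Engine"},
    "i4": {"AI", "Robots"},
}
properties = set().union(*context.values())

def extent(props):
    """All objects that have every property in props."""
    return frozenset(o for o, ps in context.items() if props <= ps)

def intent(objs):
    """All properties shared by every object in objs (all of P if objs is empty)."""
    if not objs:
        return frozenset(properties)
    return frozenset(set.intersection(*(context[o] for o in objs)))

def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            shared = intent(subset)
            concepts.add((extent(shared), shared))
    return concepts

def rule_holds(premise, conclusion):
    """An association rule holds if every object with the premise has the conclusion."""
    return all(conclusion <= context[o] for o in extent(premise))

concepts = formal_concepts()
r1_holds = rule_holds({"Search Engine"}, {"Web", "IR"})
```

The enumeration yields five concepts, and the rule R1 holds because the only object indexed by Search Engine (i3) is also indexed by Web and IR; the converse rule does not hold.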

  41. Complementarity of approaches • Kohonen SOM • Complex weighting scheme • Difficulty for precise interpretation • Good illustrative power (topographic structure) • Good synthesis capabilities • Non linearity • Lattice • High number of classes • Memory and time consuming • Hierarchical structure • Rule extraction • Incrementality

  42. 3-step methodology • Projection • Grouping • Agglomeration

  43. Conclusion • Cosine method seems to be the best of those tested • Good accuracy • Well-balanced agglomeration • Agglomeration preserves closed areas on SOM • Other projection and agglomeration methods have to be tested • Preservation of partial order and inheritance
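A cosine-based projection of SOM neurons onto lattice concepts could look like the sketch below. It assumes that each neuron profile is compared with the binary vector of each concept's intent and assigned to the closest one; the slides name the cosine method but do not give this detail, and the neuron profiles and concept names are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def project(neurons, concepts, vocabulary):
    """Assign each neuron to the concept whose intent is closest in cosine."""
    assignment = {}
    for name, profile in neurons.items():
        assignment[name] = max(
            concepts,
            key=lambda c: cosine(profile,
                                 [1.0 if p in concepts[c] else 0.0 for p in vocabulary]),
        )
    return assignment

vocabulary = ["AI", "Robots", "Search Engine", "Web", "IR"]
# Concept intents from the lattice example; the concept names are hypothetical.
concept_intents = {
    "c_web": {"Web", "IR"},
    "c_search": {"Search Engine", "Web", "IR"},
    "c_ai": {"AI", "Robots"},
}
# Hypothetical neuron profiles over the same vocabulary.
neuron_profiles = {
    "n1": [0.0, 0.0, 0.1, 0.9, 0.8],
    "n2": [0.9, 0.7, 0.0, 0.0, 0.0],
    "n3": [0.0, 0.0, 0.8, 0.7, 0.9],
}
projection = project(neuron_profiles, concept_intents, vocabulary)
```

Neurons assigned to the same concept can then be grouped, which is one way the grouping and agglomeration steps could follow from the projection.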

  44. Perspectives • Evaluation on large corpus + Expert • Rule management • Class quality evaluation • Class labelling • Deduction validation on communicating maps (lattice extensions) • Implementation of an operational prototype

  45. Other approaches • Multi-classifier cooperation (PhD) • SVM • Stigmergy • Genetic • Neural maps • On-line learning of user's behaviour, intelligent relevance feedback

  46. Annexes • Topographic inconsistencies • Area computation • Inter-map communication • Activity coherency

  47. Topographic inconsistencies NO INCONSISTENCIES WEAK INCONSISTENCIES STRONG INCONSISTENCIES

  48. Topographic inconsistencies GLOBAL STRONG Neuron neighbourhood

  49. Area computation [Pseudocode figure; only the keywords WHILE, SO AS, IN, DO, END DO survive]
