
Semantic collaborative web caching

Jean-Marc Pierson, Lionel Brunie, David Coquil. LISI, INSA de LYON. Jean-Marc.Pierson@insa-lyon.fr. Outline: motivations and proxies; document indexation; temperature of documents; collaboration schema and architecture; results, evaluation and discussion.

Presentation Transcript


  1. Semantic collaborative web caching. Jean-Marc Pierson, Lionel Brunie, David Coquil. LISI, INSA de LYON. Jean-Marc.Pierson@insa-lyon.fr

  2. Outline • Motivations and proxies • Document indexation • Temperature of documents • Collaboration schema and architecture • Results, evaluation and discussion • Conclusion

  3. Sharing information/Sharing usage • Information is widely disseminated • The volume of information is huge: how do I find my way through the jungle of the IS (information system)? • Many possible solutions: search engines, agents, ontologies... • A solution to be explored: help from, or collaboration with, other users

  4. Making users share usages • ... is an issue that has long been addressed: proxies [Figure: users, proxy, server]

  5. Proxies • Proxies help reduce: • the response time • the server load • the network load • Proxies can be located close to the server and/or close to users • Proxies can collaborate (hierarchical or "flat" collaboration) • Proxy management policies are based on operational (LRU/MFU-like) information
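
For contrast with the semantic policies proposed in the following slides, here is a minimal Python sketch of a purely operational policy, LRU, as mentioned above. The class and method names are hypothetical and not taken from the authors' prototype (which was written in Java); an MFU-like variant would rank entries by access count instead of recency.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Baseline proxy cache that evicts the least recently used document."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._docs: "OrderedDict[str, bytes]" = OrderedDict()  # URL -> body, oldest first

    def get(self, url: str) -> Optional[bytes]:
        if url not in self._docs:
            return None  # miss: a real proxy would fetch from the origin server
        self._docs.move_to_end(url)  # hit: mark as most recently used
        return self._docs[url]

    def put(self, url: str, body: bytes) -> None:
        self._docs[url] = body
        self._docs.move_to_end(url)
        if len(self._docs) > self.capacity:
            self._docs.popitem(last=False)  # evict the least recently used entry
```

With a capacity of 2, inserting a and b, reading a, then inserting c evicts b. Nothing in this policy knows what the documents are about, which is the gap the temperature mechanism targets.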

  6. Motivations • Users generally have specific areas of interest • User caches contain related documents • Metadata, user profiles, virtual communities and hot topics can provide proxies with semantic and contextual information about the queries they serve

  7. Proposition: monitor this semantic and contextual information to: • optimize proxy management policies and proxy communication policies • allow users to share usages • give users a personalized view of the web information space

  8. Proposition: use collaborative proxies to: • improve performance (the basic goal) • act as forums and mediators that help users share usage information • Assumptions: • proxies do not share raw data but documents holding information that can be described by metadata (descriptors) • users are not isolated: they share common interests, experiences, objectives or behaviors (virtual communities) • information and topics of interest evolve rapidly: "hot" topics

  9. From proxies to adaptive indexes • The (present + past) content of a proxy de facto provides a view over the global information system • This view has real added value • Examples: • which Java teaching materials are accessed most? • is there any news about football? • which related documents did people who once read this document access afterwards?

  10. Document indexation • Indexing tree: an "ontology" of the web space • Finding one is difficult! • Yahoo!-like

  11. How is the indexation performed? • The content of the document is analyzed… • Title • Meta-tags (Content, Keywords, …) • Links • Formatting (headers, bold face, outline) • … to extract keywords • Keywords are analyzed to find related concepts • Concepts are then mapped onto the ontology
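
The keyword-extraction step can be sketched in Python with the standard-library `html.parser` module. This is an illustration under stated assumptions: the per-source weights in `SOURCE_WEIGHTS` and the tokenizer are hypothetical, plain body text and link targets are ignored (only anchor text counts), and the concept lookup and ontology mapping that follow on the slide are not shown.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from typing import Dict, List

# HYPOTHETICAL weights: the slide names the sources, not how much each one counts.
SOURCE_WEIGHTS: Dict[str, float] = {
    "title": 3.0, "meta": 2.0, "h1": 2.0, "h2": 1.5, "h3": 1.5,
    "b": 1.2, "strong": 1.2, "a": 1.0,
}
WORD = re.compile(r"[a-z]{3,}")  # crude tokenizer: lowercase words of 3+ letters


class KeywordExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.scores: Counter = Counter()
        self._open: List[str] = []  # weighted tags currently open

    def _add(self, text: str, weight: float) -> None:
        for word in WORD.findall(text.lower()):
            self.scores[word] += weight

    def handle_starttag(self, tag, attrs):
        if tag == "meta":  # <meta name="keywords|description" content="...">
            fields = dict(attrs)
            if (fields.get("name") or "").lower() in ("keywords", "description"):
                self._add(fields.get("content") or "", SOURCE_WEIGHTS["meta"])
        elif tag in SOURCE_WEIGHTS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:  # pop up to and including the matching tag
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        # Text scores with its strongest enclosing weight; unweighted text is skipped.
        weight = max((SOURCE_WEIGHTS[t] for t in self._open), default=0.0)
        if weight:
            self._add(data, weight)


def extract_keywords(html: str, top: int = 5) -> List[str]:
    parser = KeywordExtractor()
    parser.feed(html)
    parser.close()
    return [word for word, _ in parser.scores.most_common(top)]
```

On a page whose title, keywords meta-tag and first header all mention "soccer", the sketch ranks "soccer" first, while words that appear only in plain paragraphs are not counted at all.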

  12. Weighted indexing tree • Edges between concepts (ancestors and children) are weighted • The weight relates to the probability that a document located under the child node is requested next, after a document under the parent node has been requested • It is the “correlation” (in terms of access patterns) between the target node and its “brothers”
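
One possible way to estimate these weights from a proxy's request log, sketched in Python: for each pair of consecutive requests, and for each concept p above the first request, count whether the second request falls under each child c of p. The ratio is an empirical estimate of the conditional probability described above. The log format (one leaf concept per request) and the function names are assumptions; the slides do not say how the weights are actually computed.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def ancestors(node: Optional[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Nodes on the path from `node` up to the root (inclusive)."""
    path = set()
    while node is not None:
        path.add(node)
        node = parent.get(node)
    return path


def estimate_edge_weights(requests: List[str],
                          parent: Dict[str, Optional[str]]) -> Dict[Tuple[str, str], float]:
    """Estimate W(p, c) = P(next request is under child c | current request is under p)."""
    children: Dict[str, List[str]] = {}
    for child, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(child)
    under = Counter()     # consecutive pairs whose first request is under p
    followed = Counter()  # ... and whose second request is under child c of p
    for current, nxt in zip(requests, requests[1:]):
        next_path = ancestors(nxt, parent)
        for p in ancestors(current, parent):
            if p in children:
                under[p] += 1
                for c in children[p]:
                    if c in next_path:
                        followed[(p, c)] += 1
    return {(p, c): followed[(p, c)] / under[p]
            for p, kids in children.items() for c in kids if under[p]}
```

For the log Soccer, Baseball, Soccer, Soccer, Skiing under a single Sports node, the estimates are 0.5 for Soccer and 0.25 each for Baseball and Skiing.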

  13. Weighted tree • For instance, someone interested in baseball is more likely to be interested in soccer than in skiing (open to discussion)

  14. Notion of temperature • Documents are assigned a temperature related to their "hotness": the more a document is accessed, the higher its temperature • The cache replacement policy uses document temperatures: the coolest documents are removed from the cache first; prefetching uses the hottest documents
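
Given a temperature per document, the two uses named above reduce to picking the extremes, as in this Python sketch. The function names and the dictionary representation are hypothetical; how temperatures are computed is the subject of the next slides.

```python
import heapq
from typing import Dict, List, Set


def evict_coolest(temperature: Dict[str, float], cached: Set[str], n: int) -> List[str]:
    """The n coolest cached documents, i.e. the first ones to remove."""
    return heapq.nsmallest(n, cached, key=lambda doc: temperature.get(doc, 0.0))


def prefetch_hottest(temperature: Dict[str, float], cached: Set[str], n: int) -> List[str]:
    """The n hottest documents not yet in the cache, i.e. prefetch candidates."""
    candidates = (doc for doc in temperature if doc not in cached)
    return heapq.nlargest(n, candidates, key=temperature.__getitem__)
```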

  15. Temperature • Represents the probability that a document will be accessed in the near future • It combines the number of requests for the document in the last time interval with the semantic links represented by the data structure • A temperature value is also associated with the internal nodes of the data structure

  16. Temperature computation • Temperature is recomputed at regular intervals, measured in number of requests • The number of accesses to each document between two consecutive computations is stored in an access table • If a document has been accessed since the last computation, its temperature increases by the corresponding value in the table, and this value is pushed onto a stack for future cooling • Otherwise, its temperature decreases
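
The computation step can be sketched in Python as below. The slide does not say by how much an idle document cools; this sketch ASSUMES it pops its most recent stacked increment, so a document that stops being requested returns to its earlier temperature after as many idle computations as it had heated ones. The class and attribute names are hypothetical. The returned per-document variations are what the next slide diffuses through the tree.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


class TemperatureTable:
    """Per-document temperatures recomputed every `interval` requests."""

    def __init__(self, interval: int) -> None:
        self.interval = interval
        self.temperature: Dict[str, float] = defaultdict(float)
        self.access_table: Counter = Counter()  # accesses since the last computation
        self.cooling_stack: Dict[str, List[int]] = defaultdict(list)  # past increments
        self._requests = 0

    def record_access(self, doc: str) -> Optional[Dict[str, float]]:
        """Count one request; when a computation is due, return the variations."""
        self.access_table[doc] += 1
        self._requests += 1
        if self._requests % self.interval == 0:
            return self.compute()
        return None

    def compute(self) -> Dict[str, float]:
        variations: Dict[str, float] = {}
        for doc in set(self.temperature) | set(self.access_table):
            hits = self.access_table[doc]
            if hits:  # heated: add the hit count and remember it for later cooling
                delta = float(hits)
                self.cooling_stack[doc].append(hits)
            elif self.cooling_stack[doc]:
                # ASSUMPTION: an idle document cools by undoing its last increment.
                delta = -float(self.cooling_stack[doc].pop())
            else:
                delta = 0.0
            self.temperature[doc] += delta
            if delta:
                variations[doc] = delta
        self.access_table.clear()
        return variations
```

With an interval of 4, three requests for d1 and one for d2 heat them by +3 and +1; if the next four requests all go to d2, d2 heats by +4 while d1 cools by 3 back to 0.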

  17. Temperature propagation up the data structure. The temperature variation (ΔT) of each document is diffused along the edges of the data structure. More precisely, for each (document, concept) pair connected by an edge of weight W, the temperature of the concept increases or decreases by W × ΔT. The concept's temperature variation may in turn be diffused to its parent node, subject to a given threshold.
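
A Python sketch of the upward phase for a single document, assuming the tree is stored as a child-to-(parent, edge weight) map and reading the threshold rule from the worked example: a node passes its variation on to its parent only while the absolute value of that variation is at least the threshold. The names are hypothetical. The module-level example uses the weights of the worked example (document 1 to Soccer 70%, Soccer to Sports 40%, Sports to Recreation and Sports 15%).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Tree stored as child -> (parent, weight of the child-parent edge); the root's parent is None.
Tree = Dict[str, Tuple[Optional[str], float]]


def propagate_up(tree: Tree, doc: str, delta_t: float,
                 temperature: Dict[str, float], threshold: float) -> List[Tuple[str, float]]:
    """Diffuse a document's variation towards the root.

    Each hop multiplies the variation by the edge weight W; a node passes its
    variation on to its parent only while |variation| >= threshold.
    Returns the (concept, variation) pairs applied, from the bottom up.
    """
    applied = []
    node, delta = doc, delta_t
    while abs(delta) >= threshold:
        parent, weight = tree[node]
        if parent is None:
            break
        delta *= weight
        temperature[parent] += delta
        applied.append((parent, delta))
        node = parent
    return applied


# The chain from the worked example (only weights given on the slides).
EXAMPLE_TREE: Tree = {
    "doc1": ("Soccer", 0.7),
    "Soccer": ("Sports", 0.4),
    "Sports": ("Recreation and Sports", 0.15),
    "Recreation and Sports": (None, 0.0),
}
temperatures: Dict[str, float] = defaultdict(float)
steps = propagate_up(EXAMPLE_TREE, "doc1", 3.0, temperatures, threshold=0.5)
# steps holds roughly [("Soccer", 2.1), ("Sports", 0.84), ("Recreation and Sports", 0.126)]
```

Diffusion stops after Recreation and Sports because 0.126 is below the 0.5 threshold, as in the example.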

  18. Example: ΔT1 for document 1 = +3. Temperature variation for Soccer (from ΔT1): ΔTs = 3 × 70% = 2.1. Temperature variation for Sports = 2.1 × 40% = 0.84. Temperature variation for Recreation and Sports = 0.84 × 15% = 0.126 [diffusion stops here if the threshold is 0.5].
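
The chain of multiplications in this example generalizes as follows. The notation is introduced here for illustration only: c_0 is the accessed document, c_k the k-th concept above it, W_k the weight of the edge between c_{k-1} and c_k, and θ the threshold; the stopping rule is read off the example rather than stated on the slides.

```latex
\Delta T_{c_0} = \Delta T, \qquad
\Delta T_{c_{k+1}} = W_{k+1}\,\Delta T_{c_k} = \Delta T \prod_{i=1}^{k+1} W_i
\quad \text{(applied only if } |\Delta T_{c_k}| \ge \theta\text{)}

% Worked example: \Delta T = 3, W_1 = 0.7, W_2 = 0.4, W_3 = 0.15, \theta = 0.5
\Delta T_{c_1} = 2.1, \quad \Delta T_{c_2} = 0.84, \quad
\Delta T_{c_3} = 0.126 < \theta \;\Rightarrow\; \text{no further diffusion}
```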

  19. Temperature retropropagation down the data structure • Temperature is diffused from concepts down to documents • Every document under a concept whose temperature has changed sees its own temperature change • Even documents that were not accessed may see their temperature increase

  20. Example: Temperature variation for the Games concept (from Recreation and Sports) = +0.126 × 15% = 0.0189. Temperature variation for Baseball (from Sports) = 0.84 × 40% = 0.336. Temperature variation for document 2 (from Soccer) = 2.1 × 50% = 1.05. Temperature variation for document 3 (from Soccer) = 2.1 × 60% = 1.26. In practice, there is one upward phase for all documents, then one downward phase for all concepts.
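
The downward phase can be sketched in Python with the same tree representation. Reading the example, each concept touched in the upward phase passes its variation, multiplied by the edge weight, to its other children; following slide 19, the sketch keeps descending until it reaches documents. Nodes on the upward path are skipped to avoid a rebound, and no threshold is applied on the way down because the example applies the 0.0189 change to Games. These are interpretations, not the authors' stated algorithm, and the sketch handles one document at a time rather than the batched phases mentioned above. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, Tuple[Optional[str], float]]  # child -> (parent, edge weight)


def propagate_down(tree: Tree, source_doc: str, upward: List[Tuple[str, float]],
                   temperature: Dict[str, float]) -> Dict[str, float]:
    """Diffuse each upward variation back down to the other children of its concept.

    `upward` is the (concept, variation) list from the upward phase for `source_doc`.
    Nodes on the upward path are skipped so a variation never rebounds onto itself.
    """
    kids: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for child, (parent, weight) in tree.items():
        if parent is not None:
            kids[parent].append((child, weight))
    on_path = {source_doc} | {concept for concept, _ in upward}
    changes: Dict[str, float] = defaultdict(float)

    def descend(node: str, delta: float) -> None:
        for child, weight in kids.get(node, []):
            if child in on_path:
                continue
            share = delta * weight
            temperature[child] += share
            changes[child] += share
            descend(child, share)  # keep going down until documents are reached

    for concept, delta in upward:
        descend(concept, delta)
    return dict(changes)


# Tree of the slide 18 and 20 examples, with the weights used there.
EXAMPLE_TREE: Tree = {
    "doc1": ("Soccer", 0.7), "doc2": ("Soccer", 0.5), "doc3": ("Soccer", 0.6),
    "Soccer": ("Sports", 0.4), "Baseball": ("Sports", 0.4),
    "Sports": ("Recreation and Sports", 0.15), "Games": ("Recreation and Sports", 0.15),
    "Recreation and Sports": (None, 0.0),
}
upward_phase = [("Soccer", 2.1), ("Sports", 0.84), ("Recreation and Sports", 0.126)]
down = propagate_down(EXAMPLE_TREE, "doc1", upward_phase, defaultdict(float))
```

On the example tree this yields 1.05 for document 2, 1.26 for document 3, 0.336 for Baseball and 0.0189 for Games, matching the slide.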

  21. Document-concept link (precision) • When a document is related to two concepts, its node is duplicated and each copy is linked to one of the two concepts • Otherwise, with a single node, temperature variations would propagate among unrelated documents (by rebound)

  22. A distributed collaborative architecture

  23. Proxy architecture [Figure: modules for client connection, query processing, server/proxy connection, temperature, cache, profile and index]

  24. Browser cache vs. user proxy • Browser "local caches" are basic and cannot communicate • Implementing true communicating proxies at the browser/user level makes it possible to: • reduce the load on intermediate proxies • optimize network traffic • reduce response time • manage the user profile • count document hits • customize semantic and contextual information

  25. From proxies to virtual communities • User profile: topics of interest • Virtual community = users with similar profiles • Virtual communities could be used for: • monitoring document usage • associating proxies with specific communities • providing users with relevant information about the content of proxy caches • monitoring the evolution of topics of interest • sharing experiences and optimizing queries
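
The slides do not say how profile similarity is measured. As one assumption-laden illustration, a profile can be a map from concepts to interest levels (for instance per-user concept temperatures), compared with cosine similarity and grouped greedily. Both the metric and the grouping rule are swapped-in choices, and the names are hypothetical.

```python
import math
from typing import Dict, List

Profile = Dict[str, float]  # concept -> interest level (e.g. a per-user concept temperature)


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity of two sparse concept vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def group_communities(profiles: Dict[str, Profile], threshold: float) -> List[List[str]]:
    """Greedy single pass: a user joins the first community whose founder's
    profile is similar enough, otherwise the user founds a new community."""
    communities: List[List[str]] = []
    for user, profile in profiles.items():
        for community in communities:
            if cosine(profile, profiles[community[0]]) >= threshold:
                community.append(user)
                break
        else:
            communities.append([user])
    return communities
```

Two users mostly interested in Soccer and a little in Baseball end up in one community at a 0.9 threshold, while a user interested only in Java founds another. A real system would likely need a stable, incremental clustering method rather than this order-dependent pass.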

  26. Collaboration and communities • Subscription: manual and static at first, evolving toward dynamic and automatic • Relationships between the user proxy and the aggregate proxies in charge of the community: • to find a requested document in another user's proxy • to see the most accessed documents in the community • The proxy organization must reflect the community structure and usages

  27. Prototype • Java • Indexing tree limited to 2 or 3 levels of Yahoo! • Matching done only with keywords (whether or not they appear in the indexing tree), not with concepts • Interfaced with ThoughtTreasure (a French-English WordNet-like resource) for keywords not in the indexing tree

  28. Evaluation • The temperature notion has already proved efficient for video archive caching (hit rate) • Small-scale experiments showed the proxy-web architecture to be robust • Indexation works well (more than 90% of documents indexed) • Difficulties arise from the need to handle the contents of web pages in order to test the behavior

  29. Conclusion • Enhancing the integration of distributed information systems or servers into a global service by means of collaborative proxies • Management and collaboration based on semantic and contextual information (temperature) • Performance improvement • Virtual communities • Attachment of a proxy to each user

  30. Future work • Test the prototype on a large scale: design a test platform! • Push intermediate cache management into the heart of the network (active routers) • Enhance the indexation algorithm • Apply the technology to Grid computing (cache management)
