This presentation discusses a framework for managing data in ubiquitous computing environments. Topics include ontology-based metadata, incentive-based routing, cooperative caching, and performance evaluation.
A Collaborative and Semantic Data Management Framework for Ubiquitous Computing Environment Weisong Chen, Cho-Li Wang, Francis Lau The University of Hong Kong
Presentation Outline • Ubiquitous Data Management • Design Philosophy • System Design • Ontology-based Metadata • Incentive-based Routing • Cooperative Caching • Performance Evaluation • Conclusion
Data Dilemma • We are drowning in an ocean of data, but thirsty for useful information. • Can you recall the last time you wanted to access a document sent by one of your friends? You knew it must be somewhere, but you just could not find it. • Even the computer's search cannot help, since you forgot the file/directory name.
Ubiquitous Data • The data dilemma will only become more severe in ubiquitous environments. • Users roam from smart space to smart space, leaving all kinds of data there. • At any time and in any location, users may want to access certain data. A scenario: A person uses his mobile phone to take photos as he moves around. Due to the limited storage space, he cannot store all the photos taken. Therefore, he chooses to offload photos to some nearby stable stations. Some time later, he would like to view the photos taken, regardless of his location. He may also want to share his photos with some friends. (Figure labels: Tsuruga Castle; Where is the photo of Alice?)
Data Management Challenges • Users move from place to place. (High Mobility) • Data are stored everywhere. (High Distribution) • Devices have different capabilities and use different means to store/access data. (High Heterogeneity) • Users cannot consistently control all the smart spaces they have ever interacted with. (High Autonomy) • Users generate certain data and may want to access data generated by others. (Sharing and Collaboration) • Others: resource-constrained devices, unreliable connectivity, …
Design Philosophy • We incorporate the following philosophy in our system design: • Computers should mimic human society to locate data of interest. • Computers share not only data but also indexing information (metadata), and contribute to the overall infrastructure. • Metadata are widely propagated, while data are moved only to the locations where they are needed. • Incentives should be granted to devices that help others find the required data, to foster cooperation and encourage contribution. (If I help more, I shall find things faster!)
The Abstract Model: A Pervasive Computing Environment (Figure labels: Shared/Public Devices; Private Devices; Internet; Smart Space 1 (LAN-connected); Smart Space 2 (WLAN-connected); Smart Space 3 (Ad-Hoc Network))
Node Architecture • All devices adopt this consistent architecture, consisting of a routing table, a cache store, and the ontology knowledge. • Routing table: a list of routing knowledge, where Routing Knowledge = Metadata + node id. • Cache store: cached data. • Ontology knowledge: used to organize the routing table and cache store.
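The three parts of a node can be pictured as a small data structure. The following is a minimal sketch only: the class names `RoutingKnowledge` and `Node`, the tuple encoding of metadata, and the example identifiers are all assumptions made for illustration, not taken from the framework itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RoutingKnowledge:
    """One routing entry: a metadata description plus a node id."""
    metadata: tuple  # hypothetical encoding, e.g. (("concept", "Photo"), ...)
    node_id: str


@dataclass
class Node:
    node_id: str
    ontology: dict = field(default_factory=dict)       # concept -> parent concept
    routing_table: list = field(default_factory=list)  # list of RoutingKnowledge
    cache_store: dict = field(default_factory=dict)    # metadata -> cached data

    def learn(self, entry):
        """Add a routing entry unless an identical one is already known."""
        if entry not in self.routing_table:
            self.routing_table.append(entry)

    def cache(self, metadata, data):
        """Keep a copy of data so later queries can be served locally."""
        self.cache_store[metadata] = data


# Usage: a phone learns where a photo lives and caches a copy of it.
phone = Node("phone")
meta = (("concept", "Photo"), ("subject", "Alice"))
phone.learn(RoutingKnowledge(meta, "station-7"))
phone.learn(RoutingKnowledge(meta, "station-7"))  # identical entry, ignored
phone.cache(meta, b"jpeg-bytes")
```

In this sketch the ontology is only a placeholder dictionary; how it actually organizes the routing table and cache store is not specified on the slide.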
Ontology • An ontology is a formal, explicit specification of a shared conceptualization*. • “formal” implies that an ontology should be machine processable. • “explicit” means that ontology knowledge is explicitly defined. • “shared” indicates that an ontology is agreed-upon, consensual knowledge. *R. Studer, V.R. Benjamins, and D. Fensel, ``Knowledge Engineering: Principles and Methods,'' Data and Knowledge Engineering, vol.25, 1998, pp. 161--197.
Representation of Ontology *http://www.semanticweb.org/ontologies/swrc-onto-2001-12-11.daml
Usage of Ontology In Ubi. Env. • Shared Conceptualization • Abstracting data diversity • Facilitating information exchange • Formal and Explicit • Providing context-awareness support • Describing user profile and preference • Maintaining accurate routing knowledge • Promoting reasoning and automatic processing • …
Ontology-based Metadata (Figure labels: Concept layer; Instance layer) • Metadata (indexing information) are aggressively and widely propagated!
Metadata Similarity Function The calculation of metadata similarity consists of calculating concept similarity, instance similarity, and literal value similarity.
Incentive-based Routing • “Free riding” is a prevalent problem in existing P2P systems. • Selfish nodes exploit the resources on other nodes without making any contribution. • Generous nodes share resources with others but gain no benefit.
Incentive-based Routing (Cont’d) • All devices interact in a Peer-to-Peer fashion. • Devices forward received queries to the devices that are most likely to have the required data. • Once results are found, the corresponding metadata are sent back to the initiating device through the reverse query path. • The intermediate nodes that helped to find the results incorporate the returned metadata into their routing tables, based on their current knowledge (the ontology stored), enhancing their ability to serve their own subsequent queries.
Incentive-based Routing (Cont’d) On a hit, metadata is sent back to the initiating device (N1) through the reverse query path; all nodes (N5, N3, N2, N1) on the path update their routing entries. The search mechanism is not restricted: DFS, BFS, etc.
Incentive-based Routing (Cont’d) • Devices gain routing knowledge by helping others to find the required data. • The more a device contributes to the success of others’ information access, the more (accurate) routing knowledge it will gain. • Therefore, devices are given incentives to contribute to others’ information access. • The net effect is that all devices become more generous and all benefit.
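The reverse-path update can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' protocol: the topology (including an extra node N4), the key `photo:Alice`, the function names, and the choice to store the next hop toward the holder as the "node id" of a routing entry are all hypothetical. Depth-first search with a TTL is used here because the slide allows any search mechanism.

```python
def search(nodes, start, key, ttl):
    """Depth-first search with a TTL. Returns (path to holder or None, messages sent)."""
    messages = 0

    def visit(node_id, ttl_left, path):
        nonlocal messages
        path = path + [node_id]
        node = nodes[node_id]
        if key in node["data"]:
            return path
        if ttl_left == 0:
            return None
        # Try the neighbour named by existing routing knowledge first, if any.
        preferred = node["routing"].get(key)
        for nxt in sorted(node["neighbors"], key=lambda n: n != preferred):
            if nxt in path:
                continue
            messages += 1
            found = visit(nxt, ttl_left - 1, path)
            if found:
                return found
        return None

    return visit(start, ttl, []), messages


def propagate_hit(nodes, path, key):
    """Send the hit back along the reverse path; every node on it learns a route."""
    nodes[path[-1]]["routing"][key] = path[-1]
    for i in range(len(path) - 2, -1, -1):
        nodes[path[i]]["routing"][key] = path[i + 1]


def make_node(neighbors, data=()):
    return {"neighbors": list(neighbors), "data": set(data), "routing": {}}


nodes = {
    "N1": make_node(["N2"]),
    "N2": make_node(["N1", "N4", "N3"]),
    "N3": make_node(["N2", "N5"]),
    "N4": make_node(["N2"]),
    "N5": make_node(["N3"], data=["photo:Alice"]),
}

first_path, first_msgs = search(nodes, "N1", "photo:Alice", ttl=4)
propagate_hit(nodes, first_path, "photo:Alice")
second_path, second_msgs = search(nodes, "N1", "photo:Alice", ttl=4)
print(first_path, first_msgs, second_msgs)
```

In this toy network the first query wastes one message on the dead end N4; after the hit is propagated, N2 forwards directly toward N5, so the repeat query needs fewer messages. That is the benefit the helping nodes earn.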
Incentive-based Routing (Cont’d) • Ubiquitous devices are mostly small, resource-constrained devices. User profiles described by ontology can be used to select and retain the most important knowledge. • On the other hand, devices may expand the received queries using concept generalization and specialization. • Devices handle received routing knowledge and queries according to their respective capabilities.
Incentive-based Routing (Cont’d) • Devices with richer resources are encouraged to contribute more: • Network Bandwidth: forwarding queries to more neighbors • Processing Power: expanding the queries with more levels of generalization and specialization • Storage Space: retaining more received routing knowledge
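Query expansion by generalization and specialization can be illustrated with a toy concept hierarchy. The sketch below is an assumption-laden example: the `PARENT` map, its concept names, and the `expand` function are hypothetical, and the `levels` argument stands in for how much processing power a device is willing to spend.

```python
# Hypothetical concept hierarchy: child concept -> parent concept.
PARENT = {
    "Book": "Publication",
    "Report": "Publication",
    "TechnicalReport": "Report",
    "Publication": "Document",
}


def expand(concept, levels):
    """Return the concept plus up to `levels` steps of generalization and specialization."""
    result = {concept}

    # Generalization: walk up towards the root.
    current = concept
    for _ in range(levels):
        current = PARENT.get(current)
        if current is None:
            break
        result.add(current)

    # Specialization: walk down one layer of children per level.
    frontier = {concept}
    for _ in range(levels):
        frontier = {child for child, parent in PARENT.items() if parent in frontier}
        result |= frontier

    return result


weak_device = expand("Report", 0)    # no expansion at all
strong_device = expand("Report", 2)  # two levels in each direction
print(sorted(weak_device), sorted(strong_device))
```

A constrained device would call `expand` with `levels=0` and forward the query unchanged, while a more capable one can afford a wider expansion and so match more metadata.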
Cooperative Caching Receiving Query Reusing the Cached Data Original Data Stored Routing Knowledge Shared Cached Data
Performance Evaluation • We modify the simulation system used by NeuroGrid. • Use TTL to control the termination of experiment. • Run each experiment for 20000 iterations. • Measure the percentage of queries that got served and the number of messages sent for each query.
Evaluation : ontology-based metadata • Testing the effect of ontology-based metadata: conducting three simulations using the same parameter settings, but with • Ontology-based metadata • Keyword-based metadata • ID-based metadata • Comparing hit ratios, average search messages sent for each query, and the processing overheads
Ontology-based Metadata: Comparison with Keyword-based and ID-based Metadata • (a) Comparison of Hit Ratios: 40% higher hit ratio • (b) Comparison of Search Messages Sent: 4 fewer hops required
Ontology-based Metadata: Processing Time Overhead • Comparison of Processing Overhead: using 1000 fewer operations
Incentive-based Routing • Dividing user devices into three groups of different participation levels: • Level 1: devices would not forward queries for others (selfish). • Level 2: devices would forward received queries to randomly selected peers. • Level 3: devices would forward received queries using their best routing knowledge.
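The three participation levels amount to three forwarding policies. A minimal sketch follows, assuming a hypothetical `choose_next_hops` function, a `fanout` parameter, and per-peer scores that stand in for "best routing knowledge"; none of these names come from the simulator.

```python
import random


def choose_next_hops(level, peers, scores, fanout=2, rng=random):
    """Pick the peers a device forwards a received query to, by participation level."""
    if level == 1:
        # Selfish: never forwards queries for others.
        return []
    if level == 2:
        # Forwards to randomly selected peers.
        return rng.sample(peers, min(fanout, len(peers)))
    if level == 3:
        # Forwards to the peers its routing knowledge rates highest.
        ranked = sorted(peers, key=lambda p: scores.get(p, 0.0), reverse=True)
        return ranked[:fanout]
    raise ValueError(f"unknown participation level: {level}")


peers = ["N2", "N3", "N4", "N5"]
scores = {"N2": 0.1, "N3": 0.9, "N4": 0.2, "N5": 0.5}  # illustrative values only

selfish = choose_next_hops(1, peers, scores)
randomised = choose_next_hops(2, peers, scores, rng=random.Random(0))
informed = choose_next_hops(3, peers, scores)
print(selfish, randomised, informed)
```

Passing a seeded `random.Random` instance makes the level 2 choice repeatable, which is convenient when comparing simulation runs.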
Incentive-based Routing Protocol • Comparison of Query Hit Ratios by Devices with Different Participation Levels • Level 3 has the highest hit ratio: incentives for contribution!
Cooperative Caching • Query Hits Contributed by Peer Caches • Peer caches contribute a significant portion of the queries served.
Overall Performance • Compare the overall performance of our system against: • FreeNet: ID-based metadata, best-effort routing • NeuroGrid: Keyword-based metadata, no routing incentive • Random Walk: Keyword-based metadata, forward queries to randomly selected peers
Overall Performance: Comparison with FreeNet, NeuroGrid, and Random Walk • 40% higher hit ratio • 2 fewer hops required
Conclusion • The use of ontology-based metadata can increase the hit ratio and reduce the number of search messages sent. • Queries issued by more generous devices are more likely to be served. • Peer caches can be used to serve a significant portion of the queries generated. • Our system significantly outperforms other similar systems.
Thank You! Q & A
Similarity Between Two Metadata
For simplicity, we let all tuning parameters equal 1.
Msim(M1, M2) = (Isim(book0001, report0002) + Isim(Data, Software) + LVsim(2000, 2002)) / 3
= (Csim(Book, Report) + Csim(Data, Software) + LVsim(2000, 2002)) / 3
= (1/3 + 1/2 + 1/(2002-2000+1)) / 3
= (1/3 + 1/2 + 1/3) / 3
= (7/6) / 3 = 7/18
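The arithmetic of this worked example can be checked with exact fractions. The sketch below is illustrative only: the two concept similarities are copied from the slide rather than computed from an ontology, `lv_sim` generalizes the slide's 1/(2002-2000+1) term by using an absolute difference (an assumption), and with all tuning parameters equal to 1 the metadata similarity is taken to be a plain average.

```python
from fractions import Fraction


def lv_sim(a, b):
    """Literal value similarity for numbers, following the slide's 1/(difference + 1)."""
    return Fraction(1, abs(a - b) + 1)


def msim(terms):
    """Metadata similarity as the unweighted mean of its component similarities."""
    return sum(terms, Fraction(0)) / len(terms)


csim_book_report = Fraction(1, 3)    # value taken from the slide
csim_data_software = Fraction(1, 2)  # value taken from the slide

terms = [csim_book_report, csim_data_software, lv_sim(2000, 2002)]
result = msim(terms)
print(sum(terms), result)
```

Using `Fraction` instead of floats keeps every intermediate value exact, so the printed result can be compared directly with the hand calculation.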
Semantic Matching • New Queries with Concept Generalization and Specialization • New Queries with Substituted Synonyms