Research on Personal Dataspace Management Yukun Li liyukun@ruc.edu.cn Renmin University of China
Outline Introduction Related work Research work OrientSpace: A prototype system Ongoing work Conclusions
Introduction In 1945, Vannevar Bush predicted that personal information management would become a serious problem. Today that prediction has come true: • Information explosion • Information islands
Introduction (Example) Where is it? My God, I forgot it! Distributed Storage Information island
Related work • Concepts [PIM Workshop 2005 report] • Personal dataspace - From databases to dataspaces [Franklin M, et al. SIGMOD Record, 2005] - Principles of dataspace systems [Halevy A, et al. In PODS 2006] - Data model: iDM [Dittrich J-P and Salles MAV…, VLDB 2006] • Systems for personal data management - iMemex [L. Blunschi, J.-P. et al. In CIDR, 2007] - Semex [X. Dong and A. Halevy. In CIDR 2005] - Others • Systems for special data source management - Email data management - Desktop search engines
Related work • Personal data operations are still slow. • The characteristics of personal dataspaces are not modeled well. - Components: owner entity, data set, service - Attributes of a personal dataspace: correlation, controllable - Characteristics: versatile data sources; from data to schema; pay-as-you-go; others • The characteristics of the user may be the key factor in improving the performance of data operations.
Research work • User-centered framework for PDS • CoreSpace of personal dataspace • CoreSpace query strategy
Research Work A User-Centered Framework for PDS The characteristics of the user may be the key factor in improving the performance of data operations.
Research Work Observation Personal data is always distributed, disorganized, personalized, heterogeneous and evolving. But are there rules or patterns in the PDS? If so, what are they? Observations: - Objects differ in importance. - The importance of a given object changes over time. - People tend to access a small set of data in any given period.
Research Work CoreSpace Two concepts: Object Weight (OW) and Personal CoreSpace (PCS). Object Weight: describes the relation between an object and its owner; it can be defined as the probability that the object will be accessed in the future. Personal CoreSpace: consists of the objects whose OW exceeds a given threshold. In contrast, the full space of a person is made up of all objects related to the owner.
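The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `split_spaces`, the sample object names, the weights and the threshold of 0.5 are all assumptions, not values from the talk.

```python
def split_spaces(object_weights, threshold):
    """Return (core_space, full_space) as sets of object names.

    object_weights maps an object name to its Object Weight (OW).
    The full space is every object related to the owner; the
    CoreSpace keeps only objects whose OW exceeds the threshold.
    """
    full_space = set(object_weights)
    core_space = {name for name, ow in object_weights.items() if ow > threshold}
    return core_space, full_space


# Hypothetical weights, for illustration only.
weights = {"thesis.tex": 0.92, "budget.xls": 0.40, "old_photo.jpg": 0.03}
core, full = split_spaces(weights, threshold=0.5)
print(sorted(core), len(full))
```

Raising the threshold shrinks the CoreSpace; a threshold of 0 makes it equal to the set of objects with positive weight.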
Research Work Preliminary experiment • Three months of real personal data [Figures: number of visited objects vs. total number of objects; number of objects by visit time]
Research work ObjectWeight Computing (1) Features that affect OW: - FileType - FileModifyTime - FileAccessFrequency - FileOwner - Personal Task - Association between objects
Research Work ObjectWeight Computing (2) VF: visit frequency, measured as the number of visits in a day. S: an attenuation factor.
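The slide's formula is not preserved in this transcript. As a rough sketch of how a daily visit frequency and an attenuation factor could combine, the Python below uses a geometric decay over the age of each day's visits. This form is an assumption chosen for illustration, not the author's model, and it yields an unnormalized score rather than a probability; the name `object_weight` is hypothetical.

```python
def object_weight(daily_visits, s):
    """Illustrative OW score (assumed form, not the talk's formula).

    daily_visits[0] is today's visit count (VF), daily_visits[1]
    is yesterday's, and so on. s is the attenuation factor,
    expected to satisfy 0 < s < 1, so older visits count less.
    """
    return sum(vf * s ** age for age, vf in enumerate(daily_visits))


# Two visits today, none yesterday, four visits two days ago.
score = object_weight([2, 0, 4], s=0.5)
print(score)
```

With a factor close to 1 the score behaves like a plain visit count; with a small factor only the most recent days matter.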
Research work More advantages of the concepts • Data integration (ObjectWeight > 0) • Data query (scanning the CoreSpace is enough in most cases) • Data indexing (different strategies for indexing CoreSpace and FullSpace) • Data backup (CoreSpace-based backup strategy)
Research work CoreSpace-based Query Strategy Query interface: { [attribute\\[keyword]*]*, K } e.g. "Title\\integration, uncertain" means "Return the objects whose title contains the words integration and uncertain".
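A minimal sketch of a CoreSpace-first query, under several assumptions: K is read as the number of results requested, the separator is taken literally as the two backslashes shown in the example, matching is case-insensitive substring matching, and the fallback to the rest of the full space happens only when the CoreSpace yields fewer than K hits. The names `PersonalObject`, `parse_query` and `corespace_query` are hypothetical.

```python
from dataclasses import dataclass, field

SEPARATOR = r"\\"  # two backslashes, as written in the slide's example


@dataclass
class PersonalObject:
    name: str
    weight: float  # Object Weight (OW)
    attributes: dict = field(default_factory=dict)  # lowercase keys


def parse_query(text):
    """Split 'Attribute\\\\kw1, kw2' into (attribute, [keywords])."""
    attribute, _, keyword_part = text.partition(SEPARATOR)
    keywords = [k.strip().lower() for k in keyword_part.split(",") if k.strip()]
    return attribute.strip().lower(), keywords


def matches(obj, attribute, keywords):
    """True if every keyword occurs in the object's attribute value."""
    value = str(obj.attributes.get(attribute, "")).lower()
    return all(k in value for k in keywords)


def corespace_query(objects, text, k, threshold):
    """Scan the CoreSpace first; touch the rest only if hits < k."""
    attribute, keywords = parse_query(text)
    ranked = sorted(objects, key=lambda o: o.weight, reverse=True)
    core = [o for o in ranked if o.weight > threshold]
    rest = [o for o in ranked if o.weight <= threshold]
    hits = [o for o in core if matches(o, attribute, keywords)]
    if len(hits) < k:
        hits += [o for o in rest if matches(o, attribute, keywords)]
    return hits[:k]


# Hypothetical objects, for illustration only.
objects = [
    PersonalObject("a.pdf", 0.9, {"title": "Uncertain data integration"}),
    PersonalObject("b.doc", 0.1, {"title": "Integration of uncertain schemas"}),
    PersonalObject("c.txt", 0.8, {"title": "Holiday plans"}),
]
query = r"Title\\integration, uncertain"
top1 = [o.name for o in corespace_query(objects, query, k=1, threshold=0.5)]
top2 = [o.name for o in corespace_query(objects, query, k=2, threshold=0.5)]
print(top1, top2)
```

When K is small, the query is answered from the CoreSpace alone, which is the case the talk expects to be the most common.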
OrientSpace Functions • Integration - Manual integration - Automatic integration • Query - Extended keyword query - Results-based navigation - CoreSpace explorer
OrientSpace Data Storage (vertical model) Advantage: a universal model that can describe any object. Problem: the large number of join operations leads to low performance.
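The trade-off can be seen in a small sketch using Python's built-in `sqlite3` module. The table name `object_attr`, its three columns and the sample rows are assumptions; the talk does not show OrientSpace's actual schema. Every object is stored as (object, attribute, value) rows, so any object type fits, but a query that constrains several attributes needs one self-join per attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Vertical model: one row per (object, attribute, value) triple.
conn.execute(
    "CREATE TABLE object_attr (object_id INTEGER, attribute TEXT, value TEXT)"
)
rows = [
    (1, "type", "pdf"), (1, "title", "Data integration survey"), (1, "owner", "alice"),
    (2, "type", "email"), (2, "title", "Meeting notes"), (2, "owner", "alice"),
]
conn.executemany("INSERT INTO object_attr VALUES (?, ?, ?)", rows)

# Titles of alice's PDF objects: three attributes, hence two self-joins.
query = """
SELECT t.object_id, t.value AS title
FROM object_attr AS t
JOIN object_attr AS ty ON ty.object_id = t.object_id
JOIN object_attr AS o  ON o.object_id  = t.object_id
WHERE t.attribute = 'title'
  AND ty.attribute = 'type'  AND ty.value = 'pdf'
  AND o.attribute  = 'owner' AND o.value  = 'alice'
"""
result = conn.execute(query).fetchall()
print(result)
```

In a conventional one-table-per-type layout the same question would be a single-table filter, which is why the join count is the cost named on the slide.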
Ongoing work • ObjectWeight computing - Computing model of OW - Data set • ObjectWeight-based data operation strategy - Integration, backup, query, consistency, etc. • OrientSpace Systems
Conclusions • Proposed a new concept, CoreSpace, for PDS. It raises many research issues, including indexing, integration, storage, backup and query. • My PhD project will focus on the following topics: - User-centered data model (CoreSpace) - CoreSpace-based data operations (query) • Implement a prototype system
Thanks. Questions?