
LoG: A Methodology for Metadata Registry-based Management of Scientific Data

July 5, 2002. Doo-Kwon Baik, baik@software.korea.ac.kr


Presentation Transcript


  1. LoG: A Methodology for Metadata Registry-based Management of Scientific Data
     July 5, 2002
     Doo-Kwon Baik (baik@software.korea.ac.kr)

  2. Content
     • Motivation
     • Objectives
     • Related works
     • Overview on the MDR
     • The scientific data properties
     • User levels and the data property
     • Data visibility
     • The conceptual model of the LoG
     • A LoG Framework
     • An Example
     • Conclusions and Future work
     CODATA/DSAO 2002

  3. Motivation
     • The existing data integration approaches
       • focus only on technical research and system development
       • do not consider the properties of the domain knowledge

  4. The Domain Knowledge
     • The domain knowledge property
       • is a very important factor in data integration
     • Many tasks and services depend on the domain knowledge properties
       • The quality degree and the quantity scope of data integration are defined by the domain knowledge property.
       • Many other services, such as data services and application services, also depend on it.
     [Diagram: domain knowledge at the center, connected to data services (information providing), application services, the quality degree of data integration, and the quantity scope of data integration]

  5. Objectives
     • The objectives of our research
       • to solve the problems of the existing data integration approaches
       • to analyze and define the domain knowledge properties
         • In this paper, we focus on scientific data.
       • to define the relationships among the domain knowledge properties, users, and metadata
         • i.e., to define the considerations for data integration
       • to create a new methodology that takes the results of the domain knowledge analysis into account
         • We call it LoG (Localization-based Global MDR methodology).
       • finally, to design a framework suited to the methodology

  6. Related works: Bottom-up approach (1/2)
     • The existing data integration approaches are classified into the top-down approach and the bottom-up approach
     • Bottom-up approach
       • is the most general approach
       • The ontology-based methodology is representative
     [Diagram: analyze all factual databases (number of databases = n); design and create a guideline, such as a global view, from the specified databases; when new databases arrive (number of new databases = c), the number of databases becomes n + c]

  7. Related works: Bottom-up approach (2/2)
     • Advantages
       • can reach perfect data integration, because the global guideline is created by analyzing and designing over all databases
     • Disadvantages
       • creating a global guideline costs a great deal of time and money
         • is not suitable for very large-scale data integration
       • provides only a static integration management mechanism
         • Whenever a new schema or a new database is added to the integrated database, the previous processes must be repeated.
         • This increases cost and time geometrically.
       • does not provide a standardized guideline
         • i.e., the guideline depends on its domain.
         • Each application domain defines and uses its own, different guidelines for integration.
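The static nature of the bottom-up approach can be illustrated with a toy sketch. It is not the method of any particular ontology-based system: the schemas, the element names, and the idea of counting one "analysis" per database are all assumptions made for illustration. The global view is modeled simply as the union of all schema elements, and it is rebuilt from scratch each time new databases arrive.

```python
def build_global_view(databases):
    """Derive a global view by analyzing every database schema (toy model)."""
    view = set()
    analyzed = 0
    for schema in databases.values():
        view |= set(schema)   # the "guideline" is just the union of elements
        analyzed += 1         # one unit of analysis cost per database
    return view, analyzed


def integrate_bottom_up(legacy, batches):
    """Rebuild the global view after each batch of new databases is added."""
    databases = dict(legacy)
    view, total = build_global_view(databases)
    for batch in batches:
        databases.update(batch)
        # Static mechanism: the whole analysis is repeated for all databases.
        view, analyzed = build_global_view(databases)
        total += analyzed
    return view, total


# Hypothetical schemas: n = 2 legacy databases, then two batches with c = 1 each.
legacy = {"db1": ["station_id", "temperature"], "db2": ["station_id", "humidity"]}
batches = [{"db3": ["station_id", "pressure"]}, {"db4": ["sample_id", "ph"]}]
view, total_analyses = integrate_bottom_up(legacy, batches)
```

In this toy run, four databases end up costing nine schema analyses (2 + 3 + 4), because every addition repeats the work already done for the earlier databases.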

  8. Related works: Top-down approach (1/2)
     • Top-down approach
       • aims to solve the problems of the bottom-up approach
       • MDR (ISO/IEC 11179) is representative
         • MDR is an international standard
     [Diagram: analyze all factual databases; design and create a guideline, such as a global view (metadata elements), from the specified databases; define the schemas of new databases according to the standardized guideline]

  9. Related works: Top-down approach (2/2)
     • Advantages
       • reduces costs substantially
         • because it does not require rebuilding the global guideline.
       • provides a standardized schema
         • all new databases can be built and managed consistently.
     • Disadvantages
       • Like the bottom-up approach, it incurs a high initial cost
         • because it requires creating a global view through analysis of all legacy databases.
         • This is hard work for very large-scale integration.
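In the top-down approach, a new database is checked against the existing guideline instead of triggering a rebuild. The following is a minimal sketch under assumptions: the guideline is reduced to a mapping from element name to Python type, and `GUIDELINE`, `nonconforming_fields`, and the field names are hypothetical. A real ISO/IEC 11179 registry records far richer attributes.

```python
# Hypothetical standardized guideline: element name -> expected value type.
GUIDELINE = {
    "station_id": str,
    "temperature": float,
    "observed_at": str,
}


def nonconforming_fields(schema, guideline=GUIDELINE):
    """Return the fields of a new schema that do not follow the guideline."""
    problems = []
    for name, value_type in schema.items():
        if name not in guideline:
            problems.append((name, "not in guideline"))
        elif guideline[name] is not value_type:
            problems.append((name, "type differs from guideline"))
    return problems


# A proposed schema for a new database (illustrative only).
new_db_schema = {"station_id": str, "temperature": int, "wind": float}
problems = nonconforming_fields(new_db_schema)
```

Only the new schema is inspected here. The guideline itself is left untouched, which is the cost advantage the slide attributes to the top-down approach.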

  10. Overview on the MDR: Definition
     • Definition of MDR
       • Metadata Registry
       • A system for registering, storing, and managing the specifications (metadata) of data elements
       • Evolution of ISO/IEC 11179
       • Metamodel of Data Registry: ANSI X3.285
     • Purpose
       • Metadata registry for data standardization
       • Support for data search and data specification
       • Support for data sharing among systems or organizations
       • A supporting system for creating, registering, and managing data elements
       • Helps users understand the meaning, representation, and identification of data

  11. Overview on the MDR: Basic concepts
     • Data Element
       • The basic unit of data management
       • The unit that specifies the identification, context, and representation of data values
     • Components of a Data Element
       • Object Class: the data to be collected or stored
       • Property: the characteristics needed to identify and describe objects
       • Representation: the description of the representational form and value domain of each data element
     [Diagram: a Data Element Concept consists of an Object Class and a Property; a Data Element consists of an Object Class, a Property, and a Representation; the links carry 1:N and 1:1 cardinality labels]
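The three components of a data element can be sketched as plain data classes. This is an illustrative model only, not the ISO/IEC 11179 metamodel itself: the class names follow the slide, while the field names (`property_name`, `datatype`, `unit`) and the sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """Object class plus property, without any representation."""
    object_class: str    # the data to be collected or stored
    property_name: str   # the characteristic that describes the object


@dataclass(frozen=True)
class Representation:
    """Representational form and value domain (simplified)."""
    datatype: str
    unit: str = ""


@dataclass(frozen=True)
class DataElement:
    """A data element concept bound to one representation."""
    concept: DataElementConcept
    representation: Representation

    def label(self) -> str:
        c, r = self.concept, self.representation
        return f"{c.object_class} / {c.property_name} / {r.datatype}"


# Hypothetical example: one concept represented in two different ways.
concept = DataElementConcept("Sample", "temperature")
celsius = DataElement(concept, Representation("decimal", "degC"))
kelvin = DataElement(concept, Representation("decimal", "K"))
```

Here one concept ("Sample", "temperature") is shared by two data elements that differ only in their representation. That is one plausible reading of the 1:N label in the diagram.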

  12. Overview on the MDR: Specification
     • Specification of Data Element
       • Basic attributes for specifying a data element

  13. Overview on the MDR: An Example
     • Definition of a metadata element

  14. The scientific data properties
     • Scientific data (knowledge) has the following properties:
       • General data
         • Most people can understand and use it easily.
         • Most databases in the scientific fields have similar or identical data elements.
       • Specialized data
         • is more complicated and detailed.
         • General users cannot understand it.
         • Experts in a specific group are interested in the data and can use it.
     ※ Building a single MDR for all data as a whole is not necessary

  15. User levels and the data property
     • Classification of users
       • Users are classified into two groups according to the scientific data property:
         • general users and specialized users.
     • General users
       • use general data at a high level and across many fields.
     • Specialized users
       • are domain experts in a specific field.
       • use both general data and specialized data.
       • are further differentiated into more detailed fields.
         • i.e., specialized users are distributed into several groups, and the experts in each group are independently interested in more specialized data.

  16. Data visibility
     • Data visibility
       • The quantity and the degree of specialization of the data are differentiated into several levels according to the knowledge property,
       • and each level has an independent data set
     [Diagram: the whole data set divided into sets 1 to 5, with the labels "used by all users", "used by specialized users", and "used in independent expert domain group"; user groups shown are general users, specialized users, and detailed-specialized users 1 to n]
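The visibility levels can be sketched as nested sets. This is a hedged illustration rather than the authors' design: the group paths, the element names, and the function `visible_elements` are hypothetical. A user sees the general set plus the set of every expert group on the path down to their own group.

```python
# Elements every user can see (hypothetical names).
GENERAL = {"station_id", "temperature"}

# Independent data sets keyed by expert-group path (hypothetical groups).
SPECIALIZED = {
    ("oceanography",): {"salinity", "conductivity"},
    ("oceanography", "chemical"): {"nitrate"},
    ("astronomy",): {"redshift"},
}


def visible_elements(group=()):
    """Return the data elements visible to a user in the given group.

    An empty tuple means a general user. A longer tuple means a more
    detailed specialization, which adds the set of each level on the path.
    """
    visible = set(GENERAL)
    for depth in range(1, len(group) + 1):
        visible |= SPECIALIZED.get(group[:depth], set())
    return visible


general_view = visible_elements()
chemist_view = visible_elements(("oceanography", "chemical"))
astronomer_view = visible_elements(("astronomy",))
```

Each expert group stays independent in this sketch: the astronomer's view contains no oceanography elements, while the chemical oceanographer inherits both the general set and the oceanography set.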

  17. The conceptual relation diagram
     [Diagram: General Users 1 to n; Global MDR; Domain Experts 1 to n; Local MDR 1 (Domain 1), Local MDR 2 (Domain 2), ..., Local MDR m (Domain m); factual databases grouped by domain: DB 11 ... DB 1n (Domain 1), DB 21 ... DB 2n (Domain 2), ..., DB m1 ... DB mn (Domain m). Direction labels: Generalization, Globalization, Specialization, Localization]

  18. The conceptual model of the LoG
     [Diagram: User Interface Layer; Global MDR Layer (Generalized Layer); Local MDR Layer (Specialized Layer); Factual Database Layer]
     • The LoG methodology has four layers
       • Interface Layer
         • provides the user interface environments for all users.
       • Global MDR Layer
         • manages the global MDR for the most generalized and common data, which all users (general and specialized) access and use.
       • Local MDR Layer
         • manages the local MDRs for the specialized data that the experts use.
         • The local MDRs may form a hierarchical structure.
       • Factual Database Layer
         • manages the low-level factual data.
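The relation between the Global MDR Layer and a possibly hierarchical Local MDR Layer can be sketched as a chain of registries. The slides do not specify a lookup procedure, so this is an assumption-laden illustration: the class `MDR`, its `resolve` method, and the registry and element names are all hypothetical. A lookup starts in the most specialized registry and walks up toward the global MDR.

```python
from dataclasses import dataclass, field


@dataclass
class MDR:
    """A registry of metadata elements with an optional, more general parent."""
    name: str
    parent: "MDR | None" = None
    elements: dict = field(default_factory=dict)  # element name -> definition

    def resolve(self, element):
        """Find the registry that defines an element, most specialized first."""
        registry = self
        while registry is not None:
            if element in registry.elements:
                return registry.name, registry.elements[element]
            registry = registry.parent
        raise KeyError(element)


# Hypothetical registries: one global MDR and a two-level local hierarchy.
gmdr = MDR("GMDR", elements={"station_id": "Identifier of an observing station"})
ocean = MDR("LMDR-ocean", parent=gmdr, elements={"salinity": "Salt content of sea water"})
chem = MDR("LMDR-ocean-chem", parent=ocean, elements={"nitrate": "Nitrate concentration"})

where_station, _ = chem.resolve("station_id")
where_salinity, _ = chem.resolve("salinity")
```

Common elements are defined once in the global registry in this sketch, and each local registry only adds what its experts need.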

  19. A LoG Framework (1/2)
     [Diagram, by layer:
      User Interface Layer: Global User Interface (General User Level Interface); Local User Interface (Expert Level Interface)
      Global MDR Layer: General User Level Interface Agent; GMDR Agent (Registration, Classification); GMDR; GMeta Repository
      Local MDR Layer: Expert Level Interface Agent; LMDR Agent (Registration, Classification, Authorization); LMDRs (LMDR 1, LMDR 2, ..., LMDR n); LMeta Repository (sets of actual metadata)
      Factual DB Layer: DB 11 ... DB 1n (Domain 1), DB 21 ... DB 2n (Domain 2), ..., DB m1 ... DB mn (Domain m)]

  20. A LoG Framework (2/2)
     • Interface Layer
       • Global user interface and local user interface sub-layers
     • Global MDR layer
       • GMDR agent
         • manages the GMDR (global MDR) and the GMeta (global metadata repository).
       • GMDR (global MDR)
         • a standardized guideline for general users and experts.
         • the set of metadata elements used commonly in all databases.
       • GMeta (global metadata repository)
         • the set of actual metadata
     • Local MDR layer
       • LMDR agent
         • manages the LMDRs and the LMeta
       • LMDRs (local MDRs)
         • a standardized guideline for the specialized users.
         • a set of metadata elements that generalizes the data in each field or detailed field.
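The role of an LMDR agent can be sketched as a small class that manages one local MDR and its LMeta repository. The framework's slides name registration, classification, and authorization as agent functions but give no interface, so everything below is a hypothetical sketch: the class `LMDRAgent`, its methods, and the user and element names are all assumed.

```python
class LMDRAgent:
    """Hypothetical agent for one local MDR and its LMeta repository."""

    def __init__(self, domain, authorized_users):
        self.domain = domain
        self.authorized_users = set(authorized_users)
        self.lmdr = {}    # registered element name -> classification
        self.lmeta = []   # actual metadata records: (element, record)

    def register(self, user, element, classification):
        """Register and classify an element, for authorized experts only."""
        if user not in self.authorized_users:
            raise PermissionError(f"{user} may not register in {self.domain}")
        self.lmdr[element] = classification

    def add_metadata(self, element, record):
        """Store actual metadata for an element that is already registered."""
        if element not in self.lmdr:
            raise KeyError(f"{element} is not registered in {self.domain}")
        self.lmeta.append((element, record))


# Illustrative use with made-up names.
agent = LMDRAgent("oceanography", authorized_users={"expert_a"})
agent.register("expert_a", "salinity", "physical")
agent.add_metadata("salinity", {"database": "DB 11", "column": "sal"})
```

A GMDR agent could follow the same shape without the authorization check, since the slides list only registration and classification for it.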

  21. An Example
     [Diagram: an example GMDR and its LMDRs]

  22. Conclusions and Future work
     • Conclusions
       • We considered and defined the domain knowledge property
       • The LoG methodology is proposed on the basis of the knowledge property
         • partially provides a dynamic integration mechanism.
         • provides a standardization guideline based on ISO/IEC 11179, the international standard.
         • reduces the unnecessary cost of analyzing and designing over all databases to create a global view.
     • Future work
       • to analyze and define the domain knowledge property in detail
       • to implement a prototype based on the framework we described

  23. Q&A
     Thanks!
