1 / 28

CORE Architecture Mauro Bruno, Monica Scannapieco , Carlo Vaccari, Giulia Vaste

CORE Architecture Mauro Bruno, Monica Scannapieco , Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat). 1. 1. CORE Objective. Provide a unique environment for: Designing Statistical processes in terms of abstract services Exchanged data and metadata Running

Download Presentation

CORE Architecture Mauro Bruno, Monica Scannapieco , Carlo Vaccari, Giulia Vaste

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat) 1 1

  2. CORE Objective • Provide a unique environment for: • Designing • Statistical processes in terms of abstract services • Exchanged data and metadata • Running • Designed processes by invoking existing (wrapped) tools

  3. CORE Design: Services • Abstract services: specify a well-defined functionality in a technology-independent way • An abstract service can be implemented by one or more concrete services, i.e. IT tools • Examples: sample allocation, record linkage, estimates and errors computation, etc.

  4. CORE Design: Services • GSBPM classification • Documentation purpose • Provided that a CORE service can be linked to IT tools, GSBPM tagging enables the performance of a search e.g. retrieving “all the IT tools implementing the 5.4 Impute subprocess of GSBPM proposal”

  5. CORE Design: Services • Service inputs and outputs • Specified by logical names • Characterized with respect to their “role” in data exchange • Non-CORE: if they are not provided by/to other services of the process, but are only “local” to a specific service • CORE: they are passed by/to other services and hence they do need to undergo CORE transformations

  6. CORE Design: Data and Metadata • They are specified as service inputs and outputs • Logical names link them to previously specified services • Non-CORE data only need the file system path where they can be retrieved

  7. CORE Design: CORE Data The specification of CORE data is provided by 3 elements: • Domain descriptor • CORE data model • Mapping model

  8. Domain Descriptor: Model Entity • Like “entities” in Entity Relationships Entity properties • Like “attributes” in Entity Relationships Very simple (meta-)model: can easily describe other evolving models e.g.GSIM

  9. Domain Descriptor: Example <schema name="DEMO_Domain_Descriptor"> <entity name="SamplePlan"> <property name="STRATIFICATION_VAR"/> <property name="STRATUM_SAMPLE_SIZE"/> <property name="STRATUM_POPULATION_SIZE"/> </entity> <entity name="Enterprise"> <property name="IDENTIFIER"/> <property name="STRATIFICATION_VAR"/> <property name="WEIGHT"/> <property name="SAMPLING_FRACTION"/> <property name="ENTERPRISE_FLAG"/> <property name="EMPLOYEES_NUM"/> <property name="VALUE_ADDED"/> <property name="AREA"/> </entity> </schema>

  10. Domain Descriptor: Role Role of the Domain Descriptor (DD): from service-to-service data mapping to service-to-global data mapping i1 S1 O1 mapped to i2 Via ad-hoc mapping o1 i2 o1 DD o1 DD O1 mapped to i2 Via DD i2 DD i2 S2 o2

  11. CORE Data Model • Rectangular data set • CORE tag: • Data set level (mandatory) • Column level (optional) • Rows level (optional) • Data set kind • Column kind 11 11

  12. CORE Data Model: Role • Specified once and valid for all processes • Extensible, i.e. core tag, data set kind, column kind can be modified • Adds more semantics to data • Example of usage: mapping to other models

  13. Mapping Model • Rectangular data assumption • Mapping is intended to be specified with respect to Domain Descriptor • Columns are to be mapped to properties of an entity • It contains the specification of how CORE data model concepts are associated to data 13 13

  14. CORE Logical Architecture GUI Integration APIs SERVICES CORE Repository … Runtime Process Engine 14 14

  15. CORE GUIs • Process design • Ad-hoc customization of an existing tool (Oryx) • Service data flow • Service design • Set of interfaces for the definition of services and related data flow • Data design • Set of interfaces for the specification of domain descriptors and mapping files 15

  16. Process design: Oryx • Oryx is an academic open source framework for graphical process modeling • Based on web technology • Extensible via a plugin mechanism and new stencil sets • Supports BPMN and other process modeling languages • Programming language Javascript and Java, internal data format based on RDF 16

  17. Stencil Set • Set of graphical objects and rules that specify how to relate those graphical objects to others • Additional properties that can later be used by other applications or Oryx extensions (e.g. setting element colors and visibility) • Can be used to build process models 17 17

  18. The CORE Stencil Set • Graphical representation of CORE processes • Easy-to-use editor (desktop feeling) • Easy-to-extend source (JSON) • Defined from BPMN • Guarantees complete BPMN compliance

  19. Integration APIs • Purpose: wrapping a tool by a CORE service • Translates inputs and outputs of the tool in a completely transparent and automatic way CORE Service

  20. Repository • Processes and their instances • Services with their GSBPM and CORE classifications • Tools and their runtime features • Data with their logical classification within CORE processes

  21. Process Engine • Official statistics processes can be viewed from two perspectives: • Functional: they are data-oriented,reflecting acommon feature of scientific workflows • Organizational: they are workflow-oriented,have the complexity of real production lines, with the need for harmonizing the work of different actors 21

  22. Process Engine • Hence our process engine has two layers WF ENGINE • Complex control flows • Syncronizing constructs, cycles, conditions, etc. • E.g.: Interactive multi-user editing imputation • Simple control flows • Sequence of tasks is composed by connecting the output of one task to the input of another • Data intensive operations DATA FLOW CONTROL SYSTEM 22

  23. Implementation issues • Java web application implementing: • GUIs • CSV-CORE Integration API • Data flow control system • Layered design firmly based on frameworks: • Hibernate: database mapping • Struts2: model-view-controller approach • Repository implementation: MySQL dbms 23

  24. Web Application Design Business Logic Data access View (GUI) Model Controller Forms Actions Services DAOs Entities Input validation Hibernate Struts2 24

  25. Architecture Deployment • Web architecture based on a centralized component • CORE Environment • Different CORE deployments can co-exist • Intra- or Inter- organization • Services can be remotely executed • Support is needed in the form of a distributed component for tool execution and data transfer

  26. Type of runtime services • Batch • Tool executed by a command line call • Can be automated • Interactive • User interacts with the tool through the GUI provided by the tool • Cannot be automated • Web service • No tool procedure distributed on a web service actived by a programming language call • Can be automated

  27. CORE Distributed Deployment CORE Environment Batch-Interactive runtime Runtime Definition Repository GUI Runtime agent Process Engine Integration APIs Web service runtime Runtime Remote activation Web container Web service client

  28. Conclusions • CORE implementation is a proof-of-concept prototype showing: • Real implementation of industrialized (standardized and automated) statistical processes • Reuse of IT tools possibly developed on different platforms and by different NSIs • GSBPM-aware services implementation • A unique common data model enabling integration of heterogeneous data exchanged between services • Openess to evolving statistical information models (e.g. GSIM)

More Related