
CLAS12 Reconstruction and Analyses Framework




Presentation Transcript


  1. CLAS12 Reconstruction and Analyses Framework V. Gyurjyan1, S. Mancilla2, R. Oyarzun2, C. Timmer1, G. Gavalian1, A. Bartle3, S. Paul4

  2. Outline • Design requirements • Design choices • Implementation • Conclusions V. Gyurjyan

  3. Goals • Physics data processing as fast as is reasonably achievable • Challenge the traditional approaches we currently use • Be open-minded, flexible, and adaptive to new technologies and ideas • Prevent data pollution • Unprocessed data is worse than garbage data • Keep up with ever-growing data production rates • Learn from practices used in industry • Say NO to data racism! • We are all bytes V. Gyurjyan

  4. Science Data 3V Expansion • Unprecedented growth of data • Volumes • Data at rest is growing exponentially • EOS: 1 Exabyte in 10 years • LHC: 12–14 Petabytes/year now, 400 Petabytes/year in 2023 • SKA: 22 Exabytes/year in 2023 • Velocities • Data acquisition rates, as well as the number of data-producing devices, are on the rise • LHC: 1.5 GBytes/sec • SKA: 700 Terabytes/sec • Varieties • A plethora of data formats, data structures, and data types V. Gyurjyan

  5. Design Requirements • Elastic • Vertical and horizontal scaling • Scaling back when resources are no longer available • Data and algorithm separation • Distributed data • Distributed processing • Agility • Maintainable • Adaptable • Extendable • Ready for Big Data and more V. Gyurjyan

  6. CLARA Application Design • An application is defined as a network of loosely coupled processes, called services (SOA/FBP). • Services exchange data across predefined connections by message passing, where the connections are specified externally to the services. • Services can be requested by different data processing applications. • Loose coupling of services makes polyglot data access and processing solutions possible. • Services communicate with each other by exchanging data quanta. • Thus, all services share the same understanding of the transient data, which is the only coupling between them.
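The loose coupling described above can be sketched in a few lines of plain Python. This is an illustration of the SOA/FBP idea, not the actual CLARA API: the `Service` class, `compose` helper, and the service names are all hypothetical.

```python
# Toy illustration of SOA/FBP-style loose coupling: each service only
# transforms a data quantum; the wiring between services lives outside them.
# Service, compose, and the engine lambdas are illustrative, not CLARA's API.

class Service:
    def __init__(self, name, engine):
        self.name = name      # registered service name
        self.engine = engine  # pure function: data quantum -> data quantum

    def execute(self, quantum):
        # A service knows nothing about its neighbors, only the data format.
        return self.engine(quantum)

def compose(services, quantum):
    # Connections are specified externally to the services themselves.
    for s in services:
        quantum = s.execute(quantum)
    return quantum

decode = Service("DECODER", lambda q: {**q, "hits": [1, 2, 3]})
track  = Service("TRACKER", lambda q: {**q, "tracks": len(q["hits"])})

result = compose([decode, track], {"event": 42})
print(result)  # {'event': 42, 'hits': [1, 2, 3], 'tracks': 3}
```

Because each engine touches only the shared transient-data format, either service could be replaced by an implementation in another language, which is the polyglot property the slide claims.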

  7. Traditional vs. Service Based V. Gyurjyan

  8. CLARA Architecture [Diagram: an orchestration layer sits above the service layer. Each DPE hosts service containers (SC) running services (S) with a local registration; a front end (FE) provides global registration, a proxy/gateway, and security; DPEs communicate over an MPI bus.] V. Gyurjyan

  9. CLARA Distributed File Storage System (CDFS) • Master DPE • Heartbeat every 2 sec • Available local SSD/HDD • Available memory • Stored files • File name • Status: active, passive_hot, passive_cold, passive_ready_remove • File metadata DB • Key = file name • Value = DPE info • Metadata of the dataset to be processed • Control via SSH • Stage • Promote • Demote • Remove V. Gyurjyan
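The metadata database on the slide (key = file name, value = hosting-DPE info, with a file-status lifecycle) could look like the following toy registry. The status names follow the slide; the `FileMetadataDB` class, its method names, and the sample file/host names are illustrative assumptions, not CDFS code.

```python
# Toy CDFS-style metadata registry: key = file name, value = DPE info.
# Status values follow the slide; the class itself is a hypothetical sketch.

FILE_STATES = ("active", "passive_hot", "passive_cold", "passive_ready_remove")

class FileMetadataDB:
    def __init__(self):
        self.db = {}

    def stage(self, file_name, dpe_host):
        # A file staged on a DPE's local SSD/HDD starts out active.
        self.db[file_name] = {"dpe": dpe_host, "status": "active"}

    def demote(self, file_name):
        # One step toward removal: active -> passive_hot -> passive_cold -> passive_ready_remove
        meta = self.db[file_name]
        i = FILE_STATES.index(meta["status"])
        meta["status"] = FILE_STATES[min(i + 1, len(FILE_STATES) - 1)]

    def promote(self, file_name):
        # One step back toward active use.
        meta = self.db[file_name]
        i = FILE_STATES.index(meta["status"])
        meta["status"] = FILE_STATES[max(i - 1, 0)]

db = FileMetadataDB()
db.stage("run_5038.evio", "dpe-node-07")   # hypothetical file and host names
db.demote("run_5038.evio")
print(db.db["run_5038.evio"]["status"])    # passive_hot
```

The master DPE would consult such a table (kept fresh by the 2-second heartbeats) when deciding, over SSH, which files to stage, promote, demote, or remove.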

  10. Data Unit (Event) Stream Processing • Data-driven, data-centric design. • The focus is on transient data (event) modifications. The advantage over an algorithm-driven design is that much greater ignorance of the data processing code is allowed (loose coupling). • Design = service composition + event routing. • Self-routing events (no routing scheduler). • The event routing graph defines the application algorithm. V. Gyurjyan

  11. CLARA Elasticity • Auto-scales vertically to acquire processing units on a single node. • Auto-scales horizontally to use available computing resources in an accessible network. • The same data processing experience, whether run on a desktop or on a cloud.
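Vertical scaling of an embarrassingly parallel event stream can be sketched with the standard library alone. This is a stand-alone illustration of the idea, under the assumption of a per-event `reconstruct` function; in CLARA itself the DPE manages scaling, not user code.

```python
# Vertical-scaling sketch: fan events out to as many workers as the node
# provides, and scale back when fewer workers are granted.
# Illustrative only; CLARA's DPE handles this transparently.
from concurrent.futures import ThreadPoolExecutor
import os

def reconstruct(event):
    # Stand-in for a full per-event reconstruction chain.
    return event * 2

def process(events, max_workers=None):
    # Default to the node's core count; pass a smaller number to scale back.
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves event order even though execution is concurrent.
        return list(pool.map(reconstruct, events))

print(process(range(4)))  # [0, 2, 4, 6]
```

Because the per-event engine is stateless, the same code gives identical results on a laptop with one worker and on a many-core cloud node, which is the "same experience everywhere" point of the slide.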

  12. CLARA Application Design [Diagram: service composition examples] • Linear compositions: S1 + S2 + S3 + S4; S3 + S5 + S6; • Conditional composition: S1 + S2 + S3; while (S3 == "xyz") { F1 + F2 + S3; } S3 + S4 + S6; V. Gyurjyan
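The "+" composition notation on the slide can be mimicked in Python by overloading the addition operator to accumulate a routing path. This is a sketch of the notation's semantics, not CLARA's actual composition syntax; the `S` class is hypothetical.

```python
# Sketch of the slide's composition notation: "+" chains services into a
# routing path that the events will follow. Illustrative, not CLARA syntax.

class S:
    def __init__(self, name, path=None):
        self.name = name
        self.path = path or [name]

    def __add__(self, other):
        # S1 + S2 yields a composite whose path ends at S2.
        return S(other.name, self.path + [other.name])

S1, S2, S3, S4 = (S(n) for n in ("S1", "S2", "S3", "S4"))
app = S1 + S2 + S3 + S4          # mirrors "S1 + S2 + S3 + S4;"
print(app.path)                  # ['S1', 'S2', 'S3', 'S4']
```

Since the path is data rather than code, the resulting routing graph can be shipped to the services themselves, which matches the self-routing, no-central-scheduler design of slide 10.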

  13. CLAS12 Reconstruction Application [Diagram: on each computing node (1 or N), a DPE chains a stream source (reader R), the reconstruction services EC, HT, TOF, TT, and PID, and a stream sink (writer W); transient data moves through shared memory, with input/output on a shared file system.]

  14. CLAS12 Reconstruction Vertical Scaling [Plots: AMD 32-core and Intel Haswell 24/48-core systems] V. Gyurjyan

  15. CLAS12 Reconstruction Horizontal Scaling [Plot: Intel Haswell 24/48-core systems] V. Gyurjyan

  16. NAIADS Parallel Event Building Application (Stream Builder) [Diagram: an event building orchestrator coordinates a staging service (network-optimized, single-core node/DPE) and IO-optimized EB nodes/DPEs 1…N over a shared file system; each EB node runs a single service with a composite engine (splitter, orbit-file EB, filter, search & add, search & convolve clouds, search & convolve aerosols, search & interpolate/convolve) feeding an in-memory event storage queue from an event source.] V. Gyurjyan

  17. NAIADS Parallel Event Processing Application [Diagram: an event processing orchestrator stages and deletes files via a staging service (network-optimized, single-core node/DPE) and coordinates IO-optimized processing nodes/DPEs 1…N over a shared file system; each node runs a single service with a composite engine (PR filter, clean, resample, Watts, stats writer, N-tuple writer, file joiner, writer) over an in-memory event storage queue fed by an event source.] V. Gyurjyan

  18. NAIADS Vertical Scaling • AWS c4.8xlarge instances: 36 vCPUs ≈ 18 physical cores • Rate based on the average workflow execution time over 10 SCIAMACHY orbit measurements V. Gyurjyan

  19. NAIADS AWS Horizontal Scaling • Rate based on the time to process 16 SCIAMACHY orbit measurements V. Gyurjyan

  20. Conclusions • CLARA: a distributed-data, distributed-processing, elastic, data stream processing framework. • Bindings in Java, Python, and C++ (the C++ binding has no users yet and currently lags behind). • Outreach • V. Gyurjyan et al., “CLARA: A Contemporary Approach to Physics Data Processing”, Journal of Physics: Conf. Ser., Vol. 331, 2011. • C. Lukashin et al., “Earth Science Big Data Challenge: Data Fusion NAIADS Concept and Development”, presentation at the NASA Earth Science Technology Forum, Pasadena, CA, June 2015. • V. Gyurjyan et al., “Component Based Dataflow Processing Framework”, IEEE Big Data in Geoscience Workshop, Santa Clara, CA, 2015. • C. Lukashin et al., “Earth Science Data Fusion with Event Building Approach”, IEEE Big Data in Geoscience Workshop, Santa Clara, CA, 2015. • S. Mancilla et al., “CLARA: CLAS12 Reconstruction and Analysis Framework”, ACAT 2016, January 2016, Valparaíso, Chile. • Code • https://github.com/JeffersonLab/clara-java.git • https://github.com/JeffersonLab/clara-python.git • Documentation • claraweb.jlab.org V. Gyurjyan
