CLAS12 Software. D.P. Weygand Thomas Jefferson National Accelerator Facility. Projects. ClaRA Simulation GEMC CCDB Geometry Service Event Display Tracking SOT Gen III Event Reconstruction Post-Reconstruction Data Access Data-Mining Slow Controls Documentation Doxygen Javadoc
CLAS12 Software D.P. Weygand Thomas Jefferson National Accelerator Facility
Projects ClaRA Simulation GEMC CCDB Geometry Service Event Display Tracking SOT Gen III Event Reconstruction Post-Reconstruction Data Access Data-Mining Slow Controls Documentation Doxygen Javadoc Testing/Authentication Detector Subsystems (Reconstruction and Calibration) EC PCAL FTOF CTOF LTCC HTCC OnLine Code Management SVN Bug Reporting Support Visualization Services Support Packages, e.g. CLHEP, ROOT
Service Oriented Architecture Overview: Services are unassociated, loosely coupled units of functionality that have no calls to each other embedded in them. Each service implements one action. Rather than embedding calls to each other in their source code, services use defined protocols that describe how they pass and parse messages. SOA aims to let users string together fairly large chunks of functionality to form ad hoc applications built almost entirely from existing software services. The larger the chunks, the fewer the interface points required to implement any given set of functionality; however, very large chunks may not be granular enough for easy reuse. Each interface brings some processing overhead, so there is a performance consideration in choosing the granularity of services. The great promise of SOA is that the marginal cost of creating the nth application is low, because all of the software required already exists to satisfy the requirements of other applications. Ideally, one requires only orchestration to produce a new application.
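As a minimal sketch of the idea above (not ClaRA's actual API): two services that never call each other, each mapping one message to another, and an orchestrator that strings them into an application. The message layout (a plain dictionary), the service names `calibrate` and `cluster`, and the gain of 0.5 are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical message format: a plain dictionary passed between services.
Message = Dict[str, object]
Service = Callable[[Message], Message]


def calibrate(msg: Message) -> Message:
    """Service 1: convert raw ADC counts to energies (toy gain of 0.5)."""
    return {**msg, "energy": [0.5 * adc for adc in msg["adc"]]}


def cluster(msg: Message) -> Message:
    """Service 2: sum the calibrated energies into one toy cluster."""
    return {**msg, "cluster_energy": sum(msg["energy"])}


def orchestrate(chain: List[Service], msg: Message) -> Message:
    """Pass the message through each service in turn.

    The services know nothing about each other; only the orchestrator
    knows the order in which they are applied.
    """
    for service in chain:
        msg = service(msg)
    return msg


result = orchestrate([calibrate, cluster], {"adc": [10, 20, 30]})
print(result["cluster_energy"])
```

A different application is then just a different chain passed to `orchestrate`; neither service needs to change.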
SOA/Complexity SOA is principally based on object oriented design. Each service is built as a discrete piece of code. This makes it possible to reuse the code in different ways throughout the application by changing only the way an individual service interoperates with the other services that make up the application, rather than changing the code of the service itself. SOA design principles are used during software development and integration. Software complexity is a term that encompasses numerous properties of a piece of software, all of which affect internal interactions. There is a distinction between the terms complex and complicated. Complicated implies being difficult to understand but, with time and effort, ultimately knowable. Complex, on the other hand, describes the interactions between a number of entities. As the number of entities increases, the number of possible interactions between them grows much faster than the number of entities (quadratically for pairwise interactions, exponentially if interactions among larger groups are counted), until it becomes impossible to know and understand all of them. Similarly, higher levels of complexity in software increase the risk of unintentionally interfering with interactions and so increase the chance of introducing defects when making changes. In more extreme cases, complexity can make modifying the software virtually impossible.
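To put rough numbers on how quickly interactions outpace entities, the sketch below counts two things for n entities: pairwise links, n(n-1)/2, and possible groups of two or more entities, 2^n - n - 1. The function names are illustrative only.

```python
from math import comb


def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n entities: n(n-1)/2."""
    return comb(n, 2)


def possible_groups(n: int) -> int:
    """Number of subsets containing at least two of the n entities."""
    return 2**n - n - 1


for n in (5, 10, 20):
    print(n, pairwise_links(n), possible_groups(n))
```

For 20 entities this gives 190 pairs but more than a million possible groups, which is the sense in which a complex system stops being knowable.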
ClaRA and Cloud Computing • Address the major components of physics data processing as services. • Services, and the information bound to those services, can be further abstracted into process layers and composite applications for developing various analysis solutions. • Agility: the ability to change the physics data processing workflow on top of existing services. • Ability to monitor points of information and points of service in real time, to determine the well-being of the entire physics data processing application. • SOA is the key architecture chosen for ClaRA: highly concurrent, suited to cloud computing.
ClaRA Stress Test V. Gyurjyan S. Mancilla JLAB Scientific Computing Group
ClaRA Components [Diagram: a Platform (cloud controller), an Orchestrator, and DPEs on compute nodes; each DPE is drawn with elements labelled C ("container") and S ("service").]
Event Reconstruction Rate vs Number of Threads [Plot: event reconstruction rate (kHz) against number of threads, on a 16-core hyper-threaded node with no I/O; 150 ms/event/thread.]
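For orientation, the quoted 150 ms/event/thread fixes an upper bound on the rate if scaling with thread count were perfectly linear. That linearity is an assumption made here for the arithmetic, not a result from the plot. A short sketch:

```python
TIME_PER_EVENT_S = 0.150  # 150 ms/event/thread, as quoted on the slide


def ideal_rate_hz(threads: int, time_per_event_s: float = TIME_PER_EVENT_S) -> float:
    """Event rate for `threads` threads assuming perfectly linear scaling."""
    return threads / time_per_event_s


for threads in (1, 16, 32):
    print(f"{threads:2d} threads: {ideal_rate_hz(threads):7.1f} Hz")
```

Under this assumption, 16 threads give roughly 107 Hz and 32 hyper-threads at most about 0.21 kHz; a measured curve would be expected to fall below this line once threads share physical cores.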
Batch job submission
<Request>
  <Project name="clas12" />
  <Track name="reconstruction" />
  <Name name="clara-test" />
  <CPU core="16" />
  <TimeLimit time="72" unit="hours" />
  <Memory space="27" unit="GB" />
  <OS name="centos62"/>
  <Command><![CDATA[
    setenv CLARA_SERVICES /group/clas12/ClaraServices;
    $CLARA_SERVICES/bin/clara-dpe -host claradm-ib
  ]]></Command>
  <Job></Job>
</Request>
Single Data-stream Application [Diagram: Farm Nodes up to N, each running a chain of services S1, S2, ..., Sn; an Executive Node with the ClaRA Master DPE, the orchestrator (AO), Administrative Services, and R and W elements connected to Persistent Storage.]
Multiple Data-stream Application [Diagram: Farm Nodes up to N running service chains S1, S2, ..., Sn with R and W elements; two Persistent Storage elements and a DS element; an Executive Node with the ClaRA Master DPE, AO, and Administrative Services.]
Batch queue • Common queue • Exclusive queue: CentOS 6.2, 16 cores, 12 processing nodes
Single Data-stream Application CLAS12 Reconstruction: JLAB batch farm
Computing Capacity Growth Today: • 1K cores in the farm (3 racks, 4-16 cores per node, 2 GB/core) • 9K LQCD cores (24 racks, 8-16 cores per node, 2-3 GB/core) • 180 nodes w/ 720 GPU + Xeon Phi as LQCD compute accelerators 2016: • 20K cores in the farm (10 racks, 16-64 cores per node, 2 GB/core) • Accelerated nodes for Partial Wave Analysis? Even 1st Pass? Total footprint, power and cooling will grow only slightly. Capacity for detector simulation will be deployed in 2014 and 2015, with additional capacity for analysis in 2015 and 2016. Today Experimental Physics has < 5% of the compute capacity of LQCD. In 2016 it will be closer to 50% in dollar terms and number of racks (still small in terms of flops).
Compute Paradigm Changes Today, most codes and jobs are serial. Each job uses one core, and we try to run enough jobs to keep all cores busy without overusing memory or I/O bandwidth. Current weakness: if we have 16 cores per box and run 24 jobs to keep them all busy, there are 24 input and 24 output file I/O streams running just for this one box! => lots of “head thrashing” in the disk system. Future: most data analysis will be event parallel (“trivially parallel”). Each thread will process one event. Each box will run 1 job (DPE) that processes 32-64 events in parallel, with 1 input and 1 output stream => much less head thrashing, higher I/O rates. Possibility: the farm will include GPU or Xeon Phi accelerated nodes! As software becomes ready, we will deploy it!
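The event-parallel pattern described above (one job per box, one input stream, one output stream, many events in flight) can be sketched with a thread pool. This is an illustration of the structure only, not ClaRA code: `reconstruct` is a hypothetical stand-in for real per-event work, and because standard CPython threads do not execute pure-Python code in parallel, the sketch shows the shape of the job rather than a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def reconstruct(event: List[float]) -> float:
    """Toy stand-in for per-event reconstruction: sum the hit energies."""
    return sum(event)


def run_job(events: Iterable[List[float]], n_threads: int = 4) -> List[float]:
    """One job: a single input stream fans out to worker threads, and the
    results are collected into a single output stream in input order."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map yields results in the same order as the input events.
        return list(pool.map(reconstruct, events))


events = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
print(run_job(events))
```

However many worker threads are used, the job reads from one source and writes to one sink, which is what removes the 24-in, 24-out stream pattern from the disk system.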
PATTERN RECOGNITION ALGORITHMS FOR THE DRIFT CHAMBERS Tested on calibration and simulated data V. Ziegler & M. Mestayer
FTOF Reconstruction Jerry Gilfoyle & Alex Colvill • Code to reconstruct the signals from the Forward Time-of-Flight system (FTOF) * • Written as a software service so that it can be easily integrated into the ClaRA framework. • The FTOF code converts the TDC and ADC signals into times and energies and corrects for effects like time walk. • The position of the hit along the paddle is determined from the difference between the TDC signals; the time of the hit is reconstructed from the average TDC signal, corrected for the propagation time of light along the paddle. • The energy deposited is extracted from the ADC signal and corrected for light attenuation along the paddle. Modifications to this procedure are applied when one or more of the ADC or TDC signals are missing. • The FTOF code is up and running and will be used in the upcoming ‘stress test’ of the full CLAS12 event reconstruction package. • * Modeled after the code used in the CLAS6 FTOF reconstruction and tested using Monte Carlo data from the CLAS12 physics-based simulation, gemc. • Fig. 1: Histogram of the number Nadj of adjacent paddles in a cluster, normalized to the total number of events. Clusters are formed in a single panel by grouping adjacent hits together. The red, open circles are Nadj for panel 1b. The black, filled squares are for panel 1a, which is behind panel 1b relative to the target. Most events consist of a single hit, but a significant number have additional paddles in each cluster.
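The position, time, and energy steps above can be sketched for an idealized two-ended paddle. This is a textbook simplification, not the CLAS12 FTOF code: it assumes signals already converted to ns and energy units, a constant effective light speed, a single exponential attenuation length, and no time-walk correction or missing-signal handling. The class `Paddle`, the function names, and all numerical values are hypothetical. Position x is measured from the paddle centre, positive toward the right-hand end.

```python
import math
from dataclasses import dataclass


@dataclass
class Paddle:
    length_cm: float          # paddle length (hypothetical value below)
    v_eff_cm_per_ns: float    # effective light speed in the scintillator
    atten_length_cm: float    # light attenuation length


def reconstruct_hit(p: Paddle, t_left: float, t_right: float,
                    e_left: float, e_right: float):
    """Return (x, t, e) for one hit from the two end signals."""
    # Position from the time difference between the two ends.
    x = 0.5 * p.v_eff_cm_per_ns * (t_left - t_right)
    # Hit time from the average, minus the mean light propagation time.
    t = 0.5 * (t_left + t_right) - p.length_cm / (2.0 * p.v_eff_cm_per_ns)
    # Energy from the geometric mean, which cancels the x dependence,
    # then undo the attenuation over half the paddle length.
    e = math.sqrt(e_left * e_right) * math.exp(p.length_cm / (2.0 * p.atten_length_cm))
    return x, t, e


def simulate_hit(p: Paddle, x: float, t0: float, e0: float):
    """Forward model used to check the reconstruction."""
    d_left, d_right = p.length_cm / 2 + x, p.length_cm / 2 - x
    return (t0 + d_left / p.v_eff_cm_per_ns,
            t0 + d_right / p.v_eff_cm_per_ns,
            e0 * math.exp(-d_left / p.atten_length_cm),
            e0 * math.exp(-d_right / p.atten_length_cm))


paddle = Paddle(length_cm=200.0, v_eff_cm_per_ns=16.0, atten_length_cm=300.0)
signals = simulate_hit(paddle, x=25.0, t0=10.0, e0=5.0)
x_rec, t_rec, e_rec = reconstruct_hit(paddle, *signals)
print(x_rec, t_rec, e_rec)
```

Running the forward model and then the reconstruction recovers the input position, time, and energy up to floating-point rounding, which is a convenient self-check for this kind of code.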
Intel Xeon Phi MIC Processor The Intel Xeon Phi KNC processor is essentially a 60-core SMP chip where each core has a dedicated 512-bit wide SIMD vector unit. All the cores are connected via a 512-bit bidirectional ring interconnect (Figure 1). Currently, the Phi coprocessor is packaged as a separate PCIe device, external to the host processor. Each Phi contains 8 GB of RAM, which provides all the memory and file-system storage used by every user process, the Linux operating system, and ancillary daemon processes. The Phi can mount an external host file-system, which should be used for all file-based activity to conserve device memory for user applications.
CLAS12 Constants Database (CCDB) Johann Goetz & Yelena Prok
Proposed Programming Standards • Programming standards give collaboration members a common way of working. • Standards and documentation increase the expected life of software by encouraging a unified design that aids future maintenance. • This is a proposed working standard and is open to suggestions and modification. It can be found on the Hall-B wiki.
Profiling and Static Analysis 150 ms/event/thread • Profiling is a form of dynamic program analysis. • Individual call stacks • Time analysis • Memory analysis • Static analysis is a pre-run analysis that pinpoints areas of potential error. • Possible bugs • Dead code • Duplicate code • Suboptimal code
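As a small illustration of the time-analysis side of profiling, the sketch below uses Python's standard `cProfile` and `pstats` modules on a toy workload. The functions `reconstruct_event` and `process` are hypothetical placeholders, not CLAS12 code, and the slides do not say which profiler the collaboration uses.

```python
import cProfile
import io
import pstats


def reconstruct_event(n_hits: int) -> float:
    """Toy per-event workload standing in for real reconstruction."""
    return sum((i * 0.5) ** 0.5 for i in range(n_hits))


def process(n_events: int) -> float:
    """Process a batch of toy events."""
    return sum(reconstruct_event(1000) for _ in range(n_events))


# Collect timing data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
process(200)
profiler.disable()

# Write the report to a string, sorted by cumulative time per function.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

The report lists call counts and cumulative time per function, which is the kind of breakdown needed to see where a figure such as 150 ms/event is actually spent.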
Testing • Unit testing • extends the life of code • catches errors introduced by modification • decreases the amount of debugging • decreases the amount of suboptimal code • Testing is a required part of the proposed coding standards and decreases the amount of time spent working with incorrect code.
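A minimal sketch of what such a unit test can look like, using Python's standard `unittest` module. The function under test, `mean_time`, is a hypothetical helper invented for this example; the proposed standard itself may prescribe a different framework or layout.

```python
import unittest


def mean_time(t_left: float, t_right: float) -> float:
    """Hypothetical helper: average of the two end times of a paddle."""
    return 0.5 * (t_left + t_right)


class MeanTimeTest(unittest.TestCase):
    def test_symmetric_hit(self):
        self.assertAlmostEqual(mean_time(10.0, 10.0), 10.0)

    def test_asymmetric_hit(self):
        self.assertAlmostEqual(mean_time(8.0, 12.0), 10.0)


def run_tests() -> unittest.TestResult:
    """Load and run the test case, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanTimeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

If a later modification changes the behaviour of `mean_time`, the test run fails immediately, which is the "catches errors introduced by modification" benefit in practice.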
PYTHON/ClaRa Python has a simple syntax that allows very quick prototyping of experimental analysis services, along with a large number of useful built-in functions and data types. There are also many open source, highly optimized, and well documented computational analysis modules that can be imported into any analysis service. A small handful of the supported areas are:
PYTHON/ClaRa The existing cMsg protocol will be wrapped into an importable Python module that provides the methods needed to receive and send a cMsg container to/from the Python service and CLARA. If the received cMsg container contains EVIO data, a separate EVIO support module can be imported to provide the functions needed to read and append data to the EVIO event stored in the cMsg container. [Diagram: the Python Analysis Service imports a Python wrapper around the existing cMsg, written in C, and imports the EVIO Support Module, written in Python.]
Summary • Scaled SOA implemented successfully via ClaRA • Software infrastructure is being integrated into the JLab batch farm system • More services need to be written • More Orchestrators/Applications • Progress on Major Systems • CCDB • Geometry Service • Reconstruction • Tracking • TOF & EC • Testing/Standards/QA • Actively being developed
Summary cont. • User Interfaces/Ease of Use • GUI to CCDB • Data Handling • Data Access/Mining • EVIO data access via dictionary • Virtual Box • Programming Libraries: ROOT, SCaVis, SciPy/NumPy … • Examples Examples Examples