ALTRO data preparation and clustering algorithms
Marco Villa – CERN
24th May 2011
Outline
• ALTRO data clean-up:
  • Data analysis paradigm & clean-up framework
  • Implementing the clean-up
  • Selecting the cuts
  • Choosing the distributions
• Clustering algorithms:
  • Purpose & boundary conditions
  • Agglomerative hierarchical approach
  • Current implementation
Paradigm & framework
Data analysis paradigm
Clean-up framework
• Different readout electronics used (ALTRO, APV, BNL)
• Interesting data from all electronics
• Not all chambers fully tested with all electronics
• Redundant info in ALTRO
Common data format will avoid analysis code replication
Implementing the clean-up
• ALTRO ntuples have 2 sets of values for each firing channel: high gain and low gain
• Each signal is fitted and fit results are stored
• Clean-up must take care of selecting the best charge and time value for each strip:
  • Use fit charge from high gain
  • Use fit charge from low gain (rescaled)
  • Use sample charge from high gain
  • Use sample charge from low gain (rescaled)
  • Item unusable (label as “–1”)
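The selection list above can be read as a priority cascade. The following is a minimal sketch in Python, assuming that the list is ordered by preference and that each candidate has already been validated (a rejected candidate is passed as `None`). The function name `select_charge` and the value of `GAIN_RATIO` are hypothetical and not taken from the slides.

```python
from typing import Optional

UNUSABLE = -1.0     # label used in the slides for unusable items
GAIN_RATIO = 10.0   # hypothetical high/low gain ratio, for illustration only


def select_charge(fit_high: Optional[float],
                  fit_low: Optional[float],
                  sample_high: Optional[float],
                  sample_low: Optional[float],
                  gain_ratio: float = GAIN_RATIO) -> float:
    """Return the first usable charge, in the order listed on the slide.

    A candidate is None when it failed validation. Low gain values are
    rescaled to the high gain scale before being returned.
    """
    candidates = (
        fit_high,
        None if fit_low is None else fit_low * gain_ratio,
        sample_high,
        None if sample_low is None else sample_low * gain_ratio,
    )
    for value in candidates:
        if value is not None:
            return value
    return UNUSABLE
```

For example, `select_charge(None, 11.0, 118.0, 12.0)` falls through to the rescaled low gain fit charge, and a strip with no valid candidate is labelled `-1.0`.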
Selecting the cuts
• Items validation through cuts:
  • For all high gain values:
    • Overflow cut @ qs = 1000 (ADC saturation @ 1023)
  • For all fit values:
    • Low tau cut @ τ = 2
    • High tau cut @ τ = 4
    • Fitness cut (F = fit, S = sample, P = pedestal), using one of these distributions:
      • F – S + P = 0 (my original distribution)
      • F / (S – P) = 1
      • S / (F + P) = 1 (Dimos’ distribution)
      • P / (S – F) = 1
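The cuts above can be sketched in Python as follows. The overflow and tau thresholds are the ones quoted on the slide; everything else is an assumption made for illustration: that `qs` is the sample charge, that the tau window is inclusive, that the fitness cut is a tolerance window around the nominal value, and the names and the value of `FITNESS_TOL`.

```python
OVERFLOW_QS = 1000   # overflow cut from the slide (ADC saturates at 1023)
TAU_MIN = 2.0        # low tau cut from the slide
TAU_MAX = 4.0        # high tau cut from the slide
FITNESS_TOL = 0.1    # hypothetical tolerance around the nominal value


def passes_overflow(sample_charge):
    """High gain values are rejected at or above the overflow threshold."""
    return sample_charge < OVERFLOW_QS


def passes_tau(tau):
    """Fit values are kept only inside the tau window (assumed inclusive)."""
    return TAU_MIN <= tau <= TAU_MAX


def _ratio(num, den):
    """Ratio that returns None instead of raising on a zero denominator."""
    return None if den == 0 else num / den


def fitness_variables(fit, sample, pedestal):
    """The four candidate fitness distributions listed on the slide."""
    return {
        "F-S+P": fit - sample + pedestal,          # nominal value 0
        "F/(S-P)": _ratio(fit, sample - pedestal), # nominal value 1
        "S/(F+P)": _ratio(sample, fit + pedestal), # nominal value 1
        "P/(S-F)": _ratio(pedestal, sample - fit), # nominal value 1
    }


def passes_fitness(fit, sample, pedestal, tol=FITNESS_TOL):
    """Fitness cut on S / (F + P), accepted within +/- tol of 1."""
    value = fitness_variables(fit, sample, pedestal)["S/(F+P)"]
    return value is not None and abs(value - 1.0) <= tol
```

All four variables express the same relation F = S – P, so a perfect fit sits exactly at the nominal value in each of them; they differ in how deviations are spread around it.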
Choosing the distributions (1)
[Figure: panels “High gain value” and “Low gain value”; annotation: 132 %]
Runs 4857, 4858, 4861, 4867, 4868: R12, Ar:CO2 85:15, 0 angle, 570 / 870 V
Choosing the distributions (2)
[Figure: panels “High gain value” and “Low gain value”; annotation: 5 %]
Runs 4857, 4858, 4861, 4867, 4868: R12, Ar:CO2 85:15, 0 angle, 570 / 870 V
Choosing the distributions (3)
[Figure: panels “High gain value” and “Low gain value”; annotation: 141 %]
Runs 4857, 4858, 4861, 4867, 4868: R12, Ar:CO2 85:15, 0 angle, 570 / 870 V
Choosing the distributions (4)
[Figure: panels “High gain value” and “Low gain value”; annotation: 51 %]
Runs 4857, 4858, 4861, 4867, 4868: R12, Ar:CO2 85:15, 0 angle, 570 / 870 V
Purpose & boundary conditions
• Purpose: coding a software module that, given a data file in “standard format”, produces an output file with data and cluster information
• Boundary conditions: 0 impact angle
Agglomerative hierarchical method
• Hierarchical clustering seeks to build a hierarchy of clusters. It can be of 2 types:
  • Divisive: top-down approach, in which all observations start in one cluster and splits are performed recursively
  • Agglomerative: bottom-up approach, in which each observation starts in its own cluster and clusters are merged (in pairs)
• Using an agglomerative algorithm with custom step-0 clustering (primary clusters)
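As a generic illustration of the bottom-up approach (not the implementation described in these slides), the sketch below clusters 1D positions by repeatedly merging the two closest clusters. The centroid distance and the stopping threshold `max_gap` are assumptions chosen for simplicity.

```python
def agglomerate(positions, max_gap):
    """Bottom-up clustering of 1D positions.

    Every observation starts as its own cluster. The two clusters with
    the closest centroids are merged, one pair at a time, until the
    smallest centroid distance exceeds max_gap.
    """
    clusters = [[p] for p in sorted(positions)]
    while len(clusters) > 1:
        centroids = [sum(c) / len(c) for c in clusters]
        # Clusters stay sorted and contiguous, so the closest pair
        # is always a pair of adjacent clusters.
        gaps = [centroids[i + 1] - centroids[i]
                for i in range(len(clusters) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] > max_gap:
            break
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters
```

With `agglomerate([1.0, 1.5, 2.0, 10.0, 10.4], max_gap=1.0)` the five single-point clusters collapse into two groups, one around 1.5 and one around 10.2.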
Current implementation
• Step-0: primary clusters are formed from neighboring firing strips
• Iterative merging (in pairs):
  • if clusters are “close”:
    • if both clusters have unitary size: ask user
    • if only one cluster has unitary size: check amplitude
    • if no cluster has unitary size:
      • Set proper starting points and boundaries for fits
      • Fit both clusters with a Gaussian
      • if Gaussians are not resolved, then merge clusters
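Two pieces of this procedure can be sketched in Python. The first function builds step-0 primary clusters, assuming that "neighboring" means consecutive strip numbers. The second is a hypothetical resolution criterion for two fitted Gaussians: the slides do not state the criterion actually used, so the rule below (peak separation compared with the quadrature sum of the widths) and the default `n_sigma` are assumptions for illustration only.

```python
import math


def primary_clusters(firing_strips):
    """Group firing strips with consecutive strip numbers (step-0)."""
    clusters = []
    for strip in sorted(set(firing_strips)):
        if clusters and strip == clusters[-1][-1] + 1:
            clusters[-1].append(strip)   # extends the current cluster
        else:
            clusters.append([strip])     # gap found: start a new cluster
    return clusters


def gaussians_resolved(mean_a, sigma_a, mean_b, sigma_b, n_sigma=2.0):
    """Hypothetical criterion: two peaks count as resolved when their
    means are further apart than n_sigma times the quadrature sum of
    their widths."""
    return abs(mean_a - mean_b) > n_sigma * math.hypot(sigma_a, sigma_b)


def should_merge(mean_a, sigma_a, mean_b, sigma_b, n_sigma=2.0):
    """Merge two multi-strip clusters when their Gaussians are not resolved."""
    return not gaussians_resolved(mean_a, sigma_a, mean_b, sigma_b, n_sigma)
```

For instance, `primary_clusters([5, 3, 4, 10, 11, 20])` yields three primary clusters, one of which has unitary size and would take the "ask user" or "check amplitude" branch rather than the Gaussian fit.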
Conclusions & outlook
• ALTRO data clean-up:
  • Framework ready
  • Selection works and produces clean output files in “standard format”
  • Timing correction can be implemented
• Clustering algorithms:
  • Framework ready
  • Primary clustering works fine
  • Hierarchical clustering works in most cases and only needs some parameter tuning