Production and Compression of Raw Data for the Time Projection Chamber
Ajit Kumar Mohanty, Dario Favretto
Dario Favretto, 9 September 2002

Summary
• ALTRO data format
• Data compression based on the standard Huffman technique (ref. A. Nicolaucig, M. Mattavelli, S. Carrato)
  • Using one table
  • Using 5 tables
• Preliminary results
• Future developments

ALTRO Data Format
• ALTRO (ALICE TPC Read Out)
• Only the samples over a given threshold are kept; the others are discarded
• A bunch is a group of adjacent over-threshold samples coming from one pad (the signal can be represented bunch by bunch)
• The information for one pad is stored in one packet
• A packet is a sequence of 10-bit words (range 0-1023) followed by a trailer
• For each bunch, the packet stores:
  • Bunch length (number of samples in the bunch)
  • Time information (temporal position of the last sample in the bunch)
  • Sequence of amplitude values
• Trailer:
  • Number of words in the packet (10 bits)
  • Hardware and channel address (8 and 4 bits respectively)

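The bunch construction described above can be sketched in a few lines of Python. This is only an illustration, not the ALTRO implementation: the function names `find_bunches` and `encode_pad` are hypothetical, "over threshold" is assumed to mean strictly greater than the threshold, the word order inside a bunch (length, time, amplitudes) simply follows the order of the bullets, and the trailer is left out.

```python
def find_bunches(samples, threshold):
    """Group adjacent over-threshold samples of one pad into bunches.

    Returns a list of (last_time_bin, amplitudes) tuples.
    Assumption: a sample is kept only if it is strictly above the threshold.
    """
    bunches = []
    current = []
    for time_bin, amplitude in enumerate(samples):
        if amplitude > threshold:
            current.append(amplitude)
        elif current:
            # The previous time bin held the last sample of the bunch.
            bunches.append((time_bin - 1, current))
            current = []
    if current:
        bunches.append((len(samples) - 1, current))
    return bunches


def encode_pad(samples, threshold):
    """Flatten the bunches of one pad into a list of 10-bit words.

    Assumed word order per bunch: length, time of last sample, amplitudes.
    The trailer (word count and addresses) is not produced here.
    """
    words = []
    for last_time_bin, amplitudes in find_bunches(samples, threshold):
        words.append(len(amplitudes))
        words.append(last_time_bin)
        words.extend(amplitudes)
    for word in words:
        if not 0 <= word <= 1023:
            raise ValueError(f"value {word} does not fit in a 10-bit word")
    return words
```

For example, `encode_pad([0, 1, 5, 9, 4, 0, 0, 7, 0], threshold=2)` yields two bunches: three samples ending at time bin 4, and one isolated sample at time bin 7.
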
Compression
• Lossless compression technique
• Static Huffman coding
  • Variable-length coding technique based on the frequency of the symbols (symbols that appear more frequently are coded with a shorter sequence of bits than symbols that appear less frequently in the source file)
  • Static means that the algorithm is based on one or more tables that are built before the compression phase according to the frequency of the symbols

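A minimal sketch of static Huffman coding, assuming the symbol frequencies have already been counted in a separate pass. The names `build_huffman_table` and `encode` are hypothetical and this is not the code used in the talk; it only illustrates that frequent symbols receive shorter codes and that the table is fixed before compression starts.

```python
import heapq


def build_huffman_table(frequencies):
    """Build a {symbol: bit string} table from a {symbol: count} mapping."""
    if not frequencies:
        return {}
    # Heap entries are (count, tie_breaker, partial code table).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(count, index, {symbol: ""})
            for index, (symbol, count) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs a one-bit code.
        return {symbol: "0" for symbol in heap[0][2]}
    next_index = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        count0, _, codes0 = heapq.heappop(heap)
        count1, _, codes1 = heapq.heappop(heap)
        merged = {symbol: "0" + code for symbol, code in codes0.items()}
        merged.update({symbol: "1" + code for symbol, code in codes1.items()})
        heapq.heappush(heap, (count0 + count1, next_index, merged))
        next_index += 1
    return heap[0][2]


def encode(symbols, table):
    """Encode a sequence of symbols as a string of '0' and '1' characters."""
    return "".join(table[symbol] for symbol in symbols)
```

Because the table is static, the decoder must be given the same table (or the same frequency counts) that the encoder used.
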
Compression using one table
• Frequency distribution using one table (entropy: 4.97)

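The entropy quoted for the frequency distribution can be computed as the Shannon entropy of the symbol counts. The sketch below assumes the value is expressed in bits per symbol; the slide does not state the unit, and the function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Return the Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((count / total) * math.log2(count / total)
                for count in counts.values())
```

Under that assumption, an entropy of 4.97 bits for symbols stored as 10-bit words would put the best achievable size for a single-table code at roughly 50% of the source size.
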
Results
• Compression applied to a source file generated by simulating one event of 1000 primaries (percentages give the compressed size relative to the source size)
• Threshold value: 2 (source file size: 6.5 MB)
  • Huffman: ~3.5 MB compressed (54%)
  • Gzip: ~4.5 MB compressed (69%)
• Threshold value: 5 (source file size: 1.4 MB)
  • Huffman: ~0.9 MB compressed (68%)
  • Gzip: ~1.2 MB compressed (83%)
• Threshold value: 10 (source file size: 1 MB)
  • Huffman: ~0.7 MB compressed (72%)
  • Gzip: ~0.9 MB compressed (85%)

Compression using 5 tables
Compression can be improved by taking the nature of the data into account. Most bunches have a pseudo-Gaussian shape, in which the first and last samples have smaller values than those in the central positions.
• Samples are classified into three categories (each category corresponds to a table)
  • Isolated samples
  • Border samples
  • Central samples
• Two more tables are used to store the frequencies of the time-bin values and the bunch length values

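The classification above could be sketched as follows. The names `classify_bunch` and `count_frequencies` are hypothetical, and two assumptions go beyond the slide: an isolated sample is taken to be a bunch of length one, and border samples are taken to be exactly the first and last sample of a longer bunch.

```python
from collections import Counter


def classify_bunch(amplitudes):
    """Label each sample of one bunch as 'isolated', 'border' or 'central'."""
    n = len(amplitudes)
    if n == 0:
        return []
    if n == 1:
        return ["isolated"]
    return ["border"] + ["central"] * (n - 2) + ["border"]


def count_frequencies(bunches):
    """Accumulate the five frequency tables from (last_time_bin, amplitudes) bunches."""
    tables = {name: Counter() for name in
              ("bunch_length", "time_bin", "isolated", "border", "central")}
    for last_time_bin, amplitudes in bunches:
        tables["bunch_length"][len(amplitudes)] += 1
        tables["time_bin"][last_time_bin] += 1
        for label, amplitude in zip(classify_bunch(amplitudes), amplitudes):
            tables[label][amplitude] += 1
    return tables
```

Each of the five counters would then feed its own Huffman table, so that small border values and large central values no longer compete for the same short codes.
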
Frequency distribution
Entropy
• Bunch length: 1.00
• Bunch of 1 sample: 0.36
• Border samples: 4.43
• Central samples: 6.95

Results
• Compression applied to a source file generated by simulating one event of 1000 primaries (percentages give the compressed size relative to the source size)
• Threshold value: 2 (source file size: 6.5 MB)
  • Huffman: ~3.5 MB compressed (54%)
  • Huffman, 5 tables: ~2.8 MB compressed (42%)
• Threshold value: 5 (source file size: 1.4 MB)
  • Huffman: ~0.9 MB compressed (68%)
  • Huffman, 5 tables: ~0.8 MB compressed (55%)
• Threshold value: 10 (source file size: 1 MB)
  • Huffman: ~0.7 MB compressed (72%)
  • Huffman, 5 tables: ~0.6 MB compressed (57%)

Results
• Compression applied to a source file generated by simulating one event of 10000 primaries (percentages give the compressed size relative to the source size)
• Threshold value: 2 (source file size: 21.8 MB)
  • Gzip: ~17.5 MB compressed (80%)
  • Huffman, 5 tables: ~10.7 MB compressed (49%)

Main Macros and Classes
• StoreDigits.C is a macro that creates a binary file (DigitsData.dat) containing the sequence of digits (amplitude, time bin, sector, row and pad number)
• AliTPCBuildAltroFormat.C is a macro used to generate the ALTRO format file (AltroFormat.dat) from DigitsData.dat
• AliTPCBuffer160 is a class used to read and write values according to the ALTRO data format (10-bit words)
• AliTPCHNode and AliTPCHTable are classes used to create and manage the tables used by Huffman coding
• AliTPCHCompression is the class that implements compression and decompression based on one table
• AliTPCCompression is the class that implements compression and decompression based on 5 tables

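Reading and writing 10-bit words means packing values that do not align with byte boundaries. The sketch below shows one way to do this in Python. It is not the AliTPCBuffer160 implementation: the function names are hypothetical, and the bit order (most significant bit first, zero padding at the end) is an assumption.

```python
def pack_10bit(words):
    """Pack a sequence of 10-bit words into bytes, most significant bit first."""
    accumulator = 0
    pending_bits = 0
    output = bytearray()
    for word in words:
        if not 0 <= word <= 1023:
            raise ValueError(f"value {word} does not fit in 10 bits")
        accumulator = (accumulator << 10) | word
        pending_bits += 10
        while pending_bits >= 8:
            pending_bits -= 8
            output.append((accumulator >> pending_bits) & 0xFF)
        accumulator &= (1 << pending_bits) - 1  # keep only the unwritten bits
    if pending_bits:
        # Pad the final partial byte with zero bits.
        output.append((accumulator << (8 - pending_bits)) & 0xFF)
    return bytes(output)


def unpack_10bit(data, count):
    """Read back `count` 10-bit words from bytes written by pack_10bit."""
    accumulator = 0
    pending_bits = 0
    words = []
    for byte in data:
        accumulator = (accumulator << 8) | byte
        pending_bits += 8
        while pending_bits >= 10 and len(words) < count:
            pending_bits -= 10
            words.append((accumulator >> pending_bits) & 0x3FF)
        accumulator &= (1 << pending_bits) - 1
    return words
```

The word count has to be passed to `unpack_10bit` because the padding bits at the end are otherwise indistinguishable from data.
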
Future developments
• Test phase using a bigger source file (80000 primaries)
• Complete the implementation of the ALTRO data format
• Optimize the frequency tables independently of a particular source file
• Improve the compression factor
• Abstract the classes to make them available for other detectors (ITS)