High data-rate readout logic design for a 1024 × 512 pixel array dedicated for CEPC experiment.
High data-rate readout logic design for a 1024 × 512 pixel array dedicated for CEPC experiment
Xiaomin Wei(1), Bo Li(1), Wei Wei(2), Tianya Wu(3,4), Ying Zhang(2), Xiaoting Li(2), Liang Zhang(5), Weiguo Lu(2), Zhijun Liang(2), Jianing Dong(5), Long Li(5), Jia Wang(1), Ran Zheng(1), Raimon Casanova Mohr(4), Sebastian Grinstein(4), Yann Hu(6), and Joao Guimaraes da Costa(2)
1) Northwestern Polytechnical University, Xi’an, China
2) Institute of High Energy Physics, Chinese Academy of Sciences, Beijing, China
3) Central China Normal University, Wuhan, China
4) Institut de Física d’Altes Energies, Spain
5) Institute of Frontier and Interdisciplinary Science and Key Laboratory of Particle Physics and Particle Irradiation, Shandong University, Qingdao, China
6) University of Strasbourg, France
Outline • 1. Introduction • 2. Requirements • 3. Circuit design • 4. Experimental results • 5. Conclusion 21st iWoRiD weixm@nwpu.edu.cn
Introduction
• CEPC Vertex Detector
* Ref: Status of vertex detector, Q. Ouyang, International workshop on CEPC, Nov. 2017
• Monolithic CMOS Pixel Sensors (CPS) are preferred!
Previous CPS prototypes
• Ref: Y. Zhang, “IHEP CMOS pixel sensor activities for CEPC”, 2018.3
• The previous chips focused on the study of the sensing diode and the analog pixel.
• This work studies the design of a high data-rate readout logic for a 1024 × 512 matrix.
CPS design requirements (1)
• Hit density and data rate of CPS for CEPC (from the CDR of CEPC)
  • Hit rate: 40 MHz on average (pixels to be read: 120 data/μs)
  • Chip readout speed: 160 MHz @ triggerless, 5 MHz @ trigger frequency of 50 kHz
* Estimation conditions:
  1) The active sensing area of the chip is 3.2768 cm² (1024 × 512 pixel array, 25 μm pixel pitch).
  2) The cluster size is 3 pixels. Each hit pixel is recorded with 32 bits (timestamp: 8 bits, pixel address: 19 bits).
  3) The trigger latency is assumed to be 3~6 μs, and the average trigger rate is 50 kHz.
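The numbers on this slide can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' calculation: the variable names are illustrative, and the raw bandwidth figure assumes that every hit pixel is sent as one uncompressed 32-bit word.

```python
# Sanity check of the requirement numbers (illustrative names, see lead-in).
COLS, ROWS = 1024, 512      # pixel array size
PITCH_UM = 25               # pixel pitch in micrometres
HIT_RATE_PER_US = 40        # 40 MHz average hit rate = 40 hits per microsecond
CLUSTER_SIZE = 3            # pixels fired per hit
BITS_PER_PIXEL = 32         # record size per hit pixel

# Active area: width times height in um^2, converted to cm^2 (1 cm^2 = 1e8 um^2).
area_cm2 = (COLS * PITCH_UM) * (ROWS * PITCH_UM) / 1e8

# Pixels to be read per microsecond.
pixel_rate_per_us = HIT_RATE_PER_US * CLUSTER_SIZE

# Bits needed to address any pixel of the array.
address_bits = (COLS * ROWS - 1).bit_length()

# Assumed raw bandwidth without compression: bits per us divided by 1000 gives Gbit/s.
raw_gbps = pixel_rate_per_us * BITS_PER_PIXEL / 1000

print(area_cm2, pixel_rate_per_us, address_bits, raw_gbps)
```

Under these assumptions the sketch reproduces the 3.2768 cm² area, the 120 data/μs pixel rate, and the 19-bit pixel address quoted above.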
CPS design requirements (2)
• Dead time: 500 ns for a Dcol (double column)
  • Efficiency: 99%
  • Dead time: 500 ns
• Hit number per 500 ns for a Dcol: 0.0395 hits on average; a maximum of 10 pixels is considered.
• Readout time of one pixel: < 50 ns
• From the simulation of Z. Liang
• The readout of one pixel within 50 ns can be achieved with the design of ALPIDE or FEI3.
• This work focuses on the fast readout scheme and the data reduction methods in the peripheral readout circuits.
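The 50 ns figure follows from dividing the dead time by the worst-case number of pixels in a Dcol. A short sketch of that arithmetic is given here. The last quantity is not from the slides: it is a back-of-envelope mean obtained by assuming the 40 MHz average hit rate is spread evenly over 512 Dcols, and it lands close to, but not exactly on, the simulated 0.0395.

```python
# Readout-time budget per pixel (names are illustrative).
DEAD_TIME_NS = 500
MAX_PIXELS_PER_DCOL = 10
readout_budget_ns = DEAD_TIME_NS / MAX_PIXELS_PER_DCOL

# Assumption: hits are spread uniformly over all double columns.
HIT_RATE_PER_US = 40
N_DCOLS = 512
mean_hits_per_dcol = HIT_RATE_PER_US * (DEAD_TIME_NS / 1000) / N_DCOLS

print(readout_budget_ns, mean_hits_per_dcol)
```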
Chip readout architecture
• Dcols are read out in parallel, with data reduction taken into account.
• Fast readout logic for 512 channels is realized.
Dcol readout circuits
• Dcol reader:
  • Records the timestamp for each pixel group
  • Receives the pixel address and performs real-time compression
• FIFO1: temporarily stores the data from the Dcol reader
• Trigger discrimination logic (Trigger&match):
  • Trigger mode: only the matched data are sent to the next stage.
  • Triggerless mode: all the data are sent to the next stage.
• Each Dcol readout circuit mainly includes three parts, which realize the functions of data readout, real-time data reduction, and data matching in trigger mode.
Data match in trigger mode
• To reduce the pixel area, the timestamp is recorded only at the Dcol level. The uncertainty of the timestamp is taken into account in the trigger discrimination logic.
Data match in trigger mode
• Register control of trigger parameters:
  • TRIGGER_LATENCY: 0-6 μs (8-bit register)
  • TRIGGER_UNCERTAIN: 0-175 ns in steps of 25 ns (3-bit register)
  => Only the data from the most recent 6 μs are stored; older data are discarded.
• Example of setting the trigger latency and trigger uncertainty: suppose the trigger signal arrives at 6 μs and the user wants to acquire the hits from 2.925 μs to 3.075 μs. Then TRIGGER_LATENCY should be set to 8'd123 (123 × 25 ns = 3.075 μs) and TRIGGER_UNCERTAIN to 3'b110 (6 × 25 ns = 0.15 μs).
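The register example can be reproduced with a small helper. This is a hedged sketch rather than the chip's actual logic: `trigger_registers` and `hit_matches` are hypothetical names, and the matching rule assumes that the 8-bit timestamp counts 25 ns steps, wraps around, and is compared against the trigger time by its age.

```python
STEP_NS = 25          # register step from the slide
LATENCY_BITS = 8      # width of TRIGGER_LATENCY
UNCERTAIN_BITS = 3    # width of TRIGGER_UNCERTAIN

def trigger_registers(trigger_ns, window_start_ns, window_end_ns):
    """Return (TRIGGER_LATENCY, TRIGGER_UNCERTAIN) codes for a hit window."""
    latency_ns = trigger_ns - window_start_ns
    uncertain_ns = window_end_ns - window_start_ns
    if latency_ns % STEP_NS or uncertain_ns % STEP_NS:
        raise ValueError("window edges must fall on 25 ns steps")
    latency = latency_ns // STEP_NS
    uncertain = uncertain_ns // STEP_NS
    if not 0 <= latency < 2 ** LATENCY_BITS:
        raise ValueError("latency does not fit in 8 bits")
    if not 0 <= uncertain < 2 ** UNCERTAIN_BITS:
        raise ValueError("uncertainty does not fit in 3 bits")
    return latency, uncertain

def hit_matches(hit_ts, trigger_ts, latency, uncertain):
    """Assumed rule: a hit matches if its age lies in [latency - uncertain, latency]."""
    age = (trigger_ts - hit_ts) % 2 ** LATENCY_BITS   # wrap-around safe
    return latency - uncertain <= age <= latency

# Slide example: trigger at 6 us, window from 2.925 us to 3.075 us.
latency, uncertain = trigger_registers(6000, 2925, 3075)
print(latency, bin(uncertain))   # 123 0b110
```

Working in integer nanoseconds avoids floating-point rounding when dividing by the 25 ns step.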
Real-time data compression
• The address of the first pixel in a package is recorded; the following three pixels are indicated by a three-bit code, where “0” indicates no hit in the pixel and “1” indicates a hit pixel.
• Example of data compression in the peripheral logic
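The packaging rule described above can be sketched in software as follows. This is an illustration only, under two assumptions that the slides do not state: pixel addresses arrive in increasing order, and bit 0 of the code flags the pixel directly after the first address. The function names are hypothetical.

```python
def compress(addresses):
    """Group increasing pixel addresses into (first_address, 3-bit code) packages."""
    packages = []
    i = 0
    while i < len(addresses):
        first = addresses[i]
        code = 0
        i += 1
        # Absorb hits that fall within the three pixels after the first address.
        while i < len(addresses) and addresses[i] - first <= 3:
            code |= 1 << (addresses[i] - first - 1)
            i += 1
        packages.append((first, code))
    return packages

def decompress(packages):
    """Inverse of compress: expand packages back into pixel addresses."""
    addresses = []
    for first, code in packages:
        addresses.append(first)
        for offset in range(1, 4):
            if code & (1 << (offset - 1)):
                addresses.append(first + offset)
    return addresses

packages = compress([5, 6, 8, 20])
print(packages)               # [(5, 5), (20, 0)]
print(decompress(packages))   # [5, 6, 8, 20]
```

In this example four hit pixels are carried by two packages, which is where the reduction in FIFO depth and output data volume comes from.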
Real-time data compression
• The main cost of real-time data compression is an added latch for storing the first address and the compression code added to each data package.
Real-time data compression
• Benefits:
  • Reduces the required depth of FIFO1 and FIFO2
  • Reduces the data volume to be sent off chip
• Data format of FIFO1
• Data format of FIFO2 and output data
Fast readout of 512 Dcols
• The whole matrix is divided into 4 blocks, considering the 40 MHz system clock and the 120 MHz hit-pixel rate in triggerless mode.
Fast readout of 512 Dcols
• At the bottom level (32 Dcols): data-driven & address-priority readout => reads out the data quickly and avoids unnecessary readout
• At the top level: hierarchical data MUX => avoids blocked data by giving each input an equal output opportunity
• FIFO2 in each block => matches the interface speed
• The chip is divided into 4 blocks, considering the 40 MHz system clock and the 120 MHz data rate in triggerless mode.
Bottom level: Data-driven & address-priority readout
• Only non-empty FIFOs are accessed.
• The 32 Dcols are divided into 4 groups, which work in parallel.
• Priority control of 8 Dcols (the column on the right has the highest priority)
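A behavioural sketch of the priority selection within one group of 8 Dcols is shown here. It is not the chip's RTL: `select_dcol` and `drain` are hypothetical names, and mapping the rightmost column to the highest list index is an assumption made for the illustration.

```python
from collections import deque

def select_dcol(fifos):
    """Return the index of the highest-priority non-empty FIFO, or None.

    Assumption: the highest index is the rightmost column and has top priority.
    """
    for idx in range(len(fifos) - 1, -1, -1):
        if fifos[idx]:
            return idx
    return None

def drain(fifos):
    """Read out all data, always serving the highest-priority non-empty FIFO."""
    out = []
    while (idx := select_dcol(fifos)) is not None:
        out.append((idx, fifos[idx].popleft()))
    return out

group = [deque() for _ in range(8)]
group[2].extend(["a", "b"])
group[6].append("c")
print(drain(group))   # [(6, 'c'), (2, 'a'), (2, 'b')]
```

Empty FIFOs are never popped, which corresponds to the data-driven behaviour described on the slide.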
Top level: Hierarchical data multiplexer
• State machine for MUX 2 to 1
• Special MUX 2 to 1: while A/B is being read, a request from B/A is served with higher priority.
• A and B are served alternately when both of them are non-empty. One input can be read continuously only when the other one is empty.
• A hierarchical data MUX 4 to 1 can be realized with three MUX 2 to 1.
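The alternating behaviour of the 2-to-1 multiplexer, and the way three of them form a 4-to-1 tree, can be modelled as below. This is a minimal behavioural sketch with hypothetical class names (`Fifo`, `Mux2`); it ignores clock-level timing, and serving input A first on the very first request is an assumption.

```python
from collections import deque

class Fifo:
    """Data source; pop() returns None when empty (items must not be None)."""
    def __init__(self, items=()):
        self.queue = deque(items)

    def pop(self):
        return self.queue.popleft() if self.queue else None

class Mux2:
    """2-to-1 MUX: the input that was NOT served last is tried first."""
    def __init__(self, a, b):
        self.sources = [a, b]
        self.last = 1   # assumption: pretend B was served last, so A goes first

    def pop(self):
        for idx in (1 - self.last, self.last):
            item = self.sources[idx].pop()
            if item is not None:
                self.last = idx
                return item
        return None

def read_all(source):
    """Pop from a source until it is empty and return the items in order."""
    out = []
    while (item := source.pop()) is not None:
        out.append(item)
    return out

# 4-to-1 MUX built from three 2-to-1 MUXes.
mux4 = Mux2(Mux2(Fifo(["a1", "a2"]), Fifo(["b1"])),
            Mux2(Fifo(["c1"]), Fifo(["d1"])))
print(read_all(mux4))   # ['a1', 'c1', 'b1', 'd1', 'a2']
```

Because each stage alternates between its two inputs, no input can monopolize the output while another still holds data.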
Simulation
• Testbench
• Hit generation
  • The number of hits is generated according to a Poisson distribution, and the hits are then randomly assigned to the double columns.
• Double column model
  • The FEI3-like timing and the ALPIDE-like timing are modeled according to the assigned hit number.
  • The pixel addresses can be set to random, increasing, or descending order to simulate the real-time data compression.
• Data analysis
  • Comparing the sent data with the received data.
  • Counting the data to calculate the actual data rate.
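The hit-generation step of the testbench can be sketched in a few lines. This is an illustration of the idea, not the authors' testbench: the function names are hypothetical, the Poisson draw uses Knuth's multiplication method because the Python standard library has no Poisson sampler, and the mean of 20 hits per frame is an assumed value (40 MHz × 500 ns).

```python
import math
import random

def poisson(rng, mean):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def generate_hits(rng, mean_hits, n_dcols):
    """Return the number of hits assigned to each double column for one frame."""
    counts = [0] * n_dcols
    for _ in range(poisson(rng, mean_hits)):
        counts[rng.randrange(n_dcols)] += 1   # uniform random assignment
    return counts

rng = random.Random(2019)                      # fixed seed for repeatability
frame = generate_hits(rng, mean_hits=20.0, n_dcols=512)
print(sum(frame), max(frame))
```

Each frame's per-Dcol counts would then drive the double column model, which is outside the scope of this sketch.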
Simulation
• Results
• All the functions are well supported:
  • Trigger and triggerless modes are each simulated.
  • Different trigger latencies, trigger windows, and data compression settings are considered, with random, increasing, descending, consecutive, and non-consecutive data.
• Accepted data rate:
  • Triggerless mode: 120-130 data/μs (limited by the output data frequency of 160 MHz)
  • With data compression activated, more than 150 data/μs can be accepted (depending on the data compressibility).
  • Trigger mode: > 160 data/μs (limited by the depth of FIFO1, since most data are not read out)
• The hit rate of 120 data/μs can be read out correctly in both trigger and triggerless modes.
Layout & Power consumption
• Layout area (TJ 0.18 μm): 25.664 × 1.124 mm²
• Power consumption of the peripheral circuits, analyzed with PrimeTime:
  • Trigger: 25~30 mW/cm²
  • Triggerless: 35~45 mW/cm²
• The clock network accounts for 80% of the power consumption and should be improved.
Conclusion
• This work presents the design of a high data-rate readout logic for a 1024 × 512 pixel array for CEPC.
  • Fast readout architecture: data-driven readout at the bottom level and a hierarchical data multiplexer at the top level
  • Data reduction methods: real-time data compression; trigger mode
• The simulation results indicate that an average pixel hit rate of 120 data/μs can be processed correctly.
• The layout area and the power consumption still need further improvement.
Future work • Chip test • The readout architecture and block circuits will be evaluated by test. • Design improvement • Reduce the power consumption • Reduce the layout area
Pixel address encoder (1)
• ALPIDE-like
• “Readint” is the state of a pixel. It is set when a hit arrives and is reset at the falling edge of SYNC (the synchronization signal for hit reading).
• readint0, readint1, … are encoded by priority.
Pixel address encoder (2)
• FEI3-like
• “Readint” is the state of a pixel. It is set when a hit arrives and is reset at the positive edge of READ (the synchronization signal for hit reading).
• The priority logic guarantees that only one Readint is “1”.
Pixel readout timing
Analog Pixel • Ref Y. Zhang: “Design of the pixel Analog”, chip design review of MOST2 2019.4
Future work
• Chip test
  • A reduced-scale chip with a 192 × 64 pixel array was submitted in June 2019. The readout of 128 Dcols was integrated. The readout architecture and block circuits will be evaluated by test.
• Design improvement
  • Reduce the power consumption
  • Reduce the layout area
  • Optimize the depth of FIFO1
  • Optimize the clock network
  • Optimize the pixel read frequency and let every four columns share one readout channel.