Internal simulation meeting 27-06-2014 – Elia Conti
OUTLINE
• Description of distributed latency counter architecture
• Simulation results (comparison with zero-suppressed FIFO)
Distributed latency counters architecture block diagrams (I)
[Block diagram: Pixel Region (PR). Blocks: PUC matrix (PUC 1,1 … PUC m,n), PR LATENCY MEMORY, MEMORY MANAGEMENT UNIT, Derandom. FIFO. Labels: HITS; PUC outputs (ToT, status, …); triggered output (ToA); regional signals; triggered hit packet to col. bus; EXT CNT; TRIGGER. Annotations: ToT BUFFER IN EACH PUC; SHARED ToA MEMORY.]
Distributed latency counters architecture building blocks
Pixel Unit Cell
• Computation of ToT from analog hits
• Storage of ToT in local buffer, which is read and written using regional signals
Latency Memory (array of Latency Cells)
• Copy of ToA in latency cell register
• Trigger matching (latency down counters) determines whether the ToA output is enabled
• Read and written using regional signals
Memory Management Unit
• Generation of regional signals
• PR status assessment
• Output hit packet assembly
Main regional signals
• r_edge_or: used by PUC state machine
• free_addr: address of first available memory location to write into
• read_data: address of memory location to read from when trigger matching
• triggered: raised when trigger matching
• wr_en_PR: used by latency memory as ToA write enable and by PUC as free_addr write enable in memory pointer
• ack: used for “releasing” PUCs into idle state
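As a rough illustration of how free_addr and read_data could be derived, the Python sketch below treats both as priority encoders over per-cell flags. The function names, the list-of-booleans representation, and the lowest-index-first policy are assumptions made for illustration; the slides only state that free_addr is the first available location and that read_data is the location read on trigger matching.

```python
from typing import Optional, Sequence


def first_set(flags: Sequence[bool]) -> Optional[int]:
    """Return the lowest index whose flag is set, or None if no flag is set."""
    for addr, flag in enumerate(flags):
        if flag:
            return addr
    return None


def free_addr(busy_cell: Sequence[bool]) -> Optional[int]:
    """Hypothetical helper: first memory location that is not busy."""
    return first_set([not busy for busy in busy_cell])


def read_data(triggered_cell: Sequence[bool]) -> Optional[int]:
    """Hypothetical helper: first memory location flagged as triggered."""
    return first_set(triggered_cell)


if __name__ == "__main__":
    busy = [True, True, False, False]
    trig = [False, True, False, False]
    print("free_addr:", free_addr(busy))
    print("read_data:", read_data(trig))
```

Returning None when every location is busy makes the overflow case explicit; a hardware description would need some equivalent "memory full" indication.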
Distributed latency counters architecture block diagrams (II)
[Block diagram: Pixel Unit Cell (PUC), split into a DIGITAL part and an “ANALOG” part. Blocks: ToT CONVERTER, ToT COUNTER, PIXEL FSM, MEMORY POINTER (free_addr_reg), ToT MEMORY. Signals: analog_hit, discr_out, enable/clear, ToT, ToT_out, wr_en, wr_en_PR, free_addr, r_edge_or, pixel state + r_edge, ack, read_data, triggered = read_enable.]
• Memory Pointer is in practice a register for free_addr
• ToT memory described as an oversized array (no SV queue)
Distributed latency counters architecture – PUC FSM
[State diagram. States: IDLE, COUNTING, READY, BLIND. Transition conditions shown: r_edge = 0, r_edge_or = 0; r_edge = 0, r_edge_or = 1; r_edge = 1, r_edge_or = 1; f_edge = 0; f_edge = 1; ack = 0; ack = 1. Actions shown: enable ToT counter; write ToT in ToT Memory; clear ToT counter.]
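One plausible reading of the PUC state machine is sketched below in Python. The assignment of conditions and actions to specific arcs is an assumption, since only the labels survive from the diagram: here a pixel that sees the regional r_edge_or without its own r_edge goes BLIND, a hit pixel counts until f_edge, and both READY and BLIND wait for ack. The names PucState and puc_next are hypothetical.

```python
from enum import Enum


class PucState(Enum):
    IDLE = "IDLE"
    COUNTING = "COUNTING"
    READY = "READY"
    BLIND = "BLIND"


def puc_next(state, r_edge=False, r_edge_or=False, f_edge=False, ack=False):
    """Return (next_state, action) for one clock cycle.

    The arc assignment is an assumed reconstruction, not the author's RTL.
    """
    if state is PucState.IDLE:
        if r_edge and r_edge_or:
            return PucState.COUNTING, "enable ToT counter"
        if r_edge_or:
            # Another pixel in the region was hit: ignore hits until ack.
            return PucState.BLIND, None
        return PucState.IDLE, None
    if state is PucState.COUNTING:
        if f_edge:
            return PucState.READY, "write ToT in ToT memory"
        return PucState.COUNTING, None
    if state is PucState.READY:
        if ack:
            return PucState.IDLE, "clear ToT counter"
        return PucState.READY, None
    # BLIND: released into IDLE by ack, no counter action assumed.
    if ack:
        return PucState.IDLE, None
    return PucState.BLIND, None


if __name__ == "__main__":
    state = PucState.IDLE
    stimulus = [
        dict(r_edge=True, r_edge_or=True),
        dict(),
        dict(f_edge=True),
        dict(ack=True),
    ]
    for inputs in stimulus:
        state, action = puc_next(state, **inputs)
        print(state.value, action)
```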
Distributed latency counters architecture block diagrams (III)
[Block diagram: PR Latency Memory. Blocks: LATENCY CELL[0], LATENCY CELL[1], ..., LATENCY CELL[D]; ADDR SELECT; ToA_out_select. Latency cell contents shown: FSM, LATENCY COUNTER, ToA REGISTER. Signals: trigger, ext_count, ToA count, enable/clear, wr_en[0] … wr_en[D], wr_en_PR, free_addr, busy_cell, triggered_cell, ToA[0] … ToA[D], ToA_out, read_data.]
• Oversized array of latency cells
Distributed latency counters architecture – Latency Cell FSM
[State diagram. States: FREE, BUSY, TRIGGERED. Transition conditions shown: wr_en = 0; wr_en = 1; count > 0; count = 0, trigger = 0; count = 0, trigger = 1. Actions shown: enable latency counter; clear latency counter; enable ToA out.]
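The latency cell can be modelled behaviorally as a down counter with trigger matching at expiry. The Python sketch below is one possible reading of the diagram labels, not the actual implementation: the class name LatencyCell, the one-cycle TRIGGERED state, and the exact cycle at which the trigger is sampled are all assumptions.

```python
class LatencyCell:
    """Behavioral sketch of a single latency cell (assumed timing)."""

    def __init__(self, latency):
        self.latency = latency  # trigger latency in BX cycles
        self.state = "FREE"
        self.count = 0
        self.toa = None

    def step(self, wr_en=False, toa=None, trigger=False):
        """Advance one BX cycle; return the stored ToA on a trigger match."""
        if self.state == "FREE":
            if wr_en:
                # Copy ToA into the cell register and start the down counter.
                self.toa = toa
                self.count = self.latency
                self.state = "BUSY"
            return None
        if self.state == "BUSY":
            if self.count > 0:
                self.count -= 1
                return None
            # Counter expired: keep the hit only if a trigger arrives now.
            if trigger:
                self.state = "TRIGGERED"
                return self.toa
            self.state = "FREE"
            return None
        # TRIGGERED: assumed to be released on the following cycle.
        self.state = "FREE"
        return None


if __name__ == "__main__":
    cell = LatencyCell(latency=3)
    cell.step(wr_en=True, toa=42)
    for _ in range(3):
        cell.step()
    print("matched ToA:", cell.step(trigger=True))
```

The busy_cell and triggered_cell flags named in the block diagram would correspond to `state == "BUSY"` and `state == "TRIGGERED"` in this model.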
Distributed latency counters architecture – Analysis interface
What is monitored?
• PUC busy (ToT converter busy or logic delays)
• ToT and Latency Memory occupancy (deduced from busy_cell signal)
• PR busy (from r_edge_or pulse to ack)
Comparative study between architectures – Simulation (I)
Simulation parameters:
• Hit rate: 2 GHz/cm², trigger latency: 10 µs
• Full pixel matrix (hit generation): 512 x 512 pixels, pixel size 50 x 50 µm²
• Sensor thickness: 100 µm
• Simulation run for 500,000 BX cycles; 500,000 hit transactions generated; 10,000 hit packets into PR buffer
• Class of hits generated: tracks, track angle 90° with charge sharing, 43% hit pixels
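To get a feel for the scale of these parameters, the short Python calculation below converts them into per-pixel and per-BX quantities. The 25 ns bunch-crossing period is an assumption (it is not stated on the slide), and the calculation takes the 2 GHz/cm² figure at face value as an areal rate without deciding whether it counts particles or hit pixels.

```python
HIT_RATE_PER_CM2 = 2e9   # 2 GHz/cm^2 (from the slide)
PIXEL_PITCH_CM = 50e-4   # 50 um pixel side, expressed in cm (from the slide)
PIXELS_PER_SIDE = 512    # 512 x 512 matrix (from the slide)
TRIGGER_LATENCY_S = 10e-6  # 10 us (from the slide)
BX_PERIOD_S = 25e-9      # ASSUMPTION: 25 ns bunch-crossing period

pixel_area_cm2 = PIXEL_PITCH_CM ** 2
matrix_area_cm2 = pixel_area_cm2 * PIXELS_PER_SIDE ** 2

# Areal rate scaled to one pixel and to the full matrix per BX.
rate_per_pixel_hz = HIT_RATE_PER_CM2 * pixel_area_cm2
rate_per_pixel_per_bx = rate_per_pixel_hz * BX_PERIOD_S
rate_matrix_per_bx = HIT_RATE_PER_CM2 * matrix_area_cm2 * BX_PERIOD_S

# Number of BX cycles a hit must be buffered while waiting for a trigger.
latency_bx = TRIGGER_LATENCY_S / BX_PERIOD_S

if __name__ == "__main__":
    print(f"pixel area:        {pixel_area_cm2:.3e} cm^2")
    print(f"matrix area:       {matrix_area_cm2:.4f} cm^2")
    print(f"rate per pixel:    {rate_per_pixel_hz:.3e} Hz")
    print(f"rate per pixel/BX: {rate_per_pixel_per_bx:.3e}")
    print(f"rate per matrix/BX: {rate_matrix_per_bx:.2f}")
    print(f"latency:           {latency_bx:.0f} BX")
```

Under the 25 ns assumption, the 10 µs latency corresponds to 400 BX cycles, which sets how long each buffered entry stays occupied.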
Comparative study between architectures – Simulation (II)
• Simulation run for square PR configurations (1x1 to 8x8) with zero-suppressed FIFO and distributed latency counters architectures
• Required number of buffer locations computed for each configuration for both 1% and 0.1% overflow probability
• Total number of memory bits per pixel computed with B-ID (16 bits) + full ToT (4 bits) memory organization, without hit map
• Simulation results compared with each other and also with statistical/analytical ones (symmetrical cluster model with “Average 4.22” distribution)
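The two figures of merit above can be sketched in Python as follows. Both functions are illustrative assumptions rather than the analysis code used for the slides: required_depth takes the smallest buffer depth whose exceedance fraction in a list of per-BX occupancy samples does not exceed the target overflow probability, and bits_per_pixel reads "B-ID (16 bits) + full ToT (4 bits)" as one 16-bit B-ID per buffer location shared by the region plus a 4-bit ToT per pixel per location.

```python
def required_depth(occupancy_samples, p_overflow):
    """Smallest depth D such that the fraction of samples with occupancy > D
    is at most p_overflow (assumed definition of overflow probability)."""
    n = len(occupancy_samples)
    depth = 0
    while sum(1 for occ in occupancy_samples if occ > depth) / n > p_overflow:
        depth += 1
    return depth


def bits_per_pixel(depth, pixels_in_region, bid_bits=16, tot_bits=4):
    """Memory bits per pixel, assuming one B-ID per location shared by the
    region and one ToT value per pixel per location (no hit map)."""
    bits_per_location = bid_bits + tot_bits * pixels_in_region
    return depth * bits_per_location / pixels_in_region


if __name__ == "__main__":
    # Synthetic occupancy samples, for illustration only (not simulation data).
    samples = [0] * 900 + [1] * 95 + [2] * 5
    for p in (0.01, 0.001):
        d = required_depth(samples, p)
        print(f"p_overflow={p}: depth={d}, "
              f"bits/pixel (2x2 PR)={bits_per_pixel(d, 4)}")
```

With this reading, the per-pixel cost of the B-ID shrinks as the region grows, while the ToT cost per location stays constant, so the trade-off is driven by how the required depth scales with region size.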
Comparative study between architectures – Buffer occupancy (I)
Comparative study between architectures – Buffer occupancy (II)
A distributed architecture looks slightly more efficient in terms of buffering performance at a behavioral description level
Comparative study between architectures – Lost hits
[Plots: hits lost due to PR busy; hits lost due to PUC busy (ToT + logic delays)]
The currently implemented distributed architecture keeps the PR slightly less busy but features longer logic delays
Conclusion and discussion
• Preliminary simulation results show that the distributed latency counters architecture features a slightly better buffering performance at a behavioral description level
• Hit loss should be investigated in more detail (percentage, graphical comparison)
• Feedback needed from FE-I4 designers
• How much should we rely on simulating DUTs that feature behavioral models for memories?
• What if we insert already developed RTL blocks in place of non-synthesizable code (e.g. dual-port RAMs)?
Development of zero-suppressed FIFO architecture (II)
[Block diagram: Pixel Region (PR). Blocks: PUC matrix (PUC 1,1 … PUC m,n), WRITE LOGIC, PR buffer, TRIGGER MATCHING LOGIC, ACK LOGIC, Derandom. FIFO. Signals and labels: HITS, ToT_out, r_edge, PUC state, wr_en_global, r_edge_or, ack, Trig’d hits, HIT PACKET to col. bus, EXT CNT, TRIGGER.]
Development of zero-suppressed FIFO architecture (III)
[Block diagram: Pixel Region Write Logic (PR Write Logic). Blocks: ToA COUNTER, HIT MAP GENERATOR, COMBINATIONAL LOGIC. Signals and labels: EXT CNT, ToA_out, r_edge, ToT_out, r_edge_or, PUC state, wr_en_global, HIT PACKET.]
New virtual flag signal: PR busy