Development and simulation results of a sparsification and readout circuit for wide pixel matrices

A. Gabrielli (a), F. Giorgi (a,*), F. Morsani (b), M. Villa (a)

(a) University and INFN of Bologna
(b) University and INFN of Pisa

* Corresponding author, on behalf of the VIPIX collaboration.
In future collider experiments, the increasing luminosity and centre-of-mass energy raise challenging problems in the design of new inner tracking systems. In this context we develop high-efficiency readout architectures for large binary pixel matrices, meant to cope with the demanding conditions foreseen in the innermost layers of a tracker [1]. We model and design digital readout circuits to be integrated on VLSI ASICs. These architectures can be realized with different technology processes and sensors: they can be implemented on the same silicon substrate as a CMOS MAPS device (Monolithic Active Pixel Sensor), on the CMOS tier of a hybrid pixel sensor, or in a 3D chip where the digital layer is stacked on the sensor and analog layers [2]. In the present work, we consider a data-push architecture designed for a sensor matrix with an area of about 1.3 cm2 and a pitch of 50 μm. The readout circuit takes full advantage of the high density of in-pixel digital logic allowed by vertical integration. We aim at sustaining a rate density of 100 Mtrack · s−1 · cm−2 with a temporal resolution below 1 μs. We show how this architecture can cope with these demanding conditions by presenting the results of Monte Carlo simulations.
1. Introduction

The readout circuit presented in this work is an evolution of previous architectures that were implemented on silicon by the SLIM5 collaboration, such as the APSEL4D chip [3]. Several enhancements introduced in this new version of the architecture improve the readout efficiency, the timing resolution and the overall required data bandwidth. Our goal is to minimize the sensor dead time due to readout latencies, that is: fast hit mining from the sensor matrix and data compression for fast de-queuing of the hits over the output bus.

2. Hit Mining

The readout has been designed to cope with large matrices of binary pixels. Here we present the study and the results for a matrix of 200 columns by 256 rows, for more than 50,000 active pixels in total. The corresponding active area is 1.3 cm2, considering a 50 μm pitch. The readout algorithms of the periphery rely on dense in-pixel digital logic that can be realized, for example, by exploiting vertical integration technology. This logic is responsible for storing the hit/no-hit information and for time stamping the fired pixels. The sparsified hit extraction takes place through a column-wide pixel data bus, whose lines are shared among the pixel rows and are driven in turn by the pixels of the activated column. The readout activates in sequence only the columns that need to be scanned, following a time order. To perform this time-sorted extraction of the hits, the readout queries the matrix for a certain time stamp and receives back the pattern of the fired columns that need to be activated for readout and reset. A single clock cycle is enough to extract and encode all the hits from the Active Column. The hit extraction procedure is represented in Fig. 1.
Figure 1. Readout scheme.
The whole sensor matrix is divided vertically into 4 sub-matrices, each with its own independent readout circuit, for higher parallelism. Reading out four active columns in parallel in one clock cycle means that 1024 pixels are scanned at each clock edge, i.e. about 50 Gpxl/s with a 50 MHz read clock.
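As a behavioural illustration of this time-sorted, column-wise extraction, the short Python sketch below models one readout step of a single sub-matrix: the readout asks the matrix which columns hold hits with a given time stamp and then empties each active column in one simulated clock cycle. Names, sizes and helper functions are our own illustrative assumptions, not the actual in-pixel logic of the chip.

    # Behavioural sketch of the time-sorted hit extraction (illustrative assumptions,
    # not the actual RTL). Each pixel latches the time stamp of the crossing that fired it.
    N_COLS, N_ROWS = 50, 256            # one of the four sub-matrices (assumed split)
    READ_CLOCK_HZ = 50e6                # 50 MHz read clock, as quoted in the text

    matrix = [[None] * N_ROWS for _ in range(N_COLS)]   # None = no hit, else 8-bit TS

    def fired_columns(ts):
        # Pattern of the columns holding at least one hit with time stamp ts.
        return [c for c in range(N_COLS) if any(v == ts for v in matrix[c])]

    def read_active_column(c, ts):
        # One clock cycle: extract and reset every pixel of column c fired at ts.
        hits = []
        for r in range(N_ROWS):
            if matrix[c][r] == ts:
                hits.append((c, r))      # x = column address, y to be zone-encoded
                matrix[c][r] = None      # pixel reset
        return hits

    matrix[10][20] = matrix[10][21] = 3  # a small cluster tagged with TS = 3
    for col in fired_columns(3):
        print(read_active_column(col, 3))

    # Four sub-matrices read in parallel: 4 x 256 = 1024 pixels per clock edge,
    # i.e. roughly 50 Gpxl/s at 50 MHz.
    print(4 * N_ROWS * READ_CLOCK_HZ / 1e9, "Gpxl/s")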
3. Hit Encoding

All the hits found on the Pixel Data Bus, driven by the active column, can be read out in one clock cycle independently of the pixel occupancy, thanks to components called Sparsifiers. The horizontal x coordinate is encoded as the binary address of the current Active Column, while the vertical y coordinate is encoded with a zone algorithm in order to take advantage of the presence of clustered events. The active column is divided into zones of 8 pixels; each sparsifier is fed by 64 data lines of the pixel data bus, corresponding to 8 zones. At each clock cycle the sparsifier puts on its output bus the addresses of the zones that contain at least one fired pixel, with the corresponding zone pattern appended. Compared to a direct binary encoding of the xy coordinates, the zone encoding technique increases the length of each hit-word, but in the case of clustered events it reduces the number of words to be sent. The hit-word grows by W − log2 W bits, where W is the zone width: a small enlargement for small W, while a single word can encode up to W fired pixels. The hit encoding also takes advantage of the time-sorted extraction of hits from the matrix. This feature makes the time tagging of every single hit-word redundant, so a special Time Stamp Header word is interposed between the hit-words that belong to different time windows.
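A minimal sketch of the zone encoding, under the assumptions stated above (W = 8 and 256 rows per column, so a 5-bit zone address plus an 8-bit zone pattern replace the 8-bit y coordinate); the function name and data layout are illustrative only:

    W = 8   # zone width

    def zone_encode(column_pattern):
        # column_pattern: one 0/1 entry per row of the active column.
        # Emit one (zone address, zone pattern) word per zone with at least one fired pixel.
        words = []
        for z in range(len(column_pattern) // W):
            zone = column_pattern[z * W:(z + 1) * W]
            if any(zone):
                pattern = sum(bit << i for i, bit in enumerate(zone))
                words.append((z, pattern))
        return words

    col = [0] * 256
    col[40] = col[41] = 1                # a 2-pixel cluster inside a single zone
    print(zone_encode(col))              # [(5, 3)]  -> one word encodes both hits

    col = [0] * 256
    col[47] = col[48] = 1                # the same cluster straddling a zone boundary
    print(zone_encode(col))              # [(5, 128), (6, 1)]  -> two words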
4. Hit De-queuing

The hit storage element is called Barrel; it is basically an asymmetric FIFO with a dynamic input width. We developed this particular kind of hit container because it has to store a variable number of hit-words in one clock cycle. Each sparsifier is connected to its dedicated Barrel2 (Barrel of level 2); the hit de-queuing system then follows a tree data flow where each sparsifier is a leaf and the Output Data Bus is the root. The nodes that gather data from the four Barrel2 of a sub-matrix are called Concentrators. They convey the incoming data into the Barrel1, preserving the time sorting of the hits (see Fig. 2).
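The following toy model illustrates the idea of an asymmetric FIFO with dynamic input width: a variable number of hit-words is accepted in one clock cycle while words are drained at a fixed rate. Depth, interface and overflow handling are our own assumptions, not the silicon design:

    from collections import deque

    class Barrel:
        # Toy model of the Barrel hit container (illustrative only).
        def __init__(self, depth):
            self.depth = depth
            self.fifo = deque()

        def push_cycle(self, words):
            # Store all the words produced by the sparsifier in this clock cycle.
            if len(self.fifo) + len(words) > self.depth:
                raise OverflowError("Barrel overflow: the readout cannot keep up")
            self.fifo.extend(words)

        def pop_cycle(self):
            # De-queue one word per clock cycle towards the Concentrator.
            return self.fifo.popleft() if self.fifo else None

    b = Barrel(depth=32)
    b.push_cycle([(5, 3), (6, 1)])       # two hit-words written in the same cycle
    print(b.pop_cycle(), b.pop_cycle(), b.pop_cycle())   # (5, 3) (6, 1) None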
Figure 2. Hit readout system for a sub-matrix.
The four Barrel1 in turn feed a Final Concentrator, not shown in the figure, which performs a round-robin emptying cycle over the four sub-matrices in order to drive the output data bus of the chip.
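A possible sketch of the round-robin emptying cycle follows; the exact arbitration policy of the Final Concentrator is not detailed in the text, so the one-word-per-turn scheme below is only an assumption:

    from collections import deque

    def round_robin_drain(barrel1_queues):
        # Visit the four sub-matrix queues in turn and yield words for the output bus.
        queues = [deque(q) for q in barrel1_queues]
        while any(queues):
            for q in queues:
                if q:
                    yield q.popleft()

    sub_matrices = [[("TS", 7), (0, 3)], [("TS", 7), (12, 1)], [], [("TS", 7), (30, 129)]]
    print(list(round_robin_drain(sub_matrices)))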
5. The Output Stage

The common output stage drives the output data bus of the chip. The architecture developed is data-push, which means that all hits extracted from the matrix are sent out of the chip. It is possible to show with a few calculations the improvement in terms of bandwidth brought by the zone encoding technique and the time sorting. A direct xy-t hit encoding in a 200×256 matrix requires 24 bits, assuming an 8-bit time stamp. The produced data rate R is then:

R = 24 bits × 100 MHz cm−2 × 1.3 cm2 = 3.1 Gbps.

If we introduce the time sorting of the hits, and assume that each leading TS word is followed by about 10 hits (a value not far from that expected with a flux of 100 MHz cm−2, A = 1.3 cm2 and a BC of 1 μs), the hit-word length drops to 16 bits and the rate becomes R = 16 × 100 × 1.3 × 1.1 = 2.3 Gbps, where the 1.1 factor is the rate increment due to the presence of the TS words. Let us now also introduce the zone sparsification algorithm. A cluster factor of 4 is assumed to be already included in the 100 MHz cm−2 hit flux value. To simplify the calculation we assume that the clusters have a fixed 2×2 pixel shape. The number of hit-words that need to be sent depends on how the cluster shape overlaps the grid of zones: there are 8 possible geometrical configurations, and only in 1 of them does the cluster overlap 4 different zones, while in the remaining 7 configurations only 2 hit-words are produced. The data rate R can then be evaluated as a weighted average over the possible configurations:

R = [2(L + Δ) × 7/8 + 4(L + Δ) × 1/8] · Φ · A · 1.1 = 1.7 Gbps,

where L = 16 is the hit-word length, Δ = 5 is the increase in word length due to the zone sparsification (for the adopted zone width W = 8, see Section 3), Φ = 25 MHz cm−2 is the track flux and A = 1.3 cm2 is the active area. Going from 3.1 Gbps to 1.7 Gbps corresponds to a considerable 45% reduction of the output bandwidth.
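The three estimates above can be reproduced with a few lines of Python (a plain re-derivation of the numbers in the text, not a simulation of the chip):

    hit_flux = 100e6      # hit flux in Hz/cm^2 (cluster factor of 4 already included)
    area = 1.3            # active area in cm^2
    ts_overhead = 1.1     # +10% for the Time Stamp Header words

    # (a) direct x-y-t encoding: 8 + 8 + 8 = 24 bits per hit
    print(24 * hit_flux * area / 1e9, "Gbps")                        # ~3.1

    # (b) time-sorted readout: the 8-bit time stamp is dropped from each hit-word
    print(16 * hit_flux * area * ts_overhead / 1e9, "Gbps")          # ~2.3

    # (c) zone encoding: 2x2 clusters, 2 words in 7/8 of the alignments, 4 in 1/8
    L, delta, track_flux = 16, 5, 25e6
    words_per_track = 2 * 7 / 8 + 4 * 1 / 8
    print(words_per_track * (L + delta) * track_flux * area * ts_overhead / 1e9, "Gbps")  # ~1.7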
6. Efficiency

A Monte Carlo simulation has been performed in order to evaluate the efficiencies. We report efficiency measurements that refer only to the dead time introduced by the readout, assuming 100% sensor efficiency and a pixel reset time of a few ns. The graph in Fig. 3 shows the evaluated efficiencies plotted against the length of the time window (BC) for several read clock frequencies.
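For readers who want to experiment with the idea, the deliberately simplified toy model below estimates a readout-induced dead-time efficiency: a fired pixel is assumed to stay busy until its time window has been scanned, and any further hit on a busy pixel is counted as lost. The busy-time model, the latency parameter and all the numbers are our own assumptions; this is not the Monte Carlo used for Fig. 3.

    import random

    def toy_efficiency(rate_per_pixel_hz, bc_s, latency_windows=2, n_hits=100_000):
        # A pixel is busy from its hit until its BC window plus latency_windows has elapsed.
        n_pixels = 200 * 256
        busy_until = {}
        lost, t = 0, 0.0
        for _ in range(n_hits):
            t += random.expovariate(rate_per_pixel_hz * n_pixels)   # next hit in the matrix
            pix = random.randrange(n_pixels)
            if t < busy_until.get(pix, 0.0):
                lost += 1                                            # hit on a busy pixel: lost
            else:
                window_end = (int(t / bc_s) + 1) * bc_s
                busy_until[pix] = window_end + latency_windows * bc_s
        return 1 - lost / n_hits

    rate = 100e6 * 1.3 / (200 * 256)     # average hit rate per pixel at 100 MHz/cm^2
    for bc in (0.5e-6, 1e-6, 2e-6):
        print(bc, toy_efficiency(rate, bc))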
Figure 3. Efficiencies of the readout architecture as a function of the BC clock, evaluated by simulations. Note the y-axis scale.
REFERENCES

1. The SuperB Conceptual Design Report, INFN/AE-07/02, SLAC-R-856, LAL 07-15. Available online at: http://www.pi.infn.it/SuperB
2. V. Re et al., Nucl. Instr. and Meth. in Phys. Res. A, doi:10.1016/j.nima.2010.05.039.
3. A. Gabrielli for the SLIM5 collaboration, Nucl. Instr. and Meth. in Phys. Res. A 604 (2009) 408-411.
4. G. Rizzo for the SLIM5 collaboration, Nucl. Instr. and Meth. in Phys. Res. A 576 (2007) 103-108.