Nuclear Instruments and Methods in Physics Research A 535 (2004) 48–56

Overview of trigger systems

Volker Lindenstruth, Ivan Kisel

Kirchhoff Institute for Physics, Ruprecht Karls University, Im Neuenheimer Feld 227, 69120 Heidelberg, Germany

Available online 28 August 2004

Abstract

This article presents an overview of trigger systems in existing and planned modern experiments, the physics observables used for triggering, and the techniques which are deployed to process the detector information within the often tight real-time budget. Examples of presented trigger systems include large-scale on-detector processing and readout systems based on microelectronics, systolic trigger processors based on field programmable gate arrays (FPGAs), high-rate cluster systems and high-level triggers, which typically perform full event reconstruction online.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Trigger; FPGA; DSP; Systolic processor; PC farm; Data flow

1. Introduction

The majority of physics reactions studied in modern experiments are rare. For instance, the planned experiments at the CERN LHC [1] expect an event rate of 10^9 Hz at the design luminosity of L = 10^34 cm^-2 s^-1. Depending on its mass, the Higgs boson is expected to be produced at rates between 10^-1 and 10^-2 Hz, resulting in a required minimum selectivity of 10^-11. Consequently, trigger systems are used at all levels of modern experiments and are typically designed together with the detector systems to match the physics requirements of the experiment.
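As a rough cross-check of these numbers, the required selectivity is simply the ratio of the interesting rate to the interaction rate. The short sketch below illustrates how a trigger hierarchy accumulates such a selectivity; the per-level rejection factors are purely illustrative assumptions, not parameters of a specific experiment.

# Back-of-the-envelope check of the rates and selectivity quoted above.
interaction_rate = 1e9          # Hz, LHC event rate at design luminosity
higgs_rate = 1e-2               # Hz, lower end of the expected Higgs production rate

print(f"required selectivity ~ {higgs_rate / interaction_rate:.0e}")   # ~1e-11

# A hierarchy reaches such a selectivity as the product of per-level acceptances,
# e.g. three hypothetical levels keeping fractions of 1e-3, 1e-4 and 1e-4:
levels = [1e-3, 1e-4, 1e-4]
rate = interaction_rate
for i, keep_fraction in enumerate(levels, start=1):
    rate *= keep_fraction
    print(f"after level {i}: accepted rate = {rate:.0e} Hz")
print(f"overall selectivity = {rate / interaction_rate:.0e}")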


Typically (Fig. 1), a hierarchy is built where the lower trigger levels reject events based on obvious and simpler criteria, while the higher trigger levels implement more complex analysis and selection techniques with more time available for processing. However, they only operate on the subset of events which are not rejected at the lower trigger levels. Trigger systems are therefore hierarchical: higher levels cannot undo a reject of a lower level. In general, various parameters describe triggers, in particular their selectivity and efficiency. The trigger latency and the required buffer space are correlated; once the detector front-end buffers are designed, this results in a maximum latency for the lower trigger layers. On the other hand, a large trigger latency supported by the front-end buffers allows the coalescing of events and consequent trade-offs between event rate, event size and latency. This is particularly relevant for first-level triggers, where the event sizes are relatively small and the event rates high. The trigger processing time varies as a function of the event type and, therefore, trigger systems may not produce results in the order in which the events were received. Some systems implement a reordering stage, allowing for simpler buffer management (FIFOs) at the lower levels, but this reordering increases the corresponding average trigger latency. Triggers can be fully pipelined or may produce dead time during their processing. In general, with increasing time after the event, the triggered event rate and the aggregate RAW data rate decrease, while the amount of information considered increases. Furthermore, the event size, decision complexity, trigger latency and corresponding buffer requirements increase, as does the selectivity of the trigger, which can range from 10^-1 to less than 10^-5 in one level.

Trigger systems can be categorized with respect to their architecture and available trigger latency. The first-level triggers typically have short available latencies in the range of μs and are implemented either as integrated triggers, being part of the signal processing and readout system, or as systolic triggers, implementing complex architectures of field programmable gate arrays (FPGAs). Those systems may also include digital signal processors (DSPs) for complex arithmetic. High-level triggers (HLT) typically involve PCs operating at larger latencies and lower event rates, performing complex operations similar to an off-line event reconstruction. In the following, each of these architectures will be outlined and detailed using a concrete example.
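The correlation between trigger latency and buffer space mentioned above can be made concrete with a minimal sketch; all numbers below are illustrative assumptions and do not describe any particular experiment.

# Minimal sketch of the latency/buffer correlation of a generic trigger hierarchy.
bunch_crossing_rate = 40e6     # Hz, assumed input rate of the front-end pipeline
l1_latency = 3e-6              # s, assumed fixed first-level trigger latency
l1_accept_rate = 100e3         # Hz, assumed first-level accept rate
event_fragment_size = 2e3      # bytes per front-end, assumed

# A synchronous front-end pipeline must hold every crossing until the L1 decision arrives:
pipeline_depth = int(bunch_crossing_rate * l1_latency)
print(f"required front-end pipeline depth: {pipeline_depth} cells")        # 120 cells

# The readout buffers behind L1 must absorb the accepted data stream:
readout_bandwidth = l1_accept_rate * event_fragment_size
print(f"required readout bandwidth per front-end: {readout_bandwidth/1e6:.0f} MB/s")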

Fig. 1. A generic trigger hierarchy (detector and digitization, L1 with the detector front-end pipeline, L2 with the front-end pipeline readout and readout buffers, detector readout, and the high-level trigger with event buffers).

2. Integrated triggers

Recent developments in GEM detector readout implement a VLSI chip that uses its top metal layer as pads and integrates an appropriate preamplifier in the silicon layers below, thereby achieving unprecedented noise performance [2,3]. In general, the increasing channel count of modern detectors drives the electronics onto the detectors, possibly including trigger functionality. The ALICE TRD [4] is one example of a highly integrated readout and trigger system; it consists of six concentric, cylindrical layers of drift chambers. Given the high particle multiplicity of the ALICE heavy-ion experiment, the TRD requires 1.2 million analog channels. The trigger is based on the selection of high-p_t electron/positron pairs. Their identification relies, in addition to the transition radiation, on the fit of a particle tracklet in each detector layer, the selection of possible high-p_t tracklets based on their geometry, and the shipping of the selected tracklets to a global tracking unit for merging into particle tracks. The TRD is designed to process up to 2 × 10^5 track candidates simultaneously. Fig. 2 sketches the timeline of the various processing stages in the TRD trigger system. The drift time is 2 μs. During this time, the charge is digitized at 10 MSPS into 10 bits, filtered digitally and stored in the event buffers. All clusters are identified and the cluster position is computed. Further, the sums Σx_i, Σy_i, Σx_i², Σy_i² and Σx_i y_i, where x denotes the drift direction and y the pad position, are determined simultaneously for each hit candidate.
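The running sums above can be updated cluster by cluster while the charge drifts, which is what allows them to be formed on the fly during digitization. The sketch below shows this accumulation in plain Python; the data structure is a simplified assumption, whereas in the TRD this runs in the on-detector tracklet preprocessor, one accumulator per tracklet candidate.

# Per-tracklet running sums accumulated during the drift time, as described above:
# x is the drift-time coordinate of a cluster, y its reconstructed pad position.
class TrackletAccumulator:
    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def add_cluster(self, x, y):
        """Update all five sums with one cluster (one time bin)."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y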


Fig. 2. The ALICE Transition Radiation Trigger: timeline of drift and digitization (PASA, ADC), tracklet preprocessing (TPP) into the event buffers, tracklet fitting in the tracklet processor (TP), and data shipping to the tracklet merger (TM) and the global tracking unit (GTU).

During the following 1.5 μs, the track parameters are fit by a custom 120 MHz RISC processor per track candidate, which computes tilt = (N·Σx_i y_i − Σx_i·Σy_i) / (N·Σx_i² − (Σx_i)²) and position = (Σx_i²·Σy_i − Σx_i·Σx_i y_i) / (N·Σx_i² − (Σx_i)²), where N is the number of hits on a tracklet. The tilt angle defines the particle's transverse momentum and is used for selecting high-p_t candidates. All tracklet candidates are then shipped off the detector during the following 600 ns, using 1080 optical links running at 2.5 Gbit/s, to the global tracking unit for track merging and the trigger decision, which are performed during the last 1.8 μs of the available 6 μs time budget. This trigger latency budget is defined by the TPC drift time, since the TPC gas volume serves as the analog storage medium of the TPC track information. The TRD trigger latency corresponds to an effective 7% shortening of the 88 μs TPC drift time. The electronics are organized in groups of 18 channels. Two chips are mounted on one low-cost multi-chip module (MCM). The first chip implements a low-noise preamplifier, while the second implements the ADCs, digital filters, the tracklet processing and the readout functionality. The performance of the 10-bit 10-MSPS ADC on the digital chip was measured as 9.5 effective bits (ENOB). The integrated PASA and digitizer chips are operational, demonstrating the feasibility of such highly integrated systems. The ADC requires 9 mW and 0.1 mm² of silicon per channel. The TRD implements a total of 70848 MCMs mounted directly on the detector. Another highly integrated detector readout system, however without trigger functionality, is the ALICE TPC readout. These examples demonstrate the feasibility and advantages of highly integrated readout and trigger systems. Original concerns about operating an ADC together with fast-switching digital circuitry on the same chip have proven unfounded, provided care is taken in the design of the floor plan, power routing and clock phases.
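The two fit expressions are the standard closed-form straight-line fit evaluated from the accumulated sums, which is why no iteration is needed on the detector. A plain-Python cross-check of the formulas given above (fixed-point arithmetic and the actual RISC instruction set are omitted; the sample clusters are invented):

# Straight-line tracklet fit from the five running sums, as in the text:
# tilt = (N·Σxy − Σx·Σy) / (N·Σx² − (Σx)²), position = (Σx²·Σy − Σx·Σxy) / (N·Σx² − (Σx)²)
def fit_tracklet(n, sx, sy, sxx, sxy):
    denom = n * sxx - sx * sx
    tilt = (n * sxy - sx * sy) / denom          # slope along the drift direction
    position = (sxx * sy - sx * sxy) / denom    # pad position (intercept) of the tracklet
    return tilt, position

# Invented clusters lying on y = 1.00 + 0.02·x:
pts = [(0, 1.00), (1, 1.02), (2, 1.04), (3, 1.06)]
n = len(pts)
sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
print(fit_tracklet(n, sx, sy, sxx, sxy))        # ≈ (0.02, 1.00)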

3. Microelectronics trends

Integrated trigger systems and the corresponding detector architectures are possible only due to recent advances in the field of microelectronics. Given this ongoing trend [5], the question arises as to what kinds of architectures the next silicon generations might make possible, as projected and documented by the International Technology Roadmap for Semiconductors [6]. The projections made here can be considered conservative. The driving factor is the integration density, which is expected to reach minimum gate lengths of 9 nm and effective oxide thicknesses (EOT) between 0.4 and 0.5 nm in 2016. Local clock rates are expected to approach 30 GHz for high-performance logic processes, while their core supply voltage drops to 0.4 V, driven by the EOT and the breakdown field of 300 kV/cm. The gate leakage currents are expected to exceed current levels by three orders of magnitude. Low-power processes project gate lengths of 11 nm and EOT of 0.6–1.0 nm with core supply voltages of 0.6 V. The decreasing structure sizes result in a higher inherent radiation tolerance. Given these facts, on-detector analog electronics may not use the smallest feature sizes. However, deep sub-micron silicon may have interesting benefits in the area of fast serial readout. Fast low-power serializers operating at or beyond 2 Gbit/s have become feasible, permitting low-cost optical readout that could even be integrated into a readout and trigger chip. The increasing non-recurring setup cost for masks will drive the development of multi-purpose analog chips and force larger communities to jointly develop fewer chips. Detector communities may synchronize their developments such that joint multi-project wafer runs can be orchestrated. Deep sub-micron processes will result in increasing memory integration densities, allowing for increased trigger latencies at constant cost. FPGAs, on the other hand, are based on internal memory matrices; their integration density closely follows Moore's law. Similar arguments also lead to increased switching performance of FPGAs. It is a common trend to add customizable memory banks to FPGAs as well as specialized logic such as fast (de)serializers and even complete processors. This is possible because these features now require little silicon real estate. For applications in the networking environment, built-in memories in FPGAs can often be configured as associative (cache) memories. On the subject of commodity processors, performance improvements are expected as well. These will be only partly due to increasing clock rates, because today large fractions of the processor chips are cache memories, and multi-threaded architectures are expected to become increasingly abundant.

4. Systolic FPGA trigger processors

This section discusses two trigger architectures that use FPGAs for the processing of signals.


FPGAs are particularly useful for the first combinatorial parts of the analysis, where large amounts of highly localized data are massively processed in parallel.

The HERA-B first-level trigger (FLT) [7] reduces the 10 MHz event rate by a factor of 200 with a maximum delay of 1.2 μs, by selecting events based on invariant mass, particle multiplicities, etc., derived from online track reconstruction using a Kalman filter. It operates in three phases: pre-trigger, tracking and decision. Pre-triggers originate from three sources: coincidences of pads and pixels in the third and fourth super-layer of the HERA-B muon detector; high-p_t clusters in the calorimeter; and, finally, coincidence patterns between the three pad chambers in the magnet. The three pre-trigger systems produce messages which define a region of interest (RoI) and an estimate of the momentum. When operating at design rates, several such RoIs are expected per event. Messages from the pre-triggers are routed to a parallel pipelined network of custom FPGA processors, which attempt to track them through four of the six main tracker super-layers behind the magnet. The processors map the super-layers geographically; each processor takes inputs from three views of a contiguous region of a single super-layer. In each processor, a search is made for hits inside a RoI and, when found, a new message is generated with refined track parameters and sent to the next processor. Messages arriving at the furthest upstream super-layer are tracks with parameters determined with a typical accuracy of a single cell width in the outer tracker and four strips in the inner tracker. These messages are collected in a single processor where they are sorted by event number. A trigger decision is made based on the kinematics of single tracks and pairs of tracks. The HERA-B FLT is a parallel and pipelined hardware processor system consisting of 60 Track Finding Units (TFU), 4 Track Parameter Units (TPU) and one Trigger Decision Unit (TDU). The filter process needs a total of about 100,000 bits of information per event, which is produced every 96 ns (1 Tbit/s). The data is transmitted over about 1400 optical links of 900 Mbit/s each. The FLT is realized in FPGAs and fast SRAMs. For the RoI definition and track search, complex arithmetic is required, including trigonometric functions and divisions, which are implemented as look-up table memories for the defined parameter ranges. The FLT operates at 50 MHz and is highly pipelined. It is able to reconstruct up to 500 million tracks per second. The efficiency of the system is at the level of 55% for electron tracks and 30% for muon tracks, resulting in 1000–1500 J/ψ per hour.

The H1 track trigger [8] uses a different, pattern-recognition-based scheme for online fast track reconstruction. The Fast Track Trigger (FTT) is integrated into the first three levels (L1–L3) of the H1 trigger. The FTT functionality is based on the central jet chamber information derived from four groups of three wire layers each. It is designed to handle up to 48 tracks, which is sufficient for about 98% of the events of interest. The FTT permits reconstruction of three-dimensional tracks down to 100 MeV/c within the L2 latency of 23 μs, with a momentum resolution of 5% (at 1 GeV/c). The four wire groups are processed independently, and each group of three wires is digitized at 80 MSPS. The time bin of a hit determines its drift time and the corresponding position. At the L1 stage the hit information is evaluated at 20 MHz granularity, defining a 3 × 20 bit field for the 1 μs drift time of the three wires. Only a small subset of hit patterns corresponds to a valid track candidate. A direct look-up table would require 2^60 bits, with a negligible number of bits being set. Modern FPGAs implement associative memories, which compare an input bit pattern (here 60 bits) with all stored bit patterns and return the associated data (the track parameters p_t, φ) in case of a match. Such memories are used as caches in all modern processors or in switches for routing information (CAM) [9]. The four identified tracklets are matched with a sliding-window technique in the p_t–φ plane. The final track parameter optimization is done at 80 MHz granularity using non-iterative fits, which are implemented in stand-alone digital signal processors (DSPs) at the second level. The required data flow is orchestrated by FPGAs. As in the case of CAMs, modern FPGAs tend to implement fast RISC processors, although most come without a floating-point unit. The H1 third-level trigger implements the functionality of a high-level trigger.
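The associative look-up described above can be mimicked in software by a hash map from hit patterns to track parameters; the sketch below shows this analogue. The example patterns and parameters are invented, and a real FPGA CAM compares the input against all stored entries in parallel within one clock cycle rather than hashing.

# Software analogue of the CAM look-up used in the H1 FTT L1 stage: a 60-bit hit
# pattern (3 wire layers × 20 time bins) is matched against the set of patterns a valid
# track can produce, returning the associated (p_t, phi).
def make_pattern(layer0, layer1, layer2):
    """Pack three 20-bit layer masks into one 60-bit key."""
    return (layer0 << 40) | (layer1 << 20) | layer2

# "Programming" phase: enumerate valid track patterns and their parameters (illustrative).
cam = {
    make_pattern(1 << 10, 1 << 10, 1 << 10): (1.2, 0.31),   # straight, high-p_t candidate
    make_pattern(1 << 10, 1 << 11, 1 << 12): (0.4, 0.33),   # curved, lower-p_t candidate
}

def lookup(hit_pattern):
    """Return (p_t, phi) if the pattern matches a valid track candidate, else None."""
    return cam.get(hit_pattern)

print(lookup(make_pattern(1 << 10, 1 << 11, 1 << 12)))      # (0.4, 0.33)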

5. High-rate PC farms

Integrated and FPGA hardware can be highly customized, but development cycles are long, which forces an early technology freeze and thereby inhibits the adoption of new technologies. In contrast, fast-developing, low-cost commodity PCs have a very short time to market for new silicon technologies. Furthermore, the development of trigger algorithms does not require special infrastructure. This raises the question of to what extent PCs can be used for high-rate real-time trigger systems. The prototype discussed below was developed with the requirements of the LHCb vertex trigger in mind [10], requiring an event processing rate of 1 MHz for event sizes of about 6 kB, originating from about 20 data sources (readout units). The Level-1 trigger of the LHCb experiment [10] uses a cellular automaton algorithm to reconstruct tracks in the vertex detector. The algorithm takes about 5 ms per event on a 1 GHz Pentium III processor. Given the data set size of 6 kB, which fits well into the first-level cache of modern processors, one can expect the performance to scale with Moore's law. Furthermore, about 75% of this computing time is spent on the combinatorial hit selection and can be offloaded into an FPGA. Note that in this system every μs of processing time requires one CPU. If the readout units sent their data fragments unsupervised, the aggregate bandwidth of 6 GB/s would also appear as the peak bandwidth into a single target node, because all readout units receive their sub-events of a given event at about the same time and would therefore transmit to the same destination simultaneously. One solution to this problem is traffic shaping. In this particular example [11], traffic shaping is orchestrated by a scheduling unit, which controls the sending of all readout units in such a way that at any time only one sender transmits to a particular receiver. The synchronization of the different readout units is done by means of a simple round-robin point-to-point message passing link, TagNet [12], which allows orchestration of any transfer sequence. Given the high event processing rate, each readout unit has to transmit a complete sub-event within less than 1 μs. The prototype implements direct transfers from a readout unit PCI card into a PCI network interface card without processor intervention (hardware-initiated DMA, HDMA). The particular network chosen here was SCI [13]; however, the concept only requires a network technology that translates a write to a physical address inside the computer into a remote memory write operation (RDMA). The network data flow is matched best by a torus or mesh (R rows × C columns, R ≤ C) with geometric routing. Such topologies have the additional advantage of not requiring complex switches; they are instead implemented as a plurality of multi-port network interfaces. The trigger system prototype (Fig. 3) is based on PCs connected by SCI in a two-dimensional torus topology with X–Y routing, with 3 readout units (nodes 00, 10 and 20) connected to 27 compute nodes. TagNet orchestrates the pipelined data flow such that all readout units simultaneously send to independent columns. Therefore, each event requires R time slots for its complete transmission to one node. A compute node is typically addressed again after R × C trigger signals. It should be noted that the TagNet scheduler can easily exclude certain nodes that are temporarily overloaded, or even match different processing powers if C > R.

The bottom (solid) curve in Fig. 4 shows that a maximum frequency of 2.56 MHz can be achieved for sending 64 B bursts in HDMA mode. The corresponding PCI transfer comprises 7 wait cycles, a maximum of 16 data phases, and 10 idle cycles. The wait cycles are caused by the network interface, the idle cycles by the PCI DMA master in the FPGA-based readout unit. Similar measurements can be made on the receiver side, where the network interfaces to the computer. The idle cycles can be eliminated [11] by operating two interlaced PCI readout units in one PC (dashed curve). The dotted curve is the theoretical absolute maximum rate. This prototype demonstrates the capabilities of high-rate PC farms, provided some care is taken in the design of the network data flow and both the network feeding and the actual network transfers are scheduled. The scheduling infrastructure can easily be implemented using low-cost FPGAs. Provided enough buffering is foreseen, the virtual transaction rate can be increased further by coalescing events; in that case, the aggregate network bandwidth becomes the only limiting factor.
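Why coalescing raises the virtual transaction rate can be seen from a simple amortization model: each DMA burst pays a fixed per-transfer overhead, so packing several events into one burst spreads that overhead. The model below is illustrative only and does not reproduce the measured curves of Fig. 4; the overhead cycle counts are taken from the text, whereas the 66 MHz PCI clock and 4 bytes per data phase are assumptions made for the example.

# Illustrative amortization model for event coalescing over a PCI-based readout path.
PCI_CLOCK = 66e6            # Hz, assumed
OVERHEAD_CYCLES = 7 + 10    # wait + idle cycles per transfer (from the measurement above)
DATA_PHASE_BYTES = 4        # consistent with 16 data phases for a 64 B burst
EVENT_FRAGMENT = 64         # bytes per event fragment

def virtual_event_rate(coalesced_events):
    data_phases = coalesced_events * EVENT_FRAGMENT / DATA_PHASE_BYTES
    cycles_per_transfer = OVERHEAD_CYCLES + data_phases
    return PCI_CLOCK / cycles_per_transfer * coalesced_events

for k in (1, 2, 4, 8):
    print(f"{k} event(s) per transfer -> {virtual_event_rate(k)/1e6:.2f} MHz virtual rate")
# As k grows, the rate approaches the raw bandwidth limit PCI_CLOCK*DATA_PHASE_BYTES/EVENT_FRAGMENT.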

Fig. 3. A torus-based high-rate trigger farm prototype (three readout units RU 00, 10, 20 and 27 compute nodes in a two-dimensional x–y torus, with data destinations DU and a TagNet scheduler orchestrating the transfers).

Fig. 4. Measured and theoretical event rate limits for orchestrated hardware-initiated DMA.

6. High-level trigger

The class of high-level triggers (HLT) performs a full event reconstruction online. In general, two philosophies are pursued: the first combines an entire event into one HLT node for processing [14,15], with the associated requirement for traffic scheduling and shaping; the second processes the data where it is available, exploiting a high degree of data locality in the event processing. The latter is typically used for heavy-ion experiments with comparably large event sizes.

The STAR high-level trigger [16] focuses on the event reconstruction of the STAR TPC, the central detector of the experiment. For an Au+Au run at √s = 200 GeV, a tracking efficiency of more than 60% for pions with p_t ≥ 1 GeV/c and a momentum resolution of 5% for 1 GeV/c pions was achieved. The vertex in the STAR L3 is reconstructed with a resolution of σ = 2.5 mm [17]. Using the energy loss as a function of rigidity allows the online selection of doubly negatively charged particles, such as ³He antinuclei. Fig. 5 shows a measured trigger efficiency plot for this complex trigger [17].

Fig. 5. Trigger efficiency for anti-helium.

The ALICE HLT [18] connects about 400 commodity PCs with a fast and low-overhead network for the online event reconstruction. Examples of HLT capabilities are dE/dx, high-p_t electrons, di-electron or di-muon events, hadronic charm decays via secondary vertices, and jets. Intelligent data compression, such as pile-up removal in the LHC proton–proton mode, forms a second class of applications. The ALICE HLT is designed as a hierarchy of stages, where parts of the data are processed locally and only the resulting output data is forwarded to the next stage. For instance, the RAW data is reduced to space points
in the HLT front-end processors, where the RAW data is received and stored. At each higher stage, the data may be merged with data from other segments of the detector system, forming an analysis tree. At the final stage, the fully reconstructed event is available for the described signature searches. This architecture reduces the data flow and connectivity requirements without incurring any traffic shaping or synchronization overhead, while imposing minimal requirements on the networking infrastructure. The HLT communication is orchestrated by a generic, low-overhead software framework consisting of separate components which communicate via a generic interface. Any analysis tree can be set up based on the data dependencies and the availability of processing nodes. The configuration is dynamic and can be changed at runtime for load balancing and fault tolerance. In addition to templates for application-specific components, the framework also includes components to manage and define the flow of data in such a system. The heavily I/O-bound hit selection and cluster finder functionality required for all first-level processing typically performs rather poorly on fast-clocked processors; another such application is the Hough transform. By implementing a PCI-based FPGA card, the PC is therefore complemented with a custom signal co-processor. As the RAW data must be received into the high-level trigger anyway, the same device can be used as the interface to the typically optical detector links. The memory of the PC becomes the event buffer, receiving the RAW data by FPGA-initiated DMA transfers, while the FPGA also performs the first-level processing, assisted by the host processor. Typical high-level triggers implement farms with several hundred PCs, making fault tolerance an integral part of the system. The HLT framework allows dynamic reconfiguration and therefore supports the online remapping of any failing node or link. The results of a node failure test are shown in Fig. 6, which displays the network throughput of each node in a configuration where the data stream fans out from one source A to three nodes B…D and merges back into one destination E. As a consequence of node D failing (1), the sending rate drops (2) and the receiving rate of the remaining processors B, C increases. After a configured time-out, the hot stand-by F is automatically used as a replacement (3) and the rates return to normal. During such a reconfiguration, all events that were lost in failing nodes are automatically reprocessed, thereby avoiding any loss of data in the system.

Fig. 6. High-level trigger error recovery (the plots are offset).
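The fan-out/merge topology and the failover behaviour described above can be sketched in a few lines. The component names A–F follow the text; the round-robin dispatch and the tiny "framework" below are invented for illustration and are not the actual ALICE HLT framework interface.

# Highly simplified sketch of the fault-tolerant data-flow pattern described above:
# source A fans event blocks out to workers B..D, the results are merged in E, and a
# hot stand-by F takes over when a worker stops responding.
import itertools

class Node:
    def __init__(self, name, alive=True):
        self.name, self.alive = name, alive
    def process(self, event):
        if not self.alive:
            raise RuntimeError(f"{self.name} failed")
        return f"event {event} processed by {self.name}"

workers = [Node("B"), Node("C"), Node("D")]
standby = Node("F")
merged = []                                   # destination E

rr = itertools.cycle(range(len(workers)))     # source A: round-robin dispatch
for event in range(8):
    if event == 4:
        workers[2].alive = False              # node D fails mid-run
    idx = next(rr)
    try:
        merged.append(workers[idx].process(event))
    except RuntimeError:
        workers[idx] = standby                # remap the failed slot to the hot stand-by
        merged.append(workers[idx].process(event))   # and reprocess the lost event
print(merged)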

7. Future architectures — data flow machines

One new architectural trend emerging now is the transition from the traditional hierarchical trigger systems, with a first selection and subsequent reject layers, to more data-flow-type machines, where the detector front-end performs time-stamped, zero-suppressed readout and the multiple data streams merge the appropriate event fragments together [19,20]. The architecture of these systems is mainly driven by the data flow, using a high-level trigger for the final selection. These architectures require a global time clock with a granularity shorter than the shortest expected event interval, to be distributed to all readout systems for the correct labeling of the event fragments. One version of such a time distribution system, which also distributes triggers, was developed for the LHC experiments [21].
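In such a data-flow architecture, event building reduces to associating hits from all streams that carry nearby timestamps. The sketch below illustrates this time-slice association; the window size and the toy hit streams are illustrative assumptions.

# Time-stamped, trigger-less readout: hits from all front-ends are grouped by time slice.
from collections import defaultdict

WINDOW_NS = 25          # assumed association window, shorter than the typical event spacing

def build_events(hit_stream):
    """hit_stream: iterable of (detector_id, timestamp_ns, data) from all front-ends."""
    events = defaultdict(list)
    for det, t, data in hit_stream:
        events[t // WINDOW_NS].append((det, t, data))     # key = global time slice
    return dict(events)

hits = [("TRK", 12, "h1"), ("CAL", 14, "h2"),             # fragments of one interaction
        ("TRK", 61, "h3"), ("MUON", 63, "h4")]            # a later interaction
for window, fragments in build_events(hits).items():
    print(f"time slice {window}: {fragments}")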

References

[1] The Large Hadron Collider, Conceptual Design, CERN/AC/95-05 (LHC), CERN, 1995.
[2] R. Bellazzini, Reading a GEM with a VLSI pixel ASIC used as a direct charge collecting anode, Nucl. Instr. and Meth. A, these proceedings.
[3] R. Bellazzini, The GLAST large area space telescope: a new instrument to explore the high energy Universe, Fifth International Meeting on Front-End Electronics, Colorado, 2003.
[4] ALICE TRD Technical Design Report, CERN/LHCC 2001-021, ALICE TDR 9, 2001.
[5] G.E. Moore, Electronics 38 (8) (1965). Available from: http://www.intel.com/research/silicon/mooreslaw.htm (July 2003).
[6] The International Technology Roadmap for Semiconductors web site, http://public.itrs.net.
[7] V. Balagura, et al., Nucl. Instr. and Meth. A 494 (2002) 526.
[8] A. Baird, et al., IEEE Trans. Nucl. Sci. NS-48 (2001) 1276.
[9] Implementing high-speed search applications with Altera CAM, Technical Report, Altera, July 2001, Application Note 119.
[10] LHCb Trigger System, Technical Design Report, TDR 10, CERN-LHCC-2003-031, September 2003.


[11] A. Walsch, Architecture and prototype of a real-time processor farm running at 1 MHz, Ph.D. Thesis, University of Mannheim, 2002.
[12] H. Müller, et al., TagNet, a twisted pair protocol for event-coherent DMA transfers in trigger farms, Eighth Workshop on Electronics for LHC Experiments, Colmar, 2002.
[13] IEEE standard for scalable coherent interface (SCI) 1596-1992, The Institute of Electrical and Electronics Engineers, Inc., 1993.
[14] ATLAS Collaboration, ATLAS High-Level Triggers, Data Acquisition and Controls Technical Design Report, CERN/LHCC 2003-22, 2003.
[15] CMS Collaboration, Data Acquisition and High-Level Trigger Technical Design Report, CERN/LHCC 2002-26, 2002.

[16] F.S. Bieser, et al., Nucl. Instr. and Meth. A 499 (2003) 766.
[17] C. Struck, Antinuclei production in central Au–Au collisions at RHIC, Ph.D. Thesis, Frankfurt, 2003.
[18] ALICE Technical Design Report of the Trigger, Data Acquisition, High Level Trigger and Control System, CERN-LHCC-2003-062, ALICE TDR-010, 2004.
[19] T. Skwarnicki, Nucl. Instr. and Meth. A 462 (2001) 227.
[20] Letter of Intent for the CBM experiment, Darmstadt, 2004, http://www.gsi.de/zukunftsprojekt/experimente/CBM/LOI2004v6.pdf.
[21] B.G. Taylor, Timing Distribution at the LHC, Eighth Workshop on Electronics for LHC Experiments, Colmar, 2002.