ARTICLE IN PRESS Nuclear Instruments and Methods in Physics Research A 608 (2009) 328–330
Coherent operation of detector systems and their readout electronics in a complex experiment control environment
Stefan Koestner 1,2
CERN, Switzerland
Article info
Article history: Received 22 January 2009. Received in revised form 20 July 2009. Accepted 20 July 2009. Available online 29 July 2009.
Keywords: SCADA systems; Hardware protocols; Hierarchical control systems

Abstract
With the increasing size and complexity of today’s experiments in high energy physics, the effort required to integrate a complete subdetector into an experiment control system is often underestimated. We report here on the layered software structure and protocols used by the LHCb experiment to control its detectors and readout boards. The experiment control system of LHCb is based on the commercial SCADA system PVSS II. Readout boards outside the radiation area are accessed via embedded credit-card-sized PCs connected to a large local area network. The SPECS protocol is used to control the front-end electronics. Finite state machines are introduced to facilitate the control of a large number of electronic devices and to model the whole experiment at the level of an expert system.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction
The LHCb experiment [1] has an experiment control system (ECS) built upon the Joint COntrols Project (JCOP) framework [2], which is common to all experiments at the Large Hadron Collider (LHC) at CERN. The JCOP framework is an extension of the industrial SCADA system PVSS II [3], tailored for use in high energy physics. The LHCb experiment comprises 12 subdetectors with a total of more than 700,000 readout channels, which require reliable, radiation-tolerant and remotely controllable electronics. In addition, about 400 boards of 15 different electronics designs are in use for trigger and DAQ purposes in radiation-safe counting rooms. The ECS handles the configuration, monitoring and operation of all experimental equipment under the various running conditions. It is designed as a hierarchical structure, as shown in Fig. 1. The top of the hierarchy corresponds to the run control, which gives the user an integrated view of the experiment’s status and allows interaction with the different subsystems, e.g. the detector control system (DCS) or the data acquisition (DAQ). In this hierarchy commands flow down, and status and alarm information flows up. Control and device units are described as finite state machines, a well-known technique for modeling the behavior of a component by defining the states it can occupy and the transitions that can take place between those states.
1 On behalf of the LHCb Collaboration.
2 Current address: Max Planck Institute of Microstructure Physics, Germany.
0168-9002/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.nima.2009.07.033
For control inside the radiation area, e.g. of the front-end chips, the SPECS system [4] is used. SPECS specifies the use of an intermediate board to translate the long-distance protocol (100 m, from the counting room to the radiation area) into the short-distance protocol (a few meters). A DIM [5] server runs on the PC holding the SPECS master card; it receives commands from the control system and updates the requested services. The SPECS slave on the other side is radiation tolerant and is hosted inside service boxes placed close to the subdetectors. For local board control outside the radiation area, an embedded credit-card-sized PC (CCPC) [6] was developed to access the local hardware. In this case, too, DIM is used as the main protocol to transfer information to the experiment control system and the PVSS clients.
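As an illustration of the hierarchy described in this introduction, the following Python sketch models a small control tree in which commands flow down and states flow up. It is not SMI++ or PVSS code. The state and action names follow the DAQ-domain scheme of Fig. 4; the directions of Reset and Stop, the Recover action, the class names and the rule by which a control unit summarizes its children are assumptions made for the example.

```python
"""Toy model of a control hierarchy: commands flow down, states flow up."""

# Transition table for a device unit. Configure (NOT READY -> READY) and
# Start (READY -> RUNNING) follow the text; the directions of Reset and Stop
# and the Recover action are assumptions made for this sketch.
TRANSITIONS = {
    ("NOT READY", "Configure"): "READY",
    ("READY", "Reset"): "NOT READY",
    ("READY", "Start"): "RUNNING",
    ("RUNNING", "Stop"): "READY",
    ("ERROR", "Recover"): "NOT READY",
}


class DeviceUnit:
    """Leaf of the tree; stands in for one piece of hardware."""

    def __init__(self, name):
        self.name = name
        self.state = "NOT READY"
        self.parent = None

    def command(self, action):
        # Actions that are not allowed in the current state are ignored.
        new_state = TRANSITIONS.get((self.state, action))
        if new_state is not None:
            self.set_state(new_state)

    def set_state(self, state):
        # Also used to emulate hardware-triggered changes, e.g. ERROR.
        self.state = state
        if self.parent is not None:
            self.parent.child_changed()


class ControlUnit:
    """Inner node; forwards commands and summarizes its children."""

    def __init__(self, name, children):
        self.name = name
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self
        self.state = self._summarize()

    def command(self, action):
        for child in self.children:
            child.command(action)

    def child_changed(self):
        self.state = self._summarize()
        if self.parent is not None:
            self.parent.child_changed()

    def _summarize(self):
        # Assumed rule: UNKNOWN and ERROR dominate; otherwise the unit
        # reports the least advanced state found among its children.
        states = {child.state for child in self.children}
        for candidate in ("UNKNOWN", "ERROR", "NOT READY", "READY", "RUNNING"):
            if candidate in states:
                return candidate
        return "UNKNOWN"


dev1, dev2, dev3 = DeviceUnit("Dev1"), DeviceUnit("Dev2"), DeviceUnit("Dev3")
daq = ControlUnit("DAQ", [ControlUnit("SubSys1", [dev1, dev2]),
                          ControlUnit("SubSys2", [dev3])])

daq.command("Configure")
state_after_configure = daq.state
daq.command("Start")
state_after_start = daq.state
dev2.set_state("ERROR")            # emulated hardware fault on one leaf
state_after_fault = daq.state
```

In this sketch the top node only reports READY or RUNNING once every leaf has reached that state, and the emulated fault on a single leaf is enough to make the top node report ERROR.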
Fig. 1. The hierarchical architecture of the experiment control system of the LHCb experiment. The top node represents the run control, from where commands can be sent downwards. State transitions triggered by hardware can propagate up the tree to the run control, which represents the state of the entire experiment. (Node labels in the diagram: ECS; DCS, DAQ, LHC; DetDcs1, DetDcsN, DetDaq1; SubSys1, SubSys2, SubSysN; Dev1, Dev2, Dev3, DevN. Other labels: Control Units, Device Units, Commands, Status & Alarms, To Devices (HW or SW).)

2. Hardware and protocol
The Serial Protocol for the Experiment Control System (SPECS) [4] is an evolution of the Serial Protocol for the ATLAS Calorimeter (SPAC), designed for the configuration of remote electronics elements. An intermediate board translates the SPECS long-distance protocol, which is used for transmitting data from the counting room to the radiation area, into the short-distance protocol, which is used by the front-end electronics. It is a 10 Mbit/s serial link with two signals in each direction (clock and data). SPECS is a single-master multi-slave bus (up to 32 slaves), as shown in Fig. 2. The SPECS master board hosts four SPECS masters and is implemented on a standard 32-bit 33 MHz PCI board, which can be plugged into a PC. The core of the system is implemented on an Altera APEX FPGA. The slave is designed as portable Verilog code and is integrated in an ACTEL APA 150 flash FPGA. This technology guarantees immunity against single event latch-ups (SEL) and was tested up to a radiation dose of 40 krad, which corresponds to the lifetime of the experiment including generous safety margins. Hardness against single event upsets (SEU) can be achieved by appropriate redundancy of the registers through triple voting and by the use of one-hot state encoding only. It is not foreseen to refresh the firmware remotely at regular intervals.

Fig. 2. The SPECS protocol. A scheme showing its possible implementations.

On the counting house side, where radiation damage is not an issue, more complex micro controllers are used and attached directly to the FPGA-based DAQ and trigger boards. For local board control the SM520PCX from Digitallogic is used, which is a complete embedded PC based on the i486-compatible AMD ELAN 520 micro controller, as shown in Fig. 3. It runs Linux at a clock frequency of 133 MHz. Each micro controller has an isolated access path to a large local area network (10/100 Ethernet).

Fig. 3. Picture from the top side of the SM520PCX embedded PC.

To provide an interface to the on-board chips, a small glue card has been created, built around a PLX PCI9030 PCI bridge and JTAG and I2C controllers. It also includes a bus
switch which electrically isolates the glue card from the carrier boards when the PC is rebooted. A remote download of the firmware is possible via the fast JTAG chains. When memories attached to the local bus are read from PCI, a transfer rate of 20 MB/s is achieved.

Fig. 4. Transition scheme as defined for finite state machines of the DAQ domain. (Diagram labels: states UNKNOWN, ERROR, NOT READY, READY, RUNNING; transitions Configure, Reset, Start, Stop.)

Communication between the embedded micro controllers or the SPECS masters and the various clients of the control system is achieved using the DIM protocol [5], a portable, lightweight publish/subscribe system. A DIM server runs on the embedded micro controller of the electronics boards or on the PC holding the SPECS master card. This server performs all the required actions on the various electronic devices, e.g. write and read operations on registers and memory blocks, FPGA programming, monitoring of registers, etc. The server is started automatically and publishes several DIM services, which can be single items, arrays or structured items, on the DIM name server (DNS), which keeps their coordinates and from where they can be requested by the clients. Services can be subscribed to either on change or on a time basis. In addition, commands can be sent from the client to the server. Both services and commands can invoke a callback function. In a typical procedure, a command requesting a read or write operation is sent to the server; the server’s callback function executes the interaction with the hardware, and the data are sent back to the client by updating a service, which in turn calls a callback function on the client side. The advantage of this design is its portability, as clients can be installed on any machine by just specifying the DNS node. It also contributes to robustness, as a crashed server can easily republish its services on the DNS node.
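The command/service round trip described above can be illustrated with a small in-process Python sketch. It does not use the DIM library or its API. The classes NameServer, BoardServer and Client, the board, service and command names, and the dictionary standing in for hardware registers are all hypothetical and only mimic the pattern: a server registers its names with a name server, a client looks the server up by name, subscribes to a service, sends commands, and receives the result through a callback.

```python
"""In-process imitation of a publish/subscribe round trip (not the DIM API)."""


class NameServer:
    """Keeps track of which server provides which service or command."""

    def __init__(self):
        self._providers = {}

    def register(self, name, server):
        self._providers[name] = server

    def lookup(self, name):
        return self._providers[name]


class BoardServer:
    """Stands in for the server running next to the hardware."""

    def __init__(self, board, name_server):
        self.registers = {0x10: 0, 0x14: 0}   # fake hardware registers
        self._subscribers = {}
        self.read_service = f"{board}/READ_RESULT"
        self.write_command = f"{board}/WRITE"
        self.read_command = f"{board}/READ"
        for name in (self.read_service, self.write_command, self.read_command):
            name_server.register(name, self)

    def subscribe(self, service, callback):
        self._subscribers.setdefault(service, []).append(callback)

    def handle_command(self, command, payload):
        # Server-side callback: this is where the hardware access happens.
        if command == self.write_command:
            address, value = payload
            self.registers[address] = value
        elif command == self.read_command:
            address = payload
            self._update(self.read_service, (address, self.registers[address]))
        else:
            raise KeyError(f"unknown command {command!r}")

    def _update(self, service, data):
        # Updating a service triggers the callbacks of all subscribers.
        for callback in self._subscribers.get(service, []):
            callback(data)


class Client:
    """Knows only the name server, never the server's location."""

    def __init__(self, name_server):
        self._dns = name_server
        self.received = []

    def subscribe(self, service):
        # Client-side callback: store every update that arrives.
        self._dns.lookup(service).subscribe(service, self.received.append)

    def send(self, command, payload):
        self._dns.lookup(command).handle_command(command, payload)


dns = NameServer()
server = BoardServer("BOARD_01", dns)
client = Client(dns)
client.subscribe("BOARD_01/READ_RESULT")
client.send("BOARD_01/WRITE", (0x10, 0xCAFE))
client.send("BOARD_01/READ", 0x10)
```

The client never holds a direct reference to the server in its own code; it resolves every name through the name server, which is the property the text credits for portability.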
3. Abstraction layer and finite state machines
The PVSS SCADA system is organized in datapoints. Datapoints are complex structures connected to the PVSS internal memory database. If an entry of a datapoint changes, a callback function can be triggered. A special PVSS API manager allows DIM commands and DIM services to be associated with datapoints. To be described and controlled in PVSS, an electronic device has to be modeled as a datapoint type. Each physical device then corresponds to an instance of a datapoint type, which reflects the entire state of all controllable resources on the board. To facilitate the modeling of hardware as PVSS datapoints, a tool called FwHw [7] was introduced. This tool offers a graphical user interface for assigning the necessary information, such as bus address or register length, to the various register types. The tool automatically creates a well-defined datapoint structure connected to the DIM API manager and sends instructions to the server to store the registers’ information in a list. Upon subscription of the registers, the server publishes the appropriate services and is ready to receive commands. From this moment on, communication between server and client is established. When a command is sent, the server recognizes the type of register and acts accordingly. This hides the diversity and complexity of the various hardware and bus types within the server. Separating the hardware access mechanism from the modeling of the components in the control system makes it possible to reuse the components on different hardware types, even with different protocols, e.g. SPECS or CCPC. In addition, the tool offers the possibility to define ‘recipes’. A recipe is a bundle of predefined register settings which can be retrieved from a configuration database and uploaded to the electronic devices. Different settings can be applied for different run conditions.
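The separation between the device model and the hardware access mechanism can be sketched as follows. This is a schematic Python illustration, not FwHw or PVSS code: the class names, register names, addresses, bus labels, the in-memory FakeBus objects and the recipe contents are all invented for the example. A device type plays the role of a datapoint type, a device is one instance of it, and a recipe is a named bundle of register settings.

```python
"""Device model with typed registers, pluggable access and recipes (sketch)."""
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RegisterSpec:
    name: str
    bus: str          # label of the access mechanism, e.g. "I2C" or "LOCAL"
    address: int
    width_bits: int


class FakeBus:
    """Stands in for one hardware access mechanism behind the server."""

    def __init__(self):
        self.memory = {}

    def write(self, address, value):
        self.memory[address] = value

    def read(self, address):
        return self.memory.get(address, 0)


@dataclass
class DeviceType:
    """Plays the role of a datapoint type: a named set of registers."""
    name: str
    registers: dict = field(default_factory=dict)

    def add(self, spec):
        self.registers[spec.name] = spec


class Device:
    """One physical board: an instance of a device type bound to buses."""

    def __init__(self, device_type, buses):
        self.type = device_type
        self.buses = buses

    def write(self, register, value):
        # The register description decides which bus and address are used.
        spec = self.type.registers[register]
        if not 0 <= value < (1 << spec.width_bits):
            raise ValueError(f"{value:#x} does not fit in {spec.width_bits} bits")
        self.buses[spec.bus].write(spec.address, value)

    def read(self, register):
        spec = self.type.registers[register]
        return self.buses[spec.bus].read(spec.address)

    def apply_recipe(self, recipe):
        for register, value in recipe.items():
            self.write(register, value)


board_type = DeviceType("ReadoutBoard")
board_type.add(RegisterSpec("threshold", "I2C", 0x20, 8))
board_type.add(RegisterSpec("control", "LOCAL", 0x1000, 32))

# Hypothetical recipes keyed by run type, as if fetched from a database.
recipes = {
    "PHYSICS": {"threshold": 0x40, "control": 0x1},
    "CALIBRATION": {"threshold": 0x10, "control": 0x3},
}

board = Device(board_type, {"I2C": FakeBus(), "LOCAL": FakeBus()})
board.apply_recipe(recipes["PHYSICS"])
```

Because the device code refers to registers only by name, swapping a FakeBus for a different access object would leave the device type and the recipes unchanged, which mirrors the reuse across protocols described in the text.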
To allow simple operation and monitoring, each electronic device is represented by a device unit modeled as a finite state machine. The finite state machine modeling is realized
by using the SMI++ framework [8], which comprises a language to define the ‘states’, ‘actions’ and ‘rules’. A distinction has to be made between device units, which are always the leaves of the hierarchical tree and act directly on the hardware, and control units, which can send commands down to their children; the children can themselves be control units or device units. Rules can be defined according to which the state of the whole control unit changes when the state of any child changes. Each electronic device has a set of predefined states. Fig. 4 shows the states and the allowed transitions for devices attached to the DAQ domain. A device unit is always connected to a datapoint representing the hardware. A state transition of a device unit can be triggered directly by the hardware if datapoints connected to real registers change. If a device unit goes into an error state, it can automatically try to recover by acting on the hardware.

For DAQ boards the main operations are the download of the firmware to the FPGAs, which is allowed in state NOT READY, and the configuration of the registers. The main configuration of the board, basically downloading recipes from the configuration database, takes place during the transition from NOT READY to READY. The content of the recipes can differ according to the run type, which can be specified as a parameter when launching the configuration command. Little action is required for the transition from READY to RUNNING. The state UNKNOWN is accessible from all states and basically describes the situation in which control is lost, e.g. when the DIM server has crashed. The ERROR state is likewise accessible from all other states. Manual or automatic procedures can be launched to recover from error conditions and return to the operational states. More detailed information about the various software layers can be found on the PVSS/TELL1 homepage, which contains introductory material and tutorials intended for user support [9].

4. Conclusion
LHCb has chosen to use embedded processors with an isolated access path for board control in the counting house area, and the long-distance serial link protocol SPECS for accessing hardware inside the radiation area. Both choices have proved to be robust solutions and have been tested extensively on various occasions. A well-structured ensemble of software layers has been developed which disentangles the hardware layer from the control system. This allows communication with hardware registers in an abstract manner. The entire experiment is modeled as a hierarchical tree of finite state machines, which allows the whole experiment to be operated in an intuitive way, even by non-experts.

References
[1] LHCb Collaboration, LHCb Reoptimized Detector, Design and Performance, CERN/LHCC 2003-030.
[2] A. Daneels, W. Salter, The LHC Experiments Joint COntrols Project, JCOP, ICALEPCS, Trieste, 1999.
[3] PVSS, http://www.pvss.com.
[4] D. Breton, D. Charlet, P. Robbe, I. Videau, SPECS: A Serial Protocol for the Experiment Control System of LHCb, October 2005.
[5] C. Gaspar, M. Dönszelmann, Ph. Charpentier, DIM, a portable, light weight package for information publishing, data transfer and inter-process communication, Presented at International Conference on Computing in High Energy and Nuclear Physics, Padova, Italy, 1–11 February 2000.
[6] F. Fontanelli, G. Mini, M. Sannino, Z. Guzik, R. Jacobsson, B. Jost, N. Neufeld, IEEE Trans. Nucl. Sci. NS-53 (3) (2006).
[7] http://lhcb-online.web.cern.ch/lhcb-online/ecs/FWHW/default.html.
[8] B. Franek, C. Gaspar, IEEE Trans. Nucl. Sci. NS-47 (2) (2000) 86.
[9] S. Koestner, http://lhcb-online.web.cern.ch/lhcb-online/ecs/PVSS_TELL1/default.html.