Automatic spike detection in beam loss signals for LHC collimator alignment


Nuclear Inst. and Methods in Physics Research, A 934 (2019) 10–18


Gabriella Azzopardi a,b,∗, Gianluca Valentino a, Adrian Muscat a, Belen Salvachua b

a University of Malta, Malta
b CERN, Geneva, Switzerland

Keywords: Large Hadron Collider, Collimation, Machine learning, Spike detection, Pattern recognition, Automatic alignment

ABSTRACT

A collimation system is installed in the Large Hadron Collider to protect its super-conducting magnets and sensitive equipment from potentially dangerous beam halo particles. The collimator settings are determined following an alignment procedure, whereby collimator jaws are moved towards the beam until a suitable spike pattern, consisting of a sharp rise followed by a slow decay, is observed in nearby beam loss monitors. This indicates that the collimator jaw is aligned to the beam. The current method for aligning collimators is semi-automated: an operator must continuously observe the loss signals to determine whether the jaw has touched the beam, or whether some other perturbation in the beam caused the losses. The human element in this procedure can result in errors and is a major bottleneck in automating and speeding up the alignment. This paper proposes to automate the human task of spike detection by using machine learning. A data set was formed from previous alignment campaigns, from which fourteen manually engineered features were extracted; six machine learning models were trained, analysed in depth and thoroughly tested. The suitability of using machine learning in LHC operation was confirmed during collimator alignments performed in 2018, which significantly benefited from the models trained in this study.

1. Introduction

The Large Hadron Collider (LHC) at CERN was built to deliver proton–proton collisions at a centre-of-mass energy of up to 14 TeV, making it the particle accelerator with the largest centre-of-mass energy in the world [1]. It can also operate with ions and has delivered lead–lead, xenon–xenon and proton–lead collisions; however, this paper focuses on proton–proton collisions. The LHC is susceptible to beam losses from normal and abnormal conditions, which can negatively impact the state of superconductivity in its magnets. It must therefore be protected from any damage or down-time caused by beam losses [2]. A robust collimation system consisting of 98 collimators protects the LHC. Each collimator is made up of two parallel blocks, referred to as jaws, inside a vacuum tank. The jaws are identified as left or right depending on the beam direction in the collimator, and must be positioned symmetrically around the beam, as shown in Fig. 1. Each of the four jaw corners can be moved individually using dedicated stepper motors. An additional fifth degree of freedom is provided by the possibility to displace the collimator tank in the plane orthogonal to the beam cleaning plane. In the case of a catastrophic beam damage scenario, this functionality would be used to offer a fresh collimating surface to the beam without the need to replace the collimator. Collimators are installed to clean in the horizontal, vertical or skew planes, to cover as much of the phase space as possible.

The collimator jaws are positioned with an accuracy of 5 μm around the circulating beam, with the tightest operational gap being around 1.5 mm at top energy. A beam-based alignment procedure is used to align the collimators: collimator jaws are moved towards the beam halo whilst the measured beam loss signal is monitored. A collimator is classified as aligned when both jaws are centred around the beam after touching the beam halo (i.e. before retracting them to the operational settings), which is indicated by a signature spike pattern in the recorded beam loss signal. At present, collimation experts are required to manually detect and classify such spikes following training and experience. Collimators are aligned at different machine states: at injection (450 GeV) a total of 79 collimators are aligned, and at flat top (6.5 TeV) a total of 75 collimators are aligned. The collimators are aligned each year before the start of operation, in order to ensure the correct setup for the LHC to achieve nominal operation. The settings are monitored throughout the year via beam loss maps, which consist of the spatial distribution of beam losses around the ring, generated following an intentional blow-up of the beam in the transverse plane to provide sufficient resolution. Over time, typically several months of operation, the beam orbit may shift due to ground motion, thermal effects and machine effects [3], and the collimators might therefore need to be realigned. Moreover, different collimator setups are required when machine parameters are changed, such as the β*, which defines the

∗ Corresponding author. E-mail address: [email protected] (G. Azzopardi).

https://doi.org/10.1016/j.nima.2019.04.057 Received 21 March 2019; Received in revised form 12 April 2019; Accepted 14 April 2019 Available online 2 May 2019 0168-9002/© 2019 Elsevier B.V. All rights reserved.


Fig. 2. The jaws of collimator i around the beam, which is assumed to have a Gaussian distribution of particles in the transverse plane. The collimator’s left jaw is scraping the beam halo and the showers are detected by the corresponding BLM detector downstream.

Fig. 1. An LHC collimator inside the casing, with its jaws on either side of the beam (indicated by the arrow). Source: Adapted from [4]

colliding beam size at the experimental points where the beams are brought into collision. This motivated the development of an automatic method, to allow collimator alignments to be performed more efficiently and upon request, at regular intervals. Automating the alignment procedure requires replacing each of the user tasks with dedicated algorithms. This paper proposes to automate one of these tasks by automatically classifying spikes in the losses using pre-trained machine learning models. Furthermore, collimators have always been aligned assuming no tilt between the collimator and the beam. Tank misalignments or beam envelope angles at large-divergence locations could introduce a tilt, which would limit the collimation performance. A recent study [5] introduced three novel angular alignment methods to determine a collimator's optimal angle; however, these methods also make use of the semi-automated software. Introducing tighter settings would increase the importance of angular alignments, as they would need to be performed more frequently. As a result, this further motivates the need for an automatic approach using machine learning for spike detection. The paper is organized as follows. The collimation system and beam loss monitoring system are described in Section 2. Section 3 discusses related work on machine learning used for spike detection in time-series data. Details on the construction of the data set and the training of the machine learning models studied in this paper are described in Sections 4 and 5. Finally, Section 6 discusses the results obtained when machine learning was deployed in LHC operation to automatically classify spikes in real-time.

Fig. 3. Typical BLM signals as a function of time showing examples of (a) an alignment spike and (b) non-alignment spikes, after a collimator movement towards the beam.

2. Collimation system

The LHC collimation system protects the LHC from beam losses with a 99.998% cleaning efficiency of all halo particles. Collimators are set up in the form of a hierarchy: primary collimators (TCP) are placed closest to the beam to intercept the primary halo particles; secondary collimators (TCSG) are retracted from the primary ones to clean secondary particles; absorbers (TCLA) absorb the remaining showers; and tertiary collimators (TCT) are retracted further to provide local protection for the LHC experiments. In order to preserve this cleaning hierarchy, the collimators need to be aligned with a precision better than 50 μm. The collimators are mainly concentrated in two dedicated cleaning insertion regions (IRs): IR3 for momentum cleaning and IR7 for betatron cleaning. Each collimator has a dedicated Beam Loss Monitor (BLM) positioned outside the beam vacuum, immediately downstream, in order to detect beam losses generated when halo particles impact the collimator jaws, as shown in Fig. 2. These particle losses, measured in units of Gy/s, are proportional to the amount of beam intercepted by the collimator jaws.

2.1. Beam-based alignment

Collimators are aligned using a beam-based alignment procedure, which determines the beam centre and beam size at each collimator. This is a four-step procedure established in [6], which was tested with a prototype collimator in the Super Proton Synchrotron (SPS) [7] and has been used in the LHC since its start-up in 2010 [8]. The alignment is beam-based: a collimator's jaws are moved towards the beam whilst observing the presence of any spikes in the beam loss signal of its respective BLM. Aligning any collimator relies on being able to distinguish alignment spikes from non-alignment spikes, such that a collimator must continuously move towards the beam, ignoring any non-alignment spikes, until a clear alignment spike is observed. Fig. 3(a) shows an example of a clear alignment spike indicating the collimator in question is aligned, whilst Fig. 3(b) shows an example of non-alignment spikes indicating that the collimator has not yet touched the beam halo and must resume its alignment. This alignment procedure is crucial as it is a pre-requisite for every machine configuration to set up the system for high-intensity beams.


Fig. 4. Examples of various beam loss shapes classified as alignment spikes (in green) and non-alignment spikes (in red). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

In previous years, the hardware and software required for this procedure have been improved in various ways, and the latest software used to align collimators is a semi-automated BLM-based algorithm [9,10]. This software is able to automatically move collimator jaw(s) towards the beam and stop them when the BLM losses exceed a predefined threshold. Once the jaw(s) stop moving, the operator is required to analyse the respective BLM losses and classify the spikes to determine whether the collimator jaw(s) are aligned or not, and proceed accordingly.

2.2. Beam loss signals in collimator alignment

Automating the process of spike detection can be cast as a classification problem, by training a machine learning model to distinguish between alignment and non-alignment spikes, as shown in Fig. 4. A clear alignment spike consists of: a steady-state signal before the spike, the loss spike itself, a temporal decay of the losses, and a steady-state signal after the spike. The steady state is a result of continuous scraping of halo particles when the jaw positions are fixed. The further a jaw cuts into the beam halo, the greater the increase in the steady-state signal, as the density of the particles near the jaw increases.
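To make this four-phase structure concrete, the short sketch below generates a synthetic 100 Hz loss signal with an assumed peak height, decay constant and steady-state levels; the values are illustrative only and are not taken from LHC data.

```python
import numpy as np

def synthetic_alignment_spike(fs=100, pre_s=2.0, decay_s=3.0, post_s=2.0,
                              steady_before=1e-5, steady_after=3e-5,
                              peak=5e-4, tau=0.5, noise=1e-6, seed=0):
    """Toy BLM-like signal: steady state, sharp spike, exponential decay
    towards a higher steady state (all amplitudes in arbitrary Gy/s-like units)."""
    rng = np.random.default_rng(seed)
    before = np.full(int(pre_s * fs), steady_before)                 # steady state before the spike
    t = np.arange(int(decay_s * fs)) / fs
    decay = steady_after + (peak - steady_after) * np.exp(-t / tau)  # spike followed by temporal decay
    after = np.full(int(post_s * fs), steady_after)                  # new, higher steady state
    signal = np.concatenate([before, decay, after])
    return signal + rng.normal(0.0, noise, signal.size)              # add measurement noise

losses = synthetic_alignment_spike()
```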

As shown in Fig. 4, alignment spikes can have various shapes, including:

• The maximum value can be high (1, 3, 4) or low (2)
• The decay of losses can be short (1, 2, 3) or long (4)
• The decay of losses can be noisy (1, 2, 4) or smooth (3)

On the other hand, non-alignment spikes do not have a fixed structure and can contain spurious high spikes (1, 2, 3). These can arise for various reasons, such as beam instabilities, mechanical vibrations of the opposite jaw when close to the beam, or drifts of the beam position.

3. Related work

3.1. Spike detection in EEG

Electroencephalography (EEG) is a method used to record brain activity through electrodes placed on the surface of the scalp, capturing the cumulative electrical activity of the brain over time. EEG is most often used to diagnose epilepsy, which causes abnormalities in EEG readings. Successful epilepsy diagnoses heavily depend on the detection of interictal (between seizures) paroxysmal epileptic discharges (IPEDs), the spikes. An EEG spike differs from the background activity as it has a pointed peak and lasts between 20–70 ms [11], as shown in Fig. 5. A subset of EEG is intracranial electroencephalography (iEEG), which uses electrodes placed directly on the exposed surface of the brain to record electrical activity from the cerebral cortex, thus making it more accurate [12].

Fig. 5. Spikes highlighted in an EEG recording. Source: Adapted from [13].

Research has been carried out in search of algorithms to automatically detect spikes in EEG recordings, and a number of machine learning models have been used in novel methods presented in recent papers. The algorithm presented in [14] makes use of a Random Forest binary classifier [15] to detect spikes in both EEG and iEEG recordings. The data used was processed using notch and bandpass filters, then features were extracted based on the discrete wavelet transform (DWT). This generated the feature vectors which were input directly into the Random Forest classifier. The algorithm resulted in 62% recall and 26% precision for surface EEG, and 63% recall and 53% precision for iEEG. These results indicate limited precision; however, this can possibly be explained through a number of phenomena, therefore it was concluded that the proposed method has potential for diagnosis support in clinical environments.


Another study, performed in [13], inputs frequency-band amplitude features into a Support Vector Machine [16] to detect spikes in iEEG signals. These features are based on the rhythms of brain waves, whereby different rhythms correspond to different brain states. The data set used consists of 875 spikes and 2125 non-spikes. The averaged performance of this algorithm achieved 98.44% sensitivity, 100% selectivity and 99.54% accuracy; however, this algorithm is limited, as it only uses single-channel (univariate) information for spike detection. Moreover, the algorithm presented in [17] uses an Artificial Neural Network [18]. This algorithm first processed the data, then segmented it into single-channel segments which were then combined into multi-channel segments. For each multi-channel segment, features were extracted from the wave characteristics. The data set used contains 5582 spikes and 264193 non-spikes, and the averaged performance of the model achieved a 44.1% detection sensitivity and a 56.2% positive predictive value. These results were considered favourable for a first implementation.

3.2. Spike detection in LHC collimation

A first attempt at using machine learning to detect spikes in BLM signals for LHC collimator alignments was performed in [19]. This work involved fitting a Gaussian function to the mirrored loss spike and a Power function to the temporal decay component. Six features were extracted from the data:

1. Maximum Value — The maximum of the ten BLM values after the jaws stopped moving.
2. Minimum Average — The average of the three smallest points of the seven loss points preceding the maximum value. These smallest values allow for eliminating any spikes due to a previous movement. An optimal spike generally has a high maximum value relative to the minimum average.
3. Variance — The width of the Gaussian fit, whereby an optimal spike would have a smaller width as this would reflect a sharp increase and a quick decrease of the losses.
4. Gaussian Correlation Coefficient — The proximity of the loss pattern to the Gaussian fit, such that the closer this value is to unity, the sharper the loss spike.
5. Power Coefficient — A steep temporal decay indicates an optimal spike.
6. Power Correlation Coefficient — The temporal decay becomes smoother as this value approaches unity, and this value indicates how well the Power fit represents the loss pattern.

These features were scaled and used to train a Support Vector Machine with the radial basis function (RBF) kernel. The data set contained 444 samples at 3.5 TeV, and the model achieved an accuracy of 82.4%. Due to the low accuracy, it was decided that this was not adequate to deploy in operation. Since then, the energy in the LHC has doubled, thus forming new spike patterns in the BLM loss signal. In addition, BLM data was acquired at a rate of 1 Hz in 2010 and 2011, whereas presently data is available at 100 Hz, resulting in a higher temporal resolution. The goal is to achieve a higher recognition rate, so that the model can be incorporated into the fully-automatic alignment algorithm.

4. Data set construction

Data was gathered from five semi-automatic collimator alignment campaigns performed in 2016, both at injection (450 GeV) and at flat top (6.5 TeV). A total of 1652 samples were extracted, 467 positive (alignment spikes) and 1185 negative (non-alignment spikes). This data set was used to engineer useful features as inputs to machine learning models for spike detection.

Fig. 6. Two data samples extracted for spike classification, whereby the left and right jaw movement towards the beam generated alignment and non-alignment spikes, respectively. The convention is to have the left jaw on the positive side of the beam axis, and the right jaw on the negative.

4.1. Data acquisition

The data logged during alignment campaigns consists of the 100 Hz BLM signals and the collimator jaw positions logged at a frequency of 1 Hz. The data extracted for the data set consists of the moments when each collimator jaw(s) stopped moving, i.e. when the losses exceeded the threshold defined for the semi-automatic alignment. The BLM signals at the moments the collimators stopped moving were individually analysed and labelled by experts into the two classification classes, as per the example shown in Fig. 6. The data was extracted by first collecting all the jaw movements towards the beam that were closer than ±10 mm from the beam axis. These movements were then divided into left and right jaw movements, and the times of these movements were extracted to obtain the corresponding BLM signals. The time window to extract the BLM signal at each collimator movement starts and ends a maximum of 20 s before and after the current movement, as this contains all the required information. The time window was set to keep a 2 s gap between the previous and the next collimator movements, in order to ensure that there is no overlap between the decay of the first sample and the steady state of the next sample. If the next movement started within 2 s after the current movement, or the BLM losses for the entire window are less than 1e−6 Gy/s, then the current data sample is discarded.
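The selection logic described above can be sketched roughly as follows; blm_times, blm_values and stop_times are hypothetical arrays standing in for the logged 100 Hz BLM stream and the jaw-stop timestamps, and this is only an illustration, not the authors' implementation.

```python
import numpy as np

def extract_samples(blm_times, blm_values, stop_times,
                    half_window=20.0, gap=2.0, loss_floor=1e-6):
    """Cut a window of up to +/-20 s around each jaw-stop time, keep a 2 s gap
    to the neighbouring movements, and discard windows whose losses never
    exceed 1e-6 Gy/s or whose next movement follows within 2 s."""
    samples = []
    for i, t0 in enumerate(stop_times):
        start, end = t0 - half_window, t0 + half_window
        if i > 0:                                   # keep a gap to the previous movement
            start = max(start, stop_times[i - 1] + gap)
        if i + 1 < len(stop_times):
            if stop_times[i + 1] - t0 < gap:        # next movement too close: discard
                continue
            end = min(end, stop_times[i + 1] - gap)
        window = blm_values[(blm_times >= start) & (blm_times <= end)]
        if window.size == 0 or window.max() < loss_floor:
            continue                                # losses too low for a useful sample
        samples.append(window)
    return samples
```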


4.2. Feature engineering

Fourteen features were initially extracted for spike detection from the alignment data set, as shown in Table 1. The first five features were taken from [19], explained in Section 3.2. This feature set was augmented using the three coefficients from the Power function, whilst seven features were newly engineered for this study.

4.3. Feature analysis

In order to select the most relevant features, the strength of association between each pair of variables was first analysed using the Spearman correlation, as shown in Fig. 7. This is a non-parametric test used to check for a statistically significant relationship without assuming the distribution of the data. This test is necessary so as to avoid selecting correlated features. The Maximum value, Height and Factor were found to be correlated, whilst Exp_b and Pow_b are not correlated to any other features except with each other.

Fig. 7. Heatmap of feature pairwise Spearman correlation indicating: Maximum value, Height and Factor are correlated (blue highlight), the decay fits do not seem to be fully correlated (green highlight) and most features have some correlation with the spike class (pink highlight). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

A feature selection analysis was then performed using five different machine learning models to see how they order the importance of each of the features. The models used, specifically designed for feature selection, are: Logistic Regression [20], Linear Support Vector Machine, Extra Trees Classifier [21], Random Forest Classifier and Gradient Boosting Classifier [22]. The models were individually trained using all features and output the features ranked according to their importance. The outputs from the models are stacked in Fig. 8, which displays the overall rank of all the features in ascending order. The Power, Gaussian and Exponential Correlation Coefficients are used in this section to measure the reliability of the functions' fit to the data. The results in Fig. 8 clearly indicate that the exponential fit to the decay is the most important feature. Moreover, the Gaussian and Power functions did not manage to fit the decay of all the data samples, as the least-squares minimization used when fitting these functions failed. From these observations it can be concluded that the exponential function is a better and more reliable fit to the decay.

Fig. 8. The list of features ranked in overall importance according to the individual machine learning models. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

5. Model training

This section focuses on training, optimizing and comparing the performance of six different classifiers using the engineered features. The exponential function has already been selected as the best function for fitting the decay, therefore after removing the Power and Gaussian functions' features, the nine remaining features are further analysed in this section, whilst optimizing the machine learning models accordingly. The models used in this section are: Logistic Regression, Neural Network, Support Vector Machine, Decision Tree [21], Random Forest and Gradient Boost. The scikit-learn [23] implementations were used. When aligning any collimator it is vital that, once the spike detection declares a collimator to be aligned, the collimator is actually aligned. As a result, the false detection of an alignment spike is more serious than missing an alignment spike, therefore precision is used as the main performance metric throughout this section.

5.1. Hyper-parameter optimization

The hyper-parameters were optimized for each model. An exhaustive grid search, guided by precision as the performance metric, was applied on the training set using a 10-fold cross-validation randomly stratified 30 times, in order to handle lucky splits. The data set was split using stratified sampling into an 85% training set and a 15% held-out testing set. The following parameters were tested for the corresponding models, and the best parameters were selected according to the highest precision obtained on the held-out testing set.

• Logistic Regression (LR):
  – Regularization: 1e−4, 1e−3, 1e−2, 1e−1, 1, 10, 100

• Neural Network (NN): default architecture with 1 hidden layer of 100 units
  – Hidden layer activation: tanh, relu
  – Weight optimization solver: L-BFGS [24], Stochastic gradient descent, Adam [25]
  – Regularization: 1e−4, 1e−3, 1e−2, 1e−1, 1, 10

• Support Vector Machine (SVM):
  – Linear kernel penalty: 1e−4, 1e−3, 1e−2, 1e−1
  – RBF kernel gamma: 1e−4, 1e−3, 1e−2, 1e−1; penalty: 1e−3, 1e−2, 1e−1, 1, 10

• Decision Tree (DT):
  – Split criterion: Gini impurity, Entropy
  – Splitter at each node: best, random
  – Maximum tree depth: 2, 3
  – Minimum samples at leaves: 1, 5, 10, 20, 40

• Random Forest (RF):
  – Number of trees: 10, 50, 100, 150
  – Split criterion: Gini impurity, Entropy
  – Maximum tree depth: 2, 3
  – Minimum samples at leaves: 1, 5, 10, 20, 40

• Gradient Boost (GB):
  – Learning rate: 1e−4, 1e−3, 1e−2, 1e−1, 1, 10
  – Number of trees: 10, 50, 100, 150, 200, 250
  – Maximum tree depth: 2, 3
  – Minimum samples at leaves: 1, 5, 10, 20, 40
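As a rough illustration of this optimization loop, the scikit-learn sketch below tunes the SVM with a repeated stratified 10-fold cross-validation scored on precision. Here X and y stand for the engineered feature matrix and the spike labels, and the single merged parameter grid is a simplification of the per-kernel grids listed above.

```python
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     train_test_split)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 85%/15% stratified split with a held-out testing set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)

# 10-fold cross-validation, randomly stratified 30 times
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=30, random_state=42)

param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__gamma": [1e-4, 1e-3, 1e-2, 1e-1],
    "svc__C": [1e-3, 1e-2, 1e-1, 1, 10],
}
search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid, scoring="precision", cv=cv, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```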


Table 1
A list of the fourteen features extracted from the data set (name, description and numerical range).

Maximum value (0.7 : 26000): The highest point taken to be within a two second range after the collimator stopped moving, from [19].

Variance (0.002 : 0.133): The width of the Gaussian fit, whereby an optimal spike would have a smaller width as this would reflect a sharp increase and a quick decrease of the losses, from [19].

Gaussian Correlation Coefficient (0 : 1): The proximity of the loss pattern to the Gaussian fit using least squares, such that the closer this value is to unity, the sharper the loss spike, from [19].

Power function, fitting the decay with a·x^b + c:
  (a) Gradient (Pow_a) (−15000 : 80): proportional to the decay steepness.
  (b) Power decay (Pow_b) (0 : 200): the rate of decay, from [19].
  (c) Horizontal asymptote (Pow_c) (−0.033 : 15000): proportionate to the new steady state.

Power Correlation Coefficient (0 : 1): The temporal decay becomes smoother as this value approaches unity; this value indicates how well the Power fit represents the loss pattern using least squares, from [19].

Height (0.456 : 26000): Calculated by subtracting the average steady state from the Maximum value. The average steady state is calculated from the BLM signal after the decay of the previous alignment, until the current collimator was stopped.

Factor (1.46 : 1130): Calculated by taking the Maximum value as a fraction of the average steady state. The average steady state is calculated as for the Height feature.

Exponential function, fitting the decay with a·e^(−b·x) + c:
  (a) y-intercept (Exp_a) (−5800 : 15600): proportional to the height of the spike.
  (b) Gradient (Exp_b) (−20 : 2200): the rate of decay.
  (c) Horizontal asymptote (Exp_c) (−3500 : 6000): proportionate to the new steady state.

Exponential Correlation Coefficient (0 : 1): Indicates how well the exponential fit represents the decay using least squares.

Position in sigma (Pos) (0 : 80): A beam-size-invariant way of expressing the fraction of the normally distributed beam interrupted by the jaw, as the beam size in mm varies across locations in the accelerator.
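To make the fit-based entries of Table 1 concrete, the following sketch fits the a·e^(−b·x) + c model to a post-spike decay with SciPy and returns the Exp_a, Exp_b and Exp_c features together with a correlation coefficient. The function names and the use of a Pearson correlation are our own assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr

def exp_decay(x, a, b, c):
    """Decay model a * exp(-b * x) + c behind the Exp_a, Exp_b, Exp_c features."""
    return a * np.exp(-b * x) + c

def exponential_features(decay_signal, fs=100.0):
    """Fit the post-spike decay and return (Exp_a, Exp_b, Exp_c, correlation)."""
    x = np.arange(decay_signal.size) / fs
    (a, b, c), _ = curve_fit(
        exp_decay, x, decay_signal,
        p0=(decay_signal[0] - decay_signal[-1], 1.0, decay_signal[-1]),
        maxfev=10000)
    fitted = exp_decay(x, a, b, c)
    corr, _ = pearsonr(decay_signal, fitted)   # how well the fit represents the decay
    return a, b, c, corr
```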

Table 2
Ranking of the best features selected by each model.

ML model              #1       #2       #3
Logistic regression   Height   Exp_b    Pos
Neural network        Height   Exp_b    Exp_a
SVM                   Height   Exp_b    Pos
Decision tree         Pos      Height   Exp_b
Random forest         Height   Exp_b    Pos
Gradient boost        Height   Pos      Exp_b

(Exp_a and Exp_c additionally appear as fourth and fifth selected features for some of the models.)

The grid search was nested within the sequential forward selection algorithm (SFS) [26], to select the best features with the best hyper-parameters. The SFS algorithm tries all feature combinations by introducing one feature at a time and keeping the best feature for future combinations. Moreover, from Section 4.3 we identified three correlated features (Maximum value, Height and Factor); therefore, if a model selects any of these three, the SFS algorithm removes the other two features from future combinations, to avoid selecting correlated features. The precision results from the various feature combinations, using the best performing hyper-parameters for each combination, are plotted in Fig. 9, and the selected best features are listed in Table 2.
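A much simplified sketch of this selection step uses scikit-learn's SequentialFeatureSelector with precision scoring on a single model; the nested grid search and the removal of correlated features described above are omitted, and X_train, y_train are assumed to hold the training split.

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Forward selection scored by 10-fold cross-validated precision
selector = SequentialFeatureSelector(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    n_features_to_select=5, direction="forward",
    scoring="precision", cv=10, n_jobs=-1)
selector.fit(X_train, y_train)
print(selector.get_support())  # boolean mask over the nine candidate features
```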

A closer look at the training of each model using the selected features and hyper-parameters is shown in Fig. 10, in which the learning curves over various data set sizes, using a 10-fold cross-validation randomly stratified 30 times, are plotted. The results show that the Gradient Boost model suffers from high variance on the training set, whilst the learning curves for the rest of the models converge on the two sets, indicating that these models are successfully learning. Overall, the Logistic Regression model obtained the best results, followed by the Support Vector Machine.

5.2. Model robustness

The six machine learning models were trained on 2016 data, achieving high precision. Fig. 11 plots the precision results obtained by each of the models and their Ensemble. Throughout this section the results are collected over a 10-fold cross-validation randomly sampled 30 times. Moreover, Tukey's HSD test [27] was used to determine whether the means of the results obtained by the models are significantly different. This test is based on the Student t-test while adjusting the p-values for multiple comparisons, therefore allowing all possible pairs of means to be compared. Due to this, each box plot comparing means in this section has a corresponding table of the results obtained using Tukey's test. In the case of Fig. 11, Tukey's test in Table 3 indicates that the mean precision obtained by the Support Vector Machine is not significantly different from that of the Logistic Regression, but significantly greater than the rest of the models. In addition, the mean of the Ensemble model is also similar to that of the Logistic Regression, whereas the other models obtained lower and similar results.
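One way to carry out such a comparison is with statsmodels' pairwise Tukey HSD on the per-fold precision scores; precision_scores below is an assumed dictionary mapping each model name to its array of cross-validation precisions, introduced only for illustration.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# precision_scores: {model name: array of per-fold precision values} (assumed)
scores = np.concatenate(list(precision_scores.values()))
labels = np.concatenate([[name] * len(vals)
                         for name, vals in precision_scores.items()])

# All pairwise comparisons of mean precision at alpha = 0.05
print(pairwise_tukeyhsd(endog=scores, groups=labels, alpha=0.05))
```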


Fig. 9. The precision obtained by each of the models using the different feature combinations from the SFS with nested grid search. Each feature is indicated by a unique shape, such that the selected features are marked in red and the features ignored due to correlation are marked in black. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

To test the robustness of the machine learning models, new data was gathered from six alignment campaigns performed in 2018, both at injection and flat top. A total of 4794 samples were extracted, 3912 positive (alignment spikes) and 882 negative (non-alignment spikes). Each model was first tested on the entire 2018 data after being trained on the 2016 data; cross-validation was therefore not applied, as the training and testing sets are fixed in this case. Table 4 lists the precision results obtained, indicating that each model achieved high precision, with the Neural Network obtaining the lowest precision at 0.948. Fig. 12 plots the precision results obtained using solely 2018 data for training and testing, and the Support Vector Machine obtained the best results, followed by the Ensemble and Neural Network (see Table 5). These results indicate that using the 2018 data set for training can further improve the predictions of the models, therefore they were trained and tested on both data sets combined, and Fig. 13 plots the precision results obtained.
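The paper does not spell out how the Ensemble (the "ALL" entry in the tables) combines the six classifiers; one plausible construction, shown purely as an assumption, is a majority-vote ensemble in scikit-learn, reusing the X_train, y_train split from the earlier sketches.

```python
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical majority-vote ensemble of the six classifiers studied in the paper
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("nn", MLPClassifier(max_iter=1000)),
    ("svm", SVC()),                      # hard voting, so no predict_proba needed
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("rf", RandomForestClassifier(n_estimators=100, max_depth=3)),
    ("gb", GradientBoostingClassifier(n_estimators=100, max_depth=3)),
], voting="hard")
ensemble.fit(X_train, y_train)
```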


Fig. 12. The precision obtained by each model and their Ensemble, using the 2018 data set.

Fig. 10. The learning curves based on the precision obtained by each model on the training and testing sets, using the best selected features and hyper-parameters, on various data set sizes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 5
2018 data set: Tukey HSD test. Models with the same letter are not significantly different (alpha = 0.05).

ML model   Precision   Groups
SVM        0.9749714   a
ALL        0.9678411   b
NN         0.9659314   b
GB         0.9598743   c
DT         0.9458548   d
RF         0.9358178   e
LR         0.8924207   f

Fig. 11. The precision obtained by each model and their Ensemble, using the 2016 data set.

Table 3
2016 data set: Tukey HSD test. Models with the same letter are not significantly different (alpha = 0.05).

ML model   Precision   Groups
SVM        0.9930386   a
LR         0.9909449   ab
ALL        0.9873208   bc
RF         0.9853248   cd
DT         0.9840476   cd
NN         0.9824100   d
GB         0.9821891   d

Fig. 13. The precision obtained by each model and their Ensemble, using the combined 2016 and 2018 data sets.

Table 4
The precision obtained by each model and their Ensemble, using the 2016 data set for training and the 2018 data set for testing.

ML model   Precision
ALL        0.9913721
SVM        0.9905603
RF         0.9860101
DT         0.9813967
LR         0.9788605
GB         0.9683479
NN         0.9443495

Table 6
2016+2018 data set: Tukey HSD test. Models with the same letter are not significantly different (alpha = 0.05).

ML model   Precision   Groups
SVM        0.9749064   a
ALL        0.9726233   a
NN         0.9690865   b
GB         0.9612389   c
RF         0.9488177   d
LR         0.9326096   e
DT         0.9319209   e

These results and Table 6 indicate that the Support Vector Machine obtained the best precision, with a mean similar to that of the Ensemble, which has a more stable range of results. The Logistic Regression obtained the lowest precision results overall, with results similar to the Decision Tree.

For the case combining the 2016 and 2018 data, the bootstrap method [28] was used to estimate the performance of each of the machine learning models. This method is a re-sampling technique used to estimate summary statistics of the models when making predictions. In contrast to cross-validation, this method provides confidence intervals together with the precision of each machine learning model. Table 7 lists the 95% confidence intervals of the precision that can be obtained by each model, indicating that the better models are the Neural Network and the Support Vector Machine.
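The paper does not detail the exact resampling scheme; the sketch below is one common way to obtain such intervals, refitting a model on bootstrap resamples and scoring precision on the out-of-bag rows, with X and y assumed to be NumPy arrays.

```python
import numpy as np
from sklearn.metrics import precision_score
from sklearn.utils import resample

def bootstrap_precision_ci(model, X, y, n_boot=1000, alpha=0.05, seed=0):
    """Refit the model on bootstrap resamples and return a (1 - alpha)
    confidence interval on precision measured on the out-of-bag samples."""
    rng = np.random.RandomState(seed)
    idx = np.arange(len(y))
    scores = []
    for _ in range(n_boot):
        boot = resample(idx, replace=True, random_state=rng)
        oob = np.setdiff1d(idx, boot)          # out-of-bag rows for evaluation
        if oob.size == 0:
            continue
        model.fit(X[boot], y[boot])
        scores.append(precision_score(y[oob], model.predict(X[oob])))
    lo, hi = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```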


Table 7
95% confidence interval on precision for each model.

ML model              Precision interval (%)
Logistic regression   89.0 - 98.0
Neural network        97.0 - 98.0
SVM                   97.0 - 98.0
Decision tree         93.0 - 95.0
Random forest         94.0 - 96.0
Gradient boost        96.0 - 98.0

6. Model validation in LHC operation

The results obtained in this paper achieved a consistent precision of over 95%, higher than the previous work discussed in Section 3.2, which achieved a score of 82.4% on unseen data. Moreover, this study has focused on more recent LHC data, at the new higher energy of 6.5 TeV, thus making it suitable for present operational use. The machine learning models presented in this paper have been successfully deployed in operation throughout 2018, with the aim of transforming the semi-automatic alignment into a fully-automatic one. At first, tests were carried out in the LHC at injection [29], which involved having a collimation expert align collimators using the semi-automatic procedure, whilst the Ensemble model was run in parallel as an independent component. The performance of the machine learning model was determined by comparing its classifications to those of the user as the collimators were being aligned in real-time, and it correctly classified 50/52 alignments. The two incorrectly classified spikes were false negatives, therefore the only repercussion was a re-alignment, i.e. moving the jaw further in to obtain another spike. These tests confirmed the reliability of introducing machine learning for aligning collimators. As a result, machine learning was used for all subsequent alignment campaigns at injection and flat top, using both proton and ion beams. This made it possible to fully automate the alignment process without requiring any human intervention, and as a result the alignment time was decreased by a factor of three. Moreover, the machine learning was also incorporated into the angular alignment software to fully automate it.

7. Conclusion

The collimation system can offer maximum protection to the LHC if the collimator jaws are precisely positioned around the beam in the form of a hierarchy. Alignments are currently performed using a semi-automatic tool, whereby collimators are automatically moved towards the beam, whilst a collimation expert continuously observes the corresponding BLM signal. The expert is required to distinguish between alignment spikes and non-alignment spikes observed in this BLM signal, to determine whether a collimator is aligned or not. This paper proposes to use machine learning to automatically classify between the two classes of temporal beam loss patterns, as a necessary step to fully automate the entire alignment procedure. Past alignment campaigns were studied by extracting all collimator alignments performed, to form a data set. Fourteen features were initially extracted from this data set, which were then analysed in detail. Six machine learning models were trained and tested, whilst selecting their best features and hyper-parameters, and achieved a precision of over 95%. Robustness testing was then applied to each of the models, and confirmed that the machine learning models presented in this paper are indeed reliable enough to be used during LHC operation. In fact, throughout 2018 these models were used to develop new software to fully automate the alignment of all collimators. This paper presents the first comprehensive work on using machine learning techniques to classify spike patterns in a time-series data set, and is the first successful machine learning application used in LHC collimation operation.

References

[1] L. Evans, The large hadron collider, New J. Phys. 9 (9) (2007) 335.
[2] R. Aßmann, et al., Requirements for the LHC collimation system, Tech. rep., 2002.
[3] R. Steinhagen, LHC beam stability and feedback control-orbit and energy, Ph.D. thesis, RWTH Aachen U., 2007.
[4] R. Aßmann, et al., Operational experience with LHC collimation, in: Proceedings of IPAC09, Vancouver, BC, Canada, PAC-2009-TU4GRI01, 2009, pp. 789–793.
[5] G. Azzopardi, et al., Automatic angular alignment of LHC collimators, in: Proceedings of ICALEPCS'17, Barcelona, Spain, 2017, pp. 928–933.
[6] R. Aßmann, et al., Expected performance and beam-based optimization of the LHC collimation system, in: Proceedings of the PAC'04, Lucerne, Switzerland, 2004, pp. 1825–1827.
[7] S. Redaelli, et al., Operational experience with a LHC collimator prototype in the CERN SPS, in: Proceedings of the PAC'09, Vancouver, Canada, 2009, pp. 2835–2837.
[8] D. Wollmann, et al., First cleaning with LHC collimators, in: Proceedings of the IPAC'10, Kyoto, Japan, 2010, pp. 1237–1239.
[9] G. Valentino, S. Redaelli, R. Aßmann, N. Sammut, D. Wollmann, Semi-automatic beam-based alignment algorithm for the LHC collimation system, in: Proceedings of the IPAC'11, San Sebastian, Spain, 2011, pp. 3768–3770.
[10] G. Valentino, et al., Semiautomatic beam-based LHC collimator alignment, Phys. Rev. Spec. Top. Accel. Beams 15 (5) (2012).
[11] P. Xanthopoulos, et al., A novel wavelet based algorithm for spike and wave detection in absence epilepsy, in: 2010 IEEE International Conference on BioInformatics and BioEngineering, IEEE, 2010, pp. 14–19.
[12] J.X. Tao, A. Ray, S. Hawes-Ebersole, J.S. Ebersole, Intracranial EEG substrates of scalp EEG interictal spikes, Epilepsia 46 (5) (2005) 669–676.
[13] B. Yang, Y. Hu, Y. Zhu, Y. Wang, J. Zhang, Intracranial EEG spike detection based on rhythm information and SVM, in: 2017 9th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), vol. 2, IEEE, 2017, pp. 382–385.
[14] J. Le Douget, A. Fouad, M.M. Filali, J. Pyrzowski, M. Le Van Quyen, Surface and intracranial EEG spike detection based on discrete wavelet decomposition and random forest classification, in: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2017, pp. 475–478.
[15] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[16] B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, ACM, 1992, pp. 144–152.
[17] W. Ganglberger, et al., A comparison of rule-based and machine learning methods for classification of spikes in EEG, J. Clin. Microbiol. 12 (10) (2017) 589–595.
[18] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[19] G. Valentino, R. Aßmann, R. Bruce, N. Sammut, Classification of LHC beam loss spikes using support vector machines, in: 2012 IEEE 10th International Symposium on Applied Machine Intelligence and Informatics (SAMI), IEEE, 2012, pp. 355–358.
[20] D. Kleinbaum, K. Mitchel, Logistic Regression, Springer, 2002.
[21] P. Geurts, D. Ernst, L. Wehenkel, Extremely randomized trees, Mach. Learn. 63 (1) (2006) 3–42.
[22] J.H. Friedman, Greedy function approximation: a gradient boosting machine, Ann. Statist. (2001) 1189–1232.
[23] L. Buitinck, et al., API design for machine learning software: experiences from the scikit-learn project, 2013.
[24] D. Liu, J. Nocedal, On the limited memory BFGS method for large scale optimization, Math. Program. 45 (1–3) (1989) 503–528.
[25] D. Kingma, J. Ba, Adam: A method for stochastic optimization, in: Proceedings of the 3rd ICLR, 2015.
[26] S.J. Reeves, Z. Zhe, Sequential algorithms for observation selection, IEEE Trans. Signal Process. 47 (1) (1999) 123–132.
[27] J.W. Tukey, Comparing individual means in the analysis of variance, Biometrics 5 (2) (1949) 99–114.
[28] B. Efron, R. Tibshirani, An Introduction to the Bootstrap, CRC Press, 1994.
[29] G. Azzopardi, et al., Spike pattern recognition for automatic collimation alignment, Tech. rep., 2018.

