Towards an automated data cleaning with deep learning in CRESST

G. Angloher¹,
S. Banik^2,3,
D. Bartolot²,
G. Benato⁴,
A. Bento^1,9,
A. Bertolini¹,
R. Breier⁵,
C. Bucci⁴,
J. Burkhart²,
L. Canonica¹,
A. D’Addabbo⁴,
S. Di Lorenzo⁴,
L. Einfalt^2,3,
A. Erb^6,10,
F. v. Feilitzsch⁶,
N. Ferreiro Iachellini¹,
S. Fichtinger²,
D. Fuchs¹,
A. Fuss^2,3,
A. Garai¹,
V. M. Ghete²,
S. Gerster⁷,
P. Gorla⁴,
P. V. Guillaumon⁴,
S. Gupta²,
D. Hauff¹,
M. Ješkovský⁵,
J. Jochum⁷,
M. Kaznacheeva⁶,
A. Kinast⁶,
H. Kluck²,
H. Kraus⁸,
M. Lackner¹,
A. Langenkämper^1,6,
M. Mancuso¹,
L. Marini^4,11,
L. Meyer⁷,
V. Mokina²,
A. Nilima¹,
M. Olmi⁴,
T. Ortmann⁶,
C. Pagliarone^4,12,
L. Pattavina^4,6,
F. Petricca¹,
W. Potzel⁶,
P. Povinec⁵,
F. Pröbst¹,
F. Pucci¹,
F. Reindl^2,3,
D. Rizvanovic²,
J. Rothe⁶,
K. Schäffner¹,
J. Schieck^2,3,
D. Schmiedmayer^2,3,
S. Schönert⁶,
C. Schwertner^2,3,
M. Stahlberg¹,
L. Stodolsky¹,
C. Strandhagen⁷,
R. Strauss⁶,
I. Usherov⁷,
F. Wagner ORCID: orcid.org/0000-0001-5687-6392²,
M. Willers⁶,
V. Zema¹,
W. Waltenberger² &
CRESST Collaboration

1671 Accesses
4 Citations
10 Altmetric
1 Mention
Explore all metrics

Abstract

The CRESST experiment employs cryogenic calorimeters for the sensitive measurement of nuclear recoils induced by dark matter particles. The recorded signals need to undergo a careful cleaning process to avoid wrongly reconstructed recoil energies caused by pile-up and read-out artefacts. We frame this process as a time series classification task and propose to automate it with neural networks. With a data set of over one million labeled records from 68 detectors, recorded between 2013 and 2019 by CRESST, we test the capability of four commonly used neural network architectures to learn the data cleaning task. Our best performing model achieves a balanced accuracy of 0.932 on our test set. We show on an exemplary detector that about half of the wrongly predicted events are in fact wrongly labeled events, and a large share of the remaining ones have a context-dependent ground truth. We furthermore evaluate the recall and selectivity of our classifiers with simulated data. The results confirm that the trained classifiers are well suited for the data cleaning task.

Detector Monitoring with Artificial Neural Networks at the CMS Experiment at the CERN Large Hadron Collider

Article 03 January 2019

Use of neural networks to analyze pulse shape data in low-background detectors

Article 07 July 2018

Autoencoder-Based Anomaly Detection System for Online Data Quality Monitoring of the CMS Electromagnetic Calorimeter

Article Open access 24 June 2024

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Dark Matter (DM) particles are hypothetical particles beyond the Standard Model of particle physics (SM) and thought to make up $(83.9 \pm 1.5 )$% of all matter in our universe [1]. The experimental search for particle DM inspired many experiments in the past decades. The Cryogenic Rare Event Search with Superconducting Thermometers (CRESST), located in the Laboratori Nazionali del Gran Sasso, is a direct DM search and uses scintillating, cryogenic calorimeters as targets. This technology achieves a) low nuclear recoil thresholds, with a currently lowest reported value of 10 eV [2]; b) currently the strongest exclusion limits on spin-independent (spin-dependent) DM-SM interactions for DM masses in the range 0.16-1.5 (0.25-1.5) GeV/c$^2$, under standard assumptions [3, 4]. However, the sensitivity of the detectors and readout electronics cause not only particle recoils to trigger, but also a variety of artefacts: spikes, drifts, jumps and glitches in the noise baseline (BL) from the readout and heater electronics, and piled up pulse shapes (PSs) from multiple particle recoils in close temporal proximity. The recorded signals therefore need a cleaning step, before a meaningful data analysis can be started.

The standard approach is based on the calculation of PS and BL features, as e.g. the pulse height (PH), rise and decay time and BL slope. The cleaning is then carried out by an analyst, who defines individual rejection regions (cuts) in the space of the calculated features for each detector. Automating this process brings two benefits: a) detector setups with a large number of simultaneously operated detectors require less human effort to clean and analyse the recorded data and b) it helps preventing biases from individual decisions of the analyst made in the manual intervention.

In each detector a particle recoil produces a characteristic PS, determined by the thermal properties of the target and sensor. Artefacts usually deviate from this characteristic PS (see Fig. 1). One common multi-purpose data cleaning method is to fit the numerical array of the characteristic PS to each record and reject all those, whose fit error exceed a certain value. This method requires a relatively high computational cost for each record, either the prior knowledge of the detector-specific PS or dedicated training data from the same detector to build a template, and manual interventions in tuning the cut values. Furthermore, the discrimination power of cuts is often limited by overlaps of the artefact and target recoil feature distributions, where only correlated cuts on multiple features could achieve an optimal cleaning of the data.

In our work, we create a data set with samples from measurements that were done in CRESST-II and CRESST-III. We clean the data for each detector individually, and we label each record as accepted (positive) or rejected (negative). The data cleaning task is now equivalent to a binary, supervised time series classification problem, i.e. learning to discriminate between positive and negative records. We approach this task with deep neural networks, with which promising results were obtained for time series classification tasks [5].

Table 1 The sizes of the used data sets

Full size table

Similar supervised deep learning approaches were successful in discriminating between individual PSs, originating from different types of particle recoils [6,7,8], or recoils in different positions and components of the detector [9,10,11,12]. The discrimination and reconstruction of pile-up artifacts was studied in Refs. [13, 14]. The task of general data cleaning for cryogenic calorimeters with autoencoders (AEs) and variational AEs (VAEs) was studied in Refs. [12, 15], and with a Principal Component Analysis (PCA) in Ref. [16]. Differently to those approaches, we do not rely on previous knowledge about the detector to which our method is applied, or the tuning of a cut value. We explore the synergies with the PCA method from Ref. [16] in Sect. 4.3, by combining our approaches. A method for the supervised discrimination between pulses and artifacts with neural networks was proposed in Ref. [17], and shown on purely simulated data. We train and verify with both measured and simulated data, which adds necessary reliability to the results.

Our work was organized as follows: first, we create a large-statistics data set of labeled events. The procedure is described in Sect. 2. Second, we show that neural networks can learn the data cleaning task. The performance of our chosen models on the training set is presented in Sect. 3. Finally, we bridge the gap between triggering and the parts of the event selection which require detector-specific knowledge or tuning. In Sect. 4, we apply the trained models to a test set or recorded data and to simulated positive and negative events.

2 Used data and pre-processing

The CRESST detector concept is based on a multi-channel read out. Within a joint housing (a detector module), a phonon and a light detector measure the produced phonon population and scintillation light from a particle recoil inside a scintillating crystal (the target). The detector response is measured with a Transition Edge Sensor (TES), operated in a read-out circuit with a SQUID amplifier. For a detailed description of TES we refer to Refs. [18, 19]. In some detector modules, additional TES serve to veto energy depositions in the housing or holding structure. In the context of this work, we treat all of them as individual detectors. We create our data set by extracting samples of several tens of hours measurement time from all measurements that were done between 2013 and 2019 in the CRESST setup at LNGS, for a total of 68 detectors and more than one million records. We exclusively use data from the $e^{-}/\gamma $ and neutron-calibration runs. These are dedicated measurement campaigns before or after the data for dark matter searches was taken. During the $e^{-}/\gamma $ calibration, a Co-57 source is used to produce a characteristic calibration line in the energy spectrum of the detector. During the neutron calibration, an AmBe source is used to produce a strong neutron flux and calibrate the response of the detectors for particles with no electromagnetic charge. By only using calibration data in this work we additionally prevent biasing effects for a potential application of the trained models to physics searches in future work. We chose seven detectors as a test set, these make up three detector modules: two times a target and one veto detector in a joint housing, and once a target and two veto detectors. These detectors were used only to evaluate the selection of the trained models, reported in 4. For the optimization of hyperparameters of the classifiers and fit process, we split five percent of the data from the remaining 61 detectors into an individual validation set. The total sizes of the data sets are summarized in Table 1. The data preparation and cuts are done with the Python package ‘Cait’ [17].

We want our classifier to successfully perform these operations:

reject all jump, drift, spike, glitch and pile-up artifacts that deviate significantly from a recoil-type PS,
reject all PSs that rise far away from the trigger position at 1/4 of the record window or do not decay within the window,
let all PSs survive that fit the above criteria, not only those from target recoils, and also if they show saturation effects typical for high energy recoils,
let empty noise traces survive if their slope is within the typical slope of noise traces for the corresponding detector.

We apply to the triggered and noise events cuts on the PS parameters, to create the desired positive and negative labels for the classifier training. The used PS parameters include the PH, onset, rise and decay times of pulses, the difference between the offset levels on the left and right side of the record, the maximal and minimal derivatives and their positions within the record, and the mean, variance and skewness of the values in the record. We define rejection regions with the rectangle and lasso selection tools of Cait’s VizTool, as described in Ref. [17].

However, the applied cuts are imperfect, and generally tend to reject more pulses than necessary, which introduces a small share of wrongly labeled records in our data set. Their impact on the trained classifiers and reported performance metrics is studied in Sect. 4.1. The training set consists to 83.6% of positive records. We counter this imbalance with data augmentation, explained later in this section, and a weighted loss function, explained in Sect. 3.

We applied several transformations to the records as a preprocessing. The recordings were initially made with a sampling frequency of 25 kHz and record window lengths of 8192 or 16384 samples, depending on the measurement campaign (run). First, we downsampled the records to the length of 512 samples. The values in the downsampled time series correspond therefore to a time interval of 0.64 or 1.28 ms, while those in the original records corresponded to 40 mus. This time resolution and record length is sufficient to distinguish pulses from artefacts, and the step significantly reduces the necessary computing power for optimization. In a second step, we scale the values within all records individually, such that every record’s minimal value is 0 and maximal value is 1. The information of the amplitude of individual records is, without further information about the corresponding detector, arbitrary and all useful information is contained in the shape of the time series.

The classifier optimization is an iterative process, for which the training set is split in so-called mini-batches of 64 records each. Three potential biases were identified in our data set, and their impact was mitigated by data augmentation methods: (a) while we have an over-density of positive records, we have only a relatively small amount of pile-up artefacts. However, they are the class of artifacts that is most difficult to clean from the data with standard cuts. We therefore randomly pick several of the positive records in each mini-batch and superpose them with time-shifted copies of themselves, to artificially create pile-up events. The shifts are chosen such that the superposed pulse appears at a random position within the record window. We then changed their label from positive to negative. (b) The data set contains many records with relatively high Signal-to-Noise Ratio (SNR), because most recorded pulses originate from calibration sources with typical energy depositions far above the detection threshold. A common method to robustify neural networks against small numerical deviations is to randomly add Gaussian noise to the inputs. We apply that method and by that artificially create records with lower SNR. (c) The record window is built such that the trigger time is at 1/4 of its length. While large deviations in the pulse onset from this position are interpreted as artefacts and rejected, pulses with small deviations, of the order of milliseconds, should still be accepted. To implement this objective in our data set, we randomly shift all records by up to 26 samples (33.28 or 16.64 ms). The augmentations are applied to a record when it is drawn from the training set. Augmentation (c) is always applied, but (a) and (b) only with a probability of 20%.

A mini-batch of 64 records from the training set is shown in Fig. 2.

3 Models and training process

A neural network classifier is a function $f_{w}$, which is parameterized with so-called weights $w \in \mathbb {R}^M$, that maps an element x from a data set $\mathcal {D} \subset \mathbb {R}^N$ to a prediction $\hat{y} \in (0,1)$. A lower (higher) value indicates a higher belief that x corresponds to a negative (positive) label $y \in \{0,1\}$. The number N corresponds to the dimensionality of the data, namely the number of samples in the record (N = 512). The number M describes the number of weights in the neural network, and depends on the chosen model. Compared to other commonly used fit models, e.g. polynomials or splines, which typically have maximally several tens of parameters, neural networks have from thousands up to billions of parameters. In the limit $M \rightarrow \infty $, neural networks are proven to be universal function approximators [20]. Furthermore, while the computational cost of many other function approximators increases exponentially with their number of parameters, neural networks do not suffer from this curse of dimensionality. These properties make them useful for the classification of high dimensional and strongly correlated data, which is the case for our raw sensor signals. For a high-quality pedagogical introduction to neural networks and machine learning we recommend Ref. [21].

For this work, we used four different models:

A small Convolutional Neural Network (CNN). This model applies a set of filters to the input, to extract meaningful features. It is highly efficient for spacially correlated data, e.g. for images. Our CNN applies successively 3 layers of filters, and extracts 64 feature channels. The filter kernels are weights, i.e. they are optimized jointly with the classifier parameters.
A larger convolutional model, which we will call Time Series Convolutional Network (TSCN). The performance of convolutional neural networks often depends on their number of weights, i.e. their size. To compare the capabilities of both a small and a larger model, we include this network with 8 layers of filters and 128 feature channels.
A bidirectional Long Short Term Memory (LSTM) network [22]. This model accumulates the information from a sequence of inputs into two internal states, one optimized to recognize long term dependencies, and one to recognize short term dependencies. It is widely used for sequential data, e.g. for natural language processing and time series. We use two bidirectional layers, and internal states with the dimensionality 200.
A for Time Series classification optimized Transformer (TST) [23]. Transformers are attention-based models, that recently have shown great performance on multiple types of data, among them time series applications [24]. They are very efficient in recognizing dependencies between different parts of sequences and the new state-of-the-art large language models [25]. We use the implementation of Ref. [26].

The specifications and the hyperparameters of the models used in this work are given in in App. A. All implementations are built in PyTorch [27].

The weights of neural networks are iteratively optimized (learned) by minimizing an error (loss) function between its predictions and the labels. This process is called the ‘training’ of the model. We minimize the binary cross entropy loss function, a quantification of the prediction error commonly used for classifier training ([21], p. 49). We weight the loss resulting from positive and negative records individually to counteract the preponderance of positive records. This way, both groups have the same impact in the optimization process. A sweep through the whole training data set, i.e. one iteration through all mini-batches, is called an epoch. We train all models for 15 epochs. We randomize the mini-batches with a technique called weak shuffling: individual mini-batches consist of records from the same detector, while the order in which mini-batches are used in the training process is randomized. This technique significantly reduces the data loading time during training compared to randomization within each mini-batch, as the records are stored consecutively on our hard drive. The optimization process was done on a Tesla P100 GPU and with the ADAptive Moment estimation (ADAM) optimizer [28], a variant of stochastic gradient descent. The optimizer does one update of the weights (one optimizer step) for each mini-batch. The magnitude of weight updates in each optimizer step is steered by the learning rate. We optimized the learning rates for all models individually with the cyclic learning rate finder technique, described in Ref. [29]. The learning rates used for each model are listed in Table 7.

After each training epoch, we evaluate the loss value on our validation set. The loss values on the training and validation set for all models throughout the training process are shown in Fig. 3. All curves resemble the typical shape of neural network training curves, and in three out of four models the loss on the validation set continues to decrease proportionally to the loss on the training set. Only for the TST model, the validation loss rises slightly, which indicates overfitting on the training data. We experimented with the models hyperparameters to improve the overfitting, but other configurations lead to overall worse results on the validation set. We saved the trained model after each epoch, and can therefore use those which had the best agreement between labels and predictions (accuracy) on the validation set. The TSCN model shows the lowest loss value on the validation set after the training process. We do not apply the data augmentations (see Sect. 2) on the validation data, therefore, the validation loss is generally lower than the training loss.

4 Results

We run multiple tests with our trained models: we evaluate the balanced accuracy, recall and precision on the test set, the recall (selectivity) on simulated particle recoil (pile-up) events and we visualize the data manifold before and after application of our model with a PCA. Additionally, we compare a PH spectrum cleaned with classical cuts to one cleaned with our models. The metrics used in this section are defined as follows:

$$\begin{aligned} \begin{array}{llll} \text {Recall } &{}R {:}{=}TP/T, &{}\text {Selectivity } &{}S {:}{=}TN/N, \\ \text {Balanced Accuracy } &{}BA {:}{=}(R + S)/2, &{}\text {Precision } &{}P {:}{=}TP/(TP + FP), \\ \text {Integral Over Recall } &{}IOR {:}{=}\int _\varOmega R(\mu ) d\mu , &{}\text {Integral Over Selectivity } &{}IOS {:}{=}\int _\varOmega S(\mu ) d\mu , \\ \end{array} \end{aligned}$$

where T are positive records, N are negative records, TP are true positive predictions and TN are true negative predictions. $R(\mu )$ and $S(\mu )$ are the recall and selectivity as functions of PS features. $\varOmega $ is the region over which we integrate the recall or selectivity and $\mu $ is a placeholder for one or multiple PS features (e.g. the PH) w.r.t. which we perform the integration.

4.1 Evaluation on the test set

Table 2 The metrics of the trained models, evaluated on the test set and simulated data. (col. 2-4) The balanced accuracy, recall and precision on the test set with a cutoff value of 0.5. (col. 5, 6) The IOR score for the simulated positive particle recoils, and the IOS score for the simulated negative pile-up events. The values are defined in the text. (col. 7) The runtime for predicting the whole test set. In brackets is the runtime divided by the lowest runtime

Full size table

We apply all models to the test set. To use the models as classifiers, a cutoff on the output value (the belief) has to be defined, below which we reject the record. We evaluate the balanced accuracy, the precision and the recall w.r.t. the cutoff value. They are visualized in Fig. 4. For specific applications, the cutoff value can be tuned to fulfill specific demands to recall or precision. In this work, we choose a generic cutoff value of 0.5 for all classifiers.

The recall and precision are naturally in a opposing relationship, explaining the typical kink in the curve shape. The LSTM and TSCN models give the best performance across all metrics. The metrics for the cutoff value 0.5 are reported in Table 2. The predictions on the test set were done in batches as well, with a batch size of 32. The total run times for the predictions on the test set are reported in Table 2. There are typically several thousand predictions done per second, which is due to the low computational cost for inference with neural networks, and the strong parallelization of the necessary matrix multiplications on the GPU.

We take a closer look to the wrongly predicted records. For this, we use the predictions of the LSTM network, and pick one of the detectors from the test set. The detector has 70 records that were wrongly predicted, out of 8422 records in total. We show the first 64 of the wrongly predicted records in Fig. 5. After visual inspection, we discover that 39 of them are wrongly labeled, mostly good pulses that were collaterally rejected by the imperfect quality cuts on the main parameters. Among the wrongly labeled records there are also noise traces for which the decision if they should survive quality cuts or not depends on the distribution of noise traces in the individual detector. In a full analysis of a detector, these events would most likely not surpass the trigger threshold and are therefore irrelevant.

4.2 Evaluation on simulated data

The evaluation on data labeled by cuts has the drawback of wrong labels, we therefore do a second evaluation on simulated data. For this, we superpose the typical pulse shape of one detector from our test set onto empty noise traces that were generated according to Ref. [30]. We simulate a data set of 50000 positive particle recoil records and negative pile-up records each, with PHs between zero and three hundred SNR, i.e. the ratio of the PH of the event and the BL noise resolution. For the pile-up events the onset of the first pulse is placed to 1/4 of the window, the second is placed randomly within the window. We evaluate the recall on the positive records w.r.t. the SNR. The result is visualized in Fig. 6 (left). We calculate the IOR for the SNR-range from three to three hundred. Furthermore, we evaluate the selectivity on the pile-up events (Fig. 6, right), w.r.t the difference in onset and relative difference in PH. We calculate the IOS over the whole plotted range. Both IOR and IOS values are reported for all models in Table 2. The LSTM and TSCN score best and equally good in IOS, while the TST scores best in IOR. Later result is however due to the fact, that the TST defaults to high output values, which is also visible in Fig. 4 (left). The convolutional models CNN and TSCN feature a significantly lower runtime as the LSTM and TST.

4.3 Pulse height spectrum and data manifold

As a last evaluation of our model, we choose a 3-channel detector module from the test set. For this detector, the target is a cylindrical calcium tungstate crystal. The first channel of the module is a TES placed on the target, collecting the thermal and athermal phonons produced in particle recoils. The second channel is a TES placed on a shielding ring structure around the first TES, which is directly connected to both the target and the light detector. The third channel is a TES on the light detector, a beaker-shaped object that fully surrounds the target.

We apply our LSTM model to all channels of the detector module, and call the events clean that correspond to positive prediction for all three channels. In a technique called manifold visualization we want to visualize the distribution of the data in the high dimensional vector space of their sample values. To include the information of all three channels, we concatenate the time series of the records that correspond to the same event in different channels to vectors of length 1536, and form by that a combined data matrix. We do this for all data, and only the cleaned data, for which we use a PCA. The PCA is a singular value decomposition, i.e. the calculation of eigenvectors and eigenvalues, of the covariance matrix of the combined data matrix. The first (second) eigenvector of this matrix, called the first (second) principal component (PC), is itself a time series of length 1536 that corresponds to the template that accounts for the highest (second highest) variance of values in the concatenated records. For a more detailed description of the PCA method, we refer to Ref. [16], where it was first applied for pulse shape identification in cryogenic calorimeters. We plot the projection of the full raw data to its first and second PCs in Fig. 7 (left) and the same for only the cleaned data in Fig. 7 (right). The projection is calculated by building the dot product of the concatenated records and the PC. This corresponds to a change of basis in the high dimensional vector space of the sample values, with the first and second PCs as the new basis vectors. While the variance in the raw data is dominated by steps in the record windows induced by flux quantum losses of the SQUID amplifier (example in Fig. 1, orange), the cleaned data manifold resembles the actual particle recoil types present in the data: target hits, hits in the ring veto detector, and direct hits of the light detector. For the target and veto detector hits electron and nuclear recoils are also separately visible in the plot, due to the different amplitude in the light channel. The identification of the lines of individual event types in Fig. 7 (right) was done by comparing the equivalent plot for $e^-/\gamma $ and neutron calibration, and visual inspection of events and principal components individually. The first (second) principal component strongly resembles the PS of phonon-only (light-only) events. Thus, the line structure we observe in Fig. 7 (right) is in agreement with our expectation. Direct light detector hits, with no phonon signal, cluster along the second PC. Target hits cluster mostly along the first PC, but they differ in the amount of produced scintillation light. Electron hits produce significantly more scintillation light, and ring hits produce an additional signal in the light detector due to a weak thermal link between them.

Additionally, we show the PH spectrum of the target channel of the same detector module without any cuts, with the analysis cuts that we used as labels, and cutted with the LSTM model, in Fig. 8. The PH is for low energy particle recoils proportional to the recoil energy. We observe a strong agreement of the LSTM model and the analysis cuts over the full width of the spectrum. The uncleaned data is strongly polluted with artifacts.

To summarize, we have shown in Sect. 4, that we reliably discriminate events originating from particle recoils and artefacts. Differently to standard approaches, we do not rely on prior knowledge of individual detectors, as its characteristic pulse shape, or manual interventions, as finding individual cut values. We have shown in Sect. 4.3, that our trained models reproduce the objectives of the cuts that were made by analysts. The event selection, and with that the physics reach of the experiment, does therefore not change significantly whether our model is applied or the selection is done per hand. This is the outcome we had hoped for and our results can have the following impacts:

1.
For large-scale detector setups the fine tuning of dedicated cuts for each detector can produce non-negligible delays in the analysis or even become infeasible. The application of our models instead can produce equivalent cuts instantly.
2.
Recorded data can be monitored in real time. This can uncover unwanted shifts of the measurement setup, as e.g. an increased rate of artifacts, immediately and enable fast interventions. Furthermore, features in the event distribution, e.g. a peak in the PH spectrum, can be identified as particle-like or artefact-like, without the need for designing cuts first.

5 Conclusion and outlook

We trained a selection of deep learning models to perform the data cleaning task for cryogenic detectors. We observed equally promising performance from a convolutional neural network, TSCN, and an LSTM neural network. The best achieved balanced accuracy score on the test set is 0.932, with a recall of 0.986 and precision of 0.985. Notably, the majority of the wrongly predicted records by the best performing model are either wrongly labeled records or hard-to-decide records. The method therefore apparently outperforms the cuts which we performed on the data, in the task of discriminating pulse-shaped from other records. For practical applications, our method has another advantage: the runtime is with O(1k-10k) predictions per second significantly faster than standard fit methods that are a typical alternative for the same purpose. Note, that our method can be applied to a new detector blindly, i.e. without relying on any information from the detector itself, as typical PSs. We therefore reduce the necessary manual task in data cleaning from handcrafting cut values for each detector individually, to merely controlling the predictions of the model.

In future work, our method can be extended with an appropriate, unsupervised clustering of the different PSs. A promising approach was presented in Ref. [15]. Furthermore, an investigation of the poor Transformer performance could be started. Likely the extraction of proper representations of the time series, as done for raw audio data in the popular wave2vec 2.0 framework [31], would improve its performance.

Data Availibility Statement

This manuscript has no associated data or the data will not be deposited. [Authors’ comment: There are no associated data available.]

References

N. Planck Collaboration, Y. Aghanim, Akrami et al., Planck 2018 results - vi. cosmological parameters. (2020) https://doi.org/10.1051/0004-6361/201833910
G. Angloher, S. Banik, G. Benato et al., Latest observations on the low energy excess in CRESST-III,” (2022). arXiv:2207.09375
A.H. Abdelhameed, G. Angloher, P. Bauer et al., First results from the CRESST-III low-mass dark matter program. Phys. Rev. D 100, 102002 (2019). https://doi.org/10.1103/PhysRevD.100.102002
Article ADS Google Scholar
G. Angloher, S. Banik, G. Benato et al., Testing spin-dependent dark matter interactions with lithium aluminate targets in CRESST-III, (2022). arXiv:2207.07640
H. Ismail Fawaz, G. Forestier, J. Weber et al., Deep learning for time series classification: a review. Data Mining and Knowle. Dis. 33, 917–963 (2019). https://doi.org/10.1007/s10618-019-00619-1
Article MATH Google Scholar
P. Holl, L. Hauertmann, B. Majorovits et al., Deep learning based pulse shape discrimination for germanium detectors. Eur. Phys. J. C 79(6), 450 (2019). https://doi.org/10.1140/epjc/s10052-019-6869-2
Article ADS Google Scholar
C.K. Khosa, L. Mars, J. Richards, V. Sanz, Convolutional neural networks for direct detection of dark matter. J. Phys. G: Nucl. Part. Phys. 47(9), 095201 (2020). https://doi.org/10.1088/1361-6471/ab8e94
Article ADS Google Scholar
A. Abdulaziz, J. Zhou, A. Di Fulvio et al., Semi-supervised gaussian mixture variational autoencoder for pulse shape discrimination. In: ICASSP 2022 – 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3538–3542 (2022). https://doi.org/10.1109/ICASSP43922.2022.9747313
A.J. Zöller, Artificial neural network based pulse-shape analysis for cryogenic detectors operated in CRESST-II. Dissertation, Technische Universität München, München, (2016). http://mediatum.ub.tum.de/?id=1303343
S. Delaquis, M. Jewell, I. Ostrovskiy et al., Deep neural networks for energy and position reconstruction in EXO-200. J. Instrum. 13(08), P08023–P08023 (2018). https://doi.org/10.1088/1748-0221/13/08/p08023
Article Google Scholar
C.Mühlmann, Pulse-shape dicrimination with deep learning in CRESST, (2019). http://hdl.handle.net/20.500.12708/14865
F. Wagner, Machine learning methods for the raw data analysis of cryogenic dark matter experiments, (2020). https://doi.org/10.34726/hss.2020.77322
G. Fantini, A. Armatol, E. Armengaud et al., Machine learning techniques for pile-up rejection in cryogenic calorimeters. J. Low Temp. Phys. (2022). https://doi.org/10.1007/s10909-022-02741-9
Article Google Scholar
F. Wagner, Nonlinear pile-up separation with lstm neural networks for cryogenic particle detectors, (2021). arXiv:2112.06792
Y. Ichinohe, S. Yamada, R. Hayakawa et al., Application of deep learning to the evaluation of goodness in the waveform processing of transition-edge sensor calorimeters. J. Low Temp. Phys. (2022). https://doi.org/10.1007/s10909-022-02719-7
Article Google Scholar
R. Huang, E. Armengaud, C. Augier et al., Pulse shape discrimination in CUPID-mo using principal component analysis. J. Instrum. 16(03), P03032 (2021). https://doi.org/10.1088/1748-0221/16/03/p03032
Article Google Scholar
F. Wagner, D. Bartolot, D. Rizvanovic, et al., Cait: analysis toolkit for cryogenic particle detectors in python, (2022). arXiv:2207.02187
W. Seidel, G. Forster, W. Christen, et al., Phase transition thermometers with high temperature resolution for calorimetric particle detectors employing dielectric absorbers, https://doi.org/10.1016/0370-2693(90)90388-MPhysics Letters B236 no. 4, 483–487 (1990). https://www.sciencedirect.com/science/article/pii/037026939090388M
F. Pröbst, M. Frank, S. Cooper et al., Model for cryogenic particle detectors with superconducting phase transition thermometers. J. Low Temp. Phys. 100(1), 69–104 (1995). https://doi.org/10.1007/BF00753837
Article ADS Google Scholar
K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, https://doi.org/10.1016/0893-6080(89)90020-8Neural Networks2 no. 5, 359–366 (1989). https://www.sciencedirect.com/science/article/pii/0893608089900208
, A high-bias, low-variance introduction to machine learning for physicists, https://doi.org/10.1016/j.physrep.2019.03.001Physics Reports810 1–124 (2019). https://www.sciencedirect.com/science/article/pii/S0370157319300766. A high-bias, low-variance introduction to Machine Learning for physicists
S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Comput.9 no. 8, 1735–1780 (Nov., 1997). https://doi.org/10.1162/neco.1997.9.8.1735._eprint: https://direct.mit.edu/neco/article-pdf/9/8/1735/813796/neco.1997.9.8.1735.pdf
A. Vaswani, N. Shazeer, N. Parmar, et al., Attention is all you need, In: advances in neural information processing systems, I. Guyon, U. V. Luxburg, S. Bengio, et al., eds., vol. 30. Curran Associates, Inc., (2017). https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
G. Zerveas, S. Jayaraman, D. Patel, et al., A transformer-based framework for multivariate time series representation learning, (2020). arXiv:2010.02803
J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, (2018). arXiv:1810.04805
G. Zerveas, Multivariate time series transformer framework. https://github.com/gzerveas/mvts_transformer, (2021)
A. Paszke, S. Gross, F. Massa, et al., Pytorch: An imperative style, high-performance deep learning library, In: Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc., (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, (2014). arXiv:1412.6980
L. N. Smith, Cyclical learning rates for training neural networks, (2015). arXiv:1506.01186
M. Carrettoni, O. Cremonesi, Generation of noise time series with arbitrary power spectrum. Communications 181(12), 1982–1985 (2010). https://doi.org/10.1016/j.cpc.2010.09.003ComputerPhysics https://www.sciencedirect.com/science/article/pii/S0010465510003486
A. Baevski, H. Zhou, A. Mohamed, M. Auli, Wav2vec 2.0: A framework for self-supervised learning of speech representations, In: Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20. Curran Associates Inc., Red Hook, NY, USA, (2020)

Download references

Acknowledgements

We are grateful to LNGS for their generous support of CRESST. This work has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC 2094 – 390783311 and through the Sonderforschungsbereich (Collaborative Research Center) SFB1258 ‘Neutrinos and Dark Matter in Astro- and Particle Physics’, by the BMBF 05A20WO1 and 05A20VTA and by the Austrian Science Fund (FWF): I5420-N, W1252-N27 and FG1 and by the Austrian research promotion agency (FFG), project ML4CPD. The Bratislava group acknowledges a partial support provided by the Slovak Research and Development Agency (project APVV-15-0576). The computational results presented were obtained using the Vienna CLIP cluster.

Funding

Open access funding provided by Austrian Science Fund (FWF).

Author information

Authors and Affiliations

Max-Planck-Institut für Physik, D-80805, München, Germany
G. Angloher, A. Bento, A. Bertolini, L. Canonica, N. Ferreiro Iachellini, D. Fuchs, A. Garai, D. Hauff, M. Lackner, A. Langenkämper, M. Mancuso, A. Nilima, F. Petricca, F. Pröbst, F. Pucci, K. Schäffner, M. Stahlberg, L. Stodolsky & V. Zema
Institut für Hochenergiephysik der Österreichischen Akademie der Wissenschaften, A-1050, Wien, Austria
S. Banik, D. Bartolot, J. Burkhart, L. Einfalt, S. Fichtinger, A. Fuss, V. M. Ghete, S. Gupta, H. Kluck, V. Mokina, F. Reindl, D. Rizvanovic, J. Schieck, D. Schmiedmayer, C. Schwertner, F. Wagner & W. Waltenberger
Atominstitut, Technische Universität Wien, A-1020, Wien, Austria
S. Banik, L. Einfalt, A. Fuss, F. Reindl, J. Schieck, D. Schmiedmayer & C. Schwertner
INFN, Laboratori Nazionali del Gran Sasso, I-67100, Assergi, Italy
G. Benato, C. Bucci, A. D’Addabbo, S. Di Lorenzo, P. Gorla, P. V. Guillaumon, L. Marini, M. Olmi, C. Pagliarone & L. Pattavina
Faculty of Mathematics, Physics and Informatics, Comenius University, 84248, Bratislava, Slovakia
R. Breier, M. Ješkovský & P. Povinec
Physik-Department, Technische Universität München, D-85747, Garching, Germany
A. Erb, F. v. Feilitzsch, M. Kaznacheeva, A. Kinast, A. Langenkämper, T. Ortmann, L. Pattavina, W. Potzel, J. Rothe, S. Schönert, R. Strauss & M. Willers
Eberhard-Karls-Universität Tübingen, D-72076, Tübingen, Germany
S. Gerster, J. Jochum, L. Meyer, C. Strandhagen & I. Usherov
Department of Physics, University of Oxford, Oxford, OX1 3RH, UK
H. Kraus
LIBPhys-UC, Departamento de Fisica, Universidade de Coimbra, P3004 516, Coimbra, Portugal
A. Bento
Walther-Meißner-Institut für Tieftemperaturforschung, D-85748, Garching, Germany
A. Erb
GSSI-Gran Sasso Science Institute, I-67100, L’Aquila, Italy
L. Marini
Dipartimento di Ingegneria Civile e Meccanica, Universitá degli Studi di Cassino e del Lazio Meridionale, I-03043, Cassino, Italy
C. Pagliarone

Authors

G. Angloher
View author publications
You can also search for this author in PubMed Google Scholar
S. Banik
View author publications
You can also search for this author in PubMed Google Scholar
D. Bartolot
View author publications
You can also search for this author in PubMed Google Scholar
G. Benato
View author publications
You can also search for this author in PubMed Google Scholar
A. Bento
View author publications
You can also search for this author in PubMed Google Scholar
A. Bertolini
View author publications
You can also search for this author in PubMed Google Scholar
R. Breier
View author publications
You can also search for this author in PubMed Google Scholar
C. Bucci
View author publications
You can also search for this author in PubMed Google Scholar
J. Burkhart
View author publications
You can also search for this author in PubMed Google Scholar
L. Canonica
View author publications
You can also search for this author in PubMed Google Scholar
A. D’Addabbo
View author publications
You can also search for this author in PubMed Google Scholar
S. Di Lorenzo
View author publications
You can also search for this author in PubMed Google Scholar
L. Einfalt
View author publications
You can also search for this author in PubMed Google Scholar
A. Erb
View author publications
You can also search for this author in PubMed Google Scholar
F. v. Feilitzsch
View author publications
You can also search for this author in PubMed Google Scholar
N. Ferreiro Iachellini
View author publications
You can also search for this author in PubMed Google Scholar
S. Fichtinger
View author publications
You can also search for this author in PubMed Google Scholar
D. Fuchs
View author publications
You can also search for this author in PubMed Google Scholar
A. Fuss
View author publications
You can also search for this author in PubMed Google Scholar
A. Garai
View author publications
You can also search for this author in PubMed Google Scholar
V. M. Ghete
View author publications
You can also search for this author in PubMed Google Scholar
S. Gerster
View author publications
You can also search for this author in PubMed Google Scholar
P. Gorla
View author publications
You can also search for this author in PubMed Google Scholar
P. V. Guillaumon
View author publications
You can also search for this author in PubMed Google Scholar
S. Gupta
View author publications
You can also search for this author in PubMed Google Scholar
D. Hauff
View author publications
You can also search for this author in PubMed Google Scholar
M. Ješkovský
View author publications
You can also search for this author in PubMed Google Scholar
J. Jochum
View author publications
You can also search for this author in PubMed Google Scholar
M. Kaznacheeva
View author publications
You can also search for this author in PubMed Google Scholar
A. Kinast
View author publications
You can also search for this author in PubMed Google Scholar
H. Kluck
View author publications
You can also search for this author in PubMed Google Scholar
H. Kraus
View author publications
You can also search for this author in PubMed Google Scholar
M. Lackner
View author publications
You can also search for this author in PubMed Google Scholar
A. Langenkämper
View author publications
You can also search for this author in PubMed Google Scholar
M. Mancuso
View author publications
You can also search for this author in PubMed Google Scholar
L. Marini
View author publications
You can also search for this author in PubMed Google Scholar
L. Meyer
View author publications
You can also search for this author in PubMed Google Scholar
V. Mokina
View author publications
You can also search for this author in PubMed Google Scholar
A. Nilima
View author publications
You can also search for this author in PubMed Google Scholar
M. Olmi
View author publications
You can also search for this author in PubMed Google Scholar
T. Ortmann
View author publications
You can also search for this author in PubMed Google Scholar
C. Pagliarone
View author publications
You can also search for this author in PubMed Google Scholar
L. Pattavina
View author publications
You can also search for this author in PubMed Google Scholar
F. Petricca
View author publications
You can also search for this author in PubMed Google Scholar
W. Potzel
View author publications
You can also search for this author in PubMed Google Scholar
P. Povinec
View author publications
You can also search for this author in PubMed Google Scholar
F. Pröbst
View author publications
You can also search for this author in PubMed Google Scholar
F. Pucci
View author publications
You can also search for this author in PubMed Google Scholar
F. Reindl
View author publications
You can also search for this author in PubMed Google Scholar
D. Rizvanovic
View author publications
You can also search for this author in PubMed Google Scholar
J. Rothe
View author publications
You can also search for this author in PubMed Google Scholar
K. Schäffner
View author publications
You can also search for this author in PubMed Google Scholar
J. Schieck
View author publications
You can also search for this author in PubMed Google Scholar
D. Schmiedmayer
View author publications
You can also search for this author in PubMed Google Scholar
S. Schönert
View author publications
You can also search for this author in PubMed Google Scholar
C. Schwertner
View author publications
You can also search for this author in PubMed Google Scholar
M. Stahlberg
View author publications
You can also search for this author in PubMed Google Scholar
L. Stodolsky
View author publications
You can also search for this author in PubMed Google Scholar
C. Strandhagen
View author publications
You can also search for this author in PubMed Google Scholar
R. Strauss
View author publications
You can also search for this author in PubMed Google Scholar
I. Usherov
View author publications
You can also search for this author in PubMed Google Scholar
F. Wagner
View author publications
You can also search for this author in PubMed Google Scholar
M. Willers
View author publications
You can also search for this author in PubMed Google Scholar
V. Zema
View author publications
You can also search for this author in PubMed Google Scholar
W. Waltenberger
View author publications
You can also search for this author in PubMed Google Scholar

Consortia

CRESST Collaboration

Corresponding author

Correspondence to F. Wagner.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Appendices

Appendix

A Model architectures and learning rates

The names of our tested architectures are chosen arbitrarily. The architecture of the CNN, LSTM and TSCN are is listed in Tables 3, 4 and 5 respectively. The hyperparameters for the TST are listed in Table 6, for a detailed list of the architecture we refer to Ref. [23] and [26]. The used learning rates are summarized in Table 7.

Table 3 The details of the CNN architecture

Full size table

Table 4 The details of the LSTM architecture

Full size table

Table 5 The details of the TSCN architecture

Full size table

Table 6 The hyperparameters of the TST architecture

Full size table

Table 7 The learning rates used for the training process of the models

Full size table

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Angloher, G., Banik, S., Bartolot, D. et al. Towards an automated data cleaning with deep learning in CRESST. Eur. Phys. J. Plus 138, 100 (2023). https://doi.org/10.1140/epjp/s13360-023-03674-2

Download citation

Received: 01 November 2022
Accepted: 02 January 2023
Published: 30 January 2023
DOI: https://doi.org/10.1140/epjp/s13360-023-03674-2

Towards an automated data cleaning with deep learning in CRESST

Abstract

Similar content being viewed by others

Detector Monitoring with Artificial Neural Networks at the CMS Experiment at the CERN Large Hadron Collider

Use of neural networks to analyze pulse shape data in low-background detectors

Autoencoder-Based Anomaly Detection System for Online Data Quality Monitoring of the CMS Electromagnetic Calorimeter

1 Introduction

2 Used data and pre-processing

3 Models and training process