-
The large-scale structure around the Fornax-Eridanus Complex
Authors:
Maria Angela Raj,
Petra Awad,
Reynier F. Peletier,
Rory Smith,
Ulrike Kuchner,
Rien van de Weygaert,
Noam I. Libeskind,
Marco Canducci,
Peter Tino,
Kerstin Bunte
Abstract:
Our objectives are to map the filamentary network around the Fornax-Eridanus Complex and probe the influence of the local environment on galaxy morphology. We employ the novel machine-learning tool 1-DREAM (1-Dimensional Recovery, Extraction, and Analysis of Manifolds) to detect and model filaments around the Fornax cluster. We then use the morphology-density relation of galaxies to examine how galaxy morphology varies with distance from the central axis (spine) of the detected filaments. We detect 27 filaments that vary in length and galaxy-number density around the Fornax-Eridanus Complex. These filaments showcase a variety of environments: some filaments encompass groups/clusters, while others are inhabited only by galaxies in pristine filamentary environments. We also recover a well-known structure, the Fornax Wall, which passes through the Dorado group, the Fornax cluster, and the Eridanus supergroup. Regarding galaxy morphology, we find that early-type galaxies (ETGs) populate high-density filaments and high-density regions of the Fornax Wall. Furthermore, the fraction of ETGs decreases as the distance to the filament spine increases. Of the total galaxy population in filaments, ~7% are ETGs and ~24% are late-type galaxies (LTGs) located in pristine filamentary environments, while ~27% are ETGs and ~42% are LTGs in groups/clusters within filaments. This study reveals the Cosmic Web around the Fornax cluster and asserts that filamentary environments are heterogeneous in nature. When investigating the role of the environment on galaxy morphology, it is essential to consider both the local number density and a galaxy's proximity to the filament spine. Within this framework, we ascribe the observed morphological segregation in the Fornax Wall to pre-processing of galaxies within groups embedded in it.
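The trend of ETG fraction with distance to the filament spine can be illustrated with a minimal sketch. It assumes each filament spine is available as an ordered 3D polyline (a stand-in for the filament models 1-DREAM produces) and that every galaxy carries a boolean early-type flag; the function names, bin edges and toy data below are hypothetical and not taken from the paper.

```python
import numpy as np

def distance_to_polyline(points, spine):
    """Shortest Euclidean distance from each point to a polyline spine.

    points: (N, D) galaxy positions; spine: (M, D) ordered vertices, M >= 2.
    """
    a, b = spine[:-1], spine[1:]                    # segment start/end points
    ab = b - a
    ap = points[:, None, :] - a[None, :, :]         # (N, M-1, D)
    # Projection parameter onto each segment, clipped to stay on the segment
    t = np.einsum("nmd,md->nm", ap, ab) / np.einsum("md,md->m", ab, ab)
    t = np.clip(t, 0.0, 1.0)
    closest = a[None, :, :] + t[..., None] * ab[None, :, :]
    return np.linalg.norm(points[:, None, :] - closest, axis=-1).min(axis=1)

def etg_fraction_by_distance(dist, is_etg, edges):
    """Fraction of early-type galaxies per distance bin (NaN for empty bins)."""
    idx = np.digitize(dist, edges) - 1
    frac = np.full(len(edges) - 1, np.nan)
    for k in range(len(edges) - 1):
        in_bin = idx == k
        if in_bin.any():
            frac[k] = is_etg[in_bin].mean()
    return frac

# Hypothetical toy example: a straight spine along the x-axis
spine = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
galaxies = np.array([[2.0, 0.5, 0.0], [5.0, 0.0, 1.0], [8.0, 3.0, 0.0], [4.0, 0.0, 4.5]])
is_etg = np.array([True, True, False, False])
dist = distance_to_polyline(galaxies, spine)                           # [0.5, 1.0, 3.0, 4.5]
frac = etg_fraction_by_distance(dist, is_etg, np.array([0.0, 2.0, 5.0]))  # [1.0, 0.0]
```

Using the distance to the nearest segment, rather than to the nearest vertex, keeps the result independent of how densely the spine is sampled.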
Submitted 3 July, 2024;
originally announced July 2024.
-
Swarming in stellar streams: Unveiling the structure of the Jhelum stream with ant colony-inspired computation
Authors:
Petra Awad,
Marco Canducci,
Eduardo Balbinot,
Akshara Viswanathan,
Hanneke C. Woudenberg,
Orlin Koop,
Reynier Peletier,
Peter Tino,
Else Starkenburg,
Rory Smith,
Kerstin Bunte
Abstract:
The halo of the Milky Way galaxy hosts multiple dynamically coherent substructures known as stellar streams, which are remnants of tidally disrupted systems such as globular clusters (GCs) and dwarf galaxies (DGs). A particular case is the Jhelum stream, which is known for its complex morphology. Using the available data from Gaia DR3, we extracted a region of the sky that contains Jhelum. We then applied the novel Locally Aligned Ant Technique (LAAT) to the position and proper-motion space of stars in the selected region to highlight stars that are closely aligned with a local manifold in the data and stars belonging to regions of high local density. We find that the overdensity representing the stream in proper-motion space is composed of two components, and we show the correspondence of these two signals to the previously reported narrow and broad spatial components of Jhelum. We used the radial velocity measurements provided by the $S^5$ survey to confirm, for the first time, a separation between the two components in radial velocity. We show that the narrow and broad components have velocity dispersions of $4.84^{+1.23}_{-0.79}$ km/s and $19.49^{+2.19}_{-1.84}$ km/s, and metallicity dispersions of $0.15^{+0.18}_{-0.10}$ and $0.34^{+0.13}_{-0.09}$, respectively. These measurements, and the difference in component widths, could be explained by a scenario in which Jhelum is the remnant of a GC embedded within a DG, the two having been accreted onto the Milky Way together during their infall. Although the properties of Jhelum can be explained with this merger scenario, other progenitors of the narrow component, such as a nuclear star cluster or a DG, remain possible. Ruling these possibilities out would require more observational data on member stars of the stream. Our analysis highlights the importance of the internal structure of streams for understanding their formation history.
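The abstract does not state how the velocity dispersions were estimated (the asymmetric uncertainties suggest a posterior-based fit). As background only, a common way to separate intrinsic dispersion from per-star measurement errors is to maximise a Gaussian likelihood whose variance is $\sigma^2 + \epsilon_i^2$. The sketch below does this with a simple grid search; the function names, grid and synthetic data are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def neg_log_like(mu, sigma, v, err):
    """Negative log-likelihood of a Gaussian with intrinsic dispersion sigma,
    convolved with per-star measurement errors err."""
    s2 = sigma**2 + err**2
    return 0.5 * np.sum(np.log(2 * np.pi * s2) + (v - mu) ** 2 / s2)

def fit_dispersion(v, err, sigma_grid):
    """Grid-search maximum-likelihood estimate of (mu, sigma).

    For a fixed sigma, the optimal mean is the inverse-variance weighted mean.
    """
    best = (np.inf, None, None)
    for sigma in sigma_grid:
        w = 1.0 / (sigma**2 + err**2)
        mu = np.sum(w * v) / np.sum(w)
        nll = neg_log_like(mu, sigma, v, err)
        if nll < best[0]:
            best = (nll, mu, sigma)
    return best[1], best[2]

# Synthetic stars: true mean 50 km/s, true intrinsic dispersion 5 km/s
rng = np.random.default_rng(0)
n = 2000
err = rng.uniform(0.5, 3.0, n)                          # hypothetical RV errors (km/s)
v = 50.0 + rng.normal(0.0, 5.0, n) + rng.normal(0.0, err)
mu_hat, sigma_hat = fit_dispersion(v, err, np.linspace(0.1, 20.0, 400))
```

Subtracting the errors inside the likelihood, rather than from the raw scatter, matters most for a cold component such as the narrow one, where the intrinsic dispersion is comparable to typical measurement errors.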
Submitted 19 December, 2023;
originally announced December 2023.
-
Range-Only Bearing Estimator for Localization and Mapping
Authors:
Matteo Marcantoni,
Bayu Jayawardhana,
Kerstin Bunte
Abstract:
Navigation and exploration within unknown environments are typical applications of simultaneous localization and mapping (SLAM) algorithms. When mobile agents deploy only range sensors without bearing information, they must estimate the bearing from online distance measurements for localization and mapping. In this paper, we propose a scalable dynamic bearing estimator that obtains, in real time, the relative bearing of static landmarks in the local coordinate frame of a moving agent. Using contraction theory, we provide a convergence analysis of the proposed range-only bearing estimator and present upper and lower bounds for the estimator gain. Numerical simulations demonstrate the effectiveness of the proposed method.
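Why range measurements carry bearing information at all can be seen from a static law-of-cosines argument; this is not the paper's contraction-based dynamic estimator. Assume the agent translates without rotating. If the landmark lies at $r_0 u$ from the start position ($u$ a unit bearing) and the agent moves by a known displacement $d_k$, the new range satisfies $r_k^2 = r_0^2 - 2 r_0\, u \cdot d_k + |d_k|^2$, which is linear in $u$. Two non-collinear displacements therefore pin down a 2D bearing. The function name is hypothetical.

```python
import numpy as np

def bearing_from_ranges(r0, displacements, ranges):
    """Recover the 2D unit bearing u of a static landmark seen from the start position.

    r0:            range to the landmark at the start position.
    displacements: (K, 2) agent displacements relative to the start, K >= 2,
                   not all collinear, expressed in a fixed (non-rotating) frame.
    ranges:        (K,) ranges measured after each displacement.
    """
    D = np.asarray(displacements, dtype=float)
    r = np.asarray(ranges, dtype=float)
    # Law of cosines rearranged: u . d_k = (r0^2 + |d_k|^2 - r_k^2) / (2 r0)
    rhs = (r0**2 + np.sum(D**2, axis=1) - r**2) / (2.0 * r0)
    u, *_ = np.linalg.lstsq(D, rhs, rcond=None)
    return u / np.linalg.norm(u)   # renormalise to absorb measurement noise

# Landmark at (3, 4) relative to the start: r0 = 5
u = bearing_from_ranges(5.0, [[1.0, 0.0], [0.0, 1.0]], [np.sqrt(20.0), np.sqrt(18.0)])
# u is (0.6, 0.8)
```

The least-squares solve also accepts more than two displacements, which averages out range noise; a dynamic estimator such as the one proposed instead updates the bearing continuously as the agent moves and its frame rotates.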
Submitted 17 April, 2023;
originally announced April 2023.
-
Swarm Intelligence-based Extraction and Manifold Crawling Along the Large-Scale Structure
Authors:
Petra Awad,
Reynier Peletier,
Marco Canducci,
Rory Smith,
Abolfazl Taghribi,
Mohammad Mohammadi,
Jihye Shin,
Peter Tino,
Kerstin Bunte
Abstract:
The distribution of galaxies and clusters of galaxies on megaparsec scales in the Universe follows an intricate pattern now famously known as the Large-Scale Structure or the Cosmic Web. To study the environments of this network, several techniques have been developed that describe its properties and the properties of groups of galaxies as a function of their environment. In this work, we analyze the previously introduced framework 1-Dimensional Recovery, Extraction, and Analysis of Manifolds (1-DREAM) on N-body cosmological simulation data of the Cosmic Web. The 1-DREAM toolbox consists of five machine learning methods whose aim is the extraction and modelling of 1-dimensional structures in astronomical big data settings. We show that 1-DREAM can be used to extract structures of different density ranges within the Cosmic Web and to create probabilistic models of them. For demonstration, we construct a probabilistic model of an extracted filament and move through the structure to measure properties such as local density and velocity. We also compare our toolbox with a collection of methodologies that trace the Cosmic Web. We show that 1-DREAM is able to split the network into its various environments, with results comparable to state-of-the-art methodologies. A detailed comparison is then made with the public code DisPerSE, in which we find that 1-DREAM is robust against changes in sample size, making it suitable for analyzing sparse observational data and for finding faint and diffuse manifolds in low-density regions.
Submitted 7 February, 2023;
originally announced February 2023.
-
An Industry 4.0 example: real-time quality control for steel-based mass production using Machine Learning on non-invasive sensor data
Authors:
Michiel Straat,
Kevin Koster,
Nick Goet,
Kerstin Bunte
Abstract:
Insufficient steel quality in mass production can cause extremely costly damage to tooling, production downtime and low-quality products. Automatic, fast and cheap strategies to estimate essential material properties for quality control, risk mitigation and fault prediction are highly desirable. In this work, we analyse a high-throughput production line of steel-based products. Currently, material quality is checked using manual destructive testing, which is slow, wasteful and covers only a tiny fraction of the material. To achieve complete testing coverage, our industrial collaborator developed a contactless, non-invasive, electromagnetic sensor to measure all material during production in real time. Our contribution is three-fold: 1) We show in a controlled experiment that the sensor can distinguish steel with deliberately altered properties. 2) 48 steel coils were fully measured non-invasively, and additional destructive tests were conducted on samples to serve as ground truth. A linear model is fitted to predict, from the non-invasive measurements, two key material properties (yield strength and tensile strength) that are normally obtained by destructive tests. The performance is evaluated in leave-one-coil-out cross-validation. 3) The resulting model is used to analyse the material properties and their relationship with logged product faults on real production data of ~108 km of processed material measured with the non-invasive sensor. The model achieves excellent performance (F3-score of 0.95) in predicting material whose tensile strength is out of specification. Combining model predictions with logged product faults shows that if a significant percentage of estimated yield stress values is out of specification, the risk of product faults is high. Our analysis demonstrates promising directions for real-time quality control, risk monitoring and fault detection.
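The two evaluation ingredients named in the abstract can be sketched briefly. The F-beta score with $\beta = 3$ weights recall nine times as heavily as precision, $F_\beta = (1+\beta^2)PR/(\beta^2 P + R)$, which suits a setting where missing out-of-specification material is costlier than a false alarm. Leave-one-coil-out cross-validation holds out all samples of one coil at a time. The abstract does not specify the features or fitting procedure, so ordinary least squares with an intercept is an assumption here, as are the function names.

```python
import numpy as np

def f_beta(y_true, y_pred, beta=3.0):
    """F-beta score for binary labels (positive = out of specification)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta**2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def leave_one_group_out_predictions(X, y, groups):
    """Held-out predictions of an OLS model with intercept.

    Each group (e.g. one steel coil) is held out in turn; the model is fitted
    on all remaining groups and used to predict the held-out group.
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    pred = np.empty(len(y), dtype=float)
    for g in np.unique(groups):
        test = groups == g
        coef, *_ = np.linalg.lstsq(X1[~test], y[~test], rcond=None)
        pred[test] = X1[test] @ coef
    return pred

# Precision 0.6, recall 0.75 -> F3 = 4.5 / 6.15, about 0.732
score = f_beta([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 1, 1], beta=3.0)
```

Grouping by coil keeps measurements from the same coil out of both training and test folds at once, so the cross-validated error reflects generalisation to unseen coils rather than to unseen positions within a known coil.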
Submitted 12 June, 2022;
originally announced June 2022.
-
Interpretable Models Capable of Handling Systematic Missingness in Imbalanced Classes and Heterogeneous Datasets
Authors:
Sreejita Ghosh,
Elizabeth S. Baranowski,
Michael Biehl,
Wiebke Arlt,
Peter Tino,
Kerstin Bunte
Abstract:
Applying interpretable machine learning techniques to medical datasets facilitates early and fast diagnoses and provides deeper insight into the data. Furthermore, the transparency of these models increases trust among application domain experts. Medical datasets face common issues such as heterogeneous measurements, imbalanced classes with limited sample sizes, and missing data, which hinder the straightforward application of machine learning techniques. In this paper, we present a family of prototype-based (PB) interpretable models capable of handling these issues. The models introduced in this contribution show comparable or superior performance to alternative techniques applicable in such situations. However, unlike ensemble-based models, which have to compromise on easy interpretation, the PB models presented here do not. Moreover, we propose a strategy for harnessing the power of ensembles while maintaining the intrinsic interpretability of the PB models, by averaging the model parameter manifolds. All the models were evaluated on a synthetic, publicly available dataset, in addition to detailed analyses of two real-world medical datasets (one of which is publicly available). The results indicate that the introduced models and strategies address the challenges of real-world medical data while remaining computationally inexpensive and transparent, and perform similarly to or better than their alternatives.
Submitted 4 June, 2022;
originally announced June 2022.
-
Secure Formation Control via Edge Computing Enabled by Fully Homomorphic Encryption and Mixed Uniform-Logarithmic Quantization
Authors:
Matteo Marcantoni,
Bayu Jayawardhana,
Mariano Perez Chaher,
Kerstin Bunte
Abstract:
Recent developments in communication technologies, such as 5G, together with innovative computing paradigms, such as edge computing, provide further possibilities for implementing real-time networked control systems. However, privacy and cyber-security concerns arise when private data are shared between sensors, agents and a third-party computing facility. In this paper, a secure version of distributed formation control is presented, analyzed and simulated, in which a gradient-based formation control law is implemented at the edge, with sensor and actuator information secured by a fully homomorphic encryption method based on learning with errors (FHE-LWE) combined with a proposed mixed uniform-logarithmic quantizer (MULQ). The novel quantizer is shown to be suitable for realizing secure control systems with FHE-LWE: the critical real-time information can be quantized into a prescribed bounded plaintext space while satisfying a sector bound condition whose lower and upper bounds can be made sufficiently close to the identity. An absolute stability analysis is presented that shows the asymptotic stability of the closed-loop secure control system.
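The abstract does not give the MULQ construction. As background for the sector bound condition it mentions, the sketch below implements the classical logarithmic quantizer (not the proposed MULQ): levels $\pm u_0 \rho^i$ with $0 < \rho < 1$ and cells chosen so that $|q(x) - x| \le \delta |x|$ with $\delta = (1-\rho)/(1+\rho)$. As $\rho \to 1$ the sector tightens towards the identity, but the levels are infinitely many and accumulate near zero, so this quantizer alone cannot map values into a bounded, finite plaintext space. The function name and parameter values are illustrative.

```python
import numpy as np

def log_quantize(x, rho=0.8, u0=1.0):
    """Classical logarithmic quantizer with levels +/- u0 * rho**i (i integer), q(0) = 0.

    Each |x| > 0 maps to the level u_i whose cell is
    (u_i / (1 + delta), u_i / (1 - delta)], with delta = (1 - rho) / (1 + rho),
    which guarantees the sector bound |q(x) - x| <= delta * |x|.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    q = np.zeros_like(x)
    nz = x != 0
    a = np.abs(x[nz])
    c = u0 * (1 + rho) / 2.0          # cell of level u0 * rho**i is (c * rho**i, c * rho**(i-1)]
    t = np.log(a / c) / np.log(rho)
    i = np.floor(t) + 1               # index of the level whose cell contains a
    q[nz] = np.sign(x[nz]) * u0 * rho**i
    return q

rho = 0.8
delta = (1 - rho) / (1 + rho)         # sector half-width, here 1/9
x = np.linspace(-50.0, 50.0, 10001)
q = log_quantize(x, rho)
```

Because the quantization error is proportional to $|x|$, the closed loop can be analysed as a linear system with a sector-bounded nonlinearity, which is the kind of setting in which absolute stability results apply.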
Submitted 13 April, 2022;
originally announced April 2022.
-
Detection of extragalactic Ultra-Compact Dwarfs and Globular Clusters using Explainable AI techniques
Authors:
Mohammad Mohammadi,
Jarvin Mutatiina,
Teymoor Saifollahi,
Kerstin Bunte
Abstract:
Compact stellar systems such as ultra-compact dwarfs (UCDs) and globular clusters (GCs) around galaxies are known to trace the merger events that formed these galaxies. Identifying such systems therefore allows us to study galaxies' mass assembly, formation and evolution. However, in the absence of spectroscopic information, detecting UCDs/GCs using imaging data is very uncertain. Here, we aim to train a machine learning model to separate these objects from foreground stars and background galaxies using multi-wavelength imaging data of the Fornax galaxy cluster in six filters, namely u, g, r, i, J and Ks. The classes of objects are highly imbalanced, which is problematic for many automatic classification techniques. Hence, we employ the Synthetic Minority Over-sampling Technique (SMOTE) to handle the imbalance of the training data. We then compare two classifiers, namely Localized Generalized Matrix Learning Vector Quantization (LGMLVQ) and Random Forest (RF). Both methods identify UCDs/GCs with a precision and recall above 93 percent and provide relevances that reflect the importance of each feature dimension for the classification. Both methods identify angular sizes as important markers for this classification problem. While astronomers expect the u-i and i-Ks color indices to be the most important colors, our analysis shows that colors such as g-r are more informative, potentially because of their higher signal-to-noise ratio. Besides its excellent performance, the LGMLVQ method allows further interpretability by providing the feature importance for each individual class, class-wise representative samples and the possibility of non-linear visualization of the data, as demonstrated in this contribution. We conclude that employing machine learning techniques to identify UCDs/GCs can lead to promising results.
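SMOTE rebalances a training set by creating synthetic minority samples on line segments between a minority sample and one of its nearest minority-class neighbours. The sketch below shows that core idea in plain NumPy; in practice a library implementation such as `SMOTE` from imbalanced-learn (`imblearn.over_sampling`) would normally be used, and the abstract does not say which implementation or neighbour count the authors chose (`k=5` here is an assumption).

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise distances within the minority class, excluding self-matches
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                  # seed samples
    nb = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                           # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

Oversampling should be applied only to the training folds; synthesising points before splitting would leak information about test samples into training.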
Submitted 7 January, 2022; v1 submitted 5 January, 2022;
originally announced January 2022.
-
LAAT: Locally Aligned Ant Technique for discovering multiple faint low dimensional structures of varying density
Authors:
Abolfazl Taghribi,
Kerstin Bunte,
Rory Smith,
Jihye Shin,
Michele Mastropietro,
Reynier F. Peletier,
Peter Tino
Abstract:
Dimensionality reduction and clustering are often used as preliminary steps for many complex machine learning tasks. The presence of noise and outliers can deteriorate the performance of such preprocessing and therefore severely impair the subsequent analysis. In manifold learning, several studies propose solutions for removing background noise, or noise close to the structure, when the density of the structure is substantially higher than that of the noise. However, in many applications, including astronomical datasets, the density varies along manifolds that are buried in a noisy background. We propose a novel method to extract manifolds in the presence of noise, based on the idea of ant colony optimization. In contrast to existing random-walk solutions, our technique captures points that are locally aligned with the major directions of the manifold. Moreover, we empirically show that the biologically inspired formulation of ant pheromone reinforces this behavior, enabling the method to recover multiple manifolds embedded in extremely noisy data clouds. The performance of the algorithm, compared with state-of-the-art approaches for noise reduction in manifold detection and clustering, is demonstrated on several synthetic and real datasets, including an N-body simulation of a cosmological volume.
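The idea of ants preferring points that are locally aligned with the manifold can be illustrated with a deliberately simplified, hypothetical transition rule; it is not LAAT's actual update. The sketch estimates the local principal direction by PCA of a fixed-radius neighbourhood and weights each candidate step by $\tau^\alpha \eta^\beta$, the standard ant colony form, with pheromone $\tau$ and with the heuristic $\eta$ taken as the absolute cosine between the step and that direction. The radius, exponents and function names are assumptions.

```python
import numpy as np

def local_principal_direction(X, i, radius):
    """Unit leading eigenvector of the covariance of the neighbourhood of X[i]."""
    nbr = X[np.linalg.norm(X - X[i], axis=1) <= radius]
    evals, evecs = np.linalg.eigh(np.cov(nbr.T))   # eigenvalues in ascending order
    return evecs[:, -1]

def transition_probs(X, i, radius, pheromone, alpha=1.0, beta=1.0):
    """Illustrative probabilities of moving from point i to each neighbour.

    Steps aligned with the local principal direction and towards points with
    more pheromone are favoured (simplified stand-in, not the LAAT rule).
    """
    v = local_principal_direction(X, i, radius)
    dist = np.linalg.norm(X - X[i], axis=1)
    nbr = np.flatnonzero((dist <= radius) & (dist > 0))
    steps = (X[nbr] - X[i]) / dist[nbr, None]
    alignment = np.abs(steps @ v)                  # |cos| of angle to principal direction
    w = pheromone[nbr] ** alpha * alignment ** beta
    total = w.sum()
    if total == 0:                                 # all steps perpendicular: fall back to uniform
        return nbr, np.full(len(nbr), 1.0 / len(nbr))
    return nbr, w / total

# Points on a line plus one off-line point next to X[2]
X = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0], [2, 1]], dtype=float)
nbr, p = transition_probs(X, 2, radius=1.5, pheromone=np.ones(len(X)))
# nbr = [1, 3, 6]; the off-line neighbour 6 gets probability 0
```

In a full ant colony scheme, pheromone deposited along visited points would be reinforced over many walks, so that repeatedly visited, well-aligned points stand out from the background noise.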
Submitted 12 June, 2022; v1 submitted 17 September, 2020;
originally announced September 2020.
-
Visualisation and knowledge discovery from interpretable models
Authors:
Sreejita Ghosh,
Peter Tino,
Kerstin Bunte
Abstract:
An increasing number of sectors that affect human lives are using Machine Learning (ML) tools. Hence, the need to understand their working mechanisms and evaluate their fairness in decision-making is becoming paramount, ushering in the era of Explainable AI (XAI). In this contribution, we introduce a few intrinsically interpretable models that are also capable of dealing with missing values, in addition to extracting knowledge from the dataset and about the problem. These models, angle-based variants of Learning Vector Quantization, also allow visualisation of the classifier and its decision boundaries. We demonstrate the algorithms on a synthetic dataset and a real-world one (the heart disease dataset from the UCI repository). The newly developed classifiers helped in investigating the complexities of the UCI dataset as a multiclass problem. When the dataset was treated as a binary classification problem, the performance of the developed classifiers was comparable to that reported in the literature for this dataset, with the added value of interpretability.
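An angle-based prototype classifier assigns each sample the label of the prototype with the smallest angular dissimilarity, $1 - \cos\theta$. The sketch below shows only the classification step (prototype training by LVQ updates is omitted). How the models handle missing values is not described in the abstract; restricting the angle to the features observed in the sample, as done here, is an assumption, and the function names are hypothetical.

```python
import numpy as np

def angular_dissimilarity(x, w):
    """1 - cos(angle) between sample x and prototype w, using only features observed in x."""
    obs = ~np.isnan(x)
    xo, wo = x[obs], w[obs]
    denom = np.linalg.norm(xo) * np.linalg.norm(wo)
    if denom == 0:
        return 1.0                     # treat a zero vector as orthogonal
    return 1.0 - (xo @ wo) / denom

def predict(X, prototypes, proto_labels):
    """Assign each sample the label of its angularly closest prototype."""
    out = []
    for x in X:
        d = [angular_dissimilarity(x, w) for w in prototypes]
        out.append(proto_labels[int(np.argmin(d))])
    return np.array(out)

prototypes = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
labels = np.array(["a", "b"])
X = np.array([[2.0, 0.1, np.nan],     # third feature missing
              [0.2, 3.0, 1.0]])
pred = predict(X, prototypes, labels)  # ['a', 'b']
```

Because the dissimilarity depends only on direction, samples that differ by an overall scale factor are treated alike, and the prototypes remain directly inspectable as class-typical feature profiles.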
Submitted 8 May, 2020; v1 submitted 7 May, 2020;
originally announced May 2020.
-
Sparse group factor analysis for biclustering of multiple data sources
Authors:
Kerstin Bunte,
Eemeli Leppäaho,
Inka Saarinen,
Samuel Kaski
Abstract:
Motivation: Modelling methods that find structure in data are necessary given the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments. These biclustering techniques have focused on a single data source, often gene expression data. We present a Bayesian approach for joint biclustering of multiple data sources, extending a recent method, Group Factor Analysis (GFA), to have a biclustering interpretation with additional sparsity assumptions. The resulting method enables data-driven detection of linear structure present in parts of the data sources.
Results: Our simulation studies show that the proposed method reliably infers biclusters from heterogeneous data sources. We tested the method on data from the NCI-DREAM drug sensitivity prediction challenge, obtaining excellent prediction accuracy. Moreover, the predictions are based on several biclusters that provide insight into the data sources, in this case gene expression, DNA methylation, protein abundance, exome sequence, functional connectivity fingerprints and drug sensitivity.
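To make the biclustering interpretation concrete, the sketch below simulates data from a group factor model of the form $X_m = Z W_m^\top + E_m$, one loading matrix per data source. Sparsity in both the factor activations $Z$ and the loadings $W_m$ means each factor touches only some samples and some features of some sources, which is what a bicluster is. Here sparsity comes from hard random masks; the actual method places sparsity-inducing priors and performs Bayesian inference, which this sketch does not implement. All sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3                  # samples (e.g. cell lines) and latent factors
dims = [20, 15]                # feature counts of two hypothetical data sources

# Sparse factor activations: each factor is active in a subset of samples ...
Z = rng.normal(size=(n, K)) * (rng.random((n, K)) < 0.3)
# ... with sparse loadings per source; a zero column switches a factor off in a source
W = [rng.normal(size=(d, K)) * (rng.random((d, K)) < 0.3) for d in dims]
W[1][:, 2] = 0.0               # factor 2 only affects source 0

# Each source is a noisy linear function of the shared factors
X = [Z @ Wm.T + 0.1 * rng.normal(size=(n, Wm.shape[0])) for Wm in W]

def bicluster(k, m):
    """Samples and features of source m involved in factor k (non-zero activations/loadings)."""
    return np.flatnonzero(Z[:, k]), np.flatnonzero(W[m][:, k])
```

A factor whose loadings are non-zero in several sources links features across those sources (for example, expression and drug sensitivity) over the same subset of samples, which is how joint biclusters can provide cross-source insight.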
Submitted 21 April, 2016; v1 submitted 29 December, 2015;
originally announced December 2015.
-
GEO debris and interplanetary dust: fluxes and charging behavior
Authors:
Amara L. Graps,
Simon F. Green,
Neil McBride,
J. A. M. McDonnell,
Kalle Bunte,
Hakan Svedhem,
Gerhard Drolshagen
Abstract:
In September 1996, a dust/debris detector, GORID, was launched into the geostationary (GEO) region as a piggyback instrument on the Russian Express-2 telecommunications spacecraft. The instrument began normal operation in April 1997 and ended its mission in July 2002. The goal of this work was to use GORID's particle data to identify and separate space debris from interplanetary dust particles (IDPs) in GEO, to determine the instrument's measurement characteristics more finely, and to derive impact fluxes. While the physical characteristics of the GORID impacts alone are insufficient for a reliable distinction between debris and interplanetary dust, the temporal behavior of the impacts is a strong enough indicator to separate the populations based on clustering: non-cluster events are predominantly interplanetary, while cluster events are debris. The dead-time-corrected GORID mean fluxes (at impact-speed-dependent mass thresholds) are 1.35x10^{-4} m^{-2} s^{-1} for IDPs, using a mean detection rate of 0.54 d^{-1}, and 6.1x10^{-4} m^{-2} s^{-1} for space debris, using a mean detection rate of 2.5 d^{-1}. Beta-meteoroids were not detected. Clusters could be a closely packed debris cloud or a particle breaking up by electrostatic fragmentation after high charging.
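A quick consistency check on the quoted numbers: dividing each mean detection rate (converted to s^-1) by its flux gives the area that would turn one into the other. Both populations imply nearly the same value, about 0.047 m^2. This assumes the simple relation flux = rate / area and ignores the dead-time correction and any directional dependence of the sensitive area, so the result is only an effective area implied by the quoted values, not an instrument specification. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400

def implied_area(rate_per_day, flux_per_m2_s):
    """Area (m^2) that converts a detection rate into the quoted flux, assuming flux = rate / area."""
    return (rate_per_day / SECONDS_PER_DAY) / flux_per_m2_s

area_idp = implied_area(0.54, 1.35e-4)      # interplanetary dust particles
area_debris = implied_area(2.5, 6.1e-4)     # space debris
print(f"{area_idp:.4f} m^2, {area_debris:.4f} m^2")   # 0.0463 m^2, 0.0474 m^2
```

The close agreement suggests the two fluxes were derived with the same effective exposure, so their ratio (about 4.5) closely follows the ratio of the detection rates.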
Submitted 12 September, 2006;
originally announced September 2006.