Panopticon: a telescope for our times

Authors: Will Saunders, Timothy Chin, Michael Goodwin

Abstract: We present a design for a wide-field spectroscopic telescope. The only large powered mirror is spherical; the resulting spherical aberration is corrected for each target separately, giving exceptional image quality. The telescope is a transit design, but still allows all-sky coverage. Three simultaneous modes are proposed: (a) natural seeing multi-object spectroscopy with 12m aperture over 3 deg FoV with ~25,000 targets; (b) multi-object AO with 12m aperture over 3 deg FoV with ~100 AO-corrected Integral Field Units each with 4 arcsec FoV; (c) ground layer AO-corrected integral field spectroscopy with 15m aperture and 13 arcmin FoV. Such a telescope would be uniquely powerful for large-area follow-up of imaging surveys; in each mode, the AOmega and survey speed exceed all existing facilities combined. The expected cost of this design is relatively modest, much closer to $500M than $1000M.

Submitted 6 July, 2024; originally announced July 2024.

Comments: 10 pages. SPIE 13094-191, Ground-based and Airborne Telescopes X, Yokohama 2024

arXiv:2406.20025 [pdf, ps, other]

Monogamous subvarieties of the nilpotent cone

Authors: Simon M. Goodwin, Rachel Pengelly, David I. Stewart, Adam R. Thomas

Abstract: Let $G$ be a reductive algebraic group over an algebraically closed field $k$ of prime characteristic not $2$, whose Lie algebra is denoted $\mathfrak{g}$. We call a subvariety $\mathfrak{X}$ of the nilpotent cone $N \subset \mathfrak{g}$ monogamous if for every $e\in \mathfrak{X}$, the $\mathfrak{sl}_2$-triples $(e,h,f)$ with $f\in \mathfrak{X}$ are conjugate under the centraliser $C_G(e)$. Building on work by the first two authors, we show there is a unique maximal closed $G$-stable monogamous subvariety $V \subset N$ and that it is an orbit closure, hence irreducible. We show that $V$ can also be characterised in terms of Serre's $G$-complete reducibility.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 15 pages

MSC Class: 17B45 (Primary) 17B50 (Secondary)

arXiv:2406.13914 [pdf, other]

The Blue Multi Unit Spectroscopic Explorer (BlueMUSE) on the VLT: science drivers and overview of instrument design

Authors: Johan Richard, Rémi Giroud, Florence Laurent, Davor Krajnović, Alexandre Jeanneau, Roland Bacon, Manuel Abreu, Angela Adamo, Ricardo Araujo, Nicolas Bouché, Jarle Brinchmann, Zhemin Cai, Norberto Castro, Ariadna Calcines, Diane Chapuis, Adélaïde Claeyssens, Luca Cortese, Emanuele Daddi, Christopher Davison, Michael Goodwin, Robert Harris, Matthew Hayes, Mathilde Jauzac, Andreas Kelz, Jean-Paul Kneib, et al. (24 additional authors not shown)

Abstract: BlueMUSE is a blue-optimised, medium spectral resolution, panoramic integral field spectrograph under development for the Very Large Telescope (VLT). With an optimised transmission down to 350 nm, spectral resolution of R$\sim$3500 on average across the wavelength range, and a large FoV (1 arcmin$^2$), BlueMUSE will open up a new range of galactic and extragalactic science cases facilitated by its specific capabilities. The BlueMUSE consortium includes 9 institutes located in 7 countries and is led by the Centre de Recherche Astrophysique de Lyon (CRAL). The BlueMUSE project development is currently in Phase A, with an expected first light at the VLT in 2031. We introduce here the Top Level Requirements (TLRs) derived from the main science cases, and then present an overview of the BlueMUSE system and its subsystems fulfilling these TLRs. We specifically emphasize the tradeoffs that are made and the key distinctions compared to the MUSE instrument, upon which the system architecture is built.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 20 pages, 10 figures, proceedings of the SPIE astronomical telescopes and instrumentation conference, Yokohama, 16-21 June

arXiv:2405.13848 [pdf, other]

Maximum Manifold Capacity Representations in State Representation Learning

Authors: Li Meng, Morten Goodwin, Anis Yazidi, Paal Engelstad

Abstract: The expanding research on manifold-based self-supervised learning (SSL) builds on the manifold hypothesis, which suggests that the inherent complexity of high-dimensional data can be unraveled through lower-dimensional manifold embeddings. Capitalizing on this, DeepInfomax with an unbalanced atlas (DIM-UA) has emerged as a powerful tool and yielded impressive results for state representations in reinforcement learning. Meanwhile, Maximum Manifold Capacity Representation (MMCR) presents a new frontier for SSL by optimizing class separability via manifold compression. However, MMCR demands extensive input views, resulting in significant computational costs and protracted pre-training durations. Bridging this gap, we present an innovative integration of MMCR into existing SSL methods, incorporating a discerning regularization strategy that enhances the lower bound of mutual information. We also propose a novel state representation learning method extending DIM-UA, embedding a nuclear norm loss to enforce manifold consistency robustly. On experimentation with the Atari Annotated RAM Interface, our method improves DIM-UA significantly with the same number of target encoding dimensions. The mean F1 score averaged over categories is 78% compared to 75% of DIM-UA. There are also compelling gains when implementing SimCLR and Barlow Twins. This supports our SSL innovation as a paradigm shift, enabling more nuanced high-dimensional data representations.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2403.17165 [pdf, other]

Building an Open-Source Community to Enhance Autonomic Nervous System Signal Analysis: DBDP-Autonomic

Authors: Jessilyn Dunn, Varun Mishra, Md Mobashir Hasan Shandhi, Hayoung Jeong, Natasha Yamane, Yuna Watanabe, Bill Chen, Matthew S. Goodwin

Abstract: Smartphones and wearable sensors offer an unprecedented ability to collect peripheral psychophysiological signals across diverse timescales, settings, populations, and modalities. However, open-source software development has yet to keep pace with rapid advancements in hardware technology and availability, creating an analytical barrier that limits the scientific usefulness of acquired data. We propose a community-driven, open-source peripheral psychophysiological signal pre-processing and analysis software framework that could advance biobehavioral health by enabling more robust, transparent, and reproducible inferences involving autonomic nervous system data.

Submitted 29 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2402.06877 [pdf, other]

doi 10.1093/mnras/stae458

The SAMI Galaxy Survey: galaxy spin is more strongly correlated with stellar population age than mass or environment

Authors: S. M. Croom, J. van de Sande, S. P. Vaughan, T. H. Rutherford, C. P. Lagos, S. Barsanti, J. Bland-Hawthorn, S. Brough, J. J. Bryant, M. Colless, L. Cortese, F. D'Eugenio, A. Fraser-McKelvie, M. Goodwin, N. P. F. Lorente, S. N. Richards, A. Ristea, S. M. Sweet, S. K. Yi, T. Zafar

Abstract: We use the SAMI Galaxy Survey to examine the drivers of galaxy spin, $λ_{R_e}$, in a multi-dimensional parameter space including stellar mass, stellar population age (or specific star formation rate) and various environmental metrics (local density, halo mass, satellite vs. central). Using a partial correlation analysis we consistently find that age or specific star formation rate is the primary parameter correlating with spin. Light-weighted age and specific star formation rate are more strongly correlated with spin than mass-weighted age. In fact, across our sample, once the relation between light-weighted age and spin is accounted for, there is no significant residual correlation between spin and mass, or spin and environment. This result is strongly suggestive that present-day environment only indirectly influences spin, via the removal of gas and star formation quenching. That is, environment affects age, then age affects spin. Older galaxies then have lower spin, either due to stars being born dynamically hotter at high redshift, or due to secular heating. Our results appear to rule out environmentally dependent dynamical heating (e.g. galaxy-galaxy interactions) being important, at least within $1R_e$ where our kinematic measurements are made. The picture is more complex when we only consider high-mass galaxies ($M_*\gtrsim 10^{11}$M$_{\odot}$). While the age-spin relation is still strong for these high-mass galaxies, there is a residual environmental trend with central galaxies preferentially having lower spin, compared to satellites of the same age and mass. We argue that this trend is likely due to central galaxies being a preferred location for mergers.

Submitted 9 February, 2024; originally announced February 2024.

Comments: 24 pages, 9 figures. Accepted for publication in MNRAS

Journal ref: MNRAS, Volume 529, Issue 4, April 2024, Pages 3446-3468

arXiv:2402.06683 [pdf, other]

Sound Source Separation Using Latent Variational Block-Wise Disentanglement

Authors: Karim Helwani, Masahito Togami, Paris Smaragdis, Michael M. Goodwin

Abstract: While neural network approaches have made significant strides in resolving classical signal processing problems, it is often the case that hybrid approaches that draw insight from both signal processing and neural networks produce more complete solutions. In this paper, we present a hybrid classical digital signal processing/deep neural network (DSP/DNN) approach to source separation (SS) highlighting the theoretical link between variational autoencoders and classical approaches to SS. We propose a system that transforms the single channel under-determined SS task to an equivalent multichannel over-determined SS problem in a properly designed latent space. The separation task in the latent space is treated as finding a variational block-wise disentangled representation of the mixture. We show empirically that the design choices and the variational formulation of the task at hand, motivated by classical signal processing theoretical results, lead to robustness to unseen out-of-distribution data and a reduction of the overfitting risk. To address the resulting permutation issue we explicitly incorporate a novel differentiable permutation loss function and augment the model with a memory mechanism to keep track of the statistics of the individual sources.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.03676 [pdf, other]

The SAMI galaxy survey: predicting kinematic morphology with logistic regression

Authors: Sam P. Vaughan, Jesse van de Sande, A. Fraser-McKelvie, Scott Croom, Richard McDermid, Benoit Liquet-Weiland, Stefania Barsanti, Luca Cortese, Sarah Brough, Sarah Sweet, Julia J. Bryant, Michael Goodwin, Jon Lawrence

Abstract: We use the SAMI galaxy survey to study the kinematic morphology-density relation: the observation that the fraction of slow rotator galaxies increases towards dense environments. We build a logistic regression model to quantitatively study the dependence of kinematic morphology (whether a galaxy is a fast rotator or slow rotator) on a wide range of parameters, without resorting to binning the data. Our model uses a combination of stellar mass, star-formation rate (SFR), $r$-band half-light radius and a binary variable based on whether the galaxy's observed ellipticity ($ε$) is less than 0.4. We show that, at fixed mass, size, SFR and $ε$, a galaxy's local environmental surface density ($\log_{10}(Σ_5/\mathrm{Mpc}^{-2})$) gives no further information about whether a galaxy is a slow rotator, i.e. the observed kinematic morphology-density relation can be entirely explained by the well-known correlations between environment and other quantities. We show how our model can be applied to different galaxy surveys to predict the fraction of slow rotators which would be observed and discuss its implications for the formation pathways of slow rotators.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 12 pages, 6 figures. Accepted for publication in MNRAS

arXiv:2402.02728 [pdf, other]

doi 10.1093/mnras/stae398

The SAMI Galaxy Survey: Using Tidal Streams and Shells to Trace the Dynamical Evolution of Massive Galaxies

Authors: Tomas H. Rutherford, Jesse van de Sande, Scott M. Croom, Lucas M. Valenzuela, Rhea-Silvia Remus, Francesco D'Eugenio, Sam P. Vaughan, Henry R. M. Zovaro, Sarah Casura, Stefania Barsanti, Joss Bland-Hawthorn, Sarah Brough, Julia J. Bryant, Michael Goodwin, Nuria Lorente, Sree Oh, Andrei Ristea

Abstract: Slow rotator galaxies are distinct amongst galaxy populations, with simulations suggesting that a mix of minor and major mergers are responsible for their formation. A promising path to resolve outstanding questions on the type of merger responsible, is by investigating deep imaging of massive galaxies for signs of potential merger remnants. We utilise deep imaging from the Subaru-Hyper Suprime Cam Wide data to search for tidal features in massive ($\log_{10}(M_*/M_{\odot}) > 10$) early-type galaxies (ETGs) in the SAMI Galaxy Survey. We perform a visual check for tidal features on images where the galaxy has been subtracted using a Multi-Gauss Expansion (MGE) model. We find that $31\pm 2$ percent of our sample show tidal features. When comparing galaxies with and without features, we find that the distributions in stellar mass, light-weighted mean stellar population age and H$α$ equivalent width are significantly different, whereas spin ($λ_{R_e}$), ellipticity and bulge to total ratio have similar distributions. When splitting our sample in age, we find that galaxies below the median age (10.8 Gyr) show a correlation between the presence of shells and lower $λ_{R_e}$, as expected from simulations. We also find these younger galaxies which are classified as having "strong" shells have lower $λ_{R_e}$. However, simulations suggest that merger features become undetectable within $\sim 2-4$ Gyr post-merger. This implies that the relationship between tidal features and merger history disappears for galaxies with older stellar ages, i.e. those that are more likely to have merged long ago.

Submitted 5 February, 2024; originally announced February 2024.

Comments: Accepted for publication in MNRAS. 22 pages, 14 figures

arXiv:2402.00534 [pdf, other]

A Manifold Representation of the Key in Vision Transformers

Authors: Li Meng, Morten Goodwin, Anis Yazidi, Paal Engelstad

Abstract: Vision Transformers implement multi-head self-attention via stacking multiple attention blocks. The query, key, and value are often intertwined and generated within those blocks via a single, shared linear transformation. This paper explores the concept of disentangling the key from the query and value, and adopting a manifold representation for the key. Our experiments reveal that decoupling and endowing the key with a manifold structure can enhance the model's performance. Specifically, ViT-B exhibits a 0.87% increase in top-1 accuracy, while Swin-T sees a boost of 0.52% in top-1 accuracy on the ImageNet-1K dataset, with eight charts in the manifold key. Our approach also yields positive results in object detection and instance segmentation tasks on the COCO dataset. We establish that these performance gains are not merely due to the simplicity of adding more parameters and computations. Future research may investigate strategies for cutting the budget of such representations and aim for further performance improvements based on our findings.

Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2402.00337 [pdf, other]

Real-time Stereo Speech Enhancement with Spatial-Cue Preservation based on Dual-Path Structure

Authors: Masahito Togami, Jean-Marc Valin, Karim Helwani, Ritwik Giri, Umut Isik, Michael M. Goodwin

Abstract: We introduce a real-time, multichannel speech enhancement algorithm which maintains the spatial cues of stereo recordings including two speech sources. Recognizing that each source has unique spatial information, our method utilizes a dual-path structure, ensuring the spatial cues remain unaffected during enhancement by applying source-specific common-band gain. This method also seamlessly integrates pretrained monaural speech enhancement, eliminating the need for retraining on stereo inputs. Source separation from stereo mixtures is achieved via spatial beamforming, with the steering vector for each source being adaptively updated using the post-enhancement output signal. This ensures accurate tracking of the spatial information. The final stereo output is derived by merging the spatial images of the enhanced sources, with its efficacy not heavily reliant on the separation performance of the beamforming. The algorithm runs in real-time on 10-ms frames with 40 ms of look-ahead. Evaluations reveal its effectiveness in enhancing speech and preserving spatial cues in both fully and sparsely overlapped mixtures.

Submitted 31 January, 2024; originally announced February 2024.

Comments: Accepted for ICASSP 2024, 5 pages

arXiv:2401.04406 [pdf, other]

doi 10.5617/nmi.9849

MapAI: Precision in Building Segmentation

Authors: Sander Riisøen Jyhne, Morten Goodwin, Per Arne Andersen, Ivar Oveland, Alexander Salveson Nossum, Karianne Ormseth, Mathilde Ørstavik, Andrew C. Flatman

Abstract: MapAI: Precision in Building Segmentation is a competition arranged with the Norwegian Artificial Intelligence Research Consortium (NORA) in collaboration with Centre for Artificial Intelligence Research at the University of Agder (CAIR), the Norwegian Mapping Authority, AI:Hub, Norkart, and the Danish Agency for Data Supply and Infrastructure. The competition will be held in the fall of 2022. It will be concluded at the Northern Lights Deep Learning conference focusing on the segmentation of buildings using aerial images and laser data. We propose two different tasks to segment buildings, where the first task can only utilize aerial images, while the second must use laser data (LiDAR) with or without aerial images. Furthermore, we use IoU and Boundary IoU to properly evaluate the precision of the models, with the latter being an IoU measure that evaluates the results' boundaries. We provide the participants with a training dataset and keep a test dataset for evaluation.

Submitted 9 January, 2024; originally announced January 2024.

Comments: 5 pages, 4 figures, competition

arXiv:2311.17656 [pdf, other]

Multiple Toddler Tracking in Indoor Videos

Authors: Somaieh Amraee, Bishoy Galoaa, Matthew Goodwin, Elaheh Hatamimajoumerd, Sarah Ostadabbas

Abstract: Multiple toddler tracking (MTT) involves identifying and differentiating toddlers in video footage. While conventional multi-object tracking (MOT) algorithms are adept at tracking diverse objects, toddlers pose unique challenges due to their unpredictable movements, various poses, and similar appearance. Tracking toddlers in indoor environments introduces additional complexities such as occlusions and limited fields of view. In this paper, we address the challenges of MTT and propose MTTSort, a customized method built upon the DeepSort algorithm. MTTSort is designed to track multiple toddlers in indoor videos accurately. Our contributions include discussing the primary challenges in MTT, introducing a genetic algorithm to optimize hyperparameters, proposing an accurate tracking algorithm, and curating the MTTrack dataset using unbiased AI co-labeling techniques. We quantitatively compare MTTSort to state-of-the-art MOT methods on MTTrack, DanceTrack, and MOT15 datasets. In our evaluation, the proposed method outperformed other MOT methods, achieving 0.98, 0.68, and 0.98 in multiple object tracking accuracy (MOTA), higher order tracking accuracy (HOTA), and iterative and discriminative framework 1 (IDF1) metrics, respectively.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.10603 [pdf, other]

On induced completely prime primitive ideals in enveloping algebras of classical Lie algebras

Authors: Simon M. Goodwin, Lewis Topley, Matthew Westaway

Abstract: A distinguished family of completely prime primitive ideals in the universal enveloping algebra of a reductive Lie algebra ${\mathfrak g}$ over ${\mathbb C}$ are those ideals constructed from one-dimensional representations of finite $W$-algebras. We refer to these ideals as Losev--Premet ideals. For ${\mathfrak g}$ simple of classical type, we prove that for a Losev--Premet ideal $I$ in $U({\mathfrak g})$, there exists a Losev--Premet ideal $I_0$ for a certain Levi subalgebra ${\mathfrak g}_0$ of ${\mathfrak g}$ such that the associated variety of $I_0$ is the closure of a rigid nilpotent orbit in ${\mathfrak g}_0$ and $I$ is obtained from $I_0$ by parabolic induction. This is deduced from the corresponding statement about one-dimensional representations of finite $W$-algebras.

Submitted 4 December, 2023; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: 24 pages

MSC Class: 17B35 (Primary); 16D60; 17B08; 17B10 (Secondary)

arXiv:2310.14837 [pdf, ps, other]

Harnessing Attention Mechanisms: Efficient Sequence Reduction using Attention-based Autoencoders

Authors: Daniel Biermann, Fabrizio Palumbo, Morten Goodwin, Ole-Christoffer Granmo

Abstract: Many machine learning models use the manipulation of dimensions as a driving force to enable models to identify and learn important features in data. In the case of sequential data this manipulation usually happens on the token dimension level. Despite the fact that many tasks require a change in sequence length itself, the step of sequence length reduction usually happens out of necessity and in a single step. As far as we are aware, no model uses the sequence length reduction step as an additional opportunity to tune the model's performance. In fact, sequence length manipulation as a whole seems to be an overlooked direction. In this study we introduce a novel attention-based method that allows for the direct manipulation of sequence lengths. To explore the method's capabilities, we employ it in an autoencoder model. The autoencoder reduces the input sequence to a smaller sequence in latent space. It then aims to reproduce the original sequence from this reduced form. In this setting, we explore the method's reduction performance for different input and latent sequence lengths. We are able to show that the autoencoder retains all the significant information when reducing the original sequence to half its original size. When reducing down to as low as a quarter of its original size, the autoencoder is still able to reproduce the original sequence with an accuracy of around 90%.

Submitted 23 October, 2023; originally announced October 2023.

Comments: 8 pages, 5 images, 1 table

arXiv:2310.07032 [pdf, other]

Neural Harmonium: An Interpretable Deep Structure for Nonlinear Dynamic System Identification with Application to Audio Processing

Authors: Karim Helwani, Erfan Soltanmohammadi, Michael M. Goodwin

Abstract: Improving the interpretability of deep neural networks has recently gained increased attention, especially when the power of deep learning is leveraged to solve problems in physics. Interpretability helps us understand a model's ability to generalize and reveal its limitations. In this paper, we introduce a causal interpretable deep structure for modeling dynamic systems. Our proposed model makes use of the harmonic analysis by modeling the system in a time-frequency domain while maintaining high temporal and spectral resolution. Moreover, the model is built in an order recursive manner which allows for fast, robust, and exact second order optimization without the need for an explicit Hessian calculation. To circumvent the resulting high dimensionality of the building blocks of our system, a neural network is designed to identify the frequency interdependencies. The proposed model is illustrated and validated on nonlinear system identification problems as required for audio signal processing tasks. Crowd-sourced experimentation contrasting the performance of the proposed approach to other state-of-the-art solutions on an acoustic echo cancellation scenario confirms the effectiveness of our method for real-life applications.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2309.14521 [pdf, other]

NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping

Authors: Jan Büthe, Ahmed Mustafa, Jean-Marc Valin, Karim Helwani, Michael M. Goodwin

Abstract: Speech codec enhancement methods are designed to remove distortions added by speech codecs. While classical methods are very low in complexity and add zero delay, their effectiveness is rather limited. Compared to that, DNN-based methods deliver higher quality but they are typically high in complexity and/or require delay. The recently proposed Linear Adaptive Coding Enhancer (LACE) addresses this problem by combining DNNs with classical long-term/short-term postfiltering resulting in a causal low-complexity model. A shortcoming of the LACE model is, however, that quality quickly saturates when the model size is scaled up. To mitigate this problem, we propose a novel adaptive temporal shaping module that adds high temporal resolution to the LACE model resulting in the Non-Linear Adaptive Coding Enhancer (NoLACE). We adapt NoLACE to enhance the Opus codec and show that NoLACE significantly outperforms both the Opus baseline and an enlarged LACE model at 6, 9 and 12 kb/s. We also show that LACE and NoLACE are well-behaved when used with an ASR system.

Submitted 12 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: final version, accepted at ICASSP 2024

arXiv:2309.14507 [pdf, other]

Noise-Robust DSP-Assisted Neural Pitch Estimation with Very Low Complexity

Authors: Krishna Subramani, Jean-Marc Valin, Jan Buethe, Paris Smaragdis, Mike Goodwin

Abstract: Pitch estimation is an essential step of many speech processing algorithms, including speech coding, synthesis, and enhancement. Recently, pitch estimators based on deep neural networks (DNNs) have been outperforming well-established DSP-based techniques. Unfortunately, these new estimators can be impractical to deploy in real-time systems, both because of their relatively high complexity and because some require significant lookahead. We show that a hybrid estimator using a small deep neural network (DNN) with traditional DSP-based features can match or exceed the performance of pure DNN-based models, with a complexity and algorithmic delay comparable to traditional DSP-based algorithms. We further demonstrate that this hybrid approach can provide benefits for a neural vocoding task.

Submitted 16 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: Submitted to ICASSP 2024, 5 pages

arXiv:2309.02091 [pdf, other]

DeNISE: Deep Networks for Improved Segmentation Edges

Authors: Sander Riisøen Jyhne, Per-Arne Andersen, Morten Goodwin

Abstract: This paper presents Deep Networks for Improved Segmentation Edges (DeNISE), a novel data enhancement technique using edge detection and segmentation models to improve the boundary quality of segmentation masks. DeNISE utilizes the inherent differences in two sequential deep neural architectures to improve the accuracy of the predicted segmentation edge. DeNISE applies to all types of neural networks and is not trained end-to-end, allowing rapid experiments to discover which models complement each other. We test and apply DeNISE for building segmentation in aerial images. Aerial images are known for difficult conditions as they have a low resolution with optical noise, such as reflections, shadows, and visual obstructions. Overall the paper demonstrates the potential for DeNISE. Using the technique, we improve the baseline results with a building IoU of 78.9%.

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.16126 [pdf, other]

CorrEmbed: Evaluating Pre-trained Model Image Similarity Efficacy with a Novel Metric

Authors: Karl Audun Kagnes Borgersen, Morten Goodwin, Jivitesh Sharma, Tobias Aasmoe, Mari Leonhardsen, Gro Herredsvela Rørvik

Abstract: Detecting visually similar images is a particularly useful attribute to look to when calculating product recommendations. Embedding similarity, which utilizes pre-trained computer vision models to extract high-level image features, has demonstrated remarkable efficacy in identifying images with similar compositions. However, there is a lack of methods for evaluating the embeddings generated by these models, as conventional loss and performance metrics do not adequately capture their performance in image similarity search tasks. In this paper, we evaluate the viability of the image embeddings from numerous pre-trained computer vision models using a novel approach named CorrEmbed. Our approach computes the correlation between distances in image embeddings and distances in human-generated tag vectors. We extensively evaluate numerous pre-trained Torchvision models using this metric, revealing an intuitive relationship of linear scaling between ImageNet1k accuracy scores and tag-correlation scores. Importantly, our method also identifies deviations from this pattern, providing insights into how different models capture high-level image features. By offering a robust performance evaluation of these pre-trained models, CorrEmbed serves as a valuable tool for researchers and practitioners seeking to develop effective, data-driven approaches to similar item recommendations in fashion retail.
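
Illustrative sketch (not from the paper): the abstract describes the metric only as a correlation between pairwise distances in embedding space and pairwise distances in tag-vector space. The snippet below shows one way such a score could be computed. The Euclidean distance, the Pearson correlation, and the function names `euclidean`, `pearson`, and `tag_correlation` are all assumptions made for illustration; the authors' actual choices may differ.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / (std_x * std_y)


def tag_correlation(embeddings, tags):
    """Correlate pairwise embedding distances with pairwise tag distances.

    embeddings[i] and tags[i] describe the same image (hypothetical layout).
    """
    pairs = list(combinations(range(len(embeddings)), 2))
    d_emb = [euclidean(embeddings[i], embeddings[j]) for i, j in pairs]
    d_tag = [euclidean(tags[i], tags[j]) for i, j in pairs]
    return pearson(d_emb, d_tag)


# Toy data: tag vectors are a scaled copy of the embeddings, so the two
# sets of pairwise distances are perfectly linearly related.
toy_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
toy_tags = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
score = tag_correlation(toy_embeddings, toy_tags)
```

A higher score would mean that images the model places close together also tend to share similar tags, under the assumptions stated above.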

Submitted 30 August, 2023; originally announced August 2023.

Comments: Accepted to AI-2023 Forty-third SGAI International Conference on Artificial Intelligence

arXiv:2305.10267 [pdf, other]

State Representation Learning Using an Unbalanced Atlas

Authors: Li Meng, Morten Goodwin, Anis Yazidi, Paal Engelstad

Abstract: The manifold hypothesis posits that high-dimensional data often lies on a lower-dimensional manifold and that utilizing this manifold as the target space yields more efficient representations. While numerous traditional manifold-based techniques exist for dimensionality reduction, their application in self-supervised learning has witnessed slow progress. The recent MSimCLR method combines manifold encoding with SimCLR but requires extremely low target encoding dimensions to outperform SimCLR, limiting its applicability. This paper introduces a novel learning paradigm using an unbalanced atlas (UA), capable of surpassing state-of-the-art self-supervised learning approaches. We investigated and engineered the DeepInfomax with an unbalanced atlas (DIM-UA) method by adapting the Spatiotemporal DeepInfomax (ST-DIM) framework to align with our proposed UA paradigm. The efficacy of DIM-UA is demonstrated through training and evaluation on the Atari Annotated RAM Interface (AtariARI) benchmark, a modified version of the Atari 2600 framework that produces annotated image samples for representation learning. The UA paradigm improves existing algorithms significantly as the number of target encoding dimensions grows. For instance, the mean F1 score averaged over categories of DIM-UA is ~75% compared to ~70% of ST-DIM when using 16384 hidden units.

Submitted 24 June, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

Journal ref: ICLR 2024

arXiv:2304.12778 [pdf, ps, other]

Loss and Reward Weighing for increased learning in Distributed Reinforcement Learning

Authors: Martin Holen, Per-Arne Andersen, Kristian Muri Knausgård, Morten Goodwin

Abstract: This paper introduces two learning schemes for distributed agents in Reinforcement Learning (RL) environments, namely Reward-Weighted (R-Weighted) and Loss-Weighted (L-Weighted) gradient merger. The R/L weighted methods replace standard practices for training multiple agents, such as summing or averaging the gradients. The core of our methods is to scale the gradient of each actor based on how high the reward (for R-Weighted) or the loss (for L-Weighted) is compared to the other actors. During training, each agent operates in differently initialized versions of the same environment, which gives different gradients from different actors. In essence, the R-Weights and L-Weights of each agent inform the other agents of its potential, which in turn indicates which environment should be prioritized for learning. This approach of distributed learning is possible because environments that yield higher rewards, or low losses, have more critical information than environments that yield lower rewards or higher losses. We empirically demonstrate that the R-Weighted methods outperform the state of the art in multiple RL environments.

Submitted 25 April, 2023; originally announced April 2023.

arXiv:2303.16867 [pdf, other]

A Video-based End-to-end Pipeline for Non-nutritive Sucking Action Recognition and Segmentation in Young Infants

Authors: Shaotong Zhu, Michael Wan, Elaheh Hatamimajoumerd, Kashish Jain, Samuel Zlota, Cholpady Vikram Kamath, Cassandra B. Rowan, Emma C. Grace, Matthew S. Goodwin, Marie J. Hayes, Rebecca A. Schwartz-Mette, Emily Zimmerman, Sarah Ostadabbas

Abstract: We present an end-to-end computer vision pipeline to detect non-nutritive sucking (NNS) -- an infant sucking pattern with no nutrition delivered -- as a potential biomarker for developmental delays, using off-the-shelf baby monitor video footage. One barrier to clinical (or algorithmic) assessment of NNS stems from its sparsity, requiring experts to wade through hours of footage to find minutes of relevant activity. Our NNS activity segmentation algorithm solves this problem by identifying periods of NNS with high certainty -- up to 94.0% average precision and 84.9% average recall across 30 heterogeneous 60 s clips, drawn from our manually annotated NNS clinical in-crib dataset of 183 hours of overnight baby monitor footage from 19 infants. Our method is based on an underlying NNS action recognition algorithm, which uses spatiotemporal deep learning networks and infant-specific pose estimation, achieving 94.9% accuracy in binary classification of 960 2.5 s balanced NNS vs. non-NNS clips. Tested on our second, independent, and public NNS in-the-wild dataset, NNS recognition classification reaches 92.3% accuracy, and NNS segmentation achieves 90.8% precision and 84.2% recall.

Submitted 29 March, 2023; originally announced March 2023.

arXiv:2303.14806 [pdf, other]

A Contrastive Learning Scheme with Transformer Innate Patches

Authors: Sander Riisøen Jyhne, Per-Arne Andersen, Morten Goodwin

Abstract: This paper presents Contrastive Transformer, a contrastive learning scheme using the Transformer innate patches. Contrastive Transformer enables existing contrastive learning techniques, often used for image classification, to benefit dense downstream prediction tasks such as semantic segmentation. The scheme performs supervised patch-level contrastive learning, selecting the patches based on the ground truth mask, subsequently used for hard-negative and hard-positive sampling. The scheme applies to all vision-transformer architectures, is easy to implement, and introduces minimal additional memory footprint. Additionally, the scheme removes the need for huge batch sizes, as each patch is treated as an image. We apply and test Contrastive Transformer for the case of aerial image segmentation, known for low-resolution data, large class imbalance, and similar semantic classes. We perform extensive experiments to show the efficacy of the Contrastive Transformer scheme on the ISPRS Potsdam aerial image segmentation dataset. Additionally, we show the generalizability of our scheme by applying it to multiple inherently different Transformer architectures. Ultimately, the results show a consistent increase in mean IoU across all classes.

Submitted 8 January, 2024; v1 submitted 26 March, 2023; originally announced March 2023.

Comments: 7 pages, 3 figures

arXiv:2303.07437 [pdf, other]

Unsupervised Representation Learning in Partially Observable Atari Games

Authors: Li Meng, Morten Goodwin, Anis Yazidi, Paal Engelstad

Abstract: State representation learning aims to capture latent factors of an environment. Contrastive methods have performed better than generative models in previous state representation learning research. Although some researchers realize the connections between masked image modeling and contrastive representation learning, the effort is focused on using masks as an augmentation technique to represent the latent generative factors better. Partially observable environments in reinforcement learning have not yet been carefully studied using unsupervised state representation learning methods. In this article, we create an unsupervised state representation learning scheme for partially observable states. We conducted our experiment on a previous Atari 2600 framework designed to evaluate representation learning models. A contrastive method called Spatiotemporal DeepInfomax (ST-DIM) has shown state-of-the-art performance on this benchmark but remains inferior to its supervised counterpart. Our approach improves ST-DIM when the environment is not fully observable and achieves higher F1 scores and accuracy scores than the supervised learning counterpart. The mean accuracy score averaged over categories of our approach is ~66%, compared to ~38% for supervised learning. The mean F1 score is ~64%, compared to ~33%.

Submitted 13 March, 2023; originally announced March 2023.

arXiv:2302.11768 [pdf, other]

A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement

Authors: Zhepei Wang, Ritwik Giri, Devansh Shah, Jean-Marc Valin, Michael M. Goodwin, Paris Smaragdis

Abstract: In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement. This is achieved by incorporating a frame-wise conditioning input that specifies the type of enhancement output. To improve the quality of the enhanced output and mitigate oversuppression, we experiment with re-weighting frames by the presence or absence of speech activity and applying augmentations to speaker embeddings. By training under a multi-task learning setting, we empirically show that the proposed unified model obtains promising results on both personalized and non-personalized speech enhancement benchmarks and reaches similar performance to models that are trained specifically for either task. The strong performance of the proposed method demonstrates that the unified model is a more economical alternative compared to keeping separate task-specific models during inference.

Submitted 22 February, 2023; originally announced February 2023.

Comments: Accepted by ICASSP 2023

arXiv:2301.00596 [pdf, other]

A contrastive learning approach for individual re-identification in a wild fish population

Authors: Ørjan Langøy Olsen, Tonje Knutsen Sørdalen, Morten Goodwin, Ketil Malde, Kristian Muri Knausgård, Kim Tallaksen Halvorsen

Abstract: In both terrestrial and marine ecology, physical tagging is a frequently used method to study population dynamics and behavior. However, such tagging techniques are increasingly being replaced by individual re-identification using image analysis. This paper introduces a contrastive learning-based model for identifying individuals. The model uses the first parts of the Inception v3 network, supported by a projection head, and we use contrastive learning to find similar or dissimilar image pairs from a collection of uniform photographs. We apply this technique for corkwing wrasse, Symphodus melops, an ecologically and commercially important fish species. Photos are taken during repeated catches of the same individuals from a wild population, where the intervals between individual sightings might range from a few days to several years. Our model achieves a one-shot accuracy of 0.35, a 5-shot accuracy of 0.56, and a 100-shot accuracy of 0.88, on our dataset.

Submitted 2 January, 2023; originally announced January 2023.

ACM Class: I.2.6; I.4.9; I.5.4; J.3

arXiv:2212.10136 [pdf, other]

doi 10.7557/18.6807

A Comparison Between Tsetlin Machines and Deep Neural Networks in the Context of Recommendation Systems

Authors: Karl Audun Borgersen, Morten Goodwin, Jivitesh Sharma

Abstract: Recommendation Systems (RSs) are ubiquitous in modern society and are one of the largest points of interaction between humans and AI. Modern RSs are often implemented using deep learning models, which are infamously difficult to interpret. This problem is particularly exacerbated in the context of recommendation scenarios, as it erodes the user's trust in the RS. In contrast, the newly introduced Tsetlin Machines (TM) possess some valuable properties due to their inherent interpretability. TMs are still fairly young as a technology. As no RS has been developed for TMs before, it has become necessary to perform some preliminary research regarding the practicality of such a system. In this paper, we develop the first RS based on TMs to evaluate its practicality in this application domain. This paper compares the viability of TMs with other machine learning models prevalent in the field of RS. We train and investigate the performance of the TM compared with a vanilla feed-forward deep learning model. These comparisons are based on model performance, interpretability/explainability, and scalability. Further, we provide some benchmark performance comparisons to similar machine learning solutions relevant to RSs.

Submitted 20 December, 2022; originally announced December 2022.

Comments: Accepted to NLDL 2023

arXiv:2212.04532 [pdf, other]

Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity

Authors: Ahmed Mustafa, Jean-Marc Valin, Jan Büthe, Paris Smaragdis, Mike Goodwin

Abstract: GAN vocoders are currently one of the state-of-the-art methods for building high-quality neural waveform generative models. However, most of their architectures require dozens of billions of floating-point operations per second (GFLOPS) to generate speech waveforms in a samplewise manner. This makes GAN vocoders still challenging to run on normal CPUs without accelerators or parallel computers. In this work, we propose a new architecture for GAN vocoders that mainly depends on recurrent and fully-connected networks to directly generate the time domain signal in a framewise manner. This results in a considerable reduction of the computational cost and enables very fast generation on both GPUs and low-complexity CPUs. Experimental results show that our Framewise WaveGAN vocoder achieves significantly higher quality than auto-regressive maximum-likelihood vocoders such as LPCNet at a very low complexity of 1.2 GFLOPS. This makes GAN vocoders more practical on edge and low-power devices.

Submitted 1 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

Comments: Accepted to ICASSP 2023, demo: https://ahmed-fau.github.io/fwgan_demo/

arXiv:2210.01805 [pdf, other]

doi 10.1007/978-3-030-63799-6_7

CostNet: An End-to-End Framework for Goal-Directed Reinforcement Learning

Authors: Per-Arne Andersen, Morten Goodwin, Ole-Christoffer Granmo

Abstract: Reinforcement Learning (RL) is a general framework concerned with an agent that seeks to maximize rewards in an environment. The learning typically happens through trial and error using explorative methods, such as epsilon-greedy. There are two approaches, model-based and model-free reinforcement learning, that show concrete results in several disciplines. Model-based RL learns a model of the environment for learning the policy while model-free approaches are fully explorative and exploitative without considering the underlying environment dynamics. Model-free RL works conceptually well in simulated environments, and empirical evidence suggests that trial and error lead to a near-optimal behavior with enough training. On the other hand, model-based RL aims to be sample efficient, and studies show that it requires far less training in the real environment for learning a good policy. A significant challenge with RL is that it relies on a well-defined reward function to work well for complex environments and such a reward function is challenging to define. Goal-Directed RL is an alternative method that learns an intrinsic reward function with emphasis on a few explored trajectories that reveal the path to the goal state. This paper introduces a novel reinforcement learning algorithm for predicting the distance between two states in a Markov Decision Process. The learned distance function works as an intrinsic reward that fuels the agent's learning. Using the distance-metric as a reward, we show that the algorithm performs comparably to model-free RL while having significantly better sample efficiency in several test environments.

Submitted 3 October, 2022; originally announced October 2022.

Comments: 14 pages, 5 figures, In Proceedings of the International Conference on Innovative Techniques and Applications of Artificial Intelligence, SGAI2020

Journal ref: 2020 Springer Nature Switzerland AG

arXiv:2210.01235 [pdf, other]

doi 10.1109/CoG51982.2022.9893661

CaiRL: A High-Performance Reinforcement Learning Environment Toolkit

Authors: Per-Arne Andersen, Morten Goodwin, Ole-Christoffer Granmo

Abstract: This paper addresses the dire need for a platform that efficiently provides a framework for running reinforcement learning (RL) experiments. We propose the CaiRL Environment Toolkit as an efficient, compatible, and more sustainable alternative for training learning agents and propose methods to develop more efficient environment simulations. There is an increasing focus on developing sustainable artificial intelligence. However, little effort has been made to improve the efficiency of running environment simulations. The most popular development toolkit for reinforcement learning, OpenAI Gym, is built using Python, a powerful but slow programming language. We propose a toolkit written in C++ that offers the same level of flexibility but works orders of magnitude faster to make up for Python's inefficiency. This would drastically cut climate emissions. CaiRL also presents the first reinforcement learning toolkit with a built-in JVM and Flash support for running legacy flash games for reinforcement learning research. We demonstrate the effectiveness of CaiRL in the classic control benchmark, comparing the execution speed to OpenAI Gym. Furthermore, we illustrate that CaiRL can act as a drop-in replacement for OpenAI Gym to leverage significantly faster training speeds because of the reduced environment computation time.

Submitted 3 October, 2022; originally announced October 2022.

Comments: Published in 2022 IEEE Conference on Games (CoG)

Journal ref: IEEE 2022

arXiv:2210.01231 [pdf, other]

doi 10.1007/978-3-030-71711-7_11

Interpretable Option Discovery using Deep Q-Learning and Variational Autoencoders

Authors: Per-Arne Andersen, Ole-Christoffer Granmo, Morten Goodwin

Abstract: Deep Reinforcement Learning (RL) is unquestionably a robust framework to train autonomous agents in a wide variety of disciplines. However, traditional deep and shallow model-free RL algorithms suffer from low sample efficiency and inadequate generalization for sparse state spaces. The options framework with temporal abstractions is perhaps the most promising method to solve these problems, but it still has noticeable shortcomings. It only guarantees local convergence, and it is challenging to automate initiation and termination conditions, which in practice are commonly hand-crafted. Our proposal, the Deep Variational Q-Network (DVQN), combines deep generative- and reinforcement learning. The algorithm finds good policies from a Gaussian distributed latent-space, which is especially useful for defining options. The DVQN algorithm uses MSE with KL-divergence as regularization, combined with traditional Q-Learning updates. The algorithm learns a latent-space that represents good policies with state clusters for options. We show that the DVQN algorithm is a promising approach for identifying initiation and termination conditions for option-based reinforcement learning. Experiments show that the DVQN algorithm, with automatic initiation and termination, has comparable performance to Rainbow and can maintain stability when trained for extended periods after convergence.

Submitted 3 October, 2022; originally announced October 2022.

Comments: 12 pages, 5 figures, Proceedings of the 3rd International Conference on Intelligent Technologies and Applications

Journal ref: 2021 Springer Nature Switzerland AG

arXiv:2209.00290 [pdf, other]

doi 10.1093/mnras/stac2428

The SAMI Galaxy Survey: Using concentrated star-formation and stellar population ages to understand environmental quenching

Authors: Di Wang, Scott M. Croom, Julia J. Bryant, Sam P. Vaughan, Adam L. Schaefer, Francesco D'Eugenio, Stefania Barsanti, Sarah Brough, Claudia del P. Lagos, Anne M. Medling, Sree Oh, Jesse van de Sande, Giulia Santucci, Joss Bland-Hawthorn, Michael Goodwin, Brent Groves, Jon Lawrence, Matt S. Owers, Samuel Richards

Abstract: We study environmental quenching using the spatial distribution of current star-formation and stellar population ages with the full SAMI Galaxy Survey. By using a star-formation concentration index [C-index, defined as log10(r_{50,Halpha}/r_{50,cont})], we separate our sample into regular galaxies (C-index>-0.2) and galaxies with centrally concentrated star-formation (SF-concentrated; C-index<-0.2). Concentrated star-formation is a potential indicator of galaxies currently undergoing 'outside-in' quenching. Our environments cover ungrouped galaxies, low-mass groups (M_200<10^12.5 M_sun), high-mass groups (M_200 in the range 10^{12.5-14} M_sun) and clusters (M_200>10^14 M_sun). We find the fraction of SF-concentrated galaxies increases as halo mass increases with 9±2 per cent, 8±3 per cent, 19±4 per cent and 29±4 per cent for ungrouped galaxies, low-mass groups, high-mass groups and clusters, respectively. We interpret these results as evidence for 'outside-in' quenching in groups and clusters. To investigate the quenching time-scale in SF-concentrated galaxies, we calculate light-weighted age (Age_L) and mass-weighted age (Age_M) using full spectral fitting, as well as the Dn4000 and Hdelta_A indices. We assume that the average galaxy age radial profile before entering a group or cluster is similar to ungrouped regular galaxies. At large radius (1-2 R_e), SF-concentrated galaxies in high-mass groups have older ages than ungrouped regular galaxies with an age difference of 1.83±0.38 Gyr for Age_L and 1.34±0.56 Gyr for Age_M. This suggests that while 'outside-in' quenching can be effective in groups, the process will not quickly quench the entire galaxy. In contrast, the ages at 1-2 R_e of cluster SF-concentrated galaxies and ungrouped regular galaxies are consistent (0.19±0.21 Gyr for Age_L, 0.40±0.61 Gyr for Age_M), suggesting the quenching process must be rapid.

Submitted 1 September, 2022; originally announced September 2022.

Comments: 20 pages, 18 figures

Journal ref: Monthly Notices of the Royal Astronomical Society, 2022

arXiv:2207.03752 [pdf, other]

doi 10.1093/mnras/stac1841

The SAMI Galaxy Survey: The relationship between galaxy rotation and the motion of neighbours

Authors: Yifan Mai, Sam P. Vaughan, Scott M. Croom, Jesse van de Sande, Stefania Barsanti, Joss Bland-Hawthorn, Sarah Brough, Julia J. Bryant, Matthew Colless, Michael Goodwin, Brent Groves, Iraklis S. Konstantopoulos, Jon S. Lawrence, Nuria P. F. Lorente, Samuel N. Richards

Abstract: Using data from the SAMI Galaxy Survey, we investigate the correlation between the projected stellar kinematic spin vector of 1397 SAMI galaxies and the line-of-sight motion of their neighbouring galaxies. We calculate the luminosity-weighted mean velocity difference between SAMI galaxies and their neighbours in the direction perpendicular to the SAMI galaxies' angular momentum axes. The luminosity-weighted mean velocity offset between SAMI galaxies and their neighbours, which indicates the signal of coherence between the rotation of the SAMI galaxies and the motion of neighbours, is 9.0 $\pm$ 5.4 km s$^{-1}$ (1.7 $σ$) for neighbours within 1 Mpc. In a large-scale analysis, we find that the average velocity offsets increase for neighbours out to 2 Mpc. However, the velocities are consistent with zero or negative for neighbours outside 3 Mpc. The negative signals for neighbours at distances around 10 Mpc are also significant at the $\sim 2$ $σ$ level, which indicates that the positive signals within 2 Mpc might come from the variance of large-scale structure. We also calculate average velocities of different subsamples, including galaxies in different regions of the sky, galaxies with different stellar masses, galaxy type, $λ_{Re}$ and inclination. Although the low-mass, high-mass, early-type and low-spin galaxy subsamples show a 2-3 $σ$ signal of coherence for the neighbours within 2 Mpc, the results for different inclination subsamples and large-scale results suggest that the $\sim 2 σ$ signals might result from coincidental scatter or variance of large-scale structure. Overall, the modest evidence of coherence signals for neighbouring galaxies within 2 Mpc needs to be confirmed by larger samples of observations and simulation studies.

Submitted 8 July, 2022; originally announced July 2022.

Comments: 14 pages, 9 figures, accepted for publication in MNRAS

arXiv:2206.15269 [pdf, other]

Deep Reinforcement Learning with Swin Transformers

Authors: Li Meng, Morten Goodwin, Anis Yazidi, Paal Engelstad

Abstract: Transformers are neural network models that utilize multiple layers of self-attention heads and have exhibited enormous potential in natural language processing tasks. Meanwhile, there have been efforts to adapt transformers to visual tasks of machine learning, including Vision Transformers and Swin Transformers. Although some researchers use Vision Transformers for reinforcement learning tasks, their experiments remain at a small scale due to the high computational cost. This article presents the first online reinforcement learning scheme that is based on Swin Transformers: Swin DQN. In contrast to existing research, our novel approach demonstrates superior performance in experiments on 49 games in the Arcade Learning Environment. The results show that our approach achieves significantly higher maximal evaluation scores than the baseline method in 45 of the 49 games (92%), and higher mean evaluation scores than the baseline method in 40 of the 49 games (82%).

Submitted 24 June, 2024; v1 submitted 30 June, 2022; originally announced June 2022.

arXiv:2206.09072 [pdf, other]

Semi-supervised Time Domain Target Speaker Extraction with Attention

Authors: Zhepei Wang, Ritwik Giri, Shrikant Venkataramani, Umut Isik, Jean-Marc Valin, Paris Smaragdis, Mike Goodwin, Arvindh Krishnaswamy

Abstract: In this work, we propose Exformer, a time-domain architecture for target speaker extraction. It consists of a pre-trained speaker embedder network and a separator network based on transformer encoder blocks. We study multiple methods to combine speaker information with the input mixture, and the resulting Exformer architecture obtains superior extraction performance compared to prior time-domain networks. Furthermore, we investigate a two-stage procedure to train the model using mixtures without reference signals upon a pre-trained supervised model. Experimental results show that the proposed semi-supervised learning procedure improves the performance of the supervised baselines.

Submitted 17 June, 2022; originally announced June 2022.

arXiv:2204.12630 [pdf, other]

doi 10.1093/mnras/stac1221

The SAMI Galaxy Survey: The Link Between [$α$/Fe] and Kinematic Morphology

Authors: Peter J. Watson, Roger L. Davies, Jesse van de Sande, Sarah Brough, Scott M. Croom, Francesco D'Eugenio, Karl Glazebrook, Brent Groves, Ángel R. López-Sánchez, Nicholas Scott, Sam P. Vaughan, C. Jakob Walcher, Joss Bland-Hawthorn, Julia J. Bryant, Michael Goodwin, Jon S. Lawrence, Nuria P. F. Lorente, Matt S. Owers, Samuel Richards

Abstract: We explore a sample of 1492 galaxies with measurements of the mean stellar population properties and the spin parameter proxy, $λ_{R_{\rm{e}}}$, drawn from the SAMI Galaxy Survey. We fit a global $\left[α/\rm{Fe}\right]$-$σ$ relation, finding that $\left[α/\rm{Fe}\right]=(0.395\pm0.010)\rm{log}_{10}\left(σ\right)-(0.627\pm0.002)$. We observe an anti-correlation between the residuals $Δ\left[α/\rm{Fe}\right]$ and the inclination-corrected $λ_{\,R_{\rm{e}}}^{\rm{\,eo}}$, which can be expressed as $Δ\left[α/\rm{Fe}\right]=(-0.057\pm0.008)λ_{\,R_{\rm{e}}}^{\rm{\,eo}}+(0.020\pm0.003)$. The anti-correlation appears to be driven by star-forming galaxies, with a gradient of $Δ\left[α/\rm{Fe}\right]\sim(-0.121\pm0.015)λ_{\,R_{\rm{e}}}^{\rm{\,eo}}$, although a weak relationship persists for the subsample of galaxies for which star formation has been quenched. We take this to be confirmation that disk-dominated galaxies have an extended duration of star formation. At a reference velocity dispersion of 200 km s$^{-1}$, we estimate an increase in half-mass formation time from $\sim$0.5 Gyr to $\sim$1.2 Gyr from low- to high-$λ_{\,R_{\rm{e}}}^{\rm{\,eo}}$ galaxies. Slow rotators do not appear to fit these trends. Their residual $α$-enhancement is indistinguishable from other galaxies with $λ_{\,R_{\rm{e}}}^{\rm{\,eo}}\lessapprox0.4$, despite being both larger and more massive. This result shows that galaxies with $λ_{\,R_{\rm{e}}}^{\rm{\,eo}}\lessapprox0.4$ experience a similar range of star formation histories, despite their different physical structure and angular momentum.

Submitted 26 April, 2022; originally announced April 2022.

Comments: 12 pages, 9 figures

arXiv:2203.15092 [pdf, other]

Improved singing voice separation with chromagram-based pitch-aware remixing

Authors: Siyuan Yuan, Zhepei Wang, Umut Isik, Ritwik Giri, Jean-Marc Valin, Michael M. Goodwin, Arvindh Krishnaswamy

Abstract: Singing voice separation aims to separate music into vocals and accompaniment components. One of the major constraints for the task is the limited amount of training data with separated vocals. Data augmentation techniques such as random source mixing have been shown to make better use of existing data and mildly improve model performance. We propose a novel data augmentation technique, chromagram-based pitch-aware remixing, where music segments with high pitch alignment are mixed. By performing controlled experiments in both supervised and semi-supervised settings, we demonstrate that training models with pitch-aware remixing significantly improves the test signal-to-distortion ratio (SDR).

Submitted 28 March, 2022; originally announced March 2022.

Comments: To appear at ICASSP 2022, 5 pages, 1 figure

arXiv:2203.12537 [pdf, other]

Socially Fair Mitigation of Misinformation on Social Networks via Constraint Stochastic Optimization

Authors: Ahmed Abouzeid, Ole-Christoffer Granmo, Christian Webersik, Morten Goodwin

Abstract: Recent social networks' misinformation mitigation approaches tend to investigate how to reduce misinformation by considering a whole-network statistical scale. However, unbalanced misinformation exposures among individuals call for the study of fair allocation of mitigation resources. Moreover, the network has random dynamics which change over time. Therefore, we introduce a stochastic and non-stationary knapsack problem, and we apply its resolution to mitigate misinformation in social network campaigns. We further propose a generic misinformation mitigation algorithm that is robust to different social networks' misinformation statistics, allowing a promising impact in real-world scenarios. A novel loss function ensures fair mitigation among users. We achieve fairness by intelligently allocating a mitigation incentivization budget to the knapsack, and optimizing the loss function. To this end, a team of Learning Automata (LA) drives the budget allocation. Each LA is associated with a user and learns to minimize its exposure to misinformation by performing a non-stationary and stochastic walk over its state space. Our results show that our LA-based method is robust and outperforms similar misinformation mitigation methods in how fairly the mitigation influences the network users.

Submitted 23 March, 2022; originally announced March 2022.

Comments: This 14-page paper is a version with appendices, so that I can cite the appendices in the original version of the paper, which was accepted and submitted to AAAI22

arXiv:2203.01004 [pdf, other]

doi 10.1109/TG.2022.3185330

Improving the Diversity of Bootstrapped DQN by Replacing Priors With Noise

Authors: Li Meng, Morten Goodwin, Anis Yazidi, Paal Engelstad

Abstract: Q-learning is one of the most well-known Reinforcement Learning algorithms. There have been tremendous efforts to develop this algorithm using neural networks. Bootstrapped Deep Q-Learning Network is amongst them. It utilizes multiple neural network heads to introduce diversity into Q-learning. Diversity can sometimes be viewed as the number of reasonable moves an agent can take at a given state, analogous to the definition of the exploration ratio in RL. Thus, the performance of Bootstrapped Deep Q-Learning Network is deeply connected with the level of diversity within the algorithm. In the original research, it was pointed out that a random prior could improve the performance of the model. In this article, we further explore the possibility of replacing priors with noise and sample the noise from a Gaussian distribution to introduce more diversity into this algorithm. We conduct our experiment on the Atari benchmark and compare our algorithm to both the original and other related algorithms. The results show that our modification of the Bootstrapped Deep Q-Learning algorithm achieves significantly higher evaluation scores across different types of Atari games. Thus, we conclude that replacing priors with noise can improve Bootstrapped Deep Q-Learning's performance by ensuring the integrity of diversities.

Submitted 24 June, 2024; v1 submitted 2 March, 2022; originally announced March 2022.

Journal ref: IEEE Journal of Transactions on Games, 2022

arXiv:2109.14737 [pdf, other]

Unlocking the potential of deep learning for marine ecology: overview, applications, and outlook

Authors: Morten Goodwin, Kim Tallaksen Halvorsen, Lei Jiao, Kristian Muri Knausgård, Angela Helen Martin, Marta Moyano, Rebekah A. Oomen, Jeppe Have Rasmussen, Tonje Knutsen Sørdalen, Susanna Huneide Thorbjørnsen

Abstract: The deep learning revolution is touching all scientific disciplines and corners of our lives as a means of harnessing the power of big data. Marine ecology is no exception. These new methods provide analysis of data from sensors, cameras, and acoustic recorders, even in real time, in ways that are reproducible and rapid. Off-the-shelf algorithms can find, count, and classify species from digital images or video and detect cryptic patterns in noisy data. Using these opportunities requires collaboration across ecological and data science disciplines, which can be challenging to initiate. To facilitate these collaborations and promote the use of deep learning towards ecosystem-based management of the sea, this paper aims to bridge the gap between marine ecologists and computer scientists. We provide insight into popular deep learning approaches for ecological data analysis in plain language, focusing on the techniques of supervised learning with deep neural networks, and illustrate challenges and opportunities through established and emerging applications of deep learning to marine ecology. We use established and future-looking case studies on plankton, fishes, marine mammals, pollution, and nutrient cycling that involve object detection, classification, tracking, and segmentation of visualized data. We conclude with a broad outlook of the field's opportunities and challenges, including potential technological advances and issues with managing complex data sets.

Submitted 29 September, 2021; originally announced September 2021.

Comments: 44 pages, 4 figures

arXiv:2108.04286 [pdf, other]

On $\mathfrak{sl}_2$-triples for classical algebraic groups in positive characteristic

Authors: Simon M. Goodwin, Rachel Pengelly

Abstract: Let $k$ be an algebraically closed field of characteristic $p > 2$, let $n \in \mathbb Z_{>0}$, and take $G$ to be one of the classical algebraic groups $\mathrm{GL}_n(k)$, $\mathrm{SL}_n(k)$, $\mathrm{Sp}_n(k)$, $\mathrm O_n(k)$ or $\mathrm{SO}_n(k)$, with $\mathfrak g = \operatorname{Lie} G$. We determine the maximal $G$-stable closed subvariety $\mathcal V$ of the nilpotent cone $\mathcal N$ of $\mathfrak g$ such that the $G$-orbits in $\mathcal V$ are in bijection with the $G$-orbits of $\mathfrak{sl}_2$-triples $(e,h,f)$ with $e,f \in \mathcal V$. This result determines to what extent the theorems of Jacobson--Morozov and Kostant on $\mathfrak{sl}_2$-triples hold for classical algebraic groups over an algebraically closed field of "small" odd characteristic.

Submitted 16 March, 2022; v1 submitted 9 August, 2021; originally announced August 2021.

Comments: 20 pages, minor changes

MSC Class: 17B50 (Primary) 17B10; 17B45 (Secondary)

arXiv:2108.00547 [pdf, other]

Detection of subtle cartilage and bone tissue degeneration in the equine joint using polarisation-sensitive optical coherence tomography

Authors: Matthew Goodwin, Marie Klufts, Joshua Workman, Ashvin Thambyah, Frédérique Vanholsbeeck

Abstract: Objective: To explore the ability of polarisation-sensitive optical coherence tomography (PS-OCT) to rapidly identify subtle signs of tissue degeneration in the equine joint. Design: PS-OCT images were systematically acquired in four locations along the medial and lateral condyles of the third metacarpal bone in 5 equine specimens. Intensity and retardation PS-OCT images, and anomalies observed therein, were then compared and validated with high-resolution images of the tissue sections obtained using differential interference contrast (DIC) optical light microscopy. Results: The PS-OCT system was capable of imaging the entire equine osteochondral unit, and allowed delineation of the three structurally differentiated zones of the joint, that is, the articular cartilage matrix, zone of calcified cartilage and underlying subchondral bone. Importantly, PS-OCT imaging was able to detect underlying matrix and bone changes not visible without dissection and/or microscopy. Conclusion: PS-OCT has substantial potential to detect, non-invasively, sub-surface microstructural changes that are known to be associated with the early stages of joint tissue degeneration.

Submitted 1 August, 2021; originally announced August 2021.

arXiv:2107.10806 [pdf]

Self-transfer learning via patches: A prostate cancer triage approach based on bi-parametric MRI

Authors: Alvaro Fernandez-Quilez, Trygve Eftestøl, Morten Goodwin, Svein Reidar Kjosavik, Ketil Oppedal

Abstract: Prostate cancer (PCa) is the second most common cancer diagnosed among men worldwide. The current PCa diagnostic pathway comes at the cost of substantial overdiagnosis, leading to unnecessary treatment and further testing. Bi-parametric magnetic resonance imaging (bp-MRI) based on apparent diffusion coefficient maps (ADC) and T2-weighted (T2w) sequences has been proposed as a triage test to differentiate between clinically significant (cS) and non-clinically significant (ncS) prostate lesions. However, analysis of the sequences relies on expertise, requires specialized training, and suffers from inter-observer variability. Deep learning (DL) techniques hold promise in tasks such as classification and detection. Nevertheless, they rely on large amounts of annotated data, which is not common in the medical field. In order to palliate such issues, existing works rely on transfer learning (TL) and ImageNet pre-training, which has been proven to be sub-optimal for the medical imaging domain. In this paper, we present a patch-based pre-training strategy to distinguish between cS and ncS lesions, which exploits the region of interest (ROI) of the patched source domain to efficiently train a classifier in the full-slice target domain, which does not require annotations, by making use of TL. We provide a comprehensive comparison between several CNN architectures and different settings, which are presented as a baseline. Moreover, we explore cross-domain TL, which exploits both MRI modalities and improves single-modality results. Finally, we show how our approaches outperform the standard approaches by a considerable margin.

Submitted 22 July, 2021; originally announced July 2021.

Comments: 13 pages. Article under review

arXiv:2107.01054 [pdf, other]

doi 10.1093/mnras/stac705

The LEGA-C and SAMI Galaxy Surveys: Quiescent Stellar Populations and the Mass-Size Plane across 6 Gyr

Authors: Tania M. Barone, Francesco D'Eugenio, Nicholas Scott, Matthew Colless, Sam P. Vaughan, Arjen van der Wel, Amelia Fraser-McKelvie, Anna de Graaff, Jesse van de Sande, Po-Feng Wu, Rachel Bezanson, Sarah Brough, Eric Bell, Scott M. Croom, Luca Cortese, Simon Driver, Anna R. Gallazzi, Adam Muzzin, David Sobral, Joss Bland-Hawthorn, Julia J. Bryant, Michael Goodwin, Jon S. Lawrence, Nuria P. F. Lorente, Matt S. Owers

Abstract: We investigate the change in mean stellar population age and metallicity ([Z/H]) scaling relations for quiescent galaxies from intermediate redshift ($0.60\leq z\leq0.76$) using the LEGA-C Survey, to low redshift ($0.014\leq z\leq0.10$) using the SAMI Galaxy Survey. We find that, similarly to their low-redshift counterparts, the stellar metallicity of quiescent galaxies at $0.60\leq z\leq 0.76$ closely correlates with $M_*/R_\mathrm{e}$ (a proxy for the gravitational potential or escape velocity), in that galaxies with deeper potential wells are more metal-rich. This supports the hypothesis that the relation arises due to the gravitational potential regulating the retention of metals, by determining the escape velocity required by metal-rich stellar and supernova ejecta to escape the system and avoid being recycled into later stellar generations. On the other hand, we find no correlation between stellar age and $M_*/R_\mathrm{e}^2$ (stellar mass surface density $Σ$) in the LEGA-C sample, despite this being a strong relation at low redshift. We consider this change in the age--$Σ$ relation in the context of the redshift evolution of the star-forming and quiescent populations in the mass--size plane, and find our results can be explained as a consequence of galaxies forming more compactly at higher redshifts, and remaining compact throughout their evolution. Furthermore, galaxies appear to quench at a characteristic surface density that decreases with decreasing redshift. The $z\sim 0$ age--$Σ$ relation is therefore a result of building up the quiescent and star-forming populations with galaxies that formed at a range of redshifts and so a range of surface densities.

Submitted 11 March, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

Comments: 18 pages, 11 figures, accepted to MNRAS

arXiv:2106.14642 [pdf, other]

doi 10.7557/18.6237

Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples

Authors: Li Meng, Anis Yazidi, Morten Goodwin, Paal Engelstad

Abstract: In this article, we propose a novel algorithm for deep reinforcement learning named Expert Q-learning. Expert Q-learning is inspired by Dueling Q-learning and aims at incorporating semi-supervised learning into reinforcement learning through splitting Q-values into state values and action advantages. We require that an offline expert assesses the value of a state in a coarse manner using three discrete values. An expert network is designed in addition to the Q-network, which updates each time following the regular offline minibatch update whenever the expert example buffer is not empty. Using the board game Othello, we compare our algorithm with the baseline Q-learning algorithm, which is a combination of Double Q-learning and Dueling Q-learning. Our results show that Expert Q-learning is indeed useful and more resistant to the overestimation bias. The baseline Q-learning algorithm exhibits unstable and suboptimal behavior in non-deterministic settings, whereas Expert Q-learning demonstrates more robust performance with higher scores, illustrating that our algorithm is indeed suitable to integrate state values from expert examples into Q-learning.

Submitted 25 June, 2024; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: Camera-ready version

Journal ref: Septentrio Academic, Tromso, Norway, 2022

arXiv:2106.01928 [pdf, other]

doi 10.1093/mnras/stab3477

The SAMI Galaxy Survey: Trends in [α/Fe] as a Function of Morphology and Environment

Authors: Peter J. Watson, Roger L. Davies, Sarah Brough, Scott M. Croom, Francesco D'Eugenio, Karl Glazebrook, Brent Groves, Ángel R. López-Sánchez, Jesse van de Sande, Nicholas Scott, Sam P. Vaughan, Jakob Walcher, Joss Bland-Hawthorn, Julia J. Bryant, Michael Goodwin, Jon S. Lawrence, Nuria P. F. Lorente, Matt S. Owers, Samuel Richards

Abstract: We present a new set of index-based measurements of [$α$/Fe] for a sample of 2093 galaxies in the SAMI Galaxy Survey. Following earlier work, we fit a global relation between [$α$/Fe] and the galaxy velocity dispersion $σ$ for red sequence galaxies, [$α$/Fe]=(0.378$\pm$0.009)log($σ$/100)+(0.155$\pm$0.003). We observe a correlation between the residuals and the local environmental surface density, whereas no such relation exists for blue cloud galaxies. In the full sample, we find that elliptical galaxies in high-density environments are $α$-enhanced by up to 0.057$\pm$0.014 dex at velocity dispersions $σ$<100 km/s, compared with those in low-density environments. This $α$-enhancement is morphology-dependent, with the offset decreasing along the Hubble sequence towards spirals, which have an offset of 0.019$\pm$0.014 dex. At low velocity dispersion and controlling for morphology, we estimate that star formation in high-density environments is truncated $\sim1$ Gyr earlier than in low-density environments. For elliptical galaxies only, we find support for a parabolic relationship between [$α$/Fe] and $σ$, with an environmental $α$-enhancement of at least 0.03 dex. This suggests strong contributions from both environment and mass-based quenching mechanisms. However, there is no evidence for this behaviour in later morphological types.

Submitted 24 August, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: 15 pages, 9 figures. Revised after comments from referee

arXiv:2105.10179 [pdf, other]

doi 10.1093/mnras/stab1494

The SAMI Galaxy Survey: The role of disc fading and progenitor bias in kinematic transitions

Authors: S. M. Croom, D. S. Taranu, J. van de Sande, C. D. P. Lagos, K. E. Harborne, J. Bland-Hawthorn, S. Brough, J. J. Bryant, L. Cortese, C. Foster, M. Goodwin, B. Groves, A. Khalid, J. Lawrence, A. M. Medling, S. N. Richards, M. S. Owers, N. Scott, S. P. Vaughan

Abstract: We use comparisons between the SAMI Galaxy Survey and equilibrium galaxy models to infer the importance of disc fading in the transition of spirals into lenticular (S0) galaxies. The local S0 population has both higher photometric concentration and lower stellar spin than spiral galaxies of comparable mass and we test whether this separation can be accounted for by passive aging alone. We construct a suite of dynamically self-consistent galaxy models, with a bulge, disc and halo using the GalactICS code. The dispersion-dominated bulge is given a uniformly old stellar population, while the disc is given a current star formation rate putting it on the main sequence, followed by sudden instantaneous quenching. We then generate mock observables (r-band images, stellar velocity and dispersion maps) as a function of time since quenching for a range of bulge/total (B/T) mass ratios. The disc fading leads to a decline in measured spin as the bulge contribution becomes more dominant, and also leads to increased concentration. However, the quantitative changes observed after 5 Gyr of disc fading cannot account for all of the observed difference. We see similar results if we instead subdivide our SAMI Galaxy Survey sample by star formation (relative to the main sequence). We use EAGLE simulations to also take into account progenitor bias, using size evolution to infer quenching time. The EAGLE simulations suggest that the progenitors of current passive galaxies typically have slightly higher spin than present day star-forming disc galaxies of the same mass. As a result, progenitor bias moves the data further from the disc fading model scenario, implying that intrinsic dynamical evolution must be important in the transition from star-forming discs to passive discs.

Submitted 21 May, 2021; originally announced May 2021.

Comments: Accepted for publication in MNRAS

arXiv:2104.10167 [pdf, other]

doi 10.1093/mnras/stab1146

The SAMI Galaxy Survey: stellar population and structural trends across the Fundamental Plane

Authors: Francesco D'Eugenio, Matthew Colless, Nicholas Scott, Arjen van der Wel, Roger L. Davies, Jesse van de Sande, Sarah M. Sweet, Sree Oh, Brent Groves, Rob Sharp, Matt S. Owers, Joss Bland-Hawthorn, Scott M. Croom, Sarah Brough, Julia J. Bryant, Michael Goodwin, Jon S. Lawrence, Nuria P. F. Lorente, Samuel N. Richards

Abstract: We study the Fundamental Plane (FP) for a volume- and luminosity-limited sample of 560 early-type galaxies from the SAMI survey. Using r-band sizes and luminosities from new Multi-Gaussian Expansion (MGE) photometric measurements, and treating luminosity as the dependent variable, the FP has coefficients a=1.294$\pm$0.039, b=0.912$\pm$0.025, and zero-point c=7.067$\pm$0.078. We leverage the high signal-to-noise of SAMI integral field spectroscopy, to determine how structural and stellar-population observables affect the scatter about the FP. The FP residuals correlate most strongly (8$σ$ significance) with luminosity-weighted simple-stellar-population (SSP) age. In contrast, the structural observables surface mass density, rotation-to-dispersion ratio, Sérsic index and projected shape all show little or no significant correlation. We connect the FP residuals to the empirical relation between age (or stellar mass-to-light ratio $Υ_\star$) and surface mass density, the best predictor of SSP age amongst parameters based on FP observables. We show that the FP residuals (anti-)correlate with the residuals of the relation between surface density and $Υ_\star$. This correlation implies that part of the FP scatter is due to the broad age and $Υ_\star$ distribution at any given surface mass density. Using virial mass and $Υ_\star$ we construct a simulated FP and compare it to the observed FP. We find that, while the empirical relations between observed stellar population relations and FP observables are responsible for most (75%) of the FP scatter, on their own they do not explain the observed tilt of the FP away from the virial plane.

Submitted 20 April, 2021; originally announced April 2021.

Comments: 36 pages, 23 figures

arXiv:2104.06901 [pdf, other]

Enhancing Interpretable Clauses Semantically using Pretrained Word Representation

Authors: Rohan Kumar Yadav, Lei Jiao, Ole-Christoffer Granmo, Morten Goodwin

Abstract: Tsetlin Machine (TM) is an interpretable pattern recognition algorithm based on propositional logic, which has demonstrated competitive performance in many Natural Language Processing (NLP) tasks, including sentiment analysis, text classification, and Word Sense Disambiguation. To obtain human-level interpretability, legacy TM employs Boolean input features such as bag-of-words (BOW). However, the BOW representation makes it difficult to use any pre-trained information, for instance, word2vec and GloVe word representations. This restriction has constrained the performance of TM compared to deep neural networks (DNNs) in NLP. To reduce the performance gap, in this paper, we propose a novel way of using pre-trained word representations for TM. The approach significantly enhances the performance and interpretability of TM. We achieve this by extracting semantically related words from pre-trained word representations as input features to the TM. Our experiments show that the accuracy of the proposed approach is significantly higher than the previous BOW-based TM, reaching the level of DNN-based models.

Submitted 10 September, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

Comments: BlackboxNLP 2021

Showing 1–50 of 174 results for author: Goodwin, M