Segmentation of Prostate Tumour Volumes from PET Images is a Different Ball Game

Authors: Shrajan Bhandary, Dejan Kuhn, Zahra Babaiee, Tobias Fechter, Simon K. B. Spohn, Constantinos Zamboglou, Anca-Ligia Grosu, Radu Grosu

Abstract: Accurate segmentation of prostate tumours from PET images presents a formidable challenge in medical image analysis. Despite considerable work and improvement in delineating organs from CT and MR modalities, the existing standards do not transfer well or produce quality results in PET-related tasks. In particular, contemporary methods fail to accurately consider the intensity-based scaling applied by the physicians during manual annotation of tumour contours. In this paper, we observe that the prostate-localised uptake threshold ranges are beneficial for suppressing outliers. Therefore, we utilize the intensity threshold values to implement a new custom-feature-clipping normalisation technique. We evaluate multiple established U-Net variants under different normalisation schemes, using the nnU-Net framework. All models were trained and tested on multiple datasets, obtained with two radioactive tracers: [68-Ga]Ga-PSMA-11 and [18-F]PSMA-1007. Our results show that the U-Net models achieve much better performance when the PET scans are preprocessed with our novel clipping technique.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.01650 [pdf, other]

Accurate measurement of telescope filter bandpasses with a Collimated Beam Projector and impact on cosmological parameters

Authors: Jérémy Neveu, Dylan Kuhn, Thierry Souverin, LEMAITRE collaboration

Abstract: The measurement of magnitudes with different filters in photometric surveys gives access to cosmological distances and parameters. However, for current and future large surveys like the ZTF, DES, HSC or LSST, the photometric calibration uncertainties are almost comparable to statistical uncertainties in the error budget of type Ia cosmology analysis, which limits our ability to use type Ia supernovae for precision cosmology. The knowledge of the bandpasses of the survey filters at the per-mill level can help reach the sub-percent precision for magnitudes. We show how a misknowledge of the bandpasses central wavelengths or of the presence of out-of-band leakages leads to biased cosmological measurements. Then, we present how to measure the filter throughputs at the required precision with a Collimated Beam Projector.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 2 pages, 2 figures, contribution to the 2024 Cosmology session of the 58th Rencontres de Moriond

arXiv:2406.13579 [pdf]

Automated Bioacoustic Monitoring for South African Bird Species on Unlabeled Data

Authors: Michael Doell, Dominik Kuehn, Vanessa Suessle, Matthew J. Burnett, Colleen T. Downs, Andreas Weinmann, Elke Hergenroether

Abstract: Analysis for biodiversity monitoring based on passive acoustic monitoring (PAM) recordings is time-consuming and challenged by the presence of background noise in recordings. Existing models for sound event detection (SED) worked only on certain avian species and the development of further models required labeled data. The developed framework automatically extracted labeled data from available platforms for selected avian species. The labeled data were embedded into recordings, including environmental sounds and noise, and were used to train convolutional recurrent neural network (CRNN) models. The models were evaluated on unprocessed real world data recorded in urban KwaZulu-Natal habitats. The Adapted SED-CRNN model reached an F1 score of 0.73, demonstrating its efficiency under noisy, real-world conditions. The proposed approach to automatically extract labeled data for chosen avian species enables an easy adaption of PAM to other species and habitats for future conservation projects.

Submitted 19 June, 2024; originally announced June 2024.

Comments: preprint

Journal ref: International Conferences in Central Europe on Computer Graphics, Visualization and Computer Vision 2024

arXiv:2406.02073 [pdf, other]

ZTF SN Ia DR2: Study of Type Ia Supernova lightcurve fits

Authors: M. Rigault, M. Smith, N. Regnault, D. W. Kenworthy, K. Maguire, A. Goobar, G. Dimitriadis, M. Amenouche, M. Aubert, C. Barjou-Delayre, C. E. Bellm, U. Burgaz, B. Carreres, Y. Copin, M. Deckers, T. de Jaeger, S. Dhawan, F. Feinstein, D. Fouchez, L. Galbany, M. Ginolin, J. M. Graham, Y. -L. Kim, M. Kowalski, D. Kuhn, et al. (12 additional authors not shown)

Abstract: Type Ia supernova (SN Ia) cosmology relies on the estimation of lightcurve parameters to derive precision distances that lead to the estimation of cosmological parameters. The empirical SALT2 lightcurve modeling that relies on only two parameters, a stretch x1, and a color c, has been used by the community for almost two decades. In this paper we study the ability of the SALT2 model to fit the nearly 3000 cosmology-grade SN Ia lightcurves from the second release of the Zwicky Transient Facility (ZTF) cosmology science working group. While the ZTF data was not used to train SALT2, the algorithm is modeling the ZTF SN Ia optical lightcurves remarkably well, except for lightcurve points prior to -10 d from maximum, where the training critically lacks statistics. We find that the lightcurve fitting is robust against the considered choice of phase-range, but we show the [-10; +40] d range to be optimal in terms of statistics and accuracy. We do not detect any significant features in the lightcurve fit residuals that could be connected to the host environment. Potential systematic population differences related to the SN Ia host properties might thus not be accounted for by the addition of extra lightcurve parameters. However, a small but significant inconsistency between residuals of blue- and red-SN Ia strongly suggests the existence of a phase-dependent color term, with potential implications for the use of SNe Ia in precision cosmology. We thus encourage modellers to explore this avenue and we emphasize that SN Ia cosmology must include a SALT2 retraining to accurately model the lightcurves and avoid biasing the derivation of cosmological parameters.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 10 pages, 9 figures. Submitted to Astronomy and Astrophysics

arXiv:2406.02072 [pdf, other]

ZTF SN Ia DR2: Colour standardisation of Type Ia Supernovae and its dependence on environment

Authors: M. Ginolin, M. Rigault, Y. Copin, B. Popovic, G. Dimitriadis, A. Goobar, J. Johansson, K. Maguire, J. Nordin, M. Smith, M. Aubert, C. Barjou-Delayre, U. Burgaz, B. Carreres, S. Dhawan, M. Deckers, F. Feinstein, D. Fouchez, L. Galbany, C. Ganot, T. de Jaeger, Y. -L. Kim, D. Kuhn, L. Lacroix, T. E. Müller-Bravo, et al. (15 additional authors not shown)

Abstract: As Type Ia supernova cosmology transitions from a statistics dominated to a systematics dominated era, it is crucial to understand leftover unexplained uncertainties affecting their luminosity, such as the ones stemming from astrophysical biases. Indeed, SNe Ia are standardisable candles, whose absolute magnitudes reach a 0.15 mag scatter once empirical correlations with their lightcurve stretch and colour and with their environment are accounted for. In this paper, we investigate how the standardisation process of SNe Ia depends on environment, to ultimately reduce their scatter in magnitude, focusing on colour standardisation. We use the volume-limited ZTF SN Ia DR2 sample, which offers unprecedented statistics for the low redshift ($z<0.06$) range. We first study the colour distribution, focusing on the effects of dust, to then select a dustless subsample of objects from low stellar mass environments and from the outskirts of their host galaxies. We then look at the colour-residuals relation and its associated parameter $β$. Finally, we investigate the colour dependency of the environment-dependent magnitude offsets (steps), to try to disentangle intrinsic and extrinsic colour origin. Our sample probes well the red tail of the colour distribution, up to $c=0.8$. The dustless sample exhibits a significantly lower red tail ($4.6σ$) in comparison to the whole sample. This suggests that reddening above $c\geq0.2$ is dominated by host interstellar dust absorption. Looking at the colour-residuals relation, we find it to be linear with lightcurve colour. We show hints of a potential evolution of $β$ with host stellar mass at a $2.5σ$ level. Finally, unlike recent claims from the literature, we see no evolution of steps as a function of lightcurve colour, suggesting that dust may not be the dominating mechanism responsible for the environmental dependency of SNe Ia magnitude.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 10 pages, 7 figures, submitted to Astronomy and Astrophysics

arXiv:2405.20965 [pdf, other]

ZTF SN Ia DR2: Environmental dependencies of stretch and luminosity of a volume limited sample of 1,000 Type Ia Supernovae

Authors: M. Ginolin, M. Rigault, M. Smith, Y. Copin, F. Ruppin, G. Dimitriadis, A. Goobar, J. Johansson, K. Maguire, J. Nordin, M. Amenouche, M. Aubert, C. Barjou-Delayre, M. Betoule, U. Burgaz, B. Carreres, M. Deckers, S. Dhawan, F. Feinstein, D. Fouchez, L. Galbany, C. Ganot, L. Harvey, T. de Jaeger, W. D. Kenworthy, et al. (21 additional authors not shown)

Abstract: To get distances, Type Ia Supernovae magnitudes are corrected for their correlation with lightcurve width and colour. Here we investigate how this standardisation is affected by the SN environment, with the aim to reduce scatter and improve standardisation. We first study the SN Ia stretch distribution, as well as its dependence on environment, as characterised by local and global (g-z) colour and stellar mass. We then look at the standardisation parameter $α$, which accounts for the correlation between residuals and stretch, along with its environment dependence and linearity. We finally compute magnitude offsets between SNe in different astrophysical environments after colour and stretch standardisation, aka steps. This analysis is made possible due to the unprecedented statistics of the ZTF SN Ia DR2 volume-limited sample. The stretch distribution exhibits a bimodal behaviour, as previously found in literature. However, we find the distribution means to decrease with host stellar mass at a 9.0$σ$ significance. We demonstrate, at the 14.3$σ$ level, that the stretch-magnitude relation is non-linear, challenging the usual linear stretch-residuals relation. Fitting for a broken-$α$ model, we indeed find two different slopes between stretch regimes ($x_1<-0.49\pm0.06$): $α_{low}=0.28\pm0.01$ and $α_{high}=0.09\pm0.01$, a $Δ_α=-0.19\pm0.01$ difference. As the relative proportion of SNe Ia in the high-/low-stretch modes evolves with redshift and environment, this implies that a linear $α$ also evolves with redshift and environment. Concerning the environmental magnitude offset $γ$, we find it to be greater than 0.14 mag regardless of the environmental tracer used (local or global colour and stellar mass), all measured at the $\geq 6σ$ level, increased to $\sim0.18\pm0.01$ mag when accounting for the stretch non-linearity.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures, submitted to Astronomy and Astrophysics

arXiv:2405.20409 [pdf, other]

ZTF SN Ia DR2: Peculiar velocities impact on the Hubble diagram

Authors: B. Carreres, D. Rosselli, J. E. Bautista, F. Feinstein, D. Fouchez, B. Racine, C. Ravoux, B. Sanchez, G. Dimitriadis, A. Goobar, J. Johansson, J. Nordin, M. Rigault, M. Smith, M. Amenouche, M. Aubert, C. Barjou-Delayre, U. Burgaz, W. D'Arcy Kenworthy, T. De Jaeger, S. Dhawan, L. Galbany, M. Ginolin, D. Kuhn, M. Kowalski, et al. (13 additional authors not shown)

Abstract: SNe Ia are used to determine the distance-redshift relation and build the Hubble diagram. Neglecting their host-galaxy peculiar velocities (PVs) may bias the measurement of cosmological parameters. The smaller the redshift, the larger the effect is. We use realistic simulations of SNe Ia observed by the Zwicky Transient Facility (ZTF) to investigate the effect of different methods to take into account PVs. We study the impact of neglecting galaxy PVs and their correlations in an analysis of the SNe Ia Hubble diagram. We find that it is necessary to use the PV full covariance matrix computed from the velocity power spectrum to take into account the sample variance. Considering the results we have obtained using simulations, we determine the PV systematic effects in the context of the ZTF DR2 SNe Ia sample. We determine the PV impact on the intercept of the Hubble diagram, $a_B$, which is directly linked to the measurement of $H_0$. We show that not taking into account PVs and their correlations results in a shift of the $H_0$ value of about $1.0$ km s$^{-1}$ Mpc$^{-1}$ and a slight underestimation of the $H_0$ error bar.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 12 pages, 4 figures

arXiv:2405.20124 [pdf, other]

A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set

Authors: Man-Chung Yue, Yves Rychener, Daniel Kuhn, Viet Anh Nguyen

Abstract: The state-of-the-art methods for estimating high-dimensional covariance matrices all shrink the eigenvalues of the sample covariance matrix towards a data-insensitive shrinkage target. The underlying shrinkage transformation is either chosen heuristically - without compelling theoretical justification - or optimally in view of restrictive distributional assumptions. In this paper, we propose a principled approach to construct covariance estimators without imposing restrictive assumptions. That is, we study distributionally robust covariance estimation problems that minimize the worst-case Frobenius error with respect to all data distributions close to a nominal distribution, where the proximity of distributions is measured via a divergence on the space of covariance matrices. We identify mild conditions on this divergence under which the resulting minimizers represent shrinkage estimators. We show that the corresponding shrinkage transformations are intimately related to the geometrical properties of the underlying divergence. We also prove that our robust estimators are efficiently computable and asymptotically consistent and that they enjoy finite-sample performance guarantees. We exemplify our general methodology by synthesizing explicit estimators induced by the Kullback-Leibler, Fisher-Rao, and Wasserstein divergences. Numerical experiments based on synthetic and real data show that our robust estimators are competitive with state-of-the-art estimators.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2401.03913 [pdf, other]

doi 10.1109/ICASSP48485.2024.10447922

A Wasserstein Graph Distance Based on Distributions of Probabilistic Node Embeddings

Authors: Michael Scholkemper, Damin Kühn, Gerion Nabbefeld, Simon Musall, Björn Kampa, Michael T. Schaub

Abstract: Distance measures between graphs are important primitives for a variety of learning tasks. In this work, we describe an unsupervised, optimal transport based approach to define a distance between graphs. Our idea is to derive representations of graphs as Gaussian mixture models, fitted to distributions of sampled node embeddings over the same space. The Wasserstein distance between these Gaussian mixture distributions then yields an interpretable and easily computable distance measure, which can further be tailored for the comparison at hand by choosing appropriate embeddings. We propose two embeddings for this framework and show that under certain assumptions about the shape of the resulting Gaussian mixture components, further computational improvements of this Wasserstein distance can be achieved. An empirical validation of our findings on synthetic data and real-world Functional Brain Connectivity networks shows promising performance compared to existing embedding methods.

Submitted 9 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

arXiv:2311.07411 [pdf, ps, other]

A Large Deviations Perspective on Policy Gradient Algorithms

Authors: Wouter Jongeneel, Daniel Kuhn, Mengmeng Li

Abstract: Motivated by policy gradient methods in the context of reinforcement learning, we identify a large deviation rate function for the iterates generated by stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-Łojasiewicz condition. Leveraging the contraction principle from large deviations theory, we illustrate the potential of this result by showing how convergence properties of policy gradient with a softmax parametrization and an entropy regularized objective can be naturally extended to a wide spectrum of other policy parametrizations.

Submitted 3 June, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: v3; comments are welcome

MSC Class: 60F10; 90C26

arXiv:2310.18535 [pdf, other]

Contextual Stochastic Bilevel Optimization

Authors: Yifan Hu, Jie Wang, Yao Xie, Andreas Krause, Daniel Kuhn

Abstract: We introduce contextual stochastic bilevel optimization (CSBO) -- a stochastic bilevel optimization framework with the lower-level problem minimizing an expectation conditioned on some contextual information and the upper-level decision variable. This framework extends classical stochastic bilevel optimization when the lower-level decision maker responds optimally not only to the decision of the upper-level decision maker but also to some side information and when there are multiple or even infinitely many followers. It captures important applications such as meta-learning, personalized federated learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI). Due to the presence of contextual information, existing single-loop methods for classical stochastic bilevel optimization are unable to converge. To overcome this challenge, we introduce an efficient double-loop gradient method based on the Multilevel Monte-Carlo (MLMC) technique and establish its sample and computational complexities. When specialized to stochastic nonconvex optimization, our method matches existing lower bounds. For meta-learning, the complexity of our method does not depend on the number of tasks. Numerical experiments further validate our theoretical results.

Submitted 27 October, 2023; originally announced October 2023.

Comments: The paper is accepted by NeurIPS 2023

arXiv:2308.05414 [pdf, other]

Unifying Distributionally Robust Optimization via Optimal Transport Theory

Authors: Jose Blanchet, Daniel Kuhn, Jiajin Li, Bahar Taskesen

Abstract: In the past few years, there has been considerable interest in two prominent approaches for Distributionally Robust Optimization (DRO): Divergence-based and Wasserstein-based methods. The divergence approach models misspecification in terms of likelihood ratios, while the latter models it through a measure of distance or cost in actual outcomes. Building upon these advances, this paper introduces a novel approach that unifies these methods into a single framework based on optimal transport (OT) with conditional moment constraints. Our proposed approach, for example, makes it possible for optimal adversarial distributions to simultaneously perturb likelihood and outcomes, while producing an optimal (in an optimal transport sense) coupling between the baseline model and the adversarial model. Additionally, the paper investigates several duality results and presents tractable reformulations that enhance the practical applicability of this unified framework.

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2306.04174 [pdf, other]

End-to-End Learning for Stochastic Optimization: A Bayesian Perspective

Authors: Yves Rychener, Daniel Kuhn, Tobias Sutter

Abstract: We develop a principled approach to end-to-end learning in stochastic optimization. First, we show that the standard end-to-end learning algorithm admits a Bayesian interpretation and trains a posterior Bayes action map. Building on the insights of this analysis, we then propose new end-to-end learning algorithms for training decision maps that output solutions of empirical risk minimization and distributionally robust optimization problems, two dominant modeling paradigms in optimization under uncertainty. Numerical results for a synthetic newsvendor problem illustrate the key differences between alternative training schemes. We also investigate an economic dispatch problem based on real data to showcase the impact of the neural network architecture of the decision maps on their test performance.

Submitted 11 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted at ICML 2023

arXiv:2306.02987 [pdf, other]

doi 10.1016/j.ejor.2024.03.022

Frequency Regulation with Storage: On Losses and Profits

Authors: Dirk Lauinger, François Vuille, Daniel Kuhn

Abstract: Low-carbon societies will need to store vast amounts of electricity to balance intermittent generation from wind and solar energy, for example, through frequency regulation. Here, we derive an analytical solution to the decision-making problem of storage operators who sell frequency regulation power to grid operators and trade electricity on day-ahead markets. Mathematically, we treat future frequency deviation trajectories as functional uncertainties in a receding horizon robust optimization problem. We constrain the expected terminal state-of-charge to be equal to some target to allow storage operators to make good decisions not only for the present but also the future. Thanks to this constraint, the amount of electricity traded on day-ahead markets is an implicit function of the regulation power sold to grid operators. The implicit function quantifies the amount of power that needs to be purchased to cover the expected energy loss that results from providing frequency regulation. We show how the marginal cost associated with the expected energy loss decreases with roundtrip efficiency and increases with frequency deviation dispersion. We find that the profits from frequency regulation over the lifetime of energy-constrained storage devices are roughly inversely proportional to the length of time for which regulation power must be committed.

Submitted 26 March, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

arXiv:2305.19004 [pdf, ps, other]

Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets

Authors: Mengmeng Li, Daniel Kuhn, Tobias Sutter

Abstract: We propose policy gradient algorithms for robust infinite-horizon Markov decision processes (MDPs) with non-rectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Indeed, uncertainty sets that display statistical optimality properties and make optimal use of limited data often fail to be rectangular. Unfortunately, the corresponding robust MDPs cannot be solved with dynamic programming techniques and are in fact provably intractable. We first present a randomized projected Langevin dynamics algorithm that solves the robust policy evaluation problem to global optimality but is inefficient. We also propose a deterministic policy gradient method that is efficient but solves the robust policy evaluation problem only approximately, and we prove that the approximation error scales with a new measure of non-rectangularity of the uncertainty set. Finally, we describe an actor-critic algorithm that finds an $ε$-optimal solution for the robust policy improvement problem in $\mathcal{O}(1/ε^4)$ iterations. We thus present the first complete solution scheme for robust MDPs with non-rectangular uncertainty sets offering global optimality guarantees. Numerical experiments show that our algorithms compare favorably against state-of-the-art methods.

Submitted 23 January, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: comments are welcome

MSC Class: 90C17; 90C26

arXiv:2305.17037 [pdf, other]

Distributionally Robust Linear Quadratic Control

Authors: Bahar Taşkesen, Dan A. Iancu, Çağıl Koçyiğit, Daniel Kuhn

Abstract: Linear-Quadratic-Gaussian (LQG) control is a fundamental control paradigm that is studied in various fields such as engineering, computer science, economics, and neuroscience. It involves controlling a system with linear dynamics and imperfect observations, subject to additive noise, with the goal of minimizing a quadratic cost function for the state and control variables. In this work, we consider a generalization of the discrete-time, finite-horizon LQG problem, where the noise distributions are unknown and belong to Wasserstein ambiguity sets centered at nominal (Gaussian) distributions. The objective is to minimize a worst-case cost across all distributions in the ambiguity set, including non-Gaussian distributions. Despite the added complexity, we prove that a control policy that is linear in the observations is optimal for this problem, as in the classic LQG problem. We propose a numerical solution method that efficiently characterizes this optimal control policy. Our method uses the Frank-Wolfe algorithm to identify the least-favorable distributions within the Wasserstein ambiguity sets and computes the controller's optimal policy using Kalman filter estimation under these distributions.

Submitted 1 November, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.09481 [pdf, other]

Context-enriched molecule representations improve few-shot drug discovery

Authors: Johannes Schimunek, Philipp Seidl, Lukas Friedrich, Daniel Kuhn, Friedrich Rippmann, Sepp Hochreiter, Günter Klambauer

Abstract: A central task in computational drug discovery is to construct models from known active molecules to find further promising molecules for subsequent screening. However, typically only very few active molecules are known. Therefore, few-shot learning methods have the potential to improve the effectiveness of this critical phase of the drug discovery process. We introduce a new method for few-shot drug discovery. Its main idea is to enrich a molecule representation with knowledge about known context or reference molecules. Our novel concept for molecule representation enrichment is to associate molecules from both the support set and the query set with a large set of reference (context) molecules through a Modern Hopfield Network. Intuitively, this enrichment step is analogous to a human expert who would associate a given molecule with familiar molecules whose properties are known. The enrichment step reinforces and amplifies the covariance structure of the data, while simultaneously removing spurious correlations arising from the decoration of molecules. Our approach is compared with other few-shot methods for drug discovery on the FS-Mol benchmark dataset. On FS-Mol, our approach outperforms all compared methods and therefore sets a new state of the art for few-shot learning in drug discovery. An ablation study shows that the enrichment step of our method is the key to improving the predictive quality. In a domain shift experiment, we further demonstrate the robustness of our method. Code is available at https://github.com/ml-jku/MHNfs.

Submitted 24 April, 2023; originally announced May 2023.

arXiv:2304.00290 [pdf, other]

PIQP: A Proximal Interior-Point Quadratic Programming Solver

Authors: Roland Schwan, Yuning Jiang, Daniel Kuhn, Colin N. Jones

Abstract: This paper presents PIQP, a high-performance toolkit for solving generic sparse quadratic programs (QP). Combining an infeasible Interior Point Method (IPM) with the Proximal Method of Multipliers (PMM), the algorithm can handle ill-conditioned convex QP problems without the need for linear independence of the constraints. The open-source implementation is written in C++ with interfaces to C, Python, Matlab, and R leveraging the Eigen3 library. The method uses a pivoting-free factorization routine and allocation-free updates of the problem data, making the solver suitable for embedded applications. The solver is evaluated on the Maros-Mészáros problem set and optimal control problems, demonstrating state-of-the-art performance for both small and large-scale problems, outperforming commercial and open-source solvers.

Submitted 15 September, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

arXiv:2303.03900 [pdf, other]

New Perspectives on Regularization and Computation in Optimal Transport-Based Distributionally Robust Optimization

Authors: Soroosh Shafieezadeh-Abadeh, Liviu Aolaritei, Florian Dörfler, Daniel Kuhn

Abstract: We study optimal transport-based distributionally robust optimization problems where a fictitious adversary, often envisioned as nature, can choose the distribution of the uncertain problem parameters by reshaping a prescribed reference distribution at a finite transportation cost. In this framework, we show that robustification is intimately related to various forms of variation and Lipschitz regularization even if the transportation cost function fails to be (some power of) a metric. We also derive conditions for the existence and the computability of a Nash equilibrium between the decision-maker and nature, and we demonstrate numerically that nature's Nash strategy can be viewed as a distribution that is supported on remarkably deceptive adversarial samples. Finally, we identify practically relevant classes of optimal transport-based distributionally robust optimization problems that can be addressed with efficient gradient descent algorithms even if the loss function or the transportation cost function are nonconvex (but not both at the same time).

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2211.15122 [pdf, other]

Distributionally Robust Optimal Allocation with Costly Verification

Authors: Halil İbrahim Bayrak, Çağıl Koçyiğit, Daniel Kuhn, Mustafa Çelebi Pınar

Abstract: We consider the mechanism design problem of a principal allocating a single good to one of several agents without monetary transfers. Each agent desires the good and uses it to create value for the principal. We designate this value as the agent's private type. Even though the principal does not know the agents' types, she can verify them at a cost. The allocation of the good thus depends on the agents' self-declared types and the results of any verification performed, and the principal's payoff matches her value of the allocation minus the costs of verification. It is known that if the agents' types are independent, then a favored-agent mechanism maximizes her expected payoff. However, this result relies on the unrealistic assumption that the agents' types follow known independent probability distributions. In contrast, we assume here that the agents' types are governed by an ambiguous joint probability distribution belonging to a commonly known ambiguity set and that the principal maximizes her worst-case expected payoff. We study support-only ambiguity sets, which contain all distributions supported on a rectangle, Markov ambiguity sets, which contain all distributions in a support-only ambiguity set satisfying some first-order moment bounds, and Markov ambiguity sets with independent types, which contain all distributions in a Markov ambiguity set under which the agents' types are mutually independent. In all cases we construct explicit favored-agent mechanisms that are not only optimal but also Pareto-robustly optimal.

Submitted 25 June, 2024; v1 submitted 28 November, 2022; originally announced November 2022.

arXiv:2211.01325 [pdf, ps, other]

Perfect matchings in random sparsifications of Dirac hypergraphs

Authors: Dong Yeap Kang, Tom Kelly, Daniela Kühn, Deryk Osthus, Vincent Pfenninger

Abstract: For all integers $n \geq k > d \geq 1$, let $m_{d}(k,n)$ be the minimum integer $D \geq 0$ such that every $k$-uniform $n$-vertex hypergraph $\mathcal H$ with minimum $d$-degree $δ_{d}(\mathcal H)$ at least $D$ has an optimal matching. For every fixed integer $k \geq 3$, we show that for $n \in k \mathbb{N}$ and $p = Ω(n^{-k+1} \log n)$, if $\mathcal H$ is an $n$-vertex $k$-uniform hypergraph with $δ_{k-1}(\mathcal H) \geq m_{k-1}(k,n)$, then a.a.s.\ its $p$-random subhypergraph $\mathcal H_p$ contains a perfect matching. Moreover, for every fixed integer $d < k$ and $γ > 0$, we show that the same conclusion holds if $\mathcal H$ is an $n$-vertex $k$-uniform hypergraph with $δ_d(\mathcal H) \geq m_{d}(k,n) + γ\binom{n - d}{k - d}$. Both of these results strengthen Johansson, Kahn, and Vu's seminal solution to Shamir's problem and can be viewed as ``robust'' versions of hypergraph Dirac-type results. In addition, we also show that in both cases above, $\mathcal H$ has at least $\exp((1-1/k)n \log n - Θ(n))$ many perfect matchings, which is best possible up to an $\exp(Θ(n))$ factor.

Submitted 16 April, 2024; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Final version, to appear in Combinatorica (26 pages + 2 page appendix); Theorem 1.5 was proved in independent work of Pham, Sah, Sawhney, and Simkin (arxiv:2210.03064)

arXiv:2206.14472 [pdf, other]

Thresholds for Latin squares and Steiner triple systems: Bounds within a logarithmic factor

Authors: Dong Yeap Kang, Tom Kelly, Daniela Kühn, Abhishek Methuku, Deryk Osthus

Abstract: We prove that for $n \in \mathbb N$ and an absolute constant $C$, if $p \geq C\log^2 n / n$ and $L_{i,j} \subseteq [n]$ is a random subset of $[n]$ where each $k\in [n]$ is included in $L_{i,j}$ independently with probability $p$ for each $i, j\in [n]$, then asymptotically almost surely there is an order-$n$ Latin square in which the entry in the $i$th row and $j$th column lies in $L_{i,j}$. The problem of determining the threshold probability for the existence of an order-$n$ Latin square was raised independently by Johansson, by Luria and Simkin, and by Casselgren and Häggkvist; our result provides an upper bound which is tight up to a factor of $\log n$ and strengthens the bound recently obtained by Sah, Sawhney, and Simkin. We also prove analogous results for Steiner triple systems and $1$-factorizations of complete graphs, and moreover, we show that each of these thresholds is at most the threshold for the existence of a $1$-factorization of a nearly complete regular bipartite graph.

Submitted 26 March, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

Comments: 32 pages, 1 figure. Final version, to appear in Transactions of the AMS
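
The random-list model in the abstract above can be made concrete with a small sketch. This is an illustration only, not the authors' construction: the function names (`random_lists`, `is_latin_square`, `respects_lists`) are hypothetical, symbols are taken from {0, ..., n-1} rather than $[n]$, and the code merely samples the lists $L_{i,j}$ and checks whether a given square is Latin and compatible with them. It does not search for such a square or say anything about the threshold.

```python
import random


def random_lists(n, p, rng):
    """Sample L[i][j]: each symbol k in {0..n-1} is included independently with probability p."""
    return [[{k for k in range(n) if rng.random() < p} for _ in range(n)]
            for _ in range(n)]


def is_latin_square(square):
    """Check that every row and every column contains each symbol 0..n-1 exactly once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = rows_ok and all(
        {square[i][j] for i in range(n)} == symbols for j in range(n)
    )
    return rows_ok and cols_ok


def respects_lists(square, lists):
    """Check that the entry in row i, column j lies in L[i][j] for all i, j."""
    n = len(square)
    return all(square[i][j] in lists[i][j] for i in range(n) for j in range(n))


# The cyclic square (i + j) mod n is always Latin; with p = 1 every list is full.
cyclic = [[(i + j) % 4 for j in range(4)] for i in range(4)]
full_lists = random_lists(4, 1.0, random.Random(0))
empty_lists = random_lists(4, 0.0, random.Random(0))
```

With `p = 1.0` every list equals the full symbol set, so any Latin square is compatible; with `p = 0.0` every list is empty and none is.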

arXiv:2206.13374 [pdf, other]

Stability Verification of Neural Network Controllers using Mixed-Integer Programming

Authors: Roland Schwan, Colin N. Jones, Daniel Kuhn

Abstract: We propose a framework for the stability verification of Mixed-Integer Linear Programming (MILP) representable control policies. This framework compares a fixed candidate policy, which admits an efficient parameterization and can be evaluated at a low computational cost, against a fixed baseline policy, which is known to be stable but expensive to evaluate. We provide sufficient conditions for the closed-loop stability of the candidate policy in terms of the worst-case approximation error with respect to the baseline policy, and we show that these conditions can be checked by solving a Mixed-Integer Quadratic Program (MIQP). Additionally, we demonstrate that an outer and inner approximation of the stability region of the candidate policy can be computed by solving an MILP. The proposed framework is sufficiently general to accommodate a broad range of candidate policies including ReLU Neural Networks (NNs), optimal solution maps of parametric quadratic programs, and Model Predictive Control (MPC) policies. We also present an open-source toolbox in Python based on the proposed framework, which allows for the easy verification of custom NN architectures and MPC formulations. We showcase the flexibility and reliability of our framework in the context of a DC-DC power converter case study and investigate its computational complexity.

Submitted 31 May, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

arXiv:2206.00231 [pdf, other]

On Approximations of Data-Driven Chance Constrained Programs over Wasserstein Balls

Authors: Zhi Chen, Daniel Kuhn, Wolfram Wiesemann

Abstract: Distributionally robust chance constrained programs minimize a deterministic cost function subject to the satisfaction of one or more safety conditions with high probability, given that the probability distribution of the uncertain problem parameters affecting the safety condition(s) is only known to belong to some ambiguity set. We study three popular approximation schemes for distributionally robust chance constrained programs over Wasserstein balls, where the ambiguity set contains all probability distributions within a certain Wasserstein distance to a reference distribution. The first approximation replaces the chance constraint with a bound on the conditional value-at-risk, the second approximation decouples different safety conditions via Bonferroni's inequality, and the third approximation restricts the expected violation of the safety condition(s) so that the chance constraint is satisfied. We show that the conditional value-at-risk approximation can be characterized as a tight convex approximation, which complements earlier findings on classical (non-robust) chance constraints, and we offer a novel interpretation in terms of transportation savings. We also show that the three approximations can perform arbitrarily poorly in data-driven settings, and that they are generally incomparable with each other.

Submitted 20 November, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:1809.00210

arXiv:2205.15049 [pdf, other]

Metrizing Fairness

Authors: Yves Rychener, Bahar Taskesen, Daniel Kuhn

Abstract: We study supervised learning problems that have significant effects on individuals from two demographic groups, and we seek predictors that are fair with respect to a group fairness criterion such as statistical parity (SP). A predictor is SP-fair if the distributions of predictions within the two groups are close in Kolmogorov distance, and fairness is achieved by penalizing the dissimilarity of these two distributions in the objective function of the learning problem. In this paper, we identify conditions under which hard SP constraints are guaranteed to improve predictive accuracy. We also showcase conceptual and computational benefits of measuring unfairness with integral probability metrics (IPMs) other than the Kolmogorov distance. Conceptually, we show that the generator of any IPM can be interpreted as a family of utility functions and that unfairness with respect to this IPM arises if individuals in the two demographic groups have diverging expected utilities. We also prove that the unfairness-regularized prediction loss admits unbiased gradient estimators, which are constructed from random mini-batches of training samples, if unfairness is measured by the squared $\mathcal L^2$-distance or by a squared maximum mean discrepancy. In this case, the fair learning problem is susceptible to efficient stochastic gradient descent (SGD) algorithms. Numerical experiments on synthetic and real data show that these SGD algorithms outperform state-of-the-art methods for fair learning in that they achieve superior accuracy-unfairness trade-offs -- sometimes orders of magnitude faster.

Submitted 11 June, 2024; v1 submitted 30 May, 2022; originally announced May 2022.
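
As a rough illustration of the statistical parity criterion mentioned in the abstract above, the Kolmogorov distance between the empirical prediction distributions of two groups can be computed directly. The sketch below is an assumption-laden example, not code from the paper: the function name `kolmogorov_distance` and the sample data are hypothetical, and the distance is evaluated only at the observed prediction values, which suffices for empirical (step-function) CDFs.

```python
def kolmogorov_distance(preds_a, preds_b):
    """Largest absolute gap between the empirical CDFs of two samples of predictions."""
    # Both empirical CDFs are right-continuous step functions that only jump
    # at sample points, so the supremum of the gap is attained at one of them.
    thresholds = sorted(set(preds_a) | set(preds_b))

    def cdf(sample, t):
        return sum(x <= t for x in sample) / len(sample)

    return max(abs(cdf(preds_a, t) - cdf(preds_b, t)) for t in thresholds)


# Hypothetical predictions for two demographic groups.
group_a = [0.1, 0.2]
group_b = [0.8, 0.9]
```

A value of 0 means the two groups receive identically distributed predictions; a value of 1 means the prediction ranges are completely separated, as in the toy data here.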

arXiv:2203.01161 [pdf, ps, other]

Discrete Optimal Transport with Independent Marginals is #P-Hard

Authors: Bahar Taşkesen, Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, Karthik Natarajan

Abstract: We study the computational complexity of the optimal transport problem that evaluates the Wasserstein distance between the distributions of two K-dimensional discrete random vectors. The best known algorithms for this problem run in polynomial time in the maximum of the number of atoms of the two distributions. However, if the components of either random vector are independent, then this number can be exponential in K even though the size of the problem description scales linearly with K. We prove that the described optimal transport problem is #P-hard even if all components of the first random vector are independent uniform Bernoulli random variables, while the second random vector has merely two atoms, and even if only approximate solutions are sought. We also develop a dynamic programming-type algorithm that approximates the Wasserstein distance in pseudo-polynomial time when the components of the first random vector follow arbitrary independent discrete distributions, and we identify special problem instances that can be solved exactly in strongly polynomial time.

Submitted 14 October, 2022; v1 submitted 2 March, 2022; originally announced March 2022.
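
The gap between description size and number of atoms noted in the abstract above is easy to see in a toy example. The following sketch is illustrative only and is not the paper's algorithm: the helper `joint_atoms` is a hypothetical name, and it simply enumerates the joint distribution of independent discrete components given by their marginals.

```python
from itertools import product


def joint_atoms(marginals):
    """Enumerate the joint distribution of independent components.

    marginals: list of dicts mapping value -> probability, one dict per component.
    Returns a dict mapping each joint outcome (a tuple) to its probability.
    """
    atoms = {}
    for combo in product(*(m.items() for m in marginals)):
        values = tuple(v for v, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p  # independence: joint probability is the product of marginals
        atoms[values] = prob
    return atoms


# K = 10 independent uniform Bernoulli components.
K = 10
marginals = [{0: 0.5, 1: 0.5} for _ in range(K)]
atoms = joint_atoms(marginals)
description_size = sum(len(m) for m in marginals)
```

Here the description consists of 2K = 20 numbers, while the joint distribution has 2^K = 1024 atoms, so any method that enumerates atoms scales exponentially in K.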

arXiv:2112.09959 [pdf, other]

Mean-Covariance Robust Risk Measurement

Authors: Viet Anh Nguyen, Soroosh Shafiee, Damir Filipović, Daniel Kuhn

Abstract: We introduce a universal framework for mean-covariance robust risk measurement and portfolio optimization. We model uncertainty in terms of the Gelbrich distance on the mean-covariance space, along with prior structural information about the population distribution. Our approach is related to the theory of optimal transport and exhibits better statistical and computational properties than existing models. We find that, for a large class of risk measures, mean-covariance robust portfolio optimization boils down to the Markowitz model, subject to a regularization term given in closed form. This includes the finance standards, value-at-risk and conditional value-at-risk, and can be solved highly efficiently.

Submitted 30 November, 2023; v1 submitted 18 December, 2021; originally announced December 2021.

arXiv:2110.06181 [pdf, ps, other]

Solution to a problem of Erdős on the chromatic index of hypergraphs with bounded codegree

Authors: Dong Yeap Kang, Tom Kelly, Daniela Kühn, Abhishek Methuku, Deryk Osthus

Abstract: In 1977, Erdős asked the following question: for any integers $t,n \in \mathbb{N}$, if $G_1 , \dots , G_n$ are complete graphs such that each $G_i$ has at most $n$ vertices and every pair of them shares at most $t$ vertices, what is the largest possible chromatic number of the union $\bigcup_{i=1}^{n} G_i$? The equivalent dual formulation of this question asks for the largest chromatic index of an $n$-vertex hypergraph with maximum degree at most $n$ and codegree at most $t$. For the case $t = 1$, Erdős, Faber, and Lovász famously conjectured that the answer is $n$, which was recently proved by the authors for all sufficiently large $n$. In this paper, we answer this question of Erdős for $t \geq 2$ in a strong sense, by proving that every $n$-vertex hypergraph with maximum degree at most $(1-o(1))tn$ and codegree at most $t$ has chromatic index at most $tn$ for any $t,n \in \mathbb{N}$. Moreover, equality holds if and only if the hypergraph is a $t$-fold projective plane of order $k$, where $n = k^2 + k + 1$. Thus, for every $t \in \mathbb N$, this bound is best possible for infinitely many integers $n$. This result also holds for the list chromatic index.

Submitted 12 October, 2021; originally announced October 2021.

Comments: 23 pages

arXiv:2110.01570 [pdf, ps, other]

Hypergraph regularity and random sampling

Authors: Felix Joos, Jaehoon Kim, Daniela Kühn, Deryk Osthus

Abstract: Suppose a $k$-uniform hypergraph $H$ satisfies a certain regularity instance (that is, there is a partition of $H$ given by the hypergraph regularity lemma into a bounded number of quasirandom subhypergraphs of prescribed densities). We prove that with high probability a large enough uniform random sample of the vertex set of $H$ also admits the same regularity instance. Here the crucial feature is that the error term measuring the quasirandomness of the subhypergraphs requires only an arbitrarily small additive correction. This has applications to combinatorial property testing. The graph case of the sampling result was proved by Alon, Fischer, Newman and Shapira.

Submitted 11 August, 2022; v1 submitted 4 October, 2021; originally announced October 2021.

Comments: 49 pages; we split our paper arXiv:1707.03303 into two, this one and the new version of arXiv:1707.03303. Final version, to appear in Random Structures and Algorithms

arXiv:2109.11438 [pdf, ps, other]

A special case of Vu's conjecture: Coloring nearly disjoint graphs of bounded maximum degree

Authors: Tom Kelly, Daniela Kühn, Deryk Osthus

Abstract: A collection of graphs is \textit{nearly disjoint} if every pair of them intersects in at most one vertex. We prove that if $G_1, \dots, G_m$ are nearly disjoint graphs of maximum degree at most $D$, then the following holds. For every fixed $C$, if each vertex $v \in \bigcup_{i=1}^m V(G_i)$ is contained in at most $C$ of the graphs $G_1, \dots, G_m$, then the (list) chromatic number of $\bigcup_{i=1}^m G_i$ is at most $D + o(D)$. This result confirms a special case of a conjecture of Vu and generalizes Kahn's bound on the list chromatic index of linear uniform hypergraphs of bounded maximum degree. In fact, this result holds for the correspondence (or DP) chromatic number and thus implies a recent result of Molloy, and we derive this result from a more general list coloring result in the setting of `color degrees' that also implies a result of Reed and Sudakov.

Submitted 28 October, 2023; v1 submitted 23 September, 2021; originally announced September 2021.

Comments: 16 pages with one-page appendix; final version, to appear in Combinatorics, Probability, and Computing

arXiv:2106.13733 [pdf, other]

Graph and hypergraph colouring via nibble methods: A survey

Authors: Dong Yeap Kang, Tom Kelly, Daniela Kühn, Abhishek Methuku, Deryk Osthus

Abstract: This paper provides a survey of methods, results, and open problems on graph and hypergraph colourings, with a particular emphasis on semi-random `nibble' methods. We also give a detailed sketch of some aspects of the recent proof of the Erdős-Faber-Lovász conjecture.

Submitted 16 November, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: Final version, to appear in the proceedings of the 8th European Congress of Mathematics; 33 pages, 3 figures

arXiv:2106.06741 [pdf, other]

Distributionally Robust Optimization with Markovian Data

Authors: Mengmeng Li, Tobias Sutter, Daniel Kuhn

Abstract: We study a stochastic program where the probability distribution of the uncertain problem parameters is unknown and only indirectly observed via finitely many correlated samples generated by an unknown Markov chain with $d$ states. We propose a data-driven distributionally robust optimization model to estimate the problem's objective function and optimal solution. By leveraging results from large deviations theory, we derive statistical guarantees on the quality of these estimators. The underlying worst-case expectation problem is nonconvex and involves $\mathcal O(d^2)$ decision variables. Thus, it cannot be solved efficiently for large $d$. By exploiting the structure of this problem, we devise a customized Frank-Wolfe algorithm with convex direction-finding subproblems of size $\mathcal O(d)$. We prove that this algorithm finds a stationary point efficiently under mild conditions. The efficiency of the method is predicated on a dimensionality reduction enabled by a dual reformulation. Numerical experiments indicate that our approach has better computational and statistical properties than the state-of-the-art methods.

Submitted 12 June, 2021; originally announced June 2021.

Comments: 20 pages

arXiv:2106.04443 [pdf, other]

Robust Generalization despite Distribution Shift via Minimum Discriminating Information

Authors: Tobias Sutter, Andreas Krause, Daniel Kuhn

Abstract: Training models that perform well under distribution shifts is a central challenge in machine learning. In this paper, we introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution. We employ the principle of minimum discriminating information to embed the available prior knowledge, and use distributionally robust optimization to account for uncertainty due to the limited samples. By leveraging large deviation results, we obtain explicit generalization bounds with respect to the unknown shifted distribution. Lastly, we demonstrate the versatility of our framework by applying it to two rather distinct problems: (1) training classifiers on systematically biased data and (2) off-policy evaluation in Markov Decision Processes.

Submitted 26 October, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: 23 pages, 4 figures

Journal ref: NeurIPS 2021

arXiv:2106.00322 [pdf, other]

Sequential Domain Adaptation by Synthesizing Distributionally Robust Experts

Authors: Bahar Taskesen, Man-Chung Yue, Jose Blanchet, Daniel Kuhn, Viet Anh Nguyen

Abstract: Least squares estimators, when trained on a few target domain samples, may predict poorly. Supervised domain adaptation aims to improve the predictive accuracy by exploiting additional labeled training samples from a source distribution that is close to the target distribution. Given available data, we investigate novel strategies to synthesize a family of least squares estimator experts that are robust with regard to moment conditions. When these moment conditions are specified using Kullback-Leibler or Wasserstein-type divergences, we can find the robust estimators efficiently using convex optimization. We use the Bernstein online aggregation algorithm on the proposed family of robust experts to generate predictions for the sequential stream of target test samples. Numerical experiments on real data show that the robust strategies may outperform non-robust interpolations of the empirical least squares estimators.

Submitted 1 June, 2021; originally announced June 2021.

arXiv:2105.00760 [pdf, ps, other]

A Unified Theory of Robust and Distributionally Robust Optimization via the Primal-Worst-Equals-Dual-Best Principle

Authors: Jianzhe Zhen, Daniel Kuhn, Wolfram Wiesemann

Abstract: Robust and distributionally robust optimization are modeling paradigms for decision-making under uncertainty where the uncertain parameters are only known to reside in an uncertainty set or are governed by any probability distribution from within an ambiguity set, respectively, and a decision is sought that minimizes a cost function under the most adverse outcome of the uncertainty. In this paper, we develop a rigorous and general theory of robust and distributionally robust nonlinear optimization using the language of convex analysis. Our framework is based on a generalized `primal-worst-equals-dual-best' principle that establishes strong duality between a semi-infinite primal worst and a non-convex dual best formulation, both of which admit finite convex reformulations. This principle offers an alternative formulation for robust optimization problems that obviates the need to mobilize the machinery of abstract semi-infinite duality theory to prove strong duality in distributionally robust optimization. We illustrate the modeling power of our approach through convex reformulations for distributionally robust optimization problems whose ambiguity sets are defined through general optimal transport distances, which generalize earlier results for Wasserstein ambiguity sets.

Submitted 19 July, 2023; v1 submitted 3 May, 2021; originally announced May 2021.

Comments: Previous title: Mathematical Foundations of Robust and Distributionally Robust Optimization

arXiv:2103.06263 [pdf, other]

Semi-Discrete Optimal Transport: Hardness, Regularization and Numerical Solution

Authors: Bahar Taskesen, Soroosh Shafieezadeh-Abadeh, Daniel Kuhn

Abstract: Semi-discrete optimal transport problems, which evaluate the Wasserstein distance between a discrete and a generic (possibly non-discrete) probability measure, are believed to be computationally hard. Even though such problems are ubiquitous in statistics, machine learning and computer vision, this perception has not yet received a theoretical justification. To fill this gap, we prove that computing the Wasserstein distance between a discrete probability measure supported on two points and the Lebesgue measure on the standard hypercube is already #P-hard. This insight prompts us to seek approximate solutions for semi-discrete optimal transport problems. We thus perturb the underlying transportation cost with an additive disturbance governed by an ambiguous probability distribution, and we introduce a distributionally robust dual optimal transport problem whose objective function is smoothed with the most adverse disturbance distributions from within a given ambiguity set. We further show that smoothing the dual objective function is equivalent to regularizing the primal objective function, and we identify several ambiguity sets that give rise to several known and new regularization schemes. As a byproduct, we discover an intimate relation between semi-discrete optimal transport problems and discrete choice models traditionally studied in psychology and economics. To solve the regularized optimal transport problems efficiently, we use a stochastic gradient descent algorithm with imprecise stochastic gradient oracles. A new convergence analysis reveals that this algorithm improves the best known convergence guarantee for semi-discrete optimal transport problems with entropic regularizers.

Submitted 29 April, 2022; v1 submitted 10 March, 2021; originally announced March 2021.

arXiv:2103.05478 [pdf, other]

Small errors in random zeroth-order optimization are imaginary

Authors: Wouter Jongeneel, Man-Chung Yue, Daniel Kuhn

Abstract: Most zeroth-order optimization algorithms mimic a first-order algorithm but replace the gradient of the objective function with some gradient estimator that can be computed from a small number of function evaluations. This estimator is constructed randomly, and its expectation matches the gradient of a smooth approximation of the objective function whose quality improves as the underlying smoothing parameter $δ$ is reduced. Gradient estimators requiring a smaller number of function evaluations are preferable from a computational point of view. While estimators based on a single function evaluation can be obtained by use of the divergence theorem from vector calculus, their variance explodes as $δ$ tends to $0$. Estimators based on multiple function evaluations, on the other hand, suffer from numerical cancellation when $δ$ tends to $0$. To combat both effects simultaneously, we extend the objective function to the complex domain and construct a gradient estimator that evaluates the objective at a complex point whose coordinates have small imaginary parts of the order $δ$. As this estimator requires only one function evaluation, it is immune to cancellation. In addition, its variance remains bounded as $δ$ tends to $0$. We prove that zeroth-order algorithms that use our estimator offer the same theoretical convergence guarantees as the state-of-the-art methods. Numerical experiments suggest, however, that they often converge faster in practice.

Submitted 19 March, 2024; v1 submitted 9 March, 2021; originally announced March 2021.

Comments: Final version (33 pages), to appear in the SIAM Journal on Optimization

MSC Class: 65D25; 65G50; 65K05; 65Y04; 65Y20; 90C56

arXiv:2103.03805 [pdf, other]

doi 10.1109/LCSYS.2021.3072814

Topological Linear System Identification via Moderate Deviations Theory

Authors: Wouter Jongeneel, Tobias Sutter, Daniel Kuhn

Abstract: Two dynamical systems are topologically equivalent when their phase-portraits can be morphed into each other by a homeomorphic coordinate transformation on the state space. The induced equivalence classes capture qualitative properties such as stability or the oscillatory nature of the state trajectories, for example. In this paper we develop a method to learn the topological class of an unknown stable system from a single trajectory of finitely many state observations. Using a moderate deviations principle for the least squares estimator of the unknown system matrix $θ$, we prove that the probability of misclassification decays exponentially with the number of observations at a rate that is proportional to the square of the smallest singular value of $θ$.

Submitted 9 April, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

Comments: updated Section 3.A

arXiv:2103.02806 [pdf, other]

A Planner-Trader Decomposition for Multi-Market Hydro Scheduling

Authors: Kilian Schindler, Napat Rujeerapaiboon, Daniel Kuhn, Wolfram Wiesemann

Abstract: Peak/off-peak spreads on European electricity forward and spot markets are eroding due to the ongoing nuclear phaseout in Germany and the steady growth in photovoltaic capacity. The reduced profitability of peak/off-peak arbitrage forces hydropower producers to recover part of their original profitability on the reserve markets. We propose a bi-layer stochastic programming framework for the optimal operation of a fleet of interconnected hydropower plants that sells energy on both the spot and the reserve markets. The outer layer (the planner's problem) optimizes end-of-day reservoir filling levels over one year, whereas the inner layer (the trader's problem) selects optimal hourly market bids within each day. Using an information restriction whereby the planner prescribes the end-of-day reservoir targets one day in advance, we prove that the trader's problem simplifies from an infinite-dimensional stochastic program with 25 stages to a finite two-stage stochastic program with only two scenarios. Substituting this reformulation back into the outer layer and approximating the reservoir targets by affine decision rules allows us to simplify the planner's problem from an infinite-dimensional stochastic program with 365 stages to a two-stage stochastic program that can conveniently be solved via the sample average approximation. Numerical experiments based on a cascade in the Salzburg region of Austria demonstrate the effectiveness of the suggested framework.

Submitted 2 September, 2022; v1 submitted 3 March, 2021; originally announced March 2021.

MSC Class: 90C15; 90C17; 90C90

arXiv:2102.03664 [pdf, other]

doi 10.1109/TAC.2022.3213770

Efficient Learning of a Linear Dynamical System with Stability Guarantees

Authors: Wouter Jongeneel, Tobias Sutter, Daniel Kuhn

Abstract: We propose a principled method for projecting an arbitrary square matrix to the non-convex set of asymptotically stable matrices. Leveraging ideas from large deviations theory, we show that this projection is optimal in an information-theoretic sense and that it simply amounts to shifting the initial matrix by an optimal linear quadratic feedback gain, which can be computed exactly and highly efficiently by solving a standard linear quadratic regulator problem. The proposed approach allows us to learn the system matrix of a stable linear dynamical system from a single trajectory of correlated state observations. The resulting estimator is guaranteed to be stable and offers explicit statistical bounds on the estimation error.

Submitted 13 June, 2022; v1 submitted 6 February, 2021; originally announced February 2021.

Comments: Exposition has been updated

Journal ref: IEEE Transactions on Automatic Control (Volume: 68, Issue: 5, May 2023)

arXiv:2102.03420 [pdf]

doi 10.1109/MC.2020.3029975

Understanding and Fixing Complex Faults in Embedded Cyberphysical Systems

Authors: Alexander Weiss, Smitha Gautham, Athira Varma Jayakumar, Carl Elks, D. Richard Kuhn, Raghu N. Kacker, Thomas B. Preusser

Abstract: Understanding fault types can lead to novel approaches to debugging and runtime verification. Dealing with complex faults, particularly in the challenging area of embedded systems, calls for more powerful tools, which are now becoming available to engineers.

Submitted 5 February, 2021; originally announced February 2021.

arXiv:2101.04698 [pdf, ps, other]

A proof of the Erdős-Faber-Lovász conjecture

Authors: Dong Yeap Kang, Tom Kelly, Daniela Kühn, Abhishek Methuku, Deryk Osthus

Abstract: The Erdős-Faber-Lovász conjecture (posed in 1972) states that the chromatic index of any linear hypergraph on $n$ vertices is at most $n$. In this paper, we prove this conjecture for every large $n$. We also provide stability versions of this result, which confirm a prediction of Kahn.

Submitted 25 January, 2023; v1 submitted 12 January, 2021; originally announced January 2021.

Comments: 47 pages, 3 figures. Final version, to appear in the Annals of Mathematics

arXiv:2012.04800 [pdf, other]

A Statistical Test for Probabilistic Fairness

Authors: Bahar Taskesen, Jose Blanchet, Daniel Kuhn, Viet Anh Nguyen

Abstract: Algorithms are now routinely used to make consequential decisions that affect human lives. Examples include college admissions, medical interventions or law enforcement. While algorithms empower us to harness all information hidden in vast amounts of data, they may inadvertently amplify existing biases in the available datasets. This concern has sparked increasing interest in fair machine learning, which aims to quantify and mitigate algorithmic discrimination. Indeed, machine learning models should undergo intensive tests to detect algorithmic biases before being deployed at scale. In this paper, we use ideas from the theory of optimal transport to propose a statistical hypothesis test for detecting unfair classifiers. Leveraging the geometry of the feature space, the test statistic quantifies the distance of the empirical distribution supported on the test samples to the manifold of distributions that render a pre-trained classifier fair. We develop a rigorous hypothesis testing mechanism for assessing the probabilistic fairness of any pre-trained logistic classifier, and we show both theoretically and empirically that the proposed test is asymptotically correct. In addition, the proposed framework offers interpretability by identifying the most favorable perturbation of the data so that the given classifier becomes fair.

Submitted 8 December, 2020; originally announced December 2020.

arXiv:2010.14158 [pdf, ps, other]

doi 10.1112/plms.12480

Path decompositions of tournaments

Authors: António Girão, Bertille Granet, Daniela Kühn, Allan Lo, Deryk Osthus

Abstract: In 1976, Alspach, Mason, and Pullman conjectured that any tournament $T$ of even order can be decomposed into exactly ${\rm ex}(T)$ paths, where ${\rm ex}(T):= \frac{1}{2}\sum_{v\in V(T)}|d_T^+(v)-d_T^-(v)|$. We prove this conjecture for all sufficiently large tournaments. We also prove an asymptotically optimal result for tournaments of odd order.

Submitted 28 July, 2022; v1 submitted 27 October, 2020; originally announced October 2020.

Comments: 73 pages, 2 figures; final version, to appear in the Proceedings of the London Mathematical Society

Journal ref: Proc. London Math. Soc., 126 (2023): 429-517

arXiv:2010.06606 [pdf, other]

A Pareto Dominance Principle for Data-Driven Optimization

Authors: Tobias Sutter, Bart P. G. Van Parys, Daniel Kuhn

Abstract: We propose a statistically optimal approach to construct data-driven decisions for stochastic optimization problems. Fundamentally, a data-driven decision is simply a function that maps the available training data to a feasible action. It can always be expressed as the minimizer of a surrogate optimization model constructed from the data. The quality of a data-driven decision is measured by its out-of-sample risk. An additional quality measure is its out-of-sample disappointment, which we define as the probability that the out-of-sample risk exceeds the optimal value of the surrogate optimization model. An ideal data-driven decision should minimize the out-of-sample risk simultaneously with respect to every conceivable probability measure as the true measure is unknown. Unfortunately, such ideal data-driven decisions are generally unavailable. This prompts us to seek data-driven decisions that minimize the in-sample risk subject to an upper bound on the out-of-sample disappointment. We prove that such Pareto-dominant data-driven decisions exist under conditions that allow for interesting applications: the unknown data-generating probability measure must belong to a parametric ambiguity set, and the corresponding parameters must admit a sufficient statistic that satisfies a large deviation principle. We can further prove that the surrogate optimization model must be a distributionally robust optimization problem constructed from the sufficient statistic and the rate function of its large deviation principle. Hence the optimal method for mapping data to decisions is to solve a distributionally robust optimization model. Perhaps surprisingly, this result holds even when the training data is non-i.i.d. Our analysis reveals how the structural properties of the data-generating stochastic process impact the shape of the ambiguity set underlying the optimal distributionally robust model.

Submitted 14 December, 2023; v1 submitted 13 October, 2020; originally announced October 2020.

Comments: 55 pages

arXiv:2010.04183 [pdf, ps, other]

New bounds on the size of Nearly Perfect Matchings in almost regular hypergraphs

Authors: Dong Yeap Kang, Daniela Kühn, Abhishek Methuku, Deryk Osthus

Abstract: Let $H$ be a $k$-uniform $D$-regular simple hypergraph on $N$ vertices. Based on an analysis of the Rödl nibble, Alon, Kim and Spencer (1997) proved that if $k \ge 3$, then $H$ contains a matching covering all but at most $ND^{-1/(k-1)+o(1)}$ vertices, and asked whether this bound is tight. In this paper we improve their bound by showing that for all $k > 3$, $H$ contains a matching covering all but at most $ND^{-1/(k-1)-η}$ vertices for some $η= Θ(k^{-3}) > 0$, when $N$ and $D$ are sufficiently large. Our approach consists of showing that the Rödl nibble process not only constructs a large matching but it also produces many well-distributed `augmenting stars' which can then be used to significantly improve the matching constructed by the Rödl nibble process. Based on this, we also improve the results of Kostochka and Rödl (1998) and Vu (2000) on the size of matchings in almost regular hypergraphs with small codegree. As a consequence, we improve the best known bounds on the size of large matchings in combinatorial designs with general parameters. Finally, we improve the bounds of Molloy and Reed (2000) on the chromatic index of hypergraphs with small codegree (which can be applied to improve the best known bounds on the chromatic index of Steiner triple systems and more general designs).

Submitted 8 October, 2020; originally announced October 2020.

Comments: 35 pages, 1 figure

arXiv:2008.00926 [pdf, ps, other]

Extremal aspects of graph and hypergraph decomposition problems

Authors: Stefan Glock, Daniela Kühn, Deryk Osthus

Abstract: We survey recent advances in the theory of graph and hypergraph decompositions, with a focus on extremal results involving minimum degree conditions. We also collect a number of intriguing open problems, and formulate new ones.

Submitted 25 June, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

Comments: final version as appearing in Surveys in Combinatorics 2021

arXiv:2007.09530 [pdf, other]

A Distributionally Robust Approach to Fair Classification

Authors: Bahar Taskesen, Viet Anh Nguyen, Daniel Kuhn, Jose Blanchet

Abstract: We propose a distributionally robust logistic regression model with an unfairness penalty that prevents discrimination with respect to sensitive attributes such as gender or ethnicity. This model is equivalent to a tractable convex optimization problem if a Wasserstein ball centered at the empirical distribution on the training data is used to model distributional uncertainty and if a new convex unfairness measure is used to incentivize equalized opportunities. We demonstrate that the resulting classifier improves fairness at a marginal loss of predictive accuracy on both synthetic and real datasets. We also derive linear programming-based confidence bounds on the level of unfairness of any pre-trained classifier by leveraging techniques from optimal uncertainty quantification over Wasserstein balls.

Submitted 18 July, 2020; originally announced July 2020.

arXiv:2007.02891 [pdf, ps, other]

Hamiltonicity of random subgraphs of the hypercube

Authors: Padraig Condon, Alberto Espuny Díaz, António Girão, Daniela Kühn, Deryk Osthus

Abstract: We study Hamiltonicity in random subgraphs of the hypercube $\mathcal{Q}^n$. Our first main theorem is an optimal hitting time result. Consider the random process which includes the edges of $\mathcal{Q}^n$ according to a uniformly chosen random ordering. Then, with high probability, as soon as the graph produced by this process has minimum degree $2k$, it contains $k$ edge-disjoint Hamilton cycles, for any fixed $k\in\mathbb{N}$. Secondly, we obtain a perturbation result: if $H\subseteq\mathcal{Q}^n$ satisfies $δ(H)\geqαn$ with $α>0$ fixed and we consider a random binomial subgraph $\mathcal{Q}^n_p$ of $\mathcal{Q}^n$ with $p\in(0,1]$ fixed, then with high probability $H\cup\mathcal{Q}^n_p$ contains $k$ edge-disjoint Hamilton cycles, for any fixed $k\in\mathbb{N}$. In particular, both results resolve a long-standing conjecture, posed e.g. by Bollobás, that the threshold probability for Hamiltonicity in the random binomial subgraph of the hypercube equals $1/2$. Our techniques also show that, with high probability, for all fixed $p\in(0,1]$ the graph $\mathcal{Q}^n_p$ contains an almost spanning cycle. Our methods involve branching processes, the Rödl nibble, and absorption.

Submitted 13 August, 2022; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: Final version, to appear in Memoirs of the AMS

arXiv:2007.00395 [pdf, ps, other]

Almost all optimally coloured complete graphs contain a rainbow Hamilton path

Authors: Stephen Gould, Tom Kelly, Daniela Kühn, Deryk Osthus

Abstract: A subgraph $H$ of an edge-coloured graph is called rainbow if all of the edges of $H$ have different colours. In 1989, Andersen conjectured that every proper edge-colouring of $K_{n}$ admits a rainbow path of length $n-2$. We show that almost all optimal edge-colourings of $K_{n}$ admit both (i) a rainbow Hamilton path and (ii) a rainbow cycle using all of the colours. This result demonstrates that Andersen's Conjecture holds for almost all optimal edge-colourings of $K_{n}$ and answers a recent question of Ferber, Jain, and Sudakov. Our result also has applications to the existence of transversals in random symmetric Latin squares.

Submitted 21 April, 2022; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: 30 pages, 5 figures. Final version, to appear in Journal of Combinatorial Theory, Series B
