Showing 1–50 of 82 results for author: Villar, S

  1. arXiv:2407.01055  [pdf, other]

    stat.ME

    Exact statistical analysis for response-adaptive clinical trials: a general and computationally tractable approach

    Authors: Stef Baas, Peter Jacko, Sofía S. Villar

    Abstract: Response-adaptive (RA) designs of clinical trials allow targeting a given objective by skewing the allocation of participants to treatments based on observed outcomes. RA designs face greater regulatory scrutiny due to potential type I error inflation, which limits their uptake in practice. Existing approaches to type I error control either only work for specific designs, have a risk of Monte Carl…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 35 pages, 6 figures, 11 tables

  2. arXiv:2406.01552  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Learning equivariant tensor functions with applications to sparse vector recovery

    Authors: Wilson G. Gregory, Josué Tonelli-Cueto, Nicholas F. Marshall, Andrew S. Lee, Soledad Villar

    Abstract: This work characterizes equivariant polynomial functions from tuples of tensor inputs to tensor outputs. Loosely motivated by physics, we focus on equivariant functions with respect to the diagonal action of the orthogonal group on tensors. We show how to extend this characterization to other linear algebraic groups, including the Lorentz and symplectic groups. Our goal behind these characteriza…

    Submitted 3 June, 2024; originally announced June 2024.

  3. arXiv:2405.18095  [pdf, other]

    stat.ML astro-ph.IM cs.LG physics.data-an

    Is machine learning good or bad for the natural sciences?

    Authors: David W. Hogg, Soledad Villar

    Abstract: Machine learning (ML) methods are having a huge impact across all of the sciences. However, ML has a strong ontology - in which only the data exist - and a strong epistemology - in which a model is considered good if it performs well on held-out training data. These philosophies are in strong conflict with both standard practices and key philosophies in the natural sciences. Here we identify some…

    Submitted 31 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: A Position Paper accepted for publication in the 2024 International Conference on Machine Learning (ICML)

  4. arXiv:2405.08097  [pdf, other]

    cs.LG math.AC

    Learning functions on symmetric matrices and point clouds via lightweight invariant features

    Authors: Ben Blum-Smith, Ningyuan Huang, Marco Cuturi, Soledad Villar

    Abstract: In this work, we present a mathematical formulation for machine learning of (1) functions on symmetric matrices that are invariant with respect to the action of permutations by conjugation, and (2) functions on point clouds that are invariant with respect to rotations, reflections, and permutations of the points. To achieve this, we construct $O(n^2)$ invariant features derived from generators for…

    Submitted 15 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 28 pages, 2 figures, 2 tables

    MSC Class: 68P01; 13A50

  5. arXiv:2310.18326  [pdf, other]

    cs.AI cs.CY cs.HC cs.LG

    Using Adaptive Bandit Experiments to Increase and Investigate Engagement in Mental Health

    Authors: Harsh Kumar, Tong Li, Jiakai Shi, Ilya Musabirov, Rachel Kornfield, Jonah Meyerhoff, Ananya Bhattacharjee, Chris Karr, Theresa Nguyen, David Mohr, Anna Rafferty, Sofia Villar, Nina Deliu, Joseph Jay Williams

    Abstract: Digital mental health (DMH) interventions, such as text-message-based lessons and activities, offer immense potential for accessible mental health support. While these interventions can be effective, real-world experimental testing can further enhance their design and impact. Adaptive experimentation, utilizing algorithms like Thompson Sampling for (contextual) multi-armed bandit (MAB) problems, c…

    Submitted 13 October, 2023; originally announced October 2023.

    Report number: Volume 38, Issue 21

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence (IAAI) 2024

  6. arXiv:2310.12528  [pdf, other]

    astro-ph.IM cs.LG

    Constructing Impactful Machine Learning Research for Astronomy: Best Practices for Researchers and Reviewers

    Authors: D. Huppenkothen, M. Ntampaka, M. Ho, M. Fouesneau, B. Nord, J. E. G. Peek, M. Walmsley, J. F. Wu, C. Avestruz, T. Buck, M. Brescia, D. P. Finkbeiner, A. D. Goulding, T. Kacprzak, P. Melchior, M. Pasquato, N. Ramachandra, Y. -S. Ting, G. van de Ven, S. Villar, V. A. Villar, E. Zinger

    Abstract: Machine learning has rapidly become a tool of choice for the astronomical community. It is being applied across a wide range of wavelengths and problems, from the classification of transients to neural network emulators of cosmological simulations, and is shifting paradigms about how we generate and report scientific results. At the same time, this class of method comes with its own set of best pr…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 14 pages, 3 figures; submitted to the Bulletin of the American Astronomical Society

  7. arXiv:2308.10436  [pdf, other]

    stat.ML cs.LG

    Approximately Equivariant Graph Networks

    Authors: Ningyuan Huang, Ron Levie, Soledad Villar

    Abstract: Graph neural networks (GNNs) are commonly described as being permutation equivariant with respect to node relabeling in the graph. This symmetry of GNNs is often compared to the translation equivariance of Euclidean convolution neural networks (CNNs). However, these two symmetries are fundamentally different: The translation equivariance of CNNs corresponds to symmetries of the fixed domain acting…

    Submitted 17 November, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted at NeurIPS 2023

  8. arXiv:2306.13924  [pdf, other]

    cs.LG cs.CV

    Structuring Representation Geometry with Rotationally Equivariant Contrastive Learning

    Authors: Sharut Gupta, Joshua Robinson, Derek Lim, Soledad Villar, Stefanie Jegelka

    Abstract: Self-supervised learning converts raw perceptual data such as images to a compact space where simple Euclidean distances measure meaningful variations in data. In this paper, we extend this formulation by adding additional geometric structure to the embedding space by enforcing transformations of input space to correspond to simple (i.e., linear) transformations of embedding space. Specifically, i…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: 22 pages

  9. arXiv:2306.03698  [pdf, other]

    cs.LG cs.DM cs.NE

    Fine-grained Expressivity of Graph Neural Networks

    Authors: Jan Böker, Ron Levie, Ningyuan Huang, Soledad Villar, Christopher Morris

    Abstract: Numerous recent works have analyzed the expressive power of message-passing graph neural networks (MPNNs), primarily utilizing combinatorial techniques such as the $1$-dimensional Weisfeiler-Leman test ($1$-WL) for the graph isomorphism problem. However, the graph isomorphism objective is inherently binary, not giving insights into the degree of similarity between two given graphs. This work resol…

    Submitted 2 November, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: Accepted at NeurIPS 2023

  10. arXiv:2305.12585  [pdf, other]

    cs.LG

    GeometricImageNet: Extending convolutional neural networks to vector and tensor images

    Authors: Wilson Gregory, David W. Hogg, Ben Blum-Smith, Maria Teresa Arias, Kaze W. K. Wong, Soledad Villar

    Abstract: Convolutional neural networks and their ilk have been very successful for many learning tasks involving images. These methods assume that the input is a scalar image representing the intensity in each pixel, possibly in multiple channels for color images. In natural-science domains however, image-like data sets might have vectors (velocity, say), tensors (polarization, say), pseudovectors (magneti…

    Submitted 21 May, 2023; originally announced May 2023.

  11. arXiv:2305.07587  [pdf, other]

    stat.AP

    Global method for gender profile estimation from distribution of first names

    Authors: Manolis Antonoyiannakis, Hugues Chaté, Serena Dalena, Jessica Thomas, Alessandro S. Villar

    Abstract: As social issues related to gender bias attract closer scrutiny, accurate tools to determine the gender profile of large groups become essential. When explicit data is unavailable, gender is often inferred from names. Current methods follow a strategy whereby individuals of the group, one by one, are assigned a gender label or probability based on gender-name correlations observed in the populatio…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: https://ggem.app

  12. arXiv:2301.13724  [pdf, other]

    stat.ML astro-ph.IM cs.LG math-ph physics.data-an

    Towards fully covariant machine learning

    Authors: Soledad Villar, David W. Hogg, Weichi Yao, George A. Kevrekidis, Bernhard Schölkopf

    Abstract: Any representation of data involves arbitrary investigator choices. Because those choices are external to the data-generating process, each choice leads to an exact symmetry, corresponding to the group of transformations that takes one possible representation to another. These are the passive symmetries; they include coordinate freedom, gauge symmetry, and units covariance, all of which have led t…

    Submitted 28 June, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: substantial revision from v1; submitted to TMLR

  13. arXiv:2301.01107  [pdf]

    stat.CO cs.LG

    Computing the Performance of A New Adaptive Sampling Algorithm Based on The Gittins Index in Experiments with Exponential Rewards

    Authors: James K. He, Sofía S. Villar, Lida Mavrogonatou

    Abstract: Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP tha…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: Accepted by Computing Conference, London 2023

  14. arXiv:2211.15744  [pdf, other]

    cs.LG cs.DS cs.IT math.OC math.ST stat.ML

    Sketch-and-solve approaches to k-means clustering by semidefinite programming

    Authors: Charles Clum, Dustin G. Mixon, Soledad Villar, Kaiying Xie

    Abstract: We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated we identify the k-means optimal clustering. Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value. This lower bound is data-driven; it does not make any assumption on the data nor how it is generated. We prov…

    Submitted 28 November, 2022; originally announced November 2022.

  15. arXiv:2211.03231  [pdf, other]

    cs.SI cs.LG eess.SP

    A Spectral Analysis of Graph Neural Networks on Dense and Sparse Graphs

    Authors: Luana Ruiz, Ningyuan Huang, Soledad Villar

    Abstract: In this work we propose a random graph model that can produce graphs at different levels of sparsity. We analyze how sparsity affects the graph spectra, and thus the performance of graph neural networks (GNNs) in node classification on dense and sparse graphs. We compare GNNs with spectral methods known to provide consistent estimators for community detection on dense graphs, a closely related tas…

    Submitted 13 September, 2023; v1 submitted 6 November, 2022; originally announced November 2022.

    Comments: Extended version of ICASSP 2024 submission

  16. arXiv:2210.15083  [pdf, other]

    stat.ML cs.LG

    Deep Learning is Provably Robust to Symmetric Label Noise

    Authors: Carey E. Priebe, Ningyuan Huang, Soledad Villar, Cong Mu, Li Chen

    Abstract: Deep neural networks (DNNs) are capable of perfectly fitting the training data, including memorizing noisy data. It is commonly believed that memorization hurts generalization. Therefore, many recent works propose mitigation strategies to avoid noisy data or correct memorization. In this work, we step back and ask the question: Can deep learning be robust against massive label noise without any mi…

    Submitted 26 October, 2022; originally announced October 2022.

  17. arXiv:2209.15608  [pdf, other]

    stat.CO cs.LG

    Shuffled linear regression through graduated convex relaxation

    Authors: Efe Onaran, Soledad Villar

    Abstract: The shuffled linear regression problem aims to recover linear relationships in datasets where the correspondence between input and output is unknown. This problem arises in a wide range of applications including survey data, in which one needs to decide whether the anonymity of the responses can be preserved while uncovering significant statistical connections. In this work, we propose a novel opt…

    Submitted 30 September, 2022; originally announced September 2022.

  18. arXiv:2209.14991  [pdf, ps, other]

    stat.ML cs.LG

    Machine learning and invariant theory

    Authors: Ben Blum-Smith, Soledad Villar

    Abstract: Inspired by constraints from physical law, equivariant machine learning restricts the learning to a hypothesis class where all the functions are equivariant with respect to some group action. Irreducible representations or invariant theory are typically used to parameterize the space of such functions. In this article, we introduce the topic and explain a couple of methods to explicitly parameteri…

    Submitted 25 March, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

  19. arXiv:2209.12054  [pdf, other]

    stat.ML cs.LG

    From Local to Global: Spectral-Inspired Graph Neural Networks

    Authors: Ningyuan Huang, Soledad Villar, Carey E. Priebe, Da Zheng, Chengyue Huang, Lin Yang, Vladimir Braverman

    Abstract: Graph Neural Networks (GNNs) are powerful deep learning methods for Non-Euclidean data. Popular GNNs are message-passing algorithms (MPNNs) that aggregate and combine signals in a local graph neighborhood. However, shallow MPNNs tend to miss long-range signals and perform poorly on some heterophilous graphs, while deep MPNNs can suffer from issues like over-smoothing or over-squashing. To mitigate…

    Submitted 4 November, 2022; v1 submitted 24 September, 2022; originally announced September 2022.

    Comments: Accepted for publication at the NeurIPS 2022 GLFrontiers Workshop

  20. arXiv:2207.14106  [pdf, other]

    stat.ML cs.LG q-bio.GN

    MarkerMap: nonlinear marker selection for single-cell studies

    Authors: Nabeel Sarwar, Wilson Gregory, George A Kevrekidis, Soledad Villar, Bianca Dumitrascu

    Abstract: Single-cell RNA-seq data allow the quantification of cell type differences across a growing set of biological contexts. However, pinpointing a small subset of genomic features explaining this variability can be ill-defined and computationally intractable. Here we introduce MarkerMap, a generative model for selecting minimal gene sets which are maximally informative of cell type origin and enable w…

    Submitted 28 July, 2022; originally announced July 2022.

  21. arXiv:2205.08875  [pdf, other]

    cs.LG cs.CY

    Multi-disciplinary fairness considerations in machine learning for clinical trials

    Authors: Isabel Chien, Nina Deliu, Richard E. Turner, Adrian Weller, Sofia S. Villar, Niki Kilbertus

    Abstract: While interest in the application of machine learning to improve healthcare has grown tremendously in recent years, a number of barriers prevent deployment in medical practice. A notable concern is the potential to exacerbate entrenched biases and existing health disparities in society. The area of fairness in machine learning seeks to address these issues of equity; however, appropriate approache…

    Submitted 18 May, 2022; originally announced May 2022.

    Comments: Appeared at ACM FAccT 2022

  22. Some performance considerations when using multi-armed bandit algorithms in the presence of missing data

    Authors: Xijin Chen, Kim May Lee, Sofia S. Villar, David S. Robertson

    Abstract: When comparing the performance of multi-armed bandit algorithms, the potential impact of missing data is often overlooked. In practice, it also affects their implementation where the simplest approach to overcome this is to continue to sample according to the original bandit algorithm, ignoring missing outcomes. We investigate the impact on performance of this approach to deal with missing data fo…

    Submitted 7 July, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

    Comments: 30 pages, 6 figures

  23. arXiv:2204.00887  [pdf, other]

    stat.ML cs.LG physics.data-an

    Dimensionless machine learning: Imposing exact units equivariance

    Authors: Soledad Villar, Weichi Yao, David W. Hogg, Ben Blum-Smith, Bianca Dumitrascu

    Abstract: Units equivariance (or units covariance) is the exact symmetry that follows from the requirement that relationships among measured quantities of physics relevance must obey self-consistent dimensional scalings. Here, we express this symmetry in terms of a (non-compact) group action, and we employ dimensional analysis and ideas from equivariant machine learning to provide a methodology for exactly…

    Submitted 31 December, 2022; v1 submitted 2 April, 2022; originally announced April 2022.

    Journal ref: Journal of Machine Learning Research 24 (2023) 1--32

  24. arXiv:2201.07372  [pdf, other]

    cs.LG cs.AI

    Prospective Learning: Principled Extrapolation to the Future

    Authors: Ashwin De Silva, Rahul Ramesh, Lyle Ungar, Marshall Hussain Shuler, Noah J. Cowan, Michael Platt, Chen Li, Leyla Isik, Seung-Eon Roh, Adam Charles, Archana Venkataraman, Brian Caffo, Javier J. How, Justus M Kebschull, John W. Krakauer, Maxim Bichuch, Kaleab Alemayehu Kinfu, Eva Yezerets, Dinesh Jayaraman, Jong M. Shin, Soledad Villar, Ian Phillips, Carey E. Priebe, Thomas Hartung, Michael I. Miller, et al. (18 additional authors not shown)

    Abstract: Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenari…

    Submitted 13 July, 2023; v1 submitted 18 January, 2022; originally announced January 2022.

    Comments: Accepted at the 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023

  25. A Short Tutorial on The Weisfeiler-Lehman Test And Its Variants

    Authors: Ningyuan Huang, Soledad Villar

    Abstract: Graph neural networks are designed to learn functions on graphs. Typically, the relevant target functions are invariant with respect to actions by permutations. Therefore the design of some graph neural network architectures has been inspired by graph-isomorphism algorithms. The classical Weisfeiler-Lehman algorithm (WL) -- a graph-isomorphism test based on color refinement -- became relevant to t…

    Submitted 1 November, 2022; v1 submitted 18 January, 2022; originally announced January 2022.

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021

  26. arXiv:2112.08507  [pdf, other]

    cs.LG stat.ML

    Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization

    Authors: Tong Li, Jacob Nogas, Haochen Song, Harsh Kumar, Audrey Durand, Anna Rafferty, Nina Deliu, Sofia S. Villar, Joseph J. Williams

    Abstract: Multi-armed bandit algorithms like Thompson Sampling (TS) can be used to conduct adaptive experiments, in which maximizing reward means that data is used to progressively assign participants to more effective arms. Such assignment strategies increase the risk of statistical hypothesis tests identifying a difference between arms when there is not one, and failing to conclude there is a difference i…

    Submitted 23 November, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

  27. arXiv:2112.02916  [pdf, ps, other]

    math.MG

    Three proofs of the Benedetto-Fickus theorem

    Authors: Dustin G. Mixon, Tom Needham, Clayton Shonkwiler, Soledad Villar

    Abstract: In 2003, Benedetto and Fickus introduced a vivid intuition for an objective function called the frame potential, whose global minimizers are fundamental objects known today as unit norm tight frames. Their main result was that the frame potential exhibits no spurious local minimizers, suggesting local optimization as an approach to construct these objects. Local optimization has since become the w…

    Submitted 6 December, 2021; originally announced December 2021.

  28. arXiv:2111.00137  [pdf, other]

    stat.ML cs.LG stat.AP stat.ME

    Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling

    Authors: Nina Deliu, Joseph J. Williams, Sofia S. Villar

    Abstract: Using bandit algorithms to conduct adaptive randomised experiments can minimise regret, but it poses major challenges for statistical inference (e.g., biased estimators, inflated type-I error and reduced power). Recent attempts to address these challenges typically impose restrictions on the exploitative nature of the bandit algorithm$-$trading off regret$-$and require large sample sizes to ensure…

    Submitted 29 October, 2021; originally announced November 2021.

    Comments: 32 pages including supplementary material

  29. arXiv:2110.03761  [pdf, other]

    cs.LG

    A simple equivariant machine learning method for dynamics based on scalars

    Authors: Weichi Yao, Kate Storey-Fisher, David W. Hogg, Soledad Villar

    Abstract: Physical systems obey strict symmetry principles. We expect that machine learning methods that intrinsically respect these symmetries should have higher prediction accuracy and better generalization in prediction of physical dynamics. In this work we implement a principled model based on invariant scalars, and release open-source code. We apply this Scalars method to a simple chaotic dynamical sys…

    Submitted 30 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

  30. arXiv:2106.06610  [pdf, other]

    cs.LG math-ph stat.ML

    Scalars are universal: Equivariant machine learning, structured like classical physics

    Authors: Soledad Villar, David W. Hogg, Kate Storey-Fisher, Weichi Yao, Ben Blum-Smith

    Abstract: There has been enormous progress in the last few years in designing neural networks that respect the fundamental symmetries and coordinate freedoms of physical law. Some of these frameworks make use of irreducible representations, some make use of high-order tensor objects, and some apply symmetry-enforcing constraints. Different physical laws obey different combinations of fundamental symmetries,…

    Submitted 7 February, 2023; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021

    Journal ref: Advances in Neural Information Processing Systems, 34, 28848-28863. 2021

  31. arXiv:2104.00546  [pdf, other]

    q-bio.PE stat.AP

    Quantifying efficiency gains of innovative designs of two-arm vaccine trials for COVID-19 using an epidemic simulation model

    Authors: Rob Johnson, Chris Jackson, Anne Presanis, Sofia S. Villar, Daniela De Angelis

    Abstract: Clinical trials of a vaccine during an epidemic face particular challenges, such as the pressure to identify an effective vaccine quickly to control the epidemic, and the effect that time-space-varying infection incidence has on the power of a trial. We illustrate how the operating characteristics of different trial design elements may be evaluated using a network epidemic and trial simulation mod…

    Submitted 20 May, 2021; v1 submitted 13 March, 2021; originally announced April 2021.

  32. arXiv:2103.12198  [pdf]

    cs.LG stat.AP

    Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm: An Empirical Exploration in Applications to Adaptively Randomized Experiments

    Authors: Joseph Jay Williams, Jacob Nogas, Nina Deliu, Hammad Shaikh, Sofia S. Villar, Audrey Durand, Anna Rafferty

    Abstract: Multi-armed bandit algorithms have been argued for decades as useful for adaptively randomized experiments. In such experiments, an algorithm varies which arms (e.g. alternative interventions to help students learn) are assigned to participants, with the goal of assigning higher-reward arms to as many participants as possible. We applied the bandit algorithm Thompson Sampling (TS) to run adaptive…

    Submitted 26 March, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

  33. arXiv:2101.07256  [pdf, other]

    physics.data-an astro-ph.IM cs.LG

    Fitting very flexible models: Linear regression with large numbers of parameters

    Authors: David W. Hogg, Soledad Villar

    Abstract: There are many uses for linear fitting; the context here is interpolation and denoising of data, as when you have calibration data and you want to fit a smooth, flexible function to those data. Or you want to fit a flexible function to de-trend a time series or normalize a spectrum. In these contexts, investigators often choose a polynomial basis, or a Fourier basis, or wavelets, or something equa…

    Submitted 15 January, 2021; originally announced January 2021.

    Comments: all code used to make the figures is available at https://github.com/davidwhogg/FlexibleLinearModels

  34. arXiv:2011.11477  [pdf, other]

    stat.ML cs.LG

    Dimensionality reduction, regularization, and generalization in overparameterized regressions

    Authors: Ningyuan Huang, David W. Hogg, Soledad Villar

    Abstract: Overparameterization in deep learning is powerful: Very large models fit the training data perfectly and yet often generalize well. This realization brought back the study of linear models for regression, including ordinary least squares (OLS), which, like deep learning, shows a "double-descent" behavior: (1) The risk (expected out-of-sample prediction error) can grow arbitrarily when the number o…

    Submitted 19 October, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

    Journal ref: SIAM Journal on Mathematics of Data Science Vol.4 Iss.1, 2022

  35. arXiv:2011.03270  [pdf, other]

    stat.ME stat.AP

    A Novel Statistical Test for Treatment Differences in Clinical Trials using a Response Adaptive Forward Looking Gittins Index Rule

    Authors: Helen Yvette Barnett, Sofia S Villar, Helena Geys, Thomas Jaki

    Abstract: The most common objective for response adaptive clinical trials is to seek to ensure that patients within a trial have a high chance of receiving the best treatment available by altering the chance of allocation on the basis of accumulating data. Approaches which yield good patient benefit properties suffer from low power from a frequentist perspective when testing for a treatment difference at th…

    Submitted 6 November, 2020; originally announced November 2020.

  36. arXiv:2006.12811  [pdf, other]

    stat.AP

    Adding flexibility to clinical trial designs: an example-based guide to the practical use of adaptive designs

    Authors: Thomas Burnett, Pavel Mozgunov, Philip Pallmann, Sofia S. Villar, Graham M. Wheeler, Thomas Jaki

    Abstract: Adaptive designs for clinical trials permit alterations to a study in response to accumulating data in order to make trials more flexible, ethical and efficient. These benefits are achieved while preserving the integrity and validity of the trial, through the pre-specification and proper adjustment for the possible alterations during the course of the trial. Despite much research in the statistica…

    Submitted 23 June, 2020; originally announced June 2020.

    Comments: 35 pages, 9 figures

  37. arXiv:2006.05026  [pdf, other]

    cs.LG stat.AP stat.ME stat.ML

    Learning for Dose Allocation in Adaptive Clinical Trials with Safety Constraints

    Authors: Cong Shen, Zhiyang Wang, Sofia S. Villar, Mihaela van der Schaar

    Abstract: Phase I dose-finding trials are increasingly challenging as the relationship between efficacy and toxicity of new compounds (or combination of them) becomes more complex. Despite this, most commonly used methods in practice focus on identifying a Maximum Tolerated Dose (MTD) by learning only from toxicity events. We present a novel adaptive clinical trial methodology, called Safe Efficacy Explorat…

    Submitted 15 June, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: Accepted to the 37th International Conference on Machine Learning (ICML 2020)

  38. arXiv:2005.00564  [pdf, other]

    stat.ME stat.AP

    Response-adaptive randomization in clinical trials: from myths to practical considerations

    Authors: David S. Robertson, Kim May Lee, Boryana C. Lopez-Kolkovska, Sofia S. Villar

    Abstract: Response-Adaptive Randomization (RAR) is part of a wider class of data-dependent sampling algorithms, for which clinical trials are typically used as a motivating application. In that context, patient allocation to treatments is determined by randomization probabilities that change based on the accrued response data in order to achieve experimental goals. RAR has received abundant theoretical atte…

    Submitted 7 June, 2022; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: Update in response to editor comments

    MSC Class: 62-02

  39. Seabed classification using physics-based modeling and machine learning

    Authors: Christina Frederick, Soledad Villar, Zoi-Heleni Michalopoulou

    Abstract: In this work model-based methods are employed along with machine learning techniques to classify sediments in oceanic environments based on the geoacoustic properties of a two-layer seabed. Two different scenarios are investigated. First, a simple low-frequency case is set up, where the acoustic field is modeled with normal modes. Four different hypotheses are made for seafloor sediment possibilit…

    Submitted 24 March, 2020; originally announced March 2020.

  40. arXiv:2002.04025  [pdf, other]

    cs.LG cs.DM stat.ML

    Can Graph Neural Networks Count Substructures?

    Authors: Zhengdao Chen, Lei Chen, Soledad Villar, Joan Bruna

    Abstract: The ability to detect and count certain substructures in graphs is important for solving many tasks on graph-structured data, especially in the contexts of computational chemistry and biology as well as social network analysis. Inspired by this, we propose to study the expressive power of graph neural networks (GNNs) via their ability to count attributed graph substructures, extending recent works…

    Submitted 28 October, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

    Comments: Improved the descriptions of the Local Relational Pooling (LRP) model and its practical implementation; Added more experimental results on synthetic and molecular datasets; Added the LRP-l-1 model in the experiments

    Journal ref: 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

  41. arXiv:2001.01666  [pdf, other]

    stat.ML cs.LG q-bio.GN

    MREC: a fast and versatile framework for aligning and matching point clouds with applications to single cell molecular data

    Authors: Andrew J. Blumberg, Mathieu Carriere, Michael A. Mandell, Raul Rabadan, Soledad Villar

    Abstract: Comparing and aligning large datasets is a pervasive problem occurring across many different knowledge domains. We introduce and study MREC, a recursive decomposition algorithm for computing matchings between data sets. The basic idea is to partition the data, match the partitions, and then recursively match the points within each pair of identified partitions. The matching itself is done using bl…

    Submitted 20 February, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

  42. arXiv:1908.05767  [pdf, ps, other]

    math.OC cs.LG

    Experimental performance of graph neural networks on random instances of max-cut

    Authors: Weichi Yao, Afonso S. Bandeira, Soledad Villar

    Abstract: This note explores the applicability of unsupervised machine learning techniques towards hard optimization problems on random inputs. In particular, we consider Graph Neural Networks (GNNs) -- a class of neural networks designed to learn functions on graphs -- and we apply them to the max-cut problem on random regular graphs. We focus on the max-cut problem on random regular graphs because it is a…

    Submitted 15 August, 2019; originally announced August 2019.

  43. arXiv:1905.12560  [pdf, other]

    cs.LG stat.ML

    On the equivalence between graph isomorphism testing and function approximation with GNNs

    Authors: Zhengdao Chen, Soledad Villar, Lei Chen, Joan Bruna

    Abstract: Graph Neural Networks (GNNs) have achieved much success on graph-structured data. In light of this, there has been increasing interest in studying their expressive power. One line of work studies the capability of GNNs to approximate permutation-invariant functions on graphs, and another focuses on their power as tests for graph isomorphism. Our work connects these two perspectives and prove…

    Submitted 10 February, 2023; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Strengthened Theorem 4 with a modified proof; Updated Figure 2 to include results from the later literature; Made other minor edits to improve clarity. 22 pages

    Journal ref: Original version published at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

  44. arXiv:1812.07377  [pdf, other]

    math.CO cs.GT physics.soc-ph

    Utility Ghost: Gamified redistricting with partisan symmetry

    Authors: Dustin G. Mixon, Soledad Villar

    Abstract: Inspired by the word game Ghost, we propose a new protocol for bipartisan redistricting in which partisan players take turns assigning precincts to districts. We prove that in an idealized setting, if both parties have the same number of votes, then under optimal play in our protocol, both parties win the same number of seats. We also evaluate our protocol in more realistic settings that show how our…

    Submitted 14 December, 2018; originally announced December 2018.

  45. arXiv:1812.02768  [pdf, other]

    stat.ML cs.LG math.OC

    SqueezeFit: Label-aware dimensionality reduction by semidefinite programming

    Authors: Culver McWhirter, Dustin G. Mixon, Soledad Villar

    Abstract: Given labeled points in a high-dimensional vector space, we seek a low-dimensional subspace such that projecting onto this subspace maintains some prescribed distance between points of differing labels. Intended applications include compressive classification. Taking inspiration from large margin nearest neighbor classification, this paper introduces a semidefinite relaxation of this problem. Unli…

    Submitted 6 December, 2018; originally announced December 2018.

  46. arXiv:1808.08905  [pdf, other]

    cs.CC cs.DS

    Fair redistricting is hard

    Authors: Richard Kueng, Dustin G. Mixon, Soledad Villar

    Abstract: Gerrymandering is a long-standing issue within the U.S. political system, and it has received scrutiny recently by the U.S. Supreme Court. In this note, we prove that deciding whether there exists a fair redistricting among legal maps is NP-hard. To make this precise, we use simplified notions of "legal" and "fair" that account for desirable traits such as geographic compactness of districts and s…

    Submitted 27 August, 2018; originally announced August 2018.

  47. arXiv:1803.09319  [pdf, other]

    cs.LG stat.ML

    SUNLayer: Stable denoising with generative networks

    Authors: Dustin G. Mixon, Soledad Villar

    Abstract: It has been experimentally established that deep neural networks can be used to produce good generative models for real-world data. It has also been established that such generative models can be exploited to solve classical inverse problems like compressed sensing and super resolution. In this work we focus on the classical signal processing problem of image denoising. We propose a theoretical se…

    Submitted 25 March, 2018; originally announced March 2018.

  48. Hexapartite entanglement in an above-threshold Optical Parametric Oscillator

    Authors: F. A. S. Barbosa, A. S. Coelho, L. F. Muñoz Martínez, L. Ortiz-Gutiérrez, A. S. Villar, P. Nussenzveig, M. Martinelli

    Abstract: We demonstrate, theoretically and experimentally, the generation of hexapartite modal entanglement by the optical parametric oscillator (OPO) operating above the oscillation threshold. We show that the OPO generates a rich structure of entanglement among sets of six optical sideband modes interacting through the non-linear crystal. The class of quantum states thus produced can be controlled by a s…

    Submitted 28 June, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

    Comments: 9 pages, 12 figures

    Journal ref: Phys. Rev. Lett. 121, 073601 (2018)

  49. Exploring six modes of an optical parametric oscillator

    Authors: Luis F. Muñoz-Martínez, Felippe Alexandre Silva Barbosa, Antônio Sales Coelho, Luis Ortiz-Gutiérrez, Marcelo Martinelli, Paulo Nussenzveig, Alessandro S. Villar

    Abstract: We measure the complete quantum state for six modes of the electromagnetic field produced by an optical parametric oscillator. The investigation involves the sideband of the intense pump, signal, and idler fields generated by stimulated parametric downconversion inside a triply resonant optical resonator. We develop a theoretical model to successfully interpret the experimental results. The model…

    Submitted 20 February, 2018; v1 submitted 8 October, 2017; originally announced October 2017.

    Comments: 11 pages, 5 figures

    Journal ref: Phys. Rev. A 98, 023823 (2018)

  50. arXiv:1710.00956  [pdf, other]

    stat.ML math.OC

    Monte Carlo approximation certificates for k-means clustering

    Authors: Dustin G. Mixon, Soledad Villar

    Abstract: Efficient algorithms for $k$-means clustering frequently converge to suboptimal partitions, and given a partition, it is difficult to detect $k$-means optimality. In this paper, we develop an a posteriori certifier of approximate optimality for $k$-means clustering. The certifier is a sub-linear Monte Carlo algorithm based on Peng and Wei's semidefinite relaxation of $k$-means. In particular, solv…

    Submitted 2 October, 2017; originally announced October 2017.

    Comments: 8 pages