-
Probabilistic Programming with Programmable Variational Inference
Authors:
McCoy R. Becker,
Alexander K. Lew,
Xiaoyan Wang,
Matin Ghavami,
Mathieu Huot,
Martin C. Rinard,
Vikash K. Mansinghka
Abstract:
Compared to the wide array of advanced Monte Carlo methods supported by modern probabilistic programming languages (PPLs), PPL support for variational inference (VI) is less developed: users are typically limited to a predefined selection of variational objectives and gradient estimators, which are implemented monolithically (and without formal correctness arguments) in PPL backends. In this paper…
▽ More
Compared to the wide array of advanced Monte Carlo methods supported by modern probabilistic programming languages (PPLs), PPL support for variational inference (VI) is less developed: users are typically limited to a predefined selection of variational objectives and gradient estimators, which are implemented monolithically (and without formal correctness arguments) in PPL backends. In this paper, we propose a more modular approach to supporting variational inference in PPLs, based on compositional program transformation. In our approach, variational objectives are expressed as programs, that may employ first-class constructs for computing densities of and expected values under user-defined models and variational families. We then transform these programs systematically into unbiased gradient estimators for optimizing the objectives they define. Our design enables modular reasoning about many interacting concerns, including automatic differentiation, density accumulation, tracing, and the application of unbiased gradient estimation strategies. Additionally, relative to existing support for VI in PPLs, our design increases expressiveness along three axes: (1) it supports an open-ended set of user-defined variational objectives, rather than a fixed menu of options; (2) it supports a combinatorial space of gradient estimation strategies, many not automated by today's PPLs; and (3) it supports a broader class of models and variational families, because it supports constructs for approximate marginalization and normalization (previously introduced only for Monte Carlo inference). We implement our approach in an extension to the Gen probabilistic programming system (genjax.vi, implemented in JAX), and evaluate on several deep generative modeling tasks, showing minimal performance overhead vs. hand-coded implementations and performance competitive with well-established open-source PPLs.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
GenSQL: A Probabilistic Programming System for Querying Generative Models of Database Tables
Authors:
Mathieu Huot,
Matin Ghavami,
Alexander K. Lew,
Ulrich Schaechtle,
Cameron E. Freer,
Zane Shelby,
Martin C. Rinard,
Feras A. Saad,
Vikash K. Mansinghka
Abstract:
This article presents GenSQL, a probabilistic programming system for querying probabilistic generative models of database tables. By augmenting SQL with only a few key primitives for querying probabilistic models, GenSQL enables complex Bayesian inference workflows to be concisely implemented. GenSQL's query planner rests on a unified programmatic interface for interacting with probabilistic model…
▽ More
This article presents GenSQL, a probabilistic programming system for querying probabilistic generative models of database tables. By augmenting SQL with only a few key primitives for querying probabilistic models, GenSQL enables complex Bayesian inference workflows to be concisely implemented. GenSQL's query planner rests on a unified programmatic interface for interacting with probabilistic models of tabular data, which makes it possible to use models written in a variety of probabilistic programming languages that are tailored to specific workflows. Probabilistic models may be automatically learned via probabilistic program synthesis, hand-designed, or a combination of both. GenSQL is formalized using a novel type system and denotational semantics, which together enable us to establish proofs that precisely characterize its soundness guarantees. We evaluate our system on two case real-world studies -- an anomaly detection in clinical trials and conditional synthetic data generation for a virtual wet lab -- and show that GenSQL more accurately captures the complexity of the data as compared to common baselines. We also show that the declarative syntax in GenSQL is more concise and less error-prone as compared to several alternatives. Finally, GenSQL delivers a 1.7-6.8x speedup compared to its closest competitor on a representative benchmark set and runs in comparable time to hand-written code, in part due to its reusable optimizations and code specialization.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Differentiating Metropolis-Hastings to Optimize Intractable Densities
Authors:
Gaurav Arya,
Ruben Seyer,
Frank Schäfer,
Kartik Chandra,
Alexander K. Lew,
Mathieu Huot,
Vikash K. Mansinghka,
Jonathan Ragan-Kelley,
Christopher Rackauckas,
Moritz Schauer
Abstract:
We develop an algorithm for automatic differentiation of Metropolis-Hastings samplers, allowing us to differentiate through probabilistic inference, even if the model has discrete components within it. Our approach fuses recent advances in stochastic automatic differentiation with traditional Markov chain coupling schemes, providing an unbiased and low-variance gradient estimator. This allows us t…
▽ More
We develop an algorithm for automatic differentiation of Metropolis-Hastings samplers, allowing us to differentiate through probabilistic inference, even if the model has discrete components within it. Our approach fuses recent advances in stochastic automatic differentiation with traditional Markov chain coupling schemes, providing an unbiased and low-variance gradient estimator. This allows us to apply gradient-based optimization to objectives expressed as expectations over intractable target densities. We demonstrate our approach by finding an ambiguous observation in a Gaussian mixture model and by maximizing the specific heat in an Ising model.
△ Less
Submitted 30 June, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
-
$\nabla$SD: Differentiable Programming for Sparse Tensors
Authors:
Amir Shaikhha,
Mathieu Huot,
Shideh Hashemian
Abstract:
Sparse tensors are prevalent in many data-intensive applications, yet existing differentiable programming frameworks are tailored towards dense tensors. This presents a significant challenge for efficiently computing gradients through sparse tensor operations, as their irregular sparsity patterns can result in substantial memory and computational overheads. In this work, we introduce a novel frame…
▽ More
Sparse tensors are prevalent in many data-intensive applications, yet existing differentiable programming frameworks are tailored towards dense tensors. This presents a significant challenge for efficiently computing gradients through sparse tensor operations, as their irregular sparsity patterns can result in substantial memory and computational overheads. In this work, we introduce a novel framework that enables the efficient and automatic differentiation of sparse tensors, addressing this fundamental issue. Our experiments demonstrate the effectiveness of the proposed framework in terms of performance and scalability, outperforming state-of-the-art frameworks across a range of synthetic and real-world datasets. Our approach offers a promising direction for enabling efficient and scalable differentiable programming with sparse tensors, which has significant implications for numerous applications in machine learning, natural language processing, and scientific computing.
△ Less
Submitted 13 March, 2023;
originally announced March 2023.
-
$ω$PAP Spaces: Reasoning Denotationally About Higher-Order, Recursive Probabilistic and Differentiable Programs
Authors:
Mathieu Huot,
Alexander K. Lew,
Vikash K. Mansinghka,
Sam Staton
Abstract:
We introduce a new setting, the category of $ω$PAP spaces, for reasoning denotationally about expressive differentiable and probabilistic programming languages. Our semantics is general enough to assign meanings to most practical probabilistic and differentiable programs, including those that use general recursion, higher-order functions, discontinuous primitives, and both discrete and continuous…
▽ More
We introduce a new setting, the category of $ω$PAP spaces, for reasoning denotationally about expressive differentiable and probabilistic programming languages. Our semantics is general enough to assign meanings to most practical probabilistic and differentiable programs, including those that use general recursion, higher-order functions, discontinuous primitives, and both discrete and continuous sampling. But crucially, it is also specific enough to exclude many pathological denotations, enabling us to establish new results about both deterministic differentiable programs and probabilistic programs. In the deterministic setting, we prove very general correctness theorems for automatic differentiation and its use within gradient descent. In the probabilistic setting, we establish the almost-everywhere differentiability of probabilistic programs' trace density functions, and the existence of convenient base measures for density computation in Monte Carlo inference. In some cases these results were previously known, but required detailed proofs with an operational flavor; by contrast, all our proofs work directly with programs' denotations.
△ Less
Submitted 25 May, 2023; v1 submitted 21 February, 2023;
originally announced February 2023.
-
Efficient and Sound Differentiable Programming in a Functional Array-Processing Language
Authors:
Amir Shaikhha,
Mathieu Huot,
Shabnam Ghasemirad,
Andrew Fitzgibbon,
Simon Peyton Jones,
Dimitrios Vytiniotis
Abstract:
Automatic differentiation (AD) is a technique for computing the derivative of a function represented by a program. This technique is considered as the de-facto standard for computing the differentiation in many machine learning and optimisation software tools. Despite the practicality of this technique, the performance of the differentiated programs, especially for functional languages and in the…
▽ More
Automatic differentiation (AD) is a technique for computing the derivative of a function represented by a program. This technique is considered as the de-facto standard for computing the differentiation in many machine learning and optimisation software tools. Despite the practicality of this technique, the performance of the differentiated programs, especially for functional languages and in the presence of vectors, is suboptimal. We present an AD system for a higher-order functional array-processing language. The core functional language underlying this system simultaneously supports both source-to-source forward-mode AD and global optimisations such as loop transformations. In combination, gradient computation with forward-mode AD can be as efficient as reverse mode, and the Jacobian matrices required for numerical algorithms such as Gauss-Newton and Levenberg-Marquardt can be efficiently computed.
△ Less
Submitted 20 December, 2022;
originally announced December 2022.
-
Denotationally Correct, Purely Functional, Efficient Reverse-mode Automatic Differentiation
Authors:
Mathieu Huot,
Amir Shaikhha
Abstract:
Reverse-mode differentiation is used for optimization, but it introduces references, which break the purity of the underlying programs, making them notoriously harder to optimize. We present a reverse-mode differentiation on a purely functional language with array operations. It is the first one to deliver a provably efficient, purely functional, and denotationally correct reverse-mode differentia…
▽ More
Reverse-mode differentiation is used for optimization, but it introduces references, which break the purity of the underlying programs, making them notoriously harder to optimize. We present a reverse-mode differentiation on a purely functional language with array operations. It is the first one to deliver a provably efficient, purely functional, and denotationally correct reverse-mode differentiation. We show that our transformation is semantically correct and verifies the cheap gradient principle. Inspired by PROPs and compilation to categories, we introduce a novel intermediate representation that we call 'unary form'. Our reverse-mode transformation is factored as a compilation scheme through this intermediate representation. We obtain provably efficient gradients by performing general partial evaluation optimizations after our reverse-mode transformation, as opposed to manually derived ones. For simple first-order programs, the obtained output programs resemble static-single-assignment (SSA) code. We emphasize the modularity of our approach and show how our language can easily be enriched with more optimized primitives, as required for some speed-ups in practice.
△ Less
Submitted 26 April, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
ADEV: Sound Automatic Differentiation of Expected Values of Probabilistic Programs
Authors:
Alexander K. Lew,
Mathieu Huot,
Sam Staton,
Vikash K. Mansinghka
Abstract:
Optimizing the expected values of probabilistic processes is a central problem in computer science and its applications, arising in fields ranging from artificial intelligence to operations research to statistical computing. Unfortunately, automatic differentiation techniques developed for deterministic programs do not in general compute the correct gradients needed for widely used solutions based…
▽ More
Optimizing the expected values of probabilistic processes is a central problem in computer science and its applications, arising in fields ranging from artificial intelligence to operations research to statistical computing. Unfortunately, automatic differentiation techniques developed for deterministic programs do not in general compute the correct gradients needed for widely used solutions based on gradient-based optimization.
In this paper, we present ADEV, an extension to forward-mode AD that correctly differentiates the expectations of probabilistic processes represented as programs that make random choices. Our algorithm is a source-to-source program transformation on an expressive, higher-order language for probabilistic computation, with both discrete and continuous probability distributions. The result of our transformation is a new probabilistic program, whose expected return value is the derivative of the original program's expectation. This output program can be run to generate unbiased Monte Carlo estimates of the desired gradient, which can then be used within the inner loop of stochastic gradient descent. We prove ADEV correct using logical relations over the denotations of the source and target probabilistic programs. Because it modularly extends forward-mode AD, our algorithm lends itself to a concise implementation strategy, which we exploit to develop a prototype in just a few dozen lines of Haskell (https://github.com/probcomp/adev).
△ Less
Submitted 13 December, 2022;
originally announced December 2022.
-
Compiling Structured Tensor Algebra
Authors:
Mahdi Ghorbani,
Mathieu Huot,
Shideh Hashemian,
Amir Shaikhha
Abstract:
Tensor algebra is essential for data-intensive workloads in various computational domains. Computational scientists face a trade-off between the specialization degree provided by dense tensor algebra and the algorithmic efficiency that leverages the structure provided by sparse tensors. This paper presents StructTensor, a framework that symbolically computes structure at compilation time. This is…
▽ More
Tensor algebra is essential for data-intensive workloads in various computational domains. Computational scientists face a trade-off between the specialization degree provided by dense tensor algebra and the algorithmic efficiency that leverages the structure provided by sparse tensors. This paper presents StructTensor, a framework that symbolically computes structure at compilation time. This is enabled by Structured Tensor Unified Representation (STUR), an intermediate language that can capture tensor computations as well as their sparsity and redundancy structures. Through a mathematical view of lossless tensor computations, we show that our symbolic structure computation and the related optimizations are sound. Finally, for different tensor computation workloads and structures, we experimentally show how capturing the symbolic structure can result in outperforming state-of-the-art frameworks for both dense and sparse tensor algebra.
△ Less
Submitted 18 November, 2022;
originally announced November 2022.
-
Towards Denotational Semantics of AD for Higher-Order, Recursive, Probabilistic Languages
Authors:
Alexander K. Lew,
Mathieu Huot,
Vikash K. Mansinghka
Abstract:
Automatic differentiation (AD) aims to compute derivatives of user-defined functions, but in Turing-complete languages, this simple specification does not fully capture AD's behavior: AD sometimes disagrees with the true derivative of a differentiable program, and when AD is applied to non-differentiable or effectful programs, it is unclear what guarantees (if any) hold of the resulting code. We s…
▽ More
Automatic differentiation (AD) aims to compute derivatives of user-defined functions, but in Turing-complete languages, this simple specification does not fully capture AD's behavior: AD sometimes disagrees with the true derivative of a differentiable program, and when AD is applied to non-differentiable or effectful programs, it is unclear what guarantees (if any) hold of the resulting code. We study an expressive differentiable programming language, with piecewise-analytic primitives, higher-order functions, and general recursion. Our main result is that even in this general setting, a version of Lee et al. [2020]'s correctness theorem (originally proven for a first-order language without partiality or recursion) holds: all programs denote so-called $ω$PAP functions, and AD computes correct intensional derivatives of them. Mazza and Pagani [2021]'s recent theorem, that AD disagrees with the true derivative of a differentiable recursive program at a measure-zero set of inputs, can be derived as a straightforward corollary of this fact. We also apply the framework to study probabilistic programs, and recover a recent result from Mak et al. [2021] via a novel denotational argument.
△ Less
Submitted 6 December, 2021; v1 submitted 30 November, 2021;
originally announced November 2021.
-
Functional Collection Programming with Semi-Ring Dictionaries
Authors:
Amir Shaikhha,
Mathieu Huot,
Jaclyn Smith,
Dan Olteanu
Abstract:
This paper introduces semi-ring dictionaries, a powerful class of compositional and purely functional collections that subsume other collection types such as sets, multisets, arrays, vectors, and matrices. We developed SDQL, a statically typed language that can express relational algebra with aggregations, linear algebra, and functional collections over data such as relations and matrices using se…
▽ More
This paper introduces semi-ring dictionaries, a powerful class of compositional and purely functional collections that subsume other collection types such as sets, multisets, arrays, vectors, and matrices. We developed SDQL, a statically typed language that can express relational algebra with aggregations, linear algebra, and functional collections over data such as relations and matrices using semi-ring dictionaries. Furthermore, thanks to the algebraic structure behind these dictionaries, SDQL unifies a wide range of optimizations commonly used in databases (DB) and linear algebra (LA). As a result, SDQL enables efficient processing of hybrid DB and LA workloads, by putting together optimizations that are otherwise confined to either DB systems or LA frameworks. We show experimentally that a handful of DB and LA workloads can take advantage of the SDQL language and optimizations. SDQL can be competitive with or outperforms a host of systems that are state of the art in their own domain: in-memory DB systems Typer and Tectorwise for (flat, not nested) relational data; SciPy for LA workloads; sparse tensor compiler taco; the Trance nested relational engine; and the in-database machine learning engines LMFAO and Morpheus for hybrid DB/LA workloads over relational data.
△ Less
Submitted 22 March, 2022; v1 submitted 10 March, 2021;
originally announced March 2021.
-
Higher Order Automatic Differentiation of Higher Order Functions
Authors:
Mathieu Huot,
Sam Staton,
Matthijs Vákár
Abstract:
We present semantic correctness proofs of automatic differentiation (AD). We consider a forward-mode AD method on a higher order language with algebraic data types, and we characterise it as the unique structure preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets…
▽ More
We present semantic correctness proofs of automatic differentiation (AD). We consider a forward-mode AD method on a higher order language with algebraic data types, and we characterise it as the unique structure preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Throughout, we show how the analysis extends to AD methods for computing higher order derivatives using a Taylor approximation.
△ Less
Submitted 21 March, 2022; v1 submitted 17 January, 2021;
originally announced January 2021.
-
Correctness of Automatic Differentiation via Diffeologies and Categorical Gluing
Authors:
Mathieu Huot,
Sam Staton,
Matthijs Vákár
Abstract:
We present semantic correctness proofs of Automatic Differentiation (AD). We consider a forward-mode AD method on a higher order language with algebraic data types, and we characterise it as the unique structure preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets…
▽ More
We present semantic correctness proofs of Automatic Differentiation (AD). We consider a forward-mode AD method on a higher order language with algebraic data types, and we characterise it as the unique structure preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Finally, we sketch how the analysis extends to other AD methods by considering a continuation-based method.
△ Less
Submitted 1 April, 2020; v1 submitted 7 January, 2020;
originally announced January 2020.
-
Quantum channels as a categorical completion
Authors:
Mathieu Huot,
Sam Staton
Abstract:
We propose a categorical foundation for the connection between pure and mixed states in quantum information and quantum computation. The foundation is based on distributive monoidal categories.
First, we prove that the category of all quantum channels is a canonical completion of the category of pure quantum operations (with ancilla preparations). More precisely, we prove that the category of co…
▽ More
We propose a categorical foundation for the connection between pure and mixed states in quantum information and quantum computation. The foundation is based on distributive monoidal categories.
First, we prove that the category of all quantum channels is a canonical completion of the category of pure quantum operations (with ancilla preparations). More precisely, we prove that the category of completely positive trace-preserving maps between finite-dimensional C*-algebras is a canonical completion of the category of finite-dimensional vector spaces and isometries.
Second, we extend our result to give a foundation to the topological relationships between quantum channels. We do this by generalizing our categorical foundation to the topologically-enriched setting. In particular, we show that the operator norm topology on quantum channels is the canonical topology induced by the norm topology on isometries.
△ Less
Submitted 24 April, 2019; v1 submitted 21 April, 2019;
originally announced April 2019.
-
Universal Properties in Quantum Theory
Authors:
Mathieu Huot,
Sam Staton
Abstract:
We argue that notions in quantum theory should have universal properties in the sense of category theory. We consider the completely positive trace preserving (CPTP) maps, the basic notion of quantum channel. Physically, quantum channels are derived from pure quantum theory by allowing discarding. We phrase this in category theoretic terms by showing that the category of CPTP maps is the univer…
▽ More
We argue that notions in quantum theory should have universal properties in the sense of category theory. We consider the completely positive trace preserving (CPTP) maps, the basic notion of quantum channel. Physically, quantum channels are derived from pure quantum theory by allowing discarding. We phrase this in category theoretic terms by showing that the category of CPTP maps is the universal monoidal category with a terminal unit that has a functor from the category of isometries. In other words, the CPTP maps are the affine reflection of the isometries.
△ Less
Submitted 29 January, 2019;
originally announced January 2019.