Research Article | Open Access

Probabilistic Programming with Programmable Variational Inference

Published: 20 June 2024
    Abstract

    Compared to the wide array of advanced Monte Carlo methods supported by modern probabilistic programming languages (PPLs), PPL support for variational inference (VI) is less developed: users are typically limited to a predefined selection of variational objectives and gradient estimators, which are implemented monolithically (and without formal correctness arguments) in PPL backends. In this paper, we propose a more modular approach to supporting variational inference in PPLs, based on compositional program transformation. In our approach, variational objectives are expressed as programs that may employ first-class constructs for computing densities of, and expected values under, user-defined models and variational families. We then transform these programs systematically into unbiased gradient estimators for optimizing the objectives they define. Our design makes it possible to prove unbiasedness by reasoning modularly about many interacting concerns in PPL implementations of variational inference, including automatic differentiation, density accumulation, tracing, and the application of unbiased gradient estimation strategies. Additionally, relative to existing support for VI in PPLs, our design increases expressiveness along three axes: (1) it supports an open-ended set of user-defined variational objectives, rather than a fixed menu of options; (2) it supports a combinatorial space of gradient estimation strategies, many not automated by today’s PPLs; and (3) it supports a broader class of models and variational families, because it supports constructs for approximate marginalization and normalization (previously introduced for Monte Carlo inference). We implement our approach in an extension to the Gen probabilistic programming system (genjax.vi, implemented in JAX), and evaluate our automation on several deep generative modeling tasks, showing minimal performance overhead vs. hand-coded implementations and performance competitive with well-established open-source PPLs.
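
    To make the approach concrete, the following is a minimal sketch, in stock JAX, of the core idea: a variational objective (here, a single-sample ELBO for a toy Gaussian model and Gaussian variational family) written as an ordinary program, then differentiated to obtain an unbiased gradient estimator. This is not the genjax.vi API; the names log_p, log_q, and elbo_estimate, and the toy model itself, are invented for illustration, and the reparameterization trick is applied by hand here, whereas the system described in the paper derives such estimators automatically and supports other strategies (e.g., score-function estimators).

    import jax
    import jax.numpy as jnp

    def log_p(x):
        # Log density of a toy "model": a standard normal target.
        return -0.5 * x ** 2 - 0.5 * jnp.log(2.0 * jnp.pi)

    def log_q(x, mu, log_sigma):
        # Log density of the Gaussian variational family q(x; mu, sigma).
        sigma = jnp.exp(log_sigma)
        return -0.5 * ((x - mu) / sigma) ** 2 - log_sigma - 0.5 * jnp.log(2.0 * jnp.pi)

    def elbo_estimate(params, key):
        # The variational objective as a program: sample x ~ q via the
        # reparameterization x = mu + sigma * eps, then score under p and q.
        mu, log_sigma = params
        eps = jax.random.normal(key)
        x = mu + jnp.exp(log_sigma) * eps
        return log_p(x) - log_q(x, mu, log_sigma)

    # Because sampling is reparameterized, jax.grad of the single-sample
    # estimate is an unbiased estimator of the gradient of the true ELBO.
    grad_fn = jax.grad(elbo_estimate)
    params = (jnp.array(0.5), jnp.array(-1.2))  # (mu, log_sigma)
    g = grad_fn(params, jax.random.PRNGKey(0))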

    Published In

    Proceedings of the ACM on Programming Languages, Volume 8, Issue PLDI
    June 2024
    2198 pages
    EISSN: 2475-1421
    DOI: 10.1145/3554317
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automatic differentiation
    2. correctness
    3. probabilistic programming
    4. semantics
    5. variational inference
