DOI: 10.5555/3618408.3620080
research-article

SE(3) diffusion model with application to protein backbone generation

Published: 23 July 2023
    Abstract

    The design of novel protein structures remains a challenge in protein engineering, with applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there is no principled methodological framework for diffusion on SE(3), the space of orientation-preserving rigid motions in ℝ³, that operates on frames and confers group invariance. We address these shortcomings by developing the theoretical foundations of SE(3)-invariant diffusion models on multiple frames, followed by a novel framework, FrameDiff, for learning the SE(3)-equivariant score over multiple frames. We apply FrameDiff to monomer backbone generation and find that it can generate designable monomers of up to 500 amino acids without relying on a pretrained protein structure prediction network, which has been integral to previous methods. We also find that our samples can generalize beyond any known protein structure.
    Code: https://github.com/jasonkyuyim/se3_diffusion
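
    The frame representation described above can be made concrete with a short sketch. Each frame is a rotation matrix R_i together with a translation x_i, and a single global element (R, t) of SE(3) acts on all frames at once. Two derived quantities illustrate what invariance means here: subtracting the centre of mass removes the dependence on the global translation, and the offsets x_j - x_i expressed in frame i's own coordinates do not change under any global rigid motion. This is an illustrative NumPy sketch under those assumptions, not code from the FrameDiff repository; the helper names (`hat`, `so3_exp`, `act`, `center`, `local_offsets`) are hypothetical, and centring is only one common way to handle translations.

```python
import numpy as np


def hat(v):
    """3-vector -> skew-symmetric matrix, so that hat(v) @ u == np.cross(v, u)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def so3_exp(v):
    """Rodrigues' formula: rotation by angle |v| about the axis v / |v|."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(v / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def act(R, t, rots, trans):
    """Apply one global rigid motion (R, t) to N frames.

    Frames compose as (R, t) * (R_i, x_i) = (R @ R_i, R @ x_i + t).
    rots has shape (N, 3, 3); trans has shape (N, 3).
    """
    return R @ rots, trans @ R.T + t


def center(trans):
    """Subtract the centre of mass, removing any global translation."""
    return trans - trans.mean(axis=0, keepdims=True)


def local_offsets(rots, trans):
    """Offsets x_j - x_i written in frame i's coordinates, i.e. R_i^T (x_j - x_i).

    These do not change under a global rigid motion (SE(3)-invariant).
    Returns an array of shape (N, N, 3).
    """
    diff = trans[None, :, :] - trans[:, None, :]   # diff[i, j] = x_j - x_i
    return np.einsum("iba,ijb->ija", rots, diff)    # apply R_i^T to each diff[i, j]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rots = np.stack([so3_exp(rng.normal(size=3)) for _ in range(5)])
    trans = rng.normal(size=(5, 3))
    R, t = so3_exp(rng.normal(size=3)), rng.normal(size=3)
    rots2, trans2 = act(R, t, rots, trans)

    # Invariant: local offsets are identical before and after the global motion.
    print(np.allclose(local_offsets(rots2, trans2), local_offsets(rots, trans)))  # expected: True
    # Centred coordinates lose t and simply rotate with R.
    print(np.allclose(center(trans2), center(trans) @ R.T))  # expected: True
```

    Invariant quantities such as `local_offsets` are the kind of input a network can consume while its outputs are still expressed relative to the frames, which is one way to build SE(3)-equivariant predictions.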

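    A forward (noising) process over frames can likewise be sketched. The version below assumes the process factorizes into an Ornstein-Uhlenbeck (variance-preserving) process on centred translations and Brownian motion on SO(3) for the rotations, the latter approximated by a geodesic random walk rather than sampled in closed form. The time scaling, the number of steps, and all function names (`noise_translations`, `noise_rotations`, `noise_frames`) are illustrative assumptions; FrameDiff's actual schedule and rotation sampler may differ.

```python
import numpy as np


def hat(v):
    """3-vector -> skew-symmetric matrix (hat(v) @ u == np.cross(v, u))."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def so3_exp(v):
    """Rodrigues' formula: rotation by angle |v| about the axis v / |v|."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(v / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def noise_translations(x0, t, rng):
    """Ornstein-Uhlenbeck (variance-preserving) noising of centred translations.

    Closed form of dx = -x/2 dt + dW started at x0:
        x_t = exp(-t/2) * x0 + sqrt(1 - exp(-t)) * z.
    z is projected to zero mean so the centre of mass stays at the origin.
    """
    z = rng.normal(size=x0.shape)
    z -= z.mean(axis=0, keepdims=True)
    return np.exp(-t / 2.0) * x0 + np.sqrt(1.0 - np.exp(-t)) * z


def noise_rotations(R0, t, rng, n_steps=200):
    """Approximate Brownian motion on SO(3), run for time t, by a geodesic random walk.

    Each step right-multiplies every frame's rotation by exp(hat(sqrt(dt) * xi)),
    with xi ~ N(0, I_3) drawn independently per frame and per step.
    """
    dt = t / n_steps
    R = R0.copy()
    for _ in range(n_steps):
        incr = np.stack([so3_exp(np.sqrt(dt) * rng.normal(size=3))
                         for _ in range(R.shape[0])])
        R = R @ incr
    return R


def noise_frames(rots, trans, t, rng):
    """Noise N frames, treating rotations and translations independently."""
    x0 = trans - trans.mean(axis=0, keepdims=True)
    return noise_rotations(rots, t, rng), noise_translations(x0, t, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rots = np.stack([so3_exp(rng.normal(size=3)) for _ in range(8)])
    trans = 10.0 * rng.normal(size=(8, 3))
    rots_t, trans_t = noise_frames(rots, trans, t=1.0, rng=rng)
    # Noised rotations remain valid rotation matrices ...
    print(np.allclose(rots_t @ np.swapaxes(rots_t, 1, 2), np.eye(3)))  # expected: True
    # ... and the centre of mass stays at the origin.
    print(np.allclose(trans_t.mean(axis=0), 0.0))  # expected: True
```

    Under these assumptions the process is SE(3)-equivariant: the rotation increments multiply on the right and the translation noise is isotropic, so applying a global rotation before noising gives the same distribution as applying it afterwards. This property is what makes an SE(3)-equivariant score a natural target to learn.
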

    Information

    Published In

    Guide Proceedings
    ICML'23: Proceedings of the 40th International Conference on Machine Learning
    July 2023
    43479 pages

    Publisher

    JMLR.org

    Qualifiers

    • Research-article
    • Research
    • Refereed limited
