An algorithmic framework for synthetic cost-aware decision making in molecular design

A preprint version of the article is available at arXiv.

Abstract

Small molecules exhibiting desirable property profiles are often discovered through an iterative process of designing, synthesizing and testing sets of molecules. The selection of molecules to synthesize from all possible candidates is a complex decision-making process that typically relies on expert chemist intuition. Here we propose a quantitative decision-making framework, SPARROW, that prioritizes molecules for evaluation by balancing expected information gain and synthetic cost. SPARROW integrates molecular design, property prediction and retrosynthetic planning to balance the utility of testing a molecule with the cost of batch synthesis. We demonstrate, through three case studies, that the developed algorithm captures the non-additive costs inherent to batch synthesis, leverages common reaction steps and intermediates, and scales to hundreds of molecules.
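The non-additive batch cost mentioned above can be illustrated with a small toy example. This is a minimal sketch and not SPARROW's actual optimization formulation: the candidate names, rewards, reaction steps, step costs and the single trade-off weight `lam` are all hypothetical, and the search is plain brute-force enumeration. Each distinct reaction step is paid for once per batch, so two targets that share a step cost less together than separately.

```python
from itertools import combinations

# Hypothetical toy data (not taken from the paper): expected reward per
# candidate and one synthetic route per candidate, given as a set of steps.
REWARD = {"A": 5.0, "B": 4.0, "C": 3.0}
ROUTE = {
    "A": {"r1", "r2"},
    "B": {"r1", "r3"},  # shares step r1 with candidate A
    "C": {"r4", "r5"},
}
STEP_COST = {"r1": 2.0, "r2": 1.0, "r3": 1.0, "r4": 2.0, "r5": 2.0}


def batch_cost(selected):
    """Cost of a batch: every distinct step is counted once, however many targets use it."""
    steps = set().union(*(ROUTE[m] for m in selected))
    return sum(STEP_COST[s] for s in steps)


def score(selected, lam):
    """Total reward of the selected candidates minus the weighted batch cost."""
    return sum(REWARD[m] for m in selected) - lam * batch_cost(selected)


def best_batch(lam):
    """Enumerate every subset of candidates and return the highest-scoring one."""
    names = sorted(REWARD)
    subsets = (
        frozenset(combo)
        for k in range(len(names) + 1)
        for combo in combinations(names, k)
    )
    return max(subsets, key=lambda s: score(s, lam))


if __name__ == "__main__":
    print(sorted(best_batch(1.0)))
    print(batch_cost({"A", "B"}), batch_cost({"A"}) + batch_cost({"B"}))
```

Exhaustive enumeration only works for a handful of candidates; it is used here purely to make the shared-step effect visible.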

Fig. 1: Overview of SPARROW and its role within the molecular design cycle.
Fig. 2: SPARROW’s problem formulation.
Fig. 3: Demonstration of SPARROW’s ability to balance cost and reward on a 14-member candidate library of putative ASCT2 inhibitors.
Fig. 4: Results of SPARROW applied to an autonomous molecular design cycle from ref. 33.
Fig. 5: SPARROW’s proposed routes for case 2 with λ = [3, 1, 1].
Fig. 6: Example set of synthetic routes selected by SPARROW for case 3 using λ = [30, 1, 5].

Data availability

SMILES and rewards used for all case studies (refs. 32, 33, 37) can be found at github.com/coleygroup/sparrow/tree/main/examples. All results can be reproduced using included configuration files in the same repository (ref. 52). Source data are provided with this paper.

Code availability

SPARROW is open source and can be found at github.com/coleygroup/sparrow (ref. 52). All code and retrosynthetic routes from ASKCOS used to generate the described results can be found at github.com/coleygroup/sparrow/tree/main/examples. Full candidate sets with configuration files are included in this repository both for reproducibility and as examples for use of SPARROW.

References

  1. Gao, W. & Coley, C. W. The synthesizability of molecules proposed by generative models. J. Chem. Inf. Model. 60, 5714–5723 (2020).

  2. Méndez-Lucio, O., Baillif, B., Clevert, D.-A., Rouquié, D. & Wichard, J. De novo generation of hit-like molecules from gene expression signatures using artificial intelligence. Nat. Commun. 11, 10 (2020).

  3. Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).

  4. Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. SCScore: synthetic complexity learned from a reaction corpus. J. Chem. Inf. Model. 58, 252–261 (2018).

  5. Thakkar, A., Chadimová, V., Bjerrum, E. J., Engkvist, O. & Reymond, J.-L. Retrosynthetic Accessibility Score (RAscore)—rapid machine learned synthesizability classification from AI driven retrosynthetic planning. Chem. Sci. 12, 3339–3349 (2021).

  6. Liu, C.-H. et al. RetroGNN: fast estimation of synthesizability for virtual screening and de novo design by learning from slow retrosynthesis software. J. Chem. Inf. Model. 62, 2293–2300 (2022).

  7. Andersson, S. et al. Making medicinal chemistry more effective—application of Lean Sigma to improve processes, speed and quality. Drug Discov. Today 14, 598–604 (2009).

  8. Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

  9. Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).

  10. Genheden, S. et al. AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J. Cheminform. 12, 70 (2020).

  11. Badowski, T., Molga, K. A. & Grzybowski, B. Selection of cost-effective yet chemically diverse pathways from the networks of computer-generated retrosynthetic plans. Chem. Sci. 10, 4640–4651 (2019).

  12. Gao, W., Mercado, R. & Coley, C. W. Amortized tree generation for bottom-up synthesis planning and synthesizable molecular design. In International Conference on Learning Representations https://openreview.net/forum?id=FRxhHdnxt1 (OpenReview.net, 2022).

  13. Zhang, Q., Liu, C., Wu, S., Hayashi, Y. & Yoshida, R. A Bayesian method for concurrently designing molecules and synthetic reaction networks. Sci. Technol. Adv. Mater. Methods 3, 2204994 (2023).

  14. Breznik, M. et al. Prioritizing small sets of molecules for synthesis through in-silico tools: a comparison of common ranking methods. ChemMedChem 18, e202200425 (2023).

  15. Frazier, P. I. Bayesian Optimization. INFORMS TutORials in Operations Research https://doi.org/10.1287/educ.2018.0188 (2018).

  16. Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & de Freitas, N. Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104, 148–175 (2016).

  17. Korovina, K. et al. ChemBO: Bayesian optimization of small organic molecules with synthesizable recommendations. In Proc. Twenty Third International Conference on Artificial Intelligence and Statistics (eds Chiappa, S. & Calandra, R.) 3393–3403 (PMLR, 2020).

  18. Pyzer-Knapp, E. O. Bayesian optimization for accelerated drug discovery. IBM J. Res. Dev. 62, 2:1–2:7 (2018).

  19. Sasena, M. J. Flexibility and Efficiency Enhancements for Constrained Global Design Optimization with Kriging Approximations. PhD Thesis, Univ. of Michigan (2002).

  20. Huang, D., Allen, T. T., Notz, W. I. & Miller, R. A. Sequential Kriging optimization using multiple-fidelity evaluations. Struct. Multidiscip. Optim. 32, 369–382 (2006).

  21. Palizhati, A. et al. Agents for sequential learning using multiple-fidelity data. Sci. Rep. 12, 4694 (2022).

  22. Zanjani Foumani, Z., Shishehbor, M., Yousefpour, A. & Bostanabad, R. Multi-fidelity cost-aware Bayesian optimization. Comput. Methods Appl. Mech. Eng. 407, 115937 (2023).

  23. Molga, K., Dittwald, P. & Grzybowski, B. A. Computational design of syntheses leading to compound libraries or isotopically labelled targets. Chem. Sci. 10, 9219–9232 (2019).

  24. Gao, H., Pauphilet, J., Struble, T. J., Coley, C. W. & Jensen, K. F. Direct optimization across computer-generated reaction networks balances materials use and feasibility of synthesis plans for molecule libraries. J. Chem. Inf. Model. 61, 493–504 (2021).

  25. Gao, H. et al. Combining retrosynthesis and mixed-integer optimization for minimizing the chemical inventory needed to realize a WHO essential medicines list. Reaction Chem. Eng. 5, 367–376 (2020).

  26. Marvin, W. A., Rangarajan, S. & Daoutidis, P. Automated generation and optimal selection of biofuel-gasoline blends and their synthesis routes. Energy Fuels 27, 3585–3594 (2013).

  27. Dahmen, M. & Marquardt, W. Model-based formulation of biofuel blends by simultaneous product and pathway design. Energy Fuels 31, 4096–4121 (2017).

  28. König, A., Neidhardt, L., Viell, J., Mitsos, A. & Dahmen, M. Integrated design of processes and products: optimal renewable fuels. Comput. Chem. Eng. 134, 106712 (2020).

  29. Adjiman, C. S. et al. Process systems engineering perspective on the design of materials and molecules. Ind. Eng. Chem. Res. 60, 5194–5206 (2021).

  30. Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H. & Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS Central Sci. 3, 434–443 (2017).

  31. Chemspace Services: Compound Sourcing and Procurement, Hit Discovery, Molecular Docking, Custom Synt; https://chem-space.com/services (accessed October 2023).

  32. Garibsingh, R.-A. A. et al. Rational design of ASCT2 inhibitors using an integrated experimental-computational approach. Proc. Natl Acad. Sci. USA 118, e2104093118 (2021).

  33. Koscher, B. A. et al. Autonomous, multiproperty-driven molecular discovery: from predictions to measurements and back. Science 382, eadi1407 (2023).

  34. Barry, C. E. Lessons from seven decades of antituberculosis drug discovery. Curr. Topics Med. Chem. 11, 1216–1225 (2011).

  35. Wesolowski, S. S. & Brown, D. G. Lead Generation 487–512 (John Wiley & Sons, 2016).

  36. Brown, D. G. & Boström, J. Analysis of past and present synthetic methodologies on medicinal chemistry: where have all the new reactions gone? J. Med. Chem. 59, 4443–4458 (2016).

  37. Button, A., Merk, D., Hiss, J. A. & Schneider, G. Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis. Nat. Mach. Intell. 1, 307–315 (2019).

  38. Dunning, I., Mitchell, S. & O’Sullivan, M. PuLP: A Linear Programming Toolkit for Python (Univ. Auckland, 2011).

  39. Forrest, J. et al. coin-or/Cbc: release releases/2.10.11 (2023); https://zenodo.org/doi/10.5281/zenodo.2720283 (accessed October 2023).

  40. Klotz, E. & Newman, A. M. Practical guidelines for solving difficult linear programs. Surveys Oper. Res. Manag. Sci. 18, 1–17 (2013).

  41. Klotz, E. in Bridging Data and Decisions, INFORMS TutORials in Operations Research (eds Newman, A. & Leung, J.) 54–108 (INFORMS, 2014).

  42. Benders, J. F. Partitioning procedures for solving mixed-variables programming problems. Numer. Math. 4, 238–252 (1962).

  43. Grzybowski, B. A., Badowski, T., Molga, K. & Szymkuć, S. Network search algorithms and scoring functions for advanced-level computerized synthesis planning. WIREs Comput. Mol. Sci. 13, e1630 (2023).

  44. Wen, M. et al. Chemical reaction networks and opportunities for machine learning. Nat. Comput. Sci. 3, 12–24 (2023).

  45. Levin, I., Fortunato, M. E., Tan, K. L. & Coley, C. W. Computer-aided evaluation and exploration of chemical spaces constrained by reaction pathways. AIChE J. 69, e18234 (2023).

  46. Götz, J. et al. High-throughput synthesis provides data for predicting molecular properties and reaction success. Sci. Adv. 9, eadj2314 (2023).

  47. Casetti, N., Alfonso-Ramos, J. E., Coley, C. W. & Stuyver, T. Combining molecular quantum mechanical modeling and machine learning for accelerated reaction screening and discovery. Chem. A Eur. J. 29, e202301957 (2023).

  48. Pasquini, M. & Stenta, M. LinChemIn: Syngraph—a data model and a toolkit to analyze and compare synthetic routes. J. Cheminform. 15, 41 (2023).

  49. Pasquini, M. & Stenta, M. LinChemIn: route arithmetic-operations on digital synthetic routes. J. Chem. Inf. Model. 64, 1765–1771 (2024).

  50. Gao, H. et al. Using machine learning to predict suitable conditions for organic reactions. ACS Central Sci. 4, 1465–1476 (2018).

  51. Coley, C. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 10, 370–377 (2019).

  52. Fromer, J. & Coley, C. coleygroup/sparrow: v1.0.0 (2024); https://zenodo.org/doi/10.5281/zenodo.11068069

Acknowledgements

This work was supported by the DARPA Accelerated Molecular Discovery program (contract no. HR00111920025) and the Office of Naval Research (grant no. N00014-21-1-2195). J.C.F. received additional support from the National Science Foundation Graduate Research Fellowship (grant no. 2141064). We are grateful to M. Stenta, M. Pasquini, D. Jimenez and T. Ziegler for participating in discussions that guided the development of SPARROW. We are also grateful to M. A. McDonald, B. Koscher, R. Canty and the remaining authors of ref. 33 for providing the candidate set for case 2. Finally, we thank B. Mahjour and A. Zhang for providing insight into the validity of reactions and conditions proposed by retrosynthetic software.

Author information

Contributions

C.W.C. and J.C.F. conceptualized the project, validated the method, analyzed results and wrote the paper. J.C.F. curated the data and wrote the software. C.W.C. supervised the work.

Corresponding author

Correspondence to Connor W. Coley.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Mingyue Zheng and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Fig. 1 and Tables 1–4.

Supplementary Data 1

Starting material prices from the ChemSpace API in October 2023 and March 2024 used in the first case study, plotted in Supplementary Fig. 1a.

Supplementary Data 2

Starting material prices from the ChemSpace API in October 2023 and March 2024 used in the second case study, plotted in Supplementary Fig. 1b.

Source data

Source Data Fig. 3

Numerical source data; reaction SMILES, scores and conditions

Source Data Fig. 4

Numerical source data for a–d

Source Data Fig. 5

Reaction SMILES, scores and conditions

Source Data Fig. 6

Reaction SMILES, scores and conditions

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fromer, J.C., Coley, C.W. An algorithmic framework for synthetic cost-aware decision making in molecular design. Nat Comput Sci 4, 440–450 (2024). https://doi.org/10.1038/s43588-024-00639-y

  • DOI: https://doi.org/10.1038/s43588-024-00639-y
