Design and Experimental Validation of a Photocatalyst Recommender Based on a Large Language Model

Abstract
Drawing on an extensive library of literature on photocatalytic transformations, we report a machine learning (ML) model that recommends the photocatalysts most suitable for a reaction of interest. The model is trained on more than 36 000 literature examples and uses an architecture inspired by the Bidirectional Encoder Representations from Transformers (BERT) large language model. Under cross-validation, it suggests the “correct” photocatalysts with ∼90% accuracy. When tested experimentally on five out-of-box reactions, the algorithm consistently suggested photocatalysts that gave yields competitive with those chosen by human researchers, and it frequently suggested alternative photocatalysts that are potentially more appealing than the originally selected one. Altogether, this platform serves as a valuable tool for researchers undertaking reaction optimization programs. The model is free to use at https://photocatals.grzybowskigroup.pl/predict/.

Choosing a photocatalyst for a given reaction can be challenging because of the complex mechanisms and multiple parameters that govern the outcome of a photocatalyzed reaction. Herein, we disclose an ML model, available through an online portal, that suggests catalysts for a given reaction. The model was experimentally validated on five photocatalytic reactions, in all cases suggesting productive photocatalysts. This model serves as a valuable tool for researchers optimizing photocatalytic reactions.
Citation
Angew. Chem. Int. Ed. 2026, 65, e14544. https://doi.org/10.1002/anie.202514544