Improving the odds of synthetic chemistry success

Thursday, 01 August, 2019

University of Utah chemists Jolene Reid and Matthew Sigman have shown how analysing previously published chemical reaction data can predict how hypothetical reactions may proceed, narrowing the range of conditions chemists need to explore. Their algorithmic prediction process, which includes aspects of machine learning and has been described in the journal Nature, can save valuable time and resources in chemical research.

Chemistry is more than just mixing compound A with compound B to make compound C. There are catalysts that affect the reaction rate, as well as the physical conditions of the reaction and any intermediate steps that lead to the final product. If you’re trying to develop a new chemical process for, say, pharmaceutical or materials research, you need to find the best setting for each of these variables.

It’s a time-consuming trial-and-error process — one that chemists have previously tried to avoid by looking up a similar reaction and mimicking the same conditions. But according to Sigman, “Almost every time, at least in my experience, it doesn’t work well. So then you systematically change the conditions.”

Sigman estimates there are around seven to 10 variables in a typical pharmaceutical reaction, meaning the number of possible combinations of conditions soon becomes overwhelming. He said, “You cannot cover all of this variable space with any type of high throughput operation. We’re talking billions of possibilities.”
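A quick back-of-the-envelope calculation shows how fast the possibilities multiply: the number of distinct condition sets is the product of the number of options for each variable. The figure of 10 options per variable used below is an assumption chosen for round numbers, not a number from Sigman or the study.

```python
from math import prod

def count_combinations(option_counts):
    """Number of distinct condition sets: the product of the options per variable."""
    return prod(option_counts)

# Illustrative assumption: every variable has 10 candidate settings.
for n_variables in (7, 10):
    total = count_combinations([10] * n_variables)
    print(f"{n_variables} variables -> {total:,} combinations")
```

With ten options each, seven variables already give 10 million combinations and ten variables give 10 billion, the scale Sigman describes. Real option counts differ from variable to variable, so this is only an order-of-magnitude sketch.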

Sigman and Reid wanted a way to narrow the focus to a more manageable range of conditions. As a test case, they looked at reactions whose products can exist in two mirror-image forms, in the same way your right and left hands are mirror images of each other, and that favour one form over the other. Such a reaction is called ‘enantioselective’.

Reid collected published scientific reports of 367 reactions involving imines, compounds that contain a carbon–nitrogen double bond, and used machine learning algorithms to correlate features of the reactions with how strongly they favoured one mirror-image form of the product over the other. The algorithms looked at the reactions’ catalysts, solvents and reactants, and constructed mathematical relationships between those properties and the final selectivity of the reaction.
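The article does not spell out the model itself. The sketch below assumes one common approach to this kind of problem: express selectivity as a free-energy difference, ΔΔG‡ = RT ln(er), where er is the ratio of the major to the minor mirror-image product, and fit it as a linear function of numerical descriptors of the catalyst, solvent and reactants using ordinary least squares. Plain linear regression here stands in for whatever machine learning methods the study actually used, and the descriptor values, ratios and helper names (ddg_from_er, fit_linear, predict) are hypothetical, made up only to show the mechanics.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)

def ddg_from_er(er, temp_k=298.15):
    """Convert an enantiomeric ratio (major/minor, e.g. 95:5 -> 19)
    into a free-energy difference in kJ/mol: ddG = RT ln(er)."""
    return R * temp_k * math.log(er)

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_linear(features, targets):
    """Ordinary least squares with an intercept via the normal equations
    (X^T X) w = X^T y. Returns [intercept, w1, w2, ...]."""
    X = [[1.0] + list(row) for row in features]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, row):
    """Apply the fitted linear model to one row of descriptors."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))

# Hypothetical training data: two made-up descriptors per reaction
# (say, a catalyst size parameter and a solvent polarity value)
# and an observed enantiomeric ratio. NOT data from the study.
descriptors = [(1.2, 0.3), (1.8, 0.5), (2.5, 0.2), (3.1, 0.9), (3.6, 0.4)]
ratios = [3.0, 6.0, 15.0, 20.0, 40.0]

weights = fit_linear(descriptors, [ddg_from_er(er) for er in ratios])

# Score an unseen (hypothetical) catalyst/solvent pair, then convert back to a ratio.
new_ddg = predict(weights, (2.8, 0.6))
print(f"predicted ddG = {new_ddg:.2f} kJ/mol, er ~ {math.exp(new_ddg / (R * 298.15)):.1f}")
```

Working in ΔΔG‡ rather than in the raw ratio is a deliberate choice in this sketch: energy differences tend to combine additively, which suits a linear model, and the exponential converts the prediction back into a ratio a chemist would recognise. Once fitted, the same weights can score reactant and catalyst combinations that never appeared in the training data.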

“There’s a pattern hidden beneath the surface of why it works and doesn’t work with this condition, this catalyst, this substrate and so on,” Sigman said.

“The key to our success is that we use information from many reactions,” Reid added.

Reid and Sigman’s model successfully predicted the outcomes of 15 reactions involving a reactant that wasn’t in the original set, and the outcomes of 13 reactions in which both a reactant and the catalyst type were absent from the original set. They also looked at a study that had run 2,150 experiments to find the optimal conditions for 34 reactions, and the model reached the same results and picked the same optimal catalyst.

Reid looks forward to applying the model to predicting reactions involving large, complex molecules, saying, “Often you find that new methodologies aren’t fine-tuned to complex systems. Possibly we could do that now by predicting beforehand the best kind of catalyst.”

Sigman added that predictive models can lower the barriers to new drug development, noting, “The pharmaceutical industry doesn’t want to invest money into something that they don’t know if it’s going to work. So if you have an algorithm that suggests this has a high probability of working, you ease the pain.”
