ESR3 Prediction of chemical synthesis using NLP models

Updated: almost 3 years ago
Location: Germany
Job Type: Full-time
Deadline: 31 Jul 2021

Advanced machine learning for Innovative Drug Discovery (AIDD) Project (http://ai-dd.eu):

Machine learning is changing our society, as exemplified by speech and image recognition applications. The life sciences are also changing rapidly through the use of artificial intelligence, and fields like drug development are expected to benefit from machine learning. The main goal of the AIDD project is to train the next generation of scientists, who need skills in both machine learning and drug discovery and who will, after graduating, be able to help speed up the drug development process. The European Marie Skłodowska-Curie Innovative Training Network funds the AIDD project, which brings together twelve academic partners (Helmholtz Zentrum München (coordinator), Germany; Aalto University, Finland; Freie Universität Berlin, Germany; Katholieke Universiteit Leuven, Belgium; Johannes Kepler Universität Linz, Austria; The Swiss AI Lab IDSIA, Switzerland; TU Dortmund, Germany; Universiteit Leiden, Netherlands; Université du Luxembourg, Luxembourg; University of Vienna, Austria; Universitat Pompeu Fabra, Spain; and Vancouver Prostate Center, University of British Columbia, Canada) as well as four industrial partners (AstraZeneca, Sweden; Bayer Aktiengesellschaft, Germany; Janssen Pharmaceutica NV, Belgium; and Enamine Limited Liability Company, Ukraine).

The AIDD network offers 15 PhD fellowships. The fellows will be supervised by academics who have strong technical expertise and have contributed to some of the fundamental AI algorithms that are used billions of times each day around the world, and by machine learning scientists working at pharmaceutical companies. The methods developed by the fellows will contribute to an integrated "One Chemistry" model that can predict outcomes ranging from different properties to molecule generation and synthesis. The network will offer comprehensive, structured training through a well-elaborated curriculum, online courses, and six schools.

Each fellow will conduct research for 1.5 years at an academic partner and 1.5 years at an industrial partner.

Description of the ESR3 position:

Chemical synthesis is critical to further improving quality of life by contributing to new medicines and new materials. An optimal synthesis can reduce costs as well as the amount of chemical waste produced. Predicting the outcome of a direct reaction (which new chemical compound results from mixing a set of reactants) or of a retrosynthesis (which compounds are the starting materials needed to make a given product) is the cornerstone of chemical synthesis. The ESR3 fellow will develop a new method, based on the preliminary results [1,2], to predict the outcome of reactions. The goal is to extend the published models by incorporating additional information about experiments (reagents, catalyst, solvent, temperature, etc.) and expert knowledge. The fellow will actively collaborate with ESR13 (QM models for reactivity prediction), ESR4 (prediction of the yield of chemical reactions), and ESR7 (multi-objective synthesis planning), and will develop a solid theoretical foundation as well as practical intuition for how additional data and knowledge can improve the models.

Relevant references:

  • Karpov P., Godin G., Tetko I.V.: A Transformer Model for Retrosynthesis. In: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions: 17th-19th September 2019; Munich. Springer International Publishing: 817-830.
  • Tetko I.V., Karpov P., Van Deursen R., Godin G.: State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nat Commun 2020, 11(1):1-11.

