AI in Chemistry: Study Highlights Strengths and Weaknesses

Computing power in the chemistry lab: Kevin Jablonka (left) and his team at HIPOLE Jena. Photo: Renzo Paulus

How well does artificial intelligence perform compared to human experts? A research team at HIPOLE Jena set out to answer this question in the field of chemistry. Using a newly developed evaluation method called “ChemBench,” the researchers compared the performance of modern language models such as GPT-4 with that of experienced chemists. 

The study has recently been published in the journal Nature Chemistry (DOI 10.1038/s41557-025-01815-x).

The evaluation covered more than 2,700 chemistry tasks from research and education, ranging from fundamental knowledge to complex problems. In areas such as reaction prediction or the analysis of large datasets, the AI models often performed well and worked efficiently. However, a critical weakness became apparent: the models gave confident answers even when those answers were factually incorrect. Human chemists, by contrast, were more cautious and questioned their own assessments.

“Our study shows that AI can be a valuable tool—but it is no substitute for human expertise,” says Dr. Kevin M. Jablonka, lead author of the study. The findings offer important insights for the responsible use of AI in chemical research and education.

HIPOLE Jena (Helmholtz Institute for Polymers in Energy Applications Jena) is an institute of Helmholtz-Zentrum Berlin (HZB) in cooperation with Friedrich Schiller University Jena (FSU Jena).

ma
