Calculating the "fingerprints" of molecules with artificial intelligence

The graph neural network (GNN) receives small molecules as input and is tasked with determining their spectral responses. By comparing its predictions with known spectra, the GNN programme learns to calculate spectra reliably. © K. Singh, A. Bande/HZB

With conventional methods, calculating the spectral fingerprint of larger molecules is extremely time-consuming, yet it is a prerequisite for correctly interpreting experimentally obtained data. Now, a team at HZB has achieved very good results in significantly less time using self-learning graph neural networks.

"Macromolecules, but also quantum dots, which often consist of thousands of atoms, can hardly be calculated in advance using conventional methods such as DFT," says PD Dr. Annika Bande at HZB, referring to density functional theory. With her team, she has now investigated how the computing time can be shortened by using methods from artificial intelligence.

The idea: a computer programme from the family of "graph neural networks" (GNNs) receives small molecules as input and is tasked with determining their spectral responses. In the next step, the GNN programme compares the calculated spectra with the known target spectra (DFT or experimental) and corrects its calculation accordingly. Round after round, the result improves. In this way, the GNN programme learns on its own, with the help of known spectra, how to calculate spectra reliably.
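The predict, compare and correct cycle described here can be illustrated with a deliberately tiny sketch. It is not the team's code and it does not implement SchNet: the molecules, atom features and three-bin "spectra" are invented toy values, the single message-passing step simply adds neighbouring atoms' features, and the helper names (`message_pass`, `embed`, `predict`, `train`) are hypothetical. Only the final linear readout is trained, whereas a real GNN also learns its message-passing layers.

```python
import random

# Toy molecules as graphs: per-atom feature vectors plus a bond list.
# Features are one-hot atom types [type A, type B]; all values are invented.
MOLECULES = [
    ([[1.0, 0.0], [0.0, 1.0]], [(0, 1)]),
    ([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [(0, 1), (1, 2)]),
    ([[0.0, 1.0]], []),
]
# Invented three-bin target spectra, standing in for DFT or experimental data.
TARGETS = [[0.2, 0.6, 0.6], [0.5, 0.6, 1.2], [0.0, 0.3, 0.1]]


def message_pass(features, bonds):
    """One round of message passing: each atom adds its neighbours' features."""
    updated = [list(f) for f in features]
    for i, j in bonds:
        for k in range(len(features[i])):
            updated[i][k] += features[j][k]
            updated[j][k] += features[i][k]
    return updated


def embed(features, bonds):
    """Sum the updated atom features into one molecule-level vector."""
    atoms = message_pass(features, bonds)
    return [sum(a[k] for a in atoms) for k in range(len(atoms[0]))]


def predict(weights, x):
    """Linear readout: one predicted intensity per spectral bin."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(pred, target):
    """Mean squared error between predicted and target spectrum."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def train(dataset, n_bins, n_feat, epochs=300, lr=0.01, seed=0):
    """Predict, compare with the target, correct the weights; repeat."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_feat)]
               for _ in range(n_bins)]
    history = []  # mean loss per round, to watch the result improve
    for _ in range(epochs):
        total = 0.0
        for x, target in dataset:
            pred = predict(weights, x)          # calculate a spectrum
            total += mse(pred, target)          # compare with known spectrum
            for b in range(n_bins):             # gradient-descent correction
                scale = 2.0 * (pred[b] - target[b]) / n_bins
                for k in range(n_feat):
                    weights[b][k] -= lr * scale * x[k]
        history.append(total / len(dataset))
    return weights, history


DATASET = [(embed(f, b), t) for (f, b), t in zip(MOLECULES, TARGETS)]

if __name__ == "__main__":
    _, history = train(DATASET, n_bins=3, n_feat=2)
    print("mean loss in first round:", history[0])
    print("mean loss in last round: ", history[-1])
```

Run as a script, it prints the mean error in the first and the last training round, which shows the "round after round" improvement on this toy data. Production models replace the hand-written summation and gradient step with learned layers and automatic differentiation.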

"We have trained five newer GNNs and found that enormous improvements can be achieved with one of them, the SchNet model: The accuracy increases by 20% and this is done in a fraction of the computation time," says first author Kanishka Singh. Singh participates in the HEIBRiDS graduate school and is supervised by two experts from different backgrounds: computer science expert Prof. Ulf Leser from Humboldt University Berlin and theoretical chemist Annika Bande.

"Recently developed GNN frameworks could do even better," she says. "And the demand is very high. We therefore want to strengthen this line of research and are planning to create a new postdoctoral position for it from summer onwards as part of the Helmholtz project 'eXplainable Artificial Intelligence for X-ray Absorption Spectroscopy'."

Annotation:

The work was carried out within the framework of the HEIBRiDS graduate school and is being supported by the Helmholtz project "eXplainable Artificial Intelligence for X-ray Absorption Spectroscopy" (XAI-4-XAS).

The core of the project is to extend GNNs, as used at HZB, to very large molecules in combination with the probabilistic analysis of molecular motifs developed at HEREON. This analysis is used to capture only the relevant part of the configuration phase space of the molecules, which is necessary for the accurate prediction of X-ray spectra. The results of the ML predictions allow a rigorous interpretation of XAS experiments, so that characteristic parts of the spectrum of an extended material can be assigned one-to-one to its specific structural subgroups.

arö
