Conformer selection

Purpose

Organic molecules are flexible and able to adopt multiple conformations. These differ in energy due to strain and subtle changes in intramolecular atomic interactions. This benchmark evaluates the MLIP’s ability to identify the most stable conformers within an ensemble of flexible organic molecules and accurately predict their relative energy differences. It focuses on capturing subtle intramolecular interactions and strain effects that influence conformational energies. These metrics assess both numerical accuracy and the MLIP’s ability to preserve relative conformer energetics, which is critical for downstream applications like conformational sampling and ranking.

Description

For each system, the benchmark leverages the mlip library for model inference, comparing the predicted energies and forces against quantum mechanical QM reference data. Performance is quantified using the following metrics:

  • MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) for total energies (in kcal/mol)

  • Spearman rank correlation coefficient for conformer energy ordering

Dataset

The Wiggle150 [1] dataset of highly strained conformers, contains 50 conformers for each of three representative drug-like molecules: Adenosine, Benzylpenicillin, and Efavirenz.

../../_images/ado00.png

Adenosine

../../_images/bpn00.png

Benzylpenicillin

../../_images/efa00.png

Efavirenz

Interpretation

This benchmark assesses the numerical accuracy and the ability to preserve relative conformer energies of the MLIP’s energy inference method. This is critical for downstream applications like conformer sampling and ranking. The MAE and RMSE of the energy inference should be as low as possible and match the expectations on accuracy of the MLIP during training and testing. Since the energy differences in this dataset are rather large, the Spearman correlation should be close to 1.

References