Stability testing¶

Purpose¶

To assess the long-term dynamical stability of a machine-learned interatomic potential (MLIP) during realistic, molecular dynamics (MD) simulations.

Description¶

For each system in the dataset, the benchmark performs a MD simulation using the MLIP model in the NVT ensemble at 300 K for 100,000 steps (100 ps), leveraging the jax-md, as integrated via the mlip library. The test monitors the system for signs of instability by detecting abrupt temperature spikes (explosions) and hydrogen atom drift. These indicators help determine whether the MLIP maintains stable and physically consistent dynamics over simulation times.

Our stability score is computed as:

\[\begin{split}S = \begin{cases} \tfrac12\,\dfrac{fₑ}{N}, & fₑ < N \quad(\text{explosion})\\[6pt] 0.5 + \tfrac12\,\dfrac{fₕ}{N}, & fₑ = N, fₕ < N \quad(\text{H loss})\\[6pt] 1.0, & fₑ = N, fₕ = N \quad(\text{perfect stability}) \end{cases}\end{split}\]

where N is the number of frames in the simulation, fₑ the frame at which the simulation explodes and fₕ, the frame at which the first H atom detaches. We consider a bond to be broken if the H atom’s distance to its bonded atom exceeds 2.5 Angstrom.

Dataset¶

The stability dataset is composed of a series of small molecule and protein systems. Some systems are solvated, others in vacuum. The systems are the following:

Small molecule (HCNO-only) in vacuum

Small molecule containing Sulfur in vacuum

Small molecule containing Halogens in vacuum

Peptide (Neurotensin) in vacuum

Peptide (Oxytocin - contains Sulfur) in vacuum

Large protein (1A7M) in vacuum

Peptide (Neurotensin) solvated with water and counter-ions

Peptide (Oxytocin) solvated with water

The selection ensures that the benchmark systems are representative of the different types of systems that can be encountered in practice.

Interpretation¶

The stability score is a measure of the stability of the MLIP model. A score of 1.0 indicates perfect stability, a score of 0.0 indicates complete instability with respect to the benchmark systems.