Model fine-tuning
Note
Currently, fine-tuning is only available for MACE models.
Warning
If the downstream dataset contains fewer elements than the training set, you might run into silent inconsistencies where some atomic numbers are mapped to the wrong species indices. Starting from mlip 0.1.4, you can (and should) pass the .dataset_info of a trained ForceField to GraphDatasetBuilder when preparing data for downstream tasks.
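For example, a minimal sketch of this (the two helpers shown are hypothetical placeholders in the same style as the examples below; only the .dataset_info attribute is part of the API described here):
pretrained_force_field = _load_pretrained_force_field()  # placeholder
graph_dataset_builder = _set_up_graph_dataset_builder(  # hypothetical placeholder
    dataset_info=pretrained_force_field.dataset_info,
)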
In the following, we describe how to fine-tune an MLIP model with this library. A common use case is fine-tuning a pre-trained MLIP model on additional data to improve its accuracy for specific types of chemical systems.
We recall that an MLIP model can be trained using multiple read-out heads. Note that currently, this is only implemented for the MACE architecture. The number of read-out heads can be set via num_readout_heads in MaceConfig.
By default, one trains a model with only one read-out head. However, it does not matter for this fine-tuning step how many read-out heads a model already has: it can be fine-tuned by adding more heads and optimizing only their associated weights.
The final energy prediction of a model is obtained by summing the outputs
of the N read-out heads.
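Written out explicitly, for a model with N read-out heads the predicted energy is simply E_total = E_head_1 + E_head_2 + ... + E_head_N. This is why additional heads whose parameters are initialized close to zero leave the pre-trained predictions essentially unchanged at the start of fine-tuning (see the discussion of the scale factor below).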
To fine-tune a given model, set up the new model with at least one more read-out head than the pre-trained model.
from mlip.models import Mace, ForceField
pretrained_model_params = _get_params_for_pretrained_model() # placeholder
# Make sure the new model you create has at least one more read-out head
mace = Mace(Mace.Config(num_readout_heads=2), dataset_info)
initial_force_field = ForceField.from_mlip_network(mace)
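The two placeholders above (pretrained_model_params and dataset_info) would typically come from the pre-trained model itself; a minimal sketch, assuming the pre-trained model is available as a ForceField object:
# Minimal sketch: take the parameters and dataset info directly from the
# pre-trained force field (how that force field is loaded is not shown here)
pretrained_force_field = _load_pretrained_force_field()  # placeholder
pretrained_model_params = pretrained_force_field.params
dataset_info = pretrained_force_field.dataset_info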
Now, we can transfer the pre-trained parameters into the new parameter object by using the function transfer_params():
from mlip.models.params_transfer import transfer_params
transferred_params, finetuning_blocks = transfer_params(
pretrained_model_params,
initial_force_field.params,
scale_factor=0.1,
)
As shown above, you have the option to rescale the randomly initialized parameters of the additional heads by setting the keyword argument scale_factor accordingly. Rescaling the yet untrained parameters to values close to zero can aid model learning, as it initializes the model to be close to the pre-trained model at the start of the training. In our initial testing, we found that models trained with a scale factor of 1.0 could sometimes not be optimized back to the quality of the pre-trained model. At the same time, a scale factor of 0.0 keeps all untrained weights at zero at the start of the training, which prevents proper gradient flow. Therefore, we recommend applying a non-zero scale factor of about 0.1 or lower, which worked well in our initial tests. However, we also encourage users to experiment with this hyperparameter themselves.
The resulting transferred_params have the shape of your new model, but the new heads are not yet optimized; the other parameters are taken from the pre-trained model. The second output of the function, finetuning_blocks, is a list of the module names inside the parameters that correspond to the blocks of untrained parameters. This list will be needed in the subsequent step.
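For example, printing this list is a quick sanity check of which blocks will actually be optimized (the exact module names depend on the model configuration):
# finetuning_blocks names the parameter blocks of the newly added read-out head(s)
print(finetuning_blocks)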
In the final step of preparing a model fine-tuning, we need to mask the optimizer so that it only updates the untrained parameters. This can be easily done with the utility function mask_optimizer_for_finetuning():
from mlip.training.finetuning_utils import mask_optimizer_for_finetuning
optimizer = _set_up_optimizer_like_for_normal_model_training() # placeholder
masked_optimizer = mask_optimizer_for_finetuning(
optimizer, transferred_params, finetuning_blocks
)
# Go on to set up a normal training with the masked_optimizer
# and the transferred_params
Subsequently, fine-tuning this model works exactly like the normal model training. All code can be reused. Creating a force field that is needed for training with the transferred parameters works like this:
from mlip.models import ForceField
force_field = ForceField(initial_force_field.predictor, transferred_params)
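The remaining setup then follows the regular training workflow; a minimal sketch in the same placeholder style as above (the helper shown here is hypothetical):
# Hypothetical placeholder for the usual training run; only the force field and
# the optimizer differ from a regular (non-fine-tuning) training setup
_run_normal_model_training(force_field, masked_optimizer)  # placeholder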
To summarize, only four additional steps are required for fine-tuning compared to a regular model training:

1. Load the original pre-trained model parameters and set up a new model that has the same configuration but with one or more additional read-out heads.
2. Transfer the parameters using the function transfer_params().
3. Mask the optimizer using the function mask_optimizer_for_finetuning().
4. Make sure the dataset_info object is used consistently across the pre-trained ForceField and the GraphDatasetBuilder.
Additional note: When fine-tuning on datasets that are quite different from the original dataset on which the pre-trained model was trained, we recommend adding a subset of the original dataset to the dataset used for fine-tuning. The proportion of original data added relative to the new data points (e.g., a 50:50 or 90:10 ratio) is a hyperparameter to experiment with, and the optimal choice may depend on how chemically different the new data is from the original data.
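As an illustration, here is a generic sketch of how such a mixed fine-tuning set could be assembled by subsampling the original structures at the desired ratio (this is plain Python and not tied to any particular dataset API of this library):
import random

def mix_datasets(new_structures, original_structures, original_fraction=0.1, seed=0):
    # Combine the new fine-tuning structures with a random subset of the original data.
    # original_fraction is the share of original data in the mixed dataset, e.g.
    # 0.5 for a 50:50 split or 0.1 for a 90:10 split in favour of the new data;
    # it must be strictly smaller than 1.0.
    rng = random.Random(seed)
    num_original = int(len(new_structures) * original_fraction / (1.0 - original_fraction))
    num_original = min(num_original, len(original_structures))
    mixed = list(new_structures) + rng.sample(original_structures, num_original)
    rng.shuffle(mixed)
    return mixed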