
Mace Refactor #99

Draft
pdobbelaere wants to merge 20 commits into molmod:main from pdobbelaere:mace

@pdobbelaere (Collaborator) commented Mar 30, 2026

This PR reimplements the psiflow MACE interface to reduce coupling with external libraries. It updates to MACE v0.3.15 and a recent PyTorch version. It also removes the Model abstraction, since MACE is currently the only implementation; the abstraction can be reintroduced later.

USAGE

While the MACE class has been rewritten, most of the interface should feel similar and intuitive. Users should only interact with the public class and instance methods (see here), e.g.:

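```python
from psiflow.models import MACE

# create a new MACE folder from a training config, or load an existing one
mace = MACE.create(new_path, config)
# or
mace = MACE.load(existing_path)

# configure atomic energies and MACE training options
mace.add_atomic_energy("Uo", -68.999)
mace.update_kwargs(max_num_epochs=67, default_dtype="float64")

# train on the training and validation data
mace.train(train, val)

hamiltonian = mace.create_hamiltonian()
```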
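To picture what the two configuration calls in the usage example do to the stored training setup, here is a toy stand-in. It is not the psiflow implementation: the class name `ToyMACEConfig` is hypothetical, and the assumption that `update_kwargs` merges options (newest value wins) rather than replacing them wholesale is ours. The energy value is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyMACEConfig:
    """Hypothetical stand-in for the training setup a MACE folder keeps.

    It only mirrors add_atomic_energy/update_kwargs from the usage example.
    """

    kwargs: dict[str, Any] = field(default_factory=dict)
    atomic_energies: dict[str, float] = field(default_factory=dict)

    def add_atomic_energy(self, element: str, energy: float) -> None:
        # One reference energy per chemical symbol; a later call overwrites it.
        self.atomic_energies[element] = float(energy)

    def update_kwargs(self, **kwargs: Any) -> None:
        # Assumed merge semantics: existing options are updated, new ones added.
        self.kwargs.update(kwargs)


config = ToyMACEConfig(kwargs={"max_num_epochs": 100})
config.add_atomic_energy("O", -432.0)  # placeholder value
config.update_kwargs(max_num_epochs=67, default_dtype="float64")
print(config.kwargs)  # {'max_num_epochs': 67, 'default_dtype': 'float64'}
```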

A MACE instance occupies a folder on disk in which it stores all information (models, checkpoints, logs), structured as follows:

```
some_mace_folder
├── checkpoints
│   └── ...
├── logs
│   └── ...
├── results
│   └── ...
├── config.yaml
├── last.model
└── last.pt
```
which is a slight modification of the default MACE folder structure. Only config.yaml, last.model and last.pt are psiflow files; everything else is standard MACE.
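Since only three entries in the folder belong to psiflow, a quick way to tell the two kinds of output apart is to split the folder listing by name. This is a small illustrative sketch, not a psiflow utility; the function name `split_mace_folder` is hypothetical and the file names come from the layout above.

```python
import tempfile
from pathlib import Path

# Files psiflow itself writes into a MACE folder; everything else is MACE output.
PSIFLOW_FILES = ("config.yaml", "last.model", "last.pt")


def split_mace_folder(root: Path) -> tuple[list[str], list[str]]:
    """Return (psiflow-managed entries, MACE-managed entries) found in root."""
    psiflow, mace = [], []
    for entry in sorted(root.iterdir()):
        (psiflow if entry.name in PSIFLOW_FILES else mace).append(entry.name)
    return psiflow, mace


# Build the layout above in a temporary directory and classify it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for sub in ("checkpoints", "logs", "results"):
        (root / sub).mkdir()
    for name in PSIFLOW_FILES:
        (root / name).touch()
    psiflow_entries, mace_entries = split_mace_folder(root)

print(psiflow_entries)  # ['config.yaml', 'last.model', 'last.pt']
print(mace_entries)     # ['checkpoints', 'logs', 'results']
```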

Some notable changes:

  • training runs are restarted from the latest checkpoint (last.pt) by default
  • the MACE instance root dir is updated automatically after training (no manual saving)
  • training runs are numbered, and earlier models/checkpoints are kept on disk
  • atomic energies are piped directly into the atomic_energies layer of the MACE model; make sure to define them for every element in your training set
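Because atomic energies now go straight into the model, a missing element is worth catching before training starts. The sketch below is a hypothetical pre-flight check, not part of psiflow: it assumes the training set is available as plain lists of chemical symbols (one list per structure), and the energies are placeholders.

```python
from collections.abc import Iterable


def missing_atomic_energies(
    atomic_energies: dict[str, float],
    training_set: Iterable[Iterable[str]],
) -> set[str]:
    """Return chemical symbols that occur in training_set but have no energy.

    training_set is a hypothetical stand-in: one list of symbols per structure.
    """
    present = {symbol for structure in training_set for symbol in structure}
    return present - set(atomic_energies)


energies = {"H": -13.6, "O": -432.0}  # placeholder values
structures = [["O", "H", "H"], ["C", "O", "O"]]
print(sorted(missing_atomic_energies(energies, structures)))  # ['C']
```

An empty result means every element in the training data has a reference energy.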

Hamiltonian

The MACEHamiltonian/MACEFunction implementation now relies on the MACE ASE calculator (MACECalculator). To access newer MACE inference features (multi-head evaluation, cuEquivariance, ...), check which options the latest MACECalculator implementation supports. You can pass these options to the MACEHamiltonian:

```python
hamiltonian = MACEHamiltonian(path, kwargs={"head": "less"})
# or
hamiltonian = MACEHamiltonian(path)
hamiltonian.update_kwargs(enable_cueq=True)
```

Foundation models can be used through the MACEHamiltonian.from_foundation() method.

INSTALLATION

Many packages and dependencies have been updated since v4.0.0-final. An updated installation script is provided in psiflow/install_local.sh; it may need minor tweaks to work on your system.

OTHER CHANGES

Some Function subclasses have been slightly updated, but this does not impact usage. The EinsteinCrystal and Harmonic hamiltonians have minor API changes.

KNOWN LIMITATIONS

MACE models are not stateless (ACEsuit/mace#1415), so checkpoints are not truly interchangeable between models. If training fails (e.g., because it hits the walltime), the checkpoints of that run are kept on disk, but there is no correct model to load them into.

new interface to train MACE models
- all MACE CLI options should be available
- atomic energies are passed straight into the MACE model (all elements are required to work)
- the training dir is more important, and is updated automatically (no more save model)
- we keep all MACE output (log files, older models/checkpoints)
- training runs automatically restart from the previous model
- work with the classmethods
- remove atomic_energies
- rely on MACECalculator
- hamiltonians are now immutable
- removed excess future creation in .parameters() calls
- changed the API for EinsteinCrystal and Harmonic
- equality comparisons will break if instance attributes are futures
mostly changes to EinsteinCrystal instance creation
the 'inputs' kwarg set by partial was overwritten by later code; changed to the 'wait_for' keyword
we use the torchrun launcher, which limits us to single-node training
'restart_latest' is really meant to continue an unfinished training run
with 'foundation_model' you can start a new training run from an older model
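The distinction between the last two options can be sketched as a small decision helper. The helper name `next_run_kwargs` and its logic are hypothetical illustrations of the two commit notes above, not psiflow's code; `restart_latest` and `foundation_model` are the MACE training options mentioned there, and using `last.model` as the foundation model is our assumption based on the folder layout.

```python
import tempfile
from pathlib import Path


def next_run_kwargs(root: Path, continue_unfinished: bool) -> dict[str, object]:
    """Hypothetical choice of MACE options for the next training run.

    - continue_unfinished=True: resume an interrupted run (restart_latest).
    - otherwise: start a fresh run, initialised from the previous model
      via foundation_model if one exists on disk.
    """
    if continue_unfinished:
        return {"restart_latest": True}
    previous = root / "last.model"
    if previous.exists():
        return {"restart_latest": False, "foundation_model": str(previous)}
    return {"restart_latest": False}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = next_run_kwargs(root, continue_unfinished=False)  # no model yet
    (root / "last.model").touch()  # pretend an earlier run finished
    resumed = next_run_kwargs(root, continue_unfinished=True)
    follow_up = next_run_kwargs(root, continue_unfinished=False)

print(first)    # {'restart_latest': False}
print(resumed)  # {'restart_latest': True}
```

Keeping the two paths separate reflects the known limitation above: resuming only makes sense for the run that produced the checkpoint, while a new run should start from a finished model.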