Yes — for your project, that quote strongly supports a
controlled candidate-selection strategy, not aggressive hyperparameter optimization.
And yes, I would phrase the workflow like this:
In 03b / 03c
You do not make the final scientific choice there.
Those notebooks are for:
- training candidate models,
- checking that training is healthy,
- collecting train/valid metrics,
- rejecting only clearly bad or broken models.
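As a rough sketch, the 03b/03c output could be a small candidate pool like the following. The `Candidate` record, its field names, and all numbers are invented for illustration only:

```python
# Hypothetical sketch of collecting train/valid metrics per candidate
# in 03b/03c; no final scientific choice is made here.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str            # e.g. "ACE_A"
    train_rmse_f: float  # force RMSE on the training set (eV/Å)
    valid_rmse_f: float  # force RMSE on the validation set (eV/Å)
    train_ok: bool       # did training converge without diverging?

def candidate_pool(candidates):
    """Keep every candidate whose training is healthy; selection happens later."""
    return [c for c in candidates if c.train_ok]

pool = candidate_pool([
    Candidate("ACE_A", 0.045, 0.052, True),
    Candidate("ACE_B", 0.038, 0.049, True),
    Candidate("ACE_C", 0.030, 0.061, False),  # training diverged -> rejected
])
print([c.name for c in pool])  # -> ['ACE_A', 'ACE_B']
```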
In 05
You make the real model-selection decision based on validation relevant to your actual purpose:
- amorphous workflow stability,
- transferability,
- RDF / structure sanity,
- defect / elastic sanity if included,
- NVE stability,
- runtime.
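One hedged way to picture that Notebook-05 decision is a weighted score over the checks above. The criterion names and weights below are pure placeholders, not a recommended scheme:

```python
# Illustrative sketch of the Notebook-05 selection step; all names
# and weights are assumptions, not part of any real workflow.

def select_production_model(results, weights=None):
    """Pick the candidate with the best weighted score across validation checks.

    `results` maps candidate name -> dict of normalized scores in [0, 1]
    (higher is better) for each criterion.
    """
    weights = weights or {"stability": 0.4, "rdf": 0.3, "nve": 0.2, "speed": 0.1}

    def score(name):
        return sum(weights[k] * results[name][k] for k in weights)

    return max(results, key=score)

results = {
    "ACE_A": {"stability": 0.9, "rdf": 0.8, "nve": 0.9, "speed": 0.9},
    "ACE_B": {"stability": 0.8, "rdf": 0.9, "nve": 0.8, "speed": 0.7},
}
print(select_production_model(results))  # -> 'ACE_A'
```

In practice you would likely also inspect the individual checks rather than trusting a single scalar score, but a scored table keeps the comparison explicit.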
So yes:
03b and 03c are not where the final model is chosen.
They are where the candidate pool is created and filtered.
That matches the ACE paper very well. In the paper, they trained four ACE parametrizations with increasing basis size at fixed cutoff 7.6 Å, observed that larger models improved training accuracy but worsened test behavior and speed, and therefore selected the smallest 514-basis model because transferability and evaluation cost mattered more for their use case.
That is almost exactly the philosophy you should use.
What this means for ACE in Notebook 03c
I would not do broad “optimization” in the usual ML sense. Instead, I would do a small structured sweep inspired directly by the paper:
Fixed
- same dataset split
- same cutoff, at least initially: 7.6 Å
- same fitting workflow
- same training/test protocol
Varied
- basis size / model complexity
So the first ACE candidate set should be something like:
| candidate | cutoff | basis size idea | purpose |
|---|---|---|---|
| ACE_A | 7.6 Å | small | transferability baseline |
| ACE_B | 7.6 Å | medium | test added flexibility |
| ACE_C | 7.6 Å | larger | test overfitting trend |
| ACE_D | 7.6 Å | very large, optional | only if compute allows |
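The sweep above can be written down as a small config sketch. The field names (`cutoff`, `basis_size`) are illustrative placeholders, not actual fitting-code keys:

```python
# Hypothetical candidate definitions for the 03c sweep: fixed cutoff,
# varied basis size, exactly as in the paper-inspired plan.
CUTOFF = 7.6  # Å, held fixed across the whole sweep

ACE_CANDIDATES = {
    "ACE_A": {"cutoff": CUTOFF, "basis_size": "small"},
    "ACE_B": {"cutoff": CUTOFF, "basis_size": "medium"},
    "ACE_C": {"cutoff": CUTOFF, "basis_size": "larger"},
    "ACE_D": {"cutoff": CUTOFF, "basis_size": "very large"},  # optional
}

for name, cfg in ACE_CANDIDATES.items():
    print(name, cfg["cutoff"], cfg["basis_size"])
```

Keeping the dataset split, cutoff, and fitting workflow identical means any difference between candidates can be attributed to basis size alone.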
This is directly aligned with the paper’s result that larger basis sets can fit training better but transfer worse.
My recommendation for your ACE plan
For your case, I would probably do:
Minimal safe plan
- ACE_A = small
- ACE_B = medium
- ACE_C = large
That is probably enough.
You do not necessarily need four if compute time is limited.
Important nuance: discard vs final selection
I would separate this into two levels:
1. Soft filtering in 03b / 03c
You may still discard models that are clearly unusable, for example:
- training diverged
- forces are obviously poor
- inference/export fails
- model is absurdly slow
- validation error is catastrophically worse
That is not “scientific selection”; that is just removing broken candidates.
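A minimal sketch of that broken-candidate filter, with placeholder thresholds you would tune to your own runs rather than take as recommendations:

```python
# Sketch of the "soft filtering" step in 03b/03c; all thresholds
# and field names are placeholders, not recommended values.

def is_broken(c, ref_valid_rmse_f, slowest_ok_ms=100.0):
    """Discard only clearly unusable candidates, per the checklist above."""
    if not c["train_converged"]:
        return True                              # training diverged
    if not c["export_ok"]:
        return True                              # inference/export fails
    if c["ms_per_atom_step"] > slowest_ok_ms:
        return True                              # absurdly slow
    if c["valid_rmse_f"] > 5 * ref_valid_rmse_f:
        return True                              # catastrophically worse validation
    return False

candidate = {"train_converged": True, "export_ok": True,
             "ms_per_atom_step": 2.0, "valid_rmse_f": 0.05}
print(is_broken(candidate, ref_valid_rmse_f=0.04))  # -> False
```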
2. Final selection in 05
Among the viable candidates, choose the production model using the NCL-oriented validation workflow.
So the answer is:
- do not make the final choice in 03b / 03c
- but do discard obviously bad or failed candidates there
That is the cleanest approach.
For MACE, same logic
The same principle applies to your MACE candidates:
- A, B, C, D all go into Notebook 05 if they train reasonably
- then Notebook 05 tells you which one is best for your purpose
Because the ACE paper shows overfitting with larger basis sets, I would pay special attention to whether your smaller MACE models also transfer better, not just whether the larger ones achieve lower training loss.
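A quick way to watch that trend is to track the train/valid gap across the size sweep. Every number below is invented purely for illustration:

```python
# Illustrative check for the ACE-paper-style overfitting trend: does
# validation error grow with model size even as training error shrinks?

def generalization_gap(train_rmse, valid_rmse):
    return valid_rmse - train_rmse

sweep = [  # (name, train force RMSE, valid force RMSE), ordered small -> large
    ("MACE_A", 0.050, 0.055),
    ("MACE_B", 0.040, 0.052),
    ("MACE_C", 0.030, 0.060),
]
gaps = {name: generalization_gap(t, v) for name, t, v in sweep}
# A gap that grows with model size is the warning sign from the paper.
print(gaps)
```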
Strong scientific message for your paper
This gives you a very good methodological statement:
> Candidate ACE and MACE models were first trained using controlled variations in model complexity. Final production models were then selected not by training accuracy alone, but by validation targeted to amorphous Cu–Zr simulations and later NCL-oriented MD use.
That is a strong and defensible approach.
Practical ACE strategy I would use in 03c
Stage 1: complexity sweep at fixed cutoff
Start with the paper-inspired choice:
- cutoff = 7.6 Å
- vary basis size only
This is the most important sweep.
Stage 2: only if needed, small cutoff test
Only after that, maybe test:
- one shorter cutoff candidate, like 6.0 or 6.5 Å
But only if:
- speed becomes a problem, or
- 7.6 Å models do not behave well.
Otherwise, I would stay close to the paper and keep 7.6 Å first.
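The two stages could be planned as a simple grid, mirroring only the values stated above (the dict layout itself is just a sketch):

```python
# Hypothetical two-stage sweep plan: stage 1 is the complexity sweep at
# fixed 7.6 Å; stage 2 runs only if the 7.6 Å models are slow or misbehave.

STAGE_1 = [{"cutoff": 7.6, "complexity": c} for c in ("small", "medium", "large")]
STAGE_2 = [{"cutoff": r, "complexity": "small"} for r in (6.0, 6.5)]  # optional

def planned_runs(need_stage_2=False):
    """Return the list of training runs to launch."""
    return STAGE_1 + (STAGE_2 if need_stage_2 else [])

print(len(planned_runs()))      # -> 3
print(len(planned_runs(True)))  # -> 5
```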
My concrete suggestion
For Notebook 03c, use something like:
| model | cutoff | complexity | role |
|---|---|---|---|
| ACE_A | 7.6 Å | small | likely best transferability |
| ACE_B | 7.6 Å | medium | middle ground |
| ACE_C | 7.6 Å | large | test overfitting / cost |
| ACE_D | optional 6.0–6.5 Å | small/medium | only if you want a speed test |
If compute is tight, drop ACE_D.
Bottom line
Yes — the paper you quoted is giving you a very clear lesson:
- bigger is not automatically better
- transferability matters more than training fit
- speed matters
- small, well-behaved models may be the best production choice
So for your project:
- 03b and 03c create candidate models
- broken candidates can be removed there
- final model choice happens in Notebook 05
- the chosen best model is then used for production calculations
When you are ready, we can turn this into a precise ACE candidate table for Notebook 03c.