optimize positional encoding#2172

Merged
clessig merged 4 commits into ecmwf:develop from javak87:javad/dev/optimize_encoder
Apr 9, 2026

Conversation

Contributor

@javak87 javak87 commented Apr 5, 2026

Description

This PR introduces a minor change in the code, resulting in a significant performance gain.

Issue Number

Fixes #2173

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

Performance comparison with develop branch

  • Run ../WeatherGenerator-private/hpc/launch-slurm.py --time 60
| run_id   | HPC | PR                                  | Ingested Samples per GPU |
|----------|-----|-------------------------------------|--------------------------|
| mf3vpsec | JWB | develop (1 node)                    | 1044                     |
| ve8c4vuy | JWB | javad/dev/optimize_encoder (1 node) | 1294                     |
Screenshot from 2026-04-05 19-38-29
  • Run ../WeatherGenerator-private/hpc/launch-slurm.py --time 60 --base-config ./config/config_forecasting.yml
| run_id   | HPC | PR                                  | Ingested Samples per GPU |
|----------|-----|-------------------------------------|--------------------------|
| v3ngv74i | JWB | develop (1 node)                    | 670                      |
| yuvx9hwm | JWB | javad/dev/optimize_encoder (1 node) | 760                      |
Screenshot from 2026-04-05 19-53-39
  • Run ../WeatherGenerator-private/hpc/launch-slurm.py --time 60 --base-config ./config/config_jepa.yml
| run_id   | HPC | PR                                  | Ingested Samples per GPU |
|----------|-----|-------------------------------------|--------------------------|
| d4vw793o | JWB | develop (1 node)                    | 2308                     |
| ob69315g | JWB | javad/dev/optimize_encoder (1 node) | 4486                     |

Depending on the configuration, performance improvements ranging from 14% to 94% are observed.

@javak87 javak87 marked this pull request as draft April 5, 2026 19:05
@github-actions github-actions bot added the model Related to model training or definition (not generic infra) label Apr 5, 2026
@clessig clessig marked this pull request as ready for review April 6, 2026 14:35
Collaborator

clessig commented Apr 6, 2026

Performance improvement with multiple streams:

New:

000 : 00280/04096 : 000280 : loss = 7.8765E-01 (lr=2.93E-05, s/sec=0.876)

LossPhysical.ERA5.mse.avg : 9.4160E-01 
LossPhysical.NPPATMS.mse.avg : 3.8870E-01 
LossPhysical.SurfaceCombined.mse.avg : 5.8787E-01 
LossPhysical.loss_avg : 7.8765E-01 

Old:

000 : 00110/04096 : 000110 : loss = 9.2028E-01 (lr=6.49E-06, s/sec=0.627)

LossPhysical.ERA5.mse.avg : 1.0701E+00 
LossPhysical.NPPATMS.mse.avg : 3.5584E-01 
LossPhysical.SurfaceCombined.mse.avg : 7.0956E-01 
LossPhysical.loss_avg : 9.2028E-01 

Collaborator

clessig commented Apr 6, 2026

@javak87 : I am happy to merge it. Any reason it was still marked as draft?

Contributor Author

javak87 commented Apr 6, 2026

> @javak87 : I am happy to merge it. Any reason it was still marked as draft?

Not a specific reason. You can merge it.

Collaborator

clessig commented Apr 6, 2026

@javak87 : Can we use this version:

        rows = torch.arange( tok_counts.max(), device=tok_counts.device).unsqueeze(0)
        rows = rows.expand(tok_counts.shape[0], -1)
        pe_idxs = rows[rows < tok_counts.unsqueeze(1)]

It's equivalent to your code but avoids one shape promotion (in the third and fourth lines of your code this happens once implicitly and once explicitly).
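For illustration, the masking trick in the snippet above can be sketched in NumPy (a stand-alone analogue of the torch code; the `tok_counts` values here are hypothetical):

```python
import numpy as np

# Hypothetical per-batch token counts (stand-in for the torch tensor
# tok_counts in the snippet above).
tok_counts = np.array([3, 1, 4])

# Build one row of indices [0, max_count), broadcast it across the batch,
# then keep only the entries below each row's token count.
rows = np.arange(tok_counts.max())[None, :]                         # shape (1, 4)
rows = np.broadcast_to(rows, (tok_counts.shape[0], rows.shape[1]))  # shape (3, 4)
pe_idxs = rows[rows < tok_counts[:, None]]                          # flat index vector

print(pe_idxs.tolist())  # [0, 1, 2, 0, 0, 1, 2, 3]
```

Each row contributes indices `0 .. tok_counts[i]-1`, concatenated into one flat vector, without any Python-level loop over the batch.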

Contributor Author

javak87 commented Apr 6, 2026

> @javak87 : Can we use this version:
>
>         rows = torch.arange( tok_counts.max(), device=tok_counts.device).unsqueeze(0)
>         rows = rows.expand(tok_counts.shape[0], -1)
>         pe_idxs = rows[rows < tok_counts.unsqueeze(1)]
>
> It's equivalent to your code but avoids one shape promotion (in the third and fourth lines of your code this happens once implicitly and once explicitly).

Good suggestion! Let me run it and make sure it's performant.

Collaborator

clessig commented Apr 6, 2026

> @javak87 : Can we use this version:
>
>         rows = torch.arange( tok_counts.max(), device=tok_counts.device).unsqueeze(0)
>         rows = rows.expand(tok_counts.shape[0], -1)
>         pe_idxs = rows[rows < tok_counts.unsqueeze(1)]
>
> It's equivalent to your code but avoids one shape promotion (in the third and fourth lines of your code this happens once implicitly and once explicitly).
>
> Good suggestion! Let me run it and make sure it's performant.

Ok, please double-check and then we can merge.

Contributor Author

javak87 commented Apr 7, 2026

> @javak87 : Can we use this version:
>
>         rows = torch.arange( tok_counts.max(), device=tok_counts.device).unsqueeze(0)
>         rows = rows.expand(tok_counts.shape[0], -1)
>         pe_idxs = rows[rows < tok_counts.unsqueeze(1)]
>
> It's equivalent to your code but avoids one shape promotion (in the third and fourth lines of your code this happens once implicitly and once explicitly).
>
> Good suggestion! Let me run it and make sure it's performant.
>
> Ok, please double-check and then we can merge.

Since config_jepa.yml is more sensitive to this optimization, I tested your suggested changes. The number of ingested samples decreased from 4486 to 4416 per GPU.

Given this, I think my proposed change performs slightly better.

Collaborator

clessig commented Apr 7, 2026

> decreased from 4486 to 4416 per GPU.

@javak87 : For me this is in the noise range. Can you reproduce this difference reliably?

Contributor Author

javak87 commented Apr 8, 2026

>> decreased from 4486 to 4416 per GPU.
>
> @javak87 : For me this is in the noise range. Can you reproduce this difference reliably?

I ran config_jepa.yml for 180 minutes: again, the count decreased from 13532 to 13392 samples per GPU. Here is the result:

Screenshot from 2026-04-08 23-18-47

Collaborator

clessig commented Apr 9, 2026

>> decreased from 4486 to 4416 per GPU.
>>
>> @javak87 : For me this is in the noise range. Can you reproduce this difference reliably?
>
> I ran config_jepa.yml for 180 minutes: again, the count decreased from 13532 to 13392 samples per GPU. Here is the result:
>
> Screenshot from 2026-04-08 23-18-47

Let's use

        rows = torch.arange( tok_counts.max(), device=tok_counts.device).unsqueeze(0)
        rows = rows.expand(tok_counts.shape[0], -1)
        pe_idxs = rows[rows < tok_counts.unsqueeze(1)]

It creates one temporary tensor less. The small performance degradation might change with minor PyTorch updates, and I prefer the cleaner solution.
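As a sanity check on the chosen variant, it can be compared against a naive per-row loop (a NumPy sketch with hypothetical `tok_counts` values; the real code operates on torch tensors):

```python
import numpy as np

def pe_indices(tok_counts: np.ndarray) -> np.ndarray:
    # Vectorized variant chosen above, transcribed to NumPy:
    # one arange row, broadcast across the batch, masked per row.
    rows = np.arange(tok_counts.max())[None, :]
    rows = np.broadcast_to(rows, (tok_counts.shape[0], rows.shape[1]))
    return rows[rows < tok_counts[:, None]]

def pe_indices_naive(tok_counts: np.ndarray) -> np.ndarray:
    # Reference implementation: concatenate arange(n) per row.
    return np.concatenate([np.arange(n) for n in tok_counts])

tok_counts = np.array([2, 5, 1, 3])  # hypothetical counts
assert np.array_equal(pe_indices(tok_counts), pe_indices_naive(tok_counts))
```

The loop-based reference makes the intended semantics explicit, while the vectorized version avoids the per-row Python iteration and the host/device round-trips that motivated this PR.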

@clessig clessig merged commit 3d50683 into ecmwf:develop Apr 9, 2026
5 checks passed
wael-mika pushed a commit to wael-mika/WeatherGenerator that referenced this pull request Apr 13, 2026
* optimize positional encoding

* update positional encoding impl

---------

Co-authored-by: Javad Kasravi <j.kasravi@fz-juelich.de>
Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int>

Labels

model Related to model training or definition (not generic infra)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Many Memcpy D to H

2 participants