Skip to content

UCX Notified Communication#20

Open
joe-explr wants to merge 1 commit intodevreal:notified-rmafrom
joe-explr:notified-rma-ucx
Open

UCX Notified Communication#20
joe-explr wants to merge 1 commit intodevreal:notified-rmafrom
joe-explr:notified-rma-ucx

Conversation

@joe-explr
Copy link
Copy Markdown

Signed-off-by: Joseph Antony jajoseph.antony18@gmail.com

 Signed-off-by: Joseph Antony <jajoseph.antony18@gmail.com>
Copy link
Copy Markdown
Owner

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few comments.

+ (uint64_t)notify * sizeof(uint64_t);
}

#define CHECK_NOTIFY_IDX(notify) \
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest passing in the module and checking against a value stored in there. OMPI_OSC_UCX_MAX_NOTIFY_COUNTERS is just a crutch until we have a better solution


/* Counters are local memory — just read with a barrier to ensure
* any preceding remote writes to this counter are visible. */
opal_atomic_rmb();
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this actually does anything. Shouldn't the read barrier come after the counter read to prevent subsequent reads from being reordered?


volatile uint64_t *counter =
(volatile uint64_t *)(module->addrs[my_rank] + module->size) + notify;
*value = (OMPI_MPI_COUNT_TYPE)opal_atomic_swap_64((volatile int64_t *)counter, 0);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to use UCX to swap the value, otherwise it is not guaranteed to be atomic wrt to other network atomic operations.

/* create the segment */

size_t total = 0;
size_t notify_size = OMPI_OSC_UCX_MAX_NOTIFY_COUNTERS * sizeof(uint64_t);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace the use of OMPI_OSC_UCX_MAX_NOTIFY_COUNTERS with a local variable that we can set to something that comes out of the info object. That makes it easier going forward. Also, store the value of that variable in the module (see my earlier comment).

@devreal
Copy link
Copy Markdown
Owner

devreal commented Apr 1, 2026

Oh, I guess we'll need to get #9 in first?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants