
MDEV-7394 : Making slave_skip_errors writable at runtime when slave is stopped #4634

Open
Mahmoud-kh1 wants to merge 4 commits into MariaDB:main from Mahmoud-kh1:dynamic-slave-skip-error


Mahmoud-kh1 commented Feb 8, 2026

slave_skip_errors can now be updated at runtime while the slave is stopped.

  • This feature makes the slave_skip_errors system variable dynamic, allowing it
    to be changed at runtime while the replication slave is stopped.
  • Previously, slave_skip_errors was read-only at runtime and required a
    server restart to change.
  • Runtime updates are now validated and rejected while the slave is
    running, which prevents an inconsistent replication state.

Key Changes

  • Added ON_CHECK handler to verify that updates are only allowed while the
    slave is stopped.
  • Added ON_UPDATE handler to reinitialize the internal skip error state
    when the variable is changed.
  • Added an rpl mtr test that verifies slave_skip_errors can be changed
    dynamically while the slave is stopped and that updates are rejected while the slave is running.
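The check/update split listed above can be sketched as a small C++ model. This is not the patch's code: ReplicaState, check_skip_errors, update_skip_errors, set_skip_errors and kMaxSlaveError are hypothetical names, and the parser is simplified (case-insensitive OFF/ALL, otherwise a comma-separated list of numeric error codes).

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical bound on error codes, for illustration only.
constexpr std::size_t kMaxSlaveError = 10000;

// Hypothetical stand-in for the state the two handlers touch.
struct ReplicaState {
  bool slave_running = false;
  std::string skip_errors = "OFF";        // value reported for the variable
  std::bitset<kMaxSlaveError> skip_mask;  // parsed form consulted when an event fails
};

// ON_CHECK-style hook: refuse the assignment while the slave is running.
bool check_skip_errors(const ReplicaState &st) { return !st.slave_running; }

// ON_UPDATE-style hook: rebuild the mask from scratch so no stale bits survive.
void update_skip_errors(ReplicaState &st, const std::string &value) {
  assert(!st.slave_running);  // only reached after the check hook passed
  st.skip_errors = value;
  st.skip_mask.reset();
  std::string upper = value;
  std::transform(upper.begin(), upper.end(), upper.begin(),
                 [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
  if (upper == "OFF") return;
  if (upper == "ALL") { st.skip_mask.set(); return; }
  std::stringstream ss(value);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (item.empty()) continue;
    unsigned long code = std::strtoul(item.c_str(), nullptr, 10);
    if (code < kMaxSlaveError) st.skip_mask.set(code);
  }
}

// Models SET GLOBAL slave_skip_errors= value; returns false when rejected.
bool set_skip_errors(ReplicaState &st, const std::string &value) {
  if (!check_skip_errors(st)) return false;  // old value and mask stay as they were
  update_skip_errors(st, value);
  return true;
}
```

Keeping the check separate from the update means a rejected assignment leaves both the reported string and the parsed mask untouched.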

The behavior now looks like this:
[Screenshot: test slave2]

Feature: MDEV-7394


CLAassistant commented Feb 8, 2026

CLA assistant check
All committers have signed the CLA.

Mahmoud-kh1 force-pushed the dynamic-slave-skip-error branch 6 times, most recently from 87dbe8f to 3fa1017, February 9, 2026 07:34
Mahmoud-kh1 (author) commented Feb 9, 2026

The following test cases fail because they assume that slave_skip_errors is read-only, which is no longer true:
  • sys_vars.sysvars_server_notembedded
  • main.variables-notembedded
  • sys_vars.slave_skip_errors_basic

gkodinov added the External Contribution label Feb 9, 2026
gkodinov (member) left a comment


This is a preliminary review. I'd like to request changes mostly because there are tests that need to be re-recorded (and possibly fixed too).

The rest of the comments are just my own limited take on the change. Feel free to ignore them and leave them for the final review.

grooverdan (member) commented

Thank you so much for implementing my 11-year-old bug report. I'd be very grateful if you stuck with the review process on this. There's a lot to keep correct in the server to implement this change.

Make slave_skip_errors dynamic so it can be changed while the slave is stopped. Attempts to change it while the slave is running are rejected with a clear error.
Mahmoud-kh1 force-pushed the dynamic-slave-skip-error branch from 3fa1017 to 440eec2, February 12, 2026 12:41
gkodinov (member) left a comment


Please just fix the spacing issues. The rest is pretty much as one expects it to be, aside from some polishing. Once the spacing changes are gone, I will do another round and approve it.

Removed unnecessary blank lines before the conditional check.
Mahmoud-kh1 force-pushed the dynamic-slave-skip-error branch 7 times, most recently from b28dbed to 4b94518, February 19, 2026 14:59
@Mahmoud-kh1 Mahmoud-kh1 requested a review from gkodinov February 20, 2026 11:42
gkodinov (member) left a comment


Please also address the buildbot failure. It seems related.

Mahmoud-kh1 force-pushed the dynamic-slave-skip-error branch 3 times, most recently from d21fd5b to 5af34c9, February 20, 2026 14:07
@Mahmoud-kh1 Mahmoud-kh1 requested a review from gkodinov February 20, 2026 15:46
Mahmoud-kh1 force-pushed the dynamic-slave-skip-error branch 3 times, most recently from b294419 to 450161b, February 21, 2026 14:46
gkodinov (member) left a comment


LGTM. Please do not make changes that are not relevant to the fix (like the one outlined below).

Please stand by for the final review.

show variables like 'slave_skip_errors';
Variable_name	Value
-slave_skip_errors	0,3,100,137,643,1752
+slave_skip_errors	3,100,137,0,643,1752
Member


Here and below: why the change?

Mahmoud-kh1 (author) commented Feb 23, 2026


I also suspected that the issue might be related to my modifications. However, I ran the same test on the main branch and the values appear in the same order there.

The test is failing on the main branch, I think because the variable is now global and the test has not been updated for that (I can open an issue and a PR to fix it).


So I opened the .opt file where the initial values are defined and found them in the same order in which they appear now.


So I suppose it shows the values in the same order in which they were set.
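The two .result lines differ only in order: 0,3,100,137,643,1752 is sorted, while 3,100,137,0,643,1752 matches the order given in the .opt file. As a minimal sketch of how that can happen (an assumption, not a statement about either branch's code): a value rebuilt by walking a bitmap comes out in ascending order, whereas echoing the stored string keeps the input order. parse_codes, render_from_mask, demo and kMaxCode are hypothetical names.

```cpp
#include <bitset>
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical bound, large enough for the example codes.
constexpr std::size_t kMaxCode = 4096;

// Parse "3,100,..." into a bitmask; the order of the input is lost here.
std::bitset<kMaxCode> parse_codes(const std::string &list) {
  std::bitset<kMaxCode> mask;
  std::stringstream ss(list);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (item.empty()) continue;
    unsigned long code = std::strtoul(item.c_str(), nullptr, 10);
    if (code < kMaxCode) mask.set(code);
  }
  return mask;
}

// Render a mask by walking bit positions upward, so codes come out sorted.
std::string render_from_mask(const std::bitset<kMaxCode> &mask) {
  std::string out;
  for (std::size_t code = 0; code < kMaxCode; ++code) {
    if (!mask.test(code)) continue;
    if (!out.empty()) out += ',';
    out += std::to_string(code);
  }
  return out;
}

// Usage: the .opt order goes in, the bitmap order comes out.
void demo() {
  const std::string from_opt = "3,100,137,0,643,1752";
  assert(render_from_mask(parse_codes(from_opt)) == "0,3,100,137,643,1752");
}
```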

Mahmoud-kh1 force-pushed the dynamic-slave-skip-error branch from 450161b to 92b8f1e, March 10, 2026 02:08
bnestere added the Replication label Mar 10, 2026
"provided list",
-READ_ONLY GLOBAL_VAR(opt_slave_skip_errors), CMD_LINE(REQUIRED_ARG),
-DEFAULT(0));
+PREALLOCATED GLOBAL_VAR(opt_slave_skip_errors), CMD_LINE(REQUIRED_ARG),
Author


For my first implementation I didn't add PREALLOCATED, and the default value was "OFF". But I had memory issues, especially at the first initialization: when I freed opt_slave_skip_errors, the server froze, because the variable did not yet point to heap memory. I tried to solve this with several approaches, such as a flag recording whether the variable points to heap memory, but later I looked at other sys_vars, found PREALLOCATED, and used it.
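The freeze described above comes from freeing a pointer that was never malloc'ed, such as a string-literal default. The sketch below is a simplified C++ model of the two remedies mentioned: an ownership flag, and copying the initial value to the heap at startup so later updates can always free it. CharptrOption, dup_string, preallocate and update are hypothetical names. In MariaDB, PREALLOCATED appears to map to the sys_var ALLOCATED flag (see sql/sys_vars.inl); treat that mapping as an assumption to verify against the source.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical model of a string option whose storage may or may not be owned.
struct CharptrOption {
  const char *value;
  bool allocated;  // true only when `value` came from malloc and may be freed
};

// Heap copy of a C string; returns nullptr on allocation failure.
static char *dup_string(const char *s) {
  std::size_t n = std::strlen(s) + 1;
  char *p = static_cast<char *>(std::malloc(n));
  if (p) std::memcpy(p, s, n);
  return p;
}

// Startup: move the initial value (literal default or command-line string)
// to the heap, so every later update may free the previous value.
void preallocate(CharptrOption &opt) {
  assert(!opt.allocated);  // call once, at startup
  char *copy = dup_string(opt.value);
  if (copy) { opt.value = copy; opt.allocated = true; }
}

// Runtime update: free the old value only if we own it, then take a heap copy.
bool update(CharptrOption &opt, const char *new_value) {
  char *copy = dup_string(new_value);
  if (!copy) return false;  // out of memory: keep the old value
  if (opt.allocated) std::free(const_cast<char *>(opt.value));
  opt.value = copy;
  opt.allocated = true;
  return true;
}
```

With the flag, the first update skips free() on the literal; with preallocate(), the flag is always true after startup, which is the simpler invariant.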

set global sql_slave_skip_counter= 0;
set @@global.slave_net_timeout= @my_slave_net_timeout;


Member


Please don't make whitespace changes unrelated to your commit.

SET @@global.slave_skip_errors= 7;
SET @@global.slave_skip_errors= "7";
SELECT @@global.slave_skip_errors;
SET @@global.slave_skip_errors= "OFF";
Member


Is this testing an additional path? Or is it just a reset?

If it's just a reset, omit it.

Either way, put the last line (set GLOBAL slave_skip_errors = @my_slave_skip_errors;) here, as it's relevant to this test.

You could put the first line (set @my_slave_skip_errors =@@global.slave_skip_errors;) just before this test block.

Author


No, it's not a reset. I save the initial values at the beginning of the test and restore them at the end, but I will move those lines into this test block as you suggested.

if (!use_slave_mask || bitmap_is_clear_all(&slave_error_mask))
{
  /* purecov: begin tested */
  memset(slave_skip_error_names, 0, sizeof(slave_skip_error_names));
Member


Why are this memset and the one below needed?
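One plausible answer, sketched under an assumption rather than taken from the author: if the display string "OFF" or "ALL" is written into the static slave_skip_error_names buffer with a length-only copy (no terminating NUL), that is harmless when the function runs once at startup on a zero-filled static buffer, but a runtime re-run can leave the tail of a previous, longer value behind. names_buf, show_list, show_off_without_reset and show_off_with_reset are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical fixed-size display buffer, reused across calls like a static.
static char names_buf[64];

// Write a full NUL-terminated list, as a previous update would have.
void show_list(const char *list) {
  assert(std::strlen(list) < sizeof(names_buf));
  std::strcpy(names_buf, list);
}

// Write "OFF" the way a length-only copy would: 3 bytes, no terminating NUL.
void show_off_without_reset() {
  std::memcpy(names_buf, "OFF", 3);
}

// Same write, but the buffer is cleared first so stale bytes cannot leak through.
void show_off_with_reset() {
  std::memset(names_buf, 0, sizeof(names_buf));
  std::memcpy(names_buf, "OFF", 3);
}
```

After show_list("1062,1146"), the unreset version leaves "OFF2,1146" in the buffer; clearing it first yields "OFF".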

sql/slave.cc (outdated)
bool is_network_error(uint errorno)
{
if (errorno == CR_CONNECTION_ERROR ||
{
Member


@Mahmoud-kh1 - please address - "NO whitespace only changes"

for (;my_isspace(system_charset_info,*arg);++arg)
  /* empty */;
-if (!system_charset_info->strnncoll((uchar*)arg,4,(const uchar*)"all",4))
+if (strlen(arg) == 3 && !system_charset_info->strnncoll((uchar*)arg,3,(const uchar*)"all",3))
Member


Unclear why this is changed.
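Under a simplifying assumption, namely that strnncoll on this input behaves like a byte-wise, case-insensitive comparison that stops at the first differing byte, the 4-byte comparison already requires the NUL after "all", so it accepts the same strings as the strlen(arg) == 3 version. The sketch below checks that on a few inputs; ci_compare_n, is_all_original and is_all_proposed are hypothetical helpers, not server functions.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Simplified stand-in for a case-insensitive strnncoll over single-byte text:
// compares n bytes of a and b and stops at the first difference.
int ci_compare_n(const char *a, const char *b, std::size_t n) {
  assert(a && b);
  for (std::size_t i = 0; i < n; ++i) {
    int ca = std::tolower(static_cast<unsigned char>(a[i]));
    int cb = std::tolower(static_cast<unsigned char>(b[i]));
    if (ca != cb) return ca - cb;
  }
  return 0;
}

// Original form: 4 bytes, so the NUL after "all" must match as well.
bool is_all_original(const char *arg) { return ci_compare_n(arg, "all", 4) == 0; }

// Changed form: explicit length check, then compare 3 bytes.
bool is_all_proposed(const char *arg) {
  return std::strlen(arg) == 3 && ci_compare_n(arg, "all", 3) == 0;
}
```

Because the comparison stops at the first mismatch and "all" contains no NUL, a shorter argument mismatches at its own terminator before any byte past it is read.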

Mahmoud-kh1 force-pushed the dynamic-slave-skip-error branch from 92b8f1e to 5564f58, March 18, 2026 11:52
Mahmoud-kh1 force-pushed the dynamic-slave-skip-error branch from 5564f58 to 5cfb40f, March 18, 2026 11:56

Labels

External Contribution, Replication


5 participants