Skip to content

PYTHON-5752 - Only retry overload errors if retries are enabled#2726

Merged
NoahStapp merged 7 commits intomongodb:backpressurefrom
NoahStapp:PYTHON-5751
Mar 12, 2026
Merged

PYTHON-5752 - Only retry overload errors if retries are enabled#2726
NoahStapp merged 7 commits intomongodb:backpressurefrom
NoahStapp:PYTHON-5751

Conversation

@NoahStapp
Copy link
Contributor

@NoahStapp NoahStapp commented Mar 9, 2026

PYTHON-5752

Changes in this PR

Adherence to the backpressure spec. Overload errors are only retried if the appropriate retryWrite/retryRead=True option is set for the client (both are by default).

runCommand explicitly requires both retryReads and retryWrites to be enabled in order to retry overload errors.

aggregate is normally a read operation, except when the pipeline contains an $out/$merge stage, in which case it is treated as a write.

Test Plan

Added the tests in DRIVERS-3416.

Checklist

Checklist for Author

  • Did you update the changelog (if necessary)?
  • Is there test coverage?
  • Is any followup work tracked in a JIRA ticket? If so, add link(s).

Checklist for Reviewer

  • Does the title of the PR reference a JIRA Ticket?
  • Do you fully understand the implementation? (Would you be comfortable explaining how this code works to someone else?)
  • Is all relevant documentation (README or docstring) updated?

@NoahStapp NoahStapp marked this pull request as ready for review March 10, 2026 17:23
@NoahStapp NoahStapp requested a review from a team as a code owner March 10, 2026 17:23
@NoahStapp NoahStapp requested review from ShaneHarvey, caseyclements and sleepyStick and removed request for a team March 10, 2026 17:23
@NoahStapp
Copy link
Contributor Author

Encryption test failures are expected on backpressure.

self._client.options.retry_reads and self._client.options.retry_writes
):
raise
if self._is_aggregate_write and not self._client.options.retry_writes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? aggregate writes are never consider a retryable read or write (hence why retryable=False was passed previously). They should only be retried on overload errors. Do we even need the is_aggregate_write flag at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All aggregate operations, including writes, call retryable_read. This causes them to be treated as reads even though retryWrites determines whether or not overload error retries are allowed. This check here is for that specific case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test that makes sure aggregate + writes are NOT retried when they hit retryable write errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For non-overload retryable write errors?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Things like ShutdownInProgress that include the RetryableWriteError label.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -2010,6 +2009,8 @@ async def _retry_internal(
read_pref: Optional[_ServerMode] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encryption errors seem relevant. Is this expected?

[2026/03/10 10:26:26.362] FAILURE: AttributeError: AsyncMongoClient has no attribute '_retry_policy'. To access the _retry_policy database, use client['_retry_policy']. ()
[2026/03/10 10:26:26.362] self = <test.asynchronous.test_encryption.TestDeadlockProse testMethod=test_case_5>
[2026/03/10 10:26:26.362]     async def test_case_5(self):
[2026/03/10 10:26:26.362] >       await self._run_test(
[2026/03/10 10:26:26.362]             max_pool_size=None,
[2026/03/10 10:26:26.362]             auto_encryption_opts=AutoEncryptionOpts(
[2026/03/10 10:26:26.362]                 *self.optargs, bypass_auto_encryption=False, key_vault_client=None
[2026/03/10 10:26:26.362]             ),
[2026/03/10 10:26:26.362]         )
[2026/03/10 10:26:26.362] test/asynchronous/test_encryption.py:1605: 
[2026/03/10 10:26:26.362] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2026/03/10 10:26:26.362] test/asynchronous/test_encryption.py:1504: in _run_test
[2026/03/10 10:26:26.362]     client_encrypted = await self.async_rs_or_single_client(
[2026/03/10 10:26:26.362] test/asynchronous/__init__.py:1127: in async_rs_or_single_client
[2026/03/10 10:26:26.362]     return await self._async_mongo_client(h, p, **kwargs)
[2026/03/10 10:26:26.362]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026/03/10 10:26:26.362] test/asynchronous/__init__.py:1035: in _async_mongo_client
[2026/03/10 10:26:26.362]     client = AsyncMongoClient(uri, port, **client_options)
[2026/03/10 10:26:26.362]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026/03/10 10:26:26.362] pymongo/asynchronous/mongo_client.py:897: in __init__
[2026/03/10 10:26:26.362]     self._init_based_on_options(self._seeds, srv_max_hosts, srv_service_name)
[2026/03/10 10:26:26.362] pymongo/asynchronous/mongo_client.py:1007: in _init_based_on_options
[2026/03/10 10:26:26.362]     self._encrypter = _Encrypter(self, self._options.auto_encryption_opts)
[2026/03/10 10:26:26.362]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026/03/10 10:26:26.362] pymongo/asynchronous/encryption.py:434: in __init__
[2026/03/10 10:26:26.362]     key_vault_coll = key_vault_client[db][coll]
[2026/03/10 10:26:26.362]                      ^^^^^^^^^^^^^^^^^^^^
[2026/03/10 10:26:26.362] pymongo/asynchronous/mongo_client.py:1366: in __getitem__
[2026/03/10 10:26:26.362]     return database.AsyncDatabase(self, name)
[2026/03/10 10:26:26.362]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026/03/10 10:26:26.362] pymongo/asynchronous/database.py:138: in __init__
[2026/03/10 10:26:26.362]     self._retry_policy = client._retry_policy
[2026/03/10 10:26:26.362]                          ^^^^^^^^^^^^^^^^^^^^

https://spruce.corp.mongodb.com/task/mongo_python_driver_backpressure_encryption_macos_test_non_standard_latest_python3.13_noauth_nossl_standalone_patch_e7a5247bed99c6c1875634ac11c056c9d00f6cc7_69b048f47a99ad0007977051_26_03_10_16_38_14/tests?execution=0&sorts=STATUS%3AASC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this must have been hidden by the expected encryption errors previously. We just need to instantiate self._retry_policy before we call self._init_based_on_options in the client constructor.

}
},
{
"connectionCheckedInEvent": {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the order between commandStartedEvent/commandFailedEvent and connectionCheckedInEvent well defined in our specs?

It might be safer to only check connectionCheckOutEvent and connectionCheckedInEvent.

@NoahStapp NoahStapp requested a review from ShaneHarvey March 10, 2026 19:23
@NoahStapp NoahStapp changed the title PYTHON-5751 - Only retry overload errors if retries are enabled PYTHON-5752 - Only retry overload errors if retries are enabled Mar 10, 2026
@NoahStapp NoahStapp merged commit 0a47a19 into mongodb:backpressure Mar 12, 2026
77 of 81 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants