Fix circular dependency deadlock in TestSecrets initialization #15864
danmuzi merged 11 commits into apache:main
Signed-off-by: Namgyu Kim <namgyu@apache.org>
Signed-off-by: Namgyu Kim <namgyu@apache.org>
expectThrows(AssertionError.class, () -> TestSecrets.setSegmentReaderAccess(null));
}

public void testDeadlock() throws Exception {
This test is fine, but I'd like to avoid new tests spawning child processes as much as possible. It is good for reproducing the issue, but maybe @dweiss has an opinion.
This is rather complicated and even with a barrier there's no guarantee this will always deadlock... I'm not sure we need this, to be honest.
We have a test for the codec/postingsformat deadlock:
This one uses some randomness to trigger different combinations of initialization. Maybe we can include the test here. It is related, as most class loading deadlocks in Lucene occur around loading IndexWriter together with codecs/postings/docvalues formats and... TestSecrets.
I think we can add this case there in the thread:
- update switch statement and add a new case that loads TestSecrets and calls one of the getter methods
- raise the thread count as described at the beginning of the test (2 times the modulo/number of cases). I'd improve the test to use a common variable for that, as it is a bit tricky
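The two steps above can be sketched roughly as follows. This is only an illustration, not the code in TestCodecLoadingDeadlock: the class name `RacyInitHarness` and the method `runAll` are hypothetical, and the thread count is derived from the number of tasks instead of being a separate constant.

```java
import java.util.List;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Lucene.
final class RacyInitHarness {

  /**
   * Runs every task twice. All threads are released together by a barrier so the
   * tasks race against each other. Returns false if the pool did not finish within
   * the timeout, which would hint at a deadlock.
   */
  static boolean runAll(List<Runnable> tasks, long timeout, TimeUnit unit)
      throws InterruptedException {
    final int numTasks = tasks.size();
    final int numThreads = numTasks * 2; // always two times the number of cases
    final CyclicBarrier barrier = new CyclicBarrier(numThreads);
    final ExecutorService pool =
        Executors.newFixedThreadPool(
            numThreads,
            r -> {
              Thread t = new Thread(r);
              t.setDaemon(true); // a stuck thread must not keep the JVM alive
              return t;
            });
    for (int taskNo = 0; taskNo < numThreads; taskNo++) {
      final Runnable task = tasks.get(taskNo % numTasks);
      pool.execute(
          () -> {
            try {
              barrier.await(); // common start point for all threads
              task.run();
            } catch (Exception e) {
              throw new RuntimeException(e);
            }
          });
    }
    pool.shutdown();
    return pool.awaitTermination(timeout, unit);
  }
}
```

With this shape, adding a new case only means appending one more `Runnable` to the list; the modulo and the thread count follow automatically.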
I was able to beef up that test. It works much better and tests the whole deadlocking around IndexWriter. I will post a patch for the other test, or should I commit directly?
TestCodecLoadingDeadlock > testDeadlock FAILED
java.lang.AssertionError: Process died abnormally? expected:<0> but was:<1>
at __randomizedtesting.SeedInfo.seed([A5DDBF322DCE93C1:A8B65E262B943E17]:0)
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.apache.lucene.codecs.TestCodecLoadingDeadlock.testDeadlock(TestCodecLoadingDeadlock.java:88)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:565)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$2.evaluate(ThreadLeakControl.java:426)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:716)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$200(RandomizedRunner.java:138)
at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:637)
org.apache.lucene.codecs.TestCodecLoadingDeadlock > test suite's output saved to C:\Users\Uwe Schindler\Projects\lucene\lucene\lucene\core\build\test-results\test_1\outputs\OUTPUT-org.apache.lucene.codecs.TestCodecLoadingDeadlock.txt, copied below:
2> März 24, 2026 9:10:47 AM org.apache.lucene.internal.vectorization.VectorizationProvider lookup
2> WARNUNG: Vector bitsize enforcement; using default vectorization provider outside of testMode
1> codec: FastDecompressionCompressingStoredFieldsData, pf: MockRandom, dvf: Lucene90
1> Subprocess emitted the following output:
1> März 24, 2026 9:10:48 AM org.apache.lucene.internal.vectorization.VectorizationProvider lookup
1> WARNUNG: Java vector incubator module is not readable. For optimal vector performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.
1> Pool didn't return after 30 seconds, classloader deadlock? Dumping stack traces.
1> # Thread: Thread[#26,deadlockchecker-1-thread-1,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#27,deadlockchecker-1-thread-2,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#28,deadlockchecker-1-thread-3,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#29,deadlockchecker-1-thread-4,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#30,deadlockchecker-1-thread-5,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#31,deadlockchecker-1-thread-6,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#32,deadlockchecker-1-thread-7,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#33,deadlockchecker-1-thread-8,5,main], state: RUNNABLE, stack:
1> org.apache.lucene.index.IndexWriter.<clinit>(IndexWriter.java:6575) java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method) java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unsafe.java:1169) java.base/java.lang.invoke.MethodHandles$Lookup.ensureInitialized(MethodHandles.java:2748) org.apache.lucene.codecs.TestCodecLoadingDeadlock.lambda$main$1(TestCodecLoadingDeadlock.java:157) java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) java.base/java.lang.Thread.run(Thread.java:1474)
1>
1> # Thread: Thread[#34,deadlockchecker-1-thread-9,5,main], state: RUNNABLE, stack:
1> java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method) java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unsafe.java:1169) java.base/java.lang.invoke.MethodHandles$Lookup.ensureInitialized(MethodHandles.java:2748) org.apache.lucene.codecs.TestCodecLoadingDeadlock.lambda$main$1(TestCodecLoadingDeadlock.java:160) java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) java.base/java.lang.Thread.run(Thread.java:1474)
1>
1> # Thread: Thread[#35,deadlockchecker-1-thread-10,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#36,deadlockchecker-1-thread-11,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#37,deadlockchecker-1-thread-12,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#38,deadlockchecker-1-thread-13,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#39,deadlockchecker-1-thread-14,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#40,deadlockchecker-1-thread-15,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#41,deadlockchecker-1-thread-16,5,], state: TERMINATED, stack:
1>
1>
1> # Thread: Thread[#42,deadlockchecker-1-thread-17,5,main], state: RUNNABLE, stack:
1> java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method) java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unsafe.java:1169) java.base/java.lang.invoke.MethodHandles$Lookup.ensureInitialized(MethodHandles.java:2748) org.apache.lucene.codecs.TestCodecLoadingDeadlock.lambda$main$1(TestCodecLoadingDeadlock.java:157) java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) java.base/java.lang.Thread.run(Thread.java:1474)
1>
1> # Thread: Thread[#43,deadlockchecker-1-thread-18,5,main], state: RUNNABLE, stack:
1> java.base/java.lang.Class.forName0(Native Method) java.base/java.lang.Class.forName(Class.java:467) java.base/java.lang.Class.forName(Class.java:458) org.apache.lucene.internal.tests.TestSecrets.lambda$static$0(TestSecrets.java:43) org.apache.lucene.internal.tests.TestSecrets.<clinit>(TestSecrets.java:51) org.apache.lucene.index.SegmentReader.<clinit>(SegmentReader.java:402) java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method) java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unsafe.java:1169) java.base/java.lang.invoke.MethodHandles$Lookup.ensureInitialized(MethodHandles.java:2748) org.apache.lucene.codecs.TestCodecLoadingDeadlock.lambda$main$1(TestCodecLoadingDeadlock.java:160) java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090) java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) java.base/java.lang.Thread.run(Thread.java:1474)
1>
:lucene:core:test_1 (FAILURE): 1 test, 1 failure
1 test completed, 1 failed
> Task :lucene:core:test_1 FAILED
Patch:
diff --git a/lucene/core/src/test/org/apache/lucene/codecs/TestCodecLoadingDeadlock.java b/lucene/core/src/test/org/apache/lucene/codecs/TestCodecLoadingDeadlock.java
index 0e0c3941bec..3705c8e6d58 100644
--- a/lucene/core/src/test/org/apache/lucene/codecs/TestCodecLoadingDeadlock.java
+++ b/lucene/core/src/test/org/apache/lucene/codecs/TestCodecLoadingDeadlock.java
@@ -20,6 +20,8 @@ import com.carrotsearch.randomizedtesting.LifecycleScope;
import com.carrotsearch.randomizedtesting.RandomizedContext;
import com.carrotsearch.randomizedtesting.RandomizedRunner;
import com.carrotsearch.randomizedtesting.RandomizedTest;
+
+import java.lang.invoke.MethodHandles;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
@@ -36,6 +38,10 @@ import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
+
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.SegmentReader;
+import org.apache.lucene.internal.tests.TestSecrets;
import org.apache.lucene.tests.util.LuceneTestCase;
import org.apache.lucene.util.NamedThreadFactory;
import org.apache.lucene.util.SuppressForbidden;
@@ -99,7 +105,8 @@ public class TestCodecLoadingDeadlock extends Assert {
final String pfName = args[1];
final String dvfName = args[2];
- final int numThreads = 14; // two times the modulo in switch statement below
+ final int numTasks = 9; // add number of catch cases in switch below
+ final int numThreads = numTasks * 2; // two times the modulo in switch statement below
final CopyOnWriteArrayList<Thread> allThreads = new CopyOnWriteArrayList<>();
final ExecutorService pool =
Executors.newFixedThreadPool(
@@ -114,6 +121,7 @@ public class TestCodecLoadingDeadlock extends Assert {
});
final CyclicBarrier barrier = new CyclicBarrier(numThreads);
+ final var lookup = MethodHandles.lookup();
IntStream.range(0, numThreads)
.forEach(
taskNo ->
@@ -123,7 +131,7 @@ public class TestCodecLoadingDeadlock extends Assert {
// Await a common barrier point for all threads and then
// run racy code. This is intentional.
barrier.await();
- switch (taskNo % 7) {
+ switch (taskNo % numTasks) {
case 0:
Codec.getDefault();
break;
@@ -145,6 +153,12 @@ public class TestCodecLoadingDeadlock extends Assert {
case 6:
DocValuesFormat.availableDocValuesFormats();
break;
+ case 7:
+ lookup.ensureInitialized(IndexWriter.class);
+ break;
+ case 8:
+ lookup.ensureInitialized(SegmentReader.class);
+ break;
default:
throw new AssertionError();
}
It fails with current main; I have not yet applied it to this branch. I can do this and disable your test.
The beefed-up test now checks better for deadlocks by covering not only codec loading but everything around IndexWriter / IndexReader.
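The patch above forces class initialization through `MethodHandles.Lookup.ensureInitialized` (available since Java 15). As a small sketch of what that call does, assuming the hypothetical names `EnsureInitDemo` and `Target`: the static initializer of the target class runs on the first call and is skipped on later calls.

```java
import java.lang.invoke.MethodHandles;

// Hypothetical demo class, not part of Lucene.
final class EnsureInitDemo {
  static int initCount;

  static final class Target {
    static {
      // Runs exactly once, when Target is initialized.
      EnsureInitDemo.initCount++;
    }
  }

  /** Forces initialization of Target twice and reports how often its initializer ran. */
  static int initializeTwice() throws IllegalAccessException {
    final var lookup = MethodHandles.lookup();
    lookup.ensureInitialized(Target.class); // triggers Target.<clinit>
    lookup.ensureInitialized(Target.class); // already initialized, no effect
    return initCount;
  }
}
```

Note that the class literal `Target.class` alone does not initialize the class, which is why the explicit call is needed to start the race in each thread.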
I removed the test and included it in the more general deadlock test case, which also covers codec deadlocks.
The test failed (see above) and passes with this PR.
Hi,
Please run
Signed-off-by: Namgyu Kim <namgyu@apache.org>
Signed-off-by: Namgyu Kim <namgyu@apache.org>
Signed-off-by: Namgyu Kim <namgyu@apache.org>
…lerForSetter Signed-off-by: Namgyu Kim <namgyu@apache.org>
uschindler left a comment
I am not sure if the test is really needed, otherwise looks perfect.
Back porting should be easy.
@uschindler
We have a similar test for Service loader deadlock with codecs. Maybe we can adapt that one to test both?
…improve that test
Hi @danmuzi, I also fixed one more Exception in the setters. In another commit I will now move and rename the test to be more general. TestCodecLoadingDeadlock => TestClassloaderDeadlocks in index package
Test moved. Have fun!
# Conflicts: # lucene/CHANGES.txt
This passes:
I think we should replace all I would do this for here and the other places in Lucene in a separate PR.
Hi @uschindler, I've tested your latest changes with the updated Thanks for your detailed review!
Signed-off-by: Namgyu Kim <namgyu@apache.org> Co-authored-by: Uwe Schindler <uschindler@apache.org>
Perfect, thanks. I will open a small PR for the global replacement of the Exception.
Related Issues
Overview
A deadlock occurred during static initialization due to the JVM Class Initialization Lock.
This issue occurs through mutual references between org.apache.lucene.index.IndexWriter (also CMS, SegmentReader, FilterIndexInput) and org.apache.lucene.internal.tests.TestSecrets.
Cause
Circular Dependency
IndexWriter invokes TestSecrets during its initialization, while TestSecrets conversely attempts to force-load IndexWriter during its own initialization.
JVM Class Loading Lock
When the JVM initializes a class for the first time, it does so under a per-class initialization lock: any other thread that needs the class blocks until the initializing thread has finished.
A deadlock occurs when two threads are each initializing a different class and each waits for the other thread to finish initializing the class it needs next.
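The mechanism can be reproduced outside Lucene with two classes whose static initializers reference each other. The following is a minimal sketch with hypothetical names (`ClassInitDeadlockDemo`, `A`, `B`); the latches are only there to make the interleaving deterministic, whereas in real code the race is timing dependent.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo, not Lucene code.
final class ClassInitDeadlockDemo {
  static final CountDownLatch aStarted = new CountDownLatch(1);
  static final CountDownLatch bStarted = new CountDownLatch(1);

  static final class A {
    static {
      aStarted.countDown();
      awaitQuietly(bStarted); // make sure B's initialization is in progress
      B.touch(); // needs B, which the other thread is initializing
    }

    static void touch() {}
  }

  static final class B {
    static {
      bStarted.countDown();
      awaitQuietly(aStarted); // make sure A's initialization is in progress
      A.touch(); // needs A, which the other thread is initializing
    }

    static void touch() {}
  }

  static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Returns true if both initializing threads are still blocked after the timeout. */
  static boolean deadlocks(long timeoutMillis) throws InterruptedException {
    Thread t1 = new Thread(() -> A.touch(), "init-A");
    Thread t2 = new Thread(() -> B.touch(), "init-B");
    t1.setDaemon(true); // daemon threads so the JVM can still exit
    t2.setDaemon(true);
    t1.start();
    t2.start();
    t1.join(timeoutMillis);
    t2.join(timeoutMillis);
    return t1.isAlive() && t2.isAlive();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("deadlocked: " + deadlocks(1000));
  }
}
```

Neither thread holds a Java monitor that a thread dump would report as a classic lock cycle, which matches the RUNNABLE states in the stack dump above.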
Code structure
Deadlock Scenario
This deadlock occurs in a multi-threaded environment when two classes are accessed for the first time almost simultaneously.
| Thread A | Thread B |
|---|---|
| Accesses the IndexWriter class. | Accesses the SegmentReader class. |
| Acquires the lock on IndexWriter.class and starts initialization. | Acquires the lock on SegmentReader.class and starts initialization. |
| Calls TestSecrets → Acquires lock on TestSecrets.class. | Calls TestSecrets. |
| | Waits for the TestSecrets lock (held by Thread A). |
| TestSecrets static block calls Class.forName("SegmentReader"). | |
| Waits for the SegmentReader lock (held by Thread B). | |

Result: A permanent deadlock occurs where Thread A waits for the SegmentReader lock while Thread B waits for the TestSecrets lock.
Steps to Reproduce
Without applying my TestSecrets fix:
- Run main() of the DeadlockTest class in TestTestSecrets.
- Run testDeadlock() in TestTestSecrets.
Solution - Lazy Initialization
All Class.forName calls that forcibly load target classes within the static initialization block of TestSecrets have been removed.
Instead, the logic has been updated to perform initialization individually within each getter method at the actual time of invocation.
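The pattern can be sketched as below. This is not the actual TestSecrets code: `LazySecrets`, `WriterAccess`, and `DemoWriter` are hypothetical stand-ins, and the real class has additional checks on its setters. The point is only that the holder class has no static initializer touching other classes, so initializing it can never wait on another class's initialization lock.

```java
import java.util.Objects;

// Hypothetical stand-in for the secrets holder; no static block that loads other classes.
final class LazySecrets {
  interface WriterAccess {
    String describe();
  }

  private static volatile WriterAccess writerAccess;

  /** Called from the static initializer of the class that owns the internals. */
  static void setWriterAccess(WriterAccess access) {
    writerAccess = Objects.requireNonNull(access);
  }

  /** Initializes the owning class only when the accessor is actually requested. */
  static WriterAccess getWriterAccess() {
    ensureInitialized(DemoWriter.class);
    return Objects.requireNonNull(writerAccess);
  }

  private static void ensureInitialized(Class<?> clazz) {
    try {
      // The class literal does not initialize the class; Class.forName with 'true' does.
      Class.forName(clazz.getName(), true, clazz.getClassLoader());
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}

// Hypothetical stand-in for a class like IndexWriter that registers its accessor.
final class DemoWriter {
  static {
    LazySecrets.setWriterAccess(() -> "writer internals");
  }
}
```

A caller would use `LazySecrets.getWriterAccess().describe()`; the first call triggers the static initializer of `DemoWriter`, and later calls only read the already registered accessor.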
Diagram
graph TD
  subgraph "Thread A (Initializes IndexWriter)"
    A1[IndexWriter class load] --> A2["IndexWriter.<clinit> (Lock IndexWriter.class)"]
    A2 --> A3["Call TestSecrets.setIndexWriterAccess()"]
    A3 --> A4["Trigger TestSecrets.<clinit> (Lock TestSecrets.class)"]
    A4 --> A5["TestSecrets.<clinit> calls Class.forName('SegmentReader')"]
    A5 -- "Wait for SegmentReader.class lock" --> B2
  end
  subgraph "Thread B (Initializes SegmentReader)"
    B1[SegmentReader class load] --> B2["SegmentReader.<clinit> (Lock SegmentReader.class)"]
    B2 --> B3["Call TestSecrets.setSegmentReaderAccess()"]
    B3 -- "Wait for TestSecrets.class lock" --> A4
  end
  A5 -. Deadlock .-> B2
  B3 -. Deadlock .-> A4