Conversation
Motivation: To facilitate debug of user reports regarding hot file replication. Modification: - Add enablement information to mover request debug messages. - Add new debug messages to `MigrationModule.reportFileRequests()` on: - NOP due to failure to exceed replication threshold. - Discovery of an existing hot file replication job for the same `PnfsId`. Result: Improved logging for diagnostic purposes when addressing user reports of hot file replication issues. Target: master Request: 11.2 Patch: https://rb.dcache.org/r/14655/diff/raw Closes: Requires-notes: yes Requires-book: no Acked-by: Dmitry Litvintsev
There was a problem hiding this comment.
Pull request overview
This PR aggregates a set of 11.2.x maintenance updates across dCache, covering SciTags/Firefly marker behavior, pool hotfile replication selection logic, WebDAV/TPC header propagation, CLI usability fixes, and routine version/dependency bumps.
Changes:
- Extend and harden SciTags/Firefly handling (protocol coverage, extra collector copy, improved logging) and propagate SciTag for remote HTTP(S) TPC movers.
- Improve pool hotfile replication behavior (exclude p2p from request counting; use PoolManager query selection honoring read-preference levels; expand unit tests + developer doc).
- Add
dcache pool benchmark(fio-based) and fix pool size display whenmax diskspaceis configured as a percentage; bump ZooKeeper and nfs4j; add Jackson JDK8 datatype module for REST Optional support.
Reviewed changes
Copilot reviewed 97 out of 98 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| skel/share/lib/utils.sh | Adds filesystem total-space helper used by pool size calculation. |
| skel/share/lib/pool.sh | Fix pool size display when configured as percentage of filesystem. |
| skel/share/lib/pool-benchmark.fio | New fio workload definition for pool benchmarking. |
| skel/share/defaults/pool.properties | Updates Firefly destination property documentation (collector semantics). |
| skel/man/man8/dcache.8 | Documents new pool benchmark CLI subcommand. |
| skel/bin/dcache | Adds dcache pool benchmark command implementation. |
| pom.xml | Bumps parent version + dependencies; adds jackson-datatype-jdk8. |
| plugins/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| plugins/hsqldb/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| packages/tar/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| packages/system-test/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| packages/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| packages/fhs/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/srm-server/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/srm-common/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/srm-client/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/missingfiles-semsg/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/logback-test-config/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/logback-console-config/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-voms/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-scitoken/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-roles/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-pyscript/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-omnisession/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-oidc/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-oidc-te/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-nsswitch/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-nis/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-multimap/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-ldap/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-krb5/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-kpwd/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-jaas/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-htpasswd/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-grid/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-fermi/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-banfile/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/gplazma2-alise/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/ftp-client/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache/src/test/java/org/dcache/pool/movers/TransferLifeCycleTest.java | Adds regression tests for SciTag boundaries, FQAN fallback, excludes, and end-time semantics. |
| modules/dcache/src/test/java/org/dcache/pool/migration/PoolListByPoolMgrQueryTest.java | Unit tests for new PoolManager-query-based pool list selection logic. |
| modules/dcache/src/test/java/org/dcache/pool/migration/MigrationModuleTest.java | Updates tests for hotfile replication to include protocol info and new selection behavior. |
| modules/dcache/src/test/java/org/dcache/pool/classic/MoverRequestSchedulerTest.java | Tests that p2p movers are excluded from request counting. |
| modules/dcache/src/test/java/org/dcache/pool/classic/IoQueueManagerTest.java | Tests request counting across queues excluding p2p. |
| modules/dcache/src/test/java/org/dcache/pool/classic/AbstractMoverProtocolTransferServiceTest.java | Updates test setup to provide protocol info and pnfsid context. |
| modules/dcache/src/main/resources/org/dcache/pool/classic/pool.xml | Wires transfer-lifecycle into remote HTTP transfer service. |
| modules/dcache/src/main/java/org/dcache/pool/movers/TransferLifeCycle.java | Extends marker support/protocols, adds collector copy, improves time semantics and structured debug logging. |
| modules/dcache/src/main/java/org/dcache/pool/migration/PoolListByPoolMgrQuery.java | New RefreshablePoolList using PoolMgrQueryPoolsMsg and highest-preference selection. |
| modules/dcache/src/main/java/org/dcache/pool/migration/MigrationModule.java | Uses PoolMgrQueryPoolsMsg-based selection; derives protocol/net unit from ProtocolInfo for hotfile replication. |
| modules/dcache/src/main/java/org/dcache/pool/classic/PoolV4.java | Passes protocolInfo to hotfile monitor and improves debug logging. |
| modules/dcache/src/main/java/org/dcache/pool/classic/MoverRequestScheduler.java | Excludes pool-to-pool movers from request counting. |
| modules/dcache/src/main/java/org/dcache/pool/classic/FileRequestMonitor.java | Extends API to include (nullable) ProtocolInfo. |
| modules/dcache/src/main/java/org/dcache/pool/classic/DefaultPostTransferService.java | Adds SciTags lifecycle-end logging/derivation and invokes TransferLifeCycle with improved diagnostics. |
| modules/dcache/src/main/java/org/dcache/pool/classic/AbstractMoverProtocolTransferService.java | Adds SciTags-related logging and wiring hook for TransferLifeCycle. |
| modules/dcache/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-xrootd/src/main/java/org/dcache/xrootd/door/XrootdTransfer.java | Adds SciTags request logging and transferTag debug. |
| modules/dcache-xrootd/src/main/java/org/dcache/xrootd/door/XrootdDoor.java | Calls SciTags request logging at transfer creation time. |
| modules/dcache-xrootd/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-webdav/src/test/java/org/dcache/webdav/DcacheResourceFactoryTest.java | Tests case-insensitive SciTag header lookup helper. |
| modules/dcache-webdav/src/main/java/org/dcache/webdav/transfer/RemoteTransferHandler.java | Adds transferTag propagation to remote HTTP(S) protocol infos; supports ArchiveMetadata header on custodial uploads. |
| modules/dcache-webdav/src/main/java/org/dcache/webdav/transfer/CopyFilter.java | Captures SciTag headers for TPC logging and forwards transferTag to pool for remote movers. |
| modules/dcache-webdav/src/main/java/org/dcache/webdav/LoggingHandler.java | Logs captured TPC SciTag header values. |
| modules/dcache-webdav/src/main/java/org/dcache/webdav/DcacheResourceFactory.java | Adds case-insensitive header lookup and logs SciTags request source; supports ArchiveMetadata header on custodial uploads. |
| modules/dcache-webdav/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-vehicles/src/main/java/diskCacheV111/vehicles/RemoteHttpDataTransferProtocolInfo.java | Adds transferTag support with backward-compatible deserialization. |
| modules/dcache-vehicles/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-srm/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-spacemanager/src/main/java/diskCacheV111/services/space/SpaceManagerService.java | Ensures access latency/retention policy are always set; logs override instead of throwing. |
| modules/dcache-spacemanager/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-resilience/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-qos/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-nfs/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-nearline-spi/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-info/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-history/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-gplazma/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-ftp/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-frontend/src/main/java/org/dcache/restful/providers/ObjectMapperProvider.java | Registers Jackson Jdk8Module to support Optional types. |
| modules/dcache-frontend/pom.xml | Adds jackson-datatype-jdk8 dependency. |
| modules/dcache-dcap/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-chimera/src/main/java/org/dcache/chimera/namespace/ChimeraHsmStorageInfoExtractor.java | Removes null-return behavior for default access latency/retention policy. |
| modules/dcache-chimera/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/dcache-bulk/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/common/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/common-security/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/common-cli/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/chimera/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/cells/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/benchmarks/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/acl/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| modules/acl-vehicles/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| docs/UserGuide/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| docs/TheBook/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| docs/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| docs/dev/hotfile-replication.md | Adds developer documentation for hotfile replication behavior and diagnostics. |
| archetypes/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
| archetypes/dcache-nearline-plugin-archetype/pom.xml | Version bump alignment to 11.2.4-SNAPSHOT. |
Comments suppressed due to low confidence (1)
modules/dcache/src/main/java/org/dcache/pool/movers/TransferLifeCycle.java:141
- Javadoc for onEnd lists
@paramprotocolInfo and@paramsubject, but the method signature only takes (src, dst, mover). Please update the Javadoc to match the actual parameters to avoid confusion.
| # Extracts the amount of free space in GiB. | ||
| getTotalSpace() # in $1 = path | ||
| { | ||
| [ -d "$1" ] && ( df -k "${1}" | awk 'NR == 2 { if (NF < 4) { getline; x = $1 } else { x = $2 }; printf "%d", x / (1024 * 1024)}' ) | ||
| } |
There was a problem hiding this comment.
Comment says this function extracts free space, but the implementation returns total filesystem size (df total blocks). Please update the comment to avoid misleading callers.
| *%) | ||
| total=$(getTotalSpace "$path") | ||
| percent=$(echo "${size}" | tr -d "%") | ||
| size=$((${total}*${percent}/100)) | ||
| stringToGiB "${size}G" size | ||
| echo "${size}G" |
There was a problem hiding this comment.
The new percentage-handling branch uses global variables (path/total/percent) instead of the function argument and doesn't declare locals, which makes getSizeOfPool fragile when called from other contexts. Consider introducing a local path="$1" and local total/percent variables and using consistent indentation.
| [concurrent_randread] | ||
| description=Concurrent Random READ | ||
| rw=randread | ||
| rw=read | ||
| ; check checksum during read |
There was a problem hiding this comment.
In the [concurrent_randread] job, 'rw' is defined twice; the later 'rw=read' overrides 'rw=randread', so the job is not actually random read. Remove the duplicate or set the intended mode once.
| ; | ||
| ; NOTE: ALL READ JOBS DELETE TEST FILE AFTER BENCHMARK | ||
| ; |
There was a problem hiding this comment.
The header note says all read jobs delete the test file after benchmark, but seqread currently doesn't set unlink=1. Either update the note or add unlink=1 to the seqread job to match the documented behavior.
| rw=randread | ||
| ; inremental random offest | ||
| rw_sequencer=sequential |
There was a problem hiding this comment.
Typo in comment: 'inremental random offest' should be 'incremental random offset'.
| } catch (InterruptedException e) { | ||
| SCITAGS_LOGGER.debug( | ||
| "scitags lifecycle=start abort reason=interrupted protocol={} pnfsid={} transferTag={} message={}", | ||
| protocolName(), | ||
| _mover.getFileAttributes().getPnfsId(), | ||
| transferTag(), | ||
| formatError(e)); | ||
| InterruptedException why = _explanation == null ? e : | ||
| (InterruptedException) (new InterruptedException(_explanation).initCause(e)); | ||
| _completionHandler.failed(why, null); | ||
| } catch (Throwable t) { | ||
| SCITAGS_LOGGER.debug( | ||
| "scitags lifecycle=start abort reason=execution-failed protocol={} pnfsid={} transferTag={} message={}", | ||
| protocolName(), | ||
| _mover.getFileAttributes().getPnfsId(), | ||
| transferTag(), | ||
| formatError(t)); | ||
| _completionHandler.failed(t, null); | ||
| } | ||
| } | ||
|
|
||
| private void handleChecksumMover() { | ||
| MoverProtocol mover = _mover.getMover(); | ||
|
|
||
| if (mover instanceof ChecksumMover) { | ||
| ChecksumMover cm = (ChecksumMover) mover; | ||
| cm.acceptIntegrityChecker(_mover::addExpectedChecksum); | ||
| } | ||
| } | ||
|
|
||
| private void runMoverForRead(RepositoryChannel fileIoChannel) throws Exception { | ||
| _mover.getMover() | ||
| .runIO(_mover.getFileAttributes(), fileIoChannel, _mover.getProtocolInfo(), | ||
| _mover.getIoMode()); | ||
| } | ||
|
|
||
| private void tryToSync(RepositoryChannel channel) throws IOException { | ||
| if (channel.isOpen()) { | ||
| try { | ||
| channel.sync(); | ||
| } catch (SyncFailedException e) { | ||
| channel.sync(); | ||
| LOGGER.info("First sync failed [{}], but second sync succeeded", e); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| private void runMoverForWrite(RepositoryChannel fileIoChannel) throws Exception { | ||
| handleChecksumMover(); | ||
| try { | ||
| _mover.getMover() | ||
| .runIO(_mover.getFileAttributes(), fileIoChannel, _mover.getProtocolInfo(), | ||
| _mover.getIoMode()); | ||
| } finally { | ||
| tryToSync(fileIoChannel); | ||
| } | ||
| } | ||
|
|
||
| private String protocolName() { | ||
| return _mover.getProtocolInfo().getProtocol().toLowerCase(); | ||
| } | ||
|
|
||
| private String transferTag() { | ||
| String transferTag = _mover.getProtocolInfo().getTransferTag(); | ||
| return transferTag == null || transferTag.isEmpty() ? "-" : transferTag; | ||
| } |
There was a problem hiding this comment.
protocolName()/transferTag() dereference _mover.getProtocolInfo() without null checks. Since PoolIoFileMessage allows null ProtocolInfo, this logging path can throw NPE when handling exceptions, obscuring the original error. Please handle null protocolInfo safely.
| private PostTransferService _postTransferService; | ||
| private TransferLifeCycle _transferLifeCycle; | ||
|
|
||
|
|
||
| @Required | ||
| public void setPostTransferService(PostTransferService postTransferService) { | ||
| _postTransferService = postTransferService; | ||
| } | ||
|
|
||
| public void setTransferLifeCycle(TransferLifeCycle transferLifeCycle) { | ||
| _transferLifeCycle = transferLifeCycle; | ||
| } |
There was a problem hiding this comment.
_transferLifeCycle is set via setter but never read/used. If this wiring is intended to enable SciTags markers for these movers, the service should actually invoke it (similar to NettyTransferService), otherwise the field and Spring property are dead code.
| * Determine experiment ID, initially from the ProtocolInfo (xroot/http), | ||
| * if that fails then fallback to the Subject's primary FQAN. | ||
| * | ||
| * @param protocolInfo the ProtocolInfo object containing transfer-related metadata | ||
| * @param subject the Subject representing the user or entity associated with the transfer | ||
| * @return the experiment ID, or -1 if it cannot be determined | ||
| */ |
There was a problem hiding this comment.
Javadoc for getExperimentId says it returns "-1 if it cannot be determined", but the method actually returns OptionalInt.empty(). Please adjust the @return description accordingly.
| <!-- Java 8 Datatypes --> | ||
| <dependency> | ||
| <groupId>com.fasterxml.jackson.datatype</groupId> | ||
| <artifactId>jackson-datatype-jdk8</artifactId> | ||
| </dependency> | ||
| <dependency> |
There was a problem hiding this comment.
This dependency block is indented with tabs and doesn't match the surrounding XML formatting. Please reformat to consistent spacing to avoid style-checker issues and reduce diff noise.
| <!-- Java 8 Datatypes --> | |
| <dependency> | |
| <groupId>com.fasterxml.jackson.datatype</groupId> | |
| <artifactId>jackson-datatype-jdk8</artifactId> | |
| </dependency> | |
| <dependency> | |
| <!-- Java 8 Datatypes --> | |
| <dependency> | |
| <groupId>com.fasterxml.jackson.datatype</groupId> | |
| <artifactId>jackson-datatype-jdk8</artifactId> | |
| </dependency> | |
| <dependency> |
| import org.junit.Before; | ||
| import org.junit.Test; | ||
| import java.util.concurrent.atomic.AtomicInteger; | ||
|
|
There was a problem hiding this comment.
Unused import: java.util.concurrent.atomic.AtomicInteger is never referenced in this test class. Please remove it to keep the test clean (and to avoid style-tool failures if enabled).
dcache pool benchmarkcommand