Skip to content

Add initial calo digi to fragment module and a validation module#1774

Open
michaelmackenzie wants to merge 4 commits intoMu2e:mainfrom
michaelmackenzie:CaloDigiToFragment
Open

Add initial calo digi to fragment module and a validation module#1774
michaelmackenzie wants to merge 4 commits intoMu2e:mainfrom
michaelmackenzie:CaloDigiToFragment

Conversation

@michaelmackenzie
Copy link
Contributor

@michaelmackenzie michaelmackenzie commented Mar 26, 2026

I used AI to create a calo digi -> fragment module (with many iterations to get it to work). I also had it create a validation module. Here's an example fcl to test it:

#include "Offline/fcl/minimalMessageService.fcl"
#include "Offline/fcl/standardServices.fcl"

process_name : DigiTest
source : {
  module_type : RootInput
  fileNames   : @nil
}

services : @local::Services.Reco
services.GeometryService.bFieldFile     : "Offline/Mu2eG4/geom/bfgeom_reco_v01.txt"

physics : {
  producers : {
    CaloDigisToFragments : {
      module_type : CaloDigisToFragments
      diagLevel : 0
      useOfflineID : true
      caloDigiTag : "CaloDigiMaker"
    }
    CaloFragmentsToDigis : {
      module_type   : "CaloDigisFromDTCEvents"
      dataType : 0
      diagLevel : 0
      useOfflineID : true
      useDTCROCID : false
    }
  }
  analyzers : {
    CompareCaloDigis : {
      module_type : CaloDigiCollectionsComparator
      referenceTag : "CaloDigiMaker"
      candidateTag : "CaloFragmentsToDigis"
      failOnMismatch : true
      diagLevel : 1
    }
  }
  TrigPath      : [ CaloDigisToFragments, CaloFragmentsToDigis ]
  EndPath       : [ CompareCaloDigis ]
  trigger_paths : [ TrigPath ]
  end_paths     : [ EndPath ]
}

Where on 1k events I had no errors:
mu2e -c test_digi_to_frag.fcl -S mu2e-trig-config/ci/data_files.txt -n 1000

This should be checked by experts, but it's just to get the ball rolling on this conversion code

@FNALbuild
Copy link
Collaborator

Hi @michaelmackenzie,
You have proposed changes to files in these packages:

  • DAQ

which require these tests: build.

@Mu2e/write, @Mu2e/fnalbuild-users have access to CI actions on main.

⌛ The following tests have been triggered for de3ca7d: build (Build queue - API unavailable)

About FNALbuild. Code review on Mu2e/Offline.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces DAQ-side support for turning reconstructed calorimeter digis back into DTCEVT fragments, plus a small validation analyzer to compare two CaloDigiCollections (useful for round-trip checks through fragmentization/decoding).

Changes:

  • Add CaloDigisToFragments EDProducer to build artdaq::Fragments (DTCEVT) from CaloDigiCollection.
  • Add CaloDigiCollectionsComparator EDAnalyzer to compare a “reference” and “candidate” CaloDigiCollection.
  • Register both modules in DAQ/CMakeLists.txt.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
DAQ/src/CaloDigisToFragments_module.cc New producer that encodes calo digis into DTCLib event/subevent/datablock buffers and emits a DTCEVT fragment.
DAQ/src/CaloDigiCollectionsComparator_module.cc New analyzer to normalize/sort and diff two CaloDigiCollections, optionally throwing on mismatch.
DAQ/CMakeLists.txt Adds plugin build rules for the two new modules.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +111 to +116
// Each hit occupies 12-byte chunks: 12-byte header + packed 12-bit samples + chunk padding.
size_t hitBytes = static_cast<size_t>(12 * std::ceil((12.0 + 1.5 * waveform.size()) / 12.0));
size_t waveformBytes = hitBytes - 12;
std::vector<uint8_t> rawBytes(waveformBytes, 0xFF);
std::memcpy(rawBytes.data(), rawWords.data(), std::min(rawBytes.size(), rawWords.size() * sizeof(uint16_t)));
return rawBytes;
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::ceil is used (e.g., to compute hitBytes / transferByteCount) but <cmath> is not included in this file, which will fail to compile on standard toolchains. Add the appropriate header (and consider keeping math helpers near these calculations).

Copilot uses AI. Check for mistakes.
@oksuzian
Copy link
Collaborator

Let me review this pull request in detail.Now let me look at the existing decoder module to understand the expected data format conventions:Here is my review of PR #1774. I've cross-referenced the new encoder (CaloDigisToFragments) against the existing decoder (CaloDigisFromDTCEvents) to identify issues.


Issues Found

1. 🔴 Wrong namespace — module is declared in art:: instead of mu2e::

namespace art {
class CaloDigisToFragments;
}

The module is placed in the art namespace. All other Mu2e modules in this repo follow the same pattern (e.g., CaloDigiFromDTCEvents is also in art::), so this is consistent with the existing codebase. However, this is widely considered an anti-pattern — user code shouldn't inject types into the art namespace. The comparator module correctly uses namespace mu2e. Consider making this consistent.

2. 🔴 uint16_t transferByteCount will overflow for large events

  uint16_t transferByteCount{0};
  uint16_t packetCount{0};

transferByteCount is a uint16_t, which has a max value of 65535. If a ROC link has many hits with large waveforms, the accumulation loop:

block.transferByteCount += static_cast<uint16_t>(12 * std::ceil((12.0 + 1.5 * wf.size()) / 12.0));

will silently overflow and wrap around, leading to a truncated/corrupted data block. packetCount has the same issue. These should be uint32_t (or at least size_t) during accumulation, and then checked/clamped before being written into the header.

3. 🔴 Potential buffer overwrite — pos is not bounds-checked against transferByteCount

In buildDtcEventFromDigis, the code writes hit headers and waveforms into thisBlock.allocBytes at offset pos, but never checks whether pos exceeds block.transferByteCount. If the byte-count calculation is wrong (or overflows per issue #2), the memcpy calls will write past the allocated block, causing undefined behavior/memory corruption.

4. 🟡 encodeHitHeader writes 96 bits (indices 0–95) but only 12 bytes are copied

The encodeHitHeader function uses writeBits with the last field ending at bit 86 + 10 = 96, which fits exactly in 6 × uint16_t = 12 bytes. This is correct, but the memcpy uses a hardcoded 12:

std::memcpy(thisBlock.allocBytes->data() + pos, rawHeader.data(), 12);

This relies on sizeof(uint16_t) * 6 == 12, which is true but fragile. Consider using sizeof(rawHeader) or rawHeader.size() * sizeof(uint16_t) to be explicit.

5. 🟡 waveformMaximumIndex is defined but never called

size_t art::CaloDigisToFragments::waveformMaximumIndex(const std::vector<uint16_t>& waveform) {
  if (waveform.empty()) {
    return 0;
  }
  return std::distance(waveform.begin(), std::max_element(waveform.begin(), waveform.end()));
}

This static method is dead code and should be removed.

6. 🟡 digi.t0() truncated to uint16_t — potential data loss

hit.Time = static_cast<uint16_t>(digi.t0());

The CaloDigi stores t0 as int. If t0 exceeds 65535 (or is negative), the cast silently truncates/wraps. The decoder reads this back as uint16_t and casts to int, so it won't recover the original value. Consider at minimum a range check with a warning.

7. 🟡 InPayloadEventWindowTag and Baseline are hardcoded to 0

hit.InPayloadEventWindowTag = 0;
hit.Baseline = 0;

These fields are zeroed during encoding but the original CaloDigi doesn't store them, so the round-trip through encode → decode will produce different raw packets than the original data. This is acceptable for a digi→fragment conversion, but should be documented (the round-trip validation comparison only checks SiPMID, t0, peakpos, and waveform, which don't include these fields).

8. 🟡 detectorID is not set when useOfflineID_ == false

uint16_t detectorID = 0;
if (useOfflineID_) {
  ...
  detectorID = offId.detType();
} else {
  rawId = mu2e::CaloRawSiPMId(static_cast<uint16_t>(digi.SiPMID()));
}

When useOfflineID_ is false, detectorID remains 0. This means hit.DetectorID will be incorrect for non-disk-0 detectors. The decoder doesn't use DetectorID for the digi reconstruction, but it could matter for downstream consumers of the raw data.

9. 🟡 Empty-ROC data blocks emitted for links with no hits

for (uint8_t linkID = 0; linkID < kCaloRocsPerDtc; ++linkID) {
    auto& block = linkMap[linkID];

This loop creates a data block for every link ID 0–5 even for DTCs that only have hits on some links. The decoder (CaloDigisFromDTCEvents) handles empty blocks, so this works, but it adds unnecessary overhead to the fragment. Additionally, linkMap[linkID] inserts default-constructed entries into the map for links that never had any hits.

10. 🟡 CMakeLists.txt indentation is inconsistent

    cet_build_plugin(CaloDigisToFragments art::module
        REG_SOURCE src/CaloDigisToFragments_module.cc
        LIBRARIES REG
      Offline::DAQ
      Offline::CaloConditions

The LIBRARIES REG block uses a different indentation level than the surrounding entries in the file. Look at the existing CaloDigisFromDTCEvents entry for the expected style.

11. 🟢 Minor: PR description has a typo in the test command

The PR body shows:

mu2e -c test_digi_to_frag.fcl -S mu2e-trig-config/ci/data_files.txt -n 1000u2e -c test_digi_to_frag.fcl ...

This appears to be a duplicated/mangled command string.


Summary

Severity Count Key concerns
🔴 High 3 uint16_t overflow on byte count, potential buffer overwrite, namespace choice
🟡 Medium 6 Dead code, truncation of t0, hardcoded zeros, detectorID gap, unnecessary empty blocks, CMake style
🟢 Low 1 PR description typo

The most critical issue is the uint16_t transferByteCount overflow — this can silently corrupt the output fragment for events with many calorimeter hits on a single ROC link. The pos bounds check is also important to prevent memory corruption.

@FNALbuild
Copy link
Collaborator

☀️ The build tests passed at de3ca7d.

Test Result Details
test with Command did not list any other PRs to include
merge Merged de3ca7d at 5b02c13
build (prof) Log file. Build time: 04 min 09 sec
ceSimReco Log file.
g4test_03MT Log file.
transportOnly Log file.
POT Log file.
g4study Log file.
cosmicSimReco Log file.
cosmicOffSpill Log file.
ceSteps Log file.
ceDigi Log file.
muDauSteps Log file.
ceMix Log file.
rootOverlaps Log file.
g4surfaceCheck Log file.
trigger Log file. Return Code 1.
check_cmake Log file.
FIXME, TODO TODO (0) FIXME (0) in 2 files
clang-tidy ➡️ 4 errors 88 warnings
whitespace check no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at de3ca7d after being merged into the base branch at 5b02c13.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@michaelmackenzie
Copy link
Contributor Author

Responding to the Opus review:

  1. Many of the DAQ modules are art namespace instead of mu2e namespace
  2. I think this is true in general, that this is the variable size in the fragment -> digi?
  3. I'm again not sure if this is possible given that these started as fragments, a calo expert would need to answer
  4. Hard-coded copies like this are inherited from how the decoding/overlays are done I believe
  5. I think this was useful in early iterations, I can delete it but it might be useful in the future
  6. It comes from this data type in fragment -> digi I believe?
  7. I don't think we have data in the digi product to make these more meaningful
  8. I think this is expected
  9. Empty packets are expected for ROC links without hits, errors are thrown if they're missing
  10. AI updated this after the copilot review
  11. I fixed this

Let me know if there are changes I should make given these comments or expert opinions!

@giro94
Copy link
Collaborator

giro94 commented Mar 27, 2026

Hi @michaelmackenzie , before I start reviewing the code, isn't this module: https://github.com/Mu2e/Offline/blob/main/DAQ/src/ArtBinaryPacketsFromDigis_module.cc already built for this purpose?

@michaelmackenzie
Copy link
Contributor Author

@giro94 I can have AI try to test and update that module instead of separating out subsystem digi -> fragment production into a separate model if you think this is best. Do you know if it's functional?

mu2e::CaloRawSiPMId rawId;
uint16_t detectorID = 0;

if (useOfflineID_) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The useOfflineID_ flag in CaloDigiMaker is used for non-disk topologies (like HEERC tests for example).
I think this module can assume disk-only data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll remove this use-case

rawId = calodaqconds.rawId(offId);
detectorID = offId.detType();
} else {
rawId = mu2e::CaloRawSiPMId(static_cast<uint16_t>(digi.SiPMID()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to keep useOfflineID_ == false cases, we should also add the useDTCROCID_ logic in https://github.com/Mu2e/Offline/blob/main/DAQ/src/CaloDigisFromDTCEvents_module.cc#L198C13-L199

}

if (!rawId.isValid()) {
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw exception rather than skip?

}

auto const diracID = rawId.dirac();
auto const dtcID = static_cast<uint8_t>(diracID / kCaloRocsPerDtc);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DTCs are not aligned with board (dirac) IDs.
We can't determine them from boardID unless we implement a map

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, for the purpose of this module, we can ignore that the DTC numbers are not the real ones

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was somewhat my though, have it come up with some sort of DTC ID/ROC ID etc. numbering to create some sort of reasonable fragment collection, but not necessarily be 100% correct in figuring out the actual topology. But if we make maps, we can always update this


auto const diracID = rawId.dirac();
auto const dtcID = static_cast<uint8_t>(diracID / kCaloRocsPerDtc);
auto const linkID = static_cast<uint8_t>(diracID % kCaloRocsPerDtc);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants