diff --git a/docs/api-reference/environment-variables.md b/docs/api-reference/environment-variables.md index 4770a6f..29dfacf 100644 --- a/docs/api-reference/environment-variables.md +++ b/docs/api-reference/environment-variables.md @@ -33,10 +33,10 @@ Note: ## Parent Selection -| Variable | Description | Default | Example | -| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------- | --------------------------------------------------------------------- | -| `PARENT_FILENAME_PROMOTE` | Substrings to promote as parent filenames. Supports empty string for negative matching, the `sequence` keyword and automatic sequence detection for burst photos. | `cover,edit,crop,hdr,biggestNumber` | `,_edited` or `edit,raw` or `COVER,sequence` or `0000,0001,0002,0003` | -| `PARENT_EXT_PROMOTE` | Extensions to promote as parent files | `.jpg,.png,.jpeg,.heic,.dng` | `.jpg,.dng` | +| Variable | Description | Default | Example | +| ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------- | ---------------------------------------------------------------------------------------------- | +| `PARENT_FILENAME_PROMOTE` | Substrings to promote as parent filenames. Supports empty string for negative matching, the `sequence` keyword, automatic sequence detection for burst photos, and the `biggestNumber` / `biggestSize` / `smallestSize` magic keywords for tie-breakers. 
| `cover,edit,crop,hdr,biggestNumber` | `,_edited` or `edit,raw` or `COVER,sequence` or `0000,0001,0002,0003` or `_edited,biggestSize` | +| `PARENT_EXT_PROMOTE` | Extensions to promote as parent files | `.jpg,.png,.jpeg,.heic,.dng` | `.jpg,.dng` | ### Empty String for Negative Matching diff --git a/docs/features/edited-photo-promotion.md b/docs/features/edited-photo-promotion.md index f6c03d2..0ebdc11 100644 --- a/docs/features/edited-photo-promotion.md +++ b/docs/features/edited-photo-promotion.md @@ -127,6 +127,54 @@ Simply remove `biggestNumber` from your promote list: PARENT_FILENAME_PROMOTE=edit,crop,hdr ``` +## Promote by File Size (`biggestSize` / `smallestSize`) + +Sometimes the filename alone can't tell you which version of a photo was the post-processed +export. A typical example: shooting JPG+RAW on a DSLR and exporting an edited JPG from +Lightroom — all three end up sharing the same base filename but the exported JPG is +substantially larger than the camera's original. Use the `biggestSize` (or, symmetrically, +`smallestSize`) magic keyword in `PARENT_FILENAME_PROMOTE` to break ties on file size. + +### Configuration + +```bash +# Always promote the largest file when other criteria don't resolve +PARENT_FILENAME_PROMOTE=biggestSize + +# Combine with substring matches: pick "_edited" files first, then largest within that bucket +PARENT_FILENAME_PROMOTE=_edited,biggestSize + +# Symmetric: promote the smallest file (useful for thumbnails / reduced exports) +PARENT_FILENAME_PROMOTE=smallestSize +``` + +### Example: DSLR with JPG + RAW + Lightroom Export + +``` +Files: Size +- IMG_5821.JPG (camera JPG) 4.8 MB +- IMG_5821.CR2 (camera RAW) 28.0 MB +- IMG_5821.jpg (exported from Lightroom) 12.4 MB + +With PARENT_FILENAME_PROMOTE=biggestSize and PARENT_EXT_PROMOTE=.jpg,.jpeg,.png: +1. IMG_5821.jpg ← winner (largest among the JPGs) +2. IMG_5821.JPG +3. 
IMG_5821.CR2 (lower extension priority) +``` + +### How It Works + +- `biggestSize` and `smallestSize` are evaluated **after** extension preferences, so a + high-priority extension still wins over an unrelated larger file (e.g. a 28 MB RAW + doesn't outrank a 12 MB JPG when `.jpg` is listed first in `PARENT_EXT_PROMOTE`). +- The tie-break uses `exifInfo.fileSizeInByte` from Immich. Assets with a positive size + are sorted first (by size, in the chosen direction); assets **without** exif data are + bucketed at the back and never outrank an asset with a known size. Missing exif is treated + as "no data", not as size zero. +- If both `biggestSize` and `smallestSize` are present, `biggestSize` wins. +- The keywords compose with `biggestNumber` and substring matchers; ordering in the + list does not affect their precedence (it's hard-coded by the sort layer). + ## Technical Details The `biggestNumber` feature: diff --git a/pkg/stacker/stacker_promote.go b/pkg/stacker/stacker_promote.go index 5100958..fd501a6 100644 --- a/pkg/stacker/stacker_promote.go +++ b/pkg/stacker/stacker_promote.go @@ -43,6 +43,19 @@ func isSequenceKeyword(promote string) bool { return promote == "sequence" || strings.HasPrefix(promote, "sequence:") } +/************************************************************************************************** +** isMagicPromoteKeyword reports whether a promote-list entry is a tie-breaker or sequence keyword +** that should never be treated as a filename substring. These keywords influence ordering through +** a side path (numeric suffix, file size, sequence order, etc.) rather than substring matching. 
+**************************************************************************************************/ +func isMagicPromoteKeyword(promote string) bool { + switch promote { + case "biggestNumber", "biggestSize", "smallestSize": + return true + } + return isSequenceKeyword(promote) +} + /************************************************************************************************** ** extractSequencePattern extracts the pattern from a sequence keyword. ** Examples: @@ -80,7 +93,7 @@ func getPromoteIndex(value string, promoteList []string) int { if emptyStringIndex == -1 { emptyStringIndex = idx } - } else if promote != "biggestNumber" { + } else if !isMagicPromoteKeyword(promote) { hasNonEmptyStrings = true loweredPromote := strings.ToLower(promote) if strings.Contains(loweredValue, loweredPromote) { @@ -100,9 +113,10 @@ func getPromoteIndex(value string, promoteList []string) int { return emptyStringIndex } - // If 'biggestNumber' is in the promote list, assign its index to unmatched files + // Magic keywords (biggestNumber, biggestSize, smallestSize) at the end of the list act as a + // "no match" fallback bucket so the sort routine can still apply a tie-breaker afterwards. 
for idx, promote := range promoteList { - if promote == "biggestNumber" { + if isMagicPromoteKeyword(promote) { return idx } } @@ -168,7 +182,7 @@ func getPromoteIndexWithMode(value string, promoteList []string, matchMode strin if emptyStringIndex == -1 { emptyStringIndex = idx // Only record the first empty string } - } else if !isSequenceKeyword(promote) { + } else if !isMagicPromoteKeyword(promote) { hasNonEmptyStrings = true // Check for match while we're iterating loweredPromote := strings.ToLower(promote) @@ -312,9 +326,13 @@ func getPromoteIndexWithMode(value string, promoteList []string, matchMode strin } } - // If 'biggestNumber' is in the promote list, assign its index to unmatched files + // Size/number tie-breaker keywords act as a fallback bucket for unmatched files so the + // sort routine can still apply a tie-breaker. Sequence keywords are intentionally NOT + // included here — they have their own resolution path above and falling through to + // len(promoteList) is the documented behavior when a sequence pattern is absent. 
for idx, promote := range promoteList { - if promote == "biggestNumber" { + switch promote { + case "biggestNumber", "biggestSize", "smallestSize": return idx } } @@ -414,7 +432,7 @@ func detectPromoteMatchMode(promoteList []string, sampleFilename string) string for _, promote := range promoteList { if isSequenceKeyword(promote) { hasSequenceKeyword = true - } else if promote != "" && promote != "biggestNumber" { + } else if promote != "" && !isMagicPromoteKeyword(promote) { hasNonSequenceItems = true } } @@ -465,7 +483,7 @@ func isSequencePattern(promoteList []string) bool { patternRegex := regexp.MustCompile(`^(.*?)(\d+)(.*?)$`) for _, item := range promoteList { - if item == "biggestNumber" { + if isMagicPromoteKeyword(item) { continue } @@ -612,6 +630,17 @@ func buildCriteriaIdentifier(key string, index int) string { return fmt.Sprintf("%s:%d", key, index) } +/************************************************************************************************** +** assetSize returns the file size in bytes for an asset, or 0 when Immich didn't return exif info. +** Used by the biggestSize / smallestSize promote keywords as a sort tie-breaker. +**************************************************************************************************/ +func assetSize(a utils.TAsset) int64 { + if a.ExifInfo == nil { + return 0 + } + return a.ExifInfo.FileSizeInByte +} + /************************************************************************************************** ** extractLargestNumberSuffix finds a numeric suffix at the end of the base filename (before the ** extension), but ONLY if it appears after a delimiter. If no delimiters are present, always @@ -657,11 +686,16 @@ func extractLargestNumberSuffix(filename string, delimiters []string) int { /************************************************************************************************** ** sortStack sorts a stack of assets based on filename and extension priority. ** The order is: -** 1. 
Regex-based promotion (if criteria has regex with promote_index) -** 2. Promoted filenames (PARENT_FILENAME_PROMOTE, comma-separated, order matters) -** 3. Promoted extensions (PARENT_EXT_PROMOTE, comma-separated, order matters) -** 4. Extension priority (jpeg > jpg > png > others) -** 5. Alphabetical order (case-sensitive) +** 1. Regex-based promotion (if criteria has regex with promote_index) +** 2. Promoted filenames (PARENT_FILENAME_PROMOTE, comma-separated, order matters) +** 3. 'biggestNumber' tie-break within the same PARENT_FILENAME_PROMOTE bucket — runs early +** because it's filename-suffix based (e.g. picks photo~3.jpg over photo~2.jpg) +** 4. Promoted extensions (PARENT_EXT_PROMOTE, comma-separated, order matters) +** 5. Extension priority (jpeg > jpg > png > others) +** 6. 'biggestSize' / 'smallestSize' tie-break — runs late because it's metadata-based, +** so configured extension preferences still win (e.g. a small .jpg beats a huge .cr2 +** when .jpg is first in PARENT_EXT_PROMOTE) +** 7. Alphabetical order (case-sensitive) ** ** @param stack - List of assets to sort ** @param parentFilenamePromote - Comma-separated list of filename substrings to promote @@ -689,6 +723,10 @@ func sortStack(stack []utils.TAsset, parentFilenamePromote string, parentExtProm matchMode = detectPromoteMatchMode(promoteSubstrings, stack[0].OriginalFileName) } + hasBiggestSize := utils.Contains(promoteSubstrings, "biggestSize") + hasSmallestSize := utils.Contains(promoteSubstrings, "smallestSize") + sizeSortActive := hasBiggestSize || hasSmallestSize + sort.SliceStable(stack, func(i, j int) bool { // First, check regex-based promotion iRegexPromoteIdx := getRegexPromoteIndex(stack[i].ID, promoteData, stackCriteria, promotionMaps) @@ -739,6 +777,30 @@ func sortStack(stack []utils.TAsset, parentFilenamePromote string, parentExtProm return rankI > rankJ } + // Metadata-based tie-break: 'biggestSize' / 'smallestSize'. Placed after extension + // preferences so e.g. 
a 28MB .cr2 doesn't beat a 12MB .jpg when the user listed .jpg + // first in PARENT_EXT_PROMOTE. + // + // Bucket partition for transitivity: assets WITH a positive size form the front + // bucket (sorted by size), assets WITHOUT exif data form the back bucket (fall + // through to alphabetical). The "has size" predicate is a property of a single + // asset, not of the pair, so the comparator is transitive. + if sizeSortActive { + iSize := assetSize(stack[i]) + jSize := assetSize(stack[j]) + iHasSize := iSize > 0 + jHasSize := jSize > 0 + if iHasSize != jHasSize { + return iHasSize // assets with exif data come before those without + } + if iHasSize && iSize != jSize { + if hasBiggestSize { + return iSize > jSize // largest first + } + return iSize < jSize // smallest first + } + } + return iOriginalFileNameNoExt < jOriginalFileNameNoExt }) diff --git a/pkg/stacker/stacker_size_promote_test.go b/pkg/stacker/stacker_size_promote_test.go new file mode 100644 index 0000000..d8ab130 --- /dev/null +++ b/pkg/stacker/stacker_size_promote_test.go @@ -0,0 +1,262 @@ +package stacker + +import ( + "encoding/json" + "testing" + + "github.com/majorfi/immich-stack/pkg/utils" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +/************************************************************************************************** +** sizedAssetFactory builds a TAsset with the given filename and size in bytes. Used here rather +** than assetFactory because size-promotion tests need to control ExifInfo.FileSizeInByte directly. 
+**************************************************************************************************/ +func sizedAssetFactory(filename string, sizeInByte int64) utils.TAsset { + return utils.TAsset{ + ID: filename, + OriginalFileName: filename, + ExifInfo: &utils.TExifInfo{FileSizeInByte: sizeInByte}, + } +} + +/************************************************************************************************** +** emptyPromoteData returns an empty thread-safe promote data store + empty promotion maps. +** Used to satisfy sortStack's signature when no regex-promotion criteria are exercised. +**************************************************************************************************/ +func emptyPromoteData() (*safePromoteData, map[int]map[string]int) { + return &safePromoteData{data: make(map[string]map[string]string)}, make(map[int]map[string]int) +} + +/************************************************************************************************** +** TestSortStack_BiggestSize covers the new biggestSize magic keyword: when present in the +** promote list, the asset with the largest exifInfo.fileSizeInByte wins the parent slot. 
+**************************************************************************************************/ +func TestSortStack_BiggestSize(t *testing.T) { + data, maps := emptyPromoteData() + assets := []utils.TAsset{ + sizedAssetFactory("IMG_1234.jpg", 2_500_000), // small original + sizedAssetFactory("IMG_1234_b.jpg", 8_400_000), // large exported + sizedAssetFactory("IMG_1234_a.jpg", 5_100_000), // medium + } + sorted := sortStack(assets, "biggestSize", "", []string{"_", "."}, + utils.DefaultCriteria, data, maps) + assert.Equal(t, "IMG_1234_b.jpg", sorted[0].OriginalFileName, + "largest file should be promoted as parent") + assert.Equal(t, "IMG_1234_a.jpg", sorted[1].OriginalFileName) + assert.Equal(t, "IMG_1234.jpg", sorted[2].OriginalFileName) +} + +/************************************************************************************************** +** TestSortStack_SmallestSize is the symmetric counterpart — useful for stacks where the +** thumbnail/reduced variant should win (less common but supported for symmetry). +**************************************************************************************************/ +func TestSortStack_SmallestSize(t *testing.T) { + data, maps := emptyPromoteData() + assets := []utils.TAsset{ + sizedAssetFactory("IMG_a.jpg", 8_000_000), + sizedAssetFactory("IMG_b.jpg", 1_200_000), + sizedAssetFactory("IMG_c.jpg", 4_500_000), + } + sorted := sortStack(assets, "smallestSize", "", []string{"_", "."}, + utils.DefaultCriteria, data, maps) + assert.Equal(t, "IMG_b.jpg", sorted[0].OriginalFileName, + "smallest file should be promoted as parent") +} + +/************************************************************************************************** +** TestSortStack_BiggestSize_WithSubstringMatch: substring promotes still take priority over the +** size tie-breaker. This is the realistic case where a user wants edits to win, with size only +** breaking ties within the matched (or unmatched) buckets. 
+**************************************************************************************************/ +func TestSortStack_BiggestSize_WithSubstringMatch(t *testing.T) { + data, maps := emptyPromoteData() + assets := []utils.TAsset{ + sizedAssetFactory("IMG_1234.jpg", 9_000_000), // unmatched, huge + sizedAssetFactory("IMG_1234_edited.jpg", 3_000_000), // matches "_edited", small + sizedAssetFactory("IMG_1234_edited2.jpg", 5_500_000), // matches "_edited", medium + } + sorted := sortStack(assets, "_edited,biggestSize", "", []string{"_", "."}, + utils.DefaultCriteria, data, maps) + // "_edited" match wins over the larger unmatched original, then size tie-breaks + // inside the matched bucket. + assert.Equal(t, "IMG_1234_edited2.jpg", sorted[0].OriginalFileName, + "largest _edited file should be parent") + assert.Equal(t, "IMG_1234_edited.jpg", sorted[1].OriginalFileName) + assert.Equal(t, "IMG_1234.jpg", sorted[2].OriginalFileName, + "unmatched original is last despite being largest") +} + +/************************************************************************************************** +** TestSortStack_BiggestSize_MissingExif verifies the "no exif" fall-through: when neither asset +** has exif data we must NOT treat the pair as a 0-vs-0 size tie that ends the comparison early; +** the alphabetical fallback must still kick in. 
+**************************************************************************************************/ +func TestSortStack_BiggestSize_MissingExif(t *testing.T) { + data, maps := emptyPromoteData() + assets := []utils.TAsset{ + {ID: "b", OriginalFileName: "b.jpg"}, + {ID: "a", OriginalFileName: "a.jpg"}, + } + sorted := sortStack(assets, "biggestSize", "", []string{"_", "."}, + utils.DefaultCriteria, data, maps) + assert.Equal(t, "a.jpg", sorted[0].OriginalFileName, + "with no exif data, fallback alphabetical sort must still apply") +} + +/************************************************************************************************** +** TestSortStack_BiggestSize_PartialExif: assets with a positive size always rank ahead of +** assets without exif data. The "has size" predicate is per-asset, not per-pair, which keeps +** the comparator transitive even though the alphabetical fall-through is non-monotonic with +** respect to size. +**************************************************************************************************/ +func TestSortStack_BiggestSize_PartialExif(t *testing.T) { + data, maps := emptyPromoteData() + assets := []utils.TAsset{ + {ID: "z", OriginalFileName: "z.jpg"}, // no exif + sizedAssetFactory("a.jpg", 5_000_000), + } + sorted := sortStack(assets, "biggestSize", "", []string{"_", "."}, + utils.DefaultCriteria, data, maps) + assert.Equal(t, "a.jpg", sorted[0].OriginalFileName, + "asset with exif data ranks ahead of asset without, even if alphabetically later") + assert.Equal(t, "z.jpg", sorted[1].OriginalFileName, + "missing-exif asset goes to the back bucket") +} + +/************************************************************************************************** +** TestSortStack_BiggestSize_TransitivityWithMissingExif is the regression test for the bug +** Copilot caught on PR #64. 
The original implementation applied the size tie-break pairwise +** (only when both sides had positive size, falling through to alphabetical otherwise), which +** produced a non-transitive comparator with 3+ assets: +** +** - z.jpg (10MB) vs a.jpg (5MB) → both have size → z first by size +** - z.jpg (10MB) vs m.jpg (none) → alphabetical fall-through → m first +** - a.jpg (5MB) vs m.jpg (none) → alphabetical fall-through → a first +** +** => z