RI-8159: [BE] vector set similarity #5867

Merged: dantovska merged 13 commits into main from be/feature/RI-8159/vector-set-similarity on May 12, 2026
Conversation

dantovska (Contributor) commented May 8, 2026

What

Add two backend endpoints for the vector-set similarity-search feature:

  • POST /vector-set/similarity-search — runs VSIM against the supplied key and returns the matches (name + score + optional attributes).
  • POST /vector-set/similarity-search/preview — returns the same VSIM invocation as a CLI-friendly string so the FE can show users the command they're about to run.

Both endpoints accept a single SimilaritySearchDto and pick the query mode (ELE / VALUES / FP32) at runtime — exactly one of elementName, vectorValues, or vectorFp32 must be supplied. WITHSCORES and WITHATTRIBS are always appended so the response shape is stable; COUNT / FILTER are forwarded only when present.
The command builder, preview formatter, reply parser, and shared VsimTokenWriter strategy live in vector-set.utils.ts so the executable command and its preview can't drift on clause order or rendering. VSIM is also registered as a built-in ioredis command.
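The clause rules above can be sketched roughly as follows. This is a minimal illustration, not the code in vector-set.utils.ts: the `SimilaritySearchInput` interface, the `buildVsimArgs` name, and the exact clause order are assumptions made for the example, and `Uint8Array` stands in for whatever binary type the real DTO uses for `vectorFp32`.

```typescript
// Hypothetical input shape; field names follow the PR description.
interface SimilaritySearchInput {
  keyName: string;
  elementName?: string;
  vectorValues?: number[];
  vectorFp32?: Uint8Array;
  count?: number;
  filter?: string;
}

type VsimArg = string | Uint8Array;

// Builds the arguments that follow the VSIM command name.
function buildVsimArgs(input: SimilaritySearchInput): VsimArg[] {
  const { keyName, elementName, vectorValues, vectorFp32, count, filter } = input;

  // Exactly one query mode must be supplied.
  const supplied = [elementName, vectorValues, vectorFp32].filter(
    (value) => value !== undefined,
  ).length;
  if (supplied !== 1) {
    throw new Error(`Expected exactly one query mode, got ${supplied}`);
  }

  const args: VsimArg[] = [keyName];
  if (elementName !== undefined) {
    args.push('ELE', elementName);
  } else if (vectorValues !== undefined) {
    // VALUES is followed by the dimension count, then each component.
    args.push('VALUES', String(vectorValues.length), ...vectorValues.map(String));
  } else if (vectorFp32 !== undefined) {
    args.push('FP32', vectorFp32);
  }

  // Always present so the reply shape stays stable.
  args.push('WITHSCORES', 'WITHATTRIBS');

  // Forwarded only when supplied.
  if (count !== undefined) args.push('COUNT', String(count));
  if (filter !== undefined) args.push('FILTER', filter);

  return args;
}
```

Sharing one builder like this between the search and preview endpoints is what keeps the two from disagreeing on clause order.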

Testing


Note

Medium Risk
Adds new API endpoints that execute Redis VSIM and parse its flat reply format, so correctness depends on command construction and reply parsing across RESP2/RESP3 types. Risk is mitigated by extensive unit tests but touches runtime Redis command execution.
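For illustration, parsing the flat reply could look roughly like the sketch below. It assumes the reply arrives as repeating `[name, score, attributes]` triples, that the score may be a string or a number depending on the protocol version, and that missing attributes come back as `null`. A RESP3 connection may return a map instead, which this sketch does not handle. The `SimilarityMatch` and `parseFlatVsimReply` names are hypothetical.

```typescript
// Hypothetical result shape: name + score + optional attributes.
interface SimilarityMatch {
  name: string;
  score: number;
  attributes: string | null;
}

type FlatItem = string | number | null;

// Parses a flat [name, score, attributes, name, score, attributes, ...] reply.
function parseFlatVsimReply(reply: FlatItem[]): SimilarityMatch[] {
  if (reply.length % 3 !== 0) {
    throw new Error(`Expected a multiple of 3 items, got ${reply.length}`);
  }

  const matches: SimilarityMatch[] = [];
  for (let i = 0; i < reply.length; i += 3) {
    const name = reply[i];
    const score = reply[i + 1];
    const attributes = reply[i + 2];

    matches.push({
      name: String(name),
      // The score may arrive as a string or as a number.
      score: typeof score === 'number' ? score : Number(score),
      // Elements without attributes are assumed to yield null.
      attributes: attributes === null ? null : String(attributes),
    });
  }
  return matches;
}
```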

Overview
Adds vector-set similarity search support via POST /vector-set/similarity-search (executes VSIM with mandatory WITHSCORES WITHATTRIBS and optional COUNT/FILTER) and POST /vector-set/similarity-search/preview (returns a CLI-safe command string preview).

Introduces new DTOs/responses for search results and preview, registers VSIM as a built-in ioredis command, and centralizes VSIM command building/preview formatting/reply parsing in vector-set.utils.ts with tests covering query-mode validation (ELE/VALUES/FP32), quoting/escaping, and score/attributes decoding.
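As a rough idea of what "CLI-safe" quoting might involve, the sketch below wraps any argument that contains characters outside a conservative safe set in double quotes and escapes embedded backslashes and double quotes. The `quoteCliArg` and `formatPreview` names and the safe-character set are assumptions for this example; the actual rules in vector-set.utils.ts may differ, and control characters such as newlines are not handled here.

```typescript
// Quotes one argument for display in a CLI-style command string.
function quoteCliArg(arg: string): string {
  // Plain tokens (letters, digits and a few punctuation marks) pass through.
  if (arg !== '' && /^[A-Za-z0-9_.:+\-]+$/.test(arg)) {
    return arg;
  }
  // Escape backslashes first, then double quotes, then wrap in quotes.
  const escaped = arg.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `"${escaped}"`;
}

// Joins the command name and its quoted arguments into one preview string.
function formatPreview(command: string, args: string[]): string {
  return [command, ...args.map(quoteCliArg)].join(' ');
}
```

For example, a key named `my key` would be rendered as `"my key"`, while tokens such as `ELE` or `-0.5` are left unquoted.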

Reviewed by Cursor Bugbot for commit da7c971.

@dantovska dantovska self-assigned this May 8, 2026
@dantovska dantovska requested a review from a team as a code owner May 8, 2026 04:13
@dantovska dantovska changed the title Be/feature/ri 8159/vector set similarity RI-8159: [BE] vector set similarity May 8, 2026
jit-ci Bot commented May 8, 2026

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR


Security scan by Jit

Comment thread redisinsight/api/src/modules/browser/vector-set/vector-set.utils.ts Outdated
github-actions Bot commented May 8, 2026

Code Coverage - Integration Tests

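| Status | Category | Percentage | Covered / Total |
|--------|----------|------------|-----------------|
| 🟡 | Statements | 79.49% | 17484/21993 |
| 🟡 | Branches | 61.96% | 7993/12900 |
| 🟡 | Functions | 67.97% | 2418/3557 |
| 🟡 | Lines | 79.06% | 16441/20795 |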

github-actions Bot commented May 8, 2026

Code Coverage - Backend unit tests

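| St. | Category | Percentage | Covered / Total |
|-----|----------|------------|-----------------|
| 🟢 | Statements | 92.63% | 15568/16806 |
| 🟡 | Branches | 74.71% | 4878/6529 |
| 🟢 | Functions | 86.72% | 2423/2794 |
| 🟢 | Lines | 92.47% | 14879/16091 |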

Test suite run success

3413 tests passing in 306 suites.

Report generated by 🧪jest coverage report action from da7c971

count: faker.number.int({ min: 1, max: 100 }),
}));

export const searchVectorSetByValuesDtoFactory =
Member

I don't think we need to define 3 separate factories here - it's really a single data type with variations on its data (whether it carries vectorValues, vectorFp32, or something else).

I know it sounds a bit more complex to define a single factory and toggle which fields get prefilled internally (via transient params), but the entity itself is a single one - a vector set - and if it changes at some point, we'd only need to update one factory instead of keeping all the variants in sync.

Here's a quick sketch of what I mean:

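// Assumed imports: the sketch matches the fishery and @faker-js/faker APIs.
import { Factory } from 'fishery';
import { faker } from '@faker-js/faker';

// Assumed variant union; 'none' leaves both vector payloads unset.
type VectorVariant = 'none' | 'values' | 'fp32';

interface VectorSetTransientParams {
  variant: VectorVariant;
}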

export const vectorSetFactory = Factory.define<VectorSet, VectorSetTransientParams>(
  ({ transientParams }) => {
    const { variant = 'none' } = transientParams;

    return {
      keyName: Buffer.from(`vset:${faker.string.alphanumeric(6)}`),
      elementName: Buffer.from(faker.string.alphanumeric(8)),
      vectorValues: variant === 'values' ? [/* fill defaults */] : undefined,
      vectorFp32:   variant === 'fp32'   ? Buffer.from(/* ... */) : undefined,
      count: faker.number.int({ min: 1, max: 100 }),
    };
  },
);

// Usage:
vectorSetFactory.build({}, { transient: { variant: 'values' } });  // values variant
vectorSetFactory.build({}, { transient: { variant: 'fp32' } });    // fp32 variant

And if a specific test needs custom values, regular params still take precedence over the transient defaults:

vectorSetFactory.build(
  { vectorFp32: Buffer.from([/* ... */]) },
  { transient: { variant: 'fp32' } },
);

Docs for reference:

Happy to pair on it if it'd be easier - totally fine to keep things as-is too if you'd rather not expand scope on this PR!

Contributor Author

thanks! not at all, seems reasonable and it is in scope anyway

Comment thread redisinsight/api/src/modules/browser/vector-set/vector-set.utils.ts
@dantovska dantovska requested a review from valkirilov May 11, 2026 11:14
@dantovska dantovska force-pushed the be/feature/RI-8159/vector-set-similarity branch from 88faba9 to fc43cca Compare May 12, 2026 06:56
valkirilov
valkirilov previously approved these changes May 12, 2026
| `count` | number | Maximum number of results (omitted from preview when undefined) |
| `filter` | string | Optional filter expression evaluated against element attributes |

At most one of `elementName`, `vectorValues`, `vectorFp32` may be present. Supplying more than one returns `400`.
Member

Note: As we discussed offline, you can add examples for the various ways of using this API endpoint, but it can happen in a separate PR :)

Contributor Author

good note. I'll tackle this in a separate PR

Contributor Author

thanks!

description:
'Build a human-readable preview of the VSIM command that the similarity-search endpoint would execute for the supplied DTO. ' +
'Reuses the same internal command builder as the search endpoint so the preview cannot drift from what is actually executed. ' +
'Returns an empty `preview` string when no query payload (`elementName` / `vectorValues` / `vectorFp32`) is supplied.',
Member

nit: Why don't we enforce the same validation rules as for the other endpoints - to have at least one correct param passed, since the other validation (when you pass more than a single param) is already in place?

Comment thread redisinsight/api/src/modules/browser/vector-set/vector-set.utils.ts
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 4d983b1.

Comment thread redisinsight/api/src/modules/browser/vector-set/__tests__/vector-set.factory.ts Outdated
@dantovska dantovska force-pushed the be/feature/RI-8159/vector-set-similarity branch from 4d983b1 to da7c971 Compare May 12, 2026 13:34
@dantovska dantovska merged commit aa51a31 into main May 12, 2026
28 checks passed
@dantovska dantovska deleted the be/feature/RI-8159/vector-set-similarity branch May 12, 2026 14:04
2 participants