From 68ee9769cb82c5852b04bd1a593fe1824cc18326 Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Tue, 5 May 2026 16:21:52 +0300 Subject: [PATCH 01/13] RDSC-4603 Update RDI public documentation --- .../reference/config-yaml-reference.md | 670 ++++++++++++------ 1 file changed, 454 insertions(+), 216 deletions(-) diff --git a/content/integrate/redis-data-integration/reference/config-yaml-reference.md b/content/integrate/redis-data-integration/reference/config-yaml-reference.md index 10643e8156..452e9cc55a 100644 --- a/content/integrate/redis-data-integration/reference/config-yaml-reference.md +++ b/content/integrate/redis-data-integration/reference/config-yaml-reference.md @@ -7,133 +7,135 @@ alwaysopen: false categories: ["redis-di"] aliases: --- +# Redis Data Integration Configuration File + +Configuration file for Redis Data Integration (RDI) source collectors and target connections. -Configuration file for Redis Data Integration (RDI) source collectors and target connections **Properties** -| Name | Type | Description | Required | -| ----------------------------------------------------------------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -| [**sources**](#sources)
(Source collectors) | `object` | Defines source collectors and their configurations. Each key represents a unique source identifier, and its value contains specific configuration for that collector
| | -| [**processors**](#processors)
(Data processing configuration) | `object`, `null` | Configuration settings that control how data is processed, including batch sizes, error handling, and performance tuning
| | -| [**targets**](#targets)
(Target connections) | `object` | Configuration for target Redis databases where processed data will be written
| | -| [**secret\-providers**](#secret-providers)
(Secret providers) | `object` | Configuration for secret management providers
| | -| [**metadata**](#metadata)
(Pipeline metadata) | `object` | Pipeline metadata
| | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|[**sources**](#sources)
(Source collectors)|`object`|Source collectors that capture changes from upstream databases. Each key is a unique source identifier; the value configures one collector.
|| +|[**targets**](#targets)
(Target connections)|`object`|Target Redis databases where processed records are written. Each key is a target identifier; the value configures the connection.
|| +|[**processors**](#processors)
(Data processing configuration)|`object`, `null`|Settings that control how data is processed, including batch sizes, error handling, and performance tuning.
|| +|[**secret\-providers**](#secret-providers)
(Secret providers)|`object`|External secret providers used to resolve `${...}` references in the configuration.
|| +|[**metadata**](#metadata)
(Pipeline metadata)|`object`|Optional metadata describing this pipeline, such as a display name and description.
|| **Additional Properties:** not allowed - ## sources: Source collectors -Defines source collectors and their configurations. Each key represents a unique source identifier, and its value contains specific configuration for that collector +Source collectors that capture changes from upstream databases. Each key is a unique source identifier; the value configures one collector. + **Properties** (key: `.*`) -| Name | Type | Description | Required | -| ------------------------------------------------------------- | ---------- | ----------------------------------------------------------------------------------------------- | -------- | -| **connection** | | | yes | -| **name**
(Source name) | `string` | User-friendly source name
Maximal Length: `100`
| no | -| **type**
(Collector type) | `string` | Type of the source collector.
Default: `"cdc"`
Enum: `"cdc"`, `"flink"`, `"riotx"`
| yes | -| **active**
(Collector enabled) | `boolean` | Flag to enable or disable the source collector
Default: `true`
| no | -| [**logging**](#sourceslogging)
(Logging configuration) | `object` | Logging configuration for the source collector
| no | -| [**tables**](#sourcestables)
(Tables to capture) | `object` | Defines which tables to capture and how to handle their data
| no | -| [**schemas**](#sourcesschemas)
(Schema names) | `string[]` | Schema names to capture from the source database (schema.include.list)
| no | -| [**databases**](#sourcesdatabases)
(Database names) | `string[]` | Database names to capture from the source database (database.include.list)
| no | -| [**advanced**](#sourcesadvanced)
(Advanced configuration) | `object` | Advanced configuration options for fine-tuning the collector
| no | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**connection**|||yes| +|**name**
(Source name)|`string`|Human-readable name for the source collector. Maximum 100 characters.
Maximal Length: `100`
|no| +|**type**
(Collector type)|`string`|Type of the source collector. Use `cdc` (default) for change data capture using [Debezium](https://debezium.io/). Use `riotx` for Snowflake CDC using [RIOT-X](https://redis.github.io/riotx/).
Default: `"cdc"`
Enum: `"cdc"`, `"riotx"`
|yes| +|**active**
(Collector enabled)|`boolean`|When `true`, the collector runs; when `false`, the collector is disabled and produces no events.
Default: `true`
|no| +|[**logging**](#sourceslogging)
(Logging configuration)|`object`|Logging settings for this source collector.
|no| +|[**tables**](#sourcestables)
(Tables to capture)|`object`|Tables to capture from the source database, keyed by table name. The value configures column selection and key handling for that table.
|no| +|[**schemas**](#sourcesschemas)
(Schema names)|`string[]`|Schema names to capture from the source database. Maps to the underlying connector's `schema.include.list`.
|no| +|[**databases**](#sourcesdatabases)
(Database names)|`string[]`|Database names to capture from the source database. Maps to the underlying connector's `database.include.list`.
|no| +|[**advanced**](#sourcesadvanced)
(Advanced configuration)|`object`|Advanced configuration that overrides the underlying engine's defaults. Only required for non-standard tuning.
|no| - + ### sources\.logging: Logging configuration -Logging configuration for the source collector +Logging settings for this source collector. + **Properties** -| Name | Type | Description | Required | -| ----------------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------- | -------- | -| **level**
(Logging level) | `string` | Logging level for the source collector
Default: `"info"`
Enum: `"trace"`, `"debug"`, `"info"`, `"warn"`, `"error"`
| | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**level**
(Logging level)|`string`|Log verbosity for the source collector.
Default: `"info"`
Enum: `"trace"`, `"debug"`, `"info"`, `"warn"`, `"error"`
|| **Additional Properties:** not allowed **Example** ```yaml level: info + ``` - ### sources\.tables: Tables to capture -Defines which tables to capture and how to handle their data +Tables to capture from the source database, keyed by table name. The value configures column selection and key handling for that table. + **Additional Properties** -| Name | Type | Description | Required | -| --------------------------------------------------------------- | ---------------- | ----------- | -------- | -| [**Additional Properties**](#sourcestablesadditionalproperties) | `object`, `null` | | | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|[**Additional Properties**](#sourcestablesadditionalproperties)|`object`, `null`||| **Minimal Properties:** 1 - #### sources\.tables\.additionalProperties: object,null **Properties** -| Name | Type | Description | Required | -| ------------------------------------------------------------------------------------------------- | ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -| **snapshot_sql** | `string` | Custom SQL statement to use for the initial data snapshot, allowing fine-grained control over what data is captured
| | -| [**columns**](#sourcestablesadditionalpropertiescolumns)
(Columns to capture) | `string[]` | List of specific columns to capture for changes. If not specified, all columns will be captured. Note: This property cannot be used for MongoDB connections
| | -| [**exclude_columns**](#sourcestablesadditionalpropertiesexclude_columns)
(Columns to exclude) | `string[]` | List of specific columns to exclude from capture. If not specified, no columns will be excluded. Note: This property can only be used for MongoDB connections
| | -| [**keys**](#sourcestablesadditionalpropertieskeys)
(Message keys) | `string[]` | Optional list of columns to use as a composite unique identifier. Only required when the table lacks a primary key or unique constraint. Must form a unique combination of fields
| | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**snapshot\_sql**|`string`|Custom SQL statement used during the initial snapshot, giving fine-grained control over the data captured.
|| +|[**columns**](#sourcestablesadditionalpropertiescolumns)
(Columns to capture)|`string[]`|Specific columns to capture. When omitted, all columns are captured. Not supported for MongoDB connections.
|| +|[**exclude\_columns**](#sourcestablesadditionalpropertiesexclude_columns)
(Columns to exclude)|`string[]`|Specific columns to exclude from capture. When omitted, no columns are excluded. Only supported for MongoDB connections.
|| +|[**keys**](#sourcestablesadditionalpropertieskeys)
(Message keys)|`string[]`|Columns that together form a unique identifier for each row. Only required when the table lacks a primary key or unique constraint.
|| **Additional Properties:** not allowed - ##### sources\.tables\.additionalProperties\.columns\[\]: Columns to capture -List of specific columns to capture for changes. If not specified, all columns will be captured. Note: This property cannot be used for MongoDB connections +Specific columns to capture. When omitted, all columns are captured. Not supported for MongoDB connections. + +##### sources\.tables\.additionalProperties\.exclude\_columns\[\]: Columns to exclude -##### sources\.tables\.additionalProperties\.exclude_columns\[\]: Columns to exclude +Specific columns to exclude from capture. When omitted, no columns are excluded. Only supported for MongoDB connections. -List of specific columns to exclude from capture. If not specified, no columns will be excluded. Note: This property can only be used for MongoDB connections - ##### sources\.tables\.additionalProperties\.keys\[\]: Message keys -Optional list of columns to use as a composite unique identifier. Only required when the table lacks a primary key or unique constraint. Must form a unique combination of fields +Columns that together form a unique identifier for each row. Only required when the table lacks a primary key or unique constraint. - + ### sources\.schemas\[\]: Schema names -Schema names to capture from the source database (schema.include.list) +Schema names to capture from the source database. Maps to the underlying connector's `schema.include.list`. - + ### sources\.databases\[\]: Database names -Database names to capture from the source database (database.include.list) +Database names to capture from the source database. Maps to the underlying connector's `database.include.list`. - + ### sources\.advanced: Advanced configuration -Advanced configuration options for fine-tuning the collector +Advanced configuration that overrides the underlying engine's defaults. Only required for non-standard tuning. + **Properties** -| Name | Type | Description | Required | -| -------------------------------------------------------------------------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -| [**sink**](#sourcesadvancedsink)
(RDI Collector stream writer configuration) | `object` | Advanced configuration properties for RDI Collector stream writer connection and behaviour. When using collector type 'cdc', see the full list of properties at - https://debezium.io/documentation/reference/stable/operations/debezium-server.html#_redis_stream . When using a property from that list, remove the `debezium.sink.` prefix. When using collector type 'flink', refer to the Flink connector documentation for the full list of supported properties.
| | -| [**source**](#sourcesadvancedsource)
(Advanced source settings) | `object` | Advanced configuration properties for the source database connection and CDC behavior
| | -| [**quarkus**](#sourcesadvancedquarkus)
(Quarkus runtime settings) | `object` | Advanced configuration properties for the Quarkus runtime environment
| | -| [**flink**](#sourcesadvancedflink)
(Advanced Flink settings) | `object` | Advanced configuration properties for Flink
| | -| [**resources**](#sourcesadvancedresources)
(Collector resource settings) | `object` | Resource settings for the collector. When provided, the same values are used consistently across the collector runtime configuration
| | -| [**riotx**](#sourcesadvancedriotx)
(Advanced RIOTX settings) | `object` | Advanced configuration properties for RIOTX Snowflake collector
| | -| **java_options**
(Advanced Java options) | `string` | These Java options will be passed to the command line command when launching the source collector
| | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|[**sink**](#sourcesadvancedsink)
(RDI Collector stream writer configuration)|`object`|Advanced configuration properties for the RDI Collector stream writer connection and behaviour. **Only applies to the `cdc` collector type.** See the full list of properties at [Debezium Server — Redis Stream sink](https://debezium.io/documentation/reference/stable/operations/debezium-server.html#_redis_stream). When using a property from that page, omit the `debezium.sink.` prefix.
|| +|[**source**](#sourcesadvancedsource)
(Advanced source settings)|`object`|Advanced configuration properties for the source database connection and CDC behavior. **Only applies to the `cdc` collector type.** Available properties depend on the source database type — refer to the relevant Debezium connector documentation: [MySQL](https://debezium.io/documentation/reference/stable/connectors/mysql.html), [MariaDB](https://debezium.io/documentation/reference/stable/connectors/mariadb.html), [PostgreSQL](https://debezium.io/documentation/reference/stable/connectors/postgresql.html), [Oracle](https://debezium.io/documentation/reference/stable/connectors/oracle.html), [SQL Server](https://debezium.io/documentation/reference/stable/connectors/sqlserver.html), [Db2](https://debezium.io/documentation/reference/stable/connectors/db2.html), [MongoDB](https://debezium.io/documentation/reference/stable/connectors/mongodb.html). When using a property from those pages, omit the `debezium.source.` prefix.
|| +|[**quarkus**](#sourcesadvancedquarkus)
(Quarkus runtime settings)|`object`|Advanced configuration properties for the Quarkus runtime that hosts Debezium Server. **Only applies to the `cdc` collector type.** See the [Debezium Server documentation](https://debezium.io/documentation/reference/stable/operations/debezium-server.html) for runtime configuration options. When using a property from that page, omit the `quarkus.` prefix.
|| +|[**resources**](#sourcesadvancedresources)
(Collector resource settings)|`object`|Compute resources allocated to the collector. **Only applies to the `cdc` collector type.**
|| +|[**riotx**](#sourcesadvancedriotx)
(Advanced RIOT\-X settings)|`object`|Advanced configuration properties for the RIOT-X Snowflake collector. **Only applies to the `riotx` collector type.**
|| +|**java\_options**
(Advanced Java options)|`string`|Java options passed on the command line when launching the source collector. **Only applies to the `cdc` collector type.**
|| **Additional Properties:** not allowed **Minimal Properties:** 1 @@ -143,100 +145,87 @@ Advanced configuration options for fine-tuning the collector sink: {} source: {} quarkus: {} -flink: {} resources: {} riotx: poll: 30s snapshot: INITIAL - streamPrefix: "data:" + streamPrefix: 'data:' clearOffset: false count: 0 + ``` - #### sources\.advanced\.sink: RDI Collector stream writer configuration -Advanced configuration properties for RDI Collector stream writer connection and behaviour. When using collector type 'cdc', see the full list of properties at - https://debezium.io/documentation/reference/stable/operations/debezium-server.html#_redis_stream . When using a property from that list, remove the `debezium.sink.` prefix. When using collector type 'flink', refer to the Flink connector documentation for the full list of supported properties. +Advanced configuration properties for the RDI Collector stream writer connection and behaviour. **Only applies to the `cdc` collector type.** See the full list of properties at [Debezium Server — Redis Stream sink](https://debezium.io/documentation/reference/stable/operations/debezium-server.html#_redis_stream). When using a property from that page, omit the `debezium.sink.` prefix. + **Additional Properties** -| Name | Type | Description | Required | -| ------------------------- | ----------------------------- | ----------- | -------- | -| **Additional Properties** | `string`, `number`, `boolean` | | | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**Additional Properties**|`string`, `number`, `boolean`||| **Minimal Properties:** 1 - #### sources\.advanced\.source: Advanced source settings -Advanced configuration properties for the source database connection and CDC behavior +Advanced configuration properties for the source database connection and CDC behavior. **Only applies to the `cdc` collector type.** Available properties depend on the source database type — refer to the relevant Debezium connector documentation: [MySQL](https://debezium.io/documentation/reference/stable/connectors/mysql.html), [MariaDB](https://debezium.io/documentation/reference/stable/connectors/mariadb.html), [PostgreSQL](https://debezium.io/documentation/reference/stable/connectors/postgresql.html), [Oracle](https://debezium.io/documentation/reference/stable/connectors/oracle.html), [SQL Server](https://debezium.io/documentation/reference/stable/connectors/sqlserver.html), [Db2](https://debezium.io/documentation/reference/stable/connectors/db2.html), [MongoDB](https://debezium.io/documentation/reference/stable/connectors/mongodb.html). When using a property from those pages, omit the `debezium.source.` prefix. + **Additional Properties** -| Name | Type | Description | Required | -| ------------------------- | ----------------------------- | ----------- | -------- | -| **Additional Properties** | `string`, `number`, `boolean` | | | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**Additional Properties**|`string`, `number`, `boolean`||| **Minimal Properties:** 1 - #### sources\.advanced\.quarkus: Quarkus runtime settings -Advanced configuration properties for the Quarkus runtime environment - -**Additional Properties** +Advanced configuration properties for the Quarkus runtime that hosts Debezium Server. **Only applies to the `cdc` collector type.** See the [Debezium Server documentation](https://debezium.io/documentation/reference/stable/operations/debezium-server.html) for runtime configuration options. 
When using a property from that page, omit the `quarkus.` prefix. -| Name | Type | Description | Required | -| ------------------------- | ----------------------------- | ----------- | -------- | -| **Additional Properties** | `string`, `number`, `boolean` | | | - -**Minimal Properties:** 1 - - -#### sources\.advanced\.flink: Advanced Flink settings - -Advanced configuration properties for Flink **Additional Properties** -| Name | Type | Description | Required | -| ------------------------- | ----------------------------- | ----------- | -------- | -| **Additional Properties** | `string`, `number`, `boolean` | | | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**Additional Properties**|`string`, `number`, `boolean`||| **Minimal Properties:** 1 - #### sources\.advanced\.resources: Collector resource settings -Resource settings for the collector. When provided, the same values are used consistently across the collector runtime configuration +Compute resources allocated to the collector. **Only applies to the `cdc` collector type.** + **Properties** -| Name | Type | Description | Required | -| -------------------------------------- | -------- | -------------------------------------------------------------------- | -------- | -| **cpu**
(CPU resource value) | `string` | CPU value for the collector (for example, '1' or '500m')
| | -| **memory**
(Memory resource value) | `string` | Memory value for the collector (for example, '1024Mi' or '2Gi')
| | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**cpu**
(CPU resource value)|`string`|CPU request for the collector container, for example `1` or `500m`.
|| +|**memory**
(Memory resource value)|`string`|Memory request for the collector container, for example `1024Mi` or `2Gi`.
|| **Additional Properties:** not allowed **Minimal Properties:** 1 +#### sources\.advanced\.riotx: Advanced RIOT\-X settings -#### sources\.advanced\.riotx: Advanced RIOTX settings +Advanced configuration properties for the RIOT-X Snowflake collector. **Only applies to the `riotx` collector type.** -Advanced configuration properties for RIOTX Snowflake collector **Properties** -| Name | Type | Description | Required | -| ------------------------------------------------------------------- | ---------- | -------------------------------------------------------------------------------------------------- | -------- | -| **poll**
(Polling interval) | `string` | Polling interval for stream changes (e.g., '30s', 'PT30S')
Default: `"30s"`
| | -| **snapshot**
(Snapshot mode) | `string` | Snapshot mode for initial data load
Default: `"INITIAL"`
Enum: `"INITIAL"`, `"NEVER"`
| | -| **streamPrefix**
(Redis stream key prefix) | `string` | Prefix for Redis stream keys
Default: `"data:"`
| | -| **streamLimit**
(Maximum stream length) | `integer` | Maximum number of entries in the Redis stream
Minimum: `1`
| | -| [**keyColumns**](#sourcesadvancedriotxkeycolumns)
(Key columns) | `string[]` | List of columns to use as message keys
| | -| **clearOffset**
(Clear existing offset) | `boolean` | Whether to clear existing offset on start
Default: `false`
| | -| **count**
(Record count limit) | `integer` | Limit number of records to process (0 = unlimited)
Default: `0`
Minimum: `0`
| | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**poll**
(Polling interval)|`string`|Interval between polls for new stream changes, for example `30s` or `PT30S`.
Default: `"30s"`
|| +|**snapshot**
(Snapshot mode)|`string`|Initial-load behavior. `INITIAL` performs a one-time snapshot before streaming; `NEVER` skips the snapshot.
Default: `"INITIAL"`
Enum: `"INITIAL"`, `"NEVER"`
|| +|**streamPrefix**
(Redis stream key prefix)|`string`|Prefix used when constructing Redis stream keys, for example `data:`.
Default: `"data:"`
|| +|**streamLimit**
(Maximum stream length)|`integer`|Maximum number of entries kept in each Redis stream before older entries are trimmed.
Minimum: `1`
|| +|[**keyColumns**](#sourcesadvancedriotxkeycolumns)
(Key columns)|`string[]`|Columns whose values form the unique message key for each row.
|| +|**clearOffset**
(Clear existing offset)|`boolean`|When `true`, the stored offset is cleared on startup, forcing a fresh read.
Default: `false`
|| +|**count**
(Record count limit)|`integer`|Maximum number of records to process. Set to `0` for unlimited.
Default: `0`
Minimum: `0`
|| **Additional Properties:** not allowed **Minimal Properties:** 1 @@ -245,219 +234,466 @@ Advanced configuration properties for RIOTX Snowflake collector ```yaml poll: 30s snapshot: INITIAL -streamPrefix: "data:" +streamPrefix: 'data:' clearOffset: false count: 0 + ``` - ##### sources\.advanced\.riotx\.keyColumns\[\]: Key columns -List of columns to use as message keys +Columns whose values form the unique message key for each row. - + +## targets: Target connections + +Target Redis databases where processed records are written. Each key is a target identifier; the value configures the connection. + + +**Properties** (key: `.*`) + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|[**connection**](#targetsconnection)
(Database connection)|`object`|Connection configuration for a Redis database.
|yes| +|**name**
(Target name)|`string`|Human-readable name for the target connection. Maximum 100 characters.
Maximal Length: `100`
|no|
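+
+**Example**
+
+A minimal sketch of a single target entry. The identifier `target` and the connection values are placeholders, and `${TARGET_DB_PASSWORD}` assumes a secret with that name can be resolved; the `connection` fields are described in the next subsection:
+
+```yaml
+target:
+  connection:
+    type: redis                       # always "redis" (constant value)
+    host: localhost                   # placeholder hostname
+    port: 12000                       # placeholder port
+    password: ${TARGET_DB_PASSWORD}   # placeholder secret reference
+
+```
+
+<a name="targetsconnection"></a>
+
+### targets\.connection: Database connection
+
+Connection configuration for a Redis database.
+
+
+**Properties**
+
+|Name|Type|Description|Required|
+|----|----|-----------|--------|
+|**type**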
(Database type)||Database type identifier. Always `redis` for this connection.
Constant Value: `"redis"`
|yes| +|**host**
(Database host)|`string`|Hostname or IP address of the Redis server.
|yes| +|**port**
(Database port)||Network port on which the Redis server is listening.
|yes| +|**user**
(Database user)|`string`|Username for authentication to the Redis database.
|no| +|**password**
(Database password)|`string`|Password for authentication to the Redis database.
|no| +|**key**
(Private key file)|`string`|Path to the private key file used for SSL/TLS client authentication.
|no| +|**key\_password**
(Private key password)|`string`|Password used to decrypt the private key file.
|no| +|**cert**
(Client certificate)|`string`|Path to the client certificate file used for SSL/TLS client authentication.
|no| +|**cacert**
(CA certificate)|`string`|Path to the Certificate Authority (CA) certificate file used to verify the server's TLS certificate.
|no| + +**Additional Properties:** not allowed +**Minimal Properties:** 3 +**If property *key* is defined**, property/ies *cert* is/are required. +**If property *cert* is defined**, property/ies *key* is/are required. +**If property *key_password* is defined**, property/ies *key* is/are required. + ## processors: Data processing configuration -Configuration settings that control how data is processed, including batch sizes, error handling, and performance tuning +Settings that control how data is processed, including batch sizes, error handling, and performance tuning. + **Properties** -| Name | Type | Description | Required | -| --------------------------------------------------------------------------- | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -| **type**
(Processor type) | `string` | Processor type, either 'classic' or 'flink'
Default: `"classic"`
Enum: `"classic"`
| | -| **on_failed_retry_interval**
(Retry interval on failure) | `integer`, `string` | Number of seconds to wait before retrying a failed operation
Default: `5`
Pattern: `^\${.*}$`
Minimum: `1`
| | -| **read_batch_size** | `integer`, `string` | Maximum number of records to process in a single batch
Default: `2000`
Pattern: `^\${.*}$`
Minimum: `1`
| | -| **read_batch_timeout_ms**
(Read batch timeout) | `integer` | Maximum time in milliseconds to wait for a batch to fill before processing
Default: `100`
Minimum: `1`
| | -| **enable_async_processing** | `boolean` | Enable async processing to improve throughput
Default: `true`
| | -| **batch_queue_size** | `integer` | Maximum number of batches to queue for processing
Default: `3`
Minimum: `1`
| | -| **ack_queue_size** | `integer` | Maximum number of batches to queue for asynchronous acknowledgement
Default: `10`
Minimum: `1`
| | -| **dedup**
(Enable deduplication) | `boolean` | Enable the deduplication mechanism to handle duplicate records
Default: `false`
| | -| **dedup_max_size**
(Deduplication set size) | `integer` | Maximum number of entries to store in the deduplication set
Default: `1024`
Minimum: `1`
| | -| **dedup_strategy**
(Deduplication strategy) | `string` | (DEPRECATED)
Property 'dedup_strategy' is now deprecated. The only supported strategy is 'ignore'. Please remove from the configuration.
Default: `"ignore"`
Enum: `"reject"`, `"ignore"`
| | -| **duration**
(Batch duration limit) | `integer`, `string` | Maximum time in milliseconds to wait for a batch to fill before processing
Default: `100`
Pattern: `^\${.*}$`
Minimum: `1`
| | -| **write_batch_size** | `integer`, `string` | Maximum number of records to write to target Redis database in a single batch
Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
| | -| **error_handling**
(Error handling strategy) | `string` | Strategy for handling errors: ignore to skip errors, dlq to store rejected messages in dead letter queue
Default: `"dlq"`
Pattern: `^\${.*}$\|ignore\|dlq`
| | -| **dlq_max_messages**
(DLQ message limit) | `integer`, `string` | Maximum number of messages to store in dead letter queue per stream
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
| | -| **target_data_type**
(Target Redis data type) | `string` | Data type to use in Redis: hash for Redis Hash, json for RedisJSON (requires RedisJSON module)
Default: `"hash"`
Pattern: `^\${.*}$\|hash\|json`
| | -| **json_update_strategy** | `string` | Strategy for updating JSON data in Redis: replace to overwrite the entire JSON object, merge to merge new data with existing JSON object
Default: `"replace"`
Pattern: `^\${.*}$\|replace\|merge`
| | -| **initial_sync_processes** | `integer`, `string` | Number of parallel processes for performing initial data synchronization
Default: `4`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `32`
| | -| **idle_sleep_time_ms**
(Idle sleep interval) | `integer`, `string` | Time in milliseconds to sleep between processing batches when idle
Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
| | -| **idle_streams_check_interval_ms**
(Idle streams check interval) | `integer`, `string` | Time in milliseconds between checking for new streams when processor is idle
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
| | -| **busy_streams_check_interval_ms**
(Busy streams check interval) | `integer`, `string` | Time in milliseconds between checking for new streams when processor is busy
Default: `5000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
| | -| **wait_enabled**
(Enable replica wait) | `boolean` | Enable verification that data has been written to replica shards of the target database
Default: `false`
| | -| **wait_timeout**
(Replica wait timeout) | `integer`, `string` | Maximum time in milliseconds to wait for replica write verification of the target database
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
| | -| **retry_max_attempts**
(Maximum retry attempts) | `integer`, `string` | Maximum number of attempts for failed operations
Default: `5`
Pattern: `^\${.*}$`
Minimum: `1`
| | -| **retry_initial_delay_ms**
(Initial retry delay) | `integer`, `string` | Initial delay in milliseconds before retrying a failed operation
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
| | -| **retry_max_delay_ms**
(Maximum retry delay) | `integer`, `string` | Maximum delay in milliseconds between retry attempts
Default: `10000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
| | -| **retry_on_replica_failure** | `boolean` | Continue retrying writes until successful replication to replica shards is confirmed
Default: `true`
| | -| [**logging**](#processorslogging)
(Logging configuration) | `object` | Logging configuration for the processor
| | -| **use_native_json_merge**
(Use native JSON merge from RedisJSON module) | `boolean` | Controls whether to use the native `JSON.MERGE` command (when `true`) or Lua scripts (when `false`) for JSON merge operations. Introduced in RDI 1.15.0. The native command provides 2x performance improvement but handles null values differently:

**Previous behavior (Lua merge)**: When merging `{"field1": "value1", "field2": "value2"}` with `{"field2": null, "field3": "value3"}`, the result was `{"field1": "value1", "field2": null, "field3": "value3"}` (null value is preserved)

**New behavior (JSON.MERGE)**: The same merge produces `{"field1": "value1", "field3": "value3"}` (null value removes the field, following [RFC 7396](https://datatracker.ietf.org/doc/html/rfc7396))

**Note**: The native `JSON.MERGE` command requires RedisJSON 2.6.0 or higher. If the target database has an older version of RedisJSON, RDI will automatically fall back to using Lua-based merge operations regardless of this setting.

**Impact**: If your application logic distinguishes between a field with a `null` value and a missing field, you may need to adjust your data handling. This follows the JSON Merge Patch RFC standard but differs from the previous Lua implementation. Set to `false` to revert to the previous Lua-based merge behavior if needed.
Default: `true`
| | -| [**advanced**](#processorsadvanced)
(Advanced configuration) | `object` | Advanced configuration options for fine-tuning the processor
| | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**type**
(Processor type)|`string`|Processor implementation to run. `classic` selects the default RDI processor; `flink` selects the Apache Flink-based processor (Kubernetes deployments only).
Default: `"classic"`
Enum: `"classic"`, `"flink"`
|| +|**read\_batch\_size**|`integer`, `string`|Maximum number of records read from the source streams in a single batch.
Default: `2000`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|**read\_batch\_timeout\_ms**
(Read batch timeout)|`integer`|Maximum time in milliseconds to wait for a batch to fill before processing it.
Default: `100`
Minimum: `1`
|| +|**duration**
(Batch duration limit)|`integer`, `string`|(DEPRECATED)
This property has no effect; use `read_batch_timeout_ms` instead.
Default: `100`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|**write\_batch\_size**|`integer`, `string`|Maximum number of records written to the target Redis database in a single batch.
Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|**enable\_async\_processing**|`boolean`|When `true`, the processor handles batches asynchronously to improve throughput. **Classic processor only.**
Default: `true`
|| +|**batch\_queue\_size**|`integer`|Maximum number of batches queued for processing. **Classic processor only.**
Default: `3`
Minimum: `1`
|| +|**ack\_queue\_size**|`integer`|Maximum number of batches queued for asynchronous acknowledgement. **Classic processor only.**
Default: `10`
Minimum: `1`
|| +|**dedup**
(Enable deduplication)|`boolean`|When `true`, the processor deduplicates incoming records. **Classic processor only.**
Default: `false`
|| +|**dedup\_max\_size**
(Deduplication set size)|`integer`|Maximum number of entries kept in the deduplication set. **Classic processor only.**
Default: `1024`
Minimum: `1`
|| +|**dedup\_strategy**
(Deduplication strategy)|`string`|(DEPRECATED)
This property has no effect; the only supported strategy is `ignore`. Remove it from the configuration. **Classic processor only.**
Default: `"ignore"`
Enum: `"reject"`, `"ignore"`
|| +|**error\_handling**
(Error handling strategy)|`string`|Strategy for handling failed records. `ignore` silently drops them; `dlq` writes them to the dead-letter queue.
Default: `"dlq"`
Pattern: `^\${.*}$\|ignore\|dlq`
|| +|**dlq\_max\_messages**
(DLQ message limit)|`integer`, `string`|Maximum number of messages stored per dead-letter queue stream.
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|**target\_data\_type**
(Target Redis data type)|`string`|Data type used to store target records in Redis. `hash` writes a Redis Hash; `json` writes a RedisJSON document and requires the RedisJSON module.
Default: `"hash"`
Pattern: `^\${.*}$\|hash\|json`
|| +|**json\_update\_strategy**|`string`|Strategy for updating existing JSON documents in Redis. `replace` overwrites the entire document; `merge` merges incoming fields into it.
Default: `"replace"`
Pattern: `^\${.*}$\|replace\|merge`
|| +|**use\_native\_json\_merge**
(Use native JSON merge from RedisJSON module)|`boolean`|Controls whether JSON merge operations use the native `JSON.MERGE` command (when `true`) or Lua scripts (when `false`). Introduced in RDI 1.15.0. The native command provides 2x performance improvement but handles null values differently:

**Previous behavior (Lua merge)**: When merging `{"field1": "value1", "field2": "value2"}` with `{"field2": null, "field3": "value3"}`, the result was `{"field1": "value1", "field2": null, "field3": "value3"}` (null value is preserved).

**New behavior (JSON.MERGE)**: The same merge produces `{"field1": "value1", "field3": "value3"}` (null value removes the field, following [RFC 7396](https://datatracker.ietf.org/doc/html/rfc7396)).

**Note**: The native `JSON.MERGE` command requires RedisJSON 2.6.0 or higher. If the target database has an older version of RedisJSON, RDI automatically falls back to Lua-based merge operations regardless of this setting.

**Impact**: If your application logic distinguishes between a field with a `null` value and a missing field, you may need to adjust your data handling. This follows the JSON Merge Patch RFC standard but differs from the previous Lua implementation. Set to `false` to revert to the previous Lua-based merge behavior if needed.

**Classic processor only.** The Flink processor always uses the native `JSON.MERGE` command when the target database supports it.
Default: `true`
|| +|**initial\_sync\_processes**|`integer`, `string`|Number of parallel processes used to perform the initial data synchronization. For the Flink processor, parallelism is controlled by Flink properties instead. **Classic processor only.**
Default: `4`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `32`
|| +|**idle\_sleep\_time\_ms**
(Idle sleep interval)|`integer`, `string`|Time in milliseconds to sleep between processing batches when idle. **Classic processor only.**
Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| +|**idle\_streams\_check\_interval\_ms**
(Idle streams check interval)|`integer`, `string`|Time in milliseconds between checks for new streams when the processor is idle. For the Flink processor, use `processors.advanced.source.discovery.interval.ms` instead to configure a single discovery interval regardless of load. **Classic processor only.**
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| +|**busy\_streams\_check\_interval\_ms**
(Busy streams check interval)|`integer`, `string`|Time in milliseconds between checks for new streams when the processor is busy. For the Flink processor, use `processors.advanced.source.discovery.interval.ms` instead to configure a single discovery interval regardless of load. **Classic processor only.**
Default: `5000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| +|**retry\_max\_attempts**
(Maximum retry attempts)|`integer`, `string`|Maximum number of attempts for a failed write to the target Redis database before giving up.
Default: `5`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|**retry\_initial\_delay\_ms**
(Initial retry delay)|`integer`, `string`|Initial delay in milliseconds before the first retry of a failed write.
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| +|**retry\_max\_delay\_ms**
(Maximum retry delay)|`integer`, `string`|Maximum delay in milliseconds between retry attempts.
Default: `10000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| +|**wait\_enabled**
(Enable replica wait)|`boolean`|When `true`, RDI verifies that each write has been replicated to the target database's replica shards before acknowledging it.
Default: `false`
|| +|**wait\_timeout**
(Replica wait timeout)|`integer`, `string`|Maximum time in milliseconds to wait for replica write verification on the target database.
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|**retry\_on\_replica\_failure**|`boolean`|When `true`, RDI keeps retrying a write until replica replication is confirmed; when `false`, it gives up after the first failure.
Default: `true`
|| +|**on\_failed\_retry\_interval**
(Retry interval on failure)|`integer`, `string`|(DEPRECATED)
This property has no effect; remove it from the configuration.
Default: `5`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|[**logging**](#processorslogging)
(Logging configuration)|`object`|Logging settings for the processor. **Flink processor only.**
|| +|[**advanced**](#processorsadvanced)
(Advanced configuration)|`object`|Advanced configuration for fine-tuning the processor. **All properties under `advanced` apply to the Flink processor only and are silently ignored by the classic processor.**
|| **Additional Properties:** not allowed - ### processors\.logging: Logging configuration -Logging configuration for the processor +Logging settings for the processor. **Flink processor only.** + **Properties** -| Name | Type | Description | Required | -| ----------------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------ | -------- | -| **level**
(Logging level) | `string` | Logging level for the processor
Default: `"info"`
Enum: `"trace"`, `"debug"`, `"info"`, `"warn"`, `"error"`
| | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**level**
(Logging level)|`string`|Log verbosity for the processor.
Default: `"info"`
Enum: `"trace"`, `"debug"`, `"info"`, `"warn"`, `"error"`
|| **Additional Properties:** not allowed **Example** ```yaml level: info + ``` - ### processors\.advanced: Advanced configuration -Advanced configuration options for fine-tuning the processor +Advanced configuration for fine-tuning the processor. **All properties under `advanced` apply to the Flink processor only and are silently ignored by the classic processor.** + **Properties** -| Name | Type | Description | Required | -| ------------------------------------------------------------------------------- | -------- | ------------------------------------------------------------------------------------------------------- | -------- | -| [**source**](#processorsadvancedsource)
(Advanced source settings) | `object` | Advanced configuration properties for the source Redis client, connection pool, and streams reader
| | -| [**sink**](#processorsadvancedsink)
(Advanced sink settings) | `object` | Advanced configuration properties for the sink
| | -| [**target**](#processorsadvancedtarget)
(Advanced target settings) | `object` | Advanced configuration properties for the target Redis client, connection pool, and sink
| | -| [**dlq**](#processorsadvanceddlq)
(Advanced DLQ settings) | `object` | Advanced configuration properties for the DLQ Redis client, connection pool, and sink
| | -| [**processor**](#processorsadvancedprocessor)
(Advanced processor settings) | `object` | Advanced configuration properties for the processor
| | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|[**source**](#processorsadvancedsource)
(Advanced source settings)|`object`|Advanced configuration properties for the source Redis client and streams reader. **Flink processor only.**
|| +|[**target**](#processorsadvancedtarget)
(Advanced target settings)|`object`|Advanced configuration properties for the target Redis client and sink. **Flink processor only.**
|| +|[**dlq**](#processorsadvanceddlq)
(Advanced DLQ settings)|`object`|Advanced configuration properties for the DLQ Redis client and sink. **Flink processor only.**
|| +|[**processor**](#processorsadvancedprocessor)
(Advanced processor settings)|`object`|Advanced configuration properties for the processor. **Flink processor only.**
|| +|[**flink**](#processorsadvancedflink)
(Advanced Flink settings)|`object`|Advanced configuration properties forwarded to the underlying Flink runtime. Any property listed in the [Flink configuration documentation](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/deployment/config/) can be set here and will override the RDI default. **Flink processor only.**
|| +|[**resources**](#processorsadvancedresources)
(Advanced resource settings)|`object`|Compute resources allocated to the Flink job, such as the number of task manager pods. **Flink processor only.**
|| **Additional Properties:** not allowed **Minimal Properties:** 1 **Example** ```yaml -source: {} -sink: {} -target: {} -dlq: {} -processor: {} +source: + stream.name.pattern: data:* + discovery.interval.ms: 1000 + batch.size: 2000 + batch.timeout.ms: 100 + connection.timeout.ms: 2000 + socket.timeout.ms: 2000 + retry.max.attempts: 5 + retry.initial.delay.ms: 100 + retry.max.delay.ms: 3000 + retry.backoff.multiplier: 2 +target: + batch.size: 200 + flush.interval.ms: 100 + connection.timeout.ms: 2000 + socket.timeout.ms: 2000 + retry.max.attempts: 5 + retry.initial.delay.ms: 1000 + retry.max.delay.ms: 10000 + retry.backoff.multiplier: 2 + wait.enabled: false + wait.write.timeout.ms: 1000 + wait.retry.enabled: true + wait.retry.delay.ms: 1000 +dlq: + max.len: 1000 + batch.size: 100 + flush.interval.ms: 100 + connection.timeout.ms: 2000 + socket.timeout.ms: 2000 + retry.max.attempts: 1 + retry.initial.delay.ms: 100 + retry.max.delay.ms: 3000 + retry.backoff.multiplier: 2 + wait.enabled: false + wait.write.timeout.ms: 1000 + wait.retry.enabled: false + wait.retry.delay.ms: 1000 +processor: + default.data.type: hash + default.json.update.strategy: replace + dlq.enabled: true +flink: + taskmanager.numberOfTaskSlots: 1 + taskmanager.memory.process.size: 2048m +resources: + taskManager: {} + ``` - #### processors\.advanced\.source: Advanced source settings -Advanced configuration properties for the source Redis client, connection pool, and streams reader +Advanced configuration properties for the source Redis client and streams reader. **Flink processor only.** + + +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**stream\.name\.pattern**
(Source stream name pattern)|`string`|Glob pattern used to discover input streams in the source Redis database, for example `data:*`.
Default: `"data:*"`
|| +|**discovery\.interval\.ms**
(Stream discovery interval)|`integer`|Time in milliseconds between checks for new input streams. Replaces the classic `processors.idle_streams_check_interval_ms` and `processors.busy_streams_check_interval_ms` properties.
Default: `1000`
Minimum: `0`
|| +|**batch\.size**
(Source batch size)|`integer`|Maximum number of records the source operator reads in a single batch. Alias for `processors.read_batch_size`; takes priority when both are set.
Default: `2000`
Minimum: `1`
|| +|**batch\.timeout\.ms**
(Source batch timeout)|`integer`|Maximum time in milliseconds to wait for a source batch to fill before processing. Alias for `processors.read_batch_timeout_ms`; takes priority when both are set.
Default: `100`
Minimum: `1`
|| +|**connection\.timeout\.ms**
(Source connection timeout)|`integer`|Connection timeout in milliseconds for the source Redis client.
Default: `2000`
Minimum: `1`
|| +|**socket\.timeout\.ms**
(Source socket timeout)|`integer`|Socket read/write timeout in milliseconds for the source Redis client.
Default: `2000`
Minimum: `1`
|| +|**retry\.max\.attempts**
(Source retry max attempts)|`integer`|Maximum number of retry attempts for failed source Redis operations.
Default: `5`
Minimum: `1`
|| +|**retry\.initial\.delay\.ms**
(Source retry initial delay)|`integer`|Initial delay in milliseconds before the first retry of a failed source Redis operation.
Default: `100`
Minimum: `1`
|| +|**retry\.max\.delay\.ms**
(Source retry max delay)|`integer`|Maximum delay in milliseconds between retry attempts for source Redis operations.
Default: `3000`
Minimum: `1`
|| +|**retry\.backoff\.multiplier**
(Source retry backoff multiplier)|`number`|Exponential backoff multiplier between retry attempts for source Redis operations.
Default: `2`
Minimum: `1`
|| **Additional Properties** -| Name | Type | Description | Required | -| ------------------------- | ----------------------------- | ----------- | -------- | -| **Additional Properties** | `string`, `number`, `boolean` | | | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**Additional Properties**|`string`, `number`, `boolean`||| **Minimal Properties:** 1 - +**Example** -#### processors\.advanced\.sink: Advanced sink settings +```yaml +stream.name.pattern: data:* +discovery.interval.ms: 1000 +batch.size: 2000 +batch.timeout.ms: 100 +connection.timeout.ms: 2000 +socket.timeout.ms: 2000 +retry.max.attempts: 5 +retry.initial.delay.ms: 100 +retry.max.delay.ms: 3000 +retry.backoff.multiplier: 2 -Advanced configuration properties for the sink +``` -**Additional Properties** + +#### processors\.advanced\.target: Advanced target settings -| Name | Type | Description | Required | -| ------------------------- | ----------------------------- | ----------- | -------- | -| **Additional Properties** | `string`, `number`, `boolean` | | | +Advanced configuration properties for the target Redis client and sink. **Flink processor only.** -**Minimal Properties:** 1 - -#### processors\.advanced\.target: Advanced target settings +**Properties** -Advanced configuration properties for the target Redis client, connection pool, and sink +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**batch\.size**
(Target sink batch size)|`integer`|Maximum number of records the target sink writes in a single batch. Alias for `processors.write_batch_size`; takes priority when both are set.
Default: `200`
Minimum: `1`
|| +|**flush\.interval\.ms**
(Target sink flush interval)|`integer`|Maximum time in milliseconds the target sink waits to fill a batch before flushing it to Redis.
Default: `100`
Minimum: `1`
|| +|**connection\.timeout\.ms**
(Target connection timeout)|`integer`|Connection timeout in milliseconds for the target Redis client.
Default: `2000`
Minimum: `1`
|| +|**socket\.timeout\.ms**
(Target socket timeout)|`integer`|Socket read/write timeout in milliseconds for the target Redis client.
Default: `2000`
Minimum: `1`
|| +|**retry\.max\.attempts**
(Target retry max attempts)|`integer`|Maximum number of retry attempts for failed target Redis operations. Alias for `processors.retry_max_attempts`; takes priority when both are set.
Default: `5`
Minimum: `1`
|| +|**retry\.initial\.delay\.ms**
(Target retry initial delay)|`integer`|Initial delay in milliseconds before the first retry of a failed target Redis operation. Alias for `processors.retry_initial_delay_ms`; takes priority when both are set.
Default: `1000`
Minimum: `1`
|| +|**retry\.max\.delay\.ms**
(Target retry max delay)|`integer`|Maximum delay in milliseconds between retry attempts for target Redis operations. Alias for `processors.retry_max_delay_ms`; takes priority when both are set.
Default: `10000`
Minimum: `1`
|| +|**retry\.backoff\.multiplier**
(Target retry backoff multiplier)|`number`|Exponential backoff multiplier between retry attempts for target Redis operations.
Default: `2`
Minimum: `1`
|| +|**wait\.enabled**
(Target replica wait enabled)|`boolean`|When `true`, RDI verifies that each write has been replicated to the target database's replica shards before acknowledging it. Alias for `processors.wait_enabled`; takes priority when both are set.
Default: `false`
|| +|**wait\.write\.timeout\.ms**
(Target replica wait timeout)|`integer`|Maximum time in milliseconds to wait for target replica write verification. Alias for `processors.wait_timeout`; takes priority when both are set.
Default: `1000`
Minimum: `1`
|| +|**wait\.retry\.enabled**
(Target replica wait retry enabled)|`boolean`|When `true`, RDI keeps retrying a target write until replica replication is confirmed; when `false`, it gives up after the first failure. Alias for `processors.retry_on_replica_failure`; takes priority when both are set. When enabled, the Flink processor retries indefinitely until the checkpoint timeout, unlike the classic processor which retries once.
Default: `true`
|| +|**wait\.retry\.delay\.ms**
(Target replica wait retry delay)|`integer`|Delay in milliseconds between target replica wait retry attempts.
Default: `1000`
Minimum: `1`
|| **Additional Properties** -| Name | Type | Description | Required | -| ------------------------- | ----------------------------- | ----------- | -------- | -| **Additional Properties** | `string`, `number`, `boolean` | | | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**Additional Properties**|`string`, `number`, `boolean`||| **Minimal Properties:** 1 - +**Example** + +```yaml +batch.size: 200 +flush.interval.ms: 100 +connection.timeout.ms: 2000 +socket.timeout.ms: 2000 +retry.max.attempts: 5 +retry.initial.delay.ms: 1000 +retry.max.delay.ms: 10000 +retry.backoff.multiplier: 2 +wait.enabled: false +wait.write.timeout.ms: 1000 +wait.retry.enabled: true +wait.retry.delay.ms: 1000 +``` + + #### processors\.advanced\.dlq: Advanced DLQ settings -Advanced configuration properties for the DLQ Redis client, connection pool, and sink +Advanced configuration properties for the DLQ Redis client and sink. **Flink processor only.** + + +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**max\.len**
(DLQ sink max length)|`integer`|Maximum number of messages stored per dead-letter queue stream. Alias for `processors.dlq_max_messages`; takes priority when both are set.
Default: `1000`
Minimum: `1`
|| +|**batch\.size**
(DLQ sink batch size)|`integer`|Maximum number of records the DLQ sink writes in a single batch.
Default: `100`
Minimum: `1`
|| +|**flush\.interval\.ms**
(DLQ sink flush interval)|`integer`|Maximum time in milliseconds the DLQ sink waits to fill a batch before flushing it to Redis.
Default: `100`
Minimum: `1`
|| +|**connection\.timeout\.ms**
(DLQ connection timeout)|`integer`|Connection timeout in milliseconds for the DLQ Redis client.
Default: `2000`
Minimum: `1`
|| +|**socket\.timeout\.ms**
(DLQ socket timeout)|`integer`|Socket read/write timeout in milliseconds for the DLQ Redis client.
Default: `2000`
Minimum: `1`
|| +|**retry\.max\.attempts**
(DLQ retry max attempts)|`integer`|Maximum number of retry attempts for failed DLQ Redis operations.
Default: `1`
Minimum: `1`
|| +|**retry\.initial\.delay\.ms**
(DLQ retry initial delay)|`integer`|Initial delay in milliseconds before the first retry of a failed DLQ Redis operation.
Default: `100`
Minimum: `1`
|| +|**retry\.max\.delay\.ms**
(DLQ retry max delay)|`integer`|Maximum delay in milliseconds between retry attempts for DLQ Redis operations.
Default: `3000`
Minimum: `1`
|| +|**retry\.backoff\.multiplier**
(DLQ retry backoff multiplier)|`number`|Exponential backoff multiplier between retry attempts for DLQ Redis operations.
Default: `2`
Minimum: `1`
|| +|**wait\.enabled**
(DLQ replica wait enabled)|`boolean`|When `true`, RDI verifies that each DLQ write has been replicated to the DLQ database's replica shards before acknowledging it.
Default: `false`
|| +|**wait\.write\.timeout\.ms**
(DLQ replica wait timeout)|`integer`|Maximum time in milliseconds to wait for DLQ replica write verification.
Default: `1000`
Minimum: `1`
|| +|**wait\.retry\.enabled**
(DLQ replica wait retry enabled)|`boolean`|When `true`, RDI keeps retrying a DLQ write until replica replication is confirmed; when `false`, it gives up after the first failure.
Default: `false`
|| +|**wait\.retry\.delay\.ms**
(DLQ replica wait retry delay)|`integer`|Delay in milliseconds between DLQ replica wait retry attempts.
Default: `1000`
Minimum: `1`
|| **Additional Properties** -| Name | Type | Description | Required | -| ------------------------- | ----------------------------- | ----------- | -------- | -| **Additional Properties** | `string`, `number`, `boolean` | | | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**Additional Properties**|`string`, `number`, `boolean`||| **Minimal Properties:** 1 - +**Example** + +```yaml +max.len: 1000 +batch.size: 100 +flush.interval.ms: 100 +connection.timeout.ms: 2000 +socket.timeout.ms: 2000 +retry.max.attempts: 1 +retry.initial.delay.ms: 100 +retry.max.delay.ms: 3000 +retry.backoff.multiplier: 2 +wait.enabled: false +wait.write.timeout.ms: 1000 +wait.retry.enabled: false +wait.retry.delay.ms: 1000 + +``` + #### processors\.advanced\.processor: Advanced processor settings -Advanced configuration properties for the processor +Advanced configuration properties for the processor. **Flink processor only.** + + +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**default\.data\.type**
(Default target data type)|`string`|Data type to use in Redis when not overridden per job: `hash` for Redis Hash, `json` for RedisJSON. Alias for `processors.target_data_type`; takes priority when both are set.
Default: `"hash"`
Enum: `"hash"`, `"json"`
|| +|**default\.json\.update\.strategy**
(Default JSON update strategy)|`string`|Strategy for updating JSON data in Redis: `replace` overwrites the entire JSON object, while `merge` combines the new data with the existing JSON object. Alias for `processors.json_update_strategy`; takes priority when both are set.
Default: `"replace"`
Enum: `"replace"`, `"merge"`
|| +|**dlq\.enabled**
(Enable DLQ)|`boolean`|When `true`, rejected messages are stored in the dead letter queue; when `false`, rejected messages are silently dropped. Alias for `processors.error_handling`; takes priority when both are set.
Default: `true`
|| +|**log\.level**
(Processor log level)|`string`|Log level for the processor. Takes priority over `processors.logging.level` when both are set.
Enum: `"trace"`, `"debug"`, `"info"`, `"warn"`, `"error"`
|| **Additional Properties** -| Name | Type | Description | Required | -| ------------------------- | ----------------------------- | ----------- | -------- | -| **Additional Properties** | `string`, `number`, `boolean` | | | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**Additional Properties**|`string`, `number`, `boolean`||| **Minimal Properties:** 1 - +**Example** -## targets: Target connections +```yaml +default.data.type: hash +default.json.update.strategy: replace +dlq.enabled: true -Configuration for target Redis databases where processed data will be written +``` -**Properties (Pattern)** + +#### processors\.advanced\.flink: Advanced Flink settings -| Name | Type | Description | Required | -| -------- | ---- | ----------- | -------- | -| **\.\*** | | | | +Advanced configuration properties forwarded to the underlying Flink runtime. Any property listed in the [Flink configuration documentation](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/deployment/config/) can be set here and will override the RDI default. **Flink processor only.**

The properties listed below are the ones most likely to require adjustment. **Changing any other Flink property is not recommended unless instructed by Redis support.** - +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**parallelism\.default**
(Default parallelism)|`integer`|Default parallelism for jobs and operators. When unset, Flink uses the number of available task slots across all task managers (`taskManager.replicas × taskmanager.numberOfTaskSlots`). Increase to fan out work across more task slots; see [parallel execution](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/dev/datastream/execution/parallel/).
Minimum: `1`
|| +|**taskmanager\.numberOfTaskSlots**
(Task slots per task manager)|`integer`|Number of parallel task slots per task manager pod. Each slot can run one parallel pipeline instance, so this caps the parallelism a single task manager can absorb. See [task slots and resources](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/concepts/flink-architecture/#task-slots-and-resources).
Default: `1`
Minimum: `1`
|| +|**taskmanager\.memory\.process\.size**
(Task manager process memory)|`string`|Total memory budget for each task manager JVM process (heap + managed + network + metaspace + JVM overhead), expressed with a unit suffix such as `2048m` or `4g`. See [task manager memory configuration](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/deployment/memory/mem_setup_tm/).
Default: `"2048m"`
|| + +**Additional Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**Additional Properties**|`string`, `number`, `boolean`||| + +**Minimal Properties:** 1 +**Example** + +```yaml +taskmanager.numberOfTaskSlots: 1 +taskmanager.memory.process.size: 2048m + +``` + + +#### processors\.advanced\.resources: Advanced resource settings + +Compute resources allocated to the Flink job, such as the number of task manager pods. **Flink processor only.** + + +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|[**taskManager**](#processorsadvancedresourcestaskmanager)
(Task manager resource settings)|`object`|Resource settings for Flink task manager pods.
|| + +**Additional Properties:** not allowed +**Minimal Properties:** 1 +**Example** + +```yaml +taskManager: {} + +``` + + +##### processors\.advanced\.resources\.taskManager: Task manager resource settings + +Resource settings for Flink task manager pods. + + +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**replicas**
(Task manager replicas)|`integer`|Number of Flink task manager pods to run.
Minimum: `1`
|| + +**Additional Properties:** not allowed +**Minimal Properties:** 1 + ## secret\-providers: Secret providers -Configuration for secret management providers +External secret providers used to resolve `${...}` references in the configuration. + **Properties** (key: `.*`) -| Name | Type | Description | Required | -| ----------------------------------------------------------------------- | -------- | ----------------------------------------------------------------- | -------- | -| **type**
(Provider type) | `string` | Type of secret provider service
Enum: `"aws"`, `"vault"`
| yes | -| [**parameters**](#secret-providersparameters)
(Provider parameters) | `object` | Configuration parameters for the secret provider
| yes | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**type**
(Provider type)|`string`|Secret provider backend. `aws` uses AWS Secrets Manager; `vault` uses HashiCorp Vault.
Enum: `"aws"`, `"vault"`
|yes| +|[**parameters**](#secret-providersparameters)
(Provider parameters)|`object`|Configuration parameters for the secret provider.
|yes| - + ### secret\-providers\.parameters: Provider parameters -Configuration parameters for the secret provider +Configuration parameters for the secret provider. + **Properties** -| Name | Type | Description | Required | -| ----------------------------------------------------------------------------- | ---------- | ------------------------------------------------------ | -------- | -| [**objects**](#secret-providersparametersobjects)
(Secrets objects array) | `object[]` | List of secret objects to fetch from the provider
| yes | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|[**objects**](#secret-providersparametersobjects)
(Secrets objects array)|`object[]`|Secret objects to fetch from the provider.
|yes| **Example** ```yaml objects: - {} + ``` - #### secret\-providers\.parameters\.objects\[\]: Secrets objects array -List of secret objects to fetch from the provider +Secret objects to fetch from the provider. + **Items: Secret object** @@ -467,19 +703,21 @@ List of secret objects to fetch from the provider ```yaml - {} + ``` - ## metadata: Pipeline metadata -Pipeline metadata +Optional metadata describing this pipeline, such as a display name and description. + **Properties** -| Name | Type | Description | Required | -| ------------------------------------------ | -------- | --------------------------------------------------- | -------- | -| **name**
(Pipeline name) | `string` | Pipeline name
Maximal Length: `100`
| | -| **description**
(Pipeline description) | `string` | Pipeline description
Maximal Length: `500`
| | +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**name**
(Pipeline name)|`string`|Human-readable name for the pipeline. Maximum 100 characters.
Maximal Length: `100`
|| +|**description**
(Pipeline description)|`string`|Free-form description of what the pipeline does. Maximum 500 characters.
Maximal Length: `500`
|| + +**Additional Properties:** not allowed -**Additional Properties:** not allowed From 3b3e403f9e43d346dd4f752380ea8f1925a2073a Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Tue, 5 May 2026 17:47:45 +0300 Subject: [PATCH 02/13] Add / update remaining docs pages --- .../redis-data-integration/architecture.md | 27 ++++ .../data-pipelines/pipeline-config.md | 90 +++++++---- .../transform-examples/redis-set-example.md | 5 +- .../redis-sorted-set-example.md | 3 + .../redis-stream-example.md | 7 +- .../redis-string-example.md | 3 + .../installation/install-k8s.md | 42 ++++++ .../migration-classic-to-flink.md | 141 ++++++++++++++++++ .../installation/upgrade.md | 30 +++- .../redis-data-integration/observability.md | 88 ++++++++++- 10 files changed, 391 insertions(+), 45 deletions(-) create mode 100644 content/integrate/redis-data-integration/installation/migration-classic-to-flink.md diff --git a/content/integrate/redis-data-integration/architecture.md b/content/integrate/redis-data-integration/architecture.md index 4b7d76225f..459e97ba57 100644 --- a/content/integrate/redis-data-integration/architecture.md +++ b/content/integrate/redis-data-integration/architecture.md @@ -142,6 +142,33 @@ The diagram below shows all RDI components and the interactions between them: {{< image filename="images/rdi/ingest/ingest-control-plane.webp" >}} +## Stream processor implementations + +RDI provides two implementations of the stream processor. You select the +implementation per pipeline through the +[`processors.type`]({{< relref "/integrate/redis-data-integration/data-pipelines/pipeline-config#processors" >}}) +property in `config.yaml`. The default is `classic`, so existing pipelines +keep their behavior unchanged. + +- The **classic** processor is implemented in Python. It is the original RDI + stream processor, supports both VM and Kubernetes deployments, and writes + to all Redis target data types (`hash`, `json`, `set`, `sorted_set`, + `stream`, `string`). + +- The **Flink** processor is implemented on top of + [Apache Flink](https://flink.apache.org/) and currently runs on Kubernetes only. + It can achieve much higher throughput during snapshots, scales horizontally + by changing the number of TaskManager replicas, uses Flink checkpointing for fault tolerance, + and exposes Prometheus metrics directly from its JobManager and TaskManager pods + (the `rdi-metrics-exporter` is not deployed for Flink-based pipelines). + The Flink processor currently supports only `hash` and `json` target data types. + +See +[Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}}) +for guidance on migrating an existing pipeline to the Flink processor. + +## VM and Kubernetes deployments + The following sections describe the VM configurations you can use to deploy RDI. diff --git a/content/integrate/redis-data-integration/data-pipelines/pipeline-config.md b/content/integrate/redis-data-integration/data-pipelines/pipeline-config.md index fc7b4e8a2d..6833f32f18 100644 --- a/content/integrate/redis-data-integration/data-pipelines/pipeline-config.md +++ b/content/integrate/redis-data-integration/data-pipelines/pipeline-config.md @@ -141,41 +141,48 @@ targets: # cert: ${TARGET_DB_CERT} # cacert: ${TARGET_DB_CACERT} processors: - # Interval (in seconds) on which to perform retry on failure. - # on_failed_retry_interval: 5 - # The batch size for reading data from the source database. 
+  # # Processor type: classic or flink (default: classic)
+  # type: classic
+  # # The batch size for reading data from the source database
   # read_batch_size: 2000
-  # Time (in ms) after which data will be read from stream even if
-  # read_batch_size was not reached.
-  # duration: 100
-  # The batch size for writing data to the target Redis database. Should be
-  # less than or equal to the read_batch_size.
+  # # Time (in ms) after which data will be read from the stream even if read_batch_size was not reached
+  # read_batch_timeout_ms: 100
+  # # The batch size for writing data to the target Redis database. Should be less than or equal to the read_batch_size
   # write_batch_size: 200
-  # Enable deduplication mechanism (default: false).
+  # # Enable async processing to improve throughput and reduce latency (default: true)
+  # enable_async_processing: true
+  # # Maximum number of batches to queue for processing (default: 3)
+  # batch_queue_size: 3
+  # # Maximum number of batches to queue for asynchronous acknowledgement (default: 10)
+  # ack_queue_size: 10
+  # # Enable deduplication mechanism (default: false)
   # dedup: 
-  # Max size of the deduplication set (default: 1024).
+  # # Max size of the deduplication set (default: 1024)
   # dedup_max_size: 
-  # Error handling strategy: ignore - skip, dlq - store rejected messages
-  # in a dead letter queue.
+  # # Error handling strategy: ignore - skip, dlq - store rejected messages in a dead letter queue
   # error_handling: dlq
-  # Dead letter queue max messages per stream.
+  # # Dead letter queue max messages per stream
   # dlq_max_messages: 1000
-  # Data type to use in Redis target database: `hash` for Redis Hash,
-  # `json` for JSON (which requires the RedisJSON module).
+  # # Target data type: hash/json - RedisJSON module must be in use in the target DB
   # target_data_type: hash
-  # Number of processes to use when syncing initial data.
+  # # Enable merge as the default strategy for writing JSON documents
+  # json_update_strategy: merge
+  # # Use native JSON merge if the target RedisJSON module supports it
+  # use_native_json_merge: true
+  # # Number of processes to use when syncing initial data
   # initial_sync_processes: 4
-  # Checks if the batch has been written to the replica shard.
+  # # Time in milliseconds to sleep between processing batches when idle (default: 200)
+  # idle_sleep_time_ms: 200
+  # # Time in milliseconds between checking for new streams when processor is idle (default: 1000)
+  # idle_streams_check_interval_ms: 1000
+  # # Time in milliseconds between checking for new streams when processor is busy (default: 5000)
+  # busy_streams_check_interval_ms: 5000
+  # # Checks if the batch has been written to the replica shard
   # wait_enabled: false
-  # Timeout in milliseconds when checking write to the replica shard.
+  # # Timeout in milliseconds when checking write to the replica shard
   # wait_timeout: 1000
-  # Ensures that a batch has been written to the replica shard and keeps
-  # retrying if not.
+  # # Ensures that a batch has been written to the replica shard and keeps retrying if not
   # retry_on_replica_failure: true
-  # Enable merge as the default strategy to writing JSON documents.
-  # json_update_strategy: merge
-  # Use native JSON merge if the target RedisJSON module supports it.
-  # use_native_json_merge: true
 ```
 
 ## Sections
@@ -266,30 +273,47 @@ sudo service k3s restart
 The `processors` section configures the behavior of the pipeline. 
 The [example](#example) configuration above contains the following properties:
 
-- `on_failed_retry_interval`: Number of seconds to wait before retrying a failed operation.
-  The default is 5 seconds.
+- `type`: Stream processor implementation to run for this pipeline.
+  The options are `classic` (the default) for the Python-based classic processor
+  and `flink` for the
+  [Apache Flink](https://flink.apache.org/)-based processor.
+  The Flink processor runs on Kubernetes only and supports the `hash` and `json`
+  target data types. See
+  [Stream processor implementations]({{< relref "/integrate/redis-data-integration/architecture#stream-processor-implementations" >}})
+  for an overview, and
+  [Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}})
+  for guidance on migrating an existing pipeline to the Flink processor.
 - `read_batch_size`: Maximum number of records to read from the source database. RDI will
-  wait for the batch to fill up to `read_batch_size` or for `duration` to elapse,
+  wait for the batch to fill up to `read_batch_size` or for `read_batch_timeout_ms` to elapse,
   whichever happens first. The default is 2000.
+- `read_batch_timeout_ms`: Time (in ms) after which data will be read from the stream even if
+  `read_batch_size` was not reached. The default is 100 ms.
+- `write_batch_size`: The batch size for writing data to the target Redis database. This should be
+  less than or equal to the `read_batch_size`. The default is 200.
 - `target_data_type`: Data type to use in the target Redis database. The options are `hash`
   for Redis Hash (the default), or `json` for RedisJSON, which is available only if you have
   added the RedisJSON module to the target database. Note that this setting is mainly useful
   when you don't provide any custom jobs. When you do provide jobs, you can specify the target
   data type in each job individually and choose from a wider range of data types. See
   [Job files]({{< relref "/integrate/redis-data-integration/data-pipelines/transform-examples" >}})
-  (which requires the RedisJSON module) for more information.
-- `duration`: Time (in ms) after which data will be read from the stream even if
-  `read_batch_size` was not reached. The default is 100 ms.
-- `write_batch_size`: The batch size for writing data to the target Redis database. This should be
-  less than or equal to the `read_batch_size`. The default is 200.
+  for more information.
 - `dedup`: Boolean value to enable the deduplication mechanism. The default is `false`.
+  **Classic processor only.**
 - `dedup_max_size`: Maximum size of the deduplication set. The default is 1024.
+  **Classic processor only.**
 - `error_handling`: The strategy to use when an invalid record is encountered. The available
-  strategies are `ignore` and `dlq` (store rejected messages in a dead letter queue).
+  strategies are `ignore` and `dlq` (store rejected messages in a dead letter queue).
+  The default is `dlq`.
   See
   [What does RDI do if the data is corrupted or invalid?]({{< relref "/integrate/redis-data-integration/faq#what-does-rdi-do-if-the-data-is-corrupted-or-invalid" >}})
   for more information about the dead letter queue.
 
+{{< note >}}When `type` is set to `flink`, fine-tuning of the processor and the
+underlying Flink runtime is configured through the `processors.advanced`
+section. The classic processor silently ignores `processors.advanced`. 
+See the +[RDI configuration file reference]({{< relref "/integrate/redis-data-integration/reference/config-yaml-reference#processors" >}}) +for the full set of available properties.{{< /note >}} + See also the [RDI configuration file reference]({{< relref "/integrate/redis-data-integration/reference/config-yaml-reference#processors" >}}) for full details of the other available properties. diff --git a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-set-example.md b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-set-example.md index a9e1822fce..bca50810ab 100644 --- a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-set-example.md +++ b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-set-example.md @@ -17,7 +17,10 @@ weight: 30 --- In the example below, data is captured from the source table named `invoice` and is written to a Redis set. The `connection` is an optional parameter that refers to the corresponding connection name defined in `config.yaml`. When you specify the -`data_type` parameter for the job, it overrides the system-wide setting `target_data_type` defined in `config.yaml`. +`data_type` parameter for the job, it overrides the system-wide setting `target_data_type` defined in `config.yaml`. + +{{< note >}}The `set` data type is supported by the classic stream processor only. +The Flink processor currently supports only `hash` and `json` outputs.{{< /note >}} When writing to a set, you must supply an extra argument, `member`, which specifies the field that will be written. In this case, the result will be a Redis set with key names based on the key expression (for example, `invoices:Germany`, `invoices:USA`) and with an expiration of 100 seconds. If you don't supply an `expire` parameter, the keys will never expire. diff --git a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-sorted-set-example.md b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-sorted-set-example.md index ae9746557d..fd9b546314 100644 --- a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-sorted-set-example.md +++ b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-sorted-set-example.md @@ -19,6 +19,9 @@ weight: 30 In the example below, data is captured from the source table named `invoice` and is written to a Redis sorted set. The `connection` is an optional parameter that refers to the corresponding connection name defined in `config.yaml`. When you specify the `data_type` parameter for the job, it overrides the system-wide setting `target_data_type` defined in `config.yaml`. +{{< note >}}The `sorted_set` data type is supported by the classic stream processor only. +The Flink processor currently supports only `hash` and `json` outputs.{{< /note >}} + When writing to sorted sets, you must provide two additional arguments, `member` and `score`. These specify the field names that will be used as a member and a score to add an element to a sorted set. In this case, the result will be a Redis sorted set named `invoices:sorted` based on the key expression and with an expiration of 100 seconds for each set member. If you don't supply an `expire` parameter, the keys will never expire. 
```yaml diff --git a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-stream-example.md b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-stream-example.md index 570923cb77..8b429d3aca 100644 --- a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-stream-example.md +++ b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-stream-example.md @@ -16,8 +16,11 @@ type: integration weight: 30 --- -In the example below, data is captured from the source table named `invoice` and is written to a Redis stream. The `connection` is an optional parameter that refers to the corresponding connection name defined in `config.yaml`. -When you specify the `data_type` parameter for the job, it overrides the system-wide setting `target_data_type` defined in `config.yaml`. +In the example below, data is captured from the source table named `invoice` and is written to a Redis stream. The `connection` is an optional parameter that refers to the corresponding connection name defined in `config.yaml`. +When you specify the `data_type` parameter for the job, it overrides the system-wide setting `target_data_type` defined in `config.yaml`. + +{{< note >}}The `stream` data type is supported by the classic stream processor only. +The Flink processor currently supports only `hash` and `json` outputs.{{< /note >}} When writing to streams, you can use the optional parameter `mapping` to limit the number of fields sent in a message and to provide aliases for them. If you don't use the `mapping` parameter, all fields captured in the source will be passed as the message payload. diff --git a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-string-example.md b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-string-example.md index a060c9a196..4058187319 100644 --- a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-string-example.md +++ b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-string-example.md @@ -19,6 +19,9 @@ weight: 30 The string data type is useful for capturing a string representation of a single column from a source table. +{{< note >}}The `string` data type is supported by the classic stream processor only. +The Flink processor currently supports only `hash` and `json` outputs.{{< /note >}} + In the example job below, the `title` column is captured from the `album` table in the source. The `title` is then written to the Redis target database as a string under a custom key of the form `AlbumTitle:42`, where the `42` is the primary key value of the table (the `albumid` column). diff --git a/content/integrate/redis-data-integration/installation/install-k8s.md b/content/integrate/redis-data-integration/installation/install-k8s.md index cad1007b09..cd4369564c 100644 --- a/content/integrate/redis-data-integration/installation/install-k8s.md +++ b/content/integrate/redis-data-integration/installation/install-k8s.md @@ -267,6 +267,48 @@ oc get projects -o yaml | grep "openshift.io/sa.scc" ``` {{< /warning >}} +### Configure the Flink processor + +RDI ships with two stream processor implementations: the default *classic* +processor and the +[Apache Flink](https://flink.apache.org/)-based *Flink* processor. 
+See +[Stream processor implementations]({{< relref "/integrate/redis-data-integration/architecture#stream-processor-implementations" >}}) +for an overview of the differences. + +To configure the Flink processor at the Helm chart level, add the +`operator.dataPlane.flinkProcessor` block to your `rdi-values.yaml` file. The +snippet below shows a few of the most commonly adjusted values. See the +`flinkProcessor` block in the Helm chart's `values.yaml` for the full set of +supported values. + +```yaml +operator: + dataPlane: + flinkProcessor: + jobManager: + # JobManager pod resources. + cpu: 0.1 + memory: 1024 + taskManager: + # TaskManager pod resources. + cpu: 1 + memory: 2048 + # Number of parallel task slots per TaskManager pod. + # Total parallelism is `replicas * numberOfTaskSlots`. + numberOfTaskSlots: 1 +``` + +Configuring the Flink processor at the Helm chart level only sets the values +that the operator will use when deploying the JobManager and TaskManager workloads. +To run a specific pipeline on the Flink processor, set +[`processors.type`]({{< relref "/integrate/redis-data-integration/data-pipelines/pipeline-config#processors" >}}) +to `flink` in that pipeline's `config.yaml`. Pipelines without this setting +continue to use the classic processor. + +For migrating existing pipelines to the Flink processor, see +[Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}}). + ## Check the installation To verify the status of the K8s deployment, run the following command: diff --git a/content/integrate/redis-data-integration/installation/migration-classic-to-flink.md b/content/integrate/redis-data-integration/installation/migration-classic-to-flink.md new file mode 100644 index 0000000000..6929bc53ab --- /dev/null +++ b/content/integrate/redis-data-integration/installation/migration-classic-to-flink.md @@ -0,0 +1,141 @@ +--- +Title: Migrate from the classic processor to the Flink processor +alwaysopen: false +categories: +- docs +- integrate +- rs +- rdi +description: Learn how to migrate an existing RDI pipeline from the classic stream processor to the Apache Flink-based processor. +group: di +hideListLinks: false +linkTitle: Migrate to the Flink processor +summary: Redis Data Integration keeps Redis in sync with the primary database in near + real time. +type: integration +weight: 35 +--- + +RDI ships with two stream processor implementations. The default *classic* +processor is implemented in Python and runs on both VMs and Kubernetes. The +*Flink* processor is built on top of [Apache Flink](https://flink.apache.org/) +and currently runs on Kubernetes only. It can achieve much higher throughput +during snapshots, scales horizontally by changing the number of TaskManager replicas, +and uses Flink checkpointing for fault tolerance. See [Stream processor implementations]({{< relref "/integrate/redis-data-integration/architecture#stream-processor-implementations" >}}) +for an overview. + +This page describes how to migrate an existing pipeline from the classic +processor to the Flink processor. + +{{< note >}}The Flink processor is currently supported on Kubernetes only. VM +installations must continue to use the classic processor.{{< /note >}} + +## Before you migrate + +Confirm that your pipeline is compatible with the Flink processor: + +- The Flink processor supports `hash` and `json` target data types only. 
If + any of your jobs use the `set`, `sorted_set`, `stream`, or `string` data + types, those jobs must be rewritten or kept on the classic processor. +- `JSON.MERGE` semantics differ from the classic processor's Lua-based merge + when null values are involved (see + [`use_native_json_merge`]({{< relref "/integrate/redis-data-integration/reference/config-yaml-reference#processors" >}})). + The Flink processor always uses the native `JSON.MERGE` command when the + target database supports it. +- Ensure your Kubernetes cluster has enough capacity for the Flink JobManager + and TaskManager pods (see + [Configure the Flink processor]({{< relref "/integrate/redis-data-integration/installation/install-k8s#configure-the-flink-processor" >}}) + for the default sizing). + +## Step 1: Configure the Flink processor at the Helm chart level + +The Flink processor is always available — no opt-in is required at the Helm +chart level. The defaults are sized for typical workloads, so you can skip +this step if you don't need to override them. To adjust the JobManager and +TaskManager defaults, add an `operator.dataPlane.flinkProcessor` block to +your `rdi-values.yaml` file and run `helm upgrade` as described in +[Configure the Flink processor]({{< relref "/integrate/redis-data-integration/installation/install-k8s#configure-the-flink-processor" >}}). +Existing pipelines continue to run on the classic processor until you switch +them in step 2. + +## Step 2: Switch the pipeline to the Flink processor + +In the pipeline's `config.yaml`, set +[`processors.type`]({{< relref "/integrate/redis-data-integration/data-pipelines/pipeline-config#processors" >}}) +to `flink`: + +```yaml +processors: + type: flink + ... +``` + +Then redeploy the pipeline. The operator stops the classic processor pods +and starts the Flink JobManager and TaskManager workloads for the pipeline. + +## Step 3: Adapt deprecated and Classic-only properties + +Some `processors` properties are no-ops, classic-only, or have moved to +`processors.advanced` for the Flink processor. The following table lists the +properties that need attention when migrating. + +| Property | Action when migrating to Flink | +| :-- | :-- | +| `on_failed_retry_interval` | No-op. Remove. | +| `duration` | No-op. Use `read_batch_timeout_ms` instead. | +| `dedup`, `dedup_max_size`, `dedup_strategy` | Classic-only. Remove. | +| `enable_async_processing`, `batch_queue_size`, `ack_queue_size` | Classic-only. Remove. | +| `initial_sync_processes` | Classic-only. Configure parallelism through `advanced.flink.taskmanager.numberOfTaskSlots` and `advanced.resources.taskManager.replicas` instead. | +| `idle_streams_check_interval_ms`, `busy_streams_check_interval_ms` | Classic-only. Use `processors.advanced.source.discovery.interval.ms` for a single discovery interval. | +| `idle_sleep_time_ms` | Classic-only. Remove. | +| `use_native_json_merge` | Classic-only. The Flink processor always uses `JSON.MERGE` when the target supports it. | + +The classic processor silently ignores `processors.advanced`, so keeping +both top-level properties and their `processors.advanced` equivalents lets +you switch back without further edits. + +## Step 4: Tune the Flink processor (optional) + +Fine-tune the Flink processor through the `processors.advanced` section. +For example: + +```yaml +processors: + type: flink + advanced: + source: + # Time between checks for new input streams. + discovery.interval.ms: 1000 + target: + # Verify writes are replicated before acknowledging. 
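+      # wait.write.timeout.ms below caps, in milliseconds, how long each
+      # write waits for that verification.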
+ wait.enabled: true + wait.write.timeout.ms: 1000 + flink: + # Number of parallel task slots per TaskManager pod. + taskmanager.numberOfTaskSlots: 2 + # Total memory budget for each TaskManager JVM process. + taskmanager.memory.process.size: 4096m + resources: + taskManager: + # Number of TaskManager pods + replicas: 2 +``` + +See the +[`processors.advanced` reference]({{< relref "/integrate/redis-data-integration/reference/config-yaml-reference#processors" >}}) +for the full set of available properties. + +## Step 5: Update observability + +The Flink processor does not use `rdi-metrics-exporter`. It exposes +Prometheus metrics directly from the Flink JobManager and TaskManager pods. +See +[Flink processor metrics]({{< relref "/integrate/redis-data-integration/observability#flink-processor-metrics" >}}) +for the `ServiceMonitor` configuration and the available metrics. + +## Rolling back + +To revert a pipeline to the classic processor, set `processors.type` back to +`classic` (or remove the property) and redeploy the pipeline. The +`processors.advanced` section is silently ignored by the classic processor, +so you don't need to remove it before switching back. diff --git a/content/integrate/redis-data-integration/installation/upgrade.md b/content/integrate/redis-data-integration/installation/upgrade.md index 48592b3ecf..19c74f1c32 100644 --- a/content/integrate/redis-data-integration/installation/upgrade.md +++ b/content/integrate/redis-data-integration/installation/upgrade.md @@ -137,19 +137,37 @@ the RDI configuration again after this step. ### Upgrading to RDI 1.8.0 or later from an earlier version -When upgrading to RDI 1.8.0 or later from an earlier version +When upgrading to RDI 1.8.0 or later from an earlier version you must adapt your `rdi-values.yaml` file to the following changes: -- All collector and processor values that were previously under `collector`, - `collectorSourceMetricsExporter`, and `processor` have been moved to +- All collector and processor values that were previously under `collector`, + `collectorSourceMetricsExporter`, and `processor` have been moved to `operator.dataPlane.collector` and `operator.dataPlane.processor`. -- `global.collectorApiEnabled` has been moved to `operator.dataPlane.collectorApi.enabled`, +- `global.collectorApiEnabled` has been moved to `operator.dataPlane.collectorApi.enabled`, and is now a boolean value, not `"0"` or `"1"`. - `api.authEnabled` is also now a boolean value, not `"0"` or `"1"`. -- The following values have been deprecated: `rdiMetricsExporter.service.protocol`, - `rdiMetricsExporter.service.port`, `rdiMetricsExporter.serviceMonitor.path`, +- The following values have been deprecated: `rdiMetricsExporter.service.protocol`, + `rdiMetricsExporter.service.port`, `rdiMetricsExporter.serviceMonitor.path`, `api.service.name`. +### The Flink processor is opt-in + +The +[Apache Flink](https://flink.apache.org/)-based stream processor introduced +alongside the classic processor is opt-in. Upgrading the Helm chart does not +change the processor used by existing pipelines, which keep running on the +classic processor until you explicitly switch them by setting +[`processors.type`]({{< relref "/integrate/redis-data-integration/data-pipelines/pipeline-config#processors" >}}) +to `flink` in their `config.yaml`. 
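+
+For example, the smallest possible opt-in for a single pipeline is the
+following `config.yaml` fragment (every other `processors` property keeps
+its default):
+
+```yaml
+processors:
+  type: flink
+```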
+ +To enable the Flink processor workloads on your cluster, add the +`operator.dataPlane.flinkProcessor` block to your `rdi-values.yaml` file +as described in +[Configure the Flink processor]({{< relref "/integrate/redis-data-integration/installation/install-k8s#configure-the-flink-processor" >}}), +and see +[Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}}) +for the per-pipeline migration steps. + ### Verifying the upgrade Check that all pods have `Running` status: diff --git a/content/integrate/redis-data-integration/observability.md b/content/integrate/redis-data-integration/observability.md index 59a23052ee..622db65a53 100644 --- a/content/integrate/redis-data-integration/observability.md +++ b/content/integrate/redis-data-integration/observability.md @@ -25,9 +25,11 @@ to query the metrics and plot simple graphs or with [Grafana](https://grafana.com/) to produce more complex visualizations and dashboards. -RDI exposes three endpoints: +RDI exposes the following endpoints: - **Collector metrics**: CDC collector performance and connectivity -- **Stream processor metrics**: Data processing performance and throughput +- **Stream processor metrics**: Data processing performance and throughput. The exposed metrics depend on the [stream processor implementation]({{< relref "/integrate/redis-data-integration/architecture#stream-processor-implementations" >}}) used by the pipeline: + - The classic processor exposes the metrics described in [Stream processor metrics](#stream-processor-metrics) through the `rdi-metrics-exporter` service. + - The Flink processor exposes the metrics described in [Flink processor metrics](#flink-processor-metrics) directly from its JobManager and TaskManager pods. The `rdi-metrics-exporter` is not deployed for Flink-based pipelines. - **Operator metrics**: Kubernetes operator health and Pipeline resource states The sections below explain these sets of metrics in more detail. @@ -97,6 +99,23 @@ For Helm installations, the metrics are available via autodiscovery in the K8s c enabled: true ``` + - For the Flink processor, enable the JobManager and TaskManager `ServiceMonitor` resources under `operator.dataPlane.flinkProcessor`: + ```yaml + operator: + dataPlane: + flinkProcessor: + jobManager: + serviceMonitor: + enabled: true + labels: + release: prometheus + taskManager: + serviceMonitor: + enabled: true + labels: + release: prometheus + ``` + {{< note >}}The Prometheus service discovery loop runs at regular intervals. This means that after deploying or updating RDI with the above configuration, it may take a few minutes for Prometheus to discover the new ServiceMonitors and start scraping metrics from the RDI components. {{< /note >}} @@ -153,11 +172,16 @@ Many metrics include context labels that specify the phase (`snapshot` or `strea ## Stream processor metrics +The metrics in this section are reported by the *classic* stream processor and +exposed through the `rdi-metrics-exporter` service. For pipelines that use +the [Flink processor]({{< relref "/integrate/redis-data-integration/architecture#stream-processor-implementations" >}}), +see [Flink processor metrics](#flink-processor-metrics) instead. + RDI reports metrics during the two main phases of the ingest pipeline, the *snapshot* phase and the *change data capture (CDC)* phase. (See the [pipeline lifecycle]({{< relref "/integrate/redis-data-integration/data-pipelines" >}}) docs for more information). 
The table below shows the full set of metrics that -RDI reports with their descriptions. +RDI reports with their descriptions. | Metric Name | Metric Type | Metric Description | Alerting Recommendations | |-------------|-------------|--------------------|-----------------------| @@ -204,6 +228,64 @@ RDI reports with their descriptions. - **Last batch metrics**: Show real-time performance data for the most recently processed batch {{< /note >}} +## Flink processor metrics + +The Flink processor exposes Prometheus metrics directly from its JobManager +and TaskManager pods. The `rdi-metrics-exporter` is not deployed for +Flink-based pipelines, and the metrics described in +[Stream processor metrics](#stream-processor-metrics) are not available. + +The full set of metrics returned by the Flink processor is large and includes +every metric emitted by the underlying Flink runtime (job, task, operator, +JVM, network, and connector metrics). See the +[Flink metrics documentation](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/ops/metrics/) +for the full reference of Flink-emitted metrics, and the +[Flink Prometheus reporter](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/deployment/metric_reporters/#prometheus) +docs for the naming scheme. + +Configure Prometheus to scrape these metrics by enabling the JobManager and +TaskManager `ServiceMonitor` resources under `operator.dataPlane.flinkProcessor`, +as shown in [Helm installation](#helm-installation) above. + +### Useful metrics + +In addition to the standard Flink metrics, the Flink processor emits a small +set of RDI-specific metrics that cover record counters, source/target +connectivity, and stream backlog. These metrics, together with a curated +subset of native Flink metrics, are surfaced through the +[RDI API v2 metric collections endpoint]({{< relref "/integrate/redis-data-integration/reference/api-reference" >}}) +and are the recommended starting point for dashboards and alerts. + +**RDI-emitted metrics** (per pipeline): + +| Metric | Description | +|---|---| +| `flink_jobmanager_job_operator_coordinator_stream_type_rdiRecords` | Per-stream record counters. Labels: `stream`, `type` (one of `incoming`, `inserted`, `updated`, `deleted`, `filtered`, `rejected`). | +| `flink_jobmanager_job_operator_coordinator_enumerator_stream_type_rdiRecords` | Per-stream backlog and freshness. Labels: `stream`, `type` (`pending` for stream length, `lastArrival` for the epoch-millisecond timestamp of the last entry). | +| `flink_taskmanager_job_task_operator_rdi_connected` | Source or target connection status (`1` = connected, `0` = disconnected). Filter by `operator_name` equal to `Source:_source` for the source and matching the regex `.*:target:_Writer$` for target writers; treat the source or target as connected if any subtask reports `1`. | +| `flink_taskmanager_job_task_operator_rdi_lastModified` | Epoch-millisecond timestamp of the last successful write to the target Redis database. Filter by `operator_name` matching `.*:target:_Writer$` and take the maximum across subtasks. | +| `flink_taskmanager_job_task_operator_pendingAck` | Number of records emitted by the source but awaiting checkpoint completion before being acknowledged. Sum across subtasks. | + +**Native Flink metrics** used by the API: + +| Metric | Description | +|---|---| +| `flink_taskmanager_job_task_operator_numRecordsInPerSecond` | Per-operator throughput. 
For source throughput, filter by `operator_name` equal to `Source:_source` and sum across subtasks. For sink throughput, filter by `operator_name` matching `.*:target:_Writer$` and sum across subtasks and across all target writers. |
+| `flink_taskmanager_job_task_busyTimeMsPerSecond` | Time the task spends actively processing records (ms/s). Average across subtasks of the main chained task; exclude the `dlq:_Writer` task. |
+| `flink_taskmanager_job_task_idleTimeMsPerSecond` | Time the task spends waiting for input (ms/s). Average across subtasks of the main chained task; exclude the `dlq:_Writer` task. |
+| `flink_taskmanager_job_task_backPressuredTimeMsPerSecond` | Time the task spends back-pressured because the downstream cannot keep up (ms/s). Average across subtasks of the main chained task; exclude the `dlq:_Writer` task. |
+| `flink_jobmanager_job_lastCheckpointDuration` | Duration of the most recent checkpoint (ms). |
+| `flink_jobmanager_job_lastCheckpointSize` | Persisted size of the most recent checkpoint (bytes). |
+| `flink_jobmanager_job_numberOfCompletedCheckpoints` | Total number of completed checkpoints. |
+| `flink_jobmanager_job_numberOfFailedCheckpoints` | Total number of failed checkpoints. |
+| `flink_jobmanager_job_<state>Time` | Time spent in each job state (ms), where `<state>` is one of `running`, `restarting`, `failing`, `cancelling`, `initializing`, `created`, or `deploying`. The metric for the current state is non-zero; all others are zero. Use this to derive both the current job status and the time spent in it. |
+| `flink_jobmanager_job_numRestarts` | Total number of job restarts since submission. |
+
+{{< note >}}Flink runtime metric names follow Flink's own naming scheme rather
+than the `rdi_` prefix used by the classic processor. When you build
+dashboards that should work for both processors, query the two metric sets
+separately.{{< /note >}}
+
 ## Operator metrics
 
 The RDI operator exposes Prometheus metrics at the `/metrics` endpoint to monitor the health and state of the operator itself and the Pipeline resources it manages. 
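+
+If you don't use `ServiceMonitor`-based discovery, a plain Prometheus scrape
+job can read this endpoint directly. The minimal sketch below makes two
+assumptions you should verify against your installation: that the operator
+service is named `rdi-operator` in the `rdi` namespace, and that it serves
+metrics on port 8080.
+
+```yaml
+scrape_configs:
+  - job_name: rdi-operator
+    metrics_path: /metrics
+    static_configs:
+      - targets:
+          - rdi-operator.rdi.svc:8080
+```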
From ed1dba9f84675d7b1d8a4d054769957bc9391df5 Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Thu, 7 May 2026 16:58:13 +0300 Subject: [PATCH 03/13] Add classic-vs-flink.md and caching docs --- .../_index.md} | 51 +++--- .../architecture/classic-vs-flink.md | 154 ++++++++++++++++++ .../caching-expression-results.md | 138 ++++++++++++++++ .../integrate/redis-data-integration/faq.md | 25 +++ .../data-transformation/add_field.md | 22 +-- .../reference/data-transformation/cache.md | 65 ++++++++ .../reference/data-transformation/filter.md | 9 +- .../reference/data-transformation/lookup.md | 30 +++- .../reference/data-transformation/map.md | 9 +- 9 files changed, 448 insertions(+), 55 deletions(-) rename content/integrate/redis-data-integration/{architecture.md => architecture/_index.md} (87%) create mode 100644 content/integrate/redis-data-integration/architecture/classic-vs-flink.md create mode 100644 content/integrate/redis-data-integration/data-pipelines/transform-examples/caching-expression-results.md create mode 100644 content/integrate/redis-data-integration/reference/data-transformation/cache.md diff --git a/content/integrate/redis-data-integration/architecture.md b/content/integrate/redis-data-integration/architecture/_index.md similarity index 87% rename from content/integrate/redis-data-integration/architecture.md rename to content/integrate/redis-data-integration/architecture/_index.md index 459e97ba57..e399fcac7f 100644 --- a/content/integrate/redis-data-integration/architecture.md +++ b/content/integrate/redis-data-integration/architecture/_index.md @@ -10,6 +10,7 @@ categories: description: Discover the main components of RDI group: di headerRange: '[2]' +hideListLinks: false linkTitle: Architecture summary: Redis Data Integration keeps Redis in sync with the primary database in near real time. @@ -127,43 +128,33 @@ It includes: and exports them as [Prometheus](https://prometheus.io/) metrics. The *data plane* contains the processes that actually move the data. -It includes the *CDC collector* and the *stream processor* that implement +It includes the *CDC collector* and the *stream processor* that implement the two phases of the pipeline lifecycle (initial cache loading and change streaming). The *management plane* provides tools that let you interact -with the control plane. +with the control plane. -- Use the CLI tool to install and administer RDI and to deploy - and manage a pipeline. -- Use the pipeline editor included in Redis Insight to design +- Use the CLI tool to install and administer RDI and to deploy + and manage a pipeline. +- Use the pipeline editor included in Redis Insight to design or edit a pipeline. - + The diagram below shows all RDI components and the interactions between them: {{< image filename="images/rdi/ingest/ingest-control-plane.webp" >}} + ## Stream processor implementations -RDI provides two implementations of the stream processor. You select the -implementation per pipeline through the +RDI provides two implementations of the stream processor, *classic* and +*Flink*. You select the implementation per pipeline through the [`processors.type`]({{< relref "/integrate/redis-data-integration/data-pipelines/pipeline-config#processors" >}}) property in `config.yaml`. The default is `classic`, so existing pipelines keep their behavior unchanged. -- The **classic** processor is implemented in Python. 
It is the original RDI - stream processor, supports both VM and Kubernetes deployments, and writes - to all Redis target data types (`hash`, `json`, `set`, `sorted_set`, - `stream`, `string`). - -- The **Flink** processor is implemented on top of - [Apache Flink](https://flink.apache.org/) and currently runs on Kubernetes only. - It can achieve much higher throughput during snapshots, scales horizontally - by changing the number of TaskManager replicas, uses Flink checkpointing for fault tolerance, - and exposes Prometheus metrics directly from its JobManager and TaskManager pods - (the `rdi-metrics-exporter` is not deployed for Flink-based pipelines). - The Flink processor currently supports only `hash` and `json` target data types. - See +[Differences between the classic and Flink processors]({{< relref "/integrate/redis-data-integration/architecture/classic-vs-flink" >}}) +for a side-by-side comparison and [Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}}) for guidance on migrating an existing pipeline to the Flink processor. @@ -174,9 +165,9 @@ deploy RDI. ### RDI on your own VMs -For this deployment, you must provide two VMs. The collector and stream processor -are active on one VM, while on the other they are in standby to provide high availability. -The two operators running on both VMs use a leader election algorithm to decide which +For this deployment, you must provide two VMs. The collector and stream processor +are active on one VM, while on the other they are in standby to provide high availability. +The two operators running on both VMs use a leader election algorithm to decide which VM is the active one (the "leader"). The diagram below shows this configuration: @@ -193,16 +184,16 @@ on [Kubernetes (K8s)](https://kubernetes.io/), including Red Hat - A K8s [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) named `rdi`. You can also use a different namespace name if you prefer. -- [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) and - [services](https://kubernetes.io/docs/concepts/services-networking/service/) for the +- [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) and + [services](https://kubernetes.io/docs/concepts/services-networking/service/) for the [RDI operator]({{< relref "/integrate/redis-data-integration/architecture#how-rdi-is-deployed" >}}), [metrics exporter]({{< relref "/integrate/redis-data-integration/observability" >}}), and API server. -- A [service account](https://kubernetes.io/docs/concepts/security/service-accounts/) +- A [service account](https://kubernetes.io/docs/concepts/security/service-accounts/) and [RBAC resources](https://kubernetes.io/docs/reference/access-authn-authz/rbac) for the RDI operator. - A [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) with RDI database details. - [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) with the RDI database credentials and TLS certificates. -- Other optional K8s resources such as [ingresses](https://kubernetes.io/docs/concepts/services-networking/ingress/) +- Other optional K8s resources such as [ingresses](https://kubernetes.io/docs/concepts/services-networking/ingress/) that can be enabled depending on your K8s environment and needs. 
See [Install on Kubernetes]({{< relref "/integrate/redis-data-integration/installation/install-k8s" >}}) @@ -210,8 +201,8 @@ for more information. ### Secrets and security considerations -The credentials for the database connections, as well as the certificates +The credentials for the database connections, as well as the certificates for [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) and -[mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) are saved in K8s secrets. +[mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) are saved in K8s secrets. RDI stores all state and configuration data inside the Redis Enterprise cluster and does not store any other data on your RDI VMs or anywhere else outside the cluster. diff --git a/content/integrate/redis-data-integration/architecture/classic-vs-flink.md b/content/integrate/redis-data-integration/architecture/classic-vs-flink.md new file mode 100644 index 0000000000..78542d3887 --- /dev/null +++ b/content/integrate/redis-data-integration/architecture/classic-vs-flink.md @@ -0,0 +1,154 @@ +--- +Title: Differences between the classic and Flink processors +alwaysopen: false +categories: +- docs +- integrate +- rs +- rdi +description: Compare the classic and Flink stream processor implementations. +group: di +linkTitle: Classic vs. Flink processor +summary: Redis Data Integration keeps Redis in sync with the primary database in near + real time. +type: integration +weight: 10 +--- + +RDI ships with two stream processor implementations. Both consume the same +source streams, share the same job-level configuration model, and write to +the same Redis target, but they differ in architecture, supported features, +configuration, observability, error handling, and performance. + +This page summarizes those differences. See +[Which processor should I use?]({{< relref "/integrate/redis-data-integration/faq#which-processor-should-i-use" >}}) +in the FAQ for the recommendation, and +[Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}}) +for a step-by-step migration guide. + +## At a glance + +| Aspect | Classic processor | Flink processor | +|---|---|---| +| Implementation | Python | Java on top of [Apache Flink](https://flink.apache.org/) | +| Deployment targets | VM and Kubernetes | Kubernetes only | +| Scaling | Single replica | Horizontal: TaskManager replicas × task slots per TaskManager | +| Fault tolerance | Source-stream consumer-group replay | Source-stream consumer-group replay plus Flink checkpointing | +| Supported `data_type` outputs | `hash`, `json`, `set`, `sorted_set`, `stream`, `string` | `hash`, `json` | +| Metrics endpoint | `rdi-metrics-exporter` pod | Flink JobManager `/metrics` (no metrics exporter) | +| Metric naming | `rdi_*` (e.g., `rdi_incoming_entries`) | `flink_*` (e.g., `flink_jobmanager_job_operator_coordinator_stream_type_rdiRecords`) | +| End-to-end latency | Bounded by the per-batch read-process-write cycle | Records flow through pipelined operator chains without a per-batch barrier | +| Snapshot throughput | Limited by single shared reader and writer | Parallelized across all task slots | +| Expression and `redis.lookup` result caching | Not supported | Optional, opt-in per transformation | + +## Architecture and deployment + +The classic processor runs as a single pod managed by the operator +and can be deployed on either VMs or Kubernetes through the RDI Helm +chart. 
+ +The Flink processor runs as an Apache Flink application cluster: one +JobManager pod plus one or more TaskManager pods. Source, +transformation, and sink operators run as parallel subtasks across +all task slots in the cluster. The Flink processor scales +horizontally by changing the number of TaskManager replicas +(`advanced.resources.taskManager.replicas`); with adaptive +parallelism, the default parallelism is the product of TaskManager +replicas and task slots per TaskManager. The Flink processor +currently runs on Kubernetes only; VM support is planned for a future +release. + +Both processors retain at-least-once delivery semantics; the Flink +processor adds Flink checkpointing on top of the shared +consumer-group replay mechanism. + +See +[Configure the Flink processor]({{< relref "/integrate/redis-data-integration/installation/install-k8s#configure-the-flink-processor" >}}) +for the Helm settings. + +## Configuration + +The two processors share the same `config.yaml` envelope and the same +`connections`, `sources`, `targets`, and `jobs` sections. The only +differences are inside the `processors:` block, which is selected via +`processors.type` (`classic` or `flink`, default `classic`). Properties +that apply to only one implementation are annotated with +**Classic processor only.** or **Flink processor only.** in the +[pipeline configuration reference]({{< relref "/integrate/redis-data-integration/data-pipelines/pipeline-config#processors" >}}), +and are silently ignored by the other implementation. The Flink +processor exposes additional fine-grained tuning under +`processors.advanced.*`. + +## Supported output formats + +The classic processor supports all `data_type` values: `hash`, `json`, +`set`, `sorted_set`, `stream`, and `string`. The Flink processor +currently supports only `hash` and `json`. Pipelines that use any other +output type must remain on the classic processor or rewrite the +affected jobs. Support for the remaining output types is planned for a +future release. + +## Transformation extensions + +The two processors support the same set of transformation blocks +(`filter`, `map`, `add_field`, `remove_field`, `rename_field`, +`redis.lookup`) and the same expression languages (JMESPath and SQL). +Pipelines written for one processor generally execute on the other +without changes. + +The Flink processor adds three optional, performance-oriented +extensions that are not available with the classic processor: + +- **Expression result caching** through a per-expression `cache:` + block on `filter`, `map`, `add_field`, and `redis.lookup` arguments. +- **`redis.lookup` result caching** through a `lookup_cache:` block. +- **`redis.lookup` batching**, which groups lookups into a single + Redis pipeline. Batching is enabled by default with sensible + defaults; the optional `batch:` block lets you override them. + +See +[Caching expression results]({{< relref "/integrate/redis-data-integration/data-pipelines/transform-examples/caching-expression-results" >}}) +for examples and +[`redis.lookup`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/lookup" >}}) +for the full property list. + +## Metrics + +The two processors expose different Prometheus metric sets and use +different naming schemes, so dashboards and alerts cannot be reused +as-is between them. The classic processor exposes its metrics through +the `rdi-metrics-exporter` pod. 
The Flink processor emits metrics +directly from the JobManager and TaskManager pods through Flink's +native Prometheus reporter; no metrics exporter is deployed. + +See +[Observability — Flink processor metrics]({{< relref "/integrate/redis-data-integration/observability#flink-processor-metrics" >}}) +for the customer-facing list of metrics. + +## Error handling and DLQ + +Both processors implement a dead-letter queue (DLQ) at +`dlq:{stream_name}` and honor the same top-level `error_handling` +(`dlq` or `ignore`) and `dlq_max_messages` properties. The Flink +processor surfaces a few corner cases as DLQ entries that the classic +processor logs and skips (for example, missing parent +keys in nested writes and exceptions thrown by `when` expressions on +`redis.lookup`). The DLQ entry field set and value encoding also +differ: the classic processor uses Python-stringified values, +while the Flink processor uses JSON. + +## Performance + +The Flink processor delivers significantly higher throughput during +the initial snapshot and lower end-to-end latency in steady state. +The classic processor uses a sequential read-process-write batching +cycle, so each record waits for its batch to complete before being +written to the target. The Flink processor pipelines records through +operator chains without a per-batch barrier, and parallelizes work +across all task slots, which both lowers per-record latency and +raises throughput. + +The Flink processor has a larger baseline memory footprint (JVM plus +Flink runtime overhead per TaskManager) but, for most pipelines, the +performance gains and the additional features (caching, batching, +horizontal scaling, checkpointing) outweigh that cost. diff --git a/content/integrate/redis-data-integration/data-pipelines/transform-examples/caching-expression-results.md b/content/integrate/redis-data-integration/data-pipelines/transform-examples/caching-expression-results.md new file mode 100644 index 0000000000..c6b9527a11 --- /dev/null +++ b/content/integrate/redis-data-integration/data-pipelines/transform-examples/caching-expression-results.md @@ -0,0 +1,138 @@ +--- +Title: Caching expression results +alwaysopen: false +categories: +- docs +- integrate +- rs +- rdi +description: null +group: di +linkTitle: Caching expression results +summary: How to cache expression and lookup results to reduce CPU and Redis load +type: integration +weight: 50 +--- + +The Flink processor can cache the result of any expression that +produces a value (for example, an +[`add_field`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/add_field" >}}) +expression, a [`map`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/map" >}}) +expression, the arguments to a +[`redis.lookup`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/lookup" >}}), +or a custom output `key`/`expire` expression). Caching is useful when +the same expression is evaluated repeatedly with the same input field +values, for example when many incoming records share a common foreign +key. + +{{< note >}}Caching is supported only by the **Flink processor**. The +classic processor silently ignores `cache:` blocks.{{< /note >}} + +## The `cache:` block + +You enable caching by adding a `cache:` block next to the expression +you want to cache. Cache keys are derived from the values of the input +fields referenced by the expression, not from the full record. 
See +[`cache`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/cache" >}}) +for the full property list. + +| Property | Type | Description | Default | +| ------------- | --------- | -------------------------------------------------------------- | ------- | +| `enabled` | `boolean` | Set to `true` to enable caching. | `false` | +| `max_size` | `integer` | Maximum number of entries kept in the cache. Must be positive. | `1000` | +| `ttl_seconds` | `integer` | Time-to-live for each entry, in seconds. Must be positive. | `60` | + +## Caching an `add_field` expression + +The example below adds a `country` field whose value is derived from +`country_code` and `country_name`. When the same combination of input +values appears repeatedly (for example, many customers from the same +country), caching the result avoids re-evaluating the expression. + +```yaml +name: Cached country field +source: + schema: dbo + table: customer +transform: + - uses: add_field + with: + field: country + language: sql + expression: country_code || ' - ' || UPPER(country_name) + cache: + enabled: true + max_size: 500 + ttl_seconds: 300 +``` + +## Caching a `map` expression + +```yaml +name: Cached map expression +source: + table: customer +transform: + - uses: map + with: + language: jmespath + expression: | + { + "CustomerId": customer_id, + "Country": country_code + } + cache: + enabled: true +``` + +## Caching `redis.lookup` arguments and results + +`redis.lookup` supports two independent caches. The `cache:` block +caches the *argument* expressions (the JMESPath or SQL expressions +that produce the Redis command arguments). The `lookup_cache:` block +caches the *result* of the Redis command itself, keyed by the +resolved arguments. Both blocks accept the same properties as the +`cache:` block above. + +```yaml +name: Cached lookup +source: + table: order +transform: + - uses: redis.lookup + with: + connection: target + cmd: HGETALL + args: + - concat(['customer:', customer_id]) + language: jmespath + field: customer + cache: + enabled: true + ttl_seconds: 60 + lookup_cache: + enabled: true + max_size: 10000 + ttl_seconds: 300 +``` + +## Caching `key` and `expire` output expressions + +A `cache:` block can also be added to the +[output `key` and `expire` expressions]({{< relref "/integrate/redis-data-integration/data-pipelines/transform-examples/_index" >}}) +when those are dynamic. The properties are the same as above. + +```yaml +name: Cached key expression +source: + table: order +output: + - uses: redis.write + with: + data_type: hash + key: + expression: concat(['order:', order_id]) + language: jmespath + cache: + enabled: true +``` diff --git a/content/integrate/redis-data-integration/faq.md b/content/integrate/redis-data-integration/faq.md index a501baf6d9..a3a66368c1 100644 --- a/content/integrate/redis-data-integration/faq.md +++ b/content/integrate/redis-data-integration/faq.md @@ -105,3 +105,28 @@ operator: ``` This option is available in RDI 1.16.2 and later. + +## Which processor should I use? {#which-processor-should-i-use} + +RDI ships with two stream processor implementations: the *classic* +processor and the *Flink* processor. The Flink processor is available +in RDI 1.18.0 and later. We strongly recommend using the Flink +processor for new pipelines and migrating existing pipelines to it +where possible. 
The Flink processor delivers significantly higher +snapshot throughput, lower end-to-end latency, horizontal scaling, +and Flink checkpointing on top of the same at-least-once delivery +guarantees as the classic processor. + +Use the classic processor only if your pipeline depends on a feature +that the Flink processor does not yet support: + +- **Output `data_type` other than `hash` or `json`** (for example, + `set`, `sorted_set`, `stream`, or `string`). +- **VM installations.** The Flink processor currently runs on + Kubernetes only. + +Both limitations are expected to be lifted in a future release. See +[Differences between the classic and Flink processors]({{< relref "/integrate/redis-data-integration/architecture/classic-vs-flink" >}}) +for a side-by-side comparison and +[Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}}) +for a step-by-step migration guide. diff --git a/content/integrate/redis-data-integration/reference/data-transformation/add_field.md b/content/integrate/redis-data-integration/reference/data-transformation/add_field.md index fee1268cc2..f6296c41e2 100644 --- a/content/integrate/redis-data-integration/reference/data-transformation/add_field.md +++ b/content/integrate/redis-data-integration/reference/data-transformation/add_field.md @@ -53,11 +53,12 @@ Add one field **Properties** -| Name | Type | Description | Required | -| -------------- | -------- | --------------------------------------------- | -------- | -| **field** | `string` | Field
| yes | -| **expression** | `string` | Expression
| yes | -| **language** | `string` | Language
Enum: `"jmespath"`, `"sql"`
| yes | +| Name | Type | Description | Required | +| -------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------| +| **field** | `string` | Field
| yes | +| **expression** | `string` | Expression
| yes | +| **language** | `string` | Language
Enum: `"jmespath"`, `"sql"`
| yes | +| **cache** | `object` | Cache the result of the field expression. See [`cache`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/cache" >}}) for the property list. **Flink processor only.**
| no | **Additional Properties:** not allowed @@ -85,11 +86,12 @@ Fields **Item Properties** -| Name | Type | Description | Required | -| -------------- | -------- | --------------------------------------------- | -------- | -| **field** | `string` | Field
| yes | -| **expression** | `string` | Expression
| yes | -| **language** | `string` | Language
Enum: `"jmespath"`, `"sql"`
| yes | +| Name | Type | Description | Required | +| -------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------| +| **field** | `string` | Field
| yes | +| **expression** | `string` | Expression
| yes | +| **language** | `string` | Language
Enum: `"jmespath"`, `"sql"`
| yes | +| **cache** | `object` | Cache the result of the field expression. See [`cache`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/cache" >}}) for the property list. **Flink processor only.**
| no | **Item Additional Properties:** not allowed diff --git a/content/integrate/redis-data-integration/reference/data-transformation/cache.md b/content/integrate/redis-data-integration/reference/data-transformation/cache.md new file mode 100644 index 0000000000..3c54e35fac --- /dev/null +++ b/content/integrate/redis-data-integration/reference/data-transformation/cache.md @@ -0,0 +1,65 @@ +--- +Title: cache +alwaysopen: false +categories: + - docs + - integrate + - rs + - rdi +description: Cache the result of an expression or lookup +group: di +linkTitle: cache +summary: + Redis Data Integration keeps Redis in sync with the primary database in near + real time. +type: integration +weight: 10 +--- + +Cache the result of an expression or lookup. Caching avoids re-evaluating +the expression or re-querying Redis when the same input field values +appear repeatedly. Cache keys are derived from the values of the input +fields referenced by the expression, not from the full record. + +The `cache:` block can be added to the following transformations and +output expressions: + +- The expression in [`add_field`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/add_field" >}}) (single-field and per-item form). +- The expression in [`filter`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/filter" >}}). +- The expression in [`map`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/map" >}}). +- The argument expressions in [`redis.lookup`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/lookup" >}}). The same block accepts a `lookup_cache:` variant that caches the lookup result returned by Redis. +- The dynamic `key` and `expire` expressions of the `redis.write` output. + +**Flink processor only.** The classic processor silently ignores `cache:` blocks. + +**Properties** + +| Name | Type | Description | Required | Default | +| ------------- | --------- | ----------------------------------------------------------------- | -------- | ------- | +| **enabled** | `boolean` | Set to `true` to enable caching. | no | `false` | +| **max_size** | `integer` | Maximum number of entries kept in the cache. Must be positive. | no | `1000` | +| **ttl_seconds** | `integer` | Time-to-live for each entry, in seconds. Must be positive. | no | `60` | + +**Additional Properties:** not allowed + +**Example** + +```yaml +source: + schema: dbo + table: customer +transform: + - uses: add_field + with: + field: country + language: sql + expression: country_code || ' - ' || UPPER(country_name) + cache: + enabled: true + max_size: 500 + ttl_seconds: 300 +``` + +See +[Caching expression results]({{< relref "/integrate/redis-data-integration/data-pipelines/transform-examples/caching-expression-results" >}}) +for additional examples. diff --git a/content/integrate/redis-data-integration/reference/data-transformation/filter.md b/content/integrate/redis-data-integration/reference/data-transformation/filter.md index 17df9de484..1f55178901 100644 --- a/content/integrate/redis-data-integration/reference/data-transformation/filter.md +++ b/content/integrate/redis-data-integration/reference/data-transformation/filter.md @@ -21,10 +21,11 @@ Filter records **Properties** -| Name | Type | Description | Required | -| -------------- | -------- | --------------------------------------------- | -------- | -| **expression** | `string` | Expression
| yes | -| **language** | `string` | Language
Enum: `"jmespath"`, `"sql"`
| yes | +| Name | Type | Description | Required | +| -------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------| +| **expression** | `string` | Expression
| yes | +| **language** | `string` | Language
Enum: `"jmespath"`, `"sql"`
| yes | +| **cache** | `object` | Cache the result of the filter expression. See [`cache`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/cache" >}}) for the property list. **Flink processor only.**
| no | **Additional Properties:** not allowed diff --git a/content/integrate/redis-data-integration/reference/data-transformation/lookup.md b/content/integrate/redis-data-integration/reference/data-transformation/lookup.md index 478b148161..7c138d303a 100644 --- a/content/integrate/redis-data-integration/reference/data-transformation/lookup.md +++ b/content/integrate/redis-data-integration/reference/data-transformation/lookup.md @@ -18,13 +18,16 @@ weight: 10 **Properties** -| Name | Type | Description | Required | -| ----------------- | ---------- | --------------------------------------------- | -------- | -| **connection** | `string` | Connection name | yes | -| **cmd** | `string` | The command to execute | yes | -| [**args**](#args) | `string[]` | Redis command arguments | yes | -| **language** | `string` | Language
Enum: `"jmespath"`, `"sql"`
| yes | -| **field** | `string` | The target field to write the result to
| yes | +| Name | Type | Description | Required | +| --------------------------------- | ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------| +| **connection** | `string` | Connection name | yes | +| **cmd** | `string` | The command to execute | yes | +| [**args**](#args) | `string[]` | Redis command arguments | yes | +| **language** | `string` | Language
Enum: `"jmespath"`, `"sql"`
| yes | +| **field** | `string` | The target field to write the result to
| yes | +| **cache** | `object` | Cache the result of the argument expressions. See [`cache`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/cache" >}}) for the property list. **Flink processor only.**
| no | +| **lookup_cache** | `object` | Cache the lookup results returned by Redis across batches, keyed by the resolved command arguments. Uses the same property list as [`cache`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/cache" >}}). **Flink processor only.**
| no | +| [**batch**](#batch) | `object` | Override the default batching behavior for `redis.lookup` lookups. **Flink processor only.**
| no | **Additional Properties:** not allowed @@ -35,3 +38,16 @@ The list of expressions that produce arguments. **Items** **Item Type:** `string` + +## batch: object {#batch} + +`redis.lookup` lookups are always batched and executed through a single Redis pipeline per batch. The processor flushes a batch when either the size or the timeout limit is reached. The defaults are sensible for most pipelines; add the `batch:` block only when you need to override them. **Flink processor only.** + +**Properties** + +| Name | Type | Description | Required | Default | +| -------------- | --------- | ---------------------------------------------------------------------------------------------------- | -------- | ------- | +| **size** | `integer` | Maximum number of lookups in a single batch. Must be positive. | no | `200` | +| **timeout_ms** | `integer` | Maximum time in milliseconds to wait before flushing a non-full batch. Must be positive. | no | `100` | + +**Additional Properties:** not allowed diff --git a/content/integrate/redis-data-integration/reference/data-transformation/map.md b/content/integrate/redis-data-integration/reference/data-transformation/map.md index 8691bf5e98..d5eed8bcfa 100644 --- a/content/integrate/redis-data-integration/reference/data-transformation/map.md +++ b/content/integrate/redis-data-integration/reference/data-transformation/map.md @@ -21,10 +21,11 @@ Map a record into a new output based on expressions **Properties** -| Name | Type | Description | Required | -| ----------------------------- | ------------------ | --------------------------------------------- | -------- | -| [**expression**](#expression) | `object`, `string` | Expression
| yes | -| **language** | `string` | Language
Enum: `"jmespath"`, `"sql"`
| yes | +| Name | Type | Description | Required | +| ----------------------------- | ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------| +| [**expression**](#expression) | `object`, `string` | Expression
| yes | +| **language** | `string` | Language
Enum: `"jmespath"`, `"sql"`
| yes | +| **cache** | `object` | Cache the result of the map expression. See [`cache`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/cache" >}}) for the property list. **Flink processor only.**
| no | **Additional Properties:** not allowed From cae737c0b817c53b2a52ce7090dc237ab7c1bfed Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Thu, 7 May 2026 17:00:14 +0300 Subject: [PATCH 04/13] Update content/integrate/redis-data-integration/data-pipelines/pipeline-config.md Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com> --- .../redis-data-integration/data-pipelines/pipeline-config.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/integrate/redis-data-integration/data-pipelines/pipeline-config.md b/content/integrate/redis-data-integration/data-pipelines/pipeline-config.md index 6833f32f18..3b5ca2c412 100644 --- a/content/integrate/redis-data-integration/data-pipelines/pipeline-config.md +++ b/content/integrate/redis-data-integration/data-pipelines/pipeline-config.md @@ -165,7 +165,7 @@ processors: # dlq_max_messages: 1000 # # Target data type: hash/json - RedisJSON module must be in use in the target DB # target_data_type: hash - # # Enable merge as the default strategy to writing JSON documents + # # Enable merge as the default strategy for writing JSON documents # json_update_strategy: merge # # Use native JSON merge if the target RedisJSON module supports it # use_native_json_merge: true From d34170792609c0bcc7e0fe2e9d253ef557f37146 Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Thu, 7 May 2026 17:00:26 +0300 Subject: [PATCH 05/13] Update content/integrate/redis-data-integration/installation/upgrade.md Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com> --- .../integrate/redis-data-integration/installation/upgrade.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/integrate/redis-data-integration/installation/upgrade.md b/content/integrate/redis-data-integration/installation/upgrade.md index 19c74f1c32..083c599e20 100644 --- a/content/integrate/redis-data-integration/installation/upgrade.md +++ b/content/integrate/redis-data-integration/installation/upgrade.md @@ -150,7 +150,7 @@ you must adapt your `rdi-values.yaml` file to the following changes: `rdiMetricsExporter.service.port`, `rdiMetricsExporter.serviceMonitor.path`, `api.service.name`. -### The Flink processor is opt-in +### Enabling the Flink processor The [Apache Flink](https://flink.apache.org/)-based stream processor introduced From 78efe05cc7c8cd83768794f4b95e601982118483 Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Thu, 7 May 2026 17:17:05 +0300 Subject: [PATCH 06/13] Address comments --- .../architecture/classic-vs-flink.md | 4 ++-- .../installation/migration-classic-to-flink.md | 7 ++++--- content/integrate/redis-data-integration/observability.md | 4 ++-- 3 files changed, 8 insertions(+), 7 deletions(-) diff --git a/content/integrate/redis-data-integration/architecture/classic-vs-flink.md b/content/integrate/redis-data-integration/architecture/classic-vs-flink.md index 78542d3887..041eb7b209 100644 --- a/content/integrate/redis-data-integration/architecture/classic-vs-flink.md +++ b/content/integrate/redis-data-integration/architecture/classic-vs-flink.md @@ -35,7 +35,7 @@ for a step-by-step migration guide. 
| Scaling | Single replica | Horizontal: TaskManager replicas × task slots per TaskManager | | Fault tolerance | Source-stream consumer-group replay | Source-stream consumer-group replay plus Flink checkpointing | | Supported `data_type` outputs | `hash`, `json`, `set`, `sorted_set`, `stream`, `string` | `hash`, `json` | -| Metrics endpoint | `rdi-metrics-exporter` pod | Flink JobManager `/metrics` (no metrics exporter) | +| Metrics endpoint | `rdi-metrics-exporter` service | Flink JobManager `/metrics` (no metrics exporter) | | Metric naming | `rdi_*` (e.g., `rdi_incoming_entries`) | `flink_*` (e.g., `flink_jobmanager_job_operator_coordinator_stream_type_rdiRecords`) | | End-to-end latency | Bounded by the per-batch read-process-write cycle | Records flow through pipelined operator chains without a per-batch barrier | | Snapshot throughput | Limited by single shared reader and writer | Parallelized across all task slots | @@ -117,7 +117,7 @@ for the full property list. The two processors expose different Prometheus metric sets and use different naming schemes, so dashboards and alerts cannot be reused as-is between them. The classic processor exposes its metrics through -the `rdi-metrics-exporter` pod. The Flink processor emits metrics +the `rdi-metrics-exporter` service. The Flink processor emits metrics directly from the JobManager and TaskManager pods through Flink's native Prometheus reporter; no metrics exporter is deployed. diff --git a/content/integrate/redis-data-integration/installation/migration-classic-to-flink.md b/content/integrate/redis-data-integration/installation/migration-classic-to-flink.md index 6929bc53ab..0c8a1bc0c8 100644 --- a/content/integrate/redis-data-integration/installation/migration-classic-to-flink.md +++ b/content/integrate/redis-data-integration/installation/migration-classic-to-flink.md @@ -90,7 +90,8 @@ properties that need attention when migrating. | `idle_sleep_time_ms` | Classic-only. Remove. | | `use_native_json_merge` | Classic-only. The Flink processor always uses `JSON.MERGE` when the target supports it. | -The classic processor silently ignores `processors.advanced`, so keeping +The classic processor silently ignores `processors.advanced`, +and the Flink processor silently ignores classic-only top-level properties, so keeping both top-level properties and their `processors.advanced` equivalents lets you switch back without further edits. @@ -127,8 +128,8 @@ for the full set of available properties. ## Step 5: Update observability -The Flink processor does not use `rdi-metrics-exporter`. It exposes -Prometheus metrics directly from the Flink JobManager and TaskManager pods. +The Flink processor exposes Prometheus metrics directly +from the Flink JobManager and TaskManager pods. See [Flink processor metrics]({{< relref "/integrate/redis-data-integration/observability#flink-processor-metrics" >}}) for the `ServiceMonitor` configuration and the available metrics. diff --git a/content/integrate/redis-data-integration/observability.md b/content/integrate/redis-data-integration/observability.md index 622db65a53..0508c3458f 100644 --- a/content/integrate/redis-data-integration/observability.md +++ b/content/integrate/redis-data-integration/observability.md @@ -29,7 +29,7 @@ RDI exposes the following endpoints: - **Collector metrics**: CDC collector performance and connectivity - **Stream processor metrics**: Data processing performance and throughput. 
The exposed metrics depend on the [stream processor implementation]({{< relref "/integrate/redis-data-integration/architecture#stream-processor-implementations" >}}) used by the pipeline: - The classic processor exposes the metrics described in [Stream processor metrics](#stream-processor-metrics) through the `rdi-metrics-exporter` service. - - The Flink processor exposes the metrics described in [Flink processor metrics](#flink-processor-metrics) directly from its JobManager and TaskManager pods. The `rdi-metrics-exporter` is not deployed for Flink-based pipelines. + - The Flink processor exposes the metrics described in [Flink processor metrics](#flink-processor-metrics) directly from its JobManager and TaskManager pods. The `rdi-metrics-exporter` service is not deployed for Flink-based pipelines. - **Operator metrics**: Kubernetes operator health and Pipeline resource states The sections below explain these sets of metrics in more detail. @@ -231,7 +231,7 @@ RDI reports with their descriptions. ## Flink processor metrics The Flink processor exposes Prometheus metrics directly from its JobManager -and TaskManager pods. The `rdi-metrics-exporter` is not deployed for +and TaskManager pods. The `rdi-metrics-exporter` service is not deployed for Flink-based pipelines, and the metrics described in [Stream processor metrics](#stream-processor-metrics) are not available. From e9e92a167dd36e8622fc8a301f1b6f023fd7454b Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Thu, 7 May 2026 17:55:13 +0300 Subject: [PATCH 07/13] Remove advanced.processors.log.level --- .../redis-data-integration/reference/config-yaml-reference.md | 1 - 1 file changed, 1 deletion(-) diff --git a/content/integrate/redis-data-integration/reference/config-yaml-reference.md b/content/integrate/redis-data-integration/reference/config-yaml-reference.md index 452e9cc55a..0f30bee448 100644 --- a/content/integrate/redis-data-integration/reference/config-yaml-reference.md +++ b/content/integrate/redis-data-integration/reference/config-yaml-reference.md @@ -573,7 +573,6 @@ Advanced configuration properties for the processor. **Flink processor only.** |**default\.data\.type**
(Default target data type)|`string`|Data type to use in Redis when not overridden per job: `hash` for Redis Hash, `json` for RedisJSON. Alias for `processors.target_data_type`; takes priority when both are set.
Default: `"hash"`
Enum: `"hash"`, `"json"`
|| |**default\.json\.update\.strategy**
(Default JSON update strategy)|`string`|Strategy for updating JSON data in Redis: `replace` to overwrite the entire JSON object, `merge` to merge new data with the existing JSON object. Alias for `processors.json_update_strategy`; takes priority when both are set.
Default: `"replace"`
Enum: `"replace"`, `"merge"`
|| |**dlq\.enabled**
(Enable DLQ)|`boolean`|When `true`, rejected messages are stored in the dead-letter queue; when `false`, errors are silently skipped. Alias for `processors.error_handling`; takes priority when both are set.
Default: `true`
|| -|**log\.level**
(Processor log level)|`string`|Log level for the processor. Takes priority over `processors.logging.level` when both are set.
Enum: `"trace"`, `"debug"`, `"info"`, `"warn"`, `"error"`
|| **Additional Properties** From ffeeb3ab93948adf56c29b66edb6e220059e62b2 Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Thu, 7 May 2026 19:02:40 +0300 Subject: [PATCH 08/13] Minor improvements, hide cache.md --- .../architecture/_index.md | 1 - .../transform-examples/redis-set-example.md | 2 +- .../redis-sorted-set-example.md | 2 +- .../redis-stream-example.md | 2 +- .../redis-string-example.md | 2 +- .../migration-classic-to-flink.md | 14 +++++------ .../installation/upgrade.md | 24 ++++++++++--------- .../reference/data-transformation/cache.md | 3 +++ 8 files changed, 27 insertions(+), 23 deletions(-) diff --git a/content/integrate/redis-data-integration/architecture/_index.md b/content/integrate/redis-data-integration/architecture/_index.md index e399fcac7f..f9d09f0453 100644 --- a/content/integrate/redis-data-integration/architecture/_index.md +++ b/content/integrate/redis-data-integration/architecture/_index.md @@ -143,7 +143,6 @@ The diagram below shows all RDI components and the interactions between them: {{< image filename="images/rdi/ingest/ingest-control-plane.webp" >}} - ## Stream processor implementations RDI provides two implementations of the stream processor, *classic* and diff --git a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-set-example.md b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-set-example.md index bca50810ab..7247e71330 100644 --- a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-set-example.md +++ b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-set-example.md @@ -19,7 +19,7 @@ weight: 30 In the example below, data is captured from the source table named `invoice` and is written to a Redis set. The `connection` is an optional parameter that refers to the corresponding connection name defined in `config.yaml`. When you specify the `data_type` parameter for the job, it overrides the system-wide setting `target_data_type` defined in `config.yaml`. -{{< note >}}The `set` data type is supported by the classic stream processor only. +{{< note >}}The `set` data type is supported by the classic processor only. The Flink processor currently supports only `hash` and `json` outputs.{{< /note >}} When writing to a set, you must supply an extra argument, `member`, which specifies the field that will be written. In this case, the result will be a Redis set with key names based on the key expression (for example, `invoices:Germany`, `invoices:USA`) and with an expiration of 100 seconds. If you don't supply an `expire` parameter, the keys will never expire. diff --git a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-sorted-set-example.md b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-sorted-set-example.md index fd9b546314..dfd81bf6b2 100644 --- a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-sorted-set-example.md +++ b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-sorted-set-example.md @@ -19,7 +19,7 @@ weight: 30 In the example below, data is captured from the source table named `invoice` and is written to a Redis sorted set. The `connection` is an optional parameter that refers to the corresponding connection name defined in `config.yaml`. When you specify the `data_type` parameter for the job, it overrides the system-wide setting `target_data_type` defined in `config.yaml`. 
-{{< note >}}The `sorted_set` data type is supported by the classic stream processor only. +{{< note >}}The `sorted_set` data type is supported by the classic processor only. The Flink processor currently supports only `hash` and `json` outputs.{{< /note >}} When writing to sorted sets, you must provide two additional arguments, `member` and `score`. These specify the field names that will be used as a member and a score to add an element to a sorted set. In this case, the result will be a Redis sorted set named `invoices:sorted` based on the key expression and with an expiration of 100 seconds for each set member. If you don't supply an `expire` parameter, the keys will never expire. diff --git a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-stream-example.md b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-stream-example.md index 8b429d3aca..96c4277135 100644 --- a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-stream-example.md +++ b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-stream-example.md @@ -19,7 +19,7 @@ weight: 30 In the example below, data is captured from the source table named `invoice` and is written to a Redis stream. The `connection` is an optional parameter that refers to the corresponding connection name defined in `config.yaml`. When you specify the `data_type` parameter for the job, it overrides the system-wide setting `target_data_type` defined in `config.yaml`. -{{< note >}}The `stream` data type is supported by the classic stream processor only. +{{< note >}}The `stream` data type is supported by the classic processor only. The Flink processor currently supports only `hash` and `json` outputs.{{< /note >}} When writing to streams, you can use the optional parameter `mapping` to limit the number of fields sent in a message and to provide aliases for them. If you don't use the `mapping` parameter, all fields captured in the source will be passed as the message payload. diff --git a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-string-example.md b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-string-example.md index 4058187319..13eeab9fca 100644 --- a/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-string-example.md +++ b/content/integrate/redis-data-integration/data-pipelines/transform-examples/redis-string-example.md @@ -19,7 +19,7 @@ weight: 30 The string data type is useful for capturing a string representation of a single column from a source table. -{{< note >}}The `string` data type is supported by the classic stream processor only. +{{< note >}}The `string` data type is supported by the classic processor only. The Flink processor currently supports only `hash` and `json` outputs.{{< /note >}} In the example job below, the `title` column is captured from the `album` table in the source. 
diff --git a/content/integrate/redis-data-integration/installation/migration-classic-to-flink.md b/content/integrate/redis-data-integration/installation/migration-classic-to-flink.md index 0c8a1bc0c8..4cce19e6c3 100644 --- a/content/integrate/redis-data-integration/installation/migration-classic-to-flink.md +++ b/content/integrate/redis-data-integration/installation/migration-classic-to-flink.md @@ -6,7 +6,7 @@ categories: - integrate - rs - rdi -description: Learn how to migrate an existing RDI pipeline from the classic stream processor to the Apache Flink-based processor. +description: Learn how to migrate an existing RDI pipeline from the classic processor to the Apache Flink-based processor. group: di hideListLinks: false linkTitle: Migrate to the Flink processor @@ -19,8 +19,8 @@ weight: 35 RDI ships with two stream processor implementations. The default *classic* processor is implemented in Python and runs on both VMs and Kubernetes. The *Flink* processor is built on top of [Apache Flink](https://flink.apache.org/) -and currently runs on Kubernetes only. It can achieve much higher throughput -during snapshots, scales horizontally by changing the number of TaskManager replicas, +and currently runs on Kubernetes only. It can achieve much higher throughput +during snapshots, scales horizontally by changing the number of TaskManager replicas, and uses Flink checkpointing for fault tolerance. See [Stream processor implementations]({{< relref "/integrate/redis-data-integration/architecture#stream-processor-implementations" >}}) for an overview. @@ -73,7 +73,7 @@ processors: Then redeploy the pipeline. The operator stops the classic processor pods and starts the Flink JobManager and TaskManager workloads for the pipeline. -## Step 3: Adapt deprecated and Classic-only properties +## Step 3: Adapt deprecated and classic-only properties Some `processors` properties are no-ops, classic-only, or have moved to `processors.advanced` for the Flink processor. The following table lists the @@ -90,7 +90,7 @@ properties that need attention when migrating. | `idle_sleep_time_ms` | Classic-only. Remove. | | `use_native_json_merge` | Classic-only. The Flink processor always uses `JSON.MERGE` when the target supports it. | -The classic processor silently ignores `processors.advanced`, +The classic processor silently ignores `processors.advanced`, and the Flink processor silently ignores classic-only top-level properties, so keeping both top-level properties and their `processors.advanced` equivalents lets you switch back without further edits. @@ -118,7 +118,7 @@ processors: taskmanager.memory.process.size: 4096m resources: taskManager: - # Number of TaskManager pods + # Number of TaskManager pods. replicas: 2 ``` @@ -128,7 +128,7 @@ for the full set of available properties. ## Step 5: Update observability -The Flink processor exposes Prometheus metrics directly +The Flink processor exposes Prometheus metrics directly from the Flink JobManager and TaskManager pods. 
See [Flink processor metrics]({{< relref "/integrate/redis-data-integration/observability#flink-processor-metrics" >}}) diff --git a/content/integrate/redis-data-integration/installation/upgrade.md b/content/integrate/redis-data-integration/installation/upgrade.md index 083c599e20..687c1467f4 100644 --- a/content/integrate/redis-data-integration/installation/upgrade.md +++ b/content/integrate/redis-data-integration/installation/upgrade.md @@ -153,20 +153,22 @@ you must adapt your `rdi-values.yaml` file to the following changes: ### Enabling the Flink processor The -[Apache Flink](https://flink.apache.org/)-based stream processor introduced -alongside the classic processor is opt-in. Upgrading the Helm chart does not -change the processor used by existing pipelines, which keep running on the -classic processor until you explicitly switch them by setting +[Apache Flink](https://flink.apache.org/)-based stream processor is +available after upgrading to RDI 1.18.0 or later. Once the upgrade +completes, it is always available — no Helm-level opt-in is required, and +the chart defaults are sized for typical workloads. Upgrading the Helm +chart does not change the processor used by existing pipelines, which keep +running on the classic processor until you explicitly switch them by +setting [`processors.type`]({{< relref "/integrate/redis-data-integration/data-pipelines/pipeline-config#processors" >}}) to `flink` in their `config.yaml`. -To enable the Flink processor workloads on your cluster, add the -`operator.dataPlane.flinkProcessor` block to your `rdi-values.yaml` file -as described in -[Configure the Flink processor]({{< relref "/integrate/redis-data-integration/installation/install-k8s#configure-the-flink-processor" >}}), -and see -[Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}}) -for the per-pipeline migration steps. +To override the Flink processor defaults, add an +`operator.dataPlane.flinkProcessor` block to your `rdi-values.yaml` file as +described in +[Configure the Flink processor]({{< relref "/integrate/redis-data-integration/installation/install-k8s#configure-the-flink-processor" >}}). +For the per-pipeline migration steps, see +[Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}}). ### Verifying the upgrade diff --git a/content/integrate/redis-data-integration/reference/data-transformation/cache.md b/content/integrate/redis-data-integration/reference/data-transformation/cache.md index 3c54e35fac..ee6cb4f4e3 100644 --- a/content/integrate/redis-data-integration/reference/data-transformation/cache.md +++ b/content/integrate/redis-data-integration/reference/data-transformation/cache.md @@ -8,12 +8,15 @@ categories: - rdi description: Cache the result of an expression or lookup group: di +hidden: true linkTitle: cache summary: Redis Data Integration keeps Redis in sync with the primary database in near real time. type: integration weight: 10 +_build: + list: never --- Cache the result of an expression or lookup. 
Caching avoids re-evaluating From d1bcd5cb432d2800494f527245e84323f10ed97f Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Fri, 8 May 2026 10:41:02 +0300 Subject: [PATCH 09/13] Add Flink collector back, update images --- .../architecture/classic-vs-flink.md | 4 +- .../installation/install-k8s.md | 26 ++-- .../reference/config-yaml-reference.md | 129 ++++++++++++++++-- 3 files changed, 140 insertions(+), 19 deletions(-) diff --git a/content/integrate/redis-data-integration/architecture/classic-vs-flink.md b/content/integrate/redis-data-integration/architecture/classic-vs-flink.md index 041eb7b209..4dd4239ff9 100644 --- a/content/integrate/redis-data-integration/architecture/classic-vs-flink.md +++ b/content/integrate/redis-data-integration/architecture/classic-vs-flink.md @@ -150,5 +150,5 @@ raises throughput. The Flink processor has a larger baseline memory footprint (JVM plus Flink runtime overhead per TaskManager) but, for most pipelines, the -performance gains and the additional features (caching, batching, -horizontal scaling, checkpointing) outweigh that cost. +performance gains and the additional features (horizontal scaling, caching) +outweigh that cost. diff --git a/content/integrate/redis-data-integration/installation/install-k8s.md b/content/integrate/redis-data-integration/installation/install-k8s.md index cd4369564c..c619618cf2 100644 --- a/content/integrate/redis-data-integration/installation/install-k8s.md +++ b/content/integrate/redis-data-integration/installation/install-k8s.md @@ -87,19 +87,29 @@ You need the following RDI images with tags matching the RDI version you want to - [redis/rdi-collector-api](https://hub.docker.com/r/redis/rdi-collector-api) - [redis/rdi-collector-initializer](https://hub.docker.com/r/redis/rdi-collector-initializer) -If you plan to use Spanner as a source for your pipeline, you’ll need an additional image.: [redis/rdi-flink-collector](https://hub.docker.com/r/redis/rdi-flink-collector). +If you plan to use the Flink processor for any of your pipelines, you'll also need: + +- [redis/rdi-flink-processor](https://hub.docker.com/r/redis/rdi-flink-processor) +- [redis/rdi-metrics-aggregator](https://hub.docker.com/r/redis/rdi-metrics-aggregator) + +If you plan to use the Flink processor exclusively, the `redis/rdi-processor` +and `redis/rdi-monitor` images are not required. + +If you plan to use Spanner as a source for your pipeline, you'll also need +[redis/rdi-flink-collector](https://hub.docker.com/r/redis/rdi-flink-collector). + +If you plan to use Snowflake as a source for any of your pipelines, you'll also need +[riotx/riotx:v1.8.0](https://hub.docker.com/r/riotx/riotx): +[RIOT-X](https://redis.github.io/riotx/), a data ingestion and replication tool for Redis. In addition, the RDI Helm chart uses the following 3rd party images: -- [redislabs/debezium-server:3.0.8.Final-rdi.1](https://hub.docker.com/r/redislabs/debezium-server), - based on `quay.io/debezium/server/3.0.8.Final` with minor modifications: +- [redislabs/debezium-server:3.5.0.Final-rdi.1](https://hub.docker.com/r/redislabs/debezium-server), + based on `quay.io/debezium/server/3.5.0.Final` with minor modifications: [Debezium](https://debezium.io/), an open source distributed platform for change data capture. 
-- [redis/reloader:v1.1.0](https://hub.docker.com/r/redis/reloader), originally `ghcr.io/stakater/reloader:v1.1.0`: - [Reloader](https://github.com/stakater/Reloader), a K8s controller to watch changes to ConfigMaps +- [redis/reloader:v1.4.13](https://hub.docker.com/r/redis/reloader), originally `ghcr.io/stakater/reloader:v1.4.13`: + [Reloader](https://github.com/stakater/Reloader), a K8s controller to watch changes to ConfigMaps and Secrets and do rolling upgrades. -- [redis/kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6](https://hub.docker.com/r/redis/kube-webhook-certgen), - originally `registry.k8s.io/ingress-nginx/kube-webhook-certgen/v20221220-controller-v1.5.1-58-g787ea74b6`: - [kube-webhook-certgen](https://github.com/jet/kube-webhook-certgen), K8s webhook certificate generator and patcher. The example below shows how to specify the registry and image pull secret in your [`rdi-values.yaml`](#the-valuesyaml-file) file for the Helm chart: diff --git a/content/integrate/redis-data-integration/reference/config-yaml-reference.md b/content/integrate/redis-data-integration/reference/config-yaml-reference.md index 0f30bee448..5795b3aa88 100644 --- a/content/integrate/redis-data-integration/reference/config-yaml-reference.md +++ b/content/integrate/redis-data-integration/reference/config-yaml-reference.md @@ -7,7 +7,6 @@ alwaysopen: false categories: ["redis-di"] aliases: --- -# Redis Data Integration Configuration File Configuration file for Redis Data Integration (RDI) source collectors and target connections. @@ -35,7 +34,7 @@ Source collectors that capture changes from upstream databases. Each key is a un |----|----|-----------|--------| |**connection**|||yes| |**name**
(Source name)|`string`|Human-readable name for the source collector. Maximum 100 characters.
Maximal Length: `100`
|no| -|**type**
(Collector type)|`string`|Type of the source collector. Use `cdc` (default) for change data capture using [Debezium](https://debezium.io/). Use `riotx` for Snowflake CDC using [RIOT-X](https://redis.github.io/riotx/).
Default: `"cdc"`
Enum: `"cdc"`, `"riotx"`
|yes| +|**type**
(Collector type)|`string`|Type of the source collector. Use `cdc` (default) for change data capture using [Debezium](https://debezium.io/). Use `flink` for Spanner change streams using the Apache Flink-based collector. Use `riotx` for Snowflake CDC using [RIOT-X](https://redis.github.io/riotx/).
Default: `"cdc"`
Enum: `"cdc"`, `"flink"`, `"riotx"`
|yes| |**active**
(Collector enabled)|`boolean`|When `true`, the collector runs; when `false`, the collector is disabled and produces no events.
Default: `true`
|no| |[**logging**](#sourceslogging)
(Logging configuration)|`object`|Logging settings for this source collector.
|no| |[**tables**](#sourcestables)
(Tables to capture)|`object`|Tables to capture from the source database, keyed by table name. The value configures column selection and key handling for that table.
|no| @@ -130,9 +129,10 @@ Advanced configuration that overrides the underlying engine's defaults. Only req |Name|Type|Description|Required| |----|----|-----------|--------| -|[**sink**](#sourcesadvancedsink)
(RDI Collector stream writer configuration)|`object`|Advanced configuration properties for the RDI Collector stream writer connection and behaviour. **Only applies to the `cdc` collector type.** See the full list of properties at [Debezium Server — Redis Stream sink](https://debezium.io/documentation/reference/stable/operations/debezium-server.html#_redis_stream). When using a property from that page, omit the `debezium.sink.` prefix.
|| -|[**source**](#sourcesadvancedsource)
(Advanced source settings)|`object`|Advanced configuration properties for the source database connection and CDC behavior. **Only applies to the `cdc` collector type.** Available properties depend on the source database type — refer to the relevant Debezium connector documentation: [MySQL](https://debezium.io/documentation/reference/stable/connectors/mysql.html), [MariaDB](https://debezium.io/documentation/reference/stable/connectors/mariadb.html), [PostgreSQL](https://debezium.io/documentation/reference/stable/connectors/postgresql.html), [Oracle](https://debezium.io/documentation/reference/stable/connectors/oracle.html), [SQL Server](https://debezium.io/documentation/reference/stable/connectors/sqlserver.html), [Db2](https://debezium.io/documentation/reference/stable/connectors/db2.html), [MongoDB](https://debezium.io/documentation/reference/stable/connectors/mongodb.html). When using a property from those pages, omit the `debezium.source.` prefix.
|| +|[**sink**](#sourcesadvancedsink)
(RDI Collector stream writer configuration)|`object`|Advanced configuration properties for the RDI Collector stream writer connection and behavior. **Applies to the `cdc` and `flink` collector types.**
|| +|[**source**](#sourcesadvancedsource)
(Advanced source settings)|`object`|Advanced configuration properties for the source database connection and CDC behavior. **Applies to the `cdc` and `flink` collector types.**
|| |[**quarkus**](#sourcesadvancedquarkus)
(Quarkus runtime settings)|`object`|Advanced configuration properties for the Quarkus runtime that hosts Debezium Server. **Only applies to the `cdc` collector type.** See the [Debezium Server documentation](https://debezium.io/documentation/reference/stable/operations/debezium-server.html) for runtime configuration options. When using a property from that page, omit the `quarkus.` prefix.
|| +|[**flink**](#sourcesadvancedflink)
(Advanced Flink settings)|`object`|Advanced configuration properties forwarded to the Flink runtime that hosts the collector. Any property listed in the [Flink configuration documentation](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/deployment/config/) can be set here and will override the RDI default. **Only applies to the `flink` collector type.**
|| |[**resources**](#sourcesadvancedresources)
(Collector resource settings)|`object`|Compute resources allocated to the collector. **Only applies to the `cdc` collector type.**
|| |[**riotx**](#sourcesadvancedriotx)
(Advanced RIOT\-X settings)|`object`|Advanced configuration properties for the RIOT-X Snowflake collector. **Only applies to the `riotx` collector type.**
|| |**java\_options**
(Advanced Java options)|`string`|These Java options are passed on the command line when launching the source collector. **Only applies to the `cdc` collector type.**
|| @@ -142,9 +142,31 @@ Advanced configuration that overrides the underlying engine's defaults. Only req **Example** ```yaml -sink: {} -source: {} +sink: + redis.batch.size: 1000 + redis.flush.interval.ms: 100 + redis.connection.timeout.ms: 2000 + redis.socket.timeout.ms: 2000 + redis.retry.max.attempts: 5 + redis.retry.initial.delay.ms: 100 + redis.retry.max.delay.ms: 3000 + redis.retry.backoff.multiplier: 2 + redis.oom.retry.initial.delay.ms: 1000 + redis.oom.retry.max.delay.ms: 10000 + redis.oom.retry.backoff.multiplier: 2 + redis.wait.enabled: false + redis.wait.write.timeout.ms: 1000 + redis.wait.retry.enabled: false + redis.wait.retry.delay.ms: 1000 +source: + spanner.version.retention.period.hours: 1 + spanner.fetch.timeout.ms: 500 + spanner.fetch.heartbeat.ms: 100 + spanner.max.rows.per.partition: 10000 + spanner.dialect: GOOGLESQL quarkus: {} +flink: + taskmanager.numberOfTaskSlots: 1 resources: {} riotx: poll: 30s @@ -158,9 +180,29 @@ riotx: #### sources\.advanced\.sink: RDI Collector stream writer configuration -Advanced configuration properties for the RDI Collector stream writer connection and behaviour. **Only applies to the `cdc` collector type.** See the full list of properties at [Debezium Server — Redis Stream sink](https://debezium.io/documentation/reference/stable/operations/debezium-server.html#_redis_stream). When using a property from that page, omit the `debezium.sink.` prefix. +Advanced configuration properties for the RDI Collector stream writer connection and behaviour. **Applies to the `cdc` and `flink` collector types.**

For the `cdc` collector type, see the full list of properties at [Debezium Server — Redis Stream sink](https://debezium.io/documentation/reference/stable/operations/debezium-server.html#_redis_stream). When using a property from that page, omit the `debezium.sink.` prefix.

**The properties listed below only apply to the `flink` collector type.** +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**redis\.batch\.size**
(Sink batch size)|`integer`|Maximum number of records the collector sink writes to Redis in a single batch.
Default: `1000`
Minimum: `1`
|| +|**redis\.flush\.interval\.ms**
(Sink flush interval)|`integer`|Maximum time in milliseconds the collector sink waits to fill a batch before flushing it to Redis.
Default: `100`
Minimum: `1`
|| +|**redis\.connection\.timeout\.ms**
(Sink connection timeout)|`integer`|Connection timeout in milliseconds for the target Redis client used by the collector sink.
Default: `2000`
Minimum: `1`
|| +|**redis\.socket\.timeout\.ms**
(Sink socket timeout)|`integer`|Socket read/write timeout in milliseconds for the target Redis client used by the collector sink.
Default: `2000`
Minimum: `1`
|| +|**redis\.retry\.max\.attempts**
(Sink retry max attempts)|`integer`|Maximum number of retry attempts for failed Redis operations.
Default: `5`
Minimum: `1`
|| +|**redis\.retry\.initial\.delay\.ms**
(Sink retry initial delay)|`integer`|Initial delay in milliseconds before the first retry of a failed Redis operation.
Default: `100`
Minimum: `1`
|| +|**redis\.retry\.max\.delay\.ms**
(Sink retry max delay)|`integer`|Maximum delay in milliseconds between retry attempts for failed Redis operations.
Default: `3000`
Minimum: `1`
|| +|**redis\.retry\.backoff\.multiplier**
(Sink retry backoff multiplier)|`number`|Exponential backoff multiplier between retry attempts for failed Redis operations.
Default: `2`
Minimum: `1`
|| +|**redis\.oom\.retry\.initial\.delay\.ms**
(Sink OOM retry initial delay)|`integer`|Initial delay in milliseconds before the first retry after a Redis out-of-memory error.
Default: `1000`
Minimum: `1`
|| +|**redis\.oom\.retry\.max\.delay\.ms**
(Sink OOM retry max delay)|`integer`|Maximum delay in milliseconds between retry attempts after a Redis out-of-memory error.
Default: `10000`
Minimum: `1`
|| +|**redis\.oom\.retry\.backoff\.multiplier**
(Sink OOM retry backoff multiplier)|`number`|Exponential backoff multiplier between retry attempts after a Redis out-of-memory error.
Default: `2`
Minimum: `1`
|| +|**redis\.wait\.enabled**
(Sink replica wait enabled)|`boolean`|When `true`, the collector verifies that each write has been replicated to the configured number of Redis replica shards before acknowledging it.
Default: `false`
|| +|**redis\.wait\.write\.timeout\.ms**
(Sink replica wait timeout)|`integer`|Maximum time in milliseconds to wait for replica write acknowledgements.
Default: `1000`
Minimum: `1`
|| +|**redis\.wait\.retry\.enabled**
(Sink replica wait retry enabled)|`boolean`|When `true`, the collector keeps retrying a write until replica acknowledgement succeeds; when `false`, it gives up after the first failure.
Default: `false`
|| +|**redis\.wait\.retry\.delay\.ms**
(Sink replica wait retry delay)|`integer`|Delay in milliseconds between replica wait retry attempts.
Default: `1000`
Minimum: `1`
|| + **Additional Properties** |Name|Type|Description|Required| @@ -168,11 +210,42 @@ Advanced configuration properties for the RDI Collector stream writer connection |**Additional Properties**|`string`, `number`, `boolean`||| **Minimal Properties:** 1 +**Example** + +```yaml +redis.batch.size: 1000 +redis.flush.interval.ms: 100 +redis.connection.timeout.ms: 2000 +redis.socket.timeout.ms: 2000 +redis.retry.max.attempts: 5 +redis.retry.initial.delay.ms: 100 +redis.retry.max.delay.ms: 3000 +redis.retry.backoff.multiplier: 2 +redis.oom.retry.initial.delay.ms: 1000 +redis.oom.retry.max.delay.ms: 10000 +redis.oom.retry.backoff.multiplier: 2 +redis.wait.enabled: false +redis.wait.write.timeout.ms: 1000 +redis.wait.retry.enabled: false +redis.wait.retry.delay.ms: 1000 + +``` + #### sources\.advanced\.source: Advanced source settings -Advanced configuration properties for the source database connection and CDC behavior. **Only applies to the `cdc` collector type.** Available properties depend on the source database type — refer to the relevant Debezium connector documentation: [MySQL](https://debezium.io/documentation/reference/stable/connectors/mysql.html), [MariaDB](https://debezium.io/documentation/reference/stable/connectors/mariadb.html), [PostgreSQL](https://debezium.io/documentation/reference/stable/connectors/postgresql.html), [Oracle](https://debezium.io/documentation/reference/stable/connectors/oracle.html), [SQL Server](https://debezium.io/documentation/reference/stable/connectors/sqlserver.html), [Db2](https://debezium.io/documentation/reference/stable/connectors/db2.html), [MongoDB](https://debezium.io/documentation/reference/stable/connectors/mongodb.html). When using a property from those pages, omit the `debezium.source.` prefix. +Advanced configuration properties for the source database connection and CDC behavior. **Applies to the `cdc` and `flink` collector types.**

For the `cdc` collector type, available properties depend on the source database — refer to the relevant Debezium connector documentation: [MySQL](https://debezium.io/documentation/reference/stable/connectors/mysql.html), [MariaDB](https://debezium.io/documentation/reference/stable/connectors/mariadb.html), [PostgreSQL](https://debezium.io/documentation/reference/stable/connectors/postgresql.html), [Oracle](https://debezium.io/documentation/reference/stable/connectors/oracle.html), [SQL Server](https://debezium.io/documentation/reference/stable/connectors/sqlserver.html), [Db2](https://debezium.io/documentation/reference/stable/connectors/db2.html), [MongoDB](https://debezium.io/documentation/reference/stable/connectors/mongodb.html). When using a property from those pages, omit the `debezium.source.` prefix.

**The properties listed below only apply to the `flink` collector type (Spanner).** + + +**Properties** +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**spanner\.version\.retention\.period\.hours**
(Spanner version retention period)|`integer`|Retention period in hours for Spanner change stream versions. Determines how far back the collector can resume after an outage.
Default: `1`
Minimum: `1`
|| +|**spanner\.fetch\.timeout\.ms**
(Spanner fetch timeout)|`integer`|Timeout in milliseconds for a single change stream fetch request to Spanner.
Default: `500`
Minimum: `1`
|| +|**spanner\.fetch\.heartbeat\.ms**
(Spanner fetch heartbeat interval)|`integer`|Interval in milliseconds at which Spanner sends heartbeat records when no data changes are available.
Default: `100`
Minimum: `1`
|| +|**spanner\.max\.rows\.per\.partition**
(Spanner max rows per partition)|`integer`|Maximum number of rows the collector reads from a single Spanner change stream partition before yielding.
Default: `10000`
Minimum: `1`
|| +|**spanner\.dialect**
(Spanner SQL dialect)|`string`|SQL dialect of the Spanner database. Use `GOOGLESQL` for Google Standard SQL or `POSTGRESQL` for the PostgreSQL interface.
Default: `"GOOGLESQL"`
Enum: `"GOOGLESQL"`, `"POSTGRESQL"`
|| **Additional Properties** @@ -181,6 +254,17 @@ Advanced configuration properties for the source database connection and CDC beh |**Additional Properties**|`string`, `number`, `boolean`||| **Minimal Properties:** 1 +**Example** + +```yaml +spanner.version.retention.period.hours: 1 +spanner.fetch.timeout.ms: 500 +spanner.fetch.heartbeat.ms: 100 +spanner.max.rows.per.partition: 10000 +spanner.dialect: GOOGLESQL + +``` + #### sources\.advanced\.quarkus: Quarkus runtime settings @@ -194,6 +278,34 @@ Advanced configuration properties for the Quarkus runtime that hosts Debezium Se |**Additional Properties**|`string`, `number`, `boolean`||| **Minimal Properties:** 1 + +#### sources\.advanced\.flink: Advanced Flink settings + +Advanced configuration properties forwarded to the Flink runtime that hosts the collector. Any property listed in the [Flink configuration documentation](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/deployment/config/) can be set here and will override the RDI default. **Only applies to the `flink` collector type.**

The properties listed below are the ones most likely to require adjustment. **Changing any other Flink property is not recommended unless instructed by Redis support.** + + +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**parallelism\.default**
(Default parallelism)|`integer`|Default parallelism for Flink jobs and operators. When unset, Flink uses the number of available task slots across all task managers (`taskManager.replicas × taskmanager.numberOfTaskSlots`). See [parallel execution](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/dev/datastream/execution/parallel/).
Minimum: `1`
|| +|**taskmanager\.numberOfTaskSlots**
(Task slots per task manager)|`integer`|Number of parallel task slots per task manager pod. Each slot can run one parallel pipeline instance, so this caps the parallelism a single task manager can absorb. See [task slots and resources](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/concepts/flink-architecture/#task-slots-and-resources).
Default: `1`
Minimum: `1`
|| +|**taskmanager\.memory\.process\.size**
(Task manager process memory)|`string`|Total memory budget for each task manager JVM process, expressed with a unit suffix such as `2048m` or `4g`. See [task manager memory configuration](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/deployment/memory/mem_setup_tm/).
|| + +**Additional Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**Additional Properties**|`string`, `number`, `boolean`||| + +**Minimal Properties:** 1 +**Example** + +```yaml +taskmanager.numberOfTaskSlots: 1 + +``` + #### sources\.advanced\.resources: Collector resource settings @@ -719,4 +831,3 @@ Optional metadata describing this pipeline, such as a display name and descripti |**description**
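+
+**Example**
+
+As an illustration of horizontal scaling, assume 2 task manager replicas with 2 slots each: 4 task slots are available in total, so a default parallelism of 4 saturates them. The values below are a sketch, not tuning recommendations:
+
+```yaml
+parallelism.default: 4
+taskmanager.numberOfTaskSlots: 2
+taskmanager.memory.process.size: 2048m
+```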
(Pipeline description)|`string`|Free-form description of what the pipeline does. Maximum 500 characters.
Maximal Length: `500`
|| **Additional Properties:** not allowed - From a2c3f046a6b93afd32325e7128ecb8e9d7b23f52 Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Fri, 8 May 2026 11:32:02 +0300 Subject: [PATCH 10/13] Update 1.18 release notes --- .../redis-data-integration/release-notes/rdi-1-18-0.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/content/integrate/redis-data-integration/release-notes/rdi-1-18-0.md b/content/integrate/redis-data-integration/release-notes/rdi-1-18-0.md index af66292943..ab88a07965 100644 --- a/content/integrate/redis-data-integration/release-notes/rdi-1-18-0.md +++ b/content/integrate/redis-data-integration/release-notes/rdi-1-18-0.md @@ -6,6 +6,7 @@ categories: - operate - rs description: | + New Flink-based stream processor for Kubernetes deployments, with horizontal scaling, Flink checkpointing, and significantly higher throughput. Preview for Snowflake source support for Helm installations, including multi-schema capture and system truststore support. New API v2 endpoints for DLQ inspection, flushing the target database, and CDC-readiness validation. Better deployment reliability, new validation and resource controls, and security refreshes across core images. @@ -18,7 +19,11 @@ weight: 971 ### Compatibility Notes -- **`rdi-metrics-exporter` moved to the data plane**: The `rdi-metrics-exporter` is now deployed by the pipeline Helm chart (managed by the operator) instead of the main RDI Helm chart. Helm values previously under the top-level `rdiMetricsExporter:` block must be moved under `operator.dataPlane.metricsExporter:` in your custom values file. During the upgrade, there will be a brief (seconds) gap in Prometheus scraping that does not affect the data path. The exporter is not deployed for Flink-based pipelines. +- **`rdi-metrics-exporter` moved to the data plane**: The `rdi-metrics-exporter` is now deployed by the pipeline Helm chart (managed by the operator) instead of the main RDI Helm chart. Helm values previously under the top-level `rdiMetricsExporter:` block must be moved under `operator.dataPlane.metricsExporter:` in your custom values file. During the upgrade, there will be a brief (seconds) gap in Prometheus scraping that does not affect the data path. The exporter is not deployed for pipelines using the new Flink processor. + +### Flink Processor + +- **New Flink-based stream processor for Kubernetes**: RDI 1.18.0 introduces a new stream processor implementation built on [Apache Flink](https://flink.apache.org/) alongside the existing classic processor. The Flink processor delivers significantly higher snapshot throughput, lower end-to-end latency, horizontal scaling across TaskManager replicas and task slots, and Flink checkpointing on top of the same at-least-once delivery guarantees. We recommend using the Flink processor for new pipelines and migrating existing pipelines to it where possible. To enable it, set `processors.type: flink` in your pipeline configuration. The Flink processor is available for Kubernetes deployments only and currently supports the `hash` and `json` target data types. See [Classic vs. Flink processor]({{< relref "/integrate/redis-data-integration/architecture/classic-vs-flink" >}}) for a full comparison and [Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}}) for the step-by-step migration guide. 
### Snowflake and Source Integration From e26eb819b19884da61cc8ab671ee6f676d28bd63 Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Fri, 8 May 2026 14:23:46 +0300 Subject: [PATCH 11/13] Fix source connection not being shown in the docs --- .../reference/config-yaml-reference.md | 219 +++++++++++++++++- 1 file changed, 217 insertions(+), 2 deletions(-) diff --git a/content/integrate/redis-data-integration/reference/config-yaml-reference.md b/content/integrate/redis-data-integration/reference/config-yaml-reference.md index 5795b3aa88..fa30fe2110 100644 --- a/content/integrate/redis-data-integration/reference/config-yaml-reference.md +++ b/content/integrate/redis-data-integration/reference/config-yaml-reference.md @@ -32,7 +32,7 @@ Source collectors that capture changes from upstream databases. Each key is a un |Name|Type|Description|Required| |----|----|-----------|--------| -|**connection**|||yes| +|[**connection**](#sourcesconnection)
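+  For example, enabling the Flink processor is a one-line change in the pipeline configuration:
+
+  ```yaml
+  processors:
+    type: flink
+  ```
+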
(Source database connection)|`object`|Connection configuration for a non-Redis source database. Exactly one of the connection types described below applies; the set of properties depends on the database type.
|yes| |**name**
(Source name)|`string`|Human-readable name for the source collector. Maximum 100 characters.
Maximal Length: `100`
|no| |**type**
(Collector type)|`string`|Type of the source collector. Use `cdc` (default) for change data capture using [Debezium](https://debezium.io/). Use `flink` for Spanner change streams using the Apache Flink-based collector. Use `riotx` for Snowflake CDC using [RIOT-X](https://redis.github.io/riotx/).
Default: `"cdc"`
Enum: `"cdc"`, `"flink"`, `"riotx"`
|yes| |**active**
(Collector enabled)|`boolean`|When `true`, the collector runs; when `false`, the collector is disabled and produces no events.
Default: `true`
|no| @@ -43,6 +43,221 @@ Source collectors that capture changes from upstream databases. Each key is a un |[**advanced**](#sourcesadvanced)
(Advanced configuration)|`object`|Advanced configuration that overrides the underlying engine's defaults. Only required for non-standard tuning.
|no| + +### sources\.connection: Source database connection + +Connection configuration for a non-Redis source database. The exact set of properties depends on the database type. + + +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|[**SQL database**](#sourcesconnectionsqldatabase)
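+
+**Example**
+
+A minimal sketch of a single CDC source capturing a local PostgreSQL database (the source key `hr` and the credentials are illustrative):
+
+```yaml
+hr:
+  type: cdc
+  connection:
+    type: postgresql
+    host: localhost
+    port: 5432
+    database: postgres
+    user: postgres
+    password: postgres
+```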
(SQL database)|`object`|Connection configuration for a supported SQL database.
|| +|[**MongoDB**](#sourcesconnectionmongodb)|`object`|Connection configuration for a MongoDB database.
||
||
||
(Database type)|`string`|SQL database engine.
Enum: `"mariadb"`, `"mysql"`, `"oracle"`, `"postgresql"`, `"sqlserver"`
|| +|**host**
(Database host)|`string`|Hostname or IP address of the SQL database server.
|| +|**port**
(Database port)|`integer`|Network port on which the SQL database server is listening.
Minimum: `1`
Maximum: `65535`
|| +|**database**
(Database name)|`string`|Name of the database to connect to.
|| +|**user**
(Database user)|`string`|Username for authentication to the SQL database.
|| +|**password**
(Database password)|`string`|Password for authentication to the SQL database.
|| + +**Additional Properties:** not allowed +**Example** + +```yaml +hr: + type: postgresql + host: localhost + port: 5432 + database: postgres + user: postgres + password: postgres + +``` + +**Example** + +```yaml +my-oracle: + type: oracle + host: 172.17.0.4 + port: 1521 + user: c##dbzuser + password: dbz + +``` + + +#### sources\.connection\.MongoDB: MongoDB + +Connection configuration for a MongoDB database. + + +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**type**
(Database type)|`string`|Database type identifier. Always `mongodb` for this connection.
Constant Value: `"mongodb"`
|yes| +|**connection\_string**|`string`|MongoDB connection URI including host, port, and any connection options.
|yes| +|**user**
(MongoDB user)|`string`|Username for authentication to MongoDB.
|no| +|**password**
(MongoDB password)|`string`|Password for authentication to MongoDB.
|no| +|**database**
(MongoDB databases)|`string`|Comma-separated list of MongoDB databases to monitor.
|no| + +**Additional Properties:** not allowed +**Example** + +```yaml +mongodb-source: + type: mongodb + connection_string: mongodb://localhost:27017/?replicaSet=rs0 + user: debezium + password: dbz + database: db1,db2 + +``` + + +#### sources\.connection\.Spanner: Spanner + +Connection configuration for a Google Cloud Spanner database. + + +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**type**
(Database type)|`string`|Database type identifier. Always `spanner` for this connection.
Constant Value: `"spanner"`
|yes| +|**project\_id**
(Spanner project ID)|`string`|Google Cloud project ID that hosts the Spanner instance.
|yes| +|**instance\_id**
(Spanner instance ID)|`string`|Spanner instance identifier within the project.
|yes| +|**database\_id**
(Spanner database ID)|`string`|Spanner database identifier within the instance.
|yes| +|**emulator\_host**
(Spanner emulator host)|`string`|Host and port of the Spanner emulator. Used for local development; leave unset when connecting to a real Spanner instance.
|no| +|**use\_credentials\_file**|`boolean`|When `true`, RDI authenticates using a service account credentials file; when `false`, it uses application default credentials.
Default: `false`
|no| +|[**change\_streams**](#sourcesconnectionspannerchange_streams)
(Change streams configuration)|`object`|Spanner change streams to capture, keyed by change stream name.
|yes| + +**Additional Properties:** not allowed +**Example** + +```yaml +spanner-source: + type: spanner + project_id: example-12345 + instance_id: example + database_id: example + change_streams: + change_stream_all: + retention_period_hours: 24 + +``` + + +##### sources\.connection\.Spanner\.change\_streams: Change streams configuration + +Spanner change streams to capture, keyed by change stream name. + + +**Additional Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|[**Additional Properties**](#sourcesconnectionspannerchange_streamsadditionalproperties)|`object`, `null`||| + +**Minimal Properties:** 1 + +###### sources\.connection\.Spanner\.change\_streams\.additionalProperties: object,null + +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**retention\_period\_hours**
(Change stream retention period hours)|`integer`, `string`|Retention period for the change stream, in hours. A `${...}` reference may be used instead of a literal value.
Pattern: `^\${.*}$`
Minimum: `1`
|| + +**Additional Properties:** not allowed + +#### sources\.connection\.Snowflake: Snowflake + +Connection configuration for a Snowflake database. + + +**Properties** + +|Name|Type|Description|Required| +|----|----|-----------|--------| +|**type**
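+
+**Example**
+
+A sketch showing both accepted forms, a literal hour count and a `${...}` reference (the stream names are illustrative):
+
+```yaml
+change_streams:
+  change_stream_all:
+    retention_period_hours: 24
+  change_stream_orders:
+    retention_period_hours: ${SPANNER_RETENTION_HOURS}
+```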
(Database type)|`string`|Database type identifier. Always `snowflake` for this connection.
Constant Value: `"snowflake"`
|yes| +|**url**
(JDBC URL)|`string`|Snowflake JDBC connection URL, for example `jdbc:snowflake://account.snowflakecomputing.com/`.
|yes| +|**user**
(Snowflake user)|`string`|Username for authentication to Snowflake.
|yes| +|**password**
(Snowflake password)|`string`|Password for authentication to Snowflake. For key-pair authentication, omit this field and provide the private key via the `source-db-ssl` secret (`client.key` field).
|no| +|**database**
(Snowflake database)|`string`|Name of the Snowflake database to connect to.
|yes| +|**warehouse**
(Snowflake warehouse)|`string`|Name of the Snowflake warehouse used for compute.
|yes| +|**role**
(Snowflake role)|`string`|Snowflake role used for the connection.
|no| +|**cdcDatabase**
(CDC database)|`string`|Database hosting the CDC streams. Defaults to the main `database` if not set.
|no| +|**cdcSchema**
(CDC schema)|`string`|Schema hosting the CDC streams. Defaults to the main schema if not set.
|no| + +**Additional Properties:** not allowed +**Example** + +```yaml +snowflake: + type: snowflake + url: jdbc:snowflake://myaccount.snowflakecomputing.com/ + user: myuser + password: mypassword + database: MYDB + warehouse: COMPUTE_WH + +``` + ### sources\.logging: Logging configuration @@ -384,7 +599,7 @@ Connection configuration for a Redis database. |----|----|-----------|--------| |**type**
(Database type)||Database type identifier. Always `redis` for this connection.
Constant Value: `"redis"`
|yes| |**host**
(Database host)|`string`|Hostname or IP address of the Redis server.
|yes| -|**port**
(Database port)||Network port on which the Redis server is listening.
|yes| +|**port**
(Database port)|`integer`|Network port on which the Redis server is listening.
Minimum: `1`
Maximum: `65535`
|yes| |**user**
(Database user)|`string`|Username for authentication to the Redis database.
|no| |**password**
(Database password)|`string`|Password for authentication to the Redis database.
|no| |**key**
(Private key file)|`string`|Path to the private key file used for SSL/TLS client authentication.
|no| From c2974af33ab01a3df983520facaf8a1189eef777 Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Mon, 11 May 2026 15:01:50 +0300 Subject: [PATCH 12/13] Add Preview, update faq.md --- content/embeds/rdi-when-to-use.md | 14 +++++------ .../integrate/redis-data-integration/faq.md | 25 +++++++++++-------- .../release-notes/rdi-1-18-0.md | 4 +-- 3 files changed, 24 insertions(+), 19 deletions(-) diff --git a/content/embeds/rdi-when-to-use.md b/content/embeds/rdi-when-to-use.md index f5421d4d5c..8f2b6911be 100644 --- a/content/embeds/rdi-when-to-use.md +++ b/content/embeds/rdi-when-to-use.md @@ -9,13 +9,13 @@ RDI is a good fit when: - Your app can tolerate *eventual* consistency of data in the Redis cache. - You want a self-managed solution or AWS based solution. - The source data changes frequently in small increments. -- There are no more than 10K changes per second in the source database. -- RDI throughput during - [full sync]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) would not exceed 30K records per second (for an average 1KB record size) and during - [CDC]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) - would not exceed 10K records per second (for an average 1KB record size). -- The total data size is not larger than 100GB (since this would typically exceed the throughput - limits just mentioned for full sync). +- The source database has no more than 20K changes per second. +- RDI throughput during [full sync]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) + stays below 60K records per second (assuming an average record size of 1KB and a pipeline without transformations). +- RDI throughput during [CDC]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) + stays below 20K records per second (assuming an average record size of 1KB and a pipeline without transformations). +- The total data size is no larger than 200GB, so a full sync completes in under an hour without exceeding the throughput + limits above. - You don’t need to perform join operations on the data from several tables into a [nested Redis JSON object]({{< relref "/integrate/redis-data-integration/data-pipelines/data-denormalization#joining-one-to-many-relationships" >}}). - RDI supports the [data transformations]({{< relref "/integrate/redis-data-integration/data-pipelines/transform-examples" >}}) you need for your app. diff --git a/content/integrate/redis-data-integration/faq.md b/content/integrate/redis-data-integration/faq.md index 987c1e310b..ef34ece0db 100644 --- a/content/integrate/redis-data-integration/faq.md +++ b/content/integrate/redis-data-integration/faq.md @@ -117,16 +117,21 @@ This option is available in RDI 1.16.2 and later. ## Which processor should I use? {#which-processor-should-i-use} RDI ships with two stream processor implementations: the *classic* -processor and the *Flink* processor. The Flink processor is available -in RDI 1.18.0 and later. We strongly recommend using the Flink -processor for new pipelines and migrating existing pipelines to it -where possible. The Flink processor delivers significantly higher -snapshot throughput, lower end-to-end latency, horizontal scaling, -and Flink checkpointing on top of the same at-least-once delivery -guarantees as the classic processor. - -Use the classic processor only if your pipeline depends on a feature -that the Flink processor does not yet support: +processor and the *Flink* processor. 
The classic processor is the +production-supported default. The Flink processor was introduced in +RDI 1.18.0 as a **Preview** and is not yet supported for production +use; we encourage you to try it on new, non-production pipelines and +share feedback so we can prioritize improvements before general +availability. Regular preview terms apply. + +The Flink processor delivers significantly higher snapshot throughput, +lower end-to-end latency, horizontal scaling, and Flink checkpointing +on top of the same at-least-once delivery guarantees as the classic +processor. + +Continue to use the classic processor for production pipelines, and +in any of the following cases where the Flink processor does not yet +apply: - **Output `data_type` other than `hash` or `json`** (for example, `set`, `sorted_set`, `stream`, or `string`). diff --git a/content/integrate/redis-data-integration/release-notes/rdi-1-18-0.md b/content/integrate/redis-data-integration/release-notes/rdi-1-18-0.md index ab88a07965..391c0d220f 100644 --- a/content/integrate/redis-data-integration/release-notes/rdi-1-18-0.md +++ b/content/integrate/redis-data-integration/release-notes/rdi-1-18-0.md @@ -6,7 +6,7 @@ categories: - operate - rs description: | - New Flink-based stream processor for Kubernetes deployments, with horizontal scaling, Flink checkpointing, and significantly higher throughput. + New Flink-based stream processor for Kubernetes deployments (Preview), with horizontal scaling, Flink checkpointing, and significantly higher throughput. Preview for Snowflake source support for Helm installations, including multi-schema capture and system truststore support. New API v2 endpoints for DLQ inspection, flushing the target database, and CDC-readiness validation. Better deployment reliability, new validation and resource controls, and security refreshes across core images. @@ -23,7 +23,7 @@ weight: 971 ### Flink Processor -- **New Flink-based stream processor for Kubernetes**: RDI 1.18.0 introduces a new stream processor implementation built on [Apache Flink](https://flink.apache.org/) alongside the existing classic processor. The Flink processor delivers significantly higher snapshot throughput, lower end-to-end latency, horizontal scaling across TaskManager replicas and task slots, and Flink checkpointing on top of the same at-least-once delivery guarantees. We recommend using the Flink processor for new pipelines and migrating existing pipelines to it where possible. To enable it, set `processors.type: flink` in your pipeline configuration. The Flink processor is available for Kubernetes deployments only and currently supports the `hash` and `json` target data types. See [Classic vs. Flink processor]({{< relref "/integrate/redis-data-integration/architecture/classic-vs-flink" >}}) for a full comparison and [Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}}) for the step-by-step migration guide. +- **New Flink-based stream processor for Kubernetes (Preview)**: RDI 1.18.0 introduces a new stream processor implementation built on [Apache Flink](https://flink.apache.org/) alongside the existing classic processor. The Flink processor delivers significantly higher snapshot throughput, lower end-to-end latency, horizontal scaling across TaskManager replicas and task slots, and Flink checkpointing on top of the same at-least-once delivery guarantees. 
The Flink processor is available as a Preview and is not yet supported for production use; we encourage you to try it on new, non-production pipelines and share feedback so we can prioritize improvements before general availability. Regular preview terms apply. To enable it, set `processors.type: flink` in your pipeline configuration. The Flink processor is available for Kubernetes deployments only and currently supports the `hash` and `json` target data types. See [Classic vs. Flink processor]({{< relref "/integrate/redis-data-integration/architecture/classic-vs-flink" >}}) for a full comparison and [Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}}) for the step-by-step migration guide. ### Snowflake and Source Integration From b69be3e0a7550c8f208b25ddd1eb5d414c705982 Mon Sep 17 00:00:00 2001 From: Stoyan Rachev Date: Mon, 11 May 2026 15:14:45 +0300 Subject: [PATCH 13/13] Update rdi-when-to-use.md --- content/embeds/rdi-when-to-use.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/content/embeds/rdi-when-to-use.md b/content/embeds/rdi-when-to-use.md index 8f2b6911be..cec8ffa371 100644 --- a/content/embeds/rdi-when-to-use.md +++ b/content/embeds/rdi-when-to-use.md @@ -9,12 +9,12 @@ RDI is a good fit when: - Your app can tolerate *eventual* consistency of data in the Redis cache. - You want a self-managed solution or AWS based solution. - The source data changes frequently in small increments. -- The source database has no more than 20K changes per second. +- The source database has no more than 10K changes per second. - RDI throughput during [full sync]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) - stays below 60K records per second (assuming an average record size of 1KB and a pipeline without transformations). + stays below 30K records per second, assuming an average record size of 1KB and a pipeline without transformations. - RDI throughput during [CDC]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) - stays below 20K records per second (assuming an average record size of 1KB and a pipeline without transformations). -- The total data size is no larger than 200GB, so a full sync completes in under an hour without exceeding the throughput + stays below 10K records per second, assuming an average record size of 1KB and a pipeline without transformations. +- The total data size is no larger than 100GB, so a full sync completes in under an hour without exceeding the throughput limits above. - You don’t need to perform join operations on the data from several tables into a [nested Redis JSON object]({{< relref "/integrate/redis-data-integration/data-pipelines/data-denormalization#joining-one-to-many-relationships" >}}). @@ -23,6 +23,10 @@ RDI is a good fit when: - Your database administrator has reviewed RDI's requirements for the source database and confirmed that they are acceptable. +{{< note >}}The throughput and data-size limits above assume the +[classic processor]({{< relref "/integrate/redis-data-integration/architecture/classic-vs-flink" >}}). +The Flink processor (currently in Preview) roughly doubles each limit.{{< /note >}} + ### When not to use RDI RDI is not a good fit when: