18 changes: 11 additions & 7 deletions content/embeds/rdi-when-to-use.md
RDI is a good fit when:
- Your app can tolerate *eventual* consistency of data in the Redis cache.
- You want a self-managed solution or AWS based solution.
- The source data changes frequently in small increments.
- The source database has no more than 10K changes per second.
- RDI throughput during [full sync]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}})
stays below 30K records per second, assuming an average record size of 1KB and a pipeline without transformations.
- RDI throughput during [CDC]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}})
stays below 10K records per second, assuming an average record size of 1KB and a pipeline without transformations.
- The total data size is no larger than 100GB, so a full sync completes in under an hour without exceeding the throughput
limits above.
- You don’t need to join data from several tables into a
  [nested Redis JSON object]({{< relref "/integrate/redis-data-integration/data-pipelines/data-denormalization#joining-one-to-many-relationships" >}}).
- RDI supports the [data transformations]({{< relref "/integrate/redis-data-integration/data-pipelines/transform-examples" >}}) you need for your app.
- Your data caching needs are too complex or demanding to implement and maintain yourself.
- Your database administrator has reviewed RDI's requirements for the source database and
confirmed that they are acceptable.

{{< note >}}The throughput and data-size limits above assume the
[classic processor]({{< relref "/integrate/redis-data-integration/architecture/classic-vs-flink" >}}).
The Flink processor (currently in Preview) roughly doubles each limit.{{< /note >}}
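As a quick sanity check, the 100GB guidance follows from the full-sync throughput limit. A rough model with the numbers above (classic processor, average 1KB records, no transformations):

```python
# Rough full-sync duration model for the limits above (illustrative only):
# 100 GB of data as 1 KB records, streamed at the 30K records/s full-sync limit.
total_bytes = 100 * 1024**3            # 100 GB
record_bytes = 1024                    # average 1 KB record
records = total_bytes // record_bytes  # 104,857,600 records
minutes = records / 30_000 / 60        # ≈ 58 minutes

assert records == 104_857_600
assert minutes < 60                    # full sync completes in under an hour
```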

### When not to use RDI

RDI is not a good fit when:
categories:
description: Discover the main components of RDI
group: di
headerRange: '[2]'
hideListLinks: false
linkTitle: Architecture
summary: Redis Data Integration keeps Redis in sync with the primary database in near
real time.
It includes:
and exports them as [Prometheus](https://prometheus.io/) metrics.

The *data plane* contains the processes that actually move the data.
It includes the *CDC collector* and the *stream processor* that implement
the two phases of the pipeline lifecycle (initial cache loading and change streaming).

The *management plane* provides tools that let you interact
with the control plane.

- Use the CLI tool to install and administer RDI and to deploy
and manage a pipeline.
- Use the pipeline editor included in Redis Insight to design
or edit a pipeline.

The diagram below shows all RDI components and the interactions between them:

{{< image filename="images/rdi/ingest/ingest-control-plane.webp" >}}

## Stream processor implementations

RDI provides two implementations of the stream processor, *classic* and
*Flink*. You select the implementation per pipeline through the
[`processors.type`]({{< relref "/integrate/redis-data-integration/data-pipelines/pipeline-config#processors" >}})
property in `config.yaml`. The default is `classic`, so existing pipelines
keep their behavior unchanged.
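In `config.yaml`, the selection is a single property (a minimal sketch; all other pipeline sections are omitted here):

```yaml
processors:
  type: flink   # or "classic" (the default)
```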

See
[Differences between the classic and Flink processors]({{< relref "/integrate/redis-data-integration/architecture/classic-vs-flink" >}})
for a side-by-side comparison and
[Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}})
for guidance on migrating an existing pipeline to the Flink processor.

## VM and Kubernetes deployments

The following sections describe the VM and Kubernetes configurations you can use to
deploy RDI.

### RDI on your own VMs

For this deployment, you must provide two VMs. The collector and stream processor
are active on one VM, while on the other they are in standby to provide high availability.
The two operators running on both VMs use a leader election algorithm to decide which
VM is the active one (the "leader").
The diagram below shows this configuration:

on [Kubernetes (K8s)](https://kubernetes.io/), including Red Hat

- A K8s [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) named `rdi`.
You can also use a different namespace name if you prefer.
- [Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) and
[services](https://kubernetes.io/docs/concepts/services-networking/service/) for the
[RDI operator]({{< relref "/integrate/redis-data-integration/architecture#how-rdi-is-deployed" >}}),
[metrics exporter]({{< relref "/integrate/redis-data-integration/observability" >}}), and API server.
- A [service account](https://kubernetes.io/docs/concepts/security/service-accounts/)
and [RBAC resources](https://kubernetes.io/docs/reference/access-authn-authz/rbac) for the RDI operator.
- A [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) with RDI database details.
- [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/)
with the RDI database credentials and TLS certificates.
- Other optional K8s resources such as [ingresses](https://kubernetes.io/docs/concepts/services-networking/ingress/)
that can be enabled depending on your K8s environment and needs.

See [Install on Kubernetes]({{< relref "/integrate/redis-data-integration/installation/install-k8s" >}})
for more information.

### Secrets and security considerations

The credentials for the database connections, as well as the certificates
for [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) and
[mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) are saved in K8s secrets.
RDI stores all state and configuration data inside the Redis Enterprise cluster
and does not store any other data on your RDI VMs or anywhere else outside the cluster.
---
Title: Differences between the classic and Flink processors
alwaysopen: false
categories:
- docs
- integrate
- rs
- rdi
description: Compare the classic and Flink stream processor implementations.
group: di
linkTitle: Classic vs. Flink processor
summary: Redis Data Integration keeps Redis in sync with the primary database in near
real time.
type: integration
weight: 10
---

RDI ships with two stream processor implementations. Both consume the same
source streams, share the same job-level configuration model, and write to
the same Redis target, but they differ in architecture, supported features,
configuration, observability, error handling, and performance.

This page summarizes those differences. See
[Which processor should I use?]({{< relref "/integrate/redis-data-integration/faq#which-processor-should-i-use" >}})
in the FAQ for the recommendation, and
[Migrate from the classic processor to the Flink processor]({{< relref "/integrate/redis-data-integration/installation/migration-classic-to-flink" >}})
for a step-by-step migration guide.

## At a glance

| Aspect | Classic processor | Flink processor |
|---|---|---|
| Implementation | Python | Java on top of [Apache Flink](https://flink.apache.org/) |
| Deployment targets | VM and Kubernetes | Kubernetes only |
| Scaling | Single replica | Horizontal: TaskManager replicas × task slots per TaskManager |
| Fault tolerance | Source-stream consumer-group replay | Source-stream consumer-group replay plus Flink checkpointing |
| Supported `data_type` outputs | `hash`, `json`, `set`, `sorted_set`, `stream`, `string` | `hash`, `json` |
| Metrics endpoint | `rdi-metrics-exporter` service | Flink JobManager `/metrics` (no metrics exporter) |
| Metric naming | `rdi_*` (e.g., `rdi_incoming_entries`) | `flink_*` (e.g., `flink_jobmanager_job_operator_coordinator_stream_type_rdiRecords`) |
| End-to-end latency | Bounded by the per-batch read-process-write cycle | Records flow through pipelined operator chains without a per-batch barrier |
| Snapshot throughput | Limited by single shared reader and writer | Parallelized across all task slots |
| Expression and `redis.lookup` result caching | Not supported | Optional, opt-in per transformation |

## Architecture and deployment

The classic processor runs as a single pod managed by the operator
and can be deployed on either VMs or Kubernetes through the RDI Helm
chart.

The Flink processor runs as an Apache Flink application cluster: one
JobManager pod plus one or more TaskManager pods. Source,
transformation, and sink operators run as parallel subtasks across
all task slots in the cluster. The Flink processor scales
horizontally by changing the number of TaskManager replicas
(`advanced.resources.taskManager.replicas`); with adaptive
parallelism, the default parallelism is the product of TaskManager
replicas and task slots per TaskManager. The Flink processor
currently runs on Kubernetes only; VM support is planned for a future
release.
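For example, the default-parallelism formula above works out as follows (the values are illustrative, not defaults):

```python
# Default parallelism with adaptive parallelism:
# TaskManager replicas × task slots per TaskManager.
taskmanager_replicas = 3        # advanced.resources.taskManager.replicas
task_slots_per_taskmanager = 4  # illustrative value
default_parallelism = taskmanager_replicas * task_slots_per_taskmanager
assert default_parallelism == 12
```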

Both processors retain at-least-once delivery semantics; the Flink
processor adds Flink checkpointing on top of the shared
consumer-group replay mechanism.

See
[Configure the Flink processor]({{< relref "/integrate/redis-data-integration/installation/install-k8s#configure-the-flink-processor" >}})
for the Helm settings.

## Configuration

The two processors share the same `config.yaml` envelope and the same
`connections`, `sources`, `targets`, and `jobs` sections. The only
differences are inside the `processors:` block. You select the implementation via
`processors.type` (`classic` or `flink`; the default is `classic`). Properties
that apply to only one implementation are annotated with
**Classic processor only.** or **Flink processor only.** in the
[pipeline configuration reference]({{< relref "/integrate/redis-data-integration/data-pipelines/pipeline-config#processors" >}}),
and are silently ignored by the other implementation. The Flink
processor exposes additional fine-grained tuning under
`processors.advanced.*`.
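For example, a `processors:` block that opts a pipeline into the Flink implementation might look like this (a sketch; the properties available under `advanced` are listed in the pipeline configuration reference):

```yaml
processors:
  type: flink   # "classic" (default) or "flink"
  # Flink-only tuning lives under processors.advanced.*; properties that
  # belong to the other implementation are silently ignored.
```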

## Supported output formats

The classic processor supports all `data_type` values: `hash`, `json`,
`set`, `sorted_set`, `stream`, and `string`. The Flink processor
currently supports only `hash` and `json`. Pipelines that use any other
output type must remain on the classic processor or rewrite the
affected jobs. Support for the remaining output types is planned for a
future release.

## Transformation extensions

The two processors support the same set of transformation blocks
(`filter`, `map`, `add_field`, `remove_field`, `rename_field`,
`redis.lookup`) and the same expression languages (JMESPath and SQL).
Pipelines written for one processor generally execute on the other
without changes.

The Flink processor adds three optional, performance-oriented
extensions that are not available with the classic processor:

- **Expression result caching** through a per-expression `cache:`
block on `filter`, `map`, `add_field`, and `redis.lookup` arguments.
- **`redis.lookup` result caching** through a `lookup_cache:` block.
- **`redis.lookup` batching**, which groups lookups into a single
Redis pipeline. Batching is enabled by default with sensible
defaults; the optional `batch:` block lets you override them.
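As an illustration, the opt-in blocks might appear in a job's transformation along these lines. This is a sketch only: the placement and every property inside the `cache:`, `lookup_cache:`, and `batch:` blocks here are assumptions, so see the linked pages for the real property lists:

```yaml
transform:
  - uses: redis.lookup
    with:
      # ...lookup arguments...
      cache:          # Flink only: cache expression results (properties illustrative)
        enabled: true
      lookup_cache:   # Flink only: cache lookup results (properties illustrative)
        enabled: true
      batch:          # Flink only: override the default lookup batching
        enabled: true
```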

See
[Caching expression results]({{< relref "/integrate/redis-data-integration/data-pipelines/transform-examples/caching-expression-results" >}})
for examples and
[`redis.lookup`]({{< relref "/integrate/redis-data-integration/reference/data-transformation/lookup" >}})
for the full property list.

## Metrics

The two processors expose different Prometheus metric sets and use
different naming schemes, so dashboards and alerts cannot be reused
as-is between them. The classic processor exposes its metrics through
the `rdi-metrics-exporter` service. The Flink processor emits metrics
directly from the JobManager and TaskManager pods through Flink's
native Prometheus reporter; no metrics exporter is deployed.

See
[Observability — Flink processor metrics]({{< relref "/integrate/redis-data-integration/observability#flink-processor-metrics" >}})
for the customer-facing list of metrics.
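Prometheus scrape jobs for the two processors could be set up along these lines (a sketch: the target names follow the description above, but both port numbers are assumptions; 9249 is Flink's default Prometheus reporter port):

```yaml
scrape_configs:
  # Classic processor: scrape the metrics exporter service.
  - job_name: rdi-classic
    static_configs:
      - targets: ["rdi-metrics-exporter.rdi:9121"]   # port is illustrative
  # Flink processor: scrape the JobManager /metrics endpoint directly.
  - job_name: rdi-flink
    metrics_path: /metrics
    static_configs:
      - targets: ["<jobmanager-host>:9249"]          # Flink's default reporter port
```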

## Error handling and DLQ

Both processors implement a dead-letter queue (DLQ) at
`dlq:{stream_name}` and honor the same top-level `error_handling`
(`dlq` or `ignore`) and `dlq_max_messages` properties. The Flink
processor surfaces a few corner cases as DLQ entries that the classic
processor logs and skips (for example, missing parent
keys in nested writes and exceptions thrown by `when` expressions on
`redis.lookup`). The DLQ entry field set and value encoding also
differ: the classic processor uses Python-stringified values,
while the Flink processor uses JSON.
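If you consume DLQ entries produced by both processor types, the encoding difference matters in practice. A minimal sketch of a tolerant parser (`parse_dlq_value` is a hypothetical helper, not part of RDI):

```python
import ast
import json

def parse_dlq_value(raw: str):
    """Parse a DLQ entry value that may be JSON (Flink processor)
    or a Python-stringified literal (classic processor)."""
    try:
        return json.loads(raw)        # Flink: proper JSON
    except json.JSONDecodeError:
        return ast.literal_eval(raw)  # classic: Python repr, e.g. single quotes

# Flink-style value
assert parse_dlq_value('{"id": 1, "name": "Ada"}') == {"id": 1, "name": "Ada"}
# Classic-style value
assert parse_dlq_value("{'id': 1, 'name': 'Ada'}") == {"id": 1, "name": "Ada"}
```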

## Performance

The Flink processor delivers significantly higher throughput during
the initial snapshot and lower end-to-end latency in steady state.
The classic processor uses a sequential read-process-write batching
cycle, so each record waits for its batch to complete before being
written to the target. The Flink processor pipelines records through
operator chains without a per-batch barrier, and parallelizes work
across all task slots, which both lowers per-record latency and
raises throughput.
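The latency difference can be seen with a back-of-envelope model (a toy model with illustrative numbers, not measurements): in a batched cycle a record waits for its whole batch, while a pipelined record incurs only its own processing time.

```python
# Worst-case added latency: batched cycle vs. pipelined flow (toy model).
batch_size = 1_000
per_record_us = 100                                  # 0.1 ms per record

batched_worst_case_us = batch_size * per_record_us   # record waits for the batch
pipelined_us = per_record_us                         # record flows straight through

assert batched_worst_case_us == 100_000              # 100 ms
assert batched_worst_case_us // pipelined_us == batch_size
```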

The Flink processor has a larger baseline memory footprint (JVM plus
Flink runtime overhead per TaskManager) but, for most pipelines, the
performance gains and the additional features (horizontal scaling, caching)
outweigh that cost.