
JiT Worker in CLI and RPC infrastructure #2109

Open
rafal-hawrylak wants to merge 2 commits into main from cli_jit_workers

Conversation

@rafal-hawrylak
Collaborator

This change establishes the foundation for executing JiT compilation in an isolated environment. It introduces the worker process management, the RPC bridge for database access during JiT, and the necessary Bazel targets.

@rafal-hawrylak rafal-hawrylak marked this pull request as ready for review March 9, 2026 19:59
@rafal-hawrylak rafal-hawrylak requested a review from a team as a code owner March 9, 2026 19:59
@rafal-hawrylak rafal-hawrylak enabled auto-merge (squash) March 9, 2026 19:59
@rafal-hawrylak rafal-hawrylak self-assigned this Mar 9, 2026
@@ -0,0 +1,101 @@
import { IDbAdapter, IDbClient } from "df/cli/api/dbadapters";
Contributor

To clarify: is this protobuf bridge protocol the same for GCP and CLI implementations?

Collaborator Author

Both the CLI and GCP implementations follow the dataform.DbAdapter service defined in jit.proto.

Collaborator

See my comment below on the issue with "Execute" compatibility

Collaborator Author

PTAL on the approach with executeRaw

}

constructor() {
super("../../vm/jit_loader");
Collaborator

I wonder if it's possible to avoid hard-coded paths like this. I can imagine that if we later create a jit directory for JiT-related actions, this will break.
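A minimal sketch of one way to avoid the hard-coded relative string, assuming the loader path can be derived from a single base directory; the names (`resolveLoader`, `JIT_LOADER`) are illustrative, not from the PR:

```typescript
import * as path from "path";

// Single constant for the loader name, so a future move of JiT-related
// files only requires changing one place.
const JIT_LOADER = "jit_loader";

// Resolve the loader from an explicit base directory instead of a
// brittle "../../vm/jit_loader" relative string.
function resolveLoader(baseDir: string): string {
  return path.resolve(baseDir, "vm", JIT_LOADER);
}
```

The same idea can be applied with `require.resolve` against a workspace-rooted module id, which would also survive bundling.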

return new Uint8Array();
}

function mapRowToProto(row: { [key: string]: any }): google.protobuf.IStruct {
Collaborator

jit.proto declares that the struct contains:

// Rows. For BigQuery, see
// https://docs.cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults.

In other words, right now we expect the raw API result as an "f"/"v" JSON struct, not a bespoke conversion.

The BigQuery client for Node is strange in this respect: it forcefully decodes those rows and removes them from the original response. We could either:
a) Use the googleapis package client instead for JiT
b) Come up with another protocol for encoding rows and implement it both here and in GCP. Let me know on chat if you need code pointers for the GCP part.
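For reference, the raw REST "f"/"v" row encoding the comment refers to pairs a positional list of cells with the schema's field list; a generic decoder is only a few lines. This sketch is illustrative (the `RawRow`/`decodeRawRow` names are not from the PR):

```typescript
// Shapes matching the BigQuery jobs.getQueryResults REST response:
// rows come back as { f: [{ v: ... }, ...] }, aligned positionally
// with schema.fields.
interface RawCell { v: any; }
interface RawRow { f: RawCell[]; }
interface FieldSchema { name: string; }

// Decode one raw row into a { columnName: value } object.
function decodeRawRow(row: RawRow, fields: FieldSchema[]): { [key: string]: any } {
  const out: { [key: string]: any } = {};
  fields.forEach((field, i) => {
    out[field.name] = row.f[i]?.v;
  });
  return out;
}
```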

Collaborator Author

PTAL on the approach with executeRaw

Collaborator

@ikholopov-omni ikholopov-omni Mar 12, 2026

It has exactly the same problem: query() inside rawExecute already removes rows from the response and only returns the decoded ones as the first component of the tuple [rows, _, response].

- Enhance error coercion in common/errors/errors.ts
- Sync protos/execution.proto and protos/jit.proto with new fields
@rafal-hawrylak rafal-hawrylak force-pushed the cli_jit_workers branch 8 times, most recently from 6eb0cd1 to a05cf58 on March 12, 2026 14:39
* feat: add worker process management for JiT compilation

- Introduce base worker and JiT-specific child process logic

- Implement RPC bridge for database access during JiT

- Add VM scripts and loader for isolated execution

- Add unit tests for the RPC mechanism
Comment on lines +37 to +45
bigquery: {
labels: {
...(options?.labels || {}),
...(requestOptions?.labels || {})
},
location: requestOptions?.location || options?.location,
jobPrefix: [options?.jobPrefix, requestOptions?.jobPrefix].filter(Boolean).join("-") || undefined,
dryRun: !!(options?.dryRun || requestOptions?.dryRun)
}
Contributor

I think there should be a consistent order of precedence here.

Also, wouldn't we need to update this logic every time a new option is added to IBigQueryExecutionOptions? I'd suggest using the full value of options by default (and overwriting with requestOptions where it makes sense).
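The suggestion could be sketched roughly like this, assuming illustrative field names on the options interface (not the PR's exact types):

```typescript
// Illustrative subset of a BigQuery execution-options shape.
interface IBigQueryOptions {
  labels?: { [key: string]: string };
  location?: string;
  jobPrefix?: string;
  dryRun?: boolean;
}

// Start from the full options object and overlay requestOptions, so any
// field added to the interface later is forwarded automatically and
// requestOptions consistently takes precedence.
function mergeBigQueryOptions(
  options: IBigQueryOptions = {},
  requestOptions: IBigQueryOptions = {}
): IBigQueryOptions {
  return {
    ...options,
    ...requestOptions,
    // Labels are merged rather than replaced; requestOptions wins on conflict.
    labels: { ...(options.labels || {}), ...(requestOptions.labels || {}) }
  };
}
```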

const targets = await dbadapter.tables();
const tablesMetadata = await Promise.all(
targets
.filter(target => !listTablesRequest.schema || target.schema === listTablesRequest.schema)
Contributor

The interface in the CLI has only a parameterless method:
tables(): Promise<dataform.ITarget[]>;

Will it work correctly then? A JiT ListTablesRequest can carry different values of database, which may not match whatever project dbadapter.tables() lists tables in.
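A sketch of what database-aware filtering might look like, with illustrative types standing in for the actual CLI interfaces:

```typescript
// Illustrative stand-ins for dataform.ITarget and ListTablesRequest.
interface ITarget { database?: string; schema?: string; name?: string; }
interface IListTablesRequest { database?: string; schema?: string; }

// Filter by database as well as schema, so a request scoped to a
// different project does not silently match tables from the adapter's
// default project.
function filterTargets(targets: ITarget[], request: IListTablesRequest): ITarget[] {
  return targets.filter(
    target =>
      (!request.database || target.database === request.database) &&
      (!request.schema || target.schema === request.schema)
  );
}
```

This only helps if dbadapter.tables() actually surfaces targets from the requested project in the first place, which is the reviewer's underlying concern.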

const getTableRequest = dataform.GetTableRequest.decode(request);
const tableMetadata = await dbadapter.table(getTableRequest.target);
if (!tableMetadata) {
return dataform.TableMetadata.encode(dataform.TableMetadata.create({})).finish();
Contributor

I'd rather rethrow NOT_FOUND back to the compiler in such cases
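One possible shape for surfacing NOT_FOUND to the compiler instead of returning empty metadata; `RpcError` and `requireTableMetadata` are hypothetical names, not the PR's actual error type:

```typescript
// Hypothetical error carrying a status code across the RPC bridge.
class RpcError extends Error {
  constructor(public readonly code: string, message: string) {
    super(message);
  }
}

// Rethrow NOT_FOUND rather than encoding an empty TableMetadata, so the
// compiler can distinguish "table missing" from "table has no metadata".
function requireTableMetadata<T>(metadata: T | null | undefined, tableName: string): T {
  if (!metadata) {
    throw new RpcError("NOT_FOUND", `Table not found: ${tableName}`);
  }
  return metadata;
}
```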

Comment on lines +110 to +115
location: options.bigquery?.location,
maxResults: options.rowLimit,
useLegacySql: false,
labels: options.bigquery?.labels,
jobPrefix: options.bigquery?.jobPrefix,
dryRun: options.bigquery?.dryRun
Contributor

Could we have common logic for how we set options for BQ job calls?

Comment on lines +430 to +433
const [, , apiResponse] = (await job[0].getQueryResults({
maxResults: rowLimit,
location
})) as any;
Contributor

Not sure it's correct to generate one more API request here.


const requestMessage = dataform.JitCompilationRequest.fromObject(request);
const requestBytes = dataform.JitCompilationRequest.encode(requestMessage).finish();
const requestBase64 = Buffer.from(requestBytes).toString("base64");
Contributor

Do we need to do base64 encoding here?
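For context, the base64 round trip under discussion looks like this; it is only needed if the transport between parent and worker carries strings (e.g. JSON-serialized messages), since binary-capable channels could pass the Uint8Array directly:

```typescript
// Encode protobuf bytes for a string-only transport.
function encodeForStringTransport(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64");
}

// Decode on the other side of the transport.
function decodeFromStringTransport(base64: string): Uint8Array {
  return new Uint8Array(Buffer.from(base64, "base64"));
}
```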

},
root: projectDir,
mock: hasProjectLocalCore ? {} : {
"@dataform/core": require("@dataform/core")
Contributor

I don't think we have such a fallback for regular compilation?

const requestBytes = new Uint8Array(Buffer.from(requestBase64, "base64"));

const internalRpcCallback = (method, reqBytes, callback) => {
const reqBase64 = Buffer.from(reqBytes).toString("base64");
Contributor

Same question: do we need base64 encoding here?
