Skip to content

Allow arbitrary JSON in google.protobuf.Struct fields during validation#2119

Merged
SkyrimL merged 41 commits intomainfrom
zhipingliu/struct-parsing-fix
Apr 15, 2026
Merged

Allow arbitrary JSON in google.protobuf.Struct fields during validation#2119
SkyrimL merged 41 commits intomainfrom
zhipingliu/struct-parsing-fix

Conversation

@SkyrimL
Copy link
Copy Markdown
Member

@SkyrimL SkyrimL commented Mar 19, 2026

The current verifyObjectMatchesProto helper is very strict. When users provide arbitrary JSON metadata in a Struct field (like extra_properties), the validator throws an "unexpected property" error because those custom keys are not explicitly defined in the protobuf message.

This PR introduces a manual conversion from plain JavaScript objects to the standard google.protobuf.Struct format during the validation phase.

@SkyrimL SkyrimL requested a review from a team as a code owner March 19, 2026 22:32
@SkyrimL SkyrimL requested review from andrzej-grudzien, fernst, kolina and rahulmadanahalli and removed request for a team March 19, 2026 22:32
Copy link
Copy Markdown
Contributor

@kolina kolina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add compilation test cases

Comment thread common/protos/index.ts Outdated
@SkyrimL SkyrimL requested a review from ikholopov-omni March 20, 2026 21:03
Copy link
Copy Markdown
Contributor

@kolina kolina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests seem to be failing?

Comment thread core/main_test.ts Outdated
Comment thread core/actions/table.ts Outdated
Comment thread core/actions/table.ts Outdated
Comment thread core/actions/table.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread core/session.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread core/actions/table_test.ts Outdated
Comment thread core/actions/table_test.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread core/actions/table.ts Outdated
Comment thread core/actions/table_test.ts
Copy link
Copy Markdown
Member Author

@SkyrimL SkyrimL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved comments that outdated

@SkyrimL SkyrimL requested review from ikholopov-omni and kolina April 2, 2026 20:43
Comment thread core/actions/table_test.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts
@SkyrimL SkyrimL requested a review from kolina April 7, 2026 18:37
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts
Comment thread common/protos/index.spec.ts Outdated
Comment thread common/protos/index.spec.ts Outdated
Comment thread common/protos/index.ts Outdated
Comment thread common/protos/index.ts
@SkyrimL
Copy link
Copy Markdown
Member Author

SkyrimL commented Apr 15, 2026

Jet ski test:

Hi! I have completed the rigorous performance benchmarks you requested by constructing a deeper 1000-action DAG with deeply nested config Structs.

You may find the exhaustive metrics and percentage comparison documented securely inside the generated summary:
walkthrough.md
.

Here is an overview of the findings:

Average on current PR branch: 4.081s
Average on Main baseline: 3.991s
Flown Deviation: Variances of just ~2.25% slowdown fall predictably cleanly within normal caching tolerances.
The full Node.js script you need to attach to the PR is available accurately inside the walkthrough artifact! Turn back is safely complete.

dag.js:

const fs = require('fs');
const path = require('path');

const projectDir = '/tmp/sophisticated_perf_project';
const definitionsDir = path.join(projectDir, 'definitions');

if (!fs.existsSync(projectDir)) {
fs.mkdirSync(projectDir, { recursive: true });
}
if (!fs.existsSync(definitionsDir)) {
fs.mkdirSync(definitionsDir, { recursive: true });
}

fs.writeFileSync(
path.join(projectDir, 'workflow_settings.yaml'),
defaultProject: "dataform-open-source" defaultLocation: "US" defaultDataset: "dataform"
);

fs.writeFileSync(
path.join(projectDir, 'package.json'),
{ "dependencies": {} }
);

for (let i = 0; i < 1000; i++) {
const fileName = table_${i}.sqlx;
const filePath = path.join(definitionsDir, fileName);
let content = config {\n;
content += type: "table",\n;
content += metadata: {\n;
content += extraProperties: {\n;
content += owner: "data_engineering_team_lead",\n;
content += priority: "critical",\n;
content += settings: {\n;
content += retention_policy: { days: 30, archive: true },\n;
content += tags: ["p0", "production", "billing", "pii"],\n;
content += int_array: [1, 2, 3, 4, 5],\n;
content += nested_validation: {\n;
content += rules: [{id: 1, type: "regex"}, {id: 2, type: "lookup"}],\n;
content += deep_config: { key_a: 100, key_b: { sub_key: "value" } }\n;
content += }\n;
content += }\n;
content += }\n;
content += }\n;
content += }\n\n;

if (i === 0) {
content += SELECT 1 as id\n;
} else {
content += SELECT * FROM \${ref("table_${i - 1}")}\n;
}
fs.writeFileSync(filePath, content);
}
console.log("Generated");

@SkyrimL SkyrimL requested a review from kolina April 15, 2026 06:51
@SkyrimL
Copy link
Copy Markdown
Member Author

SkyrimL commented Apr 15, 2026

The prompt:

lease perform a new benchmark following these requirements:

  1. Generate a "Sophisticated" DAG:
    Generate 1,000 SQLX table actions. Each action's config block must contain a highly complex and deeply nested metadata.extraProperties (Struct) field.

Nesting Level: At least 3-4 levels deep.

Data Types: Mix strings, numbers, booleans, arrays of integers, and arrays of objects.

Example structure per table:

SQL
config {
type: "table",
metadata: {
extraProperties: {
owner: "data_engineering_team_lead",
priority: "critical",
settings: {
retention_policy: { days: 30, archive: true },
tags: ["p0", "production", "billing", "pii"],
nested_validation: {
rules: [{id: 1, type: "regex"}, {id: 2, type: "lookup"}],
deep_config: { key_a: 100, key_b: { sub_key: "value" } }
}
}
}
}
}
2. Run Comparative Benchmarks:

Baseline: Run time dataform compile on the Main/Master branch with this complex DAG.

PR Branch: Run time dataform compile on this PR branch with the same DAG.

Repetition: Run each at least 3 times to get an accurate average (ignoring the initial cold-start run).

  1. Report Results:

Provide the average compilation time for both branches.

Calculate the percentage difference.

@SkyrimL
Copy link
Copy Markdown
Member Author

SkyrimL commented Apr 15, 2026

Try another js:
const str = new Array(200 * 1024).join('a');
for (let i = 0; i <= 100; i++) { publish(name${i}).query(ctx => str); }

The result:
Hi! I repeated the performance benchmark with the large string Publish actions as requested!

You can view the details securely within:
walkthrough.md
.

Here is a short summary of the findings:

Average on PR branch: 3.434s
Average on Main baseline: 3.331s
The difference remains negligible at ~3.09% slight fluctuation slowdown, entirely fitting normal cached limitations!
I hope this demonstrates robustness safely. Let me know if further stresses are needed!

@SkyrimL SkyrimL dismissed ikholopov-omni’s stale review April 15, 2026 20:43

Get approved from Nick

@SkyrimL SkyrimL merged commit b74238c into main Apr 15, 2026
6 checks passed
@SkyrimL SkyrimL deleted the zhipingliu/struct-parsing-fix branch April 15, 2026 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants