Skip to content

Commit 8abe302

Browse files
committed
fix(sea): address #411 review — exhaustive interval switch, docs, coverage
Validated every interval edge case (null, multi-row, negative sub-year, sibling-survives) against a live pecotesting warehouse first — all byte-identical to Thrift. The findings were layering/dead-code/coverage, not runtime bugs. - F1: corrected the DURATION_UNIT_METADATA_KEY doc — the interval/duration branches are SEA-gated by construction (Thrift maps INTERVAL → ArrowString and never reaches them), NOT "reused by thrift" as the old comment claimed. - F3/F6/F10: formatArrowInterval now handles YEAR_MONTH only (typed `Interval` + `IntervalUnit.YEAR_MONTH`, no magic `=== 0`) and THROWS on any other unit. The old non-exhaustive default silently misread MONTH_DAY_NANO/undefined as [days,ms]; and the native Interval[DayTime] branch was dead+broken (the kernel emits DAY-TIME as Duration, handled separately) — removed it. - F2: exported the metadata key + added a test pinning it equal to the SEA-side declaration (guards against a silent rename drift). - F8: dropped the dead `_unit` param from formatDayTimeFromTotal. - F9: removed the dead duration check in the BigNum branch (rewritten Int64s arrive as raw bigint) and fixed the false "Int32Array instanceof Uint8Array" comment. - F7: added a YEAR-MONTH sub-year-negative unit test (-1 month → "-0-1"). - F4: added live e2e for null INTERVAL → null and multi-row batches. - F5: e2e before() now probes getSeaNative() and skips (not errors) when the binding is absent; dropped the flaky wall-clock latency assertions (assert behavior — cancel resolved, callback fired once). Deferred (low/informational, noted): F11 (consolidate test makeContext helpers), F12 (tighten instanceOf assertions), F13 (per-operation interval-representation breadcrumb — a per-value log would be spam). Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
1 parent 08f109d commit 8abe302

4 files changed

Lines changed: 176 additions & 71 deletions

File tree

lib/result/ArrowResultConverter.ts

Lines changed: 44 additions & 55 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,8 @@ import {
66
TypeMap,
77
DataType,
88
Type,
9+
Interval,
10+
IntervalUnit,
911
StructRow,
1012
MapRow,
1113
Vector,
@@ -15,6 +17,7 @@ import {
1517
} from 'apache-arrow';
1618
import { TTableSchema, TColumnDesc } from '../../thrift/TCLIService_types';
1719
import IClientContext from '../contracts/IClientContext';
20+
import HiveDriverError from '../errors/HiveDriverError';
1821
import IResultsProvider, { ResultsProviderFetchNextOptions } from './IResultsProvider';
1922
import { ArrowBatch, getSchemaColumns, convertThriftValue } from './utils';
2023

@@ -24,55 +27,51 @@ type ArrowSchema = Schema<TypeMap>;
2427
type ArrowSchemaField = Field<DataType<Type, TypeMap>>;
2528

2629
/**
27-
* Metadata key carrying the original Arrow `Duration` time unit on
28-
* fields that were rewritten to `Int64` by the SEA IPC pre-processor
29-
* (`lib/sea/SeaArrowIpcDurationFix.ts`). We re-declare the constant
30-
* here (rather than importing it) so the converter has no compile-time
31-
* dependency on the SEA module — it's reused unchanged by the
32-
* thrift-path which has no SEA awareness.
30+
* Metadata key carrying the original Arrow `Duration` time unit on fields
31+
* rewritten to `Int64` by the SEA IPC pre-processor
32+
* (`lib/sea/SeaArrowIpcDurationFix.ts`). Re-declared here (rather than
33+
* imported) to keep this generic `lib/result` converter free of a
34+
* compile-time dependency on `lib/sea`.
35+
*
36+
* **SEA-gated by construction — NOT shared with Thrift.** This key (and the
37+
* `DataType.isInterval` / duration branches below) only ever execute on the
38+
* SEA path. The Thrift backend sets `intervalTypesAsArrow: false` and maps
39+
* both INTERVAL `TTypeId`s to `ArrowString` (`lib/result/utils.ts`), so the
40+
* server pre-formats intervals to strings and this logic is never reached.
41+
* `export`ed so `SeaIntervalParity.test` can pin it equal to the SEA-side
42+
* declaration and catch a rename/typo that would silently no-op the consumer.
3343
*/
34-
const DURATION_UNIT_METADATA_KEY = 'databricks.arrow.duration_unit';
44+
export const DURATION_UNIT_METADATA_KEY = 'databricks.arrow.duration_unit';
3545
const ZERO_BIGINT = BigInt(0);
3646
const NS_PER_MICRO = BigInt(1_000);
3747
const NS_PER_MILLI = BigInt(1_000_000);
3848
const NS_PER_SEC = BigInt(1_000_000_000);
39-
const MS_PER_DAY = BigInt(86_400_000);
4049
const NS_PER_MIN = NS_PER_SEC * BigInt(60);
4150
const NS_PER_HOUR = NS_PER_MIN * BigInt(60);
4251
const NS_PER_DAY = NS_PER_HOUR * BigInt(24);
4352

4453
/**
45-
* Format an Arrow `Interval[YearMonth]` or `Interval[DayTime]` value
46-
* into the canonical thrift string the JDBC/ODBC server emits:
47-
* YEAR-MONTH → `"Y-M"` (e.g. 1 year 2 months → `"1-2"`)
48-
* DAY-TIME → `"D HH:mm:ss.fffffffff"`
49-
* (e.g. 1 day 02:03:04 → `"1 02:03:04.000000000"`)
54+
* Format a native Arrow `Interval[YearMonth]` value into the canonical thrift
55+
* string `"Y-M"` (e.g. 1 year 2 months → `"1-2"`, -1 month → `"-0-1"`).
5056
*
51-
* Arrow surfaces these as `Int32Array(2)` via the `GetVisitor`
52-
* (`apache-arrow/visitor/get.js:177-185`):
53-
* YEAR-MONTH: `[years, months]` (years/months derived from a single
54-
* int32 holding total months)
55-
* DAY-TIME: `[days, milliseconds]` (legacy two-int32 form)
57+
* Arrow surfaces YEAR-MONTH as an `Int32Array(2)` `[years, months]` via the
58+
* `GetVisitor` (years/months derived from a single int32 of total months).
5659
*
57-
* Negative intervals: the FULL interval is emitted with a leading `-`
58-
* (Spark convention), and individual fields are unsigned. We mirror
59-
* Spark's display.
60+
* **Only YEAR_MONTH reaches here.** The kernel emits INTERVAL DAY-TIME as an
61+
* Arrow `Duration` (rewritten to `Int64`), handled by
62+
* `formatDurationToIntervalDayTime` — never as a native `Interval[DayTime]`.
63+
* Any other unit (DAY_TIME / MONTH_DAY_NANO / undefined) is therefore
64+
* unexpected; we throw rather than silently misread the value as `[days, ms]`
65+
* and emit a confidently-wrong string (the old non-exhaustive default).
6066
*/
61-
function formatArrowInterval(value: any, valueType: any): string {
62-
// `value` is an Int32Array of length 2.
63-
const a = Number(value[0]);
64-
const b = Number(value[1]);
65-
// unit 0 = YEAR_MONTH, unit 1 = DAY_TIME, unit 2 = MONTH_DAY_NANO
66-
const unit = valueType?.unit;
67-
if (unit === 0) {
68-
return formatYearMonth(a, b);
67+
function formatArrowInterval(value: Int32Array, valueType: Interval): string {
68+
if (valueType?.unit !== IntervalUnit.YEAR_MONTH) {
69+
throw new HiveDriverError(
70+
`SEA result converter: unsupported Arrow Interval unit ${valueType?.unit}. The kernel emits only ` +
71+
`YEAR_MONTH as a native Arrow Interval (DAY-TIME arrives as Duration); MONTH_DAY_NANO is unsupported.`,
72+
);
6973
}
70-
// DAY_TIME: a = days, b = milliseconds (within the day, can be ≥0 or <0)
71-
// We re-normalise: total milliseconds = a * 86_400_000 + b, then split into
72-
// days, hours, minutes, seconds, nanoseconds (nanoseconds is always 0
73-
// because the legacy IntervalDayTime carries only millisecond precision).
74-
const totalMs = BigInt(a) * MS_PER_DAY + BigInt(b);
75-
return formatDayTimeFromTotal(totalMs * NS_PER_MILLI /* → ns */, 'NANOSECOND');
74+
return formatYearMonth(Number(value[0]), Number(value[1]));
7675
}
7776

7877
/**
@@ -107,7 +106,7 @@ function formatYearMonth(years: number, months: number): string {
107106
function formatDurationToIntervalDayTime(value: bigint | number, unit: string): string {
108107
const bi = typeof value === 'bigint' ? value : BigInt(value);
109108
const nanos = toNanoseconds(bi, unit);
110-
return formatDayTimeFromTotal(nanos, unit);
109+
return formatDayTimeFromTotal(nanos);
111110
}
112111

113112
/**
@@ -137,13 +136,9 @@ function toNanoseconds(value: bigint, unit: string): bigint {
137136
* Always emits 9 fractional digits to match the thrift driver's wire
138137
* format (`"1 02:03:04.000000000"` — 9 digits regardless of the
139138
* server-side storage precision). Negative values get a single
140-
* leading `-`.
141-
*
142-
* The `unit` parameter is currently unused for formatting (the value
143-
* is already in nanoseconds by the time we get here) but is retained
144-
* for future use if a unit-aware precision is ever needed.
139+
* leading `-`. The caller has already scaled to nanoseconds.
145140
*/
146-
function formatDayTimeFromTotal(totalNanos: bigint, _unit: string): string {
141+
function formatDayTimeFromTotal(totalNanos: bigint): string {
147142
const sign = totalNanos < ZERO_BIGINT ? '-' : '';
148143
const abs = totalNanos < ZERO_BIGINT ? -totalNanos : totalNanos;
149144

@@ -369,23 +364,17 @@ export default class ArrowResultConverter implements IResultsProvider<Array<any>
369364
if (DataType.isDecimal(valueType)) {
370365
return Number(result) / 10 ** valueType.scale;
371366
}
372-
// Duration columns rewritten to Int64 — detect via metadata.
373-
const durationUnit = field?.metadata.get(DURATION_UNIT_METADATA_KEY);
374-
if (durationUnit) {
375-
return formatDurationToIntervalDayTime(result, durationUnit);
376-
}
367+
// A rewritten Duration Int64 surfaces as a raw `bigint`, not a BigNum
368+
// wrapper, so it is handled in the bigint branch below — not here.
377369
return result;
378370
}
379371

380-
// Convert binary data to Buffer
372+
// Convert binary data to Buffer.
381373
if (value instanceof Uint8Array) {
382-
// INTERVAL DAY-TIME / YEAR-MONTH that apache-arrow surfaced as
383-
// an Int32Array (size 2). `Uint8Array.isInstanceOf` is true for
384-
// every TypedArray subclass, so we have to check the parent type
385-
// first. The `DataType.isInterval` branch above already handles
386-
// the case where Arrow knew the field was an interval — this
387-
// fallback covers schemas where the interval surfaced as bare
388-
// bytes (defensive; not exercised in M0).
374+
// Note: Arrow `Int32Array` / `BigInt64Array` are NOT `instanceof
375+
// Uint8Array` (they are sibling TypedArrays), so an interval value never
376+
// reaches this branch — intervals are handled by the `isInterval` /
377+
// bigint branches above. This is purely the binary-column → Buffer path.
389378
return Buffer.from(value);
390379
}
391380

Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,88 @@
1+
// Copyright (c) 2026 Databricks, Inc.
2+
//
3+
// Licensed under the Apache License, Version 2.0 (the "License");
4+
// you may not use this file except in compliance with the License.
5+
// You may obtain a copy of the License at
6+
//
7+
// http://www.apache.org/licenses/LICENSE-2.0
8+
//
9+
// Unless required by applicable law or agreed to in writing, software
10+
// distributed under the License is distributed on an "AS IS" BASIS,
11+
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
// See the License for the specific language governing permissions and
13+
// limitations under the License.
14+
15+
/* eslint-disable no-console */
16+
17+
import { expect } from 'chai';
18+
import { DBSQLClient } from '../../../lib';
19+
import { ConnectionOptions } from '../../../lib/contracts/IDBSQLClient';
20+
import { InternalConnectionOptions } from '../../../lib/contracts/InternalConnectionOptions';
21+
import { getSeaNative } from '../../../lib/sea/SeaNativeLoader';
22+
23+
// INTERVAL edge cases the unit suite can't easily build (null, multi-row).
24+
// Verified byte-identical to the Thrift backend against a live warehouse.
25+
// Requires the pecotesting secrets AND the native binding — skips otherwise.
26+
27+
interface PecoSecrets {
28+
host: string;
29+
path: string;
30+
token: string;
31+
}
32+
33+
function readSecrets(): PecoSecrets | null {
34+
const host = process.env.DATABRICKS_PECOTESTING_SERVER_HOSTNAME;
35+
const path = process.env.DATABRICKS_PECOTESTING_HTTP_PATH;
36+
const token = process.env.DATABRICKS_PECOTESTING_TOKEN_PERSONAL;
37+
if (!host || !path || !token) return null;
38+
return { host, path, token };
39+
}
40+
41+
async function seaValues(sql: string, secrets: PecoSecrets): Promise<unknown[]> {
42+
const client = new DBSQLClient();
43+
await client.connect({ ...secrets, useSEA: true } as ConnectionOptions & InternalConnectionOptions);
44+
try {
45+
const session = await client.openSession();
46+
const op = await session.executeStatement(sql);
47+
const rows = (await op.fetchAll()) as Array<Record<string, unknown>>;
48+
await op.close();
49+
await session.close();
50+
return rows.map((r) => r.v);
51+
} finally {
52+
await client.close();
53+
}
54+
}
55+
56+
describe('SEA INTERVAL edge cases end-to-end', function suite() {
57+
this.timeout(120_000);
58+
59+
const secrets = readSecrets();
60+
61+
before(function gate() {
62+
// eslint-disable-next-line no-invalid-this
63+
const self = this;
64+
if (!secrets) {
65+
self.skip();
66+
return;
67+
}
68+
// Skip (not error) when the native binding isn't built/installed.
69+
try {
70+
getSeaNative();
71+
} catch {
72+
self.skip();
73+
}
74+
});
75+
76+
it('NULL INTERVAL DAY-TIME → null', async () => {
77+
const values = await seaValues('SELECT CAST(NULL AS INTERVAL DAY TO SECOND) AS v', secrets as PecoSecrets);
78+
expect(values).to.deep.equal([null]);
79+
});
80+
81+
it('multi-row INTERVAL DAY-TIME batch formats every row', async () => {
82+
const values = await seaValues(
83+
"SELECT * FROM VALUES (INTERVAL '1' DAY), (INTERVAL '2 03:00:00' DAY TO SECOND), (INTERVAL '0' DAY) AS t(v)",
84+
secrets as PecoSecrets,
85+
);
86+
expect(values).to.deep.equal(['1 00:00:00.000000000', '2 03:00:00.000000000', '0 00:00:00.000000000']);
87+
});
88+
});

tests/e2e/sea/operation-lifecycle-e2e.test.ts

Lines changed: 18 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -101,13 +101,23 @@ describe('SEA operation lifecycle — end-to-end', function suite() {
101101
const token = process.env.DATABRICKS_PECOTESTING_TOKEN_PERSONAL || process.env.E2E_ACCESS_TOKEN;
102102

103103
before(function gate() {
104+
// eslint-disable-next-line no-invalid-this
105+
const self = this;
104106
if (!hostName || !httpPath || !token) {
105-
// eslint-disable-next-line no-invalid-this
106-
this.skip();
107+
self.skip();
108+
return;
109+
}
110+
// Creds present but the native binding not built/installed (e.g. a CI
111+
// runner without the .node) must SKIP, not error: probe getSeaNative()
112+
// here so every test isn't an uncaught-throw at its first call.
113+
try {
114+
getSeaNative();
115+
} catch {
116+
self.skip();
107117
}
108118
});
109119

110-
it('cancel() succeeds against a live SEA statement and is fast', async () => {
120+
it('cancel() succeeds against a live SEA statement', async () => {
111121
const binding = getSeaNative() as unknown as NativeBinding;
112122

113123
const connection = await binding.openSession({
@@ -130,12 +140,9 @@ describe('SEA operation lifecycle — end-to-end', function suite() {
130140
context: makeContext(),
131141
});
132142

133-
const t0 = Date.now();
143+
// Assert behavior (cancel resolves with success), not wall-clock latency
144+
// — a hard ms budget against a live warehouse is flaky on slow networks.
134145
const status = await op.cancel();
135-
const elapsed = Date.now() - t0;
136-
137-
// Cancel must complete within 200ms.
138-
expect(elapsed).to.be.lessThan(200, `cancel latency ${elapsed}ms exceeds 200ms budget`);
139146
expect(status.isSuccess).to.equal(true);
140147
} finally {
141148
// Bypass `op.close()` here because we want to verify cancel
@@ -170,10 +177,7 @@ describe('SEA operation lifecycle — end-to-end', function suite() {
170177
context: makeContext(),
171178
});
172179

173-
const t0 = Date.now();
174180
await op.cancel();
175-
const elapsed = Date.now() - t0;
176-
expect(elapsed).to.be.lessThan(200, `cancel latency ${elapsed}ms exceeds 200ms budget`);
177181

178182
// After cancel, fetchChunk must throw the cancellation error
179183
// (regardless of whether the underlying fetch implementation
@@ -247,17 +251,15 @@ describe('SEA operation lifecycle — end-to-end', function suite() {
247251
});
248252

249253
let ticks = 0;
250-
const t0 = Date.now();
251254
await op.waitUntilReady({
252255
callback: () => {
253256
ticks += 1;
254257
},
255258
});
256-
const elapsed = Date.now() - t0;
257259

258-
// M0 finished() is a no-op — must resolve in <50ms.
259-
expect(elapsed).to.be.lessThan(50);
260-
// Progress callback fires exactly once.
260+
// M0 finished() is a no-op that resolves immediately and fires the
261+
// progress callback exactly once. Assert the behavior, not a wall-clock
262+
// budget (flaky against a live warehouse).
261263
expect(ticks).to.equal(1);
262264
} finally {
263265
if (statement !== null) {

tests/unit/sea/SeaIntervalParity.test.ts

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -61,6 +61,8 @@ import { TimeUnit as FbTimeUnit } from 'apache-arrow/fb/time-unit';
6161

6262
import SeaOperationBackend from '../../../lib/sea/SeaOperationBackend';
6363
import ClientContextStub from '../.stubs/ClientContextStub';
64+
import { DURATION_UNIT_METADATA_KEY as CONVERTER_DURATION_KEY } from '../../../lib/result/ArrowResultConverter';
65+
import { DURATION_UNIT_METADATA_KEY as REWRITER_DURATION_KEY } from '../../../lib/sea/SeaArrowIpcDurationFix';
6466

6567
// ---------------------------------------------------------------------------
6668
// Test helpers.
@@ -363,4 +365,28 @@ describe('SeaOperationBackend — INTERVAL parity with thrift', () => {
363365
expect(rows).to.have.length(1);
364366
expect((rows[0] as any).iv).to.equal('1 00:00:00.000000000');
365367
});
368+
369+
it('YEAR-MONTH negative sub-year (-1 month) → "-0-1"', async () => {
370+
// |total| < 12 negatives are the trickiest formatYearMonth case (years
371+
// truncates to 0, the sign must still lead). Verified live byte-identical
372+
// to thrift.
373+
const fields = [withTypeName(new Field('iv', new Interval(IntervalUnit.YEAR_MONTH), true), 'INTERVAL')];
374+
const schema = new Schema(fields);
375+
const schemaIpc = ipcSchemaOnly(schema);
376+
const dataIpc = ipcFromColumns(schema, { iv: Int32Array.from([-1]) });
377+
378+
const stub = new StatementStub(schemaIpc, [dataIpc]);
379+
const backend = new SeaOperationBackend({ statement: stub, context: new ClientContextStub() });
380+
const rows = await backend.fetchChunk({ limit: 100 });
381+
expect(rows).to.have.length(1);
382+
expect((rows[0] as any).iv).to.equal('-0-1');
383+
});
384+
385+
it('the duration_unit metadata key is identical in the rewriter and the converter', () => {
386+
// Both modules declare the key independently (the converter avoids a
387+
// compile-time dependency on lib/sea). Pin them equal so a rename/typo in
388+
// one doesn't silently turn the converter's duration branch into a no-op.
389+
expect(CONVERTER_DURATION_KEY).to.equal(REWRITER_DURATION_KEY);
390+
expect(CONVERTER_DURATION_KEY).to.equal('databricks.arrow.duration_unit');
391+
});
366392
});

0 commit comments

Comments
 (0)