Commit 413fa07

fix(cli): surface uncaught exception as user error (FAILED) not system failure
Following on from the prior commit that wired UNCAUGHT_EXCEPTION to fail the attempt: the parseExecuteError branch returned an INTERNAL_ERROR with code TASK_EXECUTION_FAILED, which made the run show as "System failure" in the dashboard.

The exception was raised by user code (or a dependency the user controls — e.g. an EventEmitter "error" event with no listener), so it should surface as a regular task failure ("Failed" status), not as a platform fault.

Widen parseExecuteError's return to TaskRunError and have the UncaughtExceptionError branch return a BUILT_IN_ERROR carrying the original error name, message, and stack. This routes through the same finalization path as a thrown user error: status=FAILED, normal retry policy, catchError / handleError hooks fire as expected.

Both call sites (managed/execution.ts, dev-run-controller.ts) already pass the result into TaskRunFailedExecutionResult.error, which accepts the full TaskRunError union — no caller-side changes needed.
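The routing described in the commit message hinges on the `type` discriminant of the returned error. The following is a minimal sketch under stated assumptions, not the actual implementation: `SketchTaskRunError`, `CapturedError`, `toSketchRunError`, and `dashboardStatus` are hypothetical names, the union is reduced to two variants, and the status mapping is simplified from the behavior described above.

```typescript
// Simplified, hypothetical stand-ins for the TaskRunError union.
// The real types carry more variants and fields.
type SketchInternalError = {
  type: "INTERNAL_ERROR";
  code: string;
  message?: string;
  stackTrace?: string;
};

type SketchBuiltInError = {
  type: "BUILT_IN_ERROR";
  name: string;
  message: string;
  stackTrace: string;
};

type SketchTaskRunError = SketchInternalError | SketchBuiltInError;

// Hypothetical shape of the error captured from the worker process.
type CapturedError = { name: string; message: string; stack?: string };

// Mirrors the new branch: the original error is passed through unchanged,
// with an empty string standing in for a missing stack.
function toSketchRunError(original: CapturedError): SketchTaskRunError {
  return {
    type: "BUILT_IN_ERROR",
    name: original.name,
    message: original.message,
    stackTrace: original.stack ?? "",
  };
}

// Assumed, simplified mapping from error type to the dashboard label.
function dashboardStatus(error: SketchTaskRunError): "Failed" | "System failure" {
  return error.type === "INTERNAL_ERROR" ? "System failure" : "Failed";
}

const surfaced = toSketchRunError({ name: "Error", message: "read ECONNRESET" });
console.log(dashboardStatus(surfaced));
```

Under these assumptions, changing only the discriminant is enough to move the run from one label to the other, which is consistent with the commit needing no caller-side changes.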
1 parent ae48697 commit 413fa07

3 files changed

Lines changed: 26 additions & 19 deletions

.changeset/uncaught-exception-fail-attempt.md

Lines changed: 1 addition & 5 deletions
@@ -2,8 +2,4 @@
 "trigger.dev": patch
 ---

-Fix runs hanging to `MAX_DURATION_EXCEEDED` after an uncaught exception. When a Node `EventEmitter` (e.g. `node-redis`) emits an `"error"` event with no listener attached, Node escalates it to `process.on("uncaughtException")` in the task worker. The worker reported the error via the `UNCAUGHT_EXCEPTION` IPC event but did not exit, and the supervisor-side handler in `taskRunProcess` only logged the message at debug level — leaving the `run()` promise orphaned until `maxDuration` fired and producing empty attempts (`durationMs=0`, `costInCents=0`).
-
-The supervisor now rejects the in-flight attempt with an `UncaughtExceptionError` and gracefully terminates the worker (preserving the OTEL flush window) on `UNCAUGHT_EXCEPTION`. The attempt fails fast with `TASK_EXECUTION_FAILED`, surfacing the original error name, message, and stack trace, and falls under the normal retry policy. This mirrors the existing indexing-side behavior. Apply the same handling to unhandled promise rejections, which Node already routes through `uncaughtException` by default.
-
-Customers should still attach `client.on("error", ...)` listeners to long-lived clients (Redis, Postgres, etc.) and let awaited command rejections drive failure semantics — but a missed listener will no longer silently consume the entire `maxDuration` budget.
+Fail attempts on uncaught exceptions instead of hanging to `MAX_DURATION_EXCEEDED`. A Node `EventEmitter` (e.g. `node-redis`) emitting `"error"` with no `.on("error", ...)` listener escalates to `uncaughtException`, which the worker previously reported but did not act on — runs drifted to maxDuration with empty attempts. They now fail fast with the original error and status `FAILED`. You should still attach `.on("error", ...)` listeners to long-lived clients to handle errors gracefully.
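The escalation the changeset describes can be reproduced with a bare `EventEmitter`. This is a minimal sketch, assuming Node's built-in `node:events` module; `emitError`, `unguarded`, and `guarded` are hypothetical names. The `try`/`catch` is here only to make the throw observable. In a real client the `"error"` event is typically emitted from an I/O callback with no `try`/`catch` on the stack, so the thrown error reaches `uncaughtException`.

```typescript
import { EventEmitter } from "node:events";

// Emits "error" and reports what happened: returns the thrown value when no
// listener is attached, or undefined when a listener received the error.
function emitError(emitter: EventEmitter, err: Error): unknown {
  try {
    emitter.emit("error", err);
    return undefined;
  } catch (thrown) {
    return thrown;
  }
}

const unguarded = new EventEmitter(); // no "error" listener attached
const guarded = new EventEmitter();
const handled: Error[] = [];
guarded.on("error", (err: Error) => {
  handled.push(err);
});

const boom = new Error("read ECONNRESET");
const escaped = emitError(unguarded, boom); // emit() throws the error itself
const contained = emitError(guarded, boom); // the listener absorbs the error

console.log(escaped === boom, contained === undefined, handled.length);
```

Attaching the listener is what keeps the error inside application code, which is why the changeset still recommends it even though a missed listener now fails the attempt quickly.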

packages/cli-v3/src/executions/taskRunProcess.test.ts

Lines changed: 14 additions & 9 deletions
@@ -120,7 +120,7 @@ describe("TaskRunProcess", () => {
   });

   describe("parseExecuteError(UncaughtExceptionError)", () => {
-    it("surfaces the original error name/message/stack as TASK_EXECUTION_FAILED", () => {
+    it("surfaces the original error as a BUILT_IN_ERROR so the run shows as Failed, not System failure", () => {
       const error = new UncaughtExceptionError(
         {
           name: "Error",
@@ -133,22 +133,27 @@ describe("TaskRunProcess", () => {

       const result = TaskRunProcess.parseExecuteError(error);

-      expect(result.type).toBe("INTERNAL_ERROR");
-      expect(result.code).toBe("TASK_EXECUTION_FAILED");
-      expect(result.message).toBe("Uncaught uncaughtException: read ECONNRESET");
-      expect(result.stackTrace).toContain("TCP.onStreamRead");
+      expect(result.type).toBe("BUILT_IN_ERROR");
+      if (result.type === "BUILT_IN_ERROR") {
+        expect(result.name).toBe("Error");
+        expect(result.message).toBe("read ECONNRESET");
+        expect(result.stackTrace).toContain("TCP.onStreamRead");
+      }
     });

-    it("preserves origin=unhandledRejection in the surfaced message", () => {
+    it("preserves the original error for unhandledRejection origin too", () => {
       const error = new UncaughtExceptionError(
-        { name: "Error", message: "boom" },
+        { name: "TypeError", message: "boom" },
         "unhandledRejection"
       );

       const result = TaskRunProcess.parseExecuteError(error);

-      expect(result.code).toBe("TASK_EXECUTION_FAILED");
-      expect(result.message).toBe("Uncaught unhandledRejection: boom");
+      expect(result.type).toBe("BUILT_IN_ERROR");
+      if (result.type === "BUILT_IN_ERROR") {
+        expect(result.name).toBe("TypeError");
+        expect(result.message).toBe("boom");
+      }
     });
   });
 });
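The updated tests wrap their field assertions in `if (result.type === "BUILT_IN_ERROR")` because the return type is now a union and `name` exists on only one variant. A minimal sketch of that narrowing, assuming a hypothetical two-variant `NarrowingSketchError` in place of the real `TaskRunError`:

```typescript
// Hypothetical two-variant union standing in for TaskRunError.
type NarrowingSketchError =
  | { type: "INTERNAL_ERROR"; code: string }
  | { type: "BUILT_IN_ERROR"; name: string; message: string; stackTrace: string };

// Each branch can only read the fields of the variant it has narrowed to.
function describeError(error: NarrowingSketchError): string {
  if (error.type === "BUILT_IN_ERROR") {
    // Narrowed: name and message are visible to the type checker here.
    return `${error.name}: ${error.message}`;
  }
  // Narrowed to INTERNAL_ERROR: only code is available.
  return `internal: ${error.code}`;
}

const builtIn = describeError({
  type: "BUILT_IN_ERROR",
  name: "TypeError",
  message: "boom",
  stackTrace: "",
});
const internal = describeError({ type: "INTERNAL_ERROR", code: "TASK_EXECUTION_FAILED" });

console.log(builtIn);
console.log(internal);
```

Without the guard, reading `result.name` on the union would be a compile-time error, so the `if` in the tests serves the type checker as much as the runtime.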

packages/cli-v3/src/executions/taskRunProcess.ts

Lines changed: 11 additions & 5 deletions
@@ -8,6 +8,7 @@ import {
   TaskRunExecution,
   TaskRunExecutionPayload,
   TaskRunExecutionResult,
+  type TaskRunError,
   type TaskRunInternalError,
   tryCatch,
   WorkerManifest,
@@ -555,7 +556,7 @@ export class TaskRunProcess {
     return this._child.connected;
   }

-  static parseExecuteError(error: unknown, dockerMode = true): TaskRunInternalError {
+  static parseExecuteError(error: unknown, dockerMode = true): TaskRunError {
     if (error instanceof CancelledProcessError) {
       return {
         type: "INTERNAL_ERROR",
@@ -590,11 +591,16 @@ export class TaskRunProcess {
     }

     if (error instanceof UncaughtExceptionError) {
+      // Surface the customer's original error as a regular task failure (user
+      // error → "Failed" status) rather than an internal error → "System
+      // failure" status. The exception was raised by user code (or a
+      // dependency the user controls, e.g. an EventEmitter "error" event with
+      // no listener); it isn't a platform fault.
       return {
-        type: "INTERNAL_ERROR",
-        code: TaskRunErrorCodes.TASK_EXECUTION_FAILED,
-        message: `Uncaught ${error.origin}: ${error.originalError.message}`,
-        stackTrace: error.originalError.stack,
+        type: "BUILT_IN_ERROR",
+        name: error.originalError.name,
+        message: error.originalError.message,
+        stackTrace: error.originalError.stack ?? "",
       };
     }
