Diff: `rfcs/0001-trace-archival/0001-trace-archival.md` — 12 additions, 12 deletions

**Current OSS behavior:** In the current open-source version of MLflow, traces are always written to the database. The server persists the trace record and span content in the tracking store (e.g. `SqlAlchemyStore`); span location is `TRACKING_STORE` and full trace/span data is stored in the DB.

**Existing trace artifact upload path (V3 / artifact-backed traces):** When a trace is created (e.g. via the MLflow V3 exporter in `mlflow/tracing/export/mlflow_v3.py` or the inference-table exporter), the server creates the trace record in the tracking store and sets a tag `MLFLOW_ARTIFACT_LOCATION` on the trace with the URI where trace artifacts should be stored. That URI is the experiment's artifact location plus `/traces/<trace_id>/artifacts/` (see `SqlAlchemyStore._get_trace_artifact_location_tag`). The **client** then uploads the full trace payload as a single JSON file named `traces.json` to that URI using the artifact repository (`ArtifactRepository.upload_trace_data` in `mlflow/store/artifact/artifact_repo.py`). The client uses `get_artifact_uri_for_trace(trace_info)` from `mlflow/tracing/utils/artifact_utils.py` to resolve the URI from the trace's tags and performs the upload; when not in proxy mode, the client therefore needs credentials for the artifact store. When the UI or API needs full trace data, the same URI is used to download and parse `traces.json`. The same trace artifact root is also used for trace attachments (`attachments/<attachment_id>`). This path is used when the server marks the trace's span location as `ARTIFACT_REPO` (via the `mlflow.trace.spansLocation` tag), so span content is read from the artifact store rather than the DB. This mechanism exists in OSS today and coexists with DB-backed span storage (e.g. spans written via `log_spans`).
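The client-side resolution step can be sketched with simplified stand-ins (the `TraceInfo` shape here is illustrative; the real helper lives in `mlflow/tracing/utils/artifact_utils.py` and operates on MLflow's trace entities):

```python
from dataclasses import dataclass, field

# Simplified stand-in for the trace entity; the tag key mirrors the one the
# server sets at trace creation time.
MLFLOW_ARTIFACT_LOCATION = "mlflow.artifactLocation"

@dataclass
class TraceInfo:
    trace_id: str
    tags: dict = field(default_factory=dict)

def get_artifact_uri_for_trace(trace_info: TraceInfo) -> str:
    """Resolve the trace artifact root from the trace's tags, as described above."""
    try:
        return trace_info.tags[MLFLOW_ARTIFACT_LOCATION]
    except KeyError:
        raise ValueError("Trace is missing the artifact-location tag")

# The server sets the tag to <experiment artifact location>/traces/<trace_id>/artifacts/
info = TraceInfo(
    trace_id="tr-123",
    tags={MLFLOW_ARTIFACT_LOCATION: "s3://bucket/exp-1/traces/tr-123/artifacts/"},
)
upload_target = get_artifact_uri_for_trace(info)  # client uploads traces.json here
```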

**Relationship to the proposed archival:** Today there are already two trace storage paths in OSS: (1) DB-backed spans written via `log_spans` (span content in the `spans` table), and (2) artifact-backed `traces.json` payloads uploaded by V3-style exporters to the artifact repository. The proposed archival mechanism is a third, separate path: it stores span content in OTLP protobuf format at a configurable archival repository (`--trace-archival-location`), which may or may not be the same as the artifact store. The proposed `traces.pb` archival is for offloading span content from the DB for traces that were initially written there via the DB-backed path. These three mechanisms can coexist.

- **`ARCHIVE_REPO`** (new): Span content has been archived to the archival repository. The `spans.content` column is cleared for archived traces; span metadata rows are retained for index-based filtering. Retrieval loads span data from the archival repository.
- **`ARTIFACT_REPO`** (existing): Retained for existing behavior (e.g. V3 API traces whose spans are stored in the artifact store). Distinct from `ARCHIVE_REPO` for the proposed OTLP archival repository.

Traces stay in `TRACKING_STORE` until archival completes (export to the archival repository + clear DB content + set `mlflow.trace.spansLocation` to `ARCHIVE_REPO` and record the archive URI). If the process crashes mid-archival, the trace remains in `TRACKING_STORE` and will be selected again on the next run (re-export overwrites the file, then clear and set tags).

Archived traces should use a **separate trace tag** such as `mlflow.trace.archiveLocation` to record the archival repository URI for `traces.pb`. The existing `mlflow.artifactLocation` tag must continue to point at the trace's ordinary artifact root so that existing artifact-backed traces and trace attachments keep working unchanged. The `mlflow.trace.spansLocation` tag then identifies which location to use for span payload retrieval: `ARTIFACT_REPO` uses `mlflow.artifactLocation`, while `ARCHIVE_REPO` uses `mlflow.trace.archiveLocation`. The **effective archival repository root** for a given trace is the workspace's `trace_archival_location` when set, otherwise the server's global `--trace-archival-location` (or default artifact root).
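A minimal sketch of this tag-based resolution, assuming the tag keys above (`resolve_span_payload_uri` is a hypothetical name, not an existing MLflow function):

```python
from typing import Optional

# Tag keys as proposed in this RFC.
SPANS_LOCATION = "mlflow.trace.spansLocation"
ARTIFACT_LOCATION = "mlflow.artifactLocation"
ARCHIVE_LOCATION = "mlflow.trace.archiveLocation"

def resolve_span_payload_uri(tags: dict) -> Optional[str]:
    """Pick the URI to read span payloads from, based on the spansLocation tag."""
    location = tags.get(SPANS_LOCATION, "TRACKING_STORE")  # missing => DB-backed
    if location == "TRACKING_STORE":
        return None  # span content lives in the spans table; nothing to download
    if location == "ARTIFACT_REPO":
        return tags[ARTIFACT_LOCATION]  # existing artifact-backed traces.json path
    if location == "ARCHIVE_REPO":
        return tags[ARCHIVE_LOCATION]  # archived traces.pb path
    raise ValueError(f"unknown spans location: {location}")

archived = {
    SPANS_LOCATION: "ARCHIVE_REPO",
    ARTIFACT_LOCATION: "s3://bucket/exp-1/traces/tr-1/artifacts/",
    ARCHIVE_LOCATION: "s3://archive/exp-1/traces/tr-1/artifacts/traces.pb",
}
```

Keeping the two URIs in separate tags means attachments keep resolving through `mlflow.artifactLocation` regardless of whether the span payload has been archived.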

**Workspaces table: per-workspace archival repository and retention overrides.** Add a column to the `workspaces` table to store an optional trace archival location (server-side configuration) for each workspace. When set, it overrides the server's global archival repository for that workspace (for both archival write and retrieval). This supports multi-tenant deployments where different workspaces use different trace storage locations.
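A self-contained sqlite sketch of the schema change (the real change would be an Alembic migration against the tracking DB; column names follow this RFC, everything else is illustrative):

```python
import sqlite3

# In-memory stand-in for the tracking database's workspaces table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workspaces (id INTEGER PRIMARY KEY, name TEXT)")

# Proposed migration: two nullable TEXT columns for per-workspace overrides.
conn.execute("ALTER TABLE workspaces ADD COLUMN trace_archival_location TEXT")
conn.execute("ALTER TABLE workspaces ADD COLUMN trace_archival_retention TEXT")

# NULL means "fall back to the server's global --trace-archival-location".
conn.execute(
    "INSERT INTO workspaces (name, trace_archival_location) VALUES (?, ?)",
    ("team-a", "s3://team-a-archive/"),
)
row = conn.execute(
    "SELECT trace_archival_location, trace_archival_retention FROM workspaces"
).fetchone()
```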

At the start of each scheduler pass:
2. **Process emergency archival first:** Check each workspace for experiments marked with the experiment tag `mlflow.trace.archiveNow` and process those experiments before ordinary retention-based work. When the tag value is `{}` or otherwise omits `older_than`, all traces in `TRACKING_STORE` for that experiment are eligible. When the tag value includes `older_than`, only traces older than that threshold are eligible.
3. **Resolve the effective retention policy:** For each experiment considered for normal archival, resolve retention from server, workspace, and experiment settings using the inheritance rules described above.
4. **Select traces to archive:** Query `trace_info` for traces older than the effective threshold whose `mlflow.trace.spansLocation` tag is `TRACKING_STORE` (or missing, treated as `TRACKING_STORE`).
5. **Export span content:** For each trace, read all spans from the `spans` table, convert to `TracesData` protobuf, and write to the archival repository. The archival repository root used for the write is the workspace's `trace_archival_location` (if set) for that trace's experiment/workspace, else the server's global trace archival location. Record the resulting archive URI in a dedicated trace tag such as `mlflow.trace.archiveLocation`, while leaving `mlflow.artifactLocation` unchanged.
6. **Clear DB span content and set tags:** Update `spans.content` to an empty string, set `mlflow.trace.spansLocation = SpansLocation.ARCHIVE_REPO`, and persist the archive URI tag from step 5.
7. **Clear the emergency tag:** If the archival was triggered by `archive now` and the emergency archival pass completes successfully, clear the tag so it behaves as a one-shot failsafe.
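Steps 4–6 can be sketched as a toy in-memory pass (all names here are hypothetical; the real implementation reads the `spans` table and writes OTLP protobuf to object storage):

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    trace_id: str
    age_days: int
    content: str = "span-bytes"  # stands in for spans.content rows
    tags: dict = field(default_factory=dict)

archive_repo: dict = {}  # stands in for the archival repository

def archive_pass(traces, older_than_days, archive_root):
    for t in traces:
        # Step 4: only TRACKING_STORE traces past the threshold are candidates.
        if t.tags.get("mlflow.trace.spansLocation", "TRACKING_STORE") != "TRACKING_STORE":
            continue
        if t.age_days <= older_than_days:
            continue
        # Step 5: export span content and record the archive URI tag.
        uri = f"{archive_root}/{t.trace_id}/artifacts/traces.pb"
        archive_repo[uri] = t.content
        t.tags["mlflow.trace.archiveLocation"] = uri
        # Step 6: clear DB span content and flip the location tag last, so a
        # crash between steps leaves the trace re-selectable.
        t.content = ""
        t.tags["mlflow.trace.spansLocation"] = "ARCHIVE_REPO"

traces = [Trace("tr-old", age_days=40), Trace("tr-new", age_days=5)]
archive_pass(traces, older_than_days=30, archive_root="s3://archive/exp-1/traces")
```

Ordering the export before the tag flip is what makes the crash-recovery story below work: an interrupted pass only ever leaves behind an extra (overwritable) file, never a trace that claims to be archived without a payload.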

**Crash recovery:** Traces remain in `TRACKING_STORE` until step 6 completes. If the process crashes after step 5 but before step 6, those traces are still `TRACKING_STORE` and will be selected again on the next run. The archival then re-exports (overwriting the file) and completes step 6. Re-running on traces already in `ARCHIVE_REPO` is a no-op (they are not candidates). If an emergency-tagged experiment does not complete successfully, the tag should remain so a later scheduler pass retries the emergency archival, unless the only remaining non-archived traces are terminal failures marked with `mlflow.trace.archivalFailure`.

#### Retrieval Changes

Retrieval of archived trace data should dispatch based on the `mlflow.trace.spansLocation` tag. A shared `load_archived_spans()` function in `mlflow/tracing/archive_repo.py` resolves the archive URI from the trace tags and downloads the `traces.pb` file from the archival repository.

`mlflow.artifactLocation` remains the source of truth for trace attachments and existing artifact-backed `traces.json` payloads. Archived payload resolution should instead use `mlflow.trace.archiveLocation`, avoiding any need to move or rewrite attachments during archival.

```python
# Server handler (mlflow/server/handlers.py) — retrieval dispatch:
if trace_data is None:
    # Dispatch on the mlflow.trace.spansLocation tag (elided in this diff).
    ...

# Archival repository loader (mlflow/tracing/archive_repo.py):
def load_archived_spans(store, trace_info: TraceInfo) -> list[Span]:
    uri = get_archive_uri_for_trace(trace_info)
    artifact_repo = get_artifact_repository(uri)
    return artifact_repo.download_trace_data_pb()
```

#### Trace Deletion and Archived File Cleanup

When traces are permanently deleted (via `mlflow traces delete` or `_delete_traces`), the implementation must also clean up archived span files in the archival repository. For traces whose `mlflow.trace.spansLocation` tag is `ARCHIVE_REPO`, the corresponding `<archive-root>/<experiment_id>/traces/<trace_id>/artifacts/traces.pb` file must be deleted using the URI stored in `mlflow.trace.archiveLocation`. The trace's regular artifact root (`mlflow.artifactLocation`) remains responsible for attachments and any existing `traces.json` payloads. Deletion of the archived file should be best-effort: if the repository is unavailable, the DB records should still be deleted (the orphaned file is harmless and can be cleaned up later by a separate sweep).
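The best-effort ordering can be sketched as follows (helper names are hypothetical; the point is that a failing archive-file delete never blocks the DB delete):

```python
import logging

logger = logging.getLogger(__name__)

def delete_trace(trace_tags, delete_db_rows, delete_archive_file):
    """Delete a trace; archived-file cleanup must not block DB deletion."""
    archive_uri = trace_tags.get("mlflow.trace.archiveLocation")
    if trace_tags.get("mlflow.trace.spansLocation") == "ARCHIVE_REPO" and archive_uri:
        try:
            delete_archive_file(archive_uri)
        except Exception:
            # Orphaned file is harmless; a later sweep can remove it.
            logger.warning("could not delete archived file %s", archive_uri)
    delete_db_rows()
    return True

def failing_delete(uri):  # simulate an unavailable archival repository
    raise OSError("repo unavailable")

deleted = []
ok = delete_trace(
    {"mlflow.trace.spansLocation": "ARCHIVE_REPO",
     "mlflow.trace.archiveLocation": "s3://archive/exp-1/traces/tr-1/artifacts/traces.pb"},
    delete_db_rows=lambda: deleted.append("tr-1"),
    delete_archive_file=failing_delete,
)
```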

### Option 2: Database Partitioning + Cold Table

| Component | Effort | Changes |
| :------------------- | :------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `SqlAlchemyStore` | Moderate | Add `SpansLocation` tag awareness to `get_trace`, `batch_get_traces`; add `_get_archival_repository_uri`, `_load_spans_from_archival_repository`; add archival methods; when selecting candidates, resolve effective retention from server, workspace, and experiment policy (see Archival Process). |
| `FileStore` | None | Not supported for archival (archival requires `SqlAlchemyStore`); no changes needed |
| `AbstractStore` | Moderate | Add archival primitives used by the server-owned scheduler (`collect_archive_candidates`, `read_trace_for_archive`, `mark_trace_archived`, `find_archived_trace_uris`) |
| Server jobs / Huey | Moderate | Add a periodic archival task that resolves policy, prioritizes emergency `archive now` work, randomizes workspace order when workspaces are enabled, and skips overlapping runs via the existing lock-based scheduler pattern |
| `mlflow server` CLI | Low | Add `--trace-archival-location` and server-owned archival policy configuration |
| REST API handlers | Low | No new client-triggered archival endpoint is required, but existing retrieval handlers gain dispatch logic for archived traces |
| Proto definitions | Low | Extend `SpansLocation` enum (`ARCHIVE_REPO`) and expose the new `mlflow.trace.archiveLocation` tag alongside `mlflow.trace.spansLocation` in `TraceInfoV3` |
| DB migrations | Low | Alembic migration: add `trace_archival_location` (TEXT, nullable) and `trace_archival_retention` (TEXT, nullable) on `workspaces` for per-workspace archival repository and retention overrides. |
| `search_traces` | Low | Trace-level and column-backed span filters continue to work from retained DB metadata; JSON-based span filters naturally exclude archived traces once `spans.content` is cleared |
| Python client | Low | `get_trace` / `search_traces` APIs unchanged; no new archival trigger API required |
**Key characteristics of `repository` mode:**

- Span content is written to the archival repository using the same OTLP protobuf format and file layout as archival (`<archive-root>/<experiment_id>/traces/<trace_id>/artifacts/traces.pb`)
- The `mlflow.trace.spansLocation` tag is set to `ARCHIVE_REPO` at ingestion time (no intermediate `TRACKING_STORE` state), and the archive payload URI is recorded in `mlflow.trace.archiveLocation`
- Trace-level metadata (trace info, tags, request metadata, metrics) is still stored in the DB for search and filtering
- No span-level search is available (trace-level search only)
- Retention policies do not apply (there is no span content in the DB to archive)