Clarify trace archival payload location. #7

Open
HumairAK wants to merge 1 commit into mlflow:main from HumairAK:update_archival_rfc


@HumairAK HumairAK commented Apr 7, 2026

This updates the trace archival RFC to keep mlflow.artifactLocation as the existing trace artifact root and to introduce a separate tag for archived traces.pb payloads.

The main reason for the change is that mlflow.artifactLocation already does more than point to trace JSON. It is also the location used for trace attachments. If we repoint that tag during archival, archived traces would start looking for attachments in the archive repo, even though those files were never moved there. That would make attachment reads fail unless archival also copied attachment files over.
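
The failure mode described above can be reproduced with a toy model. This is a minimal sketch, not MLflow code: the two repositories are plain dictionaries, and the URIs, file names (`traces.json`, `attachments/<id>`), and the `read_attachment` helper are all hypothetical. Only the tag name `mlflow.artifactLocation` comes from the RFC.

```python
# Toy model: two artifact repositories represented as dicts keyed by URI.
# URIs and file layout are illustrative assumptions, not MLflow's real layout.
ARTIFACT_TAG = "mlflow.artifactLocation"

repos = {
    "file:///traces/tr-1": {
        "traces.json": b"{}",
        "attachments/att-1": b"png-bytes",
    },
    # Archival only wrote the span payload; attachments were never copied.
    "s3://archive/tr-1": {"traces.pb": b"spans"},
}


def read_attachment(tags, attachment_id):
    """Hypothetical reader: look up the attachment under the tagged root."""
    root = tags[ARTIFACT_TAG]
    return repos[root].get(f"attachments/{attachment_id}")


tags = {ARTIFACT_TAG: "file:///traces/tr-1"}
before = read_attachment(tags, "att-1")

# Archival repoints the existing tag at the archive repository.
tags[ARTIFACT_TAG] = "s3://archive/tr-1"
after = read_attachment(tags, "att-1")

print(before)  # b'png-bytes'
print(after)   # None
```

Once the tag is repointed, the same lookup goes to a repository that never received the attachment, so the read comes back empty.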

I considered that alternative, but it adds extra work and more failure cases to the archival flow. Archival would need to discover, copy, and validate attachments before updating the trace location, and it would need to stay safe and retryable if part of that process failed. That is a lot of extra complexity for something we do not actually need.

Using a separate archive-location tag is simpler and safer. It preserves the current meaning of mlflow.artifactLocation, keeps existing attachment behavior intact, and makes the archived span payload location explicit. It also keeps the retrieval logic easier to understand: regular trace artifacts and attachments continue to use the existing tag, while archived span payloads use the new one.
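
The split retrieval logic could look roughly like the sketch below. The archive tag name `example.archiveLocation` is a placeholder chosen for illustration (the RFC defines the real name), and both helper functions are hypothetical rather than part of MLflow.

```python
ARTIFACT_TAG = "mlflow.artifactLocation"
# Placeholder name for the new tag; the actual name is defined by the RFC.
ARCHIVE_TAG = "example.archiveLocation"


def span_payload_root(tags):
    """Archived span payloads use the archive tag; otherwise the artifact root."""
    return tags.get(ARCHIVE_TAG) or tags[ARTIFACT_TAG]


def attachment_root(tags):
    """Attachments always resolve against the original artifact root."""
    return tags[ARTIFACT_TAG]


live = {ARTIFACT_TAG: "file:///traces/tr-1"}
# Archival adds the new tag and leaves the existing one untouched.
archived = {**live, ARCHIVE_TAG: "s3://archive/tr-1"}

print(span_payload_root(live))      # file:///traces/tr-1
print(span_payload_root(archived))  # s3://archive/tr-1
print(attachment_root(archived))    # file:///traces/tr-1
```

Because archival only adds a tag, attachment lookups behave the same before and after a trace is archived.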

Here's an example script illustrating the point:

attachments_example.py

from pathlib import Path

import mlflow
from mlflow.tracing.attachments import Attachment
from mlflow.tracing.client import TracingClient


# Local demo workspace: a SQLite tracking store plus a file-based artifact root.
base_dir = Path("./mlflow-sqlite-demo").resolve()
base_dir.mkdir(exist_ok=True)

db_path = base_dir / "mlflow.db"
artifacts_path = base_dir / "artifacts"
artifacts_path.mkdir(exist_ok=True)

mlflow.set_tracking_uri(f"sqlite:///{db_path}")
experiment_name = "attachment-demo-sqlite"

client = mlflow.MlflowClient()
experiment = client.get_experiment_by_name(experiment_name)
if experiment is None:
    experiment_id = client.create_experiment(
        experiment_name,
        artifact_location=artifacts_path.as_uri(),
    )
else:
    experiment_id = experiment.experiment_id

mlflow.set_experiment(experiment_name)

# Placeholder bytes that start with the PNG signature; not a decodable image.
image_bytes = b"\x89PNG\r\n\x1a\nfake-png-content"

with mlflow.start_span(name="demo-span") as span:
    span.set_inputs(
        {
            "prompt": "describe this image",
            "image": Attachment(content_type="image/png", content_bytes=image_bytes),
        }
    )
    trace_id = span.trace_id

# Read the trace back; the span input holds an attachment reference, not the raw bytes.
trace = mlflow.get_trace(trace_id)
root_span = trace.data.spans[0]
image_ref = root_span.inputs["image"]
parsed = Attachment.parse_ref(image_ref)

print(f"trace_id: {trace_id}")
print(f"stored image field in trace: {image_ref}")
print(f"parsed attachment id: {parsed['attachment_id']}")

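# Resolve the artifact repo for this trace and download the attachment from it.
# _get_artifact_repo_for_trace is a private helper and may change between releases.
tracing_client = TracingClient(mlflow.get_tracking_uri())
trace_info = tracing_client.get_trace_info(trace_id)
artifact_repo = tracing_client._get_artifact_repo_for_trace(trace_info)
stored_bytes = artifact_repo.download_trace_attachment(parsed["attachment_id"])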

print(f"trace artifact location: {trace_info.tags['mlflow.artifactLocation']}")
print(f"downloaded attachment bytes: {stored_bytes!r}")

if hasattr(artifact_repo, 'artifact_dir'):
    print(f"local artifact directory: {artifact_repo.artifact_dir}")

Keep mlflow.artifactLocation reserved for trace artifacts and attachments, and document a separate archive-location tag for archived traces so archival does not break attachment reads.

@mprahl mprahl left a comment

I think this makes sense. I also don't see a workable alternative, such as copying the attachments as part of trace archival, because the artifact repository can be user-provided.

mprahl commented Apr 7, 2026

@B-Step62 could you please take a look?
