[SPEC] Add relative paths to v4 spec #15630

Open
danielcweeks wants to merge 7 commits into apache:main from danielcweeks:relative-paths-spec

@danielcweeks
Contributor

Adds text to the spec for relative paths.

See full proposal at #13141

github-actions bot added the "Specification" label (issues that may introduce spec changes) on Mar 13, 2026
Contributor

@steveloughran left a comment

Commented a bit on `..` in path resolution.

It'd be a good test to submit a v3 manifest with `file:/tables/table1/../../etc/passwd` as a path and see whether relativizing it detects the invalid path at that point.
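
The suggested test could be sketched roughly as follows. This is a hypothetical illustration, not code from the PR: `strip_table_prefix` applies plain prefix removal, and `has_dot_segments` is an assumed validation helper. Whether the spec requires rejecting such paths is not stated in this thread.

```python
def strip_table_prefix(path: str, table_location: str) -> str:
    """Plain prefix removal (hypothetical helper, not from the PR)."""
    prefix = table_location + "/"
    return path[len(prefix):] if path.startswith(prefix) else path


def has_dot_segments(path: str) -> bool:
    """Assumed check: flag '.' or '..' path segments."""
    return any(segment in (".", "..") for segment in path.split("/"))


relative = strip_table_prefix(
    "file:/tables/table1/../../etc/passwd", "file:/tables/table1"
)
print(relative)                    # ../../etc/passwd
print(has_dot_segments(relative))  # True
```

Prefix removal alone accepts the path and leaves the `..` segments in the stored relative path, so any detection would have to be a separate step.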

@danielcweeks marked this pull request as ready for review on March 20, 2026, 20:13
Contributor

@rambleraptor left a comment

I've got a couple stylistic things to help improve readability

Contributor

@stevenzwu left a comment

overall, it looks good to me. just some minor comments/questions

Contributor

@wypoon left a comment

I feel that you're trying to avoid mentioning path separators, and that that makes things unclear and confusing. I feel that it makes more sense to say that table location does not end in a path separator, that relative paths do not begin with a path separator, and when appending relative paths, we need to add the path separators in the appropriate places.

Comment thread format/spec.md Outdated

All location fields in format versions 3 and prior contain fully-qualified paths.

Version 4 of the Iceberg spec adds support for relative locations in metadata, enabling tables to be relocated without rewriting metadata files. Relative locations are allowed in all metadata tracked location fields and are resolved against the table's base location. The table's location may be fixed in table metadata or inferred, but is intended to be managed and supplied by a catalog. Requirements for relativization and resolution are in [Relative Paths](#path-resolution)
Member

nit: missing . at the end.

Member

do you want to link to #paths-in-metadata?

Contributor

@zhjwpku commented May 20, 2026

+1, that includes Path Relativization and Path Resolution. Relative Paths is a little confusing if it does not link to the section with the same name.

Contributor

@kevinjqliu left a comment

LGTM

slight nit for clarification

Comment thread format/spec.md
Path relativization is the process of converting an absolute path to a relative path by removing the table location prefix. This is used when persisting paths to metadata files.

* If an absolute path starts with the table location immediately followed by a separator character, the relative path is the remainder of the string after the separator character.
* If an absolute path does not start with the table location immediately followed by the separator character, it is stored as an absolute path.
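
A minimal sketch of the two relativization rules above, assuming `/` as the separator character. The function name `relativize` and the example locations are hypothetical and are not taken from the spec or any Iceberg implementation.

```python
def relativize(path: str, table_location: str, separator: str = "/") -> str:
    """Strip the table location prefix when the path sits under it."""
    prefix = table_location + separator
    if path.startswith(prefix):
        # Rule 1: keep only the remainder after the separator.
        return path[len(prefix):]
    # Rule 2: the path is outside the table location; store it as-is.
    return path


table = "s3://bucket/db/tbl"
print(relativize("s3://bucket/db/tbl/data/f.parquet", table))   # data/f.parquet
print(relativize("s3://bucket/db/tbl2/data/f.parquet", table))  # unchanged
```

The second call shows why the separator matters: `s3://bucket/db/tbl2/...` starts with the table location string but not with the location followed by `/`, so it stays absolute.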
Contributor

nit: It might be helpful to explicitly highlight "stored as an absolute path without modification".

Contributor Author

I'm not sure we need to clarify. I think that's important to say for consuming from metadata, but how the persisted path is arrived at is different. If something is producing the path, it can pretty much do whatever it wants with the structure as long as it's absolute when it's first persisted.

This is a little nuanced, but I think it would be overreaching in this particular context.

Comment thread format/spec.md

### Paths in Metadata

Path strings stored in Iceberg metadata location fields are classified as one of two types:
Contributor

Nit: There are a few references to "fully qualified path" later in the context of v3 and prior, without it being explicitly defined. Since we're classifying paths into two types below, it might be worth briefly noting that fully qualified paths from v3 and prior are considered absolute paths. This could help connect the dots more easily.

Contributor Author

We don't want to do this (see other comments on this topic). We don't want to go back and define things that weren't defined for prior versions since it could introduce additional requirements on older versions. The prior spec only referred to "fully-qualified" and "URI with Scheme" for fields and we're not trying to rewrite those versions of the spec.

Contributor

Thanks Dan, found it in #15630 (comment)

Comment thread format/spec.md Outdated
* If the path contains a URI scheme, it is absolute and is used without modification.
* If the path does not contain a URI scheme, the resolved path is the table location followed by the relative path joined by the URI separator character `/`.
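
A minimal sketch of the two resolution rules above. The function name `resolve` is hypothetical, and scheme detection via a regular expression modeled on the RFC 3986 scheme grammar is an assumption; the excerpt does not say how an implementation should detect a scheme.

```python
import re

# Assumption: a scheme is a letter followed by letters, digits, '+', '.', or '-',
# terminated by ':' (the RFC 3986 scheme grammar).
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def resolve(path: str, table_location: str) -> str:
    """Resolve a stored metadata path against the table location."""
    if _SCHEME.match(path):
        # Absolute path: used without modification.
        return path
    # Relative path: table location, then '/', then the relative path.
    return table_location + "/" + path


table = "s3://bucket/db/tbl"
print(resolve("data/f.parquet", table))         # s3://bucket/db/tbl/data/f.parquet
print(resolve("s3://other/x/f.parquet", table))  # s3://other/x/f.parquet
```

Note that this regex would also treat a relative file name such as `a:b.parquet` as having a scheme, so a real implementation may need a stricter test.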

The relative portion is joined to the prefix (table location) without consideration of any additional separator characters. The recommended convention for table location is to not end in a path separator because the join process would add a second separator character. (See example below).
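
To illustrate the join behavior described in this paragraph, here is a small sketch with a hypothetical `join` helper and made-up locations. It always inserts exactly one `/` and never strips or de-duplicates separators already present on either side.

```python
def join(table_location: str, relative_path: str) -> str:
    """Join without inspecting existing separators (hypothetical helper)."""
    return table_location + "/" + relative_path


# Recommended convention: no trailing separator on the table location.
print(join("s3://bucket/tbl", "data/f.parquet"))   # s3://bucket/tbl/data/f.parquet

# A trailing separator on the location yields a doubled separator.
print(join("s3://bucket/tbl/", "data/f.parquet"))  # s3://bucket/tbl//data/f.parquet

# A leading separator on the relative path does the same.
print(join("s3://bucket/tbl", "/data/f.parquet"))  # s3://bucket/tbl//data/f.parquet
```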
Member

I'm not sure we need the examples for duplicate separators. I think that's pretty straightforward?

Contributor Author

Others asked for this explicitly to show what is expected if you suffix/prefix with a separator and what the behavior would look like. The point is to show that you do not de-dup or strip them.

Comment thread format/spec.md Outdated

#### Table Location Specification

When the `location` field is present in table metadata, it is used directly as the table's base location. When the `location` field is not present (v4 and later), the table location must be provided. How the table location is persisted or determined when not specified in metadata is not a table-level concern; catalogs should provide a table's location
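
The rule in this paragraph could be sketched as below. The function `base_location`, its parameters, and the dictionary representation of table metadata are assumptions for illustration. The excerpt does not say what happens when both a metadata `location` and an externally supplied location exist, so this sketch simply prefers the metadata field.

```python
from typing import Optional


def base_location(metadata: dict, provided_location: Optional[str] = None) -> str:
    """Pick the table's base location (hypothetical helper)."""
    location = metadata.get("location")
    if location is not None:
        # The metadata field is used directly when present.
        return location
    if provided_location is None:
        # v4 and later: without the field, a location must be supplied,
        # for example by a catalog.
        raise ValueError("table location must be provided when absent from metadata")
    return provided_location


print(base_location({"location": "s3://bucket/tbl"}))           # s3://bucket/tbl
print(base_location({}, provided_location="s3://catalog/tbl"))  # s3://catalog/tbl
```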
Member

Suggested change
- When the `location` field is present in table metadata, it is used directly as the table's base location. When the `location` field is not present (v4 and later), the table location must be provided. How the table location is persisted or determined when not specified in metadata is not a table-level concern; catalogs should provide a table's location
+ When the `location` field is present in table metadata, it is used directly as the table's base location. When the `location` field is not present (v4 and later), the table location must be maintained and provided by the catalog.

Contributor Author

We don't want to restrict this to catalogs only.

Please see this comment: #15630 (comment)

Contributor

@nastra left a comment

overall LGTM

Comment thread format/spec.md

The relative portion is joined to the prefix (table location) without consideration of any additional separator characters. The recommended convention for table location is to not end in a path separator because the join process would add a second separator character. (See example below.)

Paths in manifests produced prior to v4 are fully-qualified and must be produced with a URI scheme if the scheme was omitted to be consistent with V4 paths.
Contributor

nit: I had to read this sentence multiple times to understand it

Member

Me too ;), might need a comma somewhere.

Comment thread format/spec.md

The relative portion is joined to the prefix (table location) without consideration of any additional separator characters. The recommended convention for table location is to not end in a path separator because the join process would add a second separator character. (See example below.)

Paths in manifests produced prior to v4 are fully-qualified and must be produced with a URI scheme if the scheme was omitted to be consistent with V4 paths.
Member

It looks like we use v4 instead of V4 in all other places.

rdblue

This comment was marked as off-topic.
