Skip to content

Fix typos, grammar, and comment inconsistencies in parquet.thrift#573

Open
iemejia wants to merge 2 commits into
apache:masterfrom
iemejia:fix/thrift-comments
Open

Fix typos, grammar, and comment inconsistencies in parquet.thrift#573
iemejia wants to merge 2 commits into
apache:masterfrom
iemejia:fix/thrift-comments

Conversation

@iemejia
Copy link
Copy Markdown
Member

@iemejia iemejia commented Jun 2, 2026

Summary

Fix typos, grammar, and comment inconsistencies in the canonical Thrift schema definition.

Changes

  • Fix typos: "to be be", "documention", "not necessary"
  • Remove off-by-one in DataPageHeaderV2 is_compressed comment
  • Fix article agreement ("a element" -> "an element", "a OffsetIndex" -> "an OffsetIndex")
  • Disambiguate compressed_page_size comment in PageLocation (it includes the header; the field of the same name on PageHeader does not)
  • Fix "edges interpolation" -> "edge interpolation" in Geospatial comments
  • Capitalize proper nouns: Hive, Pig; normalize GZIP casing
  • Add terminal periods for consistency
  • Clarify BIT_PACKED is superseded by RLE (cross-reference Encodings.md)
  • Missing space before parenthesis in frameworks list

Validation

Thrift definition compiles cleanly after all changes. No semantic/behavioral changes to the format specification.

Split from #572 for easier review.

- Fix typos: "to be be", "documention", "not necessary"
- Remove off-by-one in DataPageHeaderV2 is_compressed comment
- Fix article agreement ("a element" -> "an element", "a OffsetIndex" -> "an OffsetIndex")
- Disambiguate compressed_page_size comment in PageLocation
- Fix "edges interpolation" -> "edge interpolation" in Geospatial comments
- Capitalize proper nouns: Hive, Pig; normalize GZIP casing
- Add terminal periods for consistency
- Clarify BIT_PACKED is superseded by RLE (cross-reference Encodings.md)
- Missing space before parenthesis in frameworks list

Thrift validation passes after these edits.
Copy link
Copy Markdown
Contributor

@etseidl etseidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on. Just a few minor nits.

Comment thread src/main/thrift/parquet.thrift Outdated
* Embedded Geometry logical type annotation
*
* Geospatial features in the Well-Known Binary (WKB) format and edges interpolation
* Geospatial features in the Well-Known Binary (WKB) format and edge interpolation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "edges" is the common usage in GeoParquet. Maybe this should be in backticks (edges) here and elsewhere.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed to edges interpolation with backticks.

Comment thread src/main/thrift/parquet.thrift Outdated
* plain type.
* in a data page use RLE_DICTIONARY instead.
* in a Dictionary page use PLAIN instead
* in a Dictionary page use PLAIN instead.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* in a Dictionary page use PLAIN instead.
* In a Dictionary page use PLAIN instead.

If we're adding punctuation, then shouldn't we also capitalize? And now that I think of it, should it be "For a..." rather than "In a..."

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Changed both sentences to start with "For a ..." and capitalized: For a data page use RLE_DICTIONARY instead. / For a Dictionary page use PLAIN instead.

Comment thread src/main/thrift/parquet.thrift Outdated
RLE = 3;

/** Bit packed encoding. This can only be used if the data has a known max
/** Deprecated: Bit packed encoding. This can only be used if the data has a known max
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're pretty inconsistent here. Sometimes "DEPRECATED:", others "Deprecated:". Perhaps we should standardize?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, standardized all label-style occurrences to DEPRECATED: (uppercase), inspired by the RFC convention for keywords. Inline prose uses (e.g. "this field is now deprecated") are left lowercase as normal English.

Copy link
Copy Markdown
Contributor

@etseidl etseidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Just a few more "edges" to clean up.

*
* Geospatial features in the WKB format with an explicit (non-linear/non-planar)
* edges interpolation algorithm.
* edge interpolation algorithm.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* edge interpolation algorithm.
* `edges` interpolation algorithm.

* defaults to "OGC:CRS84".
*
* An optional algorithm can be set to correctly interpret edges interpolation
* An optional algorithm can be set to correctly interpret edge interpolation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* An optional algorithm can be set to correctly interpret edge interpolation
* An optional algorithm can be set to correctly interpret `edges` interpolation

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @iemejia and @etseidl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants