Fix typos, grammar, and consistency in Encryption, Contributing, and BinaryProtocolExtensions docs #578

Open: iemejia wants to merge 2 commits into apache:master from iemejia:fix/encryption-meta-docs

@iemejia (Member) commented Jun 2, 2026

Summary

Fix typos, grammar, and formatting across ancillary specification documents: Encryption, Contributing guide, and Binary Protocol Extensions.

Changes

Encryption.md

  • Fix double-negative; align GCM invocation limit to NIST
  • "Data PageHeader" -> "Data Page Header" (spacing consistency)
  • Replace "allows to" with idiomatic English
  • Fix smart quotes to ASCII for magic-bytes literal
  • Remove double spaces; fix "the the FileMetaData"
  • "explictly" -> "explicitly"
  • Hyphenate compound adjectives ("2 byte short" -> "2-byte short")
  • Fix section heading numbering ("## 5 File Format" -> "## 5. File Format")
  • Fix mass noun article ("from a secret data" -> "from secret data")

CONTRIBUTING.md

  • Fix 7 typos: docuemnt, demostrate, interopability, libaries, highlighed, compatiblity, an prototype
  • Fix possessive: "features desirability" -> "a feature's desirability"
  • Fix agreement: "an external dependencies" -> "an external dependency"
  • Add commas after introductory clauses
  • Fix comma splice -> semicolon

BinaryProtocolExtensions.md

  • Fix "FileMetadata" -> "FileMetaData" (4 occurrences; match thrift struct)
  • Fix "Flatbuffers"/"flatbuffer" -> "FlatBuffers" (5 occurrences; official capitalization)
  • Fix "implementers which" -> "implementers who" (people)
  • Fix missing copula: "extension shared" -> "extension is shared"

Validation

No semantic/behavioral changes to the format specification. All fixes are documentation-only.

Split from #572 for easier review.

@etseidl (Contributor) left a comment

Just a few nits, thanks!

Comment thread BinaryProtocolExtensions.md Outdated
  * Extensions can be appended to existing Thrift serialized structs [without requiring Thrift libraries](#appending-extensions-to-thrift) for manipulation (or changes to the thrift IDL).

- Because only one field-id is reserved the extension bytes themselves require disambiguation; otherwise readers will not be able to decode extensions safely. This is left to implementers which MUST put enough unique state in their extension bytes for disambiguation. This can be relatively easily achieved by adding a [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier) at the start or end of the extension bytes. The extension does not specify a disambiguation mechanism to allow more flexibility to implementers.
+ Because only one field-id is reserved the extension bytes themselves require disambiguation; otherwise readers will not be able to decode extensions safely. This is left to implementers who MUST put enough unique state in their extension bytes for disambiguation. This can be relatively easily achieved by adding a [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier) at the start or end of the extension bytes. The extension does not specify a disambiguation mechanism to allow more flexibility to implementers.

Contributor

Gads this could use some line breaks 😅

Member Author

Done, broke the paragraph into shorter lines.
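
For readers following the quoted paragraph, the UUID-based disambiguation it mentions could be sketched roughly as below. This is only an illustration under stated assumptions: the UUID value, the choice to put it at the start rather than the end, and the helper names `wrap_extension` and `unwrap_extension` are all hypothetical and are not defined by the specification.

```python
import uuid
from typing import Optional

# Hypothetical identifier picked by one implementer; the spec does not define it.
EXTENSION_UUID = uuid.UUID("12345678-1234-5678-1234-567812345678")
UUID_LEN = 16  # a UUID is always 16 bytes in binary form


def wrap_extension(payload: bytes) -> bytes:
    """Prefix the extension payload with the implementer's UUID."""
    return EXTENSION_UUID.bytes + payload


def unwrap_extension(extension_bytes: bytes) -> Optional[bytes]:
    """Return the payload if the bytes start with our UUID, else None."""
    if len(extension_bytes) < UUID_LEN:
        return None
    if extension_bytes[:UUID_LEN] != EXTENSION_UUID.bytes:
        return None  # someone else's extension (or unrelated bytes); skip it
    return extension_bytes[UUID_LEN:]
```

A reader that does not recognize the leading 16 bytes gets `None` back and can ignore the extension rather than attempt to decode it.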

Comment thread CONTRIBUTING.md Outdated

  2. New encodings should be fully specified in this repository and not
- rely on an external dependencies for implementation (i.e. `parquet-format` is
+ rely on an external dependency for implementation (i.e. `parquet-format` is

Contributor

Suggested change:
- rely on an external dependency for implementation (i.e. `parquet-format` is
+ rely on external dependencies for implementation (i.e. `parquet-format` is

Member Author

Applied, thanks!

Comment thread Encryption.md
- key shall not not be used for more than 2^31 (~2 billion) pages. In Parquet files encrypted with
- multiple keys (footer and column keys), the constraint on the number of invocations is applied
- to each key separately.
+ key shall not be used for more than 2^32 total module encryptions, as per the NIST specification.

Contributor

Perhaps we should still point out that 2^32 modules means in practice 2^31 pages.

Member Author

Good point. Added a sentence clarifying that since each data page requires two module encryptions (header + data), 2^32 modules means in practice no more than 2^31 pages per key.
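
The arithmetic behind that sentence can be checked with a small sketch. The constant and function names here are illustrative only, and the calculation assumes that data pages are the only modules encrypted with the key, which is a simplification.

```python
# Limit quoted in the revised text: 2^32 module encryptions per key.
MAX_MODULE_ENCRYPTIONS_PER_KEY = 2**32

# Per the reply above, each data page costs two module encryptions:
# one for the page header and one for the page data.
MODULES_PER_DATA_PAGE = 2


def max_pages_per_key(
    limit: int = MAX_MODULE_ENCRYPTIONS_PER_KEY,
    modules_per_page: int = MODULES_PER_DATA_PAGE,
) -> int:
    """Upper bound on data pages if pages were the only encrypted modules."""
    return limit // modules_per_page


print(max_pages_per_key())  # 2147483648, i.e. 2**31 (~2 billion)
```

Any other modules encrypted with the same key would count against the same limit, so 2^31 is an upper bound on pages, not a target.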

@etseidl (Contributor) left a comment

Looks good 👍
