Fix typos, grammar, and consistency in Encryption, Contributing, and BinaryProtocolExtensions docs#578
Open
iemejia wants to merge 2 commits into
Open
Fix typos, grammar, and consistency in Encryption, Contributing, and BinaryProtocolExtensions docs#578iemejia wants to merge 2 commits into
iemejia wants to merge 2 commits into
Conversation
…BinaryProtocolExtensions docs
Encryption.md:
- Fix double-negative; align GCM invocation limit to NIST
- "Data PageHeader" -> "Data Page Header" (spacing consistency)
- Replace "allows to" with idiomatic English
- Fix smart quotes to ASCII for magic-bytes literal
- Remove double spaces; fix "the the FileMetaData"
- "explictly" -> "explicitly"
- Hyphenate compound adjectives ("2 byte short" -> "2-byte short")
- Fix section heading numbering ("## 5 File Format" -> "## 5. File Format")
- Fix mass noun article ("from a secret data" -> "from secret data")
CONTRIBUTING.md:
- Fix 7 typos: docuemnt, demostrate, interopability, libaries,
highlighed, compatiblity, an prototype
- Fix possessive: "features desirability" -> "a feature's desirability"
- Fix agreement: "an external dependencies" -> "an external dependency"
- Add commas after introductory clauses
- Fix comma splice -> semicolon
BinaryProtocolExtensions.md:
- Fix "FileMetadata" -> "FileMetaData" (4 occurrences; match thrift struct)
- Fix "Flatbuffers"/"flatbuffer" -> "FlatBuffers" (5 occurrences; official capitalization)
- Fix "implementers which" -> "implementers who" (people)
- Fix missing copula: "extension shared" -> "extension is shared"
etseidl
reviewed
Jun 2, 2026
Contributor
etseidl
left a comment
There was a problem hiding this comment.
Just a few nits, thanks!
| * Extensions can be appended to existing Thrift serialized structs [without requiring Thrift libraries](#appending-extensions-to-thrift) for manipulation (or changes to the thrift IDL). | ||
|
|
||
| Because only one field-id is reserved the extension bytes themselves require disambiguation; otherwise readers will not be able to decode extensions safely. This is left to implementers which MUST put enough unique state in their extension bytes for disambiguation. This can be relatively easily achieved by adding a [UUID](https://en.wikipedia.org/wiki/Universally\_unique\_identifier) at the start or end of the extension bytes. The extension does not specify a disambiguation mechanism to allow more flexibility to implementers. | ||
| Because only one field-id is reserved the extension bytes themselves require disambiguation; otherwise readers will not be able to decode extensions safely. This is left to implementers who MUST put enough unique state in their extension bytes for disambiguation. This can be relatively easily achieved by adding a [UUID](https://en.wikipedia.org/wiki/Universally\_unique\_identifier) at the start or end of the extension bytes. The extension does not specify a disambiguation mechanism to allow more flexibility to implementers. |
Contributor
There was a problem hiding this comment.
Gads this could use some line breaks 😅
Member
Author
There was a problem hiding this comment.
Done, broke the paragraph into shorter lines.
|
|
||
| 2. New encodings should be fully specified in this repository and not | ||
| rely on an external dependencies for implementation (i.e. `parquet-format` is | ||
| rely on an external dependency for implementation (i.e. `parquet-format` is |
Contributor
There was a problem hiding this comment.
Suggested change
| rely on an external dependency for implementation (i.e. `parquet-format` is | |
| rely on external dependencies for implementation (i.e. `parquet-format` is |
| key shall not not be used for more than 2^31 (~2 billion) pages. In Parquet files encrypted with | ||
| multiple keys (footer and column keys), the constraint on the number of invocations is applied | ||
| to each key separately. | ||
| key shall not be used for more than 2^32 total module encryptions, as per the NIST specification. |
Contributor
There was a problem hiding this comment.
Perhaps we should still point out that 2^32 modules means in practice 2^31 pages.
Member
Author
There was a problem hiding this comment.
Good point. Added a sentence clarifying that since each data page requires two module encryptions (header + data), 2^32 modules means in practice no more than 2^31 pages per key.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix typos, grammar, and formatting across ancillary specification documents: Encryption, Contributing guide, and Binary Protocol Extensions.
Changes
Encryption.md
CONTRIBUTING.md
BinaryProtocolExtensions.md
Validation
No semantic/behavioral changes to the format specification. All fixes are documentation-only.
Split from #572 for easier review.