
Can the writer support the scenario when the output is a positional file with header&data? #843

@Il-Pela


Question

Can the writer produce a positional (fixed-width) file that contains a header followed by data records?

Here the header and the data are two sets of records with different schemas: typically the header is a single record, while the data consists of N records.
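
For concreteness, here is a minimal Python sketch of such a file: one header record followed by data records, each with its own layout. The field names, widths, and plain-ASCII text encoding are hypothetical; a real file described by COBOL copybooks would more likely be EBCDIC and use numeric types such as packed decimal, which this sketch does not model. Both layouts are padded to 24 bytes so that every record has the same fixed length.

```python
# Hypothetical layouts: (field name, width in characters). Both sum to 24,
# so every record in the file has the same fixed length.
HEADER_LAYOUT = [("rec_type", 1), ("file_date", 8), ("record_count", 6), ("filler", 9)]
DATA_LAYOUT = [("rec_type", 1), ("id", 5), ("name", 10), ("amount", 8)]


def encode_record(layout, values):
    """Render one record as fixed-width ASCII bytes, space-padding each field.

    Keys in `values` that are not part of `layout` are ignored in this sketch.
    """
    parts = []
    for name, width in layout:
        text = str(values.get(name, ""))
        if len(text) > width:
            raise ValueError(f"field {name!r} is wider than {width} characters")
        parts.append(text.ljust(width))
    return "".join(parts).encode("ascii")


def build_file(header, rows):
    """Header record first, then one record per data row, with no delimiters."""
    records = [encode_record(HEADER_LAYOUT, header)]
    records.extend(encode_record(DATA_LAYOUT, row) for row in rows)
    return b"".join(records)


rows = [
    {"rec_type": "D", "id": "00001", "name": "ALICE", "amount": "00012.50"},
    {"rec_type": "D", "id": "00002", "name": "BOB", "amount": "00007.00"},
]
header = {"rec_type": "H", "file_date": "20240101", "record_count": f"{len(rows):06d}"}
content = build_file(header, rows)
print(len(content))  # 72: three 24-byte records
```

A reader can tell the two record types apart by the first byte (`H` or `D`), which is one common way to let a single file carry records with different layouts.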

What I was considering:

  • Keep the header and the data in the same Spark DataFrame, whose schema is the union of the header columns and the data columns. The header columns hold values only in the first row and nulls in the remaining N rows, while the data columns hold nulls in the first row and values only in the N rows after it (where N is the number of data records, so the DataFrame has N + 1 rows).
  • Write the DataFrame twice, with two different copybooks: one with the header schema (writing only the first row) and one with the data schema (writing all rows except the first). When writing, columns that are not in the copybook would not be written to the output.
  • The result would be two separate files.
  • Merge them in order: header first, then data.
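
The final merge step could look like the following Python sketch. It assumes each write produces a directory of `part-*` files (as Spark file outputs usually do) and that sorting those names reproduces the intended record order. The function name `concat_parts`, the directory names, and the record bytes in the demo are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def concat_parts(output_file, *part_dirs):
    """Concatenate the part files of each directory, directories in the given order.

    Assumes part files are named so that sorting by name matches record order
    (e.g. part-00000, part-00001, ...). Other files such as _SUCCESS are skipped.
    """
    with open(output_file, "wb") as out:
        for part_dir in part_dirs:
            for part in sorted(Path(part_dir).glob("part-*")):
                if part.is_file():
                    with open(part, "rb") as src:
                        shutil.copyfileobj(src, out)


# Demo with made-up 24-byte records standing in for the two writer outputs.
with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)
    header_dir, data_dir = tmp / "header", tmp / "data"
    header_dir.mkdir()
    data_dir.mkdir()
    (header_dir / "part-00000").write_bytes(b"H20240101000002" + b" " * 9)
    # Created out of order on purpose: sorting by name restores the order.
    (data_dir / "part-00001").write_bytes(b"D00002" + b"BOB".ljust(10) + b"00007.00")
    (data_dir / "part-00000").write_bytes(b"D00001" + b"ALICE".ljust(10) + b"00012.50")
    (data_dir / "_SUCCESS").write_bytes(b"")
    concat_parts(tmp / "merged.dat", header_dir, data_dir)
    merged = (tmp / "merged.dat").read_bytes()

print(merged[:15], len(merged))
```

Copying only `part-*` files keeps marker files such as `_SUCCESS` out of the result. Sorting by name preserves record order only if the rows were not reshuffled across partitions before writing, which is worth verifying for the data output.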

Do you think this would work? Is there a smarter approach?

Labels: question (Further information is requested)
