Add file_extension fields to BlobType #3406
Force-pushed from caaa6f8 to 8ec4ae4.
| "docker>=4.0.0", | ||
| "docstring-parser>=0.9.0", | ||
| "flyteidl>=1.16.1,<2.0.0a0", | ||
| "flyteidl @ git+https://github.com/ddl-rliu/flyte.git@1ba7c1545198a2820348323e64c23a41a19e7a7d#subdirectory=flyteidl", |
Bump this after flyteorg/flyte#7009 merges
| return None | ||
| class FileExtension: |
Follows same pattern as BatchSize:
flytekit/flytekit/core/type_engine.py
Line 75 in 71194a4
Force-pushed from 52b0c79 to ce58c01.
Port new BlobType fields file_extension and enable_legacy_filename to flytekit.
FlyteFile inputs can be annotated with the FileDownloadConfig annotation to
configure the file extension to use during the copilot download phase.
e.g.
```python
def t1(file: Annotated[FlyteFile, FileDownloadConfig(file_extension="csv")]):
    ...  # copilot downloads the file to e.g. /inputs/file.csv

# versus...
def t1(file: FlyteFile["csv"]):
    ...  # copilot downloads the file to e.g. /inputs/file
```
Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
Force-pushed from ce58c01 to 3edf6e4.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3406      +/-   ##
==========================================
- Coverage   45.78%   39.44%   -6.34%
==========================================
  Files         317      216     -101
  Lines       28359    22873    -5486
  Branches     3015     3021       +6
==========================================
- Hits        12983     9022    -3961
+ Misses      15278    13755    -1523
+ Partials       98       96       -2
| def __init__(self, val: str = ""): | ||
| self._val = val | ||
| pattern = r"^[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*$" |
nit: Could we add a comment saying that this matches single and multi-part file extensions (e.g. tar.gz)?
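As a quick illustration of what the pattern accepts, here is a minimal sketch that exercises it against single-part and multi-part extensions, plus the invalid inputs that appear in the PR's test cases. The helper name `is_valid_extension` and the use of `re.fullmatch` are assumptions made for this sketch, not code from the PR.

```python
import re

# Pattern copied from the diff: one or more alphanumeric segments separated
# by single dots, so both "csv" and "tar.gz" are accepted.
EXTENSION_PATTERN = re.compile(r"^[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*$")


def is_valid_extension(val: str) -> bool:
    """Hypothetical helper: True for a single or multi-part extension with no leading dot."""
    return EXTENSION_PATTERN.fullmatch(val) is not None


accepted = ["csv", "parquet", "tar.gz"]
rejected = ["", ".csv", "my file", "../../escape", "csv!"]

for ext in accepted:
    assert is_valid_extension(ext)
for ext in rejected:
    assert not is_valid_extension(ext)
```

Because every segment must be alphanumeric, a leading dot, whitespace, path separators, and punctuation are all rejected, as is the empty string.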
| ... # copilot downloads the file to e.g. /inputs/file | ||
| ``` | ||
| val: (Default is "") The file extension (e.g. "csv", "parquet") to use during copilot download. |
Let's move this docstring to __init__ below, and change it to:
val: The file extension (e.g. "csv", "parquet") to use during copilot download.
| val: (Default is "") The file extension (e.g. "csv", "parquet") to use during copilot download. | ||
| """ | ||
| def __init__(self, val: str = ""): |
| def __init__(self, val: str = ""): | |
| def __init__(self, val: str): |
val cannot be "", because an empty string will not pass the regex check that follows.
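To make the point concrete, here is a minimal sketch of a validating constructor without a default. The class below is a hypothetical stand-in written for this comment, not the PR's actual implementation; the error message and the `try_build` helper are assumptions.

```python
import re


class FileExtension:
    """Hypothetical stand-in for the annotation class under review."""

    _PATTERN = re.compile(r"^[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*$")

    def __init__(self, val: str):
        # No default: "" can never satisfy the pattern, so a default of ""
        # would make FileExtension() fail at construction time.
        if not self._PATTERN.fullmatch(val):
            raise ValueError(f"Invalid file extension: {val!r}")
        self._val = val

    @property
    def val(self) -> str:
        return self._val


def try_build(val: str) -> str:
    """Return the accepted extension, or the error message if it is rejected."""
    try:
        return FileExtension(val).val
    except ValueError as exc:
        return str(exc)


assert try_build("csv") == "csv"
assert try_build("").startswith("Invalid file extension")
```

With the default removed, callers must pass an extension explicitly, so the only way to hit the validation error is to supply a bad value.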
| return _types_pb2.BlobType( | ||
| format=self.format, | ||
| dimensionality=self.dimensionality, | ||
| file_extension=self._file_extension, |
| file_extension=self._file_extension, | |
| file_extension=self.file_extension, |
nit
| ".csv", | ||
| "my file", | ||
| "../../escape", | ||
| "csv!", |
See flyteorg/flyte#7009
Tracking issue
Closes flyteorg/flyte#7024 [BUG] [copilot] File extensions are missing when copilot downloads Blob/FlyteFile inputs
Why are the changes needed?
(Keeping this PR in draft until flyteorg/flyte#7009 is merged)
flyteorg/flyte#7009 adds the new file_extension field to BlobType. Once it merges, this flytekit PR will let users configure the file extension on FlyteFile inputs, which addresses the issue where file extensions are missing when copilot downloads Blob/FlyteFile inputs.
What changes were proposed in this pull request?
Add the file_extension field to BlobType. Add a FileExtension annotation, which is used to annotate a FlyteFile when the file should be downloaded with a specific extension. For example,
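A minimal sketch of how such an annotation can be attached to a parameter and read back from its type hint, using only the standard `typing` module. The `FlyteFile` and `FileExtension` classes below are placeholders so the sketch runs without flytekit installed, and `find_file_extension` is a hypothetical helper; none of this is the PR's actual code.

```python
from typing import Annotated, Optional, get_args, get_origin, get_type_hints


class FlyteFile:
    """Placeholder for flytekit's FlyteFile, so the sketch is self-contained."""


class FileExtension:
    """Hypothetical stand-in for the annotation added by this PR."""

    def __init__(self, val: str):
        self.val = val


def t1(file: Annotated[FlyteFile, FileExtension("csv")]) -> None:
    ...


def find_file_extension(hint) -> Optional[str]:
    """Return the extension carried by an Annotated hint, or None if absent."""
    if get_origin(hint) is not Annotated:
        return None
    # get_args returns the underlying type first, then the metadata items.
    _, *metadata = get_args(hint)
    for item in metadata:
        if isinstance(item, FileExtension):
            return item.val
    return None


# include_extras=True keeps the Annotated metadata instead of stripping it.
hints = get_type_hints(t1, include_extras=True)
assert find_file_extension(hints["file"]) == "csv"
assert find_file_extension(FlyteFile) is None
```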
Under the hood, this sets file_extension in BlobType.
How was this patch tested?
Setup process
Ran a workflow locally to test the changes
Screenshots
Check all the applicable boxes
Related PRs
Docs link