This project seems to upgrade to the latest version of Substrait on a fairly regular basis. This is nice for users who want the latest version, but it also makes it difficult to find and use the version of substrait-python that corresponds to a specific version of Substrait.
For example, substrait-mlir uses Substrait v0.42.1 and substrait-python v0.12.1 for testing some end-to-end Python use cases. Now, we can't easily upgrade to pyarrow v16 or higher because those versions require substrait-python v0.15 or higher (due to how the generated proto files are packaged, AFAIU), which includes a different version of Substrait than what we use.
Note that Substrait regularly introduces breaking changes, in particular, to the text format, so it is not generally possible to use the protobuf definitions of a newer version.
To remedy the situation, I can imagine three things:
- Add a table to the README and/or PyPI that shows the correspondence between the version(s) of substrait-python and Substrait. This would make it easier to find the right version of substrait-python. In some situations, that may be all that is necessary. Currently, the only way I know how to do this is to install several versions of substrait-python and print substrait.__substrait_version__, which isn't ideal.
- We might want to investigate whether it is possible to package several versions of Substrait inside of substrait-python, for example, under substrait.v0_74_0 and substrait.v0_42_1, possibly with the latest version living in substrait as well.
- Another possibility would be to separate the packaging of the generated proto files from the other functionality of this repository. substrait-mlir and, AFAIU, pyarrow only need the former. Such a package could either aim to be "released once" such that, for every version of Substrait, there would be a release of that package that would never need to be updated (ideally with the same versioning scheme as Substrait itself), or use the multi-version scheme from the previous point.
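
To make the first two points more concrete, here is a rough sketch in Python. The first helper reads substrait.__substrait_version__ from whatever substrait-python is installed, which is the manual check described above. The second maps a Substrait version to a module name following the substrait.v0_42_1 naming proposed above. Both function names are made up for illustration, and the versioned modules do not exist today; this only shows what the scheme could look like.

```python
import importlib
from typing import Optional


def bundled_substrait_version(package: str = "substrait") -> Optional[str]:
    """Return the Substrait version bundled with an installed package.

    Returns None if the package is not installed or does not expose
    the __substrait_version__ attribute.
    """
    try:
        module = importlib.import_module(package)
    except ImportError:
        return None  # package is not installed in this environment
    return getattr(module, "__substrait_version__", None)


def versioned_module_name(substrait_version: str, package: str = "substrait") -> str:
    """Map '0.42.1' or 'v0.42.1' to the proposed name 'substrait.v0_42_1'.

    ASSUMPTION: this naming scheme is only the proposal from this issue;
    no such modules are shipped by substrait-python today.
    """
    parts = substrait_version.strip().lstrip("v").split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {substrait_version!r}")
    return f"{package}.v" + "_".join(parts)


if __name__ == "__main__":
    # Prints None when substrait-python is not installed.
    print(bundled_substrait_version())
    print(versioned_module_name("0.42.1"))
```

With a correspondence table in the README, the first helper would only be needed as a sanity check. With the multi-version scheme, a consumer such as substrait-mlir could import the module named by the second helper instead of pinning an old substrait-python release.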