Problem
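The current codebase has two separate, ad-hoc mechanisms for post-download data transformation before the GeoZarr is written:
- `convert_units: degC` / `convert_units: mm`, implemented in "feat: apply unit conversion during zarr build" (#78) with a hardcoded `_UNIT_CONVERSIONS` lookup table
- `pre_process: ['deaccumulate_era5']`, defined in YAML but not yet implemented

Neither supports multiple ordered transformations, and adding new conversions requires changing core code rather than configuration.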
Proposed solution
Replace both fields with a single `transforms` list in the dataset YAML, using the same dotted-path callable pattern already used by `ingestion.function`:
```yaml
# era5_land temperature
transforms:
  - function: climate_api.transforms.convert_units

# era5_land precipitation
transforms:
  - function: climate_api.transforms.deaccumulate_era5
  - function: climate_api.transforms.convert_units
```
Each callable has the signature `(ds: xr.Dataset, dataset: dict[str, Any]) -> xr.Dataset` and is resolved at runtime via `importlib`, exactly like download functions. `convert_units` reads the existing `units`/`convert_units` fields (kept for STAC metadata). Transforms from external packages (e.g. `dhis2eo`) are supported without any changes to core code.
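A minimal sketch of how the dotted-path resolution and ordered pipeline could look. The helper names `resolve_callable` and `apply_transforms` are hypothetical, not existing functions in the codebase, and the sketch assumes transforms run in the order listed. To stay runnable without xarray, the demo uses a plain dict in place of `xr.Dataset` and registers a throwaway `demo_transforms` module. Its stand-in functions use simple differencing and a fixed scale factor, not the real ERA5 deaccumulation or unit-conversion logic.

```python
import importlib
import sys
import types
from typing import Any, Callable

# In the real pipeline `ds` would be an xr.Dataset; Any keeps this sketch stdlib-only.
Transform = Callable[[Any, dict[str, Any]], Any]


def resolve_callable(dotted_path: str) -> Transform:
    """Import 'package.module.function' and return the function object."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'package.module.function', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    func = getattr(module, attr_name)
    if not callable(func):
        raise TypeError(f"{dotted_path!r} is not callable")
    return func


def apply_transforms(ds: Any, dataset: dict[str, Any]) -> Any:
    """Run every entry of dataset['transforms'] in list order."""
    for entry in dataset.get("transforms", []):
        ds = resolve_callable(entry["function"])(ds, dataset)
    return ds


# --- Demo with stand-in transforms (hypothetical, for illustration only) ---
def _deaccumulate(ds: dict, dataset: dict[str, Any]) -> dict:
    # Turn running totals into per-step increments by simple differencing.
    values = ds["tp"]
    return {"tp": [values[0]] + [b - a for a, b in zip(values, values[1:])]}


def _scale_to_mm(ds: dict, dataset: dict[str, Any]) -> dict:
    # Fixed metres-to-millimetres factor, standing in for a units lookup.
    return {"tp": [v * 1000.0 for v in ds["tp"]]}


# Register a throwaway module so the dotted paths below are importable.
demo = types.ModuleType("demo_transforms")
demo.deaccumulate = _deaccumulate
demo.scale_to_mm = _scale_to_mm
sys.modules["demo_transforms"] = demo

dataset = {
    "transforms": [
        {"function": "demo_transforms.deaccumulate"},
        {"function": "demo_transforms.scale_to_mm"},
    ]
}
result = apply_transforms({"tp": [1.0, 3.0, 6.0]}, dataset)
print(result)
```

Because each entry is only a dotted path, a transform from an external package would be picked up the same way, provided the package is importable in the environment that builds the Zarr.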
Changes required
- Add a `src/climate_api/transforms/` module with at least `convert_units` (replacing `_UNIT_CONVERSIONS`) and a placeholder/implementation for `deaccumulate_era5`
- Update `build_dataset_zarr()` in `downloader.py` to run the `transforms` pipeline instead of calling `_apply_unit_conversion()` directly
- Update `era5_land.yaml` to use `transforms:` entries, removing `pre_process` and keeping `convert_units`/`units` for STAC metadata only
- Remove the hardcoded `_UNIT_CONVERSIONS` dict and `_apply_unit_conversion()` from `downloader.py`