Generalise CLI/Workflows (for triggering from the app).

## Geometry

We have these geometries: `ACSC2`, `ABS Aus States`, `CWA`, `EEZ`.

Geometry workflows all do these steps:

1. Download a zipped shapefile from a URL to our bucket (cache).
2. Unzip the shapefile and write its data as both parquet and PMTiles (convert, "zip-to-parquet").
3. Write geometry and provenance to DB (provenance).

The workflow template for adding a geometry should be easy to generalise. The geometry workflows currently use different CLI commands for caching (e.g. `csdr eez cache`, `csdr aus-states cache`), which I will aim to consolidate.
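One possible shape for the consolidated command, as a minimal sketch: a single `cache` subcommand takes the geometry name as an argument and looks up its source in a registry. The `GEOMETRY_SOURCES` mapping, the placeholder URLs, the bucket path layout, and the `csdr geometry cache <name>` form are all assumptions for illustration, not the existing CLI.

```python
import argparse

# Hypothetical registry: geometry key -> URL of the zipped shapefile.
# The URLs are placeholders, not the real sources.
GEOMETRY_SOURCES = {
    "eez": "https://example.com/eez.zip",
    "aus-states": "https://example.com/aus-states.zip",
}


def cache_target(name: str, bucket: str) -> str:
    """Return the (assumed) bucket path where the zipped shapefile is cached."""
    if name not in GEOMETRY_SOURCES:
        raise KeyError(f"unknown geometry: {name}")
    return f"{bucket.rstrip('/')}/geometries/{name}/source.zip"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the hypothetical `csdr geometry cache <name>` command."""
    parser = argparse.ArgumentParser(prog="csdr")
    groups = parser.add_subparsers(dest="group", required=True)
    geometry = groups.add_parser("geometry")
    commands = geometry.add_subparsers(dest="command", required=True)
    cache = commands.add_parser("cache")
    cache.add_argument("name", choices=sorted(GEOMETRY_SOURCES))
    cache.add_argument("--bucket", required=True)
    return parser


def main(argv=None) -> str:
    """Parse arguments and report what would be cached (no download here)."""
    args = build_parser().parse_args(argv)
    target = cache_target(args.name, args.bucket)
    print(f"would download {GEOMETRY_SOURCES[args.name]} to {target}")
    return target
```

With this layout, adding a geometry would mean adding one registry entry rather than a new subcommand.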

- [x] To Do: Generalise CLI commands for geometries (e.g. `csdr eez cache`, `csdr aus-states cache`)
- [x] To Do: Make a general geometries.yaml workflow template that the specific geometry workflow templates call.

Resolved with:
https://github.com/SustainableDevelopmentReform/csdr-cloud-spatial-flux/pull/40, https://github.com/SustainableDevelopmentReform/csdr-cloud-spatial/pull/115

## Dataset

We have these datasets: `ACA Reef`, `ACE`, `VIDA Buildings`, `DEP Seagrass`, `GMW v3`, `GMW v4`.

#### Data Types
1. **STAC Collection/STAC API to STAC-Geoparquet**: `ACE`, `DEP Seagrass`. Index, provenance. The index CLI commands differ per dataset (e.g. `csdr ace index`); we should attempt to generalise this.
2. **Zipped TIFFs/COGs to STAC and STAC-Geoparquet**: `GMW v3`, `GMW v4`. Cache, extract, index, provenance. Already generalised.
3. **Partitioned parquets (vector points)**: `VIDA Buildings`. Index, provenance.
4. **Zipped shapefile (vector polygons)**: `ACA Reef`. Extract, index, provenance. This is just like the geometry workflows.

We can make a generalised dataset workflow template that processes data using 1 of these 4 categories.
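The four categories above could drive the general template through a simple lookup, sketched here under assumptions: the enum names, the `DATASET_CATEGORY` mapping, and `plan_steps` are hypothetical, while the step lists and dataset assignments are taken from the list above.

```python
from enum import Enum


class DatasetCategory(Enum):
    """The four dataset categories described above (names are illustrative)."""
    STAC_TO_GEOPARQUET = 1
    ZIPPED_TIFFS = 2
    PARTITIONED_PARQUET = 3
    ZIPPED_SHAPEFILE = 4


# Steps each category runs, in order, as listed above.
CATEGORY_STEPS = {
    DatasetCategory.STAC_TO_GEOPARQUET: ["index", "provenance"],
    DatasetCategory.ZIPPED_TIFFS: ["cache", "extract", "index", "provenance"],
    DatasetCategory.PARTITIONED_PARQUET: ["index", "provenance"],
    DatasetCategory.ZIPPED_SHAPEFILE: ["extract", "index", "provenance"],
}

# Which category each current dataset falls into.
DATASET_CATEGORY = {
    "ACE": DatasetCategory.STAC_TO_GEOPARQUET,
    "DEP Seagrass": DatasetCategory.STAC_TO_GEOPARQUET,
    "GMW v3": DatasetCategory.ZIPPED_TIFFS,
    "GMW v4": DatasetCategory.ZIPPED_TIFFS,
    "VIDA Buildings": DatasetCategory.PARTITIONED_PARQUET,
    "ACA Reef": DatasetCategory.ZIPPED_SHAPEFILE,
}


def plan_steps(dataset: str) -> list[str]:
    """Return the ordered workflow steps for a dataset, based on its category."""
    return list(CATEGORY_STEPS[DATASET_CATEGORY[dataset]])
```

For example, `plan_steps("ACA Reef")` yields the extract, index, and provenance steps, so a new zipped-shapefile dataset would only need a mapping entry.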

- [x] To Do: Generalise ACE and DEP Seagrass CLI commands (e.g. `csdr ace index`)
- [x] To Do: Generalise Partitioned parquets CLI commands (e.g. `csdr buildings index` to `csdr partitioned_parquet index`) so that it can be used for more than just the buildings dataset.
- [x] To Do: Generalise Zipped shapefile CLI commands (e.g. `csdr aca extract` to `csdr zipped_shapefile extract`) so that it can be used for more than just the ACA Reef dataset. This has a lot of overlap with the geometry workflow.
- [x] To Do: Make a general datasets.yaml workflow template that calls one of 4 dataset-category workflow templates, that then calls a specific dataset workflow template. Add the category (1-4).

WIP: https://github.com/SustainableDevelopmentReform/csdr-cloud-spatial-flux/pull/41, https://github.com/SustainableDevelopmentReform/csdr-cloud-spatial/pull/122

## Product

Product workflows are already quite generalised. The main differences are in the process-geometry step and in how the CLI handles the indicators. The indicators are currently hard-coded, so we need to plan how to support more dynamic products with other indicators.

Product commands are already common. They do this:
1. List geometries, create run id, exclude geometries.
2. Process geometry (the main calculation).
3. Consolidate and provenance.

The indicator types we currently support are:
- sum_x_area: mangrove, seagrass, reef, intertidal, saltmarsh
- count_x: count_buildings
- percent-x-area: mangrove, intertidal, saltmarsh, seagrass
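As a starting point for the planning, the indicator types listed above could be expressed as data instead of hard-coded branches. This is only a sketch: `IndicatorSpec`, `INDICATOR_KINDS`, and `compute` are hypothetical names, and the formulas (sum of feature areas, feature count, share of the geometry's area) are assumed readings of the three types rather than the current implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IndicatorSpec:
    """One indicator: a kind from the list above plus the layer it applies to."""
    kind: str   # "sum_x_area", "count_x" or "percent-x-area"
    layer: str  # e.g. "mangrove" or "buildings"


# Assumed semantics per kind. Each function receives the areas of the features
# intersecting a geometry and the total area of that geometry.
INDICATOR_KINDS: dict[str, Callable[[list[float], float], float]] = {
    "sum_x_area": lambda areas, total: sum(areas),
    "count_x": lambda areas, total: float(len(areas)),
    "percent-x-area": lambda areas, total: 100.0 * sum(areas) / total,
}


def compute(spec: IndicatorSpec, feature_areas: list[float], geometry_area: float) -> float:
    """Evaluate one indicator for one geometry."""
    if spec.kind not in INDICATOR_KINDS:
        raise ValueError(f"unsupported indicator kind: {spec.kind}")
    return INDICATOR_KINDS[spec.kind](feature_areas, geometry_area)
```

A product could then be described as a list of `IndicatorSpec` entries, and supporting a new indicator type would mean registering one more function. How much flexibility is needed beyond that is the open question in the To Do below.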

- [ ] To Do: Plan generalising indicators. How flexible will this be?
- [ ] To Do: Make a general products.yaml workflow template that the specific product workflow templates call.

