Generalise CLI/Workflows (for triggering from the app). #121

@willjnz

Geometry

We have these geometries: ACSC2, ABS Aus States, CWA, EEZ.

Geometry workflows all do these steps:

  1. Download a zipped shapefile from a URL to our bucket (cache).
  2. Unzip the shapefile and write its data as both parquet and PMTiles (convert, "zip-to-parquet").
  3. Write geometry and provenance to DB (provenance).

The workflow template for adding a geometry will be easy to generalise. The geometry workflows currently use different CLI commands for caching (e.g. csdr eez cache, csdr aus-states cache), which I will aim to consolidate.

  • To Do: Generalise CLI commands for geometries (e.g. csdr eez cache, csdr aus-states cache)
  • To Do: Make a general geometries.yaml workflow template that the specific geometry workflow templates call.
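
As a rough illustration of the consolidation, the sketch below replaces the per-geometry sub-commands with one command that takes the geometry name as an argument. The `csdr geometry <step> --name <geometry>` form, the registry, and the URLs are all assumptions made for the example, not the existing CLI, and the steps only describe what they would do rather than performing any I/O.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class GeometrySource:
    """Per-geometry settings; everything else is shared between geometries."""
    name: str
    url: str  # placeholder, not the real download location


# Hypothetical registry. Keys mirror the existing sub-command names;
# the URLs are placeholders.
GEOMETRIES = {
    "eez": GeometrySource("eez", "https://example.com/eez.zip"),
    "aus-states": GeometrySource("aus-states", "https://example.com/aus-states.zip"),
}

# The three steps every geometry workflow runs.
STEPS = ("cache", "convert", "provenance")


def plan(geometry: str, step: str) -> str:
    """Describe what a step would do for a geometry (no I/O is performed)."""
    source = GEOMETRIES[geometry]
    actions = {
        "cache": f"download {source.url} to the bucket",
        "convert": f"unzip {source.name} and write parquet and PMTiles",
        "provenance": f"write {source.name} geometry and provenance to the DB",
    }
    return actions[step]


def build_parser() -> argparse.ArgumentParser:
    """One 'geometry' command; the geometry is an argument, not a sub-command."""
    parser = argparse.ArgumentParser(prog="csdr")
    groups = parser.add_subparsers(dest="group", required=True)
    geometry = groups.add_parser("geometry")
    geometry.add_argument("step", choices=STEPS)
    geometry.add_argument("--name", choices=sorted(GEOMETRIES), required=True)
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    message = plan(args.name, args.step)
    print(message)
    return message
```

With this shape, adding a geometry means adding one registry entry, and a single geometries.yaml template could pass the geometry name through as a parameter, e.g. `main(["geometry", "cache", "--name", "eez"])`.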

Resolved with:
https://github.com/SustainableDevelopmentReform/csdr-cloud-spatial-flux/pull/40, #115

Dataset

We have these datasets: ACA Reef, ACE, VIDA Buildings, DEP Seagrass, GMW v3, GMW v4.

Data Types

  1. STAC Collection/STAC API to STAC-Geoparquet: ACE, DEP Seagrass. These run index and provenance. The index CLI commands differ per dataset (e.g. csdr ace index). We should attempt to generalise this.
  2. Zipped TIFFs/COGs to STAC and STAC-Geoparquet: GMW v3, GMW v4. Cache, extract, index, provenance. Already generalised.
  3. Partitioned parquets (vector points): VIDA Buildings. Index, provenance.
  4. Zipped shapefile (vector polygons): ACA Reef. Extract, index, provenance. This is just like the geometry workflows.

We can make a generalised dataset workflow template that processes data using 1 of these 4 categories.

  • To Do: Generalise ACE and DEP Seagrass CLI commands (e.g. csdr ace index)
  • To Do: Generalise Partitioned parquets CLI commands (e.g. csdr buildings index to csdr partitioned_parquet index) so that it can be used for more than just the buildings dataset.
  • To Do: Generalise Zipped shapefile CLI commands (e.g. csdr aca extract to csdr zipped_shapefile extract) so that it can be used for more than just the ACA Reef dataset. This has a lot of overlap with the geometry workflow.
  • To Do: Make a general datasets.yaml workflow template that calls one of 4 dataset-category workflow templates, that then calls a specific dataset workflow template. Add the category (1-4).
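
The category dispatch could be sketched as a lookup from dataset to category, and from category to the ordered steps listed under Data Types. The enum names and the dataset keys below are hypothetical slugs chosen for the example; only the category numbers and step lists come from the description above.

```python
from enum import Enum
from typing import List


class Category(Enum):
    """The four dataset categories, numbered as in the Data Types list."""
    STAC_TO_GEOPARQUET = 1
    ZIPPED_RASTERS = 2
    PARTITIONED_PARQUET = 3
    ZIPPED_SHAPEFILE = 4


# Ordered steps per category, taken from the Data Types list.
CATEGORY_STEPS = {
    Category.STAC_TO_GEOPARQUET: ["index", "provenance"],
    Category.ZIPPED_RASTERS: ["cache", "extract", "index", "provenance"],
    Category.PARTITIONED_PARQUET: ["index", "provenance"],
    Category.ZIPPED_SHAPEFILE: ["extract", "index", "provenance"],
}

# Dataset keys are hypothetical slugs, not existing CLI names.
DATASET_CATEGORY = {
    "ace": Category.STAC_TO_GEOPARQUET,
    "dep-seagrass": Category.STAC_TO_GEOPARQUET,
    "gmw-v3": Category.ZIPPED_RASTERS,
    "gmw-v4": Category.ZIPPED_RASTERS,
    "vida-buildings": Category.PARTITIONED_PARQUET,
    "aca-reef": Category.ZIPPED_SHAPEFILE,
}


def steps_for(dataset: str) -> List[str]:
    """Return the ordered steps a general datasets workflow would run."""
    return CATEGORY_STEPS[DATASET_CATEGORY[dataset]]
```

A general datasets.yaml template would then only need the dataset name (or its category number) as a parameter to pick the right category template.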

WIP: https://github.com/SustainableDevelopmentReform/csdr-cloud-spatial-flux/pull/41, #122

Product

Product workflows are already quite generalised. The main differences are in the process geometry step and in how the CLI handles the indicators. The indicators are currently hard-coded. We need to plan how to support more dynamic products with other indicators.

Product commands are already common. They do this:

  1. List geometries, create run id, exclude geometries.
  2. Process geometry (the main calculation).
  3. Consolidate and provenance.

The indicator types we currently support are:

  • sum_x_area: mangrove, seagrass, reef, intertidal, saltmarsh

  • count_x: count_buildings

  • percent-x-area: mangrove, intertidal, saltmarsh, seagrass.

  • To Do: Plan generalising indicators. How flexible will this be?

  • To Do: Make a general products.yaml workflow template that the specific product workflow templates call.
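
One possible direction for generalising indicators, offered as a starting point for the planning rather than a decided design: treat each indicator as an (indicator type, layer) pair read from configuration, so new layers need no code change. The feature fields (`layer`, `area`), the `geometry_area` argument, and the underscore spelling `percent_x_area` are assumptions made for the sketch.

```python
from typing import Callable, Dict, List, Tuple

# One record per dataset feature intersecting a geometry (hypothetical shape).
Feature = Dict[str, object]


def sum_area(features: List[Feature], layer: str) -> float:
    """Total area of the features that belong to a layer."""
    return sum(f["area"] for f in features if f["layer"] == layer)


def count(features: List[Feature], layer: str) -> int:
    """Number of features that belong to a layer."""
    return sum(1 for f in features if f["layer"] == layer)


def percent_area(features: List[Feature], layer: str, geometry_area: float) -> float:
    """Layer area as a percentage of the geometry's own area."""
    return 100.0 * sum_area(features, layer) / geometry_area


# Indicator types keyed by their template name; all share one call signature.
INDICATOR_TYPES: Dict[str, Callable[[List[Feature], str, float], float]] = {
    "sum_x_area": lambda feats, layer, total: sum_area(feats, layer),
    "count_x": lambda feats, layer, total: count(feats, layer),
    "percent_x_area": lambda feats, layer, total: percent_area(feats, layer, total),
}


def compute(specs: List[Tuple[str, str]], features: List[Feature],
            geometry_area: float) -> Dict[str, float]:
    """Evaluate each (indicator type, layer) pair for one geometry."""
    results = {}
    for kind, layer in specs:
        # "count_x" with layer "buildings" becomes "count_buildings".
        name = kind.replace("_x", "_" + layer, 1)
        results[name] = INDICATOR_TYPES[kind](features, layer, geometry_area)
    return results
```

How flexible this needs to be is still the open question: a fixed set of indicator types with configurable layers is the least flexible option that removes the hard coding.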
