Geometry
We have these geometries: ACSC2, ABS Aus States, CWA, EEZ.
Geometry workflows all do these steps:
- Download a zipped shapefile from a URL to our bucket (cache).
- Unzip the shapefile and write its data as both parquet and PMTiles (convert, "zip-to-parquet").
- Write geometry and provenance to DB (provenance).
The workflow template for adding a geometry will be easy to generalise. The geometry workflows currently use different CLI commands for caching (e.g. csdr eez cache, csdr aus-states cache), which I will aim to consolidate.
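As a rough illustration of what a consolidated cache command could look like, the sketch below replaces the per-geometry commands with a single command that takes the geometry name as an argument. The `csdr geometry cache <name>` shape, the `GEOMETRY_SOURCES` registry and the example.com URLs are all assumptions made for illustration; they are not the existing CLI or the real source URLs.

```python
import argparse
from typing import List, Tuple

# Hypothetical registry: the keys mirror existing command names, but the URLs
# are placeholders and not the project's real download locations.
GEOMETRY_SOURCES = {
    "eez": "https://example.com/eez.zip",
    "aus-states": "https://example.com/aus-states.zip",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing one shared `geometry cache <name>` command."""
    parser = argparse.ArgumentParser(prog="csdr")
    groups = parser.add_subparsers(dest="group", required=True)
    geometry = groups.add_parser("geometry")
    actions = geometry.add_subparsers(dest="action", required=True)
    cache = actions.add_parser("cache")
    # Restricting choices to the registry rejects unknown geometries early.
    cache.add_argument("name", choices=sorted(GEOMETRY_SOURCES))
    return parser


def resolve_cache_request(argv: List[str]) -> Tuple[str, str]:
    """Return (geometry name, source URL) for a `geometry cache` invocation."""
    args = build_parser().parse_args(argv)
    return args.name, GEOMETRY_SOURCES[args.name]
```

With this shape, adding a geometry would mean adding one registry entry rather than a new command, e.g. `csdr geometry cache eez` and `csdr geometry cache aus-states` would share the same code path.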
Resolved with:
https://github.com/SustainableDevelopmentReform/csdr-cloud-spatial-flux/pull/40, #115
Dataset
We have these datasets: ACA Reef, ACE, VIDA Buildings, DEP Seagrass, GMW v3, GMW v4.
Data Types
- STAC Collection/STAC API to STAC-Geoparquet:
ACE, DEP Seagrass. These do index and provenance. Index CLI commands are different per dataset (e.g. csdr ace index). We should attempt to generalise this.
- Zipped TIFFs/COGs to STAC and STAC-Geoparquet:
GMW v3, GMW v4. Cache, extract, index, provenance. Already generalised.
- Partitioned parquets (vector points):
VIDA Buildings. Index, provenance.
- Zipped shapefile (vector polygons):
ACA Reef. Extract, index, provenance. This is just like the geometry workflows.
We can make a generalised dataset workflow template that processes data using one of these four categories.
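The four categories above could be captured as a small lookup that a generalised template consults to decide which steps to run. This is only a sketch: the `DataType` enum and the `steps_for` helper are hypothetical names, while the dataset names and step lists are taken from the descriptions above.

```python
from enum import Enum
from typing import List


class DataType(Enum):
    """The four source data categories (enum names are illustrative)."""
    STAC_TO_GEOPARQUET = "stac_to_geoparquet"
    ZIPPED_RASTERS = "zipped_rasters"
    PARTITIONED_PARQUET = "partitioned_parquet"
    ZIPPED_SHAPEFILE = "zipped_shapefile"


# Steps each category runs, as listed in the Data Types section.
STEPS = {
    DataType.STAC_TO_GEOPARQUET: ["index", "provenance"],
    DataType.ZIPPED_RASTERS: ["cache", "extract", "index", "provenance"],
    DataType.PARTITIONED_PARQUET: ["index", "provenance"],
    DataType.ZIPPED_SHAPEFILE: ["extract", "index", "provenance"],
}

# Which category each current dataset falls into.
DATASETS = {
    "ACE": DataType.STAC_TO_GEOPARQUET,
    "DEP Seagrass": DataType.STAC_TO_GEOPARQUET,
    "GMW v3": DataType.ZIPPED_RASTERS,
    "GMW v4": DataType.ZIPPED_RASTERS,
    "VIDA Buildings": DataType.PARTITIONED_PARQUET,
    "ACA Reef": DataType.ZIPPED_SHAPEFILE,
}


def steps_for(dataset: str) -> List[str]:
    """Return the ordered workflow steps for a known dataset."""
    return list(STEPS[DATASETS[dataset]])
```

A new dataset would then only need a `DATASETS` entry, provided its source format matches one of the existing categories.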
To Do: Generalise the per-dataset index commands (e.g. csdr ace index).
To Do: Generalise the buildings index command (csdr buildings index to csdr partitioned_parquet index) so that it can be used for more than just the buildings dataset.
To Do: Generalise the ACA Reef extract command (csdr aca extract to csdr zipped_shapefile extract) so that it can be used for more than just the ACA Reef dataset. This has a lot of overlap with the geometry workflow.
WIP: https://github.com/SustainableDevelopmentReform/csdr-cloud-spatial-flux/pull/41, #122
Product
Product workflows are already quite generalised. The main differences are in the process geometry step and in how the CLI handles the indicators. The indicators are currently hard-coded. We need to plan how to support more dynamic products with other indicators.
Product commands are already common. They do this:
- List geometries, create run id, exclude geometries.
- Process geometry (the main calculation).
- Consolidate and provenance.
The indicator types we currently support are:
- sum_x_area: mangrove, seagrass, reef, intertidal, saltmarsh.
- count_x: count_buildings.
- percent-x-area: mangrove, intertidal, saltmarsh, seagrass.
To Do: Plan how to generalise the indicators. How flexible should this be?
To Do: Make a general products.yaml workflow template that the specific product workflow templates call.
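One possible direction for the indicator To Do above is to describe each indicator as data (an indicator type plus a layer name) and dispatch to a small set of generic functions, rather than hard-coding each combination. The sketch below is a starting point for discussion only. The `Indicator` class, the `compute` helper and the meaning given to each indicator type (sum of areas, feature count, percentage of the geometry's area) are assumptions, not the current implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Number = Union[int, float]


@dataclass(frozen=True)
class Indicator:
    """An indicator described as data rather than code (hypothetical)."""
    kind: str   # one of the keys in KINDS
    layer: str  # e.g. "mangrove", "buildings"


def sum_area(areas: List[float], geometry_area: float) -> Number:
    # Assumed meaning of sum_x_area: total area of layer x inside the geometry.
    return sum(areas)


def count(areas: List[float], geometry_area: float) -> Number:
    # Assumed meaning of count_x: number of layer x features in the geometry.
    return len(areas)


def percent_area(areas: List[float], geometry_area: float) -> Number:
    # Assumed meaning of percent-x-area: share of the geometry covered by x.
    return 100.0 * sum(areas) / geometry_area if geometry_area else 0.0


KINDS: Dict[str, Callable[[List[float], float], Number]] = {
    "sum_x_area": sum_area,
    "count_x": count,
    "percent-x-area": percent_area,
}


def compute(indicators: List[Indicator],
            layers: Dict[str, List[float]],
            geometry_area: float) -> Dict[str, Number]:
    """Evaluate every configured indicator for one geometry."""
    return {
        f"{ind.kind}:{ind.layer}": KINDS[ind.kind](layers[ind.layer], geometry_area)
        for ind in indicators
    }
```

Under this design, supporting a new product would mostly mean listing its indicators in configuration, while a genuinely new indicator type would still need a new function registered in `KINDS`. How far that flexibility should go is the open question.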