Update spatialdata #130
base: main
Changes from all commits
b8595ea
eeb745a
a566904
2fc4ea6
6ce6a6f
dc3b3b0
2d7d21b
7ba368a
7ec49c5
79b062c
5b51ab2
c71567e
2b74494
0282916
0bf0aae
415f302
1315f7d
bd06264
f8359a5
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,4 @@ | ||
| viash_version: 0.9.4 | ||
| viash_version: 0.9.7 | ||
|
|
||
| name: task_ist_preprocessing | ||
| organization: openproblems-bio | ||
|
|
||
| +7 −160 | component_tests/check_config.py | |
| +6 −174 | component_tests/run_and_check_output.py | |
| +21 −0 | nextflow_helpers/README.md | |
| +232 −0 | nextflow_helpers/benchmarkHelper.nf | |
| +58 −9 | nextflow_helpers/labels_tw.config | |
| +2,786 −0 | nextflow_helpers/workflowHelper.nf | |
| +35 −0 | schemas/results_v4/combined_output.json | |
| +63 −0 | schemas/results_v4/core.json | |
| +90 −0 | schemas/results_v4/dataset_info.json | |
| +84 −0 | schemas/results_v4/method_info.json | |
| +77 −0 | schemas/results_v4/metric_info.json | |
| +50 −0 | schemas/results_v4/quality_control.json | |
| +183 −0 | schemas/results_v4/results.json | |
| +64 −0 | schemas/results_v4/task_info.json | |
| +3 −3 | scripts/create_component | |
| +4 −4 | scripts/create_task_readme | |
| +1 −1 | scripts/fetch_task_run | |
| +418 −0 | scripts/render_results_report | |
| +3 −3 | scripts/sync_resources | |
| +1 −1 | scripts/upgrade_config |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,13 +1,3 @@ | ||
| setup: | ||
| - type: python | ||
| pypi: ["spatialdata==0.5.0", "anndata>=0.12.0", "pyarrow<22.0.0", "zarr<3.0.0"] | ||
| # 1. remove pyarrow when https://github.com/scverse/spatialdata/issues/1007 is fixed. | ||
| # This is actually fixed now with the spatialdata release 0.6.0. However, the new | ||
| # release now comes with zarr 3.0.0. When reading a zarr file that was saved with | ||
| # zarr 3.0.0 we can not load it with zarr<3.0.0. (PathNotFoundError: nothing found at path '') | ||
| # 2. Currently sopa enforces zarr<3.0.0. Therefore we need to save all our data with zarr<3.0.0. | ||
| # As soon as this is fixed (https://github.com/gustaveroussy/sopa/issues/347): | ||
| # - remove restriction on spatialdata | ||
| # - remove zarr<3.0.0 | ||
| # - remove pyarrow<22.0.0 | ||
| # - Recreate all the datasets (scripts/create_resources/combine/process_datasets.sh) | ||
| pypi: ["spatialdata>=0.7.3", "anndata>=0.12.0", "zarr>=3.0.0"] |
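Reviewer note: the new pin moves every component to `zarr>=3.0.0`, so a stale environment with zarr 2.x would likely fail only when it reads a store. A minimal sketch of a start-up check follows. The helper names (`version_tuple`, `meets_minimum`, `check_environment`) are hypothetical and not part of this PR, and the comparison is a deliberate simplification that ignores pre-release suffixes.

```python
import re
from importlib import metadata

# Minimum versions copied from the new pypi line in this diff.
MINIMUMS = {"spatialdata": "0.7.3", "anndata": "0.12.0", "zarr": "3.0.0"}


def version_tuple(version: str) -> tuple:
    """Turn '3.0.0' or '3.1.0rc1' into a tuple of its leading numeric parts."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric parts only; suffixes such as 'rc1' are ignored (simplification)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '3.0' compares equal to '3.0.0'
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums: dict = MINIMUMS) -> dict:
    """Map each package to True/False, or to None when it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report
```

Calling `check_environment()` at the top of a component would show at a glance which pins the running image satisfies.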
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -52,7 +52,7 @@ argument_groups: | |
| arguments: | ||
| - type: boolean | ||
| name: --keep_files | ||
| required: true | ||
| default: true | ||
|
Contributor
I introduced this argument for development purposes. I didn't think about setting it to true by default, so that no files are left lying around when the loader runs somewhere else. It's not really important, though, I guess.
||
| description: Whether to remove the downloaded files after processing. | ||
| - name: Metadata | ||
| arguments: | ||
|
|
||
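Reviewer note: the concern in the comment above is about downloaded files being left behind. A minimal sketch of how a loader could honor such a flag is shown here. It assumes `keep_files=True` means the working directory is retained; `process_with_cleanup` and the `process` callback are hypothetical names, not code from this component.

```python
import shutil
import tempfile
from pathlib import Path


def process_with_cleanup(process, keep_files: bool = True) -> tuple:
    """Run `process(workdir)` inside a fresh temporary directory.

    Returns (result, workdir). The directory is deleted afterwards
    unless keep_files is True, even if `process` raises.
    """
    workdir = Path(tempfile.mkdtemp(prefix="loader_"))
    try:
        result = process(workdir)
    finally:
        if not keep_files:
            # Remove everything that was downloaded or unpacked.
            shutil.rmtree(workdir, ignore_errors=True)
    return result, workdir
```

With this shape, a default of `false` gives the tidy behavior the comment describes, while `true` stays available for development.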
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,7 +4,7 @@ namespace: datasets/loaders | |
| argument_groups: | ||
| - name: Inputs | ||
| arguments: | ||
| - type: string | ||
| - type: file | ||
|
Contributor
I ran into major problems in the past when developing this component with the type set to file.
||
| name: --input | ||
| required: true | ||
| description: A 10x xenium directory or zip file or download url | ||
|
|
||
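Reviewer note: the description allows three kinds of `--input` value (a directory, a zip file, or a download URL), and a URL is not a local path. Whether that is what caused the problems mentioned above is a guess; the thread does not say. A minimal sketch of telling the three cases apart follows, where `classify_input` and the accepted URL schemes are assumptions made for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse

# Schemes treated as remote downloads (an assumption, adjust as needed).
URL_SCHEMES = {"http", "https", "ftp", "s3"}


def classify_input(value: str) -> str:
    """Return 'url', 'directory', 'zip', or 'unknown' for an --input value."""
    if urlparse(value).scheme.lower() in URL_SCHEMES:
        return "url"
    path = Path(value)
    if path.is_dir():
        return "directory"
    if path.suffix.lower() == ".zip":
        return "zip"
    return "unknown"
```

A loader could branch on the returned label before deciding whether to download, unzip, or read the directory directly.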
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,10 @@ | ||
| import anndata as ad | ||
| import geopandas as gpd | ||
| import sopa | ||
| import spatialdata as sd | ||
| from shapely.geometry import MultiPoint | ||
| from spatialdata.models import ShapesModel | ||
| from sopa.utils import copy_transformations | ||
|
|
||
| ## VIASH START | ||
| par = { | ||
|
|
@@ -36,9 +41,14 @@ | |
| del sdata.points[key] | ||
|
|
||
| for key in list(sdata.tables.keys()): | ||
| if key != 'metadata': | ||
| if key not in ['metadata', 'table']: | ||
| del sdata.tables[key] | ||
|
|
||
| # raw_ist.zarr stores the metadata table as 'table'; rename to match the output spec | ||
| if 'table' in sdata.tables and 'metadata' not in sdata.tables: | ||
|
Contributor
I wonder if we should still assume that 'table' could exist at this stage?
Contributor
Ah okay, I see it now! E.g. in binning we do generate a 'table'. All good then.
||
| sdata['metadata'] = sdata.tables['table'] | ||
| del sdata.tables['table'] | ||
|
|
||
| # sdata_transcripts | ||
| for col in list(sdata_transcripts["transcripts"].columns): | ||
| if col not in ['x', 'y', 'z', 'feature_name', 'cell_id', 'transcript_id']: | ||
|
|
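Reviewer note: the filter-then-rename logic in this hunk can be checked in isolation with a plain dict standing in for `sdata.tables`. This is a sketch under that assumption only; `normalize_tables` is a hypothetical helper and a real `SpatialData` object is not involved.

```python
def normalize_tables(tables: dict) -> dict:
    """Keep only 'metadata' and 'table', then expose a lone 'table' as 'metadata'."""
    kept = {key: value for key, value in tables.items() if key in ("metadata", "table")}
    # Rename only when 'metadata' is absent, mirroring the condition in the diff.
    if "table" in kept and "metadata" not in kept:
        kept["metadata"] = kept.pop("table")
    return kept
```

Note that when both keys are present, both survive unchanged, which matches the condition as written in the diff.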
@@ -69,6 +79,20 @@ | |
| adata.obs['passed_QC'] = adata_qc_col.obs['passed_QC'] | ||
| sdata['counts'] = adata | ||
|
|
||
| ####################### | ||
| # Compute cell shapes # | ||
| ####################### | ||
| print('Computing cell boundaries from transcripts using convex hulls', flush=True) | ||
| transcripts_df = sdata_transcripts["transcripts"].compute() | ||
| transcripts_assigned = transcripts_df[transcripts_df["cell_id"] != 0] | ||
| cell_shapes = transcripts_assigned.groupby("cell_id")[["x", "y"]].apply( | ||
| lambda g: MultiPoint(list(zip(g["x"], g["y"]))).convex_hull | ||
|
Contributor
Just out of interest, was this tested with a lot of cells? I.e., does this implementation scale well? (Was this taken from sopa or similar?)
||
| ) | ||
| geo_df = gpd.GeoDataFrame(geometry=cell_shapes) | ||
| geo_df = sopa.shapes.to_valid_polygons(geo_df) | ||
| transformations = copy_transformations(sdata_transcripts["transcripts"]) | ||
| sdata["cell_boundaries"] = ShapesModel.parse(geo_df, transformations=transformations) | ||
|
|
||
| ################# | ||
| # Write output # | ||
| ################# | ||
|
|
||
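Reviewer note: to make the scaling question above concrete, here is a standard-library sketch of what the convex-hull step computes per cell. It is not the PR's implementation, which uses shapely's `MultiPoint(...).convex_hull` inside a pandas `groupby`; the sketch swaps in Andrew's monotone chain algorithm, and `convex_hull` and `hulls_per_cell` are hypothetical names.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts  # degenerate: a single point or a segment, not a polygon
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def hulls_per_cell(transcripts):
    """transcripts: iterable of (cell_id, x, y); cell_id 0 is treated as unassigned."""
    groups = defaultdict(list)
    for cell_id, x, y in transcripts:
        if cell_id != 0:
            groups[cell_id].append((x, y))
    return {cell_id: convex_hull(pts) for cell_id, pts in groups.items()}
```

Each hull costs O(n log n) in the number of transcripts of that cell, so the total work is dominated by grouping and sorting. Cells with fewer than three distinct transcript positions give a point or a segment rather than a polygon, which is also what shapely returns for such inputs.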
Somehow this comment moved here, right? It doesn't seem related anymore, though: the zarr issues are fixed with this PR, and I don't see a pyarrow install.