Draft SpatialData.filter() #626

aeisenbarth · 2024-07-08T19:12:08Z

(In reference to #620)

This PR implements more advanced filtering options than subset, allowing users to create a new SpatialData object that contains only specific tables, layers, obs keys, and var keys.

Use cases

From a concatenated SpatialData, one can extract parts of it.
When testing an operation that adds elements or table columns, one can extract from an expected reference dataset the input data and pass it to the operation, then compare the processed data against the reference.
…

Closes #280
Closes #284
Closes #556

codecov · 2024-07-08T19:18:00Z

Codecov Report

Attention: Patch coverage is 10.71429% with 25 lines in your changes missing coverage. Please review.

Project coverage is 91.59%. Comparing base (95d69ff) to head (d9c1e0e).
Report is 49 commits behind head on main.

Files with missing lines	Patch %	Lines
src/spatialdata/_core/spatialdata.py	10.71%	25 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #626      +/-   ##
==========================================
- Coverage   91.93%   91.59%   -0.35%     
==========================================
  Files          44       44              
  Lines        6661     6688      +27     
==========================================
+ Hits         6124     6126       +2     
- Misses        537      562      +25

Files with missing lines	Coverage Δ
src/spatialdata/_core/spatialdata.py	`88.51% <10.71%> (-2.47%)`	⬇️

aeisenbarth · 2024-07-08T19:29:17Z

In its current state, this PR does not yet fully resolve the issues it was meant to close.

Subset spatialdata by list of cell ids #556:
- A parameter instances could be added. If provided, only the table rows for these instances are selected; if not provided, all instances are returned. It should only allow a single region/element, though (otherwise it gets complicated).
- Shapes/points elements should also be filtered (easy).
- Still unanswered is what effect it should have on the labels. Should we create a new labels image with the removed instances set to 0 (background), or leave the labels unchanged and simply have no reference to them in the table?
Filter spatialData #280:
- This involves a condition. In my opinion, a parameter that takes a condition as a function or query expression is out of scope due to its complexity. I would do this in two steps: users use Pandas to get a list of instances, then pass the instances to the SpatialData filter function.
- The user also asked about adjusting shapes to match the filtered instances in the table.
Feature request: spatial cropping from select table rows #284:
- Filtering labels/shapes/points elements
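
The two-step approach and the open labels question above can be sketched with plain pandas and NumPy. This is only an illustration under stated assumptions: the column names region, instance_id and area, the helper names select_instances and mask_labels, and the toy data are all hypothetical and not part of this PR or of the spatialdata API. Zeroing out removed labels is just one of the two options raised above, not a decision.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a table's obs: one row per instance, tagged with its region.
# Column names are assumptions made for this sketch.
obs = pd.DataFrame(
    {
        "region": ["cells_a", "cells_a", "cells_a", "cells_b"],
        "instance_id": [1, 2, 3, 1],
        "area": [40.0, 5.0, 55.0, 60.0],
    }
)


def select_instances(obs, region, condition):
    """Step 1: instance ids of a single region whose rows satisfy `condition`."""
    in_region = obs["region"] == region
    return obs.loc[in_region & condition, "instance_id"].tolist()


def mask_labels(labels, keep):
    """One option for labels: set every label not in `keep` to 0 (background)."""
    return np.where(np.isin(labels, keep), labels, 0)


# Step 1: the user expresses the condition with ordinary pandas.
kept = select_instances(obs, region="cells_a", condition=obs["area"] > 10)

# Step 2 (hypothetical): the resulting ids would be passed to the filter function.
# Here they are only used to mask a small labels array.
labels = np.array([[0, 1, 1], [2, 2, 3], [0, 3, 3]])
masked = mask_labels(labels, kept)
```

Restricting the selection to one region keeps instance ids unambiguous, since the same id can occur in several elements.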

LucaMarconato · 2024-07-12T17:55:29Z

Thanks @aeisenbarth. After discussing with @melonora, we are going to first turn the code from #627 into an internal function, merge it, and then continue working on your PR. The idea is to provide a single entry point for filtering, filter(), and to use, for instance, subset() or the function from Wouter internally.

owenwilkins · 2024-10-28T20:59:10Z

Is there somewhere in the documentation that now describes how to filter a spatialdata object by cell IDs? This is valuable for several reasons, e.g. filtering out cells removed by QC in analyses using other libraries.

LucaMarconato · 2025-01-27T11:11:48Z

I went back to this and to #627 today and realized that we may not need to add a new API, since all the points covered by this PR and by the linked PR, including all the points listed in this message here: #626 (comment), are essentially covered by the example below, which uses the currently available APIs:

##
# constructing the example data
from spatialdata.datasets import blobs_annotating_element
from spatialdata import concatenate
from spatialdata import join_spatialelement_table
from spatialdata import SpatialData

sdata1 = blobs_annotating_element("blobs_polygons")
sdata2 = blobs_annotating_element("blobs_polygons")

sdata = concatenate({"sdata1": sdata1, "sdata2": sdata2}, concatenate_tables=True)
print(sdata)

##
# filtering the data
table_name = "table"
filtered_table = sdata[table_name][sdata[table_name].obs.instance_id < 3]
annotated_regions = sdata.get_annotated_regions(sdata[table_name])
elements, table = join_spatialelement_table(
    sdata, spatial_element_names=annotated_regions, table=filtered_table, how="inner"
)
sdata_filtered = SpatialData.init_from_elements(elements | {table_name: table})
print(sdata_filtered)

Explicitly, the code above first filters the table with standard pandas/anndata operations (and thus is very general), and then reuses the join operations to filter the SpatialData object (which again are very general and allow for several cases). In doing this:

- we limit code redundancy
- we can use any query/condition to filter the object, for example:
  - we can filter by a threshold
  - we can filter by instances
  - we can manually subset certain obs/var/layers of the table
- shapes and points are filtered to match the table
- if a table annotates multiple elements, they are all filtered
- filtering labels is delegated to the join operations (currently not supported, but if it is supported there in the future, it will be supported here as well)
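
To make the "filter the table, then inner join" idea concrete without depending on spatialdata, here is a toy model in plain Python. It is not the library's implementation: elements are modelled as dicts mapping instance ids to geometries, the table as a list of row dicts, and the names inner_join_filter, region and instance_id are assumptions made for this sketch.

```python
def inner_join_filter(elements, table_rows, predicate):
    """Filter table rows, then keep only the element instances they still reference."""
    # Step 1: filter the table with an arbitrary condition.
    kept_rows = [row for row in table_rows if predicate(row)]

    # Collect the surviving instance ids per annotated element.
    kept_ids = {}
    for row in kept_rows:
        kept_ids.setdefault(row["region"], set()).add(row["instance_id"])

    # Step 2: inner join, i.e. keep only the instances that still have a table row.
    # Toy simplification: elements with no surviving rows are dropped entirely.
    filtered_elements = {
        name: {i: geom for i, geom in instances.items() if i in kept_ids[name]}
        for name, instances in elements.items()
        if name in kept_ids
    }
    return filtered_elements, kept_rows


# One table annotating two elements; both are filtered by the same condition.
elements = {
    "polygons_1": {0: "p0", 1: "p1", 2: "p2", 3: "p3"},
    "polygons_2": {0: "q0", 4: "q4"},
}
table_rows = [
    {"region": name, "instance_id": i}
    for name, instances in elements.items()
    for i in instances
]

filtered, rows = inner_join_filter(
    elements, table_rows, predicate=lambda row: row["instance_id"] < 3
)
```

Because the condition is an ordinary predicate on table rows, thresholds, explicit id lists, and other queries all go through the same two steps.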

I think we could proceed by choosing one of the following strategies:

- we do not add any new API, and make the example above visible in the docs
- we add a very minimalistic API that essentially reproduces the example above; this is convenient for most cases, and for more general cases the user can modify the code
- we add a feature-complete API (similar to this PR) and use the code above internally so we don't have code redundancy

Any preference?

Draft SpatialData.filter()

d9c1e0e

melonora mentioned this pull request Jul 9, 2024

allow filtering by ids #627

Draft

LucaMarconato mentioned this pull request Jul 9, 2024

Draft SpatialData.filter() #620

Closed

LucaMarconato marked this pull request as draft July 12, 2024 17:55
