Draft SpatialData.filter() #626
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #626 +/- ##
==========================================
- Coverage 91.93% 91.59% -0.35%
==========================================
Files 44 44
Lines 6661 6688 +27
==========================================
+ Hits 6124 6126 +2
- Misses 537 562 +25
In its current state, this PR does not yet resolve the issues it was meant to close.
Thanks @aeisenbarth. After discussing with @melonora, we are going to first turn the code in #627 into an internal function, merge it, and then continue working on your PR. The idea is to provide a single entry point for filtering.
Is there somewhere in the documentation that now describes how to filter a SpatialData object by cell IDs? This is valuable for several reasons, e.g. filtering out cells removed by QC in analyses that use other libraries.
I went back to this and to #627 today and realized that we may not need to add a new API: all the points covered by this PR and by the linked PR, including all the points listed in this message (#626 (comment)), are essentially covered by the example below, which uses the currently available APIs:
##
# constructing the example data
from spatialdata.datasets import blobs_annotating_element
from spatialdata import concatenate
from spatialdata import join_spatialelement_table
from spatialdata import SpatialData
sdata1 = blobs_annotating_element("blobs_polygons")
sdata2 = blobs_annotating_element("blobs_polygons")
sdata = concatenate({"sdata1": sdata1, "sdata2": sdata2}, concatenate_tables=True)
print(sdata)
##
# filtering the data
table_name = "table"
filtered_table = sdata[table_name][sdata[table_name].obs.instance_id < 3]
annotated_regions = sdata.get_annotated_regions(sdata[table_name])
elements, table = join_spatialelement_table(
    sdata, spatial_element_names=annotated_regions, table=filtered_table, how="inner"
)
sdata_filtered = SpatialData.init_from_elements(elements | {table_name: table})
print(sdata_filtered)
Explicitly, the code above first filters the table with standard
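To make the filter-then-join pattern in the example above easier to follow, here is a minimal sketch in plain Python that does not use spatialdata at all. The dictionaries, the `filter_rows` and `inner_join` helpers, and the geometry labels are hypothetical stand-ins invented for illustration. They only mimic the idea of filtering table rows first and then keeping the element instances that still have a matching row, which is roughly what `how="inner"` is used for above.

```python
# Toy stand-ins (hypothetical, not spatialdata types):
# each element maps instance ids to a geometry label,
# and the table is a list of rows annotating (region, instance_id).
elements = {
    "blobs_polygons-sdata1": {0: "poly_a", 1: "poly_b", 3: "poly_c"},
    "blobs_polygons-sdata2": {0: "poly_d", 4: "poly_e"},
}
table = [
    {"region": "blobs_polygons-sdata1", "instance_id": 0},
    {"region": "blobs_polygons-sdata1", "instance_id": 1},
    {"region": "blobs_polygons-sdata1", "instance_id": 3},
    {"region": "blobs_polygons-sdata2", "instance_id": 0},
    {"region": "blobs_polygons-sdata2", "instance_id": 4},
]


def filter_rows(rows, predicate):
    """Step 1: keep only the table rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def inner_join(elems, rows):
    """Step 2: keep only instances and rows that match on (region, instance_id)."""
    keys_in_table = {(row["region"], row["instance_id"]) for row in rows}
    joined_elems = {
        name: {i: geom for i, geom in instances.items() if (name, i) in keys_in_table}
        for name, instances in elems.items()
    }
    keys_in_elems = {(name, i) for name, inst in joined_elems.items() for i in inst}
    joined_rows = [
        row for row in rows if (row["region"], row["instance_id"]) in keys_in_elems
    ]
    return joined_elems, joined_rows


# Same condition as in the example above: instance_id < 3.
filtered_table = filter_rows(table, lambda row: row["instance_id"] < 3)
filtered_elements, joined_table = inner_join(elements, filtered_table)
print(filtered_elements)
print(joined_table)
```

Under the same assumptions, filtering by an explicit set of cell IDs (for instance, cells that passed QC in another library) only changes the predicate, e.g. `lambda row: row["instance_id"] in keep_ids` for a hypothetical `keep_ids` set.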
I think we could proceed by choosing one of the following strategies:
Any preference?
(In reference to #620)
This PR implements more advanced filtering options than subset, allowing the creation of a new SpatialData object that contains only specific tables, layers, obs keys, and var keys.
Use cases
Closes #280
Closes #284
Closes #556