Subselecting Data
[1]:
%reload_ext autoreload
%autoreload 2
import spatialproteomics
import pandas as pd
import xarray as xr
xr.set_options(display_style="text")
[1]:
<xarray.core.options.set_options at 0x7f823c14ea10>
One of the key features of spatialproteomics is the ability to slice image data quickly and intuitively. We start by loading our spatialproteomics object.
[2]:
ds = xr.open_zarr("../../data/BNHL_166_4_I2_LK_2.zarr")
Slicing Channels and Spatial Coordinates
To slice specific channels of the image, we use the .pp accessor together with the familiar bracket [] indexing.
[3]:
ds.pp["CD4"]
[3]:
<xarray.Dataset>
Dimensions:         (channels: 1, y: 3000, x: 3000, labels: 8, la_props: 2, cells: 12560, features: 3)
Coordinates:
  * cells           (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels        (channels) <U11 'CD4'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(1, 375, 750), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(8, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(6280, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
We can also select multiple channels by passing a list to the .pp accessor. As we will see later, this makes visualising image overlays easy.
[4]:
ds.pp[["CD4", "CD8"]]
[4]:
<xarray.Dataset>
Dimensions:         (channels: 2, y: 3000, x: 3000, labels: 8, la_props: 2, cells: 12560, features: 3)
Coordinates:
  * cells           (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels        (channels) <U11 'CD4' 'CD8'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(2, 375, 750), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(8, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(6280, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
The .pp accessor also understands x and y coordinates. When x and y coordinates are sliced, we get rid of all cells that do not belong to the respective image slice.
[5]:
ds.pp[50:150, 50:150]
[5]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 101, x: 101, labels: 8, la_props: 2, cells: 0, features: 3)
Coordinates:
  * cells           (cells) int64
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
  * y               (y) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(7, 101, 101), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(8, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(0, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
Note that we can also pass channels and x, y coordinates at the same time.
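The shapes in the output above (101 pixels per axis, zero cells) can be illustrated with a small plain-Python sketch. It is not the library's implementation: the helper names crop_extent and cells_in_crop are hypothetical, the toy centroids are made up, and the sketch assumes that a cell is kept when its centroid lies inside the crop, which the library may decide differently. The slice bounds are treated as inclusive coordinate labels, matching the x and y coordinates 50 to 150 shown above.

```python
def crop_extent(start, stop):
    """Number of pixels selected by an inclusive, label-based slice start:stop."""
    return stop - start + 1


def cells_in_crop(centroids, x_range, y_range):
    """Return the ids of cells whose centroid lies inside the crop.

    centroids maps cell id -> (y, x); both ranges are inclusive (start, stop).
    Assumption: membership is decided by the centroid only.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    return [
        cell_id
        for cell_id, (cy, cx) in centroids.items()
        if x0 <= cx <= x1 and y0 <= cy <= y1
    ]


# Toy data: three hypothetical cells with (y, x) centroids.
centroids = {1: (60.5, 70.0), 2: (149.0, 151.2), 3: (900.0, 40.0)}

width = crop_extent(50, 150)
kept = cells_in_crop(centroids, x_range=(50, 150), y_range=(50, 150))
print(width, kept)
```

Under this assumption, the empty cells dimension (cells: 0) in the tutorial output would mean that no cell of the sample lies in that corner of the image.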
[6]:
ds.pp[["CD4", "CD8"], 50:150, 50:150]
[6]:
<xarray.Dataset>
Dimensions:         (channels: 2, y: 101, x: 101, labels: 8, la_props: 2, cells: 0, features: 3)
Coordinates:
  * cells           (cells) int64
  * channels        (channels) <U11 'CD4' 'CD8'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
  * y               (y) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(2, 101, 101), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(8, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(0, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
Slicing Labels
The labels accessor .la allows us to select specific cell types by their label number or name.
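All bracket forms shown so far (a single name, a list of names, two slices, or a list followed by two slices) reach an object through Python's __getitem__ method, which receives comma-separated indices as one tuple. The sketch below only illustrates that mechanism. ToyIndexer is a hypothetical stand-in, not the .pp accessor, and it reports how a key would be split rather than slicing any data.

```python
class ToyIndexer:
    """Hypothetical stand-in that only reports how an index key is split up."""

    def __getitem__(self, key):
        # Python passes obj[a, b, c] as the single tuple (a, b, c).
        parts = key if isinstance(key, tuple) else (key,)

        channels, slices = None, []
        for part in parts:
            if isinstance(part, str):
                channels = [part]        # a single channel name
            elif isinstance(part, list):
                channels = list(part)    # several channel names
            elif isinstance(part, slice):
                slices.append((part.start, part.stop))
            else:
                raise TypeError(f"unsupported index: {part!r}")

        return {"channels": channels, "slices": slices}


toy = ToyIndexer()
print(toy["CD4"])
print(toy[["CD4", "CD8"]])
print(toy[50:150, 50:150])
print(toy[["CD4", "CD8"], 50:150, 50:150])
```

Dispatching on the type of each part is what lets one pair of brackets cover channel names and spatial ranges at once. How the real accessor assigns the two slices to the x and y axes is not modelled here.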
[7]:
ds.la[4]
[7]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 1, la_props: 2, cells: 1073, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 4
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 3 11 49 71 80 ... 12504 12516 12554 12558
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(7, 375, 750), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(1, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(1073, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
[8]:
ds.la["T (CD3)"]
[8]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 1, la_props: 2, cells: 5891, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 7
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 1 4 8 10 12 ... 12444 12469 12473 12505 12522
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(7, 375, 750), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(1, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(5891, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
Again, it is possible to pass multiple cell labels.
[9]:
ds.la[4, 5, 6]
[9]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 3, la_props: 2, cells: 3571, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 4 5 6
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 2 3 5 6 9 11 ... 12539 12554 12555 12557 12558
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(7, 375, 750), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(3, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(3571, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
Finally, we can select all cells except those of a given cell type using la.deselect.
[10]:
ds.la.deselect([1])
[10]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 7, la_props: 2, cells: 10488, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 2 3 4 5 6 7 8
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 1 2 3 4 5 6 ... 10484 10485 10486 10487 10488
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(7, 375, 750), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(7, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(6280, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
Slicing Neighborhoods
We can also select by neighborhoods with the nh accessor. The syntax is identical to that of label subsetting.
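Selecting by label number or name, and deselecting, can be pictured as filtering a per-cell label assignment. The following is a minimal plain-Python sketch of that idea and not the library's implementation: the functions resolve, select and deselect, the label names and the cell ids are all hypothetical.

```python
# Hypothetical label table (label id -> name) and per-cell label assignment.
label_names = {1: "B", 2: "T", 3: "Stroma"}
cell_labels = {101: 1, 102: 2, 103: 2, 104: 3, 105: 1}


def resolve(label, label_names):
    """Translate a label given by name into its numeric id; pass ids through."""
    if isinstance(label, str):
        ids = [i for i, name in label_names.items() if name == label]
        if not ids:
            raise KeyError(f"unknown label name: {label!r}")
        return ids[0]
    return label


def select(cell_labels, label_names, labels):
    """Keep only cells whose label is among the requested ids or names."""
    wanted = {resolve(label, label_names) for label in labels}
    return {cell: lab for cell, lab in cell_labels.items() if lab in wanted}


def deselect(cell_labels, label_names, labels):
    """Keep all cells except those whose label is among the given ids or names."""
    unwanted = {resolve(label, label_names) for label in labels}
    return {cell: lab for cell, lab in cell_labels.items() if lab not in unwanted}


print(select(cell_labels, label_names, ["T"]))
print(select(cell_labels, label_names, [1, 3]))
print(deselect(cell_labels, label_names, [1]))
```

Selection and deselection of the same label are complementary. In the tutorial outputs, the full dataset has 12560 cells and ds.la.deselect([1]) leaves 10488, which implies that label 1 accounts for the remaining 2072 cells.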
[11]:
ds = xr.open_zarr("../../data/sample_1_with_neighborhoods.zarr")
ds
[11]:
<xarray.Dataset>
Dimensions:                (cells: 6901, celltype_levels: 3, channels: 56, y: 2000, x: 2000, labels: 9, la_props: 2, neighborhoods: 4, nh_props: 2, features: 18)
Coordinates:
  * cells                  (cells) int64 1 2 3 4 5 ... 6897 6898 6899 6900 6901
  * celltype_levels        (celltype_levels) <U8 'labels' 'labels_1' 'labels_2'
  * channels               (channels) <U11 'DAPI' 'TIM3' ... 'ki-67' 'CD38'
  * features               (features) <U14 'BCL-2' 'BCL-6' ... 'ki-67'
  * la_props               (la_props) <U6 '_color' '_name'
  * labels                 (labels) int64 1 2 3 4 5 6 7 8 9
  * neighborhoods          (neighborhoods) int64 0 1 3 4
  * nh_props               (nh_props) <U6 '_color' '_name'
  * x                      (x) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * y                      (y) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
Data variables:
    _celltype_predictions  (cells, celltype_levels) <U11 dask.array<chunksize=(3451, 2), meta=np.ndarray>
    _image                 (channels, y, x) uint8 dask.array<chunksize=(7, 500, 500), meta=np.ndarray>
    _la_properties         (labels, la_props) <U11 dask.array<chunksize=(9, 2), meta=np.ndarray>
    _nh_properties         (neighborhoods, nh_props) <U14 dask.array<chunksize=(4, 2), meta=np.ndarray>
    _obs                   (cells, features) float64 dask.array<chunksize=(3451, 9), meta=np.ndarray>
    _segmentation          (y, x) int64 dask.array<chunksize=(250, 500), meta=np.ndarray>
[12]:
# subsetting only neighborhood 0
ds.nh[0]
[12]:
<xarray.Dataset>
Dimensions:                (cells: 709, celltype_levels: 3, channels: 56, y: 2000, x: 2000, labels: 9, la_props: 2, neighborhoods: 1, nh_props: 2, features: 18)
Coordinates:
  * celltype_levels        (celltype_levels) <U8 'labels' 'labels_1' 'labels_2'
  * channels               (channels) <U11 'DAPI' 'TIM3' ... 'ki-67' 'CD38'
  * features               (features) <U14 'BCL-2' 'BCL-6' ... 'ki-67'
  * la_props               (la_props) <U6 '_color' '_name'
  * labels                 (labels) int64 1 2 3 4 5 6 7 8 9
  * neighborhoods          (neighborhoods) int64 0
  * nh_props               (nh_props) <U6 '_color' '_name'
  * x                      (x) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * y                      (y) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * cells                  (cells) int64 1571 1858 1895 1912 ... 6204 6247 6293
Data variables:
    _celltype_predictions  (cells, celltype_levels) <U11 dask.array<chunksize=(709, 2), meta=np.ndarray>
    _image                 (channels, y, x) uint8 dask.array<chunksize=(7, 500, 500), meta=np.ndarray>
    _la_properties         (labels, la_props) <U11 dask.array<chunksize=(9, 2), meta=np.ndarray>
    _nh_properties         (neighborhoods, nh_props) <U14 dask.array<chunksize=(1, 2), meta=np.ndarray>
    _obs                   (cells, features) float64 dask.array<chunksize=(709, 9), meta=np.ndarray>
    _segmentation          (y, x) int64 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0