Subselecting Data

[1]:
%reload_ext autoreload
%autoreload 2

import spatialproteomics
import pandas as pd
import xarray as xr

xr.set_options(display_style="text")
[1]:
<xarray.core.options.set_options at 0x7f823c14ea10>

One of the key features of spatialproteomics is the ability to slice our image data quickly and intuitively. We start by loading our spatialproteomics object.

[2]:
ds = xr.open_zarr("../../data/BNHL_166_4_I2_LK_2.zarr")

Slicing Channels and Spatial Coordinates

To slice specific channels of the image, we simply use the .pp accessor together with the familiar bracket [] indexing.

[3]:
ds.pp["CD4"]
[3]:
<xarray.Dataset>
Dimensions:         (channels: 1, y: 3000, x: 3000, labels: 8, la_props: 2,
                     cells: 12560, features: 3)
Coordinates:
  * cells           (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels        (channels) <U11 'CD4'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(1, 375, 750), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(8, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(6280, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0

We can also select multiple channels by simply passing a list to the .pp accessor. As we will see later, this makes visualising image overlays easy.

[4]:
ds.pp[["CD4", "CD8"]]
[4]:
<xarray.Dataset>
Dimensions:         (channels: 2, y: 3000, x: 3000, labels: 8, la_props: 2,
                     cells: 12560, features: 3)
Coordinates:
  * cells           (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels        (channels) <U11 'CD4' 'CD8'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(2, 375, 750), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(8, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(6280, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
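
For readers who know plain xarray, the channel selection above behaves much like a label-based sel on the channels dimension. The sketch below uses a small toy dataset (the name toy and its contents are made up for illustration) and does not claim to reproduce what the .pp accessor does internally; it only shows that selecting with a list keeps the channels dimension, which matches the outputs above.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the real object: 3 channels on a 4x4 image (hypothetical data).
toy = xr.Dataset(
    data_vars={
        "_image": (("channels", "y", "x"), np.zeros((3, 4, 4), dtype=np.uint8))
    },
    coords={
        "channels": ["DAPI", "CD4", "CD8"],
        "y": np.arange(4),
        "x": np.arange(4),
    },
)

# Selecting with a list keeps the channels dimension, even for one channel.
single = toy.sel(channels=["CD4"])
pair = toy.sel(channels=["CD4", "CD8"])

print(single.sizes["channels"])      # 1
print(pair.sizes["channels"])        # 2
print(list(pair.channels.values))    # ['CD4', 'CD8']
```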

The .pp accessor also understands x and y coordinates. When x and y coordinates are sliced, all cells that do not belong to the respective image slice are dropped.

[5]:
ds.pp[50:150, 50:150]
[5]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 101, x: 101, labels: 8, la_props: 2,
                     cells: 0, features: 3)
Coordinates:
  * cells           (cells) int64
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
  * y               (y) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(7, 101, 101), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(8, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(0, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
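
Two details of the output above are worth spelling out: the slice 50:150 yields 101 pixels per axis, and the cells dimension shrinks to whatever falls inside the window. The following sketch illustrates both with plain xarray and NumPy on toy data. The centroid table is hypothetical, and filtering cells by centroid position is an assumption about how cells are assigned to a window, not a description of the library's implementation.

```python
import numpy as np
import xarray as xr

# Toy stand-in: one channel on a 200x200 image with integer pixel coordinates.
toy = xr.Dataset(
    data_vars={
        "_image": (("channels", "y", "x"), np.zeros((1, 200, 200), dtype=np.uint8))
    },
    coords={"channels": ["DAPI"], "y": np.arange(200), "x": np.arange(200)},
)

# Label-based slices in xarray include both endpoints, so 50..150 is 101 pixels.
window = toy.sel(x=slice(50, 150), y=slice(50, 150))
print(window.sizes["x"], window.sizes["y"])  # 101 101

# Hypothetical centroid table (row, column) for three cells.
centroids = np.array([[10.0, 20.0], [60.0, 140.0], [149.5, 151.0]])

# Keep only cells whose centroid lies inside the window on both axes.
inside = (
    (centroids[:, 0] >= 50) & (centroids[:, 0] <= 150)
    & (centroids[:, 1] >= 50) & (centroids[:, 1] <= 150)
)
print(inside.tolist())  # [False, True, False]
```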

Note that we can also pass channels and x, y coordinates at the same time.

[6]:
ds.pp[["CD4", "CD8"], 50:150, 50:150]
[6]:
<xarray.Dataset>
Dimensions:         (channels: 2, y: 101, x: 101, labels: 8, la_props: 2,
                     cells: 0, features: 3)
Coordinates:
  * cells           (cells) int64
  * channels        (channels) <U11 'CD4' 'CD8'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
  * y               (y) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(2, 101, 101), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(8, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(0, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0

Slicing Labels

The labels accessor .la allows us to select specific cell types by their label number or name.

[7]:
ds.la[4]
[7]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 1, la_props: 2,
                     cells: 1073, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 4
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 3 11 49 71 80 ... 12504 12516 12554 12558
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(7, 375, 750), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(1, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(1073, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
[8]:
ds.la["T (CD3)"]
[8]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 1, la_props: 2,
                     cells: 5891, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 7
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 1 4 8 10 12 ... 12444 12469 12473 12505 12522
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(7, 375, 750), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(1, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(5891, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0

Again, it is possible to pass multiple cell labels.

[9]:
ds.la[4, 5, 6]
[9]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 3, la_props: 2,
                     cells: 3571, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 4 5 6
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 2 3 5 6 9 11 ... 12539 12554 12555 12557 12558
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(7, 375, 750), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(3, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(3571, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
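
Conceptually, selecting by label amounts to keeping the cells whose label is in the requested set. The sketch below shows that idea with NumPy on a made-up label column (cell_ids and cell_labels are hypothetical arrays, not read from the dataset); it is meant as an illustration of the filtering logic rather than the accessor's actual code.

```python
import numpy as np

# Hypothetical per-cell label column, one entry per cell ID.
cell_ids = np.array([1, 2, 3, 4, 5, 6])
cell_labels = np.array([4, 7, 5, 1, 6, 4])

# Select cells whose label is one of the requested cell types.
wanted = [4, 5, 6]
mask = np.isin(cell_labels, wanted)
selected_ids = cell_ids[mask]

print(selected_ids.tolist())  # [1, 3, 5, 6]
```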

Finally, we can select all cells except those of a given cell type using la.deselect.

[10]:
ds.la.deselect([1])
[10]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 7, la_props: 2,
                     cells: 10488, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 2 3 4 5 6 7 8
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 1 2 3 4 5 6 ... 10484 10485 10486 10487 10488
Data variables:
    _image          (channels, y, x) uint8 dask.array<chunksize=(7, 375, 750), meta=np.ndarray>
    _la_properties  (labels, la_props) <U20 dask.array<chunksize=(7, 2), meta=np.ndarray>
    _obs            (cells, features) float64 dask.array<chunksize=(6280, 3), meta=np.ndarray>
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
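
Deselecting can be thought of as the complement of a label selection: every cell whose label is not in the excluded set is kept. A minimal NumPy sketch of that idea follows, again on made-up arrays (cell_ids and cell_labels are hypothetical) and without any claim about how la.deselect is implemented.

```python
import numpy as np

# Hypothetical per-cell label column, one entry per cell ID.
cell_ids = np.array([1, 2, 3, 4, 5, 6])
cell_labels = np.array([1, 4, 1, 7, 5, 1])

# Keep every cell whose label is NOT in the excluded set.
excluded = [1]
keep = ~np.isin(cell_labels, excluded)
remaining_ids = cell_ids[keep]
remaining_labels = np.unique(cell_labels[keep])

print(remaining_ids.tolist())     # [2, 4, 5]
print(remaining_labels.tolist())  # [4, 5, 7]
```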

Slicing Neighborhoods

We can also select by neighborhoods with the .nh accessor. The syntax is identical to that of label subsetting.

[11]:
ds = xr.open_zarr("../../data/sample_1_with_neighborhoods.zarr")
ds
[11]:
<xarray.Dataset>
Dimensions:                (cells: 6901, celltype_levels: 3, channels: 56,
                            y: 2000, x: 2000, labels: 9, la_props: 2,
                            neighborhoods: 4, nh_props: 2, features: 18)
Coordinates:
  * cells                  (cells) int64 1 2 3 4 5 ... 6897 6898 6899 6900 6901
  * celltype_levels        (celltype_levels) <U8 'labels' 'labels_1' 'labels_2'
  * channels               (channels) <U11 'DAPI' 'TIM3' ... 'ki-67' 'CD38'
  * features               (features) <U14 'BCL-2' 'BCL-6' ... 'ki-67'
  * la_props               (la_props) <U6 '_color' '_name'
  * labels                 (labels) int64 1 2 3 4 5 6 7 8 9
  * neighborhoods          (neighborhoods) int64 0 1 3 4
  * nh_props               (nh_props) <U6 '_color' '_name'
  * x                      (x) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * y                      (y) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
Data variables:
    _celltype_predictions  (cells, celltype_levels) <U11 dask.array<chunksize=(3451, 2), meta=np.ndarray>
    _image                 (channels, y, x) uint8 dask.array<chunksize=(7, 500, 500), meta=np.ndarray>
    _la_properties         (labels, la_props) <U11 dask.array<chunksize=(9, 2), meta=np.ndarray>
    _nh_properties         (neighborhoods, nh_props) <U14 dask.array<chunksize=(4, 2), meta=np.ndarray>
    _obs                   (cells, features) float64 dask.array<chunksize=(3451, 9), meta=np.ndarray>
    _segmentation          (y, x) int64 dask.array<chunksize=(250, 500), meta=np.ndarray>
[12]:
# subsetting only neighborhood 0
ds.nh[0]
[12]:
<xarray.Dataset>
Dimensions:                (cells: 709, celltype_levels: 3, channels: 56,
                            y: 2000, x: 2000, labels: 9, la_props: 2,
                            neighborhoods: 1, nh_props: 2, features: 18)
Coordinates:
  * celltype_levels        (celltype_levels) <U8 'labels' 'labels_1' 'labels_2'
  * channels               (channels) <U11 'DAPI' 'TIM3' ... 'ki-67' 'CD38'
  * features               (features) <U14 'BCL-2' 'BCL-6' ... 'ki-67'
  * la_props               (la_props) <U6 '_color' '_name'
  * labels                 (labels) int64 1 2 3 4 5 6 7 8 9
  * neighborhoods          (neighborhoods) int64 0
  * nh_props               (nh_props) <U6 '_color' '_name'
  * x                      (x) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * y                      (y) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * cells                  (cells) int64 1571 1858 1895 1912 ... 6204 6247 6293
Data variables:
    _celltype_predictions  (cells, celltype_levels) <U11 dask.array<chunksize=(709, 2), meta=np.ndarray>
    _image                 (channels, y, x) uint8 dask.array<chunksize=(7, 500, 500), meta=np.ndarray>
    _la_properties         (labels, la_props) <U11 dask.array<chunksize=(9, 2), meta=np.ndarray>
    _nh_properties         (neighborhoods, nh_props) <U14 dask.array<chunksize=(1, 2), meta=np.ndarray>
    _obs                   (cells, features) float64 dask.array<chunksize=(709, 9), meta=np.ndarray>
    _segmentation          (y, x) int64 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0