Subselecting Data

[3]:
import spatialproteomics
import pandas as pd
import xarray as xr
xr.set_options(display_style='text')

One of the key features of spatialproteomics is the ability to slice our image data quickly and intuitively. We start by loading our spatialproteomics object.

[7]:
# Naming the engine explicitly avoids xarray's backend guessing and its "fails while guessing" warnings.
ds = xr.load_dataset('../../data/BNHL_166_4_I2_LK.zarr', engine='zarr')

Slicing Channels and Spatial Coordinates

To select specific channels of the image, we use the .pp accessor together with the familiar bracket [] indexing.

[8]:
ds.pp['CD4']
[8]:
<xarray.Dataset>
Dimensions:        (cells: 12560, channels: 1, y: 3000, x: 3000, labels: 8,
                    props: 2, features: 4)
Coordinates:
  * cells          (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels       (channels) <U11 'CD4'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 1 2 3 4 5 6 7 8
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y              (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _arcsinh_mean  (cells, channels) float64 2.473 1.98 2.647 ... 2.113 1.923
    _arcsinh_sum   (cells, channels) float64 7.703 7.224 7.812 ... 7.55 7.277
    _image         (channels, y, x) uint8 1 1 2 0 0 0 0 1 1 ... 1 1 0 0 0 0 1 0
    _labels        (labels, props) object 'C3' ... 'B (PAX5)'
    _obs           (cells, features) float64 613.3 768.4 4.0 ... 8.0 7.0
    _raw_mean      (cells, channels) float64 29.45 17.77 35.1 ... 20.39 16.75
    _raw_sum       (cells, channels) float64 5.536e+03 3.429e+03 ... 3.617e+03
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0

We can also select multiple channels by simply passing a list to the .pp accessor. As we will see later, this makes visualising image overlays easy.

[11]:
ds.pp[['CD4', 'CD8']]
[11]:
<xarray.Dataset>
Dimensions:        (cells: 12560, channels: 2, y: 3000, x: 3000, labels: 8,
                    props: 2, features: 4)
Coordinates:
  * cells          (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels       (channels) <U11 'CD4' 'CD8'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 1 2 3 4 5 6 7 8
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y              (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _arcsinh_mean  (cells, channels) float64 2.473 0.7143 1.98 ... 1.923 0.5909
    _arcsinh_sum   (cells, channels) float64 7.703 5.677 7.224 ... 7.277 5.6
    _image         (channels, y, x) uint8 1 1 2 0 0 0 0 1 1 ... 1 1 2 1 1 1 2 1
    _labels        (labels, props) object 'C3' ... 'B (PAX5)'
    _obs           (cells, features) float64 613.3 768.4 4.0 ... 8.0 7.0
    _raw_mean      (cells, channels) float64 29.45 3.883 17.77 ... 16.75 3.13
    _raw_sum       (cells, channels) float64 5.536e+03 730.0 ... 3.617e+03 676.0
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
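
For readers who know plain xarray, the channel selection above can be compared with label-based selection on the channels dimension. The sketch below uses a small made-up dataset (the name `toy` and its contents are assumptions, not part of the tutorial data) and is only an analogy; it does not describe how spatialproteomics implements the accessor. It also shows why a one-element list matters: a list keeps the channels dimension, as in the `channels: 1` output above, while a bare string passed to `.sel` drops it.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the real object: three channels on a 4 x 5 pixel image.
toy = xr.Dataset(
    data_vars={
        "_image": (("channels", "y", "x"), np.zeros((3, 4, 5), dtype=np.uint8)),
    },
    coords={
        "channels": ["DAPI", "CD4", "CD8"],
        "y": np.arange(4),
        "x": np.arange(5),
    },
)

# A list keeps the channels dimension, even for a single name.
one = toy.sel(channels=["CD4"])
two = toy.sel(channels=["CD4", "CD8"])

# A bare string drops the dimension and leaves a scalar coordinate instead.
scalar = toy.sel(channels="CD4")

print(one.sizes["channels"], two.sizes["channels"], "channels" in scalar.sizes)
```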

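The .pp accessor also understands x and y coordinates. When x and y coordinates are sliced, all cells that do not belong to the respective image slice are dropped. Note that both slice bounds are inclusive: 50:150 returns 101 pixels along each axis, as the output below shows.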

[12]:
ds.pp[50:150, 50:150]
Dropped 12560 cells.
[12]:
<xarray.Dataset>
Dimensions:        (cells: 0, channels: 56, y: 101, x: 101, labels: 8,
                    props: 2, features: 4)
Coordinates:
  * cells          (cells) int64
  * channels       (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 1 2 3 4 5 6 7 8
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
  * y              (y) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
Data variables:
    _arcsinh_mean  (cells, channels) float64
    _arcsinh_sum   (cells, channels) float64
    _image         (channels, y, x) uint8 5 4 5 4 5 4 5 4 4 ... 2 2 2 1 2 2 2 2
    _labels        (labels, props) object 'C3' ... 'B (PAX5)'
    _obs           (cells, features) float64
    _raw_mean      (cells, channels) float64
    _raw_sum       (cells, channels) float64
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
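
The output above has 101 entries along x and y, which is what an inclusive window from 50 to 150 gives. The NumPy sketch below reproduces that count and illustrates one plausible rule for dropping cells. The centroid values are illustrative, and the rule itself (keep a cell only when its centroid lies inside the window) is an assumption made for this example, not a statement about the library's internals.

```python
import numpy as np

# Pixel coordinates of a 3000 x 3000 image, as in the dataset above.
x = np.arange(3000)

# Both ends of the window are kept, so 50..150 spans 101 pixels, not 100.
in_window = (x >= 50) & (x <= 150)
n_pixels = int(in_window.sum())

# Illustrative centroids (row, column) for four hypothetical cells.
centroids = np.array(
    [[60.0, 75.5], [149.9, 150.0], [151.0, 80.0], [613.3, 768.4]]
)

# Assumption: a cell is kept only if both centroid coordinates lie in the window.
keep = np.all((centroids >= 50) & (centroids <= 150), axis=1)

print(n_pixels, keep.tolist())
```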

Note that we can also pass channels and x, y coordinates at the same time.

[13]:
ds.pp[['CD4', 'CD8'], 50:150, 50:150]
Dropped 12560 cells.
[13]:
<xarray.Dataset>
Dimensions:        (cells: 0, channels: 2, y: 101, x: 101, labels: 8, props: 2,
                    features: 4)
Coordinates:
  * cells          (cells) int64
  * channels       (channels) <U11 'CD4' 'CD8'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 1 2 3 4 5 6 7 8
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
  * y              (y) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
Data variables:
    _arcsinh_mean  (cells, channels) float64
    _arcsinh_sum   (cells, channels) float64
    _image         (channels, y, x) uint8 0 0 0 0 0 0 1 1 0 ... 2 1 1 1 1 1 4 1
    _labels        (labels, props) object 'C3' ... 'B (PAX5)'
    _obs           (cells, features) float64
    _raw_mean      (cells, channels) float64
    _raw_sum       (cells, channels) float64
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
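
The three call styles shown so far work because Python hands everything between the brackets to `__getitem__` as a single key: a string, a list, or a tuple. The sketch below makes that visible with a hypothetical `KeyRecorder` class that only reports what it receives. It is not the spatialproteomics implementation, and it does not attempt to say which of the two slices refers to x and which to y.

```python
class KeyRecorder:
    """Hypothetical stand-in for an accessor; it only reports the key it gets."""

    def __getitem__(self, key):
        # A single channel name or a list of names arrives as-is.
        if isinstance(key, (str, list)):
            return {"channels": key, "slices": ()}
        # Channels followed by slices arrive as one tuple.
        if isinstance(key, tuple) and isinstance(key[0], (str, list)):
            return {"channels": key[0], "slices": key[1:]}
        # Slices alone also arrive as one tuple.
        if isinstance(key, tuple):
            return {"channels": None, "slices": key}
        raise TypeError(f"unsupported key: {key!r}")


rec = KeyRecorder()
print(rec["CD4"])
print(rec[50:150, 50:150])
print(rec[["CD4", "CD8"], 50:150, 50:150])
```

Dispatching on the key type in this way is what allows a single pair of brackets to serve channel selection, spatial cropping, or both at once.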

Slicing Labels

The labels accessor .la allows us to select specific cell types by their label number or name.

[14]:
ds.la[4]
[14]:
<xarray.Dataset>
Dimensions:        (cells: 5891, channels: 56, y: 3000, x: 3000, labels: 1,
                    props: 2, features: 4)
Coordinates:
  * cells          (cells) int64 1 4 8 10 12 ... 12444 12469 12473 12505 12522
  * channels       (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 4
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y              (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _arcsinh_mean  (cells, channels) float64 3.111 0.0 1.391 ... 0.9208 0.4512
    _arcsinh_sum   (cells, channels) float64 8.346 0.0 6.564 ... 4.93 5.817 5.0
    _image         (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _labels        (labels, props) object 'C0' 'T (CD3)'
    _obs           (cells, features) float64 613.3 768.4 4.0 ... 4.0 3.0
    _raw_mean      (cells, channels) float64 56.02 0.0 9.426 ... 5.283 2.333
    _raw_sum       (cells, channels) float64 1.053e+04 0.0 ... 840.0 371.0
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0

Label 4 carries the name 'T (CD3)', so selecting by name returns the same cells.

[17]:
ds.la['T (CD3)']
[17]:
<xarray.Dataset>
Dimensions:        (cells: 5891, channels: 56, y: 3000, x: 3000, labels: 1,
                    props: 2, features: 4)
Coordinates:
  * cells          (cells) int64 1 4 8 10 12 ... 12444 12469 12473 12505 12522
  * channels       (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 4
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y              (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _arcsinh_mean  (cells, channels) float64 3.111 0.0 1.391 ... 0.9208 0.4512
    _arcsinh_sum   (cells, channels) float64 8.346 0.0 6.564 ... 4.93 5.817 5.0
    _image         (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _labels        (labels, props) object 'C0' 'T (CD3)'
    _obs           (cells, features) float64 613.3 768.4 4.0 ... 4.0 3.0
    _raw_mean      (cells, channels) float64 56.02 0.0 9.426 ... 5.283 2.333
    _raw_sum       (cells, channels) float64 1.053e+04 0.0 ... 840.0 371.0
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0

Again, it is possible to pass multiple cell labels.

[15]:
ds.la[4, 5, 6]
[15]:
<xarray.Dataset>
Dimensions:        (cells: 7806, channels: 56, y: 3000, x: 3000, labels: 3,
                    props: 2, features: 4)
Coordinates:
  * cells          (cells) int64 1 2 4 5 6 8 ... 12531 12536 12540 12549 12557
  * channels       (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 4 5 6
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y              (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _arcsinh_mean  (cells, channels) float64 3.111 0.0 1.391 ... 0.2297 0.4495
    _arcsinh_sum   (cells, channels) float64 8.346 0.0 6.564 ... 4.208 4.904
    _image         (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _labels        (labels, props) object 'C0' 'T (CD3)' ... 'Macro (CD68)'
    _obs           (cells, features) float64 613.3 768.4 4.0 ... 6.0 5.0
    _raw_mean      (cells, channels) float64 56.02 0.0 9.426 ... 1.159 2.324
    _raw_sum       (cells, channels) float64 1.053e+04 0.0 ... 168.0 337.0
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
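
Selecting by label amounts to keeping the cells whose label is in a given set, whether the labels are given as numbers or as names. The NumPy sketch below illustrates that idea on six made-up cells. The label names are the ones visible in the outputs above, while the arrays and the `select` helper are hypothetical and are not the library's code.

```python
import numpy as np

# Label numbers and names as they appear in the outputs above.
label_names = {2: "Stroma (CD90)", 4: "T (CD3)", 6: "Macro (CD68)"}

# Hypothetical per-cell label numbers for six cells.
cell_ids = np.array([1, 2, 3, 4, 5, 6])
cell_labels = np.array([4, 2, 4, 6, 2, 4])


def select(labels):
    """Return the ids of cells whose label is in `labels` (numbers or names)."""
    name_to_number = {name: number for number, name in label_names.items()}
    numbers = [
        name_to_number[item] if isinstance(item, str) else item
        for item in labels
    ]
    return cell_ids[np.isin(cell_labels, numbers)]


print(select([4]).tolist())
print(select(["T (CD3)"]).tolist())
print(select([4, 6]).tolist())
```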

Finally, we can select all cells except those of a given cell type using la.deselect.

[16]:
ds.la.deselect([1])
[16]:
<xarray.Dataset>
Dimensions:        (cells: 12391, channels: 56, y: 3000, x: 3000, labels: 7,
                    props: 2, features: 4)
Coordinates:
  * cells          (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels       (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 2 3 4 5 6 7 8
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y              (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _arcsinh_mean  (cells, channels) float64 3.111 0.0 1.391 ... 1.324 0.4174
    _arcsinh_sum   (cells, channels) float64 8.346 0.0 6.564 ... 6.625 5.224
    _image         (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _labels        (labels, props) object 'C7' 'Stroma (CD90)' ... 'B (PAX5)'
    _obs           (cells, features) float64 613.3 768.4 4.0 ... 8.0 7.0
    _raw_mean      (cells, channels) float64 56.02 0.0 9.426 ... 8.727 2.148
    _raw_sum       (cells, channels) float64 1.053e+04 0.0 ... 1.885e+03 464.0
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
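
Deselecting is the complement of selecting: the output above keeps 12391 of the 12560 cells, so 169 cells carried label 1. The sketch below shows the same relationship on a small made-up label array. It is an illustration of the idea with hypothetical data, not the library's implementation.

```python
import numpy as np

# Hypothetical per-cell label numbers for six cells.
cell_labels = np.array([1, 4, 2, 1, 6, 4])

# Cells that carry the label to remove, and everything else.
selected = np.isin(cell_labels, [1])
deselected = ~selected

# The two masks split the cells without overlap.
print(int(selected.sum()), int(deselected.sum()), cell_labels[deselected].tolist())

# The same bookkeeping for the counts reported in the outputs above.
print(12560 - 12391)
```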