Subselecting Data
[3]:
import spatialproteomics
import pandas as pd
import xarray as xr
xr.set_options(display_style='text')
One of the key features of spatialproteomics is the ability to slice image data quickly and intuitively. We start by loading a spatialproteomics object.
[7]:
ds = xr.load_dataset('../../data/BNHL_166_4_I2_LK.zarr')
Slicing Channels and Spatial Coordinates
To select specific channels of the image, we use the .pp accessor together with the familiar bracket [] indexing.
[8]:
ds.pp['CD4']
[8]:
<xarray.Dataset>
Dimensions:        (cells: 12560, channels: 1, y: 3000, x: 3000, labels: 8, props: 2, features: 4)
Coordinates:
  * cells          (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels       (channels) <U11 'CD4'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 1 2 3 4 5 6 7 8
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y              (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _arcsinh_mean  (cells, channels) float64 2.473 1.98 2.647 ... 2.113 1.923
    _arcsinh_sum   (cells, channels) float64 7.703 7.224 7.812 ... 7.55 7.277
    _image         (channels, y, x) uint8 1 1 2 0 0 0 0 1 1 ... 1 1 0 0 0 0 1 0
    _labels        (labels, props) object 'C3' ... 'B (PAX5)'
    _obs           (cells, features) float64 613.3 768.4 4.0 ... 8.0 7.0
    _raw_mean      (cells, channels) float64 29.45 17.77 35.1 ... 20.39 16.75
    _raw_sum       (cells, channels) float64 5.536e+03 3.429e+03 ... 3.617e+03
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
We can also select multiple channels by passing a list to the .pp accessor. As we will see later, this makes visualising image overlays easy.
[11]:
ds.pp[['CD4', 'CD8']]
[11]:
<xarray.Dataset>
Dimensions:        (cells: 12560, channels: 2, y: 3000, x: 3000, labels: 8, props: 2, features: 4)
Coordinates:
  * cells          (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels       (channels) <U11 'CD4' 'CD8'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 1 2 3 4 5 6 7 8
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y              (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _arcsinh_mean  (cells, channels) float64 2.473 0.7143 1.98 ... 1.923 0.5909
    _arcsinh_sum   (cells, channels) float64 7.703 5.677 7.224 ... 7.277 5.6
    _image         (channels, y, x) uint8 1 1 2 0 0 0 0 1 1 ... 1 1 2 1 1 1 2 1
    _labels        (labels, props) object 'C3' ... 'B (PAX5)'
    _obs           (cells, features) float64 613.3 768.4 4.0 ... 8.0 7.0
    _raw_mean      (cells, channels) float64 29.45 3.883 17.77 ... 16.75 3.13
    _raw_sum       (cells, channels) float64 5.536e+03 730.0 ... 3.617e+03 676.0
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
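For readers who know plain xarray, the channel selection above behaves much like a label-based selection on the channels dimension. The following is a minimal sketch on a small synthetic dataset (not the BNHL sample), using only the standard xarray API. Whether the .pp accessor is implemented this way internally is an assumption; the dimension and variable names simply mirror the output shown above.

```python
import numpy as np
import xarray as xr

# Toy stand-in for a spatialproteomics object: 3 channels, 4x4 pixels.
# Names mirror the repr above; all values are synthetic.
toy = xr.Dataset(
    data_vars={
        "_image": (
            ("channels", "y", "x"),
            np.arange(48, dtype=np.uint8).reshape(3, 4, 4),
        )
    },
    coords={
        "channels": ["DAPI", "CD4", "CD8"],
        "y": np.arange(4),
        "x": np.arange(4),
    },
)

# Selecting with a list keeps the channels dimension, even for one channel,
# which matches the "channels: 1" entry in the output above.
single = toy.sel(channels=["CD4"])
double = toy.sel(channels=["CD4", "CD8"])

print(single.sizes["channels"], double.sizes["channels"])
```

Keeping the channels dimension for a single channel means downstream code can treat one-channel and multi-channel selections uniformly.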
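The .pp accessor also understands x and y coordinates. When x and y coordinates are sliced, all cells that do not belong to the selected image region are dropped. Both slice bounds are inclusive: in the example below, 50:150 yields 101 pixels along each axis. The selected region happens to contain no cells, so all 12560 cells are dropped.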
[12]:
ds.pp[50:150, 50:150]
Dropped 12560 cells.
[12]:
<xarray.Dataset>
Dimensions:        (cells: 0, channels: 56, y: 101, x: 101, labels: 8, props: 2, features: 4)
Coordinates:
  * cells          (cells) int64
  * channels       (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 1 2 3 4 5 6 7 8
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
  * y              (y) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
Data variables:
    _arcsinh_mean  (cells, channels) float64
    _arcsinh_sum   (cells, channels) float64
    _image         (channels, y, x) uint8 5 4 5 4 5 4 5 4 4 ... 2 2 2 1 2 2 2 2
    _labels        (labels, props) object 'C3' ... 'B (PAX5)'
    _obs           (cells, features) float64
    _raw_mean      (cells, channels) float64
    _raw_sum       (cells, channels) float64
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
Note that we can also pass channels and x, y coordinates at the same time.
[13]:
ds.pp[['CD4', 'CD8'], 50:150, 50:150]
Dropped 12560 cells.
[13]:
<xarray.Dataset>
Dimensions:        (cells: 0, channels: 2, y: 101, x: 101, labels: 8, props: 2, features: 4)
Coordinates:
  * cells          (cells) int64
  * channels       (channels) <U11 'CD4' 'CD8'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 1 2 3 4 5 6 7 8
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
  * y              (y) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
Data variables:
    _arcsinh_mean  (cells, channels) float64
    _arcsinh_sum   (cells, channels) float64
    _image         (channels, y, x) uint8 0 0 0 0 0 0 1 1 0 ... 2 1 1 1 1 1 4 1
    _labels        (labels, props) object 'C3' ... 'B (PAX5)'
    _obs           (cells, features) float64
    _raw_mean      (cells, channels) float64
    _raw_sum       (cells, channels) float64
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
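The two effects of spatial slicing seen above (inclusive bounds and dropped cells) can be reproduced on a toy dataset with plain xarray. This is only a sketch under stated assumptions: the dataset is synthetic, cells are filtered by their centroid position, and centroid-0 is taken to be the y coordinate and centroid-1 the x coordinate. None of this is confirmed to be how the .pp accessor decides which cells to drop.

```python
import numpy as np
import xarray as xr

# Synthetic 200x200 single-channel image with three hypothetical cells.
toy = xr.Dataset(
    data_vars={
        "_image": (("channels", "y", "x"), np.zeros((1, 200, 200), dtype=np.uint8)),
        "_obs": (
            ("cells", "features"),
            np.array([[60.0, 70.0], [10.0, 20.0], [150.0, 50.0]]),
        ),
    },
    coords={
        "channels": ["CD4"],
        "y": np.arange(200),
        "x": np.arange(200),
        "cells": [1, 2, 3],
        "features": ["centroid-0", "centroid-1"],
    },
)

# Label-based slices in xarray include both end points: 50..150 is 101 pixels.
crop = toy.sel(y=slice(50, 150), x=slice(50, 150))

# Keep only cells whose centroid lies inside the cropped window.
# Assumption: centroid-0 is y and centroid-1 is x.
cy = crop["_obs"].sel(features="centroid-0").values
cx = crop["_obs"].sel(features="centroid-1").values
inside = (cy >= 50) & (cy <= 150) & (cx >= 50) & (cx <= 150)
crop = crop.isel(cells=np.flatnonzero(inside))

print(crop.sizes["x"], crop["cells"].values.tolist())
```

In this toy example, the cell with centroid (10, 20) falls outside the window and is removed, while the cell sitting exactly on the window border is kept.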
Slicing Labels
The labels accessor .la allows us to select specific cell types by their label number or name.
[14]:
ds.la[4]
[14]:
<xarray.Dataset>
Dimensions:        (cells: 5891, channels: 56, y: 3000, x: 3000, labels: 1, props: 2, features: 4)
Coordinates:
  * cells          (cells) int64 1 4 8 10 12 ... 12444 12469 12473 12505 12522
  * channels       (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 4
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y              (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _arcsinh_mean  (cells, channels) float64 3.111 0.0 1.391 ... 0.9208 0.4512
    _arcsinh_sum   (cells, channels) float64 8.346 0.0 6.564 ... 4.93 5.817 5.0
    _image         (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _labels        (labels, props) object 'C0' 'T (CD3)'
    _obs           (cells, features) float64 613.3 768.4 4.0 ... 4.0 3.0
    _raw_mean      (cells, channels) float64 56.02 0.0 9.426 ... 5.283 2.333
    _raw_sum       (cells, channels) float64 1.053e+04 0.0 ... 840.0 371.0
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
[17]:
ds.la['T (CD3)']
[17]:
<xarray.Dataset>
Dimensions:        (cells: 5891, channels: 56, y: 3000, x: 3000, labels: 1, props: 2, features: 4)
Coordinates:
  * cells          (cells) int64 1 4 8 10 12 ... 12444 12469 12473 12505 12522
  * channels       (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 4
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y              (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _arcsinh_mean  (cells, channels) float64 3.111 0.0 1.391 ... 0.9208 0.4512
    _arcsinh_sum   (cells, channels) float64 8.346 0.0 6.564 ... 4.93 5.817 5.0
    _image         (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _labels        (labels, props) object 'C0' 'T (CD3)'
    _obs           (cells, features) float64 613.3 768.4 4.0 ... 4.0 3.0
    _raw_mean      (cells, channels) float64 56.02 0.0 9.426 ... 5.283 2.333
    _raw_sum       (cells, channels) float64 1.053e+04 0.0 ... 840.0 371.0
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
Again, it is possible to pass multiple cell labels.
[15]:
ds.la[4, 5, 6]
[15]:
<xarray.Dataset>
Dimensions:        (cells: 7806, channels: 56, y: 3000, x: 3000, labels: 3, props: 2, features: 4)
Coordinates:
  * cells          (cells) int64 1 2 4 5 6 8 ... 12531 12536 12540 12549 12557
  * channels       (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 4 5 6
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y              (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _arcsinh_mean  (cells, channels) float64 3.111 0.0 1.391 ... 0.2297 0.4495
    _arcsinh_sum   (cells, channels) float64 8.346 0.0 6.564 ... 4.208 4.904
    _image         (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _labels        (labels, props) object 'C0' 'T (CD3)' ... 'Macro (CD68)'
    _obs           (cells, features) float64 613.3 768.4 4.0 ... 6.0 5.0
    _raw_mean      (cells, channels) float64 56.02 0.0 9.426 ... 1.159 2.324
    _raw_sum       (cells, channels) float64 1.053e+04 0.0 ... 168.0 337.0
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
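Conceptually, selecting by label means resolving a name to its label number and then keeping only the cells annotated with one of the requested numbers. The sketch below illustrates this idea with NumPy on synthetic arrays. The helper select_labels and the per-cell arrays are hypothetical and are not the library's internals; the number-to-name pairs are the ones that can be read off the outputs in this section.

```python
import numpy as np

# Label number -> name, as visible in the reprs of this section
# (only a subset of the 8 labels in the dataset).
label_names = {2: "Stroma (CD90)", 4: "T (CD3)", 6: "Macro (CD68)", 8: "B (PAX5)"}

# Synthetic per-cell annotation: cell IDs and the label number of each cell.
cell_ids = np.array([1, 2, 3, 4, 5, 6])
cell_labels = np.array([4, 8, 2, 4, 6, 8])


def select_labels(keys):
    """Return the IDs of cells whose label matches any key (number or name)."""
    name_to_number = {name: number for number, name in label_names.items()}
    # Resolve names to numbers; integer keys are used as they are.
    numbers = [name_to_number[k] if isinstance(k, str) else k for k in keys]
    return cell_ids[np.isin(cell_labels, numbers)]


by_number = select_labels([4])
by_name = select_labels(["T (CD3)"])
several = select_labels([4, 6])

print(by_number, by_name, several)
```

Selecting by number and by name returns the same cells, which is consistent with ds.la[4] and ds.la['T (CD3)'] producing identical outputs above.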
Finally, we can select all cells except those of a given cell type using la.deselect.
[16]:
ds.la.deselect([1])
[16]:
<xarray.Dataset>
Dimensions:        (cells: 12391, channels: 56, y: 3000, x: 3000, labels: 7, props: 2, features: 4)
Coordinates:
  * cells          (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels       (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features       (features) <U10 'centroid-0' 'centroid-1' ... '_original_'
  * labels         (labels) int64 2 3 4 5 6 7 8
  * props          (props) <U6 '_color' '_name'
  * x              (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y              (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _arcsinh_mean  (cells, channels) float64 3.111 0.0 1.391 ... 1.324 0.4174
    _arcsinh_sum   (cells, channels) float64 8.346 0.0 6.564 ... 6.625 5.224
    _image         (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _labels        (labels, props) object 'C7' 'Stroma (CD90)' ... 'B (PAX5)'
    _obs           (cells, features) float64 613.3 768.4 4.0 ... 8.0 7.0
    _raw_mean      (cells, channels) float64 56.02 0.0 9.426 ... 8.727 2.148
    _raw_sum       (cells, channels) float64 1.053e+04 0.0 ... 1.885e+03 464.0
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
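Deselecting is the complement of selecting: cells carrying the given label are removed, and the label itself disappears from the labels coordinate. In the output above, the cell count goes from 12560 to 12391, so 169 cells carried label 1. The following sketch shows the complement logic with NumPy on synthetic arrays; the helper deselect_labels is hypothetical and only illustrates the idea, not the library's implementation.

```python
import numpy as np

# Synthetic per-cell annotation: cell IDs and the label number of each cell.
cell_ids = np.array([1, 2, 3, 4, 5, 6])
cell_labels = np.array([4, 1, 2, 4, 1, 8])


def deselect_labels(numbers):
    """Return the remaining cell IDs and label numbers after removing `numbers`."""
    keep = ~np.isin(cell_labels, numbers)  # True for cells with any other label
    return cell_ids[keep], np.unique(cell_labels[keep])


remaining_cells, remaining_labels = deselect_labels([1])

print(remaining_cells, remaining_labels)
```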