Subselecting Data
[1]:
%reload_ext autoreload
%autoreload 2
import spatialproteomics
import pandas as pd
import xarray as xr
xr.set_options(display_style='text')
One of the key features of spatialproteomics is the ability to slice our image data quickly and intuitively. We start by loading our spatialproteomics object.
[2]:
ds = xr.load_dataset('../../data/BNHL_166_4_I2_LK_2.zarr', engine='zarr')
Slicing Channels and Spatial Coordinates
To slice specific channels of the image, we use the .pp accessor together with the familiar bracket [] indexing.
[3]:
ds.pp['CD4']
[3]:
<xarray.Dataset>
Dimensions:         (channels: 1, y: 3000, x: 3000, labels: 8, la_props: 2, cells: 12560, features: 3)
Coordinates:
  * cells           (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels        (channels) <U11 'CD4'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _image          (channels, y, x) uint8 1 1 2 0 0 0 0 1 1 ... 1 1 0 0 0 0 1 0
    _la_properties  (labels, la_props) <U20 'C1' ... 'Vascular (CD31+CD34)'
    _obs            (cells, features) float64 7.0 613.3 ... 2.249e+03 2.237e+03
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
We can also select multiple channels by passing a list to the .pp accessor. As we will see later, this makes visualising image overlays easy.
[4]:
ds.pp[['CD4', 'CD8']]
[4]:
<xarray.Dataset>
Dimensions:         (channels: 2, y: 3000, x: 3000, labels: 8, la_props: 2, cells: 12560, features: 3)
Coordinates:
  * cells           (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels        (channels) <U11 'CD4' 'CD8'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _image          (channels, y, x) uint8 1 1 2 0 0 0 0 1 1 ... 1 1 2 1 1 1 2 1
    _la_properties  (labels, la_props) <U20 'C1' ... 'Vascular (CD31+CD34)'
    _obs            (cells, features) float64 7.0 613.3 ... 2.249e+03 2.237e+03
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
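For readers who know plain xarray, channel selection can be pictured as a label-based `.sel` on the `channels` dimension. The following is a minimal sketch on a made-up toy dataset (the array contents and the name `toy` are illustrative only); whether the .pp accessor is implemented this way internally is an assumption.

```python
import numpy as np
import xarray as xr

# Toy stand-in for a spatialproteomics object: 3 channels on a 4x5 pixel grid.
image = np.arange(3 * 4 * 5, dtype=np.uint8).reshape(3, 4, 5)
toy = xr.Dataset(
    data_vars={"_image": (("channels", "y", "x"), image)},
    coords={
        "channels": ["DAPI", "CD4", "CD8"],
        "y": np.arange(4),
        "x": np.arange(5),
    },
)

# Passing a list keeps the channels dimension, even for a single channel.
single = toy.sel(channels=["CD4"])
pair = toy.sel(channels=["CD4", "CD8"])

print(dict(single.sizes))
print(dict(pair.sizes))
```

Selecting with a list rather than a bare string mirrors the outputs above, where the `channels` dimension is retained with length 1 or 2.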
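The .pp accessor also understands x and y coordinates. When x and y coordinates are sliced, all cells that do not belong to the respective image slice are removed. The slice bounds are inclusive: 50:150 yields 101 pixels along each axis. In the example below, no cells fall within the selected window, so the cells dimension is empty.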
[5]:
ds.pp[50:150, 50:150]
[5]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 101, x: 101, labels: 8, la_props: 2, cells: 0, features: 3)
Coordinates:
  * cells           (cells) int64
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
  * y               (y) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
Data variables:
    _image          (channels, y, x) uint8 5 4 5 4 5 4 5 4 4 ... 2 2 2 1 2 2 2 2
    _la_properties  (labels, la_props) <U20 'C1' ... 'Vascular (CD31+CD34)'
    _obs            (cells, features) float64
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
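The two effects of a spatial slice, cropping the image and dropping cells outside the window, can be sketched with plain xarray and NumPy. This is an illustration on toy data, not the library's implementation: the toy dataset, the centroid-based filtering rule, and the reading of `centroid-0` as the row (y) coordinate are all assumptions made for the example.

```python
import numpy as np
import xarray as xr

# Toy data: one channel on a 200x200 grid plus three cells with known centroids.
toy = xr.Dataset(
    data_vars={
        "_image": (("channels", "y", "x"), np.zeros((1, 200, 200), dtype=np.uint8)),
        "_obs": (
            ("cells", "features"),
            np.array([[60.0, 70.0], [149.5, 51.0], [180.0, 20.0]]),
        ),
    },
    coords={
        "channels": ["DAPI"],
        "y": np.arange(200),
        "x": np.arange(200),
        "cells": [1, 2, 3],
        # Assumption: centroid-0 is the row (y), centroid-1 the column (x).
        "features": ["centroid-0", "centroid-1"],
    },
)

x_min, x_max, y_min, y_max = 50, 150, 50, 150

# Label-based slices in xarray include both end points: 50..150 is 101 pixels.
cropped = toy.sel(x=slice(x_min, x_max), y=slice(y_min, y_max))

# Keep only cells whose centroid lies inside the cropped window.
cy = cropped["_obs"].sel(features="centroid-0").values
cx = cropped["_obs"].sel(features="centroid-1").values
inside = (cy >= y_min) & (cy <= y_max) & (cx >= x_min) & (cx <= x_max)
cropped = cropped.isel(cells=np.flatnonzero(inside))

print(dict(cropped.sizes))
print(cropped["cells"].values)
```

In this toy case, two of the three cells have centroids inside the window and are kept; the third is dropped.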
Note that we can also pass channels and x, y coordinates at the same time.
[6]:
ds.pp[['CD4', 'CD8'], 50:150, 50:150]
[6]:
<xarray.Dataset>
Dimensions:         (channels: 2, y: 101, x: 101, labels: 8, la_props: 2, cells: 0, features: 3)
Coordinates:
  * cells           (cells) int64
  * channels        (channels) <U11 'CD4' 'CD8'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
  * y               (y) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
Data variables:
    _image          (channels, y, x) uint8 0 0 0 0 0 0 1 1 0 ... 2 1 1 1 1 1 4 1
    _la_properties  (labels, la_props) <U20 'C1' ... 'Vascular (CD31+CD34)'
    _obs            (cells, features) float64
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
Slicing Labels
The labels accessor .la allows us to select specific cell types by their label number or name.
[7]:
ds.la[4]
[7]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 1, la_props: 2, cells: 1073, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 4
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 3 11 49 71 80 ... 12504 12516 12554 12558
Data variables:
    _image          (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _la_properties  (labels, la_props) <U20 'C2' 'Lymphatic (PDPN)'
    _obs            (cells, features) float64 4.0 774.5 ... 2.266e+03 2.232e+03
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
[8]:
ds.la['T (CD3)']
[8]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 1, la_props: 2, cells: 5891, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 7
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 1 4 8 10 12 ... 12444 12469 12473 12505 12522
Data variables:
    _image          (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _la_properties  (labels, la_props) <U20 'C0' 'T (CD3)'
    _obs            (cells, features) float64 7.0 613.3 ... 2.346e+03 1.865e+03
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
Again, it is possible to pass multiple cell labels.
[9]:
ds.la[4, 5, 6]
[9]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 3, la_props: 2, cells: 3571, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 4 5 6
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 2 3 5 6 9 11 ... 12539 12554 12555 12557 12558
Data variables:
    _image          (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _la_properties  (labels, la_props) <U20 'C2' ... 'Stroma (CD90)'
    _obs            (cells, features) float64 5.0 769.1 ... 2.266e+03 2.232e+03
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
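Conceptually, selecting by label means resolving each key (a label number or a name) to a label id and then keeping the cells that carry one of those ids. The sketch below illustrates this with NumPy on hypothetical toy tables; `label_names`, `cell_ids`, `cell_labels`, and `select_cells` are invented for the example and are not part of the spatialproteomics API.

```python
import numpy as np

# Toy mapping from label id to name, loosely mirroring names seen in the outputs.
label_names = {4: "Lymphatic (PDPN)", 6: "Stroma (CD90)", 7: "T (CD3)"}

# One label id per cell (toy values).
cell_ids = np.array([1, 2, 3, 4, 5, 6])
cell_labels = np.array([7, 6, 4, 7, 6, 4])


def select_cells(keys):
    """Return the ids of cells whose label matches any key (label id or name)."""
    name_to_id = {name: label for label, name in label_names.items()}
    # Strings are looked up by name; integers are taken as label ids directly.
    wanted = [name_to_id[k] if isinstance(k, str) else k for k in keys]
    return cell_ids[np.isin(cell_labels, wanted)]


print(select_cells([4]))          # select by label number
print(select_cells(["T (CD3)"]))  # select by label name
print(select_cells([4, 6]))       # select several labels at once
```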
Finally, we can select all cells except those of a given cell type using la.deselect.
[10]:
ds.la.deselect([1])
[10]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 7, la_props: 2, cells: 10488, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 2 3 4 5 6 7 8
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 1 2 3 4 5 6 ... 10484 10485 10486 10487 10488
Data variables:
    _image          (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _la_properties  (labels, la_props) <U20 '#FFFF00' ... 'Vascular (CD31+CD34)'
    _obs            (cells, features) float64 7.0 613.3 ... 2.266e+03 2.232e+03
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
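Deselection is the complement of selection: every cell is kept unless its label is in the list to drop. A minimal NumPy sketch on toy arrays follows; the names `cell_ids`, `cell_labels`, and `deselect_cells` are hypothetical and only illustrate the idea, not how la.deselect is implemented.

```python
import numpy as np

# Toy data: six cells, three of which carry label 1.
cell_ids = np.array([1, 2, 3, 4, 5, 6])
cell_labels = np.array([1, 7, 1, 4, 7, 1])


def deselect_cells(labels_to_drop):
    """Return the ids of all cells whose label is NOT in labels_to_drop."""
    keep = np.isin(cell_labels, labels_to_drop, invert=True)
    return cell_ids[keep]


remaining = deselect_cells([1])
print(remaining)
```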
Slicing Neighborhoods
We can also select by neighborhoods with the .nh accessor. The syntax is identical to that of label subsetting.
[13]:
ds = xr.load_dataset('../../data/sample_1_with_neighborhoods.zarr', engine='zarr')
ds
[13]:
<xarray.Dataset>
Dimensions:                (cells: 6901, celltype_levels: 3, channels: 56, y: 2000, x: 2000, labels: 9, la_props: 2, neighborhoods: 4, nh_props: 2, features: 18)
Coordinates:
  * cells                  (cells) int64 1 2 3 4 5 ... 6897 6898 6899 6900 6901
  * celltype_levels        (celltype_levels) <U8 'labels' 'labels_1' 'labels_2'
  * channels               (channels) <U11 'DAPI' 'TIM3' ... 'ki-67' 'CD38'
  * features               (features) <U14 'BCL-2' 'BCL-6' ... 'ki-67'
  * la_props               (la_props) <U6 '_color' '_name'
  * labels                 (labels) int64 1 2 3 4 5 6 7 8 9
  * neighborhoods          (neighborhoods) int64 0 1 3 4
  * nh_props               (nh_props) <U6 '_color' '_name'
  * x                      (x) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * y                      (y) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
Data variables:
    _celltype_predictions  (cells, celltype_levels) <U11 'T' 'T' ... 'T_tox'
    _image                 (channels, y, x) uint8 1 1 1 0 1 1 0 ... 3 1 2 3 2 3
    _la_properties         (labels, la_props) <U11 '#e6194B' 'B' ... 'T'
    _nh_properties         (neighborhoods, nh_props) <U14 'lightgreen' ... 'N...
    _obs                   (cells, features) float64 1.0 1.0 0.0 ... 904.5 0.0
    _segmentation          (y, x) int64 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0
[14]:
# subsetting only neighborhood 0
ds.nh[0]
[14]:
<xarray.Dataset>
Dimensions:                (cells: 709, celltype_levels: 3, channels: 56, y: 2000, x: 2000, labels: 9, la_props: 2, neighborhoods: 1, nh_props: 2, features: 18)
Coordinates:
  * celltype_levels        (celltype_levels) <U8 'labels' 'labels_1' 'labels_2'
  * channels               (channels) <U11 'DAPI' 'TIM3' ... 'ki-67' 'CD38'
  * features               (features) <U14 'BCL-2' 'BCL-6' ... 'ki-67'
  * la_props               (la_props) <U6 '_color' '_name'
  * labels                 (labels) int64 1 2 3 4 5 6 7 8 9
  * neighborhoods          (neighborhoods) int64 0
  * nh_props               (nh_props) <U6 '_color' '_name'
  * x                      (x) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * y                      (y) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * cells                  (cells) int64 1571 1858 1895 1912 ... 6204 6247 6293
Data variables:
    _celltype_predictions  (cells, celltype_levels) <U11 'Dendritic' ... 'T'
    _image                 (channels, y, x) uint8 1 1 1 0 1 1 0 ... 3 1 2 3 2 3
    _la_properties         (labels, la_props) <U11 '#e6194B' 'B' ... 'T'
    _nh_properties         (neighborhoods, nh_props) <U14 'lightgreen' 'Neigh...
    _obs                   (cells, features) float64 0.0 1.0 0.0 ... 395.1 0.0
    _segmentation          (y, x) int64 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0