Subselecting Data

[1]:
%reload_ext autoreload
%autoreload 2

import spatialproteomics
import pandas as pd
import xarray as xr
xr.set_options(display_style='text')
[1]:
<xarray.core.options.set_options at 0x7f75c7593e80>

One of the key features of spatialproteomics is the ability to slice our image data quickly and intuitively. We start by loading our spatialproteomics object.

[2]:
ds = xr.load_dataset('../../data/BNHL_166_4_I2_LK_2.zarr', engine='zarr')

Slicing Channels and Spatial Coordinates

To select specific channels of the image, we use the .pp accessor together with the familiar bracket [] indexing.

[3]:
ds.pp['CD4']
[3]:
<xarray.Dataset>
Dimensions:         (channels: 1, y: 3000, x: 3000, labels: 8, la_props: 2,
                     cells: 12560, features: 3)
Coordinates:
  * cells           (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels        (channels) <U11 'CD4'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _image          (channels, y, x) uint8 1 1 2 0 0 0 0 1 1 ... 1 1 0 0 0 0 1 0
    _la_properties  (labels, la_props) <U20 'C1' ... 'Vascular (CD31+CD34)'
    _obs            (cells, features) float64 7.0 613.3 ... 2.249e+03 2.237e+03
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0

We can also select multiple channels by passing a list to the .pp accessor. As we will see later, this makes visualising image overlays easy.

[4]:
ds.pp[['CD4', 'CD8']]
[4]:
<xarray.Dataset>
Dimensions:         (channels: 2, y: 3000, x: 3000, labels: 8, la_props: 2,
                     cells: 12560, features: 3)
Coordinates:
  * cells           (cells) int64 1 2 3 4 5 6 ... 12556 12557 12558 12559 12560
  * channels        (channels) <U11 'CD4' 'CD8'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
Data variables:
    _image          (channels, y, x) uint8 1 1 2 0 0 0 0 1 1 ... 1 1 2 1 1 1 2 1
    _la_properties  (labels, la_props) <U20 'C1' ... 'Vascular (CD31+CD34)'
    _obs            (cells, features) float64 7.0 613.3 ... 2.249e+03 2.237e+03
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0

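The .pp accessor also understands x and y coordinates. When x and y coordinates are sliced, all cells that do not belong to the selected image region are removed. Both slice endpoints are included, so 50:150 returns 101 pixels per axis. In the example below no cells lie inside the region, which is why the cells dimension is empty.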

[5]:
ds.pp[50:150, 50:150]
[5]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 101, x: 101, labels: 8, la_props: 2,
                     cells: 0, features: 3)
Coordinates:
  * cells           (cells) int64
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
  * y               (y) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
Data variables:
    _image          (channels, y, x) uint8 5 4 5 4 5 4 5 4 4 ... 2 2 2 1 2 2 2 2
    _la_properties  (labels, la_props) <U20 'C1' ... 'Vascular (CD31+CD34)'
    _obs            (cells, features) float64
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
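
In the output above, the x and y coordinates run from 50 to 150, which differs from plain Python list slicing, where the stop value is excluded. The following is a minimal pure-Python sketch of that behaviour and of dropping cells outside the window. The helper names `select_inclusive` and `cells_in_window`, the toy centroids, and the rule that a cell is kept when its centroid lies inside the window are assumptions made for illustration; they are not taken from the spatialproteomics source.

```python
def select_inclusive(coords, start, stop):
    """Return coordinate labels with start <= label <= stop (both ends kept)."""
    return [c for c in coords if start <= c <= stop]


def cells_in_window(centroids, x_range, y_range):
    """Keep cell ids whose (y, x) centroid lies inside the window.

    Assumption: membership is decided by the centroid; the library may
    use a different rule.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    return [
        cell_id
        for cell_id, (cy, cx) in centroids.items()
        if y0 <= cy <= y1 and x0 <= cx <= x1
    ]


x_coords = list(range(3000))  # same extent as the image above
window = select_inclusive(x_coords, 50, 150)

# Toy centroids stored as (y, x); these are not values from the dataset.
toy_centroids = {1: (613.3, 768.4), 2: (100.0, 120.5), 3: (149.9, 151.0)}
kept = cells_in_window(toy_centroids, (50, 150), (50, 150))

print(len(window), window[0], window[-1])
print(kept)
```

Only the toy cell whose centroid falls inside both ranges survives, which mirrors how the cells dimension shrinks when the image is cropped.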

Note that we can also pass channels and x, y coordinates at the same time.

[6]:
ds.pp[['CD4', 'CD8'], 50:150, 50:150]
[6]:
<xarray.Dataset>
Dimensions:         (channels: 2, y: 101, x: 101, labels: 8, la_props: 2,
                     cells: 0, features: 3)
Coordinates:
  * cells           (cells) int64
  * channels        (channels) <U11 'CD4' 'CD8'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 1 2 3 4 5 6 7 8
  * x               (x) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
  * y               (y) int64 50 51 52 53 54 55 56 ... 145 146 147 148 149 150
Data variables:
    _image          (channels, y, x) uint8 0 0 0 0 0 0 1 1 0 ... 2 1 1 1 1 1 4 1
    _la_properties  (labels, la_props) <U20 'C1' ... 'Vascular (CD31+CD34)'
    _obs            (cells, features) float64
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
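
The indexing forms shown so far (a channel name, a list of channels, and x/y slices, alone or combined) can all be routed through a single `__getitem__`. The sketch below is a hypothetical parser, `parse_key`, written only to illustrate that idea. It is not the library's implementation; the returned dictionary layout and the ordering of the two slices (x first, then y) are assumptions.

```python
def parse_key(key):
    """Split an indexing key into channels, an x slice, and a y slice.

    Hypothetical helper that mirrors the call forms used above. The slice
    order (x first, y second) is an assumption.
    """
    parts = key if isinstance(key, tuple) else (key,)
    channels, slices = None, []
    for part in parts:
        if isinstance(part, str):
            channels = [part]
        elif isinstance(part, list):
            channels = list(part)
        elif isinstance(part, slice):
            slices.append(part)
        else:
            raise TypeError(f"unsupported key component: {part!r}")
    if len(slices) not in (0, 2):
        raise ValueError("expected either no slices or an x and a y slice")
    x_slice, y_slice = slices if slices else (slice(None), slice(None))
    return {"channels": channels, "x": x_slice, "y": y_slice}


class ToyAccessor:
    """Stand-in for an accessor; it only records how the key was parsed."""

    def __getitem__(self, key):
        return parse_key(key)


pp = ToyAccessor()
print(pp["CD4"])
print(pp[["CD4", "CD8"]])
print(pp[50:150, 50:150])
print(pp[["CD4", "CD8"], 50:150, 50:150])
```

Python hands a comma-separated index to `__getitem__` as a tuple, which is why one method can serve every form.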

Slicing Labels

The label accessor .la lets us select specific cell types by their label number or name.

[7]:
ds.la[4]
[7]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 1, la_props: 2,
                     cells: 1073, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 4
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 3 11 49 71 80 ... 12504 12516 12554 12558
Data variables:
    _image          (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _la_properties  (labels, la_props) <U20 'C2' 'Lymphatic (PDPN)'
    _obs            (cells, features) float64 4.0 774.5 ... 2.266e+03 2.232e+03
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
[8]:
ds.la['T (CD3)']
[8]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 1, la_props: 2,
                     cells: 5891, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 7
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 1 4 8 10 12 ... 12444 12469 12473 12505 12522
Data variables:
    _image          (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _la_properties  (labels, la_props) <U20 'C0' 'T (CD3)'
    _obs            (cells, features) float64 7.0 613.3 ... 2.346e+03 1.865e+03
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0

Again, it is possible to pass multiple cell labels.

[9]:
ds.la[4, 5, 6]
[9]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 3, la_props: 2,
                     cells: 3571, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 4 5 6
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 2 3 5 6 9 11 ... 12539 12554 12555 12557 12558
Data variables:
    _image          (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _la_properties  (labels, la_props) <U20 'C2' ... 'Stroma (CD90)'
    _obs            (cells, features) float64 5.0 769.1 ... 2.266e+03 2.232e+03
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
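
Selecting by number or by name comes down to resolving names to label ids and then keeping the cells that carry one of those ids. The following is a minimal sketch with plain Python containers. The helper names `resolve_labels` and `select_cells` are hypothetical and the cell assignments are toy data; only the label names for ids 4, 6 and 7 are taken from the outputs above.

```python
label_names = {4: "Lymphatic (PDPN)", 6: "Stroma (CD90)", 7: "T (CD3)"}

# Toy assignment of cell id -> label id (not values from the dataset).
cell_labels = {1: 7, 2: 6, 3: 4, 4: 7, 5: 6}


def resolve_labels(keys, names):
    """Map a mix of label ids and label names to a set of label ids."""
    name_to_id = {name: label_id for label_id, name in names.items()}
    resolved = set()
    for key in keys:
        if isinstance(key, str):
            if key not in name_to_id:
                raise KeyError(f"unknown label name: {key!r}")
            resolved.add(name_to_id[key])
        elif key in names:
            resolved.add(key)
        else:
            raise KeyError(f"unknown label id: {key!r}")
    return resolved


def select_cells(cell_to_label, wanted):
    """Return the sorted cell ids whose label is in `wanted`."""
    return sorted(c for c, lab in cell_to_label.items() if lab in wanted)


by_id = select_cells(cell_labels, resolve_labels([4], label_names))
by_name = select_cells(cell_labels, resolve_labels(["T (CD3)"], label_names))
several = select_cells(cell_labels, resolve_labels([4, 6], label_names))

print(by_id)
print(by_name)
print(several)
```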

Finally, we can select all cells except those of a given cell type using .la.deselect.

[10]:
ds.la.deselect([1])
[10]:
<xarray.Dataset>
Dimensions:         (channels: 56, y: 3000, x: 3000, labels: 7, la_props: 2,
                     cells: 10488, features: 3)
Coordinates:
  * channels        (channels) <U11 'DAPI' 'Helios' 'CD10' ... 'CD79a' 'Ki-67'
  * features        (features) <U10 '_labels' 'centroid-0' 'centroid-1'
  * la_props        (la_props) <U6 '_color' '_name'
  * labels          (labels) int64 2 3 4 5 6 7 8
  * x               (x) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * y               (y) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * cells           (cells) int64 1 2 3 4 5 6 ... 10484 10485 10486 10487 10488
Data variables:
    _image          (channels, y, x) uint8 4 4 4 4 5 4 4 3 4 ... 2 2 2 2 2 2 2 2
    _la_properties  (labels, la_props) <U20 '#FFFF00' ... 'Vascular (CD31+CD34)'
    _obs            (cells, features) float64 7.0 613.3 ... 2.266e+03 2.232e+03
    _segmentation   (y, x) int64 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0
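
Deselection is the complement of selection: every cell is kept unless its label is in the excluded list. The short sketch below uses toy data; the function name `deselect_cells` and the cell assignments are illustrative and not library code.

```python
def deselect_cells(cell_to_label, unwanted):
    """Return the sorted cell ids whose label is NOT in `unwanted`."""
    unwanted = set(unwanted)
    return sorted(c for c, lab in cell_to_label.items() if lab not in unwanted)


# Toy assignment of cell id -> label id (not values from the dataset).
toy_cell_labels = {1: 1, 2: 2, 3: 1, 4: 3, 5: 8}

remaining = deselect_cells(toy_cell_labels, [1])
remaining_labels = sorted({toy_cell_labels[c] for c in remaining})

print(remaining)
print(remaining_labels)
```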

Slicing Neighborhoods

We can also select by neighborhood with the .nh accessor. The syntax is the same as for label subsetting.

[13]:
ds = xr.load_dataset('../../data/sample_1_with_neighborhoods.zarr', engine='zarr')
ds
[13]:
<xarray.Dataset>
Dimensions:                (cells: 6901, celltype_levels: 3, channels: 56,
                            y: 2000, x: 2000, labels: 9, la_props: 2,
                            neighborhoods: 4, nh_props: 2, features: 18)
Coordinates:
  * cells                  (cells) int64 1 2 3 4 5 ... 6897 6898 6899 6900 6901
  * celltype_levels        (celltype_levels) <U8 'labels' 'labels_1' 'labels_2'
  * channels               (channels) <U11 'DAPI' 'TIM3' ... 'ki-67' 'CD38'
  * features               (features) <U14 'BCL-2' 'BCL-6' ... 'ki-67'
  * la_props               (la_props) <U6 '_color' '_name'
  * labels                 (labels) int64 1 2 3 4 5 6 7 8 9
  * neighborhoods          (neighborhoods) int64 0 1 3 4
  * nh_props               (nh_props) <U6 '_color' '_name'
  * x                      (x) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * y                      (y) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
Data variables:
    _celltype_predictions  (cells, celltype_levels) <U11 'T' 'T' ... 'T_tox'
    _image                 (channels, y, x) uint8 1 1 1 0 1 1 0 ... 3 1 2 3 2 3
    _la_properties         (labels, la_props) <U11 '#e6194B' 'B' ... 'T'
    _nh_properties         (neighborhoods, nh_props) <U14 'lightgreen' ... 'N...
    _obs                   (cells, features) float64 1.0 1.0 0.0 ... 904.5 0.0
    _segmentation          (y, x) int64 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0
[14]:
# subsetting only neighborhood 0
ds.nh[0]
[14]:
<xarray.Dataset>
Dimensions:                (cells: 709, celltype_levels: 3, channels: 56,
                            y: 2000, x: 2000, labels: 9, la_props: 2,
                            neighborhoods: 1, nh_props: 2, features: 18)
Coordinates:
  * celltype_levels        (celltype_levels) <U8 'labels' 'labels_1' 'labels_2'
  * channels               (channels) <U11 'DAPI' 'TIM3' ... 'ki-67' 'CD38'
  * features               (features) <U14 'BCL-2' 'BCL-6' ... 'ki-67'
  * la_props               (la_props) <U6 '_color' '_name'
  * labels                 (labels) int64 1 2 3 4 5 6 7 8 9
  * neighborhoods          (neighborhoods) int64 0
  * nh_props               (nh_props) <U6 '_color' '_name'
  * x                      (x) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * y                      (y) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * cells                  (cells) int64 1571 1858 1895 1912 ... 6204 6247 6293
Data variables:
    _celltype_predictions  (cells, celltype_levels) <U11 'Dendritic' ... 'T'
    _image                 (channels, y, x) uint8 1 1 1 0 1 1 0 ... 3 1 2 3 2 3
    _la_properties         (labels, la_props) <U11 '#e6194B' 'B' ... 'T'
    _nh_properties         (neighborhoods, nh_props) <U14 'lightgreen' 'Neigh...
    _obs                   (cells, features) float64 0.0 1.0 0.0 ... 395.1 0.0
    _segmentation          (y, x) int64 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0
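
The neighborhoods coordinate of this dataset holds the ids 0, 1, 3 and 4, so an id is not the same as a position along the dimension. The sketch below illustrates the difference in plain Python. It assumes, as the comment in the cell above suggests, that `ds.nh[0]` refers to the neighborhood with id 0; the helper `position_of` is hypothetical.

```python
neighborhood_ids = [0, 1, 3, 4]  # ids as listed in the coordinate above


def position_of(ids, wanted):
    """Return the position of a neighborhood id, or raise if it is absent."""
    try:
        return ids.index(wanted)
    except ValueError:
        raise KeyError(f"neighborhood id {wanted} not present") from None


pos_of_id_3 = position_of(neighborhood_ids, 3)
id_at_pos_3 = neighborhood_ids[3]

print(pos_of_id_3)
print(id_at_pos_3)
```

Looking up id 2 with this helper raises a `KeyError`, since no neighborhood with that id exists in the coordinate.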