Segmentation

Segmentation describes the process of finding cells in an image. This can be done either at the nucleus level or at the whole-cell level. Spatialproteomics provides wrappers for StarDist, Mesmer, and Cellpose.

[1]:
%reload_ext autoreload
%autoreload 2
[2]:
import spatialproteomics
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
[3]:
ds = xr.load_dataset('../../data/segmentation_example.zarr', engine='zarr').pp[1000:1300, 1000:1300].pp.drop_layers(keep='_image')
[4]:
ds
[4]:
<xarray.Dataset>
Dimensions:   (channels: 4, y: 301, x: 301)
Coordinates:
  * channels  (channels) <U11 'DAPI' 'Na/K ATPase' 'CD68' 'CD11c'
  * x         (x) int64 1000 1001 1002 1003 1004 ... 1296 1297 1298 1299 1300
  * y         (y) int64 1000 1001 1002 1003 1004 ... 1296 1297 1298 1299 1300
Data variables:
    _image    (channels, y, x) uint8 86 83 80 80 70 90 75 63 ... 9 5 5 3 5 6 2 4

Whole-Cell Segmentation with Cellpose

Cellpose can segment either nuclei alone or whole cells, using a nuclear and a membrane marker. In this dataset, we have DAPI as a nuclear marker and Na/K ATPase as a membrane marker. After selecting only the relevant markers, we need to tell Cellpose at which position each marker sits: here, channel_settings=[1, 2] indicates that the first selected channel (Na/K ATPase) carries the membrane/cytoplasm signal and the second (DAPI) the nuclear signal. You can refer to the Cellpose documentation for more details.

[5]:
# selecting only the relevant channels and running the segmentation
ds_cellpose = ds.pp[['Na/K ATPase', 'DAPI']].tl.cellpose(channel_settings=[1, 2])
ds_cellpose
Neither TORCH CUDA nor MPS version not installed/working.
>>>> using CPU
>>>> using CPU
>> cyto3 << model set to be used
>>>> loading model /home/meyerben/.cellpose/models/cyto3
/home/meyerben/meyerben/.conda/envs/tmp_env_2/lib/python3.10/site-packages/cellpose/resnet_torch.py:280: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(filename, map_location=torch.device("cpu"))
>>>> model diam_mean =  30.000 (ROIs rescaled to this size during training)
channels set to [1, 2]
~~~ ESTIMATING CELL DIAMETER(S) ~~~
estimated cell diameter(s) in 7.13 sec
>>> diameter(s) =
[ 17.97 ]
~~~ FINDING MASKS ~~~
>>>> TOTAL TIME 12.35 sec
[5]:
<xarray.Dataset>
Dimensions:                 (channels: 2, y: 301, x: 301)
Coordinates:
  * channels                (channels) <U11 'Na/K ATPase' 'DAPI'
  * x                       (x) int64 1000 1001 1002 1003 ... 1298 1299 1300
  * y                       (y) int64 1000 1001 1002 1003 ... 1298 1299 1300
Data variables:
    _image                  (channels, y, x) uint8 28 25 27 32 ... 64 59 48 35
    _cellpose_segmentation  (y, x) uint16 0 1 1 1 1 1 ... 324 324 324 324 324 0

You might have noticed a small problem: we selected only the nuclear and membrane channels from our object, so the other channels are no longer present. To get them back, we can extract the segmentation from ds_cellpose and add it to our original ds object.

[6]:
ds = ds.pp.add_segmentation(ds_cellpose['_cellpose_segmentation'])
ds
[6]:
<xarray.Dataset>
Dimensions:        (channels: 4, y: 301, x: 301, cells: 330, features: 2)
Coordinates:
  * channels       (channels) <U11 'DAPI' 'Na/K ATPase' 'CD68' 'CD11c'
  * x              (x) int64 1000 1001 1002 1003 1004 ... 1297 1298 1299 1300
  * y              (y) int64 1000 1001 1002 1003 1004 ... 1297 1298 1299 1300
  * cells          (cells) int64 1 2 3 4 5 6 7 8 ... 324 325 326 327 328 329 330
  * features       (features) <U10 'centroid-0' 'centroid-1'
Data variables:
    _image         (channels, y, x) uint8 86 83 80 80 70 90 75 ... 5 5 3 5 6 2 4
    _segmentation  (y, x) int64 0 1 1 1 1 1 1 1 ... 324 324 324 324 324 324 0
    _obs           (cells, features) float64 1.005e+03 1.011e+03 ... 1.003e+03
[7]:
# for plotting, we subselect the two markers from before, but you could also show the other markers
_ = ds.pp[['Na/K ATPase', 'DAPI']].pl.colorize(['deepskyblue', 'gold']).pl.show(render_segmentation=True)
../_images/notebooks_Segmentation_9_0.png
[8]:
# dropping the segmentation and obs again, so that we can use the dataset for the next example
ds = ds.pp.drop_layers(keep='_image')

Whole-Cell Segmentation with Mesmer

You can do the same thing with Mesmer. Note, however, that Mesmer expects the channels to be in the order [nuclear, membrane].

[10]:
# here, we add the segmentation directly to ds_mesmer instead of ds
# note that whenever we want to make a segmentation the "default" segmentation of an object, we need to call 'pp.add_segmentation()'
ds_mesmer = ds.pp[['DAPI', 'Na/K ATPase']].tl.mesmer().pp.add_segmentation('_mesmer_segmentation')
# plotting the result
_ = ds_mesmer.pl.colorize(['gold', 'deepskyblue']).pl.show(render_segmentation=True)
2024-09-17 08:03:16.779560: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /g/huber/users/meyerben/.conda/envs/tmp_env_2/lib/python3.10/site-packages/cv2/../../lib64:
2024-09-17 08:03:16.779592: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Checking for cached data
Checking MultiplexSegmentation-9.tar.gz against provided file_hash...
MultiplexSegmentation-9.tar.gz with hash a1dfbce2594f927b9112f23a0a1739e0 already available.
Extracting /home/meyerben/.deepcell/models/MultiplexSegmentation-9.tar.gz
Successfully extracted /home/meyerben/.deepcell/models/MultiplexSegmentation-9.tar.gz into /home/meyerben/.deepcell/models
2024-09-17 08:04:04.578971: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /g/huber/users/meyerben/.conda/envs/tmp_env_2/lib/python3.10/site-packages/cv2/../../lib64:
2024-09-17 08:04:04.579002: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2024-09-17 08:04:04.579038: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (jupyter-meyerben): /proc/driver/nvidia/version does not exist
2024-09-17 08:04:04.619456: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
No training configuration found in save file, so the model was *not* compiled. Compile it manually.
Converting image dtype to float
../_images/notebooks_Segmentation_12_1.png

Nuclear Segmentation with StarDist

Sometimes, you may not have a membrane stain available. In that case, you can perform nuclear segmentation. This is possible with Cellpose (simply call tl.cellpose(channel='DAPI')) as well as with StarDist. The following shows an example of how StarDist performs on such a task.
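
For reference, such a nuclear-only Cellpose run would look roughly like the sketch below (following the call pattern used elsewhere in this notebook, and assuming the result is stored under the default '_cellpose_segmentation' key):

# nuclear-only segmentation with Cellpose (sketch, not executed in this notebook)
ds_cellpose_nuc = ds.tl.cellpose(channel='DAPI')
ds_cellpose_nuc = ds_cellpose_nuc.pp.add_segmentation('_cellpose_segmentation')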

[12]:
# running the segmentation
ds_stardist = ds.tl.stardist(channel='DAPI', key_added='_stardist_segmentation')
Found model '2D_versatile_fluo' for 'StarDist2D'.
Loading network weights from 'weights_best.h5'.
Loading thresholds from 'thresholds.json'.
Using default values: prob_thresh=0.479071, nms_thresh=0.3.
100%|██████████| 144/144 [00:08<00:00, 17.28it/s]

Since it is possible to have multiple segmentations in a spatialproteomics object, we need to tell the package which segmentation to treat as the default one. This is done with pp.add_segmentation().

[13]:
# adding the segmentation as default segmentation
ds_stardist = ds_stardist.pp.add_segmentation('_stardist_segmentation')
[14]:
# plotting the segmentation
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
_ = ds_stardist.pp['DAPI'].pl.colorize('orange').pl.show(ax=ax[0])
_ = ds_stardist.pp['DAPI'].pl.colorize('orange').pl.show(render_segmentation=True, ax=ax[1])
../_images/notebooks_Segmentation_17_0.png

You might have noticed that StarDist tends to segment a lot of small cells. Let’s look at the distribution of cell sizes, and then remove all nuclei under a certain size.

[15]:
# getting a histogram of the areas
_ = plt.hist(ds_stardist.pp.add_observations('area').pp.get_layer_as_df()['area'], bins=100)
Found _obs in image container. Concatenating.
../_images/notebooks_Segmentation_19_1.png
[16]:
# removing cells which have an area below 50 pixels
ds_stardist = ds_stardist.pp.add_observations('area').pp.filter_by_obs('area', lambda x: x > 50)

# plotting the segmentation
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
_ = ds_stardist.pp['DAPI'].pl.colorize('orange').pl.show(ax=ax[0])
_ = ds_stardist.pp['DAPI'].pl.colorize('orange').pl.show(render_segmentation=True, ax=ax[1])
Found _obs in image container. Concatenating.
../_images/notebooks_Segmentation_20_1.png

To try to capture full cells, we can now grow the segmentation masks, e.g. by two pixels in each direction. If two masks collide, they stop growing in that direction.

[17]:
ds_stardist_grown = ds_stardist.pp.grow_cells(2)
Mask growing requires recalculation of the observations. All features other than the centroids will be removed and should be recalculated with pp.add_observations().
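
As the log above notes, growing the masks drops all per-cell features except the centroids. If you need features such as the cell area afterwards, recompute them with the same call used earlier:

# recompute per-cell features on the grown masks
ds_stardist_grown = ds_stardist_grown.pp.add_observations('area')
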
[18]:
# plotting the segmentation
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
_ = ds_stardist.pp['DAPI'].pl.colorize('orange').pl.show(render_segmentation=True, ax=ax[0])
_ = ds_stardist_grown.pp['DAPI'].pl.colorize('orange').pl.show(render_segmentation=True, ax=ax[1])
ax[0].set_title('StarDist segmentation')
ax[1].set_title('Grown masks')
plt.tight_layout()
../_images/notebooks_Segmentation_23_0.png

Segmenting Multiple Channels with Cellpose

Besides predicting nuclei or whole cells with a universal membrane marker, you can also use Cellpose to try to segment other cell types. In the following, we segment three different channels independently and then merge the resulting masks into a single segmentation. Cellpose also exposes a number of hyperparameters that can help the prediction; a sketch of how to tweak them is shown at the end of this section.

[19]:
# segmenting on DAPI, CD11c (dendritic cells) and CD68 (macrophages)
# note that when merging the masks later, the order matters (precedence is determined by the channel position in the subsetting)
ds_cellpose = ds.pp[['CD11c', 'CD68', 'DAPI']].tl.cellpose(key_added='_cellpose_segmentation')
Neither TORCH CUDA nor MPS version not installed/working.
>>>> using CPU
>>>> using CPU
>> cyto3 << model set to be used
>>>> loading model /home/meyerben/.cellpose/models/cyto3
resnet_torch.py (280): You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
>>>> model diam_mean =  30.000 (ROIs rescaled to this size during training)
Performing independent segmentation on all markers. If you want to perform joint segmentation, please set the channel_settings argument appropriately.
channels set to [0, 0]
~~~ ESTIMATING CELL DIAMETER(S) ~~~
estimated cell diameter(s) in 3.65 sec
>>> diameter(s) =
[ 24.81 ]
~~~ FINDING MASKS ~~~
>>>> TOTAL TIME 7.21 sec
channels set to [0, 0]
~~~ ESTIMATING CELL DIAMETER(S) ~~~
estimated cell diameter(s) in 3.25 sec
>>> diameter(s) =
[ 20.71 ]
~~~ FINDING MASKS ~~~
>>>> TOTAL TIME 8.20 sec
channels set to [0, 0]
~~~ ESTIMATING CELL DIAMETER(S) ~~~
estimated cell diameter(s) in 2.69 sec
>>> diameter(s) =
[ 13.06 ]
~~~ FINDING MASKS ~~~
>>>> TOTAL TIME 7.02 sec
[20]:
ds_cellpose
[20]:
<xarray.Dataset>
Dimensions:                 (channels: 3, y: 301, x: 301)
Coordinates:
  * channels                (channels) <U11 'CD11c' 'CD68' 'DAPI'
  * x                       (x) int64 1000 1001 1002 1003 ... 1298 1299 1300
  * y                       (y) int64 1000 1001 1002 1003 ... 1298 1299 1300
Data variables:
    _image                  (channels, y, x) uint8 2 2 2 2 2 ... 74 64 59 48 35
    _cellpose_segmentation  (channels, y, x) uint16 0 0 0 0 0 0 ... 320 0 0 0 0
[21]:
# merging the segmentation masks into a single segmentation
# the threshold value determines how much of a mask needs to overlap for the less important one to be removed
ds_cellpose_merged = ds_cellpose.pp.merge_segmentation('_cellpose_segmentation', labels=['Dendritic Cell', 'Macrophage', 'Lymphocyte'], threshold=0.8)
[22]:
ds_cellpose_merged
[22]:
<xarray.Dataset>
Dimensions:                 (channels: 3, y: 301, x: 301)
Coordinates:
  * channels                (channels) <U11 'CD11c' 'CD68' 'DAPI'
  * x                       (x) int64 1000 1001 1002 1003 ... 1298 1299 1300
  * y                       (y) int64 1000 1001 1002 1003 ... 1298 1299 1300
Data variables:
    _image                  (channels, y, x) uint8 2 2 2 2 2 ... 74 64 59 48 35
    _cellpose_segmentation  (channels, y, x) uint16 0 0 0 0 0 0 ... 320 0 0 0 0
    _merged_segmentation    (y, x) uint16 0 0 0 0 0 25 25 25 ... 0 0 0 0 0 0 0 0
[23]:
# adding the merged segmentation as the main segmentation in the object
ds_cellpose_merged = ds_cellpose_merged.pp.add_segmentation('_merged_segmentation')
Found _obs in image container. Concatenating.
[24]:
ds_cellpose_merged
[24]:
<xarray.Dataset>
Dimensions:                 (labels: 3, props: 2, channels: 3, x: 301, y: 301,
                             cells: 296, features: 3)
Coordinates:
  * labels                  (labels) int64 1 2 3
  * props                   (props) <U6 '_color' '_name'
  * channels                (channels) <U11 'CD11c' 'CD68' 'DAPI'
  * x                       (x) int64 1000 1001 1002 1003 ... 1298 1299 1300
  * y                       (y) int64 1000 1001 1002 1003 ... 1298 1299 1300
  * cells                   (cells) int64 1 2 3 4 5 6 ... 292 293 294 295 296
  * features                (features) <U10 '_labels' 'centroid-0' 'centroid-1'
Data variables:
    _properties             (labels, props) object '#1CE6FF' ... 'Macrophage'
    _image                  (channels, y, x) uint8 2 2 2 2 2 ... 74 64 59 48 35
    _cellpose_segmentation  (channels, y, x) uint16 0 0 0 0 0 0 ... 320 0 0 0 0
    _merged_segmentation    (y, x) uint16 0 0 0 0 0 25 25 25 ... 0 0 0 0 0 0 0 0
    _obs                    (cells, features) float64 1.0 1.04e+03 ... 1.003e+03
    _segmentation           (y, x) int64 0 0 0 0 0 25 25 25 ... 0 0 0 0 0 0 0 0
[25]:
fig, ax = plt.subplots(1, 2, figsize=(16, 8))

# adding colors to match the channel colors
colors = ['orange', 'deepskyblue', 'darkred']
ds_cellpose_merged = ds_cellpose_merged.la.set_label_colors(['Lymphocyte', 'Dendritic Cell', 'Macrophage'], colors)

# merged masks
_ = ds_cellpose_merged.pp[['DAPI', 'CD11c', 'CD68']].pl.colorize(colors=colors).pl.show(render_segmentation=True, ax=ax[0])
_ = ds_cellpose_merged.pl.show(render_image=False, render_segmentation=True, render_labels=True, legend_label=True, ax=ax[1])
../_images/notebooks_Segmentation_31_0.png

Note that this method is far from perfect and does not always pick up all masks. In such cases, you might want to look into fine-tuning Cellpose for your cell type of interest. You can also tweak the Cellpose parameters to try to improve the segmentation.
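
As a starting point, the sketch below runs Cellpose directly with explicit hyperparameters and adds the resulting mask back into the spatialproteomics object. This deliberately bypasses tl.cellpose; the parameter values are only illustrative, and whether pp.add_segmentation accepts a DataArray constructed like this should be checked against the spatialproteomics API (above, we passed it the segmentation layer of another dataset).

import numpy as np
import xarray as xr
from cellpose import models

# run Cellpose directly on the membrane and nuclear channels with explicit hyperparameters
img = ds.pp[['Na/K ATPase', 'DAPI']]['_image'].values  # shape: (2, y, x)
model = models.Cellpose(model_type='cyto3')
masks, flows, styles, diams = model.eval(
    img,
    channel_axis=0,           # the channel dimension is the first axis of img
    channels=[1, 2],          # segment channel 1 (membrane) using channel 2 (DAPI) as the nucleus
    diameter=18,              # fix the expected diameter (the automatic estimate above was ~18 px)
    flow_threshold=0.6,       # more permissive than the default of 0.4, allows more irregular shapes
    cellprob_threshold=-1.0,  # lower than the default of 0.0, keeps dimmer masks
)

# wrap the mask with the image coordinates and register it as the default segmentation
mask_da = xr.DataArray(masks.astype(np.uint16), coords={'y': ds.coords['y'], 'x': ds.coords['x']}, dims=('y', 'x'))
ds_tuned = ds.pp.add_segmentation(mask_da)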