The preprocessing (pp) accessor

The preprocessing accessor provides several methods to subset and process image data.

class spatialproteomics.pp.preprocessing.PreprocessingAccessor(xarray_obj)

The preprocessing accessor enables fast indexing and preprocessing of the spatialproteomics object.

add_channel(channels: str | list, array: ndarray) Dataset

Adds channel(s) to an existing image container.

Parameters:
  • channels (Union[str, list]) – The name of the channel or a list of channel names to be added.

  • array (np.ndarray) – The numpy array representing the channel(s) to be added.

Returns:

The updated image container with added channel(s).

Return type:

xr.Dataset

add_feature(feature_name: str, feature_values: list | ndarray)

Adds a feature to the image container.

Parameters:
  • feature_name (str) – The name of the feature to be added.

  • feature_values – The values of the feature to be added.

Returns:

The updated image container with the added feature.

Return type:

xr.Dataset

add_layer(array: ndarray, key_added: str = '_mask') Dataset

Adds a layer (such as a mask highlighting artifacts) to the xarray dataset.

Parameters:
  • array (np.ndarray) – The array representing the layer to be added. Can either be 2D or 3D (in this case, the first dimension should be the number of channels).

  • key_added (str, optional) – The name of the added layer in the xarray dataset. Default is ‘_mask’.

Returns:

The updated dataset with the added layer.

Return type:

xr.Dataset

Raises:

AssertionError – If the array is not 2-dimensional or its shape does not match the image shape.

Notes

This method adds a layer to the xarray dataset, where the layer has the same shape as the image field. The array should be a 2-dimensional numpy array representing the segmentation mask or layer to be added. The layer is created as a DataArray with the same coordinates and dimensions as the image field. The name of the added layer in the xarray dataset can be specified using the key_added parameter. The amended xarray dataset is returned after merging the original dataset with the new layer.

add_layer_from_dataframe(df: DataFrame, key_added: str = '_la_layers') Dataset

Adds a dataframe as a layer to the xarray object. This is similar to add_obs_from_dataframe, except that it can be used to add any kind of data to the xarray object. Useful for adding string-based labels or other metadata.

Parameters:
  • df (pd.DataFrame) – A dataframe with the observation values.

  • key_added (str, optional) – The key under which the layer is stored in the xarray object. Default is ‘_la_layers’.

Returns:

The amended image container.

Return type:

xr.Dataset

add_obs_from_dataframe(df: DataFrame) Dataset

Adds an observation table to the image container. Columns of the dataframe have to match the feature coordinates of the image container, and the index of the dataframe has to match the cell coordinates of the image container.

Parameters:

df (pd.DataFrame) – A dataframe with the observation values.

Returns:

The amended image container.

Return type:

xr.Dataset

add_observations(properties: str | list | tuple = ('label', 'centroid'), layer_key: str = '_segmentation', return_xarray: bool = False) Dataset

Adds properties derived from the segmentation mask to the image container.

Parameters:
  • properties (Union[str, list, tuple]) – A list of properties to be added to the image container. See skimage.measure.regionprops_table for a list of available properties.

  • layer_key (str) – The key of the layer that contains the segmentation mask.

  • return_xarray (bool) – If True, the function returns an xarray.DataArray with the properties instead of adding them to the image container. Default is False.

Returns:

The amended image container.

Return type:

xr.Dataset

add_quantification(func: str | Callable = 'intensity_mean', key_added: str = '_intensity', layer_key: str = '_image', return_xarray=False) Dataset

Quantify channel intensities over the segmentation mask.

Parameters:
  • func (Callable or str, optional) – The function used for quantification. Can either be a string to specify a function from skimage.measure.regionprops_table or a custom function. Default is ‘intensity_mean’.

  • key_added (str, optional) – The key under which the quantification data will be stored in the image container. Default is ‘_intensity’.

  • layer_key (str, optional) – The key of the layer to be quantified. Default is ‘_image’.

  • return_xarray (bool, optional) – If True, the function returns an xarray.DataArray with the quantification data instead of adding it to the image container.

Returns:

The updated image container with added quantification data or the quantification data as a separate xarray.DataArray.

Return type:

xr.Dataset or xr.DataArray
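
As a rough illustration of what mean-intensity quantification over a segmentation mask involves, the following NumPy sketch computes the per-cell mean of each channel. It is not the library's implementation: the helper name mean_intensity_per_cell, the (channels, y, x) layout, and the toy arrays are assumptions made for this example.

```python
import numpy as np


def mean_intensity_per_cell(image, segmentation):
    """Hypothetical helper: mean intensity per (cell, channel).

    image: (channels, y, x) array
    segmentation: (y, x) integer labels, where 0 is background
    """
    labels = np.unique(segmentation)
    labels = labels[labels != 0]  # drop the background label
    means = np.zeros((labels.size, image.shape[0]))
    for i, label in enumerate(labels):
        cell_pixels = segmentation == label
        # A boolean mask over the two spatial axes yields (channels, n_pixels)
        means[i] = image[:, cell_pixels].mean(axis=1)
    return labels, means


image = np.array([
    [[1.0, 1.0, 5.0], [1.0, 1.0, 5.0]],  # channel 0
    [[2.0, 4.0, 0.0], [2.0, 4.0, 0.0]],  # channel 1
])
segmentation = np.array([[1, 1, 2], [1, 1, 2]])
labels, means = mean_intensity_per_cell(image, segmentation)
```

Each row of means corresponds to one cell label and each column to one channel, which is the shape one would expect for a cells-by-channels quantification table.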

add_quantification_from_dataframe(df: DataFrame, key_added: str = '_intensity') Dataset

Adds a quantification table to the image container. Columns of the dataframe have to match the channel coordinates of the image container, and the index of the dataframe has to match the cell coordinates of the image container.

Parameters:
  • df (pd.DataFrame) – A dataframe with the quantification values.

  • key_added (str, optional) – The key under which the quantification data will be added to the image container.

Returns:

The amended image container.

Return type:

xr.Dataset

add_segmentation(segmentation: str | ndarray | None = None, reindex: bool = True, keep_labels: bool = True) Dataset

Adds a segmentation mask field to the xarray dataset. This will be stored in the ‘_segmentation’ layer.

Parameters:
  • segmentation (str or np.ndarray) – A segmentation mask, i.e., a np.ndarray with image.shape = (x, y), that indicates the location of each cell, or a layer key.

  • mask_growth (int) – The number of pixels by which the segmentation mask should be grown.

  • reindex (bool) – If True, the segmentation mask is relabeled so that the labels are consecutive integers from 1 to n.

  • keep_labels (bool) – When using cellpose on multiple channels, you may already get some initial celltype annotations from those. If you want to keep those annotations, set this to True. Default is True.

Returns:

The amended xarray.

Return type:

xr.Dataset
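
To make the effect of reindex concrete, here is a minimal NumPy sketch of relabeling a mask so that its labels run from 1 to n. The helper relabel_consecutive is hypothetical and only mirrors the behavior described above; the library may implement this differently.

```python
import numpy as np


def relabel_consecutive(segmentation):
    """Hypothetical helper: map nonzero labels to 1..n, keep 0 as background."""
    labels = np.unique(segmentation)
    labels = labels[labels != 0]
    relabeled = np.zeros_like(segmentation)
    for new_label, old_label in enumerate(labels, start=1):
        relabeled[segmentation == old_label] = new_label
    return relabeled


# Labels 3, 7 and 12 have gaps between them
segmentation = np.array([[0, 7, 7], [3, 0, 12]])
relabeled = relabel_consecutive(segmentation)
```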

apply(func: Callable, key: str = '_image', key_added: str = '_image', **kwargs)

Apply a function to each channel independently.

Parameters:
  • func (Callable) – The function to apply to the layer.

  • key (str) – The key of the layer to apply the function to. Default is ‘_image’.

  • key_added (str) – The key under which the updated layer will be stored. Default is ‘_image’ (i.e., the original image will be overwritten).

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the function.

Returns:

The updated image container with the applied function.

Return type:

xr.Dataset
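
The idea of applying a function to each channel independently can be sketched in plain NumPy as follows. The helper apply_per_channel and the (channels, y, x) layout are assumptions for illustration and do not reproduce the accessor's internals.

```python
import numpy as np


def apply_per_channel(image, func, **kwargs):
    """Hypothetical helper: call func on each 2D channel and restack the results."""
    return np.stack([func(channel, **kwargs) for channel in image])


image = np.array([
    [[0, 10], [20, 30]],  # channel 0
    [[5, 5], [5, 50]],    # channel 1
])
# Subtract each channel's own minimum; the result for one channel
# does not depend on the values of any other channel.
result = apply_per_channel(image, lambda ch: ch - ch.min())
```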

convert_to_8bit(key: str = '_image', key_added: str = '_image')

Convert the image to 8-bit.

Parameters:
  • key (str) – The key of the image layer in the object. Default is ‘_image’.

  • key_added (str) – The key to assign to the 8-bit image in the object. Default is ‘_image’, which overwrites the original image.

Returns:

The object with the image converted to 8-bit.

Return type:

xr.Dataset

downsample(rate: int)

Downsamples the image and segmentation mask in the object by a given rate.

Parameters:

rate (int) – The downsampling rate. Only every rate-th pixel will be kept.

Returns:

The downsampled object containing the updated image and segmentation mask.

Return type:

xr.Dataset

Raises:

AssertionError – If no image layer is found in the object.
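
Keeping only every rate-th pixel corresponds to strided slicing along the spatial axes. The following sketch shows this on toy NumPy arrays, assuming a (channels, y, x) image and a (y, x) segmentation mask; it illustrates the operation rather than the library's code.

```python
import numpy as np

rate = 2
image = np.arange(2 * 4 * 6).reshape(2, 4, 6)   # (channels, y, x)
segmentation = np.arange(4 * 6).reshape(4, 6)   # (y, x)

# Keep every rate-th pixel along both spatial axes
image_small = image[:, ::rate, ::rate]
segmentation_small = segmentation[::rate, ::rate]
```

Because no interpolation is involved, label values in the segmentation mask are preserved exactly, although cells smaller than the stride may disappear.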

drop_layers(layers: str | list | None = None, keep: str | list | None = None, drop_obs: bool = True, suppress_warnings: bool = False) Dataset

Drops layers from the image container. Can either drop all layers specified in layers or drop all layers but the ones specified in keep.

Parameters:
  • layers (Union[str, list]) – The name of the layer or a list of layer names to be dropped.

  • keep (Union[str, list]) – The name of the layer or a list of layer names to be kept.

  • drop_obs (bool) – If True, the observations are removed when the label or neighborhood properties are dropped. Default is True.

  • suppress_warnings (bool) – If True, warnings are suppressed. Default is False.

Returns:

The updated image container with dropped layers.

Return type:

xr.Dataset

filter_by_obs(col: str, func: Callable, segmentation_key: str = '_segmentation')

Filter the object by observations based on a given feature and filtering function.

Parameters:
  • col (str) – The name of the feature to filter by.

  • func (Callable) – A filtering function that takes in the values of the feature and returns a boolean array.

  • segmentation_key (str) – The key of the segmentation mask in the object. Default is Layers.SEGMENTATION.

Returns:

The filtered object with the selected cells and updated segmentation mask.

Return type:

xr.Dataset

Raises:

AssertionError – If the feature does not exist in the object’s observations.

Notes

  • This method filters the object by selecting only the cells that satisfy the filtering condition.

  • It also updates the segmentation mask to remove cells that are not selected and relabels the remaining cells.

Example

To filter the object by the feature “area” and keep only the cells with an area greater than 70px:

    obj = obj.pp.add_observations('area').pp.filter_by_obs('area', lambda x: x > 70)
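
The two steps described in the notes (select cells, then update and relabel the mask) can be sketched on a bare label array. This is a NumPy illustration under the assumption that area is the pixel count per label; the toy threshold of 2 stands in for the 70px used in the example, and none of this is taken from the library's source.

```python
import numpy as np

segmentation = np.array([
    [1, 1, 0, 2],
    [1, 1, 0, 0],
    [3, 3, 3, 0],
])

# Area of each cell = number of pixels carrying its label
labels, areas = np.unique(segmentation[segmentation > 0], return_counts=True)
keep = labels[areas > 2]  # plays the role of the filtering function

# Remove cells that were not selected
filtered = np.where(np.isin(segmentation, keep), segmentation, 0)

# Relabel the remaining cells to 1..n
relabeled = np.zeros_like(filtered)
for new_label, old_label in enumerate(keep, start=1):
    relabeled[filtered == old_label] = new_label
```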

get_bbox(x_slice: slice, y_slice: slice) Dataset

Returns the part of the image container that lies within the bounding box given by x_slice and y_slice.

Parameters:
  • x_slice (slice) – The slice representing the x-coordinates for the bounding box.

  • y_slice (slice) – The slice representing the y-coordinates for the bounding box.

Returns:

The updated image container.

Return type:

xr.Dataset

get_channels(channels: List[str] | str) Dataset

Retrieve the specified channels from the dataset.

Parameters:

channels (Union[List[str], str]) – The channels to retrieve. Can be a single channel name or a list of channel names.

Returns:

The dataset containing the specified channels.

Return type:

xr.Dataset

get_disconnected_cell() int

Returns the first disconnected cell from the segmentation layer.

Returns:

The first disconnected cell from the segmentation layer.

Return type:

np.ndarray

get_layer_as_df(layer: str = '_obs', celltypes_to_str: bool = True, neighborhoods_to_str: bool = True, idx_to_str: bool = False) DataFrame

Returns the specified layer as a pandas DataFrame.

Parameters:
  • layer (str) – The name of the layer to retrieve. Defaults to Layers.OBS.

  • celltypes_to_str (bool) – Whether to convert celltype labels to strings. Defaults to True.

  • neighborhoods_to_str (bool) – Whether to convert neighborhood labels to strings. Defaults to True.

  • idx_to_str (bool) – Whether to convert the index to strings. Defaults to False.

Returns:

The layer data as a DataFrame.

Return type:

pandas.DataFrame

grow_cells(iterations: int = 2, handle_disconnected: str = 'ignore', suppress_warning: bool = False) Dataset

Grows the segmentation masks by expanding the labels in the object.

Parameters:
  • iterations (int) – The number of iterations to grow the segmentation masks. Default is 2.

  • handle_disconnected (str) – The mode to handle disconnected segmentation masks. Options are “ignore”, “remove”, or “fill”. Default is “ignore”.

  • suppress_warning (bool) – Whether to suppress the warning about recalculating the observations. Used internally, default is False.

Raises:

ValueError – If the object does not contain a segmentation mask.

Returns:

The object with the grown segmentation masks and updated observations.

Return type:

xr.Dataset

mask_cells(mask_key: str = '_mask', segmentation_key='_segmentation') Dataset

Mask cells in the segmentation mask.

Parameters:
  • mask_key (str) – The key of the mask to use for masking.

  • segmentation_key (str) – The key of the segmentation mask in the object. Default is Layers.SEGMENTATION.

Returns:

The object with the masked cells in the segmentation mask.

Return type:

xr.Dataset

mask_region(key: str = '_mask', image_key='_image', key_added='_image') Dataset

Mask a region in the image.

Parameters:
  • key (str) – The key of the region to mask.

  • image_key (str) – The key of the image layer in the object. Default is Layers.IMAGE.

  • key_added (str) – The key to assign to the masked image in the object. Default is Layers.IMAGE, which overwrites the original image.

Returns:

The object with the masked region in the image.

Return type:

xr.Dataset

merge_segmentation(layer_key: str, key_added: str = '_merged_segmentation', labels: str | List[str] | None = None, threshold: float = 0.8)

Merges segmentation masks. The masks can come either from a multi-dimensional array stored in the object or from a numpy array. They can be merged with each other or merged into an existing single mask (e.g., a precomputed DAPI segmentation).

Parameters:
  • array (np.ndarray) – The array containing the segmentation masks to be merged. It can be 2D or 3D.

  • from_key (str) – The key of the segmentation mask in the xarray object to be merged.

  • labels (Optional[Union[str, List[str]]]) – Optional. The labels corresponding to each segmentation mask in the array. If provided, the number of labels must match the number of arrays.

  • threshold (float) – Optional. The threshold value for merging cells. Default is 0.8.

  • handle_disconnected (str) – Optional. The method to handle disconnected cells. Default is “relabel”.

  • key_base_segmentation (str) – Optional. The key of the base segmentation mask in the xarray object to merge to.

  • key_added (str) – Optional. The key under which the merged segmentation mask will be stored in the xarray object. Default is “_merged_segmentation”.

Returns:

The xarray object with the merged segmentation mask.

Return type:

xr.Dataset

Raises:
  • AssertionError – If no segmentation mask is found in the xarray object.

  • AssertionError – If the input array is not 2D or 3D.

  • AssertionError – If the input array is not of type int.

  • AssertionError – If the shape of the input array does not match the shape of the segmentation mask.

Notes

  • If the input array is 2D, it will be expanded to 3D.

  • If labels are provided, they need to match the number of arrays.

  • The merging process starts with merging the biggest cells first, then the smaller ones.

  • Disconnected cells in the input are handled based on the specified method.

normalize()

Performs a percentile normalization on each channel using the 3- and 99.8-percentile. Resulting values are in the range of 0 to 1.

Returns:

The image container with the normalized image stored in ‘_plot’.

Return type:

xr.Dataset
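
A percentile normalization of this kind can be sketched in NumPy as below. The helper percentile_normalize is hypothetical; clipping to [0, 1] and the guard against flat channels are assumptions chosen to match the stated output range, not details confirmed by the library.

```python
import numpy as np


def percentile_normalize(image, pmin=3.0, pmax=99.8):
    """Hypothetical helper: per-channel percentile normalization to [0, 1].

    image: (channels, y, x) array
    """
    image = image.astype(float)
    # One lower and one upper percentile per channel
    lo = np.percentile(image, pmin, axis=(1, 2), keepdims=True)
    hi = np.percentile(image, pmax, axis=(1, 2), keepdims=True)
    # Guard against division by zero for channels with constant intensity
    scaled = (image - lo) / np.maximum(hi - lo, 1e-12)
    # Values outside the percentile range are clipped (assumed behavior)
    return np.clip(scaled, 0.0, 1.0)


rng = np.random.default_rng(0)
image = rng.integers(0, 65535, size=(3, 64, 64))
normalized = percentile_normalize(image)
```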

rescale(scale: int)

Rescales the image and segmentation mask in the object by a given scale.

Parameters:

scale (int) – The scale factor by which to rescale the image and segmentation mask.

Returns:

The rescaled object containing the updated image and segmentation mask.

Return type:

xr.Dataset

Raises:
  • AssertionError – If no image layer is found in the object.

  • AssertionError – If no segmentation mask is found in the object.

threshold(quantile: float | list | None = None, intensity: int | list | None = None, key_added: str | None = None, channels: str | list | None = None, shift: bool = True)

Apply thresholding to the image layer of the object. By default, shift is set to true. This means that the threshold value is subtracted from the image, and all negative values are set to 0. If you instead want to set all values below the threshold to 0 while retaining the rest of the image at the original values, set shift to False.

Parameters:
  • quantile (float) – The quantile value used for thresholding. If provided, the pixels below this quantile will be set to 0.

  • intensity (int) – The absolute intensity value used for thresholding. If provided, the pixels below this intensity will be set to 0.

  • key_added (Optional[str]) – The name of the new image layer after thresholding. If not provided, the original image layer will be replaced.

  • channels (Optional[Union[str, list]]) – The channels to apply the thresholding to. If None, the thresholding will be applied to all channels.

  • shift (bool) – If True, the thresholded image will be shifted so that values do not start at an arbitrary value. Default is True.

Returns:

The object with the thresholding applied to the image layer.

Return type:

xr.Dataset

Raises:

ValueError – If both quantile and intensity are None or if both quantile and intensity are provided.
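
The difference between shift=True and shift=False, as described above, can be shown on a single toy channel. This NumPy sketch uses an absolute intensity threshold and is meant only to illustrate the two behaviors, not to reproduce the library's implementation.

```python
import numpy as np

channel = np.array([[10, 40], [60, 100]])
threshold = 50  # absolute intensity; a quantile could be converted with np.quantile

# shift=True: subtract the threshold and set negative values to 0
shifted = np.clip(channel - threshold, 0, None)

# shift=False: zero out values below the threshold, keep the rest unchanged
unshifted = np.where(channel < threshold, 0, channel)
```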

transform_expression_matrix(method: str = 'arcsinh', key: str = '_intensity', key_added: str = '_intensity', cofactor: float = 5.0, min_percentile: float = 1.0, max_percentile: float = 99.0)

Transforms the expression matrix using the specified method.

Parameters:
  • method (str) – The transformation method. Available options are “arcsinh”, “zscore”, “minmax”, “double_zscore”, and “clip”.

  • key (str) – The key of the expression matrix in the object.

  • key_added (str) – The key to assign to the transformed matrix in the object.

  • cofactor (float) – The cofactor to use for the “arcsinh” transformation.

  • min_percentile (float) – The minimum percentile value to use for the “clip” transformation.

  • max_percentile (float) – The maximum percentile value to use for the “clip” transformation.

Returns:

The object with the transformed matrix added.

Return type:

xr.Dataset

Raises:
  • ValueError – If an unknown transformation method is specified.

  • AssertionError – If no expression matrix is found at the specified layer.
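
For orientation, two of the listed transformations can be sketched in NumPy on a toy cells-by-channels matrix. Dividing by the cofactor before taking arcsinh is the convention commonly used in cytometry and is an assumption here, as is computing the z-score per channel; the library's exact formulas may differ.

```python
import numpy as np

# Toy expression matrix: rows are cells, columns are channels
expression = np.array([
    [0.0, 5.0],
    [10.0, 50.0],
    [100.0, 500.0],
])

cofactor = 5.0
# Assumed convention: divide by the cofactor, then take arcsinh
arcsinh_transformed = np.arcsinh(expression / cofactor)

# Per-channel z-score: zero mean and unit standard deviation in each column
zscored = (expression - expression.mean(axis=0)) / expression.std(axis=0)
```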