The preprocessing (pp) accessor

The preprocessing accessor provides several methods to subset and process image data.

class spatialproteomics.pp.preprocessing.PreprocessingAccessor(xarray_obj)

The preprocessing accessor enables fast indexing and preprocessing of the spatialproteomics object.

add_channel(channels: Union[str, list], array: ndarray) Dataset

Adds channel(s) to an existing image container.

Parameters
  • channels (Union[str, list]) – The name of the channel or a list of channel names to be added.

  • array (np.ndarray) – The numpy array representing the channel(s) to be added.

Returns

The updated image container with added channel(s).

Return type

xarray.Dataset

add_feature(feature_name: str, feature_values: Union[list, ndarray])

Adds a feature to the image container.

Parameters
  • feature_name (str) – The name of the feature to be added.

  • feature_values (Union[list, np.ndarray]) – The values of the feature to be added.

Returns

The updated image container with the added feature.

Return type

xr.Dataset

add_labels(labels: Optional[dict] = None) Dataset

Adds labels from a mapping (cell -> label) to the spatialproteomics object.

Parameters

labels (Union[dict, None], optional) – A dictionary containing cell labels as keys and corresponding labels as values. If None, a default labeling will be added. Default is None.

Returns

The spatialproteomics object with added labels.

Return type

xr.Dataset

Notes

This method converts the input dictionary into a pandas DataFrame and then adds the labels to the object using the pp.add_labels_from_dataframe method.

add_labels_from_dataframe(df: Optional[DataFrame] = None, cell_col: str = 'cell', label_col: str = 'label', colors: Optional[list] = None, names: Optional[list] = None) Dataset

Adds labels to the image container.

Parameters
  • df (Union[pd.DataFrame, None], optional) – A dataframe with the cell and label information. If None, a default labeling will be applied.

  • cell_col (str, optional) – The name of the column in the dataframe representing cell coordinates. Default is “cell”.

  • label_col (str, optional) – The name of the column in the dataframe representing cell labels. Default is “label”.

  • colors (Union[list, None], optional) – A list of colors corresponding to the cell labels. If None, random colors will be assigned. Default is None.

  • names (Union[list, None], optional) – A list of names corresponding to the cell labels. If None, default names will be assigned. Default is None.

Returns

The updated image container with added labels.

Return type

xr.Dataset

add_layer(array: ndarray, key_added: str = '_mask') Dataset

Parameters
  • array (np.ndarray) – The array representing the segmentation mask or layer to be added.

  • key_added (str, optional) – The name of the added layer in the xarray dataset. Default is ‘_mask’.

Returns

The amended xarray dataset.

Return type

xr.Dataset

Raises

AssertionError – If the array is not 2-dimensional or its shape does not match the image shape.

Notes

This method adds a layer to the xarray dataset, where the layer has the same shape as the image field. The array should be a 2-dimensional numpy array representing the segmentation mask or layer to be added. The layer is created as a DataArray with the same coordinates and dimensions as the image field. The name of the added layer in the xarray dataset can be specified using the key_added parameter. The amended xarray dataset is returned after merging the original dataset with the new layer.
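The shape requirement described in the notes can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the helper `check_layer` is hypothetical, and the (channels, y, x) image layout is an assumption made for illustration.

```python
import numpy as np


def check_layer(image: np.ndarray, layer: np.ndarray) -> np.ndarray:
    """Hypothetical helper: validate a 2D layer against a (channels, y, x) image."""
    assert layer.ndim == 2, "The layer must be 2-dimensional."
    assert layer.shape == image.shape[-2:], "The layer shape must match the image shape."
    return layer


image = np.zeros((3, 4, 5))          # 3 channels, 4 x 5 pixels (assumed layout)
mask = np.ones((4, 5), dtype=bool)   # valid: same spatial shape as the image
checked = check_layer(image, mask)

# A transposed mask has the wrong spatial shape and is rejected.
try:
    check_layer(image, np.ones((5, 4), dtype=bool))
    mismatch_rejected = False
except AssertionError:
    mismatch_rejected = True
```

A mask that passes this kind of check could then be handed to add_layer, for example as `obj.pp.add_layer(mask)`, where `obj` stands for an existing spatialproteomics object.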

add_obs_from_dataframe(df: DataFrame) Dataset

Adds an observation table to the image container. Columns of the dataframe have to match the feature coordinates of the image container, and the index of the dataframe has to match the cell coordinates of the image container.

Parameters

df (pd.DataFrame) – A dataframe with the observation values.

Returns

The amended image container.

Return type

xr.Dataset

add_observations(properties: Union[str, list, tuple] = ('label', 'centroid'), layer_key: str = '_segmentation', return_xarray: bool = False) Dataset

Adds properties derived from the segmentation mask to the image container.

Parameters
  • properties (Union[str, list, tuple]) – A list of properties to be added to the image container. See skimage.measure.regionprops_table for a list of available properties.

  • layer_key (str) – The key of the layer that contains the segmentation mask.

  • return_xarray (bool) – If true, the function returns an xarray.DataArray with the properties instead of adding them to the image container.

Returns

The amended image container.

Return type

xr.Dataset

add_properties(array: Union[ndarray, list], prop: str = '_labels', return_xarray: bool = False) Dataset

Adds properties to the image container.

Parameters
  • array (Union[np.ndarray, list]) – An array or list of properties to be added to the image container.

  • prop (str, optional) – The name of the property. Default is Features.LABELS.

  • return_xarray (bool, optional) – If True, the function returns an xarray.DataArray with the properties instead of adding them to the image container.

Returns

The updated image container with added properties or the properties as a separate xarray.DataArray.

Return type

xr.Dataset or xr.DataArray

add_quantification(func: Union[str, Callable] = 'intensity_mean', key_added: str = '_intensity', return_xarray=False) Dataset

Quantify channel intensities over the segmentation mask.

Parameters
  • func (Callable or str, optional) – The function used for quantification. Can either be a string to specify a function from skimage.measure.regionprops_table or a custom function. Default is ‘intensity_mean’.

  • key_added (str, optional) – The key under which the quantification data will be stored in the image container. Default is Layers.INTENSITY.

  • return_xarray (bool, optional) – If True, the function returns an xarray.DataArray with the quantification data instead of adding it to the image container.

Returns

The updated image container with added quantification data or the quantification data as a separate xarray.DataArray.

Return type

xr.Dataset or xr.DataArray
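To make the idea of quantifying channel intensities over a segmentation mask concrete, the following sketch computes the mean intensity per cell for a single channel with NumPy. It is an illustration under stated assumptions (toy data, background encoded as 0), not the code used by add_quantification, which relies on skimage.measure.regionprops_table according to the parameter description above.

```python
import numpy as np

# Toy data: one channel and a segmentation mask with two cells (0 = background).
channel = np.array([[1.0, 1.0, 5.0, 5.0],
                    [1.0, 1.0, 5.0, 7.0],
                    [0.0, 0.0, 0.0, 0.0]])
segmentation = np.array([[1, 1, 2, 2],
                         [1, 1, 2, 2],
                         [0, 0, 0, 0]])

labels = segmentation.ravel()
sums = np.bincount(labels, weights=channel.ravel())   # summed intensity per label
counts = np.bincount(labels)                          # pixel count per label

# Skip index 0 (background) and divide per-cell sums by per-cell pixel counts.
mean_intensity = sums[1:] / counts[1:]
```

Repeating this for every channel yields a cells x channels matrix, which is the kind of table the quantification layer holds.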

add_quantification_from_dataframe(df: DataFrame, key_added: str = '_intensity') Dataset

Adds a quantification table to the image container. Columns of the dataframe have to match the channel coordinates of the image container, and the index of the dataframe has to match the cell coordinates of the image container.

Parameters
  • df (pd.DataFrame) – A dataframe with the quantification values.

  • key_added (str, optional) – The key under which the quantification data will be added to the image container.

Returns

The amended image container.

Return type

xr.Dataset

add_segmentation(segmentation: Optional[Union[str, ndarray]] = None, reindex: bool = True, keep_labels: bool = True) Dataset

Adds a segmentation mask (_segmentation) field to the xarray dataset.

Parameters
  • segmentation (str or np.ndarray) – A segmentation mask, i.e., a np.ndarray with image.shape = (x, y), that indicates the location of each cell, or a layer key.

  • mask_growth (int) – The number of pixels by which the segmentation mask should be grown.

  • reindex (bool) – If True, the segmentation mask is relabeled so that the cells carry consecutive numbers from 1 to n.

  • keep_labels (bool) – When using cellpose on multiple channels, you may already get some initial celltype annotations from those. If you want to keep those annotations, set this to True. Default is True.

Returns

The amended xarray.

Return type

xr.Dataset
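The effect of reindex can be sketched with NumPy: arbitrary cell labels are mapped to consecutive integers while the background stays 0. This is one possible way to relabel a mask, shown under the assumption that 0 is present and marks the background. It is not necessarily how add_segmentation does it internally.

```python
import numpy as np

segmentation = np.array([[0, 3, 3],
                         [7, 7, 0],
                         [0, 12, 12]])

# np.unique returns the sorted labels and, for each pixel, the index of its label.
# Because 0 (background) is the smallest label, it keeps index 0.
unique_labels, inverse = np.unique(segmentation, return_inverse=True)
relabeled = inverse.reshape(segmentation.shape)
```

Here the labels 3, 7 and 12 become 1, 2 and 3.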

apply(func: Callable, key: str = '_image', key_added: str = '_image', **kwargs)

Apply a function to each channel independently.

Parameters
  • func (Callable) – The function to apply to the layer.

  • key (str) – The key of the layer to apply the function to. Default is Layers.IMAGE.

  • key_added (str) – The key under which the updated layer will be stored. Default is Layers.IMAGE (i.e., the original image will be overwritten).

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the function.

Returns

The updated image container with the applied function.

Return type

xr.Dataset

convert_to_8bit(key: str = '_image', key_added: str = '_image')

Convert the image to 8-bit.

Parameters
  • key (str) – The key of the image layer in the object. Default is Layers.IMAGE.

  • key_added (str) – The key to assign to the 8-bit image in the object. Default is Layers.IMAGE, which overwrites the original image.

Returns

The object with the image converted to 8-bit.

Return type

xr.Dataset

downsample(rate: int)

Downsamples the image and segmentation mask in the object by a given rate.

Parameters

rate (int) – The downsampling rate. Only every rate-th pixel will be kept.

Returns

The downsampled object containing the updated image and segmentation mask.

Return type

xr.Dataset

Raises

AssertionError – If no image layer is found in the object.
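Keeping only every rate-th pixel corresponds to strided slicing. The sketch below shows this on plain NumPy arrays. The (channels, y, x) layout is an assumption, and the library operates on an xarray object rather than on bare arrays.

```python
import numpy as np

rate = 2
image = np.arange(2 * 4 * 6).reshape(2, 4, 6)   # (channels, y, x), assumed layout
segmentation = np.arange(4 * 6).reshape(4, 6)

# Keep every rate-th pixel along both spatial axes.
image_small = image[:, ::rate, ::rate]
segmentation_small = segmentation[::rate, ::rate]
```

Because no interpolation is involved, the label values in the segmentation mask are preserved, although small cells may disappear entirely.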

drop_layers(layers: Optional[Union[str, list]] = None, keep: Optional[Union[str, list]] = None) Dataset

Drops layers from the image container. Can either drop all layers specified in layers or drop all layers but the ones specified in keep.

Parameters
  • layers (Union[str, list]) – The name of the layer or a list of layer names to be dropped.

  • keep (Union[str, list]) – The name of the layer or a list of layer names to be kept.

Returns

The updated image container with dropped layers.

Return type

xr.Dataset
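The two modes (drop the listed layers, or drop everything except the listed layers) can be sketched in plain Python. The helper `layers_to_drop` is hypothetical and only resolves layer names. Raising an error when both or neither argument is given is an assumption, not documented behavior.

```python
def layers_to_drop(available, layers=None, keep=None):
    """Hypothetical helper: resolve which layer names would be dropped."""
    # Assumption: exactly one of the two arguments must be given.
    if (layers is None) == (keep is None):
        raise ValueError("Specify exactly one of layers or keep.")
    if layers is not None:
        layers = [layers] if isinstance(layers, str) else list(layers)
        return [name for name in available if name in layers]
    keep = [keep] if isinstance(keep, str) else list(keep)
    return [name for name in available if name not in keep]


available = ["_image", "_segmentation", "_obs"]
dropped_by_name = layers_to_drop(available, layers="_obs")
dropped_by_keep = layers_to_drop(available, keep=["_image"])
```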

filter_by_obs(col: str, func: Callable, segmentation_key: str = '_segmentation')

Filter the object by observations based on a given feature and filtering function.

Parameters
  • col (str) – The name of the feature to filter by.

  • func (Callable) – A filtering function that takes in the values of the feature and returns a boolean array.

  • segmentation_key (str) – The key of the segmentation mask in the object. Default is Layers.SEGMENTATION.

Returns

The filtered object with the selected cells and updated segmentation mask.

Return type

xr.Dataset

Raises

AssertionError – If the feature does not exist in the object’s observations.

Notes

  • This method filters the object by selecting only the cells that satisfy the filtering condition.

  • It also updates the segmentation mask to remove cells that are not selected and relabels the remaining cells.

Example

To filter the object by the feature “area” and keep only the cells with an area greater than 70px:

>>> obj = obj.pp.add_observations('area').pp.filter_by_obs('area', lambda x: x > 70)
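What the filtering and relabeling described in the notes amount to can be sketched with NumPy on a toy mask. This is an illustration, not the library's implementation: the areas are computed by hand here, and the area cutoff of 2 pixels is chosen only to suit the tiny example.

```python
import numpy as np

segmentation = np.array([[1, 1, 0, 2],
                         [1, 1, 0, 0],
                         [3, 3, 3, 0]])

# Collect the cell IDs (excluding background 0) and their areas in pixels.
cell_ids = np.unique(segmentation)
cell_ids = cell_ids[cell_ids != 0]
areas = np.array([(segmentation == c).sum() for c in cell_ids])   # [4, 1, 3]

# The filtering function receives the feature values and returns a boolean array.
func = lambda x: x > 2
kept_ids = cell_ids[func(areas)]                                  # [1, 3]

# Remove unselected cells and relabel the remaining ones from 1 to n.
filtered = np.zeros_like(segmentation)
for new_id, old_id in enumerate(kept_ids, start=1):
    filtered[segmentation == old_id] = new_id
```

Cell 2 is removed because it covers a single pixel, and cell 3 is relabeled to 2.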

get_bbox(x_slice: slice, y_slice: slice) Dataset

Returns the part of the image container that lies within the bounding box defined by x_slice and y_slice.

Parameters
  • x_slice (slice) – The slice representing the x-coordinates for the bounding box.

  • y_slice (slice) – The slice representing the y-coordinates for the bounding box.

Returns

The updated image container.

Return type

xarray.Dataset

get_channels(channels: Union[List[str], str]) Dataset

Retrieve the specified channels from the dataset.

Parameters

channels (Union[List[str], str]) – The channels to retrieve. Can be a single channel name or a list of channel names.

Returns

The dataset containing the specified channels.

Return type

xr.Dataset

get_disconnected_cell() int

Returns the first disconnected cell from the segmentation layer.

Returns

The first disconnected cell from the segmentation layer.

Return type

ndarray

get_layer_as_df(layer: str = '_obs', celltypes_to_str: bool = True, idx_to_str: bool = False) DataFrame

Returns the specified layer as a pandas DataFrame.

Parameters
  • layer (str) – The name of the layer to retrieve. Defaults to Layers.OBS.

  • celltypes_to_str (bool) – Whether to convert celltype labels to strings. Defaults to True.

  • idx_to_str (bool) – Whether to convert the index to strings. Defaults to False.

Returns

The layer data as a DataFrame.

Return type

pandas.DataFrame

grow_cells(iterations: int = 2, handle_disconnected: str = 'ignore') Dataset

Grows the segmentation masks by expanding the labels in the object.

Parameters
  • iterations (int) – The number of iterations to grow the segmentation masks. Default is 2.

  • handle_disconnected (str) – The mode to handle disconnected segmentation masks. Options are “ignore”, “remove”, or “fill”. Default is “ignore”.

Returns

The object with the grown segmentation masks and updated observations.

Return type

xarray.Dataset

Raises

ValueError – If the object does not contain a segmentation mask.

mask_cells(mask_key: str = '_mask', segmentation_key='_segmentation') Dataset

Mask cells in the segmentation mask.

Parameters
  • mask_key (str) – The key of the mask to use for masking.

  • segmentation_key (str) – The key of the segmentation mask in the object. Default is Layers.SEGMENTATION.

Returns

The object with the masked cells in the segmentation mask.

Return type

xr.Dataset

mask_region(key: str = '_mask', image_key='_image', key_added='_image') Dataset

Mask a region in the image.

Parameters
  • key (str) – The key of the region to mask.

  • image_key (str) – The key of the image layer in the object. Default is Layers.IMAGE.

  • key_added (str) – The key to assign to the masked image in the object. Default is Layers.IMAGE, which overwrites the original image.

Returns

The object with the masked region in the image.

Return type

xr.Dataset

merge_segmentation(layer_key: str, key_added: str = '_merged_segmentation', labels: Optional[Union[str, List[str]]] = None, threshold: float = 0.8)

Merge segmentation masks. This can be done in two ways: either by merging a multi-dimensional array from the object directly, or by adding a numpy array. You can either just merge a multi-dimensional array, or merge to an existing 1D mask (e. g. a precomputed DAPI segmentation).

Parameters
  • array (np.ndarray) – The array containing the segmentation masks to be merged. It can be 2D or 3D.

  • from_key (str) – The key of the segmentation mask in the xarray object to be merged.

  • labels (Optional[Union[str, List[str]]]) – Optional. The labels corresponding to each segmentation mask in the array. If provided, the number of labels must match the number of arrays.

  • threshold (float) – Optional. The threshold value for merging cells. Default is 0.8.

  • handle_disconnected (str) – Optional. The method to handle disconnected cells. Default is “relabel”.

  • key_base_segmentation (str) – Optional. The key of the base segmentation mask in the xarray object to merge to.

  • key_added (str) – Optional. The key under which the merged segmentation mask will be stored in the xarray object. Default is “_merged_segmentation”.

Returns

The xarray object with the merged segmentation mask.

Return type

xarray.Dataset

Raises
  • AssertionError – If no segmentation mask is found in the xarray object.

  • AssertionError – If the input array is not 2D or 3D.

  • AssertionError – If the input array is not of type int.

  • AssertionError – If the shape of the input array does not match the shape of the segmentation mask.

Notes

  • If the input array is 2D, it will be expanded to 3D.

  • If labels are provided, they need to match the number of arrays.

  • The merging process starts with merging the biggest cells first, then the smaller ones.

  • Disconnected cells in the input are handled based on the specified method.

normalize()

Performs a percentile normalization on each channel.

Returns

The image container with the normalized image stored in Layers.PLOT.

Return type

xr.Dataset
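A per-channel percentile normalization can be sketched as follows. The helper `percentile_normalize` and its percentile values (1 and 99) are hypothetical choices for illustration. The percentiles that normalize() actually uses are not stated here, and the (channels, y, x) layout is assumed.

```python
import numpy as np


def percentile_normalize(image, pmin=1.0, pmax=99.0):
    """Rescale each channel so that its pmin/pmax percentiles map to 0 and 1."""
    lo = np.percentile(image, pmin, axis=(1, 2), keepdims=True)
    hi = np.percentile(image, pmax, axis=(1, 2), keepdims=True)
    scaled = (image - lo) / (hi - lo)   # assumes hi > lo for every channel
    return np.clip(scaled, 0.0, 1.0)    # values outside the percentiles saturate


rng = np.random.default_rng(0)
# Two channels with very different intensity ranges.
image = rng.random((2, 32, 32)) * np.array([10.0, 1000.0]).reshape(2, 1, 1)
normalized = percentile_normalize(image)
```

After normalization both channels span the same range from 0 to 1, which makes them comparable when plotted together.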

rescale(scale: int)

Rescales the image and segmentation mask in the object by a given scale.

Parameters

scale (int) – The scale factor by which to rescale the image and segmentation mask.

Returns

The rescaled object containing the updated image and segmentation mask.

Return type

xr.Dataset

Raises
  • AssertionError – If no image layer is found in the object.

  • AssertionError – If no segmentation mask is found in the object.

threshold(quantile: Optional[float] = None, intensity: Optional[int] = None, key_added: Optional[str] = None)

Apply thresholding to the image layer of the object.

Parameters
  • quantile (float, optional) – The quantile value used for thresholding. If provided, the pixels below this quantile will be set to 0.

  • intensity (int, optional) – The absolute intensity value used for thresholding. If provided, the pixels below this intensity will be set to 0.

  • key_added (str, optional) – The name of the new image layer after thresholding. If not provided, the original image layer will be replaced.

Returns

The object with the thresholding applied to the image layer.

Return type

xr.Dataset

Raises

ValueError – If both quantile and intensity are None or if both quantile and intensity are provided.
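The thresholding rule (pixels below the cutoff are set to 0, and exactly one of quantile or intensity must be given) can be sketched for a single channel with NumPy. The helper `threshold_channel` is hypothetical and follows only what the description above states. Details such as how the library treats multiple channels are not covered.

```python
import numpy as np


def threshold_channel(channel, quantile=None, intensity=None):
    """Hypothetical helper: zero out pixels below a quantile or an absolute intensity."""
    # Exactly one of the two arguments must be provided.
    if (quantile is None) == (intensity is None):
        raise ValueError("Provide exactly one of quantile or intensity.")
    cutoff = np.quantile(channel, quantile) if quantile is not None else intensity
    return np.where(channel < cutoff, 0, channel)


channel = np.array([[1, 2, 3],
                    [4, 5, 6]])
by_intensity = threshold_channel(channel, intensity=4)
by_quantile = threshold_channel(channel, quantile=0.5)   # median of 1..6 is 3.5

# Passing neither argument is rejected.
try:
    threshold_channel(channel)
    missing_args_rejected = False
except ValueError:
    missing_args_rejected = True
```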

transform_expression_matrix(method: str = 'arcsinh', key: str = '_intensity', key_added: str = '_intensity', cofactor: float = 5.0, min_percentile: float = 1.0, max_percentile: float = 99.0)

Transforms the expression matrix using the specified method.

Parameters
  • method (str) – The transformation method. Available options are “arcsinh”, “zscore”, “minmax”, “double_zscore”, and “clip”.

  • key (str) – The key of the expression matrix in the object.

  • key_added (str) – The key to assign to the transformed matrix in the object.

  • cofactor (float) – The cofactor to use for the “arcsinh” transformation.

  • min_percentile (float) – The minimum percentile value to use for the “clip” transformation.

  • max_percentile (float) – The maximum percentile value to use for the “clip” transformation.

Returns

The object with the transformed matrix added.

Return type

xr.Dataset

Raises
  • ValueError – If an unknown transformation mode is specified.

  • AssertionError – If no expression matrix is found at the specified layer.
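Two of the listed transformations can be sketched with NumPy on a small cells x channels matrix. The formulas below are the commonly used forms (arcsinh of the value divided by the cofactor, and per-channel min-max scaling). That the library applies exactly these formulas, or scales along this axis, is an assumption.

```python
import numpy as np

# Rows are cells, columns are channels (assumed orientation).
expression = np.array([[0.0, 5.0, 50.0],
                       [10.0, 0.0, 5.0]])

# "arcsinh": divide by the cofactor, then apply the inverse hyperbolic sine.
cofactor = 5.0
arcsinh = np.arcsinh(expression / cofactor)

# "minmax": rescale every channel (column) to the range [0, 1].
col_min = expression.min(axis=0)
col_max = expression.max(axis=0)
minmax = (expression - col_min) / (col_max - col_min)
```

With the method signature documented above, the corresponding call would look like `obj.pp.transform_expression_matrix(method="arcsinh", cofactor=5.0)`, where `obj` stands for an object that already contains an expression matrix.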