The preprocessing (pp) accessor
The preprocessing accessor provides several methods to subset and process image data.
- class spatialproteomics.pp.preprocessing.PreprocessingAccessor(xarray_obj)
The image accessor enables fast indexing and preprocessing of the spatialproteomics object.
- add_channel(channels: Union[str, list], array: ndarray) Dataset
Adds channel(s) to an existing image container.
- Parameters
channels (Union[str, list]) – The name of the channel or a list of channel names to be added.
array (np.ndarray) – The numpy array representing the channel(s) to be added.
- Returns
The updated image container with added channel(s).
- Return type
xarray.Dataset
- add_feature(feature_name: str, feature_values: Union[list, ndarray])
Adds a feature to the image container.
- Parameters
feature_name (str) – The name of the feature to be added.
feature_values (Union[list, np.ndarray]) – The values of the feature to be added.
- Returns
The updated image container with the added feature.
- Return type
xr.Dataset
- add_labels(labels: Optional[dict] = None) Dataset
Add labels from a mapping (cell -> label) to the spatialproteomics object.
- Parameters
labels (Union[dict, None], optional) – A dictionary containing cell identifiers as keys and the corresponding labels as values. If None, a default labeling will be added. Default is None.
- Returns
The spatialproteomics object with added labels.
- Return type
xr.Dataset
This method converts the input dictionary into a pandas DataFrame and then adds the labels to the object using the pp.add_labels_from_dataframe method.
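The dictionary-to-dataframe step mentioned above can be sketched with pandas alone. The column names follow the defaults of add_labels_from_dataframe ('cell' and 'label'); the helper name labels_to_dataframe and the example values are hypothetical, and the library's internal conversion may differ.

```python
import pandas as pd


def labels_to_dataframe(labels, cell_col="cell", label_col="label"):
    """Turn a {cell: label} mapping into a two-column dataframe (hypothetical helper)."""
    return pd.DataFrame(
        {cell_col: list(labels.keys()), label_col: list(labels.values())}
    )


# Illustrative mapping of cell identifiers to cell type labels.
labels = {1: "T cell", 2: "B cell", 3: "T cell"}
df = labels_to_dataframe(labels)
```

A dataframe of this shape is what add_labels_from_dataframe expects by default, so it could then be passed on as obj.pp.add_labels_from_dataframe(df).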
- add_labels_from_dataframe(df: Optional[DataFrame] = None, cell_col: str = 'cell', label_col: str = 'label', colors: Optional[list] = None, names: Optional[list] = None) Dataset
Adds labels to the image container.
- Parameters
df (Union[pd.DataFrame, None], optional) – A dataframe with the cell and label information. If None, a default labeling will be applied.
cell_col (str, optional) – The name of the column in the dataframe representing cell coordinates. Default is “cell”.
label_col (str, optional) – The name of the column in the dataframe representing cell labels. Default is “label”.
colors (Union[list, None], optional) – A list of colors corresponding to the cell labels. If None, random colors will be assigned. Default is None.
names (Union[list, None], optional) – A list of names corresponding to the cell labels. If None, default names will be assigned. Default is None.
- Returns
The updated image container with added labels.
- Return type
xr.Dataset
- add_layer(array: ndarray, key_added: str = '_mask') Dataset
Adds a layer (e.g. a segmentation mask) to the xarray dataset.
- Parameters
array (np.ndarray) – The array representing the segmentation mask or layer to be added.
key_added (str, optional) – The name of the added layer in the xarray dataset. Default is ‘_mask’.
- Returns
The amended xarray dataset.
- Return type
xr.Dataset
- Raises
AssertionError – If the array is not 2-dimensional or its shape does not match the image shape.
Notes
This method adds a layer to the xarray dataset, where the layer has the same shape as the image field. The array should be a 2-dimensional numpy array representing the segmentation mask or layer to be added. The layer is created as a DataArray with the same coordinates and dimensions as the image field. The name of the added layer in the xarray dataset can be specified using the key_added parameter. The amended xarray dataset is returned after merging the original dataset with the new layer.
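The two conditions behind the AssertionError (a 2-dimensional array whose shape matches the image) can be illustrated with a small numpy check. This is only a sketch: the helper name check_layer is hypothetical, and the (channels, y, x) layout of the image is an assumption.

```python
import numpy as np


def check_layer(array, image_shape):
    """Validate a candidate layer against the 2D shape of the image (hypothetical helper)."""
    assert array.ndim == 2, "The array to add must be 2-dimensional."
    assert array.shape == tuple(image_shape), "The array shape must match the image shape."
    return array


image = np.zeros((3, 64, 48))                 # assumed layout: (channels, y, x)
mask = np.ones(image.shape[1:], dtype=bool)   # one value per pixel
checked = check_layer(mask, image.shape[1:])
```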
- add_obs_from_dataframe(df: DataFrame) Dataset
Adds an observation table to the image container. Columns of the dataframe have to match the feature coordinates of the image container, and the index of the dataframe has to match the cell coordinates of the image container.
- Parameters
df (pd.DataFrame) – A dataframe with the observation values.
- Returns
The amended image container.
- Return type
xr.Dataset
- add_observations(properties: Union[str, list, tuple] = ('label', 'centroid'), layer_key: str = '_segmentation', return_xarray: bool = False) Dataset
Adds properties derived from the segmentation mask to the image container.
- Parameters
properties (Union[str, list, tuple]) – A list of properties to be added to the image container. See skimage.measure.regionprops_table for a list of available properties.
layer_key (str) – The key of the layer that contains the segmentation mask.
return_xarray (bool) – If true, the function returns an xarray.DataArray with the properties instead of adding them to the image container.
- Returns
The amended image container.
- Return type
xr.Dataset
- add_properties(array: Union[ndarray, list], prop: str = '_labels', return_xarray: bool = False) Dataset
Adds properties to the image container.
- Parameters
array (Union[np.ndarray, list]) – An array or list of properties to be added to the image container.
prop (str, optional) – The name of the property. Default is Features.LABELS.
return_xarray (bool, optional) – If True, the function returns an xarray.DataArray with the properties instead of adding them to the image container.
- Returns
The updated image container with added properties or the properties as a separate xarray.DataArray.
- Return type
xr.Dataset or xr.DataArray
- add_quantification(func: Union[str, Callable] = 'intensity_mean', key_added: str = '_intensity', return_xarray=False) Dataset
Quantify channel intensities over the segmentation mask.
- Parameters
func (Callable or str, optional) – The function used for quantification. Can either be a string to specify a function from skimage.measure.regionprops_table or a custom function. Default is ‘intensity_mean’.
key_added (str, optional) – The key under which the quantification data will be stored in the image container. Default is Layers.INTENSITY.
return_xarray (bool, optional) – If True, the function returns an xarray.DataArray with the quantification data instead of adding it to the image container.
- Returns
The updated image container with added quantification data or the quantification data as a separate xarray.DataArray.
- Return type
xr.Dataset or xr.DataArray
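To make the idea of quantifying channel intensities over a segmentation mask concrete, here is a numpy-only sketch of a mean-intensity quantification. It is not the library's implementation (which, per the description above, relies on skimage.measure.regionprops_table or a custom function). The helper name, the (channels, y, x) layout, and the use of 0 as background are assumptions.

```python
import numpy as np


def mean_intensity_per_cell(image, segmentation):
    """Mean intensity of every channel within every cell.

    image: (channels, y, x) array; segmentation: (y, x) integer labels, 0 = background.
    Returns the sorted cell labels and a (cells, channels) matrix.
    """
    labels = np.unique(segmentation)
    labels = labels[labels != 0]
    flat_seg = segmentation.ravel()
    # Number of pixels per label.
    counts = np.bincount(flat_seg)[labels]
    # Summed intensity per label, divided by the pixel count, for each channel.
    means = np.stack(
        [np.bincount(flat_seg, weights=channel.ravel())[labels] / counts for channel in image],
        axis=1,
    )
    return labels, means


segmentation = np.array([[1, 1, 0],
                         [2, 2, 0]])
image = np.array([[[2.0, 4.0, 9.0],
                   [1.0, 1.0, 9.0]]])
cells, intensities = mean_intensity_per_cell(image, segmentation)
```

The resulting matrix has one row per cell and one column per channel, which is the shape an expression matrix is assumed to have in this sketch.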
- add_quantification_from_dataframe(df: DataFrame, key_added: str = '_intensity') Dataset
Adds a quantification table to the image container. Columns of the dataframe have to match the channel coordinates of the image container, and the index of the dataframe has to match the cell coordinates of the image container.
- Parameters
df (pd.DataFrame) – A dataframe with the quantification values.
key_added (str, optional) – The key under which the quantification data will be added to the image container.
- Returns
The amended image container.
- Return type
xr.Dataset
- add_segmentation(segmentation: Optional[Union[str, ndarray]] = None, reindex: bool = True, keep_labels: bool = True) Dataset
Adds a segmentation mask (_segmentation) field to the xarray dataset.
- Parameters
segmentation (str or np.ndarray) – A segmentation mask, i.e., a np.ndarray with image.shape = (x, y), that indicates the location of each cell, or a layer key.
mask_growth (int) – The number of pixels by which the segmentation mask should be grown. This parameter is not listed in the signature above.
reindex (bool) – If True, the segmentation mask is relabeled so that cells are numbered consecutively from 1 to n.
keep_labels (bool) – When using cellpose on multiple channels, you may already get some initial celltype annotations from those. If you want to keep those annotations, set this to True. Default is True.
- Returns
The amended xarray.
- Return type
xr.Dataset
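The effect of reindex can be sketched as a relabeling that maps arbitrary cell identifiers to consecutive numbers from 1 to n. The helper below is hypothetical and assumes that 0 marks the background and stays unchanged; the library's own relabeling may behave differently in edge cases.

```python
import numpy as np


def relabel_consecutive(segmentation):
    """Relabel a mask so cells are numbered 1..n; background (0) stays 0."""
    labels = np.unique(segmentation)
    labels = labels[labels != 0]
    # Lookup table: old label -> new consecutive label.
    lookup = np.zeros(segmentation.max() + 1, dtype=segmentation.dtype)
    lookup[labels] = np.arange(1, len(labels) + 1)
    return lookup[segmentation]


mask = np.array([[0, 5, 5],
                 [9, 9, 0],
                 [0, 0, 12]])
relabeled = relabel_consecutive(mask)
```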
- apply(func: Callable, key: str = '_image', key_added: str = '_image', **kwargs)
Apply a function to each channel independently.
- Parameters
func (Callable) – The function to apply to the layer.
key (str) – The key of the layer to apply the function to. Default is Layers.IMAGE.
key_added (str) – The key under which the updated layer will be stored. Default is Layers.IMAGE (i.e., the original image will be overwritten).
**kwargs (dict, optional) – Additional keyword arguments to pass to the function.
- Returns
The updated image container with the applied function.
- Return type
xr.Dataset
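Applying a function to each channel independently can be pictured as a loop over the channel axis. The sketch below uses plain numpy; apply_per_channel and subtract_offset are hypothetical names, and the (channels, y, x) layout is an assumption.

```python
import numpy as np


def apply_per_channel(image, func, **kwargs):
    """Apply func to each 2D channel of a (channels, y, x) array and restack the results."""
    return np.stack([func(channel, **kwargs) for channel in image], axis=0)


def subtract_offset(channel, offset=0.0):
    """Example function: subtract a constant and clip negative values to zero."""
    return np.maximum(channel - offset, 0.0)


image = np.arange(24, dtype=float).reshape(2, 3, 4)
result = apply_per_channel(image, subtract_offset, offset=5.0)
```

With the accessor itself, a comparable call would pass the function and its keyword arguments through, e.g. obj.pp.apply(subtract_offset, offset=5.0).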
- convert_to_8bit(key: str = '_image', key_added: str = '_image')
Convert the image to 8-bit.
- Parameters
key (str) – The key of the image layer in the object. Default is Layers.IMAGE.
key_added (str) – The key to assign to the 8-bit image in the object. Default is Layers.IMAGE, which overwrites the original image.
- Returns
The object with the image converted to 8-bit.
- Return type
xr.Dataset
- downsample(rate: int)
Downsamples the image and segmentation mask in the object by a given rate.
- Parameters
rate (int) – The downsampling rate. Only every rate-th pixel will be kept.
- Returns
The downsampled object containing the updated image and segmentation mask.
- Return type
xr.Dataset
- Raises
AssertionError – If no image layer is found in the object.
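Keeping only every rate-th pixel corresponds to strided slicing along the spatial axes. A minimal numpy sketch, assuming a (channels, y, x) image and a (y, x) segmentation mask; the helper name is hypothetical:

```python
import numpy as np


def keep_every_nth_pixel(image, segmentation, rate):
    """Subsample image and mask by taking every rate-th pixel along y and x."""
    return image[:, ::rate, ::rate], segmentation[::rate, ::rate]


image = np.arange(2 * 6 * 8).reshape(2, 6, 8)
segmentation = np.arange(6 * 8).reshape(6, 8)
small_image, small_seg = keep_every_nth_pixel(image, segmentation, rate=2)
```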
- drop_layers(layers: Optional[Union[str, list]] = None, keep: Optional[Union[str, list]] = None) Dataset
Drops layers from the image container. Can either drop all layers specified in layers or drop all layers but the ones specified in keep.
- Parameters
layers (Union[str, list]) – The name of the layer or a list of layer names to be dropped.
keep (Union[str, list]) – The name of the layer or a list of layer names to be kept.
- Returns
The updated image container with dropped layers.
- Return type
xr.Dataset
- filter_by_obs(col: str, func: Callable, segmentation_key: str = '_segmentation')
Filter the object by observations based on a given feature and filtering function.
- Parameters
col (str) – The name of the feature to filter by.
func (Callable) – A filtering function that takes in the values of the feature and returns a boolean array.
segmentation_key (str) – The key of the segmentation mask in the object. Default is Layers.SEGMENTATION.
- Returns
The filtered object with the selected cells and updated segmentation mask.
- Return type
xr.Dataset
- Raises
AssertionError – If the feature does not exist in the object’s observations.
Notes
This method filters the object by selecting only the cells that satisfy the filtering condition.
It also updates the segmentation mask to remove cells that are not selected and relabels the remaining cells.
Example
To filter the object by the feature “area” and keep only the cells with an area greater than 70 pixels:
>>> obj = obj.pp.add_observations('area').pp.filter_by_obs('area', lambda x: x > 70)
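What filtering plus relabeling does to the segmentation mask can be sketched in numpy. The helper below is hypothetical: it computes the area of each cell directly from the mask, keeps the cells for which the filtering function returns True, removes the others, and renumbers the remaining cells consecutively. It assumes that 0 is the background label.

```python
import numpy as np


def filter_cells_by_area(segmentation, func):
    """Keep cells whose area satisfies func, drop the rest, and relabel to 1..n."""
    labels, areas = np.unique(segmentation[segmentation != 0], return_counts=True)
    kept = labels[func(areas)]
    # Lookup table: removed cells map to 0, kept cells to consecutive labels.
    lookup = np.zeros(segmentation.max() + 1, dtype=segmentation.dtype)
    lookup[kept] = np.arange(1, len(kept) + 1)
    return lookup[segmentation]


segmentation = np.array([[1, 1, 1, 0],
                         [2, 0, 3, 3],
                         [0, 0, 3, 3]])
filtered = filter_cells_by_area(segmentation, lambda area: area > 2)
```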
- get_bbox(x_slice: slice, y_slice: slice) Dataset
Returns the part of the image container that lies within the given bounding box.
- Parameters
x_slice (slice) – The slice representing the x-coordinates for the bounding box.
y_slice (slice) – The slice representing the y-coordinates for the bounding box.
- Returns
The updated image container.
- Return type
xarray.Dataset
- get_channels(channels: Union[List[str], str]) Dataset
Retrieve the specified channels from the dataset.
- Parameters
channels (Union[List[str], str]) – The channels to retrieve. Can be a single channel name or a list of channel names.
- Returns
The dataset containing the specified channels.
- Return type
xr.Dataset
- get_disconnected_cell() int
Returns the first disconnected cell from the segmentation layer.
- Returns
The first disconnected cell from the segmentation layer.
- Return type
int
- get_layer_as_df(layer: str = '_obs', celltypes_to_str: bool = True, idx_to_str: bool = False) DataFrame
Returns the specified layer as a pandas DataFrame.
- Parameters
layer (str) – The name of the layer to retrieve. Defaults to Layers.OBS.
celltypes_to_str (bool) – Whether to convert celltype labels to strings. Defaults to True.
idx_to_str (bool) – Whether to convert the index to strings. Defaults to False.
- Returns
The layer data as a DataFrame.
- Return type
pandas.DataFrame
- grow_cells(iterations: int = 2, handle_disconnected: str = 'ignore') Dataset
Grows the segmentation masks by expanding the labels in the object.
- Parameters
iterations (int) – The number of iterations to grow the segmentation masks. Default is 2.
handle_disconnected (str) – The mode to handle disconnected segmentation masks. Options are “ignore”, “remove”, or “fill”. Default is “ignore”.
- Returns
The object with the grown segmentation masks and updated observations.
- Return type
xarray.Dataset
- Raises
ValueError – If the object does not contain a segmentation mask.
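Growing a segmentation mask means letting each labeled region expand into neighbouring background pixels. The numpy sketch below grows every cell by one pixel per iteration using the four direct neighbours; it never overwrites existing labels. This is an illustrative stand-in, not the algorithm used by grow_cells, and the helper name and tie-breaking order are assumptions.

```python
import numpy as np


def grow_labels(segmentation, iterations=2):
    """Grow each labeled region by one pixel per iteration, into background only."""
    grown = segmentation.copy()
    for _ in range(iterations):
        padded = np.pad(grown, 1, mode="constant", constant_values=0)
        # Labels of the four direct neighbours of every pixel.
        neighbours = [
            padded[:-2, 1:-1],  # above
            padded[2:, 1:-1],   # below
            padded[1:-1, :-2],  # left
            padded[1:-1, 2:],   # right
        ]
        candidate = np.zeros_like(grown)
        for neighbour in neighbours:
            # The first labeled neighbour found wins; later ones do not overwrite it.
            candidate = np.where(candidate == 0, neighbour, candidate)
        # Only background pixels take over a neighbouring label.
        grown = np.where(grown == 0, candidate, grown)
    return grown


segmentation = np.zeros((5, 5), dtype=int)
segmentation[2, 2] = 1
grown = grow_labels(segmentation, iterations=1)
```

After one iteration the single-pixel cell covers itself and its four direct neighbours; diagonal pixels are only reached in the next iteration.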
- mask_cells(mask_key: str = '_mask', segmentation_key='_segmentation') Dataset
Mask cells in the segmentation mask.
- Parameters
mask_key (str) – The key of the mask to use for masking.
segmentation_key (str) – The key of the segmentation mask in the object. Default is Layers.SEGMENTATION.
- Returns
The object with the masked cells in the segmentation mask.
- Return type
xr.Dataset
- mask_region(key: str = '_mask', image_key='_image', key_added='_image') Dataset
Mask a region in the image.
- Parameters
key (str) – The key of the region to mask.
image_key (str) – The key of the image layer in the object. Default is Layers.IMAGE.
key_added (str) – The key to assign to the masked image in the object. Default is Layers.IMAGE, which overwrites the original image.
- Returns
The object with the masked region in the image.
- Return type
xr.Dataset
- merge_segmentation(layer_key: str, key_added: str = '_merged_segmentation', labels: Optional[Union[str, List[str]]] = None, threshold: float = 0.8)
Merges segmentation masks. This can be done in two ways: by merging a multi-dimensional array from the object directly, or by adding a numpy array. The masks can be merged on their own or merged into an existing 1D mask (e.g. a precomputed DAPI segmentation).
- Parameters
array (np.ndarray) – The array containing the segmentation masks to be merged. It can be 2D or 3D.
from_key (str) – The key of the segmentation mask in the xarray object to be merged.
labels (Optional[Union[str, List[str]]]) – Optional. The labels corresponding to each segmentation mask in the array. If provided, the number of labels must match the number of arrays.
threshold (float) – Optional. The threshold value for merging cells. Default is 0.8.
handle_disconnected (str) – Optional. The method to handle disconnected cells. Default is “relabel”.
key_base_segmentation (str) – Optional. The key of the base segmentation mask in the xarray object to merge to.
key_added (str) – Optional. The key under which the merged segmentation mask will be stored in the xarray object. Default is “_merged_segmentation”.
- Returns
The xarray object with the merged segmentation mask.
- Return type
xarray.Dataset
- Raises
AssertionError – If no segmentation mask is found in the xarray object.
AssertionError – If the input array is not 2D or 3D.
AssertionError – If the input array is not of type int.
AssertionError – If the shape of the input array does not match the shape of the segmentation mask.
Notes
If the input array is 2D, it will be expanded to 3D.
If labels are provided, they need to match the number of arrays.
The merging process starts with merging the biggest cells first, then the smaller ones.
Disconnected cells in the input are handled based on the specified method.
- normalize()
Performs a percentile normalization on each channel.
- Returns
The image container with the normalized image stored in Layers.PLOT.
- Return type
xr.Dataset
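A percentile normalization rescales each channel so that a low percentile maps to 0 and a high percentile maps to 1, with values outside that range clipped. The sketch below shows the general technique in numpy. The percentile values (3 and 99.8), the small eps term, and the helper name are illustrative assumptions, since the percentiles used by normalize are not stated here.

```python
import numpy as np


def percentile_normalize(image, pmin=3.0, pmax=99.8, eps=1e-20):
    """Scale each channel of a (channels, y, x) array to [0, 1] using percentiles.

    pmin and pmax are illustrative assumptions, not documented library values.
    """
    lo = np.percentile(image, pmin, axis=(1, 2), keepdims=True)
    hi = np.percentile(image, pmax, axis=(1, 2), keepdims=True)
    # eps avoids a division by zero for constant channels.
    return np.clip((image - lo) / (hi - lo + eps), 0.0, 1.0)


rng = np.random.default_rng(0)
image = rng.integers(0, 255, size=(2, 32, 32)).astype(float)
normalized = percentile_normalize(image)
```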
- rescale(scale: int)
Rescales the image and segmentation mask in the object by a given scale.
- Parameters
scale (int) – The scale factor by which to rescale the image and segmentation mask.
- Returns
The rescaled object containing the updated image and segmentation mask.
- Return type
xr.Dataset
- Raises
AssertionError – If no image layer is found in the object.
AssertionError – If no segmentation mask is found in the object.
- threshold(quantile: Optional[float] = None, intensity: Optional[int] = None, key_added: Optional[str] = None)
Apply thresholding to the image layer of the object.
- Parameters
quantile (float, optional) – The quantile value used for thresholding. If provided, the pixels below this quantile will be set to 0.
intensity (int, optional) – The absolute intensity value used for thresholding. If provided, the pixels below this intensity will be set to 0.
key_added (str, optional) – The name of the new image layer after thresholding. If not provided, the original image layer will be replaced.
- Returns
The object with the thresholding applied to the image layer.
- Return type
xr.Dataset
- Raises
ValueError – If both quantile and intensity are None or if both quantile and intensity are provided.
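The thresholding rule, including the requirement that exactly one of quantile and intensity is given, can be sketched in numpy. The helper name is hypothetical, and computing the quantile separately for each channel of a (channels, y, x) array is an assumption about how the cutoff is derived.

```python
import numpy as np


def threshold_image(image, quantile=None, intensity=None):
    """Set pixels below a per-channel quantile or an absolute intensity to 0."""
    if (quantile is None) == (intensity is None):
        raise ValueError("Provide exactly one of quantile or intensity.")
    if quantile is not None:
        # One cutoff per channel (assumption), broadcast over y and x.
        cutoff = np.quantile(image, quantile, axis=(1, 2), keepdims=True)
    else:
        cutoff = intensity
    return np.where(image < cutoff, 0, image)


image = np.array([[[1, 5],
                   [10, 20]]])
thresholded = threshold_image(image, intensity=6)
```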
- transform_expression_matrix(method: str = 'arcsinh', key: str = '_intensity', key_added: str = '_intensity', cofactor: float = 5.0, min_percentile: float = 1.0, max_percentile: float = 99.0)
Transforms the expression matrix using the specified method.
- Parameters
method (str) – The transformation method. Available options are “arcsinh”, “zscore”, “minmax”, “double_zscore”, and “clip”.
key (str) – The key of the expression matrix in the object.
key_added (str) – The key to assign to the transformed matrix in the object.
cofactor (float) – The cofactor to use for the “arcsinh” transformation.
min_percentile (float) – The minimum percentile value to use for the “clip” transformation.
max_percentile (float) – The maximum percentile value to use for the “clip” transformation.
- Returns
The object with the transformed matrix added.
- Return type
xr.Dataset
- Raises
ValueError – If an unknown transformation method is specified.
AssertionError – If no expression matrix is found at the specified layer.
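Two of the listed methods can be illustrated on a small matrix with numpy. The sketch assumes the common convention arcsinh(x / cofactor) for the “arcsinh” method and a column-wise (per channel) z-score for “zscore”; both conventions, the (cells, channels) layout, and the helper names are assumptions rather than confirmed library behavior.

```python
import numpy as np


def arcsinh_transform(matrix, cofactor=5.0):
    """Inverse hyperbolic sine transform after dividing by the cofactor (assumed convention)."""
    return np.arcsinh(matrix / cofactor)


def zscore_transform(matrix):
    """Column-wise z-score: zero mean and unit standard deviation per channel (assumed axis)."""
    return (matrix - matrix.mean(axis=0)) / matrix.std(axis=0)


# Rows are cells, columns are channels (assumed layout).
expression = np.array([[0.0, 10.0],
                       [5.0, 20.0],
                       [10.0, 30.0]])
transformed = arcsinh_transform(expression, cofactor=5.0)
standardized = zscore_transform(expression)
```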