Interoperability
This notebook shows some ways to import data into and export data from spatialproteomics.
[4]:
%reload_ext autoreload
%autoreload 2
import spatialproteomics as sp
import pandas as pd
import xarray as xr
import os
import shutil
import anndata
xr.set_options(display_style='text')
[4]:
<xarray.core.options.set_options at 0x7f265b472320>
Importing Data
In the example workflow, you have already seen how to read data from a tiff file. If you already have your data in spatialdata format, you can also read it in from there.
[5]:
ds = sp.read_from_spatialdata('../../data/spatialdata_example.zarr', image_key='raccoon')
ds
root_attr: multiscales
root_attr: omero
datasets [{'coordinateTransformations': [{'scale': [1.0, 1.0, 1.0], 'type': 'scale'}], 'path': '0'}]
resolution: 0
- shape ('c', 'y', 'x') = (3, 768, 1024)
- chunks = ['3', '768', '1024']
- dtype = uint8
root_attr: multiscales
root_attr: omero
[5]:
<xarray.Dataset>
Dimensions:        (channels: 3, y: 768, x: 1024, cells: 70, features: 2)
Coordinates:
  * channels       (channels) int64 0 1 2
  * y              (y) int64 0 1 2 3 4 5 6 7 ... 760 761 762 763 764 765 766 767
  * x              (x) int64 0 1 2 3 4 5 6 ... 1018 1019 1020 1021 1022 1023
  * cells          (cells) int64 1 2 3 4 5 6 7 8 9 ... 63 64 65 66 67 68 69 70
  * features       (features) <U10 'centroid-0' 'centroid-1'
Data variables:
    _image         (channels, y, x) uint8 dask.array<chunksize=(3, 768, 1024), meta=np.ndarray>
    _segmentation  (y, x) int64 0 0 0 0 0 0 0 0 0 ... 69 69 69 69 69 69 69 69 69
    _obs           (cells, features) float64 44.79 402.5 46.1 ... 736.5 890.5
Exporting Data
Once you are happy with your analysis, you will likely want to export the results. The easiest way to do this is the zarr format, but csv, anndata, and spatialdata are also supported.
[3]:
# loading a test file which we will export later
# notice how easy it is to load the file from a zarr using xarray
ds = xr.load_dataset('../../data/tmp.zarr')
ds
[3]:
<xarray.Dataset>
Dimensions:                         (cells: 12560, channels: 56, y: 3000, x: 3000, labels: 9, props: 2, features: 7)
Coordinates:
  * cells                           (cells) int64 1 2 3 4 ... 12558 12559 12560
  * channels                        (channels) <U11 'DAPI' 'Helios' ... 'Ki-67'
  * features                        (features) <U14 'CD3_binarized' ... 'cent...
  * labels                          (labels) int64 1 2 3 4 5 6 7 8 9
  * props                           (props) <U6 '_color' '_name'
  * x                               (x) int64 0 1 2 3 4 ... 2996 2997 2998 2999
  * y                               (y) int64 0 1 2 3 4 ... 2996 2997 2998 2999
Data variables:
    _arcsinh_mean                   (cells, channels) float64 3.111 ... 0.4174
    _arcsinh_sum                    (cells, channels) float64 8.346 ... 5.224
    _image                          (channels, y, x) uint8 4 4 4 4 5 ... 2 2 2 2
    _labels                         (labels, props) object '#C8A1A1' 'B' ... 'T'
    _obs                            (cells, features) float64 1.0 ... 2.237e+03
    _percentage_positive_intensity  (cells, channels) float64 1.0 0.0 ... 1.0
    _raw_mean                       (cells, channels) float64 56.02 ... 2.148
    _raw_sum                        (cells, channels) float64 1.053e+04 ... 4...
    _segmentation                   (y, x) int64 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
Exporting to Zarr
This is the easiest file format to work with. It allows you to store and load the xarray objects with a single line of code.
In case there are issues with simply running ds.to_zarr("your_path.zarr"), you might need to cast the dtypes explicitly first. This is a known issue with xarray and will hopefully be fixed soon.
[8]:
# casting object dtypes to unicode (only necessary if ds.to_zarr() does not work out of the box)
for v in list(ds.coords.keys()):
    if ds.coords[v].dtype == object:
        ds.coords[v] = ds.coords[v].astype("unicode")

for v in list(ds.variables.keys()):
    if ds[v].dtype == object:
        ds[v] = ds[v].astype("unicode")
[9]:
zarr_path = "tmp.zarr"

# removing the zarr if it already exists
if os.path.exists(zarr_path):
    shutil.rmtree(zarr_path)

# exporting as zarr
ds.to_zarr(zarr_path)
[9]:
<xarray.backends.zarr.ZarrStore at 0x7ffcc2566440>
Exporting Tables to CSV
Let’s say you want to export some tables as csvs. This can be done with pandas.
[26]:
df = ds.pp.get_layer_as_df("_arcsinh_mean")
df.head()
[26]:
| | DAPI | Helios | CD10 | TCF7/TCF1 | PD-L1 | BCL-6 | FOXP3 | CD69 | Perforin | CD19 | ... | CD68 | CD31 | CD45 | CD3 | Cytokeratin | CD45RO | CD8 | Granyzme B | CD79a | Ki-67 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3.111332 | 0.0 | 1.391040 | 1.532299 | 1.700792 | 0.0 | 0.0 | 0.000000 | 1.026824 | 0.029783 | ... | 0.345229 | 0.000000 | 2.018150 | 2.460342 | 0.595998 | 1.719421 | 0.714288 | 0.428276 | 0.528275 | 0.458260 |
| 2 | 2.804985 | 0.0 | 1.168321 | 0.000000 | 1.395341 | 0.0 | 0.0 | 0.000000 | 0.847262 | 0.002073 | ... | 1.559274 | 0.000000 | 1.294762 | 0.303109 | 0.642876 | 1.328594 | 0.799208 | 2.029083 | 0.426344 | 0.528429 |
| 3 | 3.380220 | 0.0 | 1.733945 | 0.666575 | 2.020150 | 0.0 | 0.0 | 0.066995 | 1.397469 | 0.013636 | ... | 0.822320 | 0.000000 | 1.412199 | 2.153628 | 0.763425 | 2.767838 | 1.036900 | 0.571746 | 0.727335 | 0.497415 |
| 4 | 2.987283 | 0.0 | 1.297533 | 0.607904 | 1.572571 | 0.0 | 0.0 | 0.003597 | 0.960472 | 0.004317 | ... | 0.297740 | 0.000000 | 1.242867 | 2.149749 | 0.583574 | 2.473159 | 0.804046 | 0.425201 | 0.427177 | 0.436378 |
| 5 | 3.120023 | 0.0 | 1.542808 | 0.000000 | 1.928561 | 0.0 | 0.0 | 0.155537 | 1.463069 | 0.010959 | ... | 0.872304 | 0.079369 | 1.005996 | 0.212105 | 0.894870 | 2.299642 | 0.743329 | 0.518868 | 1.011288 | 0.488958 |
5 rows × 56 columns
[28]:
# exporting as csv
df.to_csv("tmp.csv")
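Reading the csv back restores the table, as long as the index column is handled. Here is a minimal round-trip sketch with a toy table (the channel names and values below are illustrative, not taken from the dataset above):

```python
import io

import pandas as pd

# a toy expression table standing in for the exported _arcsinh_mean layer
df = pd.DataFrame(
    {"DAPI": [3.11, 2.80], "CD3": [2.46, 0.30]},
    index=[1, 2],
)
csv_text = df.to_csv()

# index_col=0 restores the cell index that to_csv() wrote as the first column
df_back = pd.read_csv(io.StringIO(csv_text), index_col=0)
```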
Exporting to AnnData
AnnData is the format used by scanpy, which can be useful for plotting and downstream analyses. For this reason, you can export the xarray object as an AnnData object. Note that this object will only store the tabular data, not the image or the segmentation layer.
[43]:
# putting the expression matrix into an anndata object
adata = ds.ext.convert_to_anndata(
    expression_matrix_key="_arcsinh_mean",
    additional_layers={"arcsinh_sum": "_arcsinh_sum", "raw_mean": "_raw_mean", "raw_sum": "_raw_sum"},
    additional_uns={"label_colors": "_labels"},
)
adata
[43]:
AnnData object with n_obs × n_vars = 12560 × 56
obs: 'centroid-0', 'centroid-1', '_labels', '_original_'
uns: 'label_colors'
layers: 'arcsinh_sum', 'raw_mean', 'raw_sum'
[46]:
# writing to disk as hdf5
adata.write('tmp.h5ad')
Exporting to SpatialData
SpatialData is a data format commonly used for spatial omics analysis, combining the power of zarr with AnnData. You can export to this format as well.
[54]:
spatialdata_object = ds.ext.convert_to_spatialdata(expression_matrix_key="_arcsinh_mean")
spatialdata_object
INFO Transposing `data` of type: <class 'dask.array.core.Array'> to ('c', 'y', 'x').
INFO Transposing `data` of type: <class 'dask.array.core.Array'> to ('y', 'x').
[54]:
SpatialData object with:
├── Images
│ └── 'image': SpatialImage[cyx] (56, 3000, 3000)
├── Labels
│ └── 'segmentation': SpatialImage[yx] (3000, 3000)
└── Table
└── AnnData object with n_obs × n_vars = 12560 × 56
obs: 'id', 'region'
uns: 'spatialdata_attrs': AnnData (12560, 56)
with coordinate systems:
▸ 'global', with elements:
image (Images), segmentation (Labels)
[56]:
# storing as zarr file
spatialdata_object.write("tmp.zarr")
root_attr: channels_metadata
root_attr: multiscales
datasets [{'coordinateTransformations': [{'scale': [1.0, 1.0, 1.0], 'type': 'scale'}], 'path': '0'}]
resolution: 0
- shape ('c', 'y', 'x') = (56, 3000, 3000)
- chunks = ['56', '1548 (+ 1452)', '1548 (+ 1452)']
- dtype = uint8
root_attr: image-label
root_attr: multiscales
no parent found for <ome_zarr.reader.Label object at 0x7fffc1abe670>: None
root_attr: image-label
root_attr: multiscales
datasets [{'coordinateTransformations': [{'scale': [1.0, 1.0], 'type': 'scale'}], 'path': '0'}]
resolution: 0
- shape ('y', 'x') = (3000, 3000)
- chunks = ['3000', '3000']
- dtype = int64