Issue matching cells with their metadata from SEA-AD spatial transcriptomics datasets

I hope this message finds you well. I am reaching out about a potential discrepancy we have observed in the SEA-AD MERFISH spatial transcriptomics datasets. It is not clear to us how to match cells between the H5AD object, which holds the count matrix and cell-level metadata, and the individual tissue-sample files, which list the detected transcripts for each molecule in the MERFISH data.

In the uploaded files of cellpose-based detected transcripts for each tissue sample, such as (AWS S3 Explorer), there is a cell_id column that identifies each cell with an integer (e.g., 1117161400100099968).

However, in the uploaded H5AD file (AWS S3 Explorer), the metadata slot ($obs) contains no column that refers to the cell_id. Instead, there is a column named sample_id whose entries identify cells using nucleotide barcodes (e.g., TGTAAAGCACATTAAC-L8XR_210805_01_H09-1124629228). This is confusing to us, because we expected the cell IDs for the MERFISH data to be numeric (as in the cellpose-detected_transcripts.csv file) rather than barcodes. Is it possible that the uploaded MERFISH metadata file is actually a 10x-based metadata file?
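A minimal sketch of a check that tells the two identifier styles described above apart. The function name `id_style` and the regular expression are illustrative assumptions, not part of any SEA-AD tooling, and the barcode pattern is only inferred from the single example quoted here.

```python
import re

# Assumption: barcode-style IDs begin with a run of A/C/G/T followed by "-".
BARCODE_PREFIX = re.compile(r"^[ACGT]+-")


def id_style(identifier):
    """Classify an identifier as 'integer', 'barcode', or 'other'."""
    # Work on the string form: 19-digit integers are larger than 2**53,
    # so a round trip through float64 could silently change them.
    text = str(identifier).strip()
    if text.isdigit():
        return "integer"
    if BARCODE_PREFIX.match(text):
        return "barcode"
    return "other"


# The two example identifiers quoted in this post.
examples = [
    "1117161400100099968",
    "TGTAAAGCACATTAAC-L8XR_210805_01_H09-1124629228",
]
styles = {value: id_style(value) for value in examples}
print(styles)
```

When reading the transcript CSV with pandas, passing `dtype={"cell_id": str}` to `read_csv` should keep such large IDs intact.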

Can you please let us know if there is a different file or system we should use to link the MERFISH-based cells and transcripts in the uploaded tissue-sample files with the metadata for those same cells in the .h5ad file?

Thanks!

Hi @tson, thanks for your interest in the data.

You’re right: that h5ad file is the single-nucleus data. The MERSCOPE spatial data is currently located here. These locations are confusing, and we’ll be changing them when we update these files soon.

However, our current pipeline for this data doesn’t track the cell IDs from the segmentation results through to the aggregated anndata object. We may be able to re-link these IDs before we update the h5ad file. I’ll post to this thread when that update happens.

Thank you so much for your reply and for sending the MERSCOPE spatial data. I have started to work with the dataset but there are some issues that I am encountering. In particular, I am unable to reproduce the plots showing the spatial coordinates for cells colored by cell type for each tissue sample as shown by the SEA-AD Brain Cell Atlas (Brain Knowledge Platform).

I have attempted to plot spatial coordinates for cells using different variables from $obsm of the H5AD file including: X_spatial_raw, X_spatial_tiled, X_umap, spatial, X_selected_cell_spatial_tiled. I have attached the plots that are returned upon plotting them.

Additionally, I have attached what individual tissue samples of the X_spatial_raw column in $obsm look like.

All of the tissue samples seem to have a triangular shape rather than the rectangular slices that are shown in the SEA-AD Brain Cell Atlas. This is confusing to us, because we expected one of the entries in $obsm to be identical, or at least similar, to what the SEA-AD Brain Cell Atlas shows.

Can you please let us know if there is a different column of information related to spatial positions from the H5AD file that we should be using, that more closely corresponds to the data shown in the SEA-AD Brain Cell Atlas?

Thanks!

Hi @tson,
The rectangular blocks in the SEA-AD web product are the subset of cells that we used for analysis, chosen by selecting a rectangular region spanning pia to white matter in each section. This is encoded in the selected_cells column in .obs, so once you subset to that, your selected_cells_spatial_tiled should look very similar to the web product view.
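
A minimal sketch of that subsetting step, using toy arrays in place of the real data. The helper `subset_selected` is a hypothetical name, and the code assumes the selected_cells flag is boolean or 0/1, which should be checked against the actual file.

```python
import numpy as np


def subset_selected(coords, selected):
    """Return the rows of `coords` whose `selected` flag is truthy."""
    coords = np.asarray(coords, dtype=float)
    # Assumption: the flag is boolean or 0/1, not a string such as "True".
    mask = np.asarray(selected).astype(bool)
    if mask.shape[0] != coords.shape[0]:
        raise ValueError("coords and selected must have one entry per cell")
    return coords[mask]


# Toy stand-ins for an .obsm coordinate array and .obs["selected_cells"].
coords = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [4.0, 4.0]])
selected = np.array([True, False, True, True])

selected_coords = subset_selected(coords, selected)
print(selected_coords.shape)
```

With the real file, the inputs would come from something like `adata = anndata.read_h5ad(path)`, then `adata.obs["selected_cells"]` and one of the spatial entries listed by `adata.obsm.keys()`. The two columns of the result can then be passed to a scatter plot and colored by cell type.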