Issue matching cells with their metadata from SEA-AD spatial transcriptomics datasets

tson · January 25, 2024, 9:22pm

I hope this message finds you well. I am reaching out about a potential discrepancy we have observed in the SEA-AD MERFISH spatial transcriptomics datasets. It is not clear to us how to match cells between the H5AD object, which summarizes the count matrix and cell-level metadata, and the individual tissue-sample files, which list the detected transcripts for each molecule in the MERFISH data.

Specifically, in the uploaded files with cellpose-based detected transcripts for each tissue sample, such as (AWS S3 Explorer), we noticed that there is a column containing the cell_id for each cell that is identified via an integer (e.g., 1117161400100099968).

However, in the uploaded H5AD file (AWS S3 Explorer), the metadata slot ($obs) contains no column that refers to the cell_id. Instead, there is a column named sample_id whose entries identify cells using nucleotide barcodes (e.g., TGTAAAGCACATTAAC-L8XR_210805_01_H09-1124629228). This is confusing to us, as we expected the cell_ids for the MERFISH data to be in a numeric format (as in the cellpose-detected_transcripts.csv file) and not a barcode format. Is it possible that the uploaded MERFISH metadata file corresponds to a 10x-based metadata file?

Can you please let us know if there is a different file or system we should be using to link the information on MERFISH based individual cells and transcripts from the uploaded tissue sample files with the metadata on these same cells uploaded from the .h5ad file?

Thanks!

berl · February 2, 2024, 5:59pm

Hi @tson thanks for your interest in the data.

You’re right: that h5ad file is the single-nucleus data. The MERSCOPE spatial data is currently located here. These locations are confusing and we’ll be changing them when we update these files soon.

However, our current pipeline for this data doesn’t track the cell IDs from the segmentation results through to the aggregated anndata object. We may be able to re-link these IDs before we update the h5ad file. I’ll post to this thread when that update happens.

tson · March 22, 2024, 8:24pm

Thank you so much for your reply and for sending the MERSCOPE spatial data. I have started to work with the dataset but there are some issues that I am encountering. In particular, I am unable to reproduce the plots showing the spatial coordinates for cells colored by cell type for each tissue sample as shown by the SEA-AD Brain Cell Atlas (Brain Knowledge Platform).

I have attempted to plot spatial coordinates for cells using different variables from $obsm of the H5AD file including: X_spatial_raw, X_spatial_tiled, X_umap, spatial, X_selected_cell_spatial_tiled. I have attached the plots that are returned upon plotting them.

Additionally, I have attached what individual tissue samples of the X_spatial_raw column in $obsm look like.

All of the tissue samples seem to have a triangular shape rather than the rectangular slices that are shown in the SEA-AD Brain Cell Atlas. This is confusing to us, as we expected one of the columns from the $obsm object to be identical, or at least similar, to what the SEA-AD Brain Cell Atlas shows.

Can you please let us know if there is a different column of information related to spatial positions from the H5AD file that we should be using, that more closely corresponds to the data shown in the SEA-AD Brain Cell Atlas?

Thanks!

berl · March 22, 2024, 10:54pm

Hi @tson,
The rectangular blocks in the SEA-AD web product are the subset of cells that we used for analysis, based on selecting a rectangular region spanning pia to white matter in each section. This is encoded in the selected_cells column in .obs, so once you subset to that, your selected_cells_spatial_tiled should look very similar to the web product view.
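
For anyone following along, here is a minimal sketch of that subsetting step. It assumes .obs has a boolean selected_cells column and that the coordinates you want are an (n_cells, 2) array taken from .obsm; the helper name subset_selected and the toy data are made up for illustration and are not part of the SEA-AD files.

```python
import numpy as np
import pandas as pd


def subset_selected(obs: pd.DataFrame, coords: np.ndarray,
                    column: str = "selected_cells"):
    """Keep only the rows of obs (and matching coords) flagged in `column`.

    Hypothetical helper: assumes `column` holds booleans and that coords
    has one row per row of obs, in the same order.
    """
    if column not in obs.columns:
        raise KeyError(f"{column!r} not found in obs")
    if len(obs) != len(coords):
        raise ValueError("obs and coords must have the same number of rows")
    # eq(True) treats missing values as not selected
    mask = obs[column].eq(True).to_numpy()
    return obs.loc[mask], coords[mask]


# Toy stand-in for adata.obs and one spatial array from adata.obsm
obs = pd.DataFrame(
    {"selected_cells": [True, False, True, False]},
    index=["c0", "c1", "c2", "c3"],
)
coords = np.array([[0.0, 0.0], [1.0, 5.0], [2.0, 1.0], [9.0, 9.0]])

sel_obs, sel_xy = subset_selected(obs, coords)
```

With a real AnnData object you would pass adata.obs and the relevant adata.obsm array, then scatter-plot sel_xy[:, 0] against sel_xy[:, 1] colored by cell type.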

rrbutler · April 11, 2025, 10:51pm

Hi @berl, I am also looking to subset on the squares, but in the AWS h5ad file (SEAAD_MTG_MERFISH.2024-12-11.h5ad) the selected_cells column does not appear to be in obs, nor is there an X_selected_cell_spatial_tiled variable in obsm. It looks like several columns in obs, such as Merscope, are mostly NA, suggesting I could filter by !is.na to get the cells that were selected for further analysis (i.e. in the selection squares). Is this correct?

Is there a location where there is some more description of the field information for this file? For instance, I am unclear on the difference between X_spatial_raw and X_spatial_tiled.

Thanks

mfafouti · February 18, 2026, 9:07pm

hey everyone! I ran into the same problem. I figured out that the correct column for obtaining the selected cells in the squares is called Depth from pia and is part of the .obs slot in SEAAD_MTG_MERFISH.2024-12-11.h5ad. Once you filter for rows that don’t have NA values (.notna()) in this column, you get only the cells that are in the rectangle.
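
In case it helps, this is roughly what that filter looks like in pandas. It is only a sketch on a toy table standing in for .obs; the cell names and depth values are made up, and the only thing taken from the real file is the column name Depth from pia.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.obs; values are invented for illustration
obs = pd.DataFrame(
    {"Depth from pia": [120.5, np.nan, 843.0, np.nan]},
    index=["cell_a", "cell_b", "cell_c", "cell_d"],
)

# True for cells that have a depth value, i.e. cells inside the rectangle
mask = obs["Depth from pia"].notna()
selected = obs[mask]

print(f"{int(mask.sum())} of {len(obs)} cells kept")
```

On the real object the same mask can be used to subset the whole AnnData, for example adata[mask.to_numpy()].copy(), which keeps .obs, .obsm and the count matrix aligned.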
