Spatial MERFISH data in the ABC Atlas

I am working with the MERFISH dataset (C57BL6J-638850 imputed gene) from the Allen Brain Cell Atlas and have a question regarding how to correctly link spatial and gene expression data.

I understand that:

  • The imputed expression h5ad file (e.g., MERFISH-C57BL6J-638850-imputed-log2.h5ad) contains gene expression values for ~8,460 genes but does not appear to include spatial coordinates or full cell metadata.

  • The original MERFISH h5ad files (e.g., C57BL6J-638850-log2.h5ad) seem to contain spatial information such as x, y, z coordinates.

However, I am unsure how to correctly connect these datasets.

Specifically, I would like to:

  1. Identify cells belonging to a specific brain region (e.g., olfactory tubercle, OT)

  2. Extract gene expression (e.g., Oprm1) for those cells from the imputed dataset

My main question is:
What is the correct way to map between the imputed expression data and the spatial/metadata information?

In particular:

  • Which identifier should be used to link cells across datasets (cell_label vs abc_sample_id)?

  • Is there an official mapping table (e.g., cell_metadata) that should be used for this purpose?

  • What is the recommended workflow to combine spatial location, cell type annotations, and imputed gene expression?

Any clarification or recommended workflow would be greatly appreciated.

Thank you very much for your help.

Hi @Sora

cell_label is the unique identifier used for cells in the abc_atlas_access datasets. abc_sample_id exists only to link cells in abc_atlas_access back to cells downloaded from the ABC Atlas web app using the cell selection tool tip. That said, both cell_label and abc_sample_id are valid unique identifiers for the same cells, so technically either should work.

I would recommend linking on cell_label.

Generally speaking, cell_label is a unique identifier for all cells across all datasets in abc_atlas_access, so any time cell_label matches between two CSV files, you can be confident that the two CSVs refer to the same cell. That being said, the cell_metadata table under MERFISH-C57BL6J-638850 is probably what you want. The imputed gene dataset is just a new set of gene expression values for the cells already released under MERFISH-C57BL6J-638850, so we did not put out a separate cell_metadata table for it. The idea is that users can use the cell_metadata table associated with MERFISH-C57BL6J-638850.
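To make the linkage concrete, here is a minimal sketch of joining on cell_label. It uses tiny made-up tables in place of the real files. The `region` column, the cell labels, and all values are hypothetical (assumptions for illustration, not the actual contents or column names of cell_metadata). With pandas, the same join would typically be a merge on cell_label.

```python
import csv
import io

# Toy stand-in for a cell_metadata CSV. The "region" column and every value
# here are invented for illustration; check the real table for actual columns.
CELL_METADATA_CSV = """cell_label,x,y,z,cluster_alias,region
cell_a,1.0,2.0,3.0,101,OT
cell_b,1.5,2.5,3.0,102,OT
cell_c,4.0,5.0,6.0,101,CP
"""

# Toy stand-in for imputed Oprm1 values keyed by cell_label, as they might
# look after being pulled out of the imputed expression file.
IMPUTED_OPRM1 = {"cell_a": 0.7, "cell_b": 1.9, "cell_c": 0.2, "cell_d": 3.3}


def expression_for_region(metadata_csv, expression, region):
    """Return {cell_label: expression} for cells whose metadata row is in `region`."""
    reader = csv.DictReader(io.StringIO(metadata_csv))
    selected = {}
    for row in reader:
        label = row["cell_label"]
        # Keep only cells in the region that also have an expression value.
        if row["region"] == region and label in expression:
            selected[label] = expression[label]
    return selected


ot_oprm1 = expression_for_region(CELL_METADATA_CSV, IMPUTED_OPRM1, "OT")
print(ot_oprm1)
```

Cells present in only one of the two tables (like cell_d here) simply drop out of the result, which is the behavior of an inner join on cell_label.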

Regarding cell type annotations, you will see in the cell_metadata table that each cell has been assigned a cluster_alias. This is the link between the cell and our Whole Mouse Brain cell type taxonomy. This Jupyter notebook describes the data model for our cell type taxonomies and how to link them to single-cell metadata tables. The notebook uses some of our 10X data, but the principle is the same: cells are linked to the taxonomy via their assignment to a cluster_alias.
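The cluster_alias link can be sketched in the same spirit. The taxonomy table below is a toy stand-in with invented aliases and placeholder names; the real taxonomy tables and their column names are the ones described in the notebook, not these.

```python
# Toy cell metadata: cell_label -> cluster_alias (values invented).
cell_to_cluster = {"cell_a": 101, "cell_b": 102, "cell_c": 101}

# Toy taxonomy: cluster_alias -> annotation at each level (placeholder names).
taxonomy = {
    101: {"class": "class_A", "subclass": "subclass_A1"},
    102: {"class": "class_B", "subclass": "subclass_B1"},
}


def annotate_cells(cell_to_cluster, taxonomy):
    """Attach taxonomy annotations to each cell via its cluster_alias."""
    annotated = {}
    for cell_label, alias in cell_to_cluster.items():
        annotation = taxonomy.get(alias)
        if annotation is None:
            # Skip cells whose cluster_alias is missing from the taxonomy.
            continue
        annotated[cell_label] = {"cluster_alias": alias, **annotation}
    return annotated


cell_types = annotate_cells(cell_to_cluster, taxonomy)
print(cell_types["cell_a"]["subclass"])
```

Because the result is again keyed by cell_label, it can be combined with spatial coordinates and imputed expression using the same identifier.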

Please let us know if any of this is unclear.

Thank you very much for your detailed and clear explanation. I’ll try this out.