Map cells to their cell type

Hi everyone!

I have downloaded the [WMB-10Xv2] expression matrices and matched them to their corresponding metadata by cell barcode, using the obs table of the h5ad file and the metadata CSV file.

The last column of the metadata table is cluster_alias, and I was presuming that each ID in this column can be matched to one of the ~5200 (updated to 5322) mouse brain clusters defined in [Yao et al.].
So the cells could be matched to their clusters, classes, and subclasses at different levels of granularity, using cluster_alias as the matching ID.
While merging the two tables (cell_meta_data and the ‘cluster_annotation’ sheet of this xlsx file), I noticed that the cluster_alias column does not contain values ranging from 1 to ~5200 but has some values around 14000!
So I’m posting here to ask: how can I map each cell in each of the WMB-10X datasets to its cell type at different levels of granularity?

@jeremyinseattle @tylermo
Thanks in advance

Hey Mostafa,

It looks like the column you want to join on in that file is not cluster_id but cl, which should map onto the values you’re observing in cluster_alias. Unfortunately, I don’t know why those values are so large compared to the number of clusters.
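As a rough sketch of that join, here is a pandas example on toy data. Only cluster_alias, cl, and cluster_id come from the files discussed above; the cell_label, class_label, and subclass_label column names and all the values are made up for illustration, so adjust them to whatever your tables actually contain.

```python
import pandas as pd

# Toy stand-in for the cell metadata CSV (values are made up).
cell_metadata = pd.DataFrame(
    {
        "cell_label": ["cell_a", "cell_b", "cell_c", "cell_d"],
        "cluster_alias": [14001, 14001, 37, 99999],
    }
)

# Toy stand-in for the 'cluster_annotation' sheet.
# class_label / subclass_label are hypothetical column names.
cluster_annotation = pd.DataFrame(
    {
        "cluster_id": [1, 2],
        "cl": [37, 14001],
        "class_label": ["class A", "class B"],
        "subclass_label": ["subclass A1", "subclass B1"],
    }
)

# Join on cluster_alias == cl (not cluster_id).
# validate raises if 'cl' is not unique in the annotation table;
# indicator adds a '_merge' column showing which rows found a match.
annotated = cell_metadata.merge(
    cluster_annotation,
    how="left",
    left_on="cluster_alias",
    right_on="cl",
    validate="many_to_one",
    indicator=True,
)

# Cells whose cluster_alias has no counterpart in 'cl'.
unmatched = annotated.loc[annotated["_merge"] == "left_only", "cell_label"].tolist()

print(annotated[["cell_label", "cluster_alias", "cluster_id", "class_label"]])
print("cells without a cluster annotation:", unmatched)
```

A left join keeps every cell, so any alias that fails to match shows up as a row of missing annotation values instead of silently disappearing.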

We have a few python/jupyter notebook tutorials available that show how to merge the cell and cluster data. You can check out this one: 10x RNA-seq clustering analysis and annotation (CCN20230722) — Allen Brain Cell Atlas - Data Access for more information on the cluster dataset and this one: 10x RNA-seq gene expression data (part 1) — Allen Brain Cell Atlas - Data Access which is the first in a set of two that uses the merged cell and cluster data. Look for cell_extended in the previously linked, “part 1” notebook.

Please reach out if you run into more issues.

Good luck!
Chris

Additionally, there is a tutorial to help get you started using the cache object that those two notebooks use here: Getting started — Allen Brain Cell Atlas - Data Access

The full set of notebooks that you can run yourself can be found in this repo: abc_atlas_access/notebooks at main · AllenInstitute/abc_atlas_access · GitHub

I am actually having a similar issue to the gentleman above. I’ve downloaded the 10Xv3 dataset from
https://allen-brain-cell-atlas.s3-us-west-2.amazonaws.com/expression_matrices/WMB-10Xv3/20230630/WMB-10Xv3-CB-raw.h5ad
(I’ve pulled down and stacked all of the individual h5ads)

I’ve looked around for the relevant metadata on the S3 bucket. There is a metadata file labeled WMB-10x/20230630, which is the same nomenclature given to the expression matrices above, and the cell labels appear to match, as I was able to join the dfs. However, there are two issues:

  1. The “cluster alias” column in this metadata has integers that are present in neither the cl column nor the cluster id column of the cluster annotation sheet.
  2. The metadata CSV has a column labeled “matrix label”, which indicates that it corresponds to WMB-10xV2-HPF. Is the issue caused by the V2 and V3 expression matrices being different? If so, where would I look for metadata describing the V3 matrix?

I would suggest checking out the tutorial links I posted above as well. They should help you navigate the data, specifically by using the Python cache object for consistent versioning.

While the version you found for the metadata matches the version on the h5ad file, there has been an updated release of the metadata. I would recommend using the cache object from the Getting started — Allen Brain Cell Atlas - Data Access notebook to get a consistent release version. That might fix both your issues. Note that the cell/cluster metadata is for the complete set of h5ad files in the WMB data across several assays.
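Since the metadata covers more cells than any single h5ad file, one way to sanity-check a download is to restrict the metadata to the cells in your own matrix and see which matrix labels remain. The sketch below uses toy pandas tables. The cell_label and matrix_label column names and all the values are assumptions for illustration and may differ from the real files.

```python
import pandas as pd

# Toy stand-in for the obs table of the stacked h5ad files,
# indexed by cell label (labels are made up).
obs = pd.DataFrame(
    index=pd.Index(["v3_cell_1", "v3_cell_2", "v3_cell_3"], name="cell_label")
)

# Toy stand-in for the cell metadata, which spans several assays.
cell_metadata = pd.DataFrame(
    {
        "cell_label": ["v2_cell_1", "v3_cell_1", "v3_cell_2"],
        "matrix_label": ["WMB-10Xv2-HPF", "WMB-10Xv3-CB", "WMB-10Xv3-CB"],
        "cluster_alias": [14001, 37, 37],
    }
).set_index("cell_label")

# Keep only the metadata rows for cells present in the expression matrix.
shared = cell_metadata.index.intersection(obs.index)
in_matrix = cell_metadata.loc[shared]

# Cells in the matrix that have no metadata row at all.
missing = obs.index.difference(cell_metadata.index)

# Which matrices the remaining metadata rows claim to come from.
label_counts = in_matrix["matrix_label"].value_counts()

print(in_matrix)
print("cells with no metadata:", list(missing))
print(label_counts)
```

If the first few rows of the full metadata table mention a V2 matrix, that by itself says little; what matters is which labels are left after filtering down to your own cells.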