Hi everyone,
I’ve been working with the Allen reference dataset (Whole Cortex & Hippocampus - 10x Genomics (2020) with 10x-SMART-seq taxonomy [2021]) and using MapMyCells (MMC) to classify both my own snRNA-seq samples and, for consistency, the reference data itself.
Just to clarify: everything described here refers only to the reference dataset; no other samples are included in these steps or results.
Before running MMC on the reference data, I performed downsampling with the following filtering steps: I isolated cortical cells, filtered for GABAergic types, and separated SST and PV cells using string pattern matching based on “cell_type_alias_label” and related fields.
This already raised a first question: I ended up with almost twice as many SST-labeled cells as PV (43,681 SST vs. 28,971 PV), which surprised me. Based on the mouse lines and methods used in the original dataset, I wasn’t expecting such a strong SST enrichment. Does anyone have thoughts on this?
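To make the filtering step concrete, a minimal pandas sketch of this kind of selection might look like the following. Only “cell_type_alias_label” is taken from the description above; the other column names (`region_label`, `class_label`), the region list, and the regex patterns are assumptions for illustration and would need to be adapted to the actual metadata table.

```python
import pandas as pd

# Toy stand-in for the reference metadata. Only "cell_type_alias_label" is
# named in the post; every other column name and value here is hypothetical.
meta = pd.DataFrame({
    "sample_name": ["c1", "c2", "c3", "c4", "c5", "c6"],
    "region_label": ["VISp", "VISp", "HIP", "MOp", "SSp", "MOp"],
    "class_label": ["GABAergic", "GABAergic", "GABAergic",
                    "Glutamatergic", "GABAergic", "GABAergic"],
    "cell_type_alias_label": ["100_Sst", "120_Pvalb", "74_Sst",
                              "200_L2/3 IT CTX", "74_Sst", "45_Vip"],
})

# Hypothetical set of region labels treated as "cortical".
CORTICAL_REGIONS = {"VISp", "MOp", "SSp"}


def select_interneurons(table, pattern, regions=CORTICAL_REGIONS):
    """Keep cortical GABAergic cells whose alias label matches `pattern`."""
    is_cortical = table["region_label"].isin(regions)
    is_gaba = table["class_label"].eq("GABAergic")
    matches = table["cell_type_alias_label"].str.contains(
        pattern, regex=True, na=False
    )
    return table[is_cortical & is_gaba & matches]


sst = select_interneurons(meta, r"_Sst\b")
pv = select_interneurons(meta, r"_Pvalb\b")
print(f"SST cells: {len(sst)}, PV cells: {len(pv)}")
```

The exact pattern matters for the resulting counts: a loose pattern such as `Sst` would also pick up any label that merely contains that string, so the SST-to-PV ratio depends partly on how the match is written.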
Continuing with the reference dataset: when running MMC on the SST group, I calculated the mismatch rate as described in the documentation and found only ~2% mismatch at the subclass or supertype level. So far, so good. However, a problem came up when I tried to match the MMC output labels back to the original Allen taxonomy (the one used on the transcriptomics web portal), which I want to do so that I can later apply that structure when comparing to my own samples.
I was specifically trying to compare the annotations of the same cells before and after running MMC, in order to understand how MMC reassigns their identity and how that aligns with the original taxonomy.
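A before/after comparison of this kind can be sketched as a join on cell ID followed by a simple mismatch calculation. This is only an illustration under stated assumptions: the column names `cell_id` and `subclass_name` are modelled on a typical MMC CSV export and should be checked against the actual file, the data are made up, and "mismatch" here is defined as "the assigned subclass name does not contain Sst", which may differ from the definition in the MMC documentation.

```python
import pandas as pd

# Original annotations for the SST subset (toy data).
original = pd.DataFrame({
    "cell_id": ["c1", "c2", "c3", "c4"],
    "cell_type_alias_label": ["100_Sst", "100_Sst", "74_Sst", "74_Sst"],
})

# MMC output for the same cells (toy data; column names are assumptions).
mmc = pd.DataFrame({
    "cell_id": ["c1", "c2", "c3", "c4"],
    "subclass_name": ["053 Sst Gaba", "052 Pvalb Gaba",
                      "053 Sst Gaba", "056 Sst Chodl Gaba"],
})

# validate="one_to_one" raises if a cell ID is duplicated on either side,
# which guards against silently inflating the row count in the join.
merged = original.merge(mmc, on="cell_id", how="inner", validate="one_to_one")


def mismatch_rate(assigned, expected_pattern):
    """Fraction of cells whose assigned label does not match the pattern."""
    is_match = assigned.str.contains(expected_pattern, regex=True, na=False)
    return 1.0 - is_match.mean()


# Note: this pattern also counts "Sst Chodl" as a match, which is a choice.
rate = mismatch_rate(merged["subclass_name"], r"\bSst\b")
print(f"Subclass-level mismatch: {rate:.1%}")
```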
Here’s where things got a bit confusing (again, still working only with the reference dataset after MMC mapping):
Some cells labeled as “100_Sst” in “cell_type_alias_label” are reassigned by MMC to multiple subclass names, including: “053 Sst Gaba”, “056 Sst Chodl Gaba”, “052 Pvalb Gaba”. In more extreme cases, cells labeled “74_Sst” are reassigned to as many as seven different subclass names, including PV and non-cortical types.
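One way to quantify these reassignments is a contingency table of original labels against MMC subclass names. The sketch below uses made-up data, and `subclass_name` is again an assumed column name for the MMC output; it shows how to count how many distinct subclasses each original label is spread across and what fraction of cells each one receives.

```python
import pandas as pd

# Toy merged table: one row per cell, original label plus MMC subclass.
merged = pd.DataFrame({
    "cell_type_alias_label": ["100_Sst", "100_Sst", "100_Sst",
                              "74_Sst", "74_Sst", "74_Sst"],
    "subclass_name": ["053 Sst Gaba", "053 Sst Gaba", "052 Pvalb Gaba",
                      "053 Sst Gaba", "053 Sst Gaba", "056 Sst Chodl Gaba"],
})

# Raw cell counts: rows are original labels, columns are MMC subclasses.
counts = pd.crosstab(merged["cell_type_alias_label"], merged["subclass_name"])

# Same table, normalised so that each row sums to 1.
fractions = pd.crosstab(
    merged["cell_type_alias_label"], merged["subclass_name"], normalize="index"
)

# Number of distinct MMC subclasses each original label is assigned to.
n_subclasses = (counts > 0).sum(axis=1)

# MMC subclass receiving the most cells for each original label.
dominant = counts.idxmax(axis=1)

print(counts)
print(fractions.round(2))
print(n_subclasses)
print(dominant)
```

Looking at the row fractions rather than just the list of assigned names helps separate two situations: a label whose cells mostly go to one subclass with a few stragglers elsewhere, versus a label that is genuinely split across several subclasses.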
I’d appreciate any advice or clarification on whether there’s a recommended approach to align MMC subclass labels with the original Allen taxonomy, or whether these should be treated as distinct classification systems for which direct mapping is not expected.
Thanks in advance! It’s been great to use MMC, and I’d really appreciate any insights that could help me interpret these results more confidently.
Best,
Rebeca