Cell typing different based on clustering

scaradonna · April 9, 2025, 9:10pm

I have been working with some spatial sequencing data and have analyzed it at two clustering resolutions: based on the UMAP analysis I get either 16 clusters of cells or, at the higher resolution, 22 clusters. When I run MapMyCells on either version of the data, I get a nicely diverse cell population with the 16 cluster data, but with the 22 cluster data almost every cluster (all but 2) is predicted to be 01 IT-ET Glut. I am curious where the error could be in my data.

danielsf · April 10, 2025, 2:54am

Before suggesting anything more technical, may I ask when these two datasets were run through MapMyCells (I am asking for the specific dates of both runs; I just want to make sure that they didn’t hit different versions during an update, etc.)?

Edit: There’s actually an easier way to answer this question. The first ~ 5 lines of the CSV file you got from MapMyCells will list the version of MapMyCells that was run. The same information should be in the 'metadata' field in the JSON file.
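For anyone who wants to check this from a script, here is a minimal sketch. It assumes the header lines at the top of the CSV are prefixed with `#` and that the JSON file has a top-level 'metadata' entry; the helper names `read_csv_header_comments` and `read_json_metadata` are made up for this example and are not part of MapMyCells.

```python
import json


def read_csv_header_comments(csv_path, max_lines=10):
    """Return the leading '#' comment lines of a MapMyCells CSV file.

    Assumption: the version information lives in comment lines that
    start with '#', before the column header row.
    """
    header = []
    with open(csv_path, "r", encoding="utf-8") as src:
        for _ in range(max_lines):
            line = src.readline()
            # Stop at the first line that is not a comment (or at end of file).
            if not line.startswith("#"):
                break
            header.append(line.rstrip("\n"))
    return header


def read_json_metadata(json_path):
    """Return the top-level 'metadata' entry of a MapMyCells JSON file."""
    with open(json_path, "r", encoding="utf-8") as src:
        return json.load(src).get("metadata", {})
```

Printing the result of `read_csv_header_comments(...)` for each of the two CSV files side by side should make any difference in version visible at a glance.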

scaradonna · April 10, 2025, 2:41pm

The mapping from the 22 clusters is from this run:

metadata = counts_10xWholeMouseBrain(CCN20230722)_HierarchicalMapping_UTC_1744231865796.json

taxonomy hierarchy = [“CCN20230722_CLAS”

readable taxonomy hierarchy = [“class”

algorithm: ‘hierarchical’; codebase: GitHub - AllenInstitute/cell_type_mapper: Repository for storing prototype functionality implementations for the BKP; version: 1.5.1

The mapping from the 16 clusters is from this run:

metadata = counts_10xWholeMouseBrain(CCN20230722)_HierarchicalMapping_UTC_1744130599119.json

taxonomy hierarchy = [“CCN20230722_CLAS”

readable taxonomy hierarchy = [“class”

algorithm: ‘hierarchical’; codebase: GitHub - AllenInstitute/cell_type_mapper: Repository for storing prototype functionality implementations for the BKP; version: 1.5.1

cell_id

Let me know if you need any additional info!

danielsf · April 10, 2025, 3:27pm

Okay. So they both ran on the same version of the code (1.5.1). That’s good.

At this point, without seeing your data, I can only suggest things you might want to examine to try and diagnose why the mappings are different.

You can look at the distributions of the quality metrics in the two runs. For each cell at each level of the taxonomy, you will find in the JSON output an avg_correlation, which is the correlation coefficient between the cell and the cell type to which it was assigned, and an aggregate_probability, which is a measure of the confidence that MapMyCells has in that cell type assignment (you can see this Jupyter notebook for a walkthrough of the contents of the JSON output file). I would look at the distribution of those two statistics in your two mapping results to see if either or both of them is much lower in the 22 cluster mapping. That wouldn’t tell you why, but it would tell you whether MapMyCells is simply less confident in the mappings it performed on that data than in the mappings it performed on the 16 cluster data.

Full disclosure, I did my PhD in physics and don’t entirely know what you mean when you say the two datasets were run at different resolutions. Is this a question of the gene panels used in the two datasets, or of how the spot images (?) were processed? If it is a difference in the gene panels, you can look at the marker_genes entry in the JSON output file. This will tell you which marker genes were actually used for your mapping run. Maybe the 22 cluster data had fewer marker genes in it for some reason.

Absent any of that, it would be interesting to look at the differences between the two input datasets. Do the distributions of
- counts per cell
- non-zero genes per cell
- variance in counts across genes per cell

differ significantly between the two datasets? Maybe there is just a weaker signal in the 22 cluster dataset.
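The checks above could be sketched roughly as follows. This is only an illustration: it assumes the 'results' entry of the JSON output is a list with one dict per cell, and that `results[i][level]` holds the avg_correlation and aggregate_probability for that taxonomy level. The function names are hypothetical, and the count matrix is assumed to be a dense cells-by-genes array that you load yourself.

```python
import numpy as np


def quality_metric_summary(results, level,
                           metrics=("avg_correlation", "aggregate_probability")):
    """Quartiles of the per-cell quality metrics at one taxonomy level.

    Assumption: results[i][level][metric] is a number for every cell i.
    """
    summary = {}
    for metric in metrics:
        values = np.array([cell[level][metric] for cell in results], dtype=float)
        q25, q50, q75 = np.percentile(values, [25, 50, 75])
        summary[metric] = {"q25": float(q25), "median": float(q50), "q75": float(q75)}
    return summary


def per_cell_input_stats(counts):
    """Per-cell statistics for a dense (n_cells, n_genes) count matrix."""
    counts = np.asarray(counts, dtype=float)
    return {
        # total counts in each cell
        "counts_per_cell": counts.sum(axis=1),
        # number of genes with a non-zero count in each cell
        "nonzero_genes_per_cell": (counts > 0).sum(axis=1),
        # variance of the counts across genes, computed per cell
        "variance_across_genes": counts.var(axis=1),
    }
```

For `level`, pass the key of the class level exactly as it appears in the taxonomy hierarchy listed in your metadata. Running both functions on the 16 cluster and 22 cluster runs and comparing the quartiles (or histogramming the arrays) would show whether one run is systematically lower.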

danielsf · April 10, 2025, 3:33pm

I’m going to make this a separate post because it is a more concrete idea that deserves its own list of bullet points.

Get the marker genes used by MapMyCells from the JSON output. This will be a dict mapping from cell types to lists of marker genes. The entry under 'None' is the list of marker genes used to select the class for any given cell (you can see the full meaning of the marker gene table on this page).

For each of the 16 UMAP clusters in your first dataset and the 22 clusters in your second dataset, plot the average gene expression profile of the cells in each cluster in the space of the marker genes listed under marker_genes['None']. Maybe, in the 22 cluster data, each of the 22 clusters really does look similar in this marker gene space (in a way the clusters from the 16 cluster data don’t). This, again, would not tell you why the clusters look similar, but it would explain why MapMyCells assigned all of the cells to the same class.
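A rough sketch of that comparison, under some assumptions: `counts` is a dense cells-by-genes array, `gene_ids` uses the same identifiers as the marker gene list, and `cluster_labels` has one label per cell. The function names are invented for this example. It averages the raw values you pass in, so apply whatever normalization you prefer beforehand.

```python
import numpy as np


def cluster_marker_profiles(counts, gene_ids, cluster_labels, marker_genes):
    """Mean expression of each cluster, restricted to the marker genes.

    Returns (clusters, used_markers, profiles), where profiles has shape
    (n_clusters, n_used_markers). Marker genes absent from gene_ids are skipped.
    """
    counts = np.asarray(counts, dtype=float)
    cluster_labels = np.asarray(cluster_labels)
    gene_index = {gene: i for i, gene in enumerate(gene_ids)}
    used_markers = [g for g in marker_genes if g in gene_index]
    cols = [gene_index[g] for g in used_markers]
    clusters = sorted(set(cluster_labels.tolist()))
    profiles = np.vstack([
        counts[cluster_labels == c][:, cols].mean(axis=0) for c in clusters
    ])
    return clusters, used_markers, profiles


def profile_correlation(profiles):
    """Pearson correlation between every pair of cluster profiles."""
    return np.corrcoef(profiles)
```

If the off-diagonal entries of `profile_correlation(profiles)` are all close to 1 for the 22 cluster data but not for the 16 cluster data, that would be consistent with the clusters looking alike in marker gene space. A cluster whose profile is constant across the markers will produce NaN in the correlation matrix.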

Note: In the event that MapMyCells had to transform between your gene identifiers and ENSEMBL IDs, you will find the mapping that MapMyCells used under 'gene_identifier_mapping' in the JSON output file as discussed in cells [21] and [22] of this Jupyter notebook.
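If you need to go from the marker gene identifiers back to the gene names in your own data, something like the following might help. It assumes (and I would double check this against the notebook) that 'gene_identifier_mapping' can be treated as a dict from your input gene identifier to the ENSEMBL ID that MapMyCells used; the function name is hypothetical.

```python
def markers_in_input_names(marker_ids, gene_identifier_mapping):
    """Translate marker gene IDs back to the identifiers used in the input data.

    Assumption: gene_identifier_mapping maps input identifier -> ENSEMBL ID.
    Returns (found, missing): found maps each marker ID to the list of input
    identifiers that were mapped onto it; missing lists marker IDs that no
    input gene was mapped onto.
    """
    ensembl_to_input = {}
    for input_name, ensembl_id in gene_identifier_mapping.items():
        ensembl_to_input.setdefault(ensembl_id, []).append(input_name)
    found = {m: ensembl_to_input[m] for m in marker_ids if m in ensembl_to_input}
    missing = [m for m in marker_ids if m not in ensembl_to_input]
    return found, missing
```

A long `missing` list for one dataset but not the other would point to a difference in which marker genes were available for the mapping.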

I hope this helps.
