Hi @Sim
I am one of the developers who worked on MapMyCells (as opposed to someone who has published a scientific paper based on results from MapMyCells; just setting expectations here).
The question “how do I know which cell type labels to trust?” is an open one, from my perspective. I can give you a qualitative answer. Unfortunately, I do not know that I can give you a very quantitative answer.
The executive summary is that you should be performing a quality cut on the bootstrapping_probability metric, not the average_correlation metric. The figure below illustrates my point.
I took a sample of 800,000 cells from the Whole Mouse Brain data used to define our Whole Mouse Brain taxonomy (this data is available for download via the abc_atlas_access tool). I ran it through MapMyCells. Because this is the data used to define the taxonomy, it comes annotated with “ground truth” cell labels derived by the original, taxonomy-defining analysis. I compared the outputs of MapMyCells with these ground truth labels. Specifically, I compared the distribution of average correlation and bootstrapping probability between two populations: correctly labeled cells and incorrectly labeled cells. This is what I saw.
As you can see, the distributions of average correlation for correctly and incorrectly labeled cells appear very similar. This means that any cut on average correlation that you make will discard a comparable number of incorrectly and correctly labeled cells. The distributions of bootstrapping_probability, however, are different enough that you can get rid of roughly half the incorrectly labeled cells while only sacrificing 10-15% of correctly labeled cells.
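If it helps to make the quality cut concrete, here is a minimal sketch of how you might measure what a `bootstrapping_probability` threshold costs you, assuming you have a per-cell table that records whether each mapped label was correct. Everything below is synthetic and illustrative: the column names, the 0.5 threshold, and the beta distributions are assumptions I made up for the example, not real MapMyCells output.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for a MapMyCells result joined to ground-truth labels.
# Column names are assumptions; check them against your actual output CSV.
n = 10_000
df = pd.DataFrame({"is_correct": rng.random(n) < 0.8})

# For illustration, give correctly labeled cells higher bootstrapping
# probabilities than incorrectly labeled cells (this mimics the figure).
df["bootstrapping_probability"] = np.where(
    df["is_correct"],
    rng.beta(8, 2, n),   # skewed toward 1.0
    rng.beta(3, 3, n),   # broader, centered lower
)

def cut_report(df: pd.DataFrame, threshold: float) -> dict:
    """Fraction of correct / incorrect cells a quality cut would discard."""
    discarded = df["bootstrapping_probability"] < threshold
    return {
        "frac_correct_discarded": float(discarded[df["is_correct"]].mean()),
        "frac_incorrect_discarded": float(discarded[~df["is_correct"]].mean()),
    }

report = cut_report(df, threshold=0.5)
print(report)
```

Sweeping `threshold` over a grid and plotting the two fractions against each other is a cheap way to pick a cut that matches your own tolerance for losing correctly labeled cells.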
Another factor that bears on this question is the gene panel. MapMyCells works from a pre-defined lookup table of marker genes. What happens if you do not have all of those genes? The second row of the figure takes my test data and intentionally downsamples it so that it contains only 1000 of the ~6000 expected marker genes. You can see that, qualitatively, the relative shapes of the distributions remain unchanged.
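For reference, the downsampling step is just a column subset on the expression matrix. Here is a toy sketch with pandas; the matrix, gene names, and counts below are fabricated stand-ins, not real marker genes or real panel sizes:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Toy expression matrix: 5 cells x 20 "genes" (stand-ins for real markers).
genes = [f"gene_{i}" for i in range(20)]
expr = pd.DataFrame(
    rng.poisson(2.0, size=(5, 20)),
    columns=genes,
    index=[f"cell_{i}" for i in range(5)],
)

# Keep only a random subset of genes, mimicking a smaller gene panel
# (in my test this was 1000 of the ~6000 expected marker genes).
n_keep = 10
kept = rng.choice(genes, size=n_keep, replace=False)
downsampled = expr.loc[:, sorted(kept)]

print(downsampled.shape)  # (5, 10)
```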
I wish I could say that there was a direct quantitative interpretation of the quality metrics (i.e. “40% of cells with bootstrapping_probability == 0.6 are wrong”). So far, we haven’t been able to find such a direct interpretation.
If you need to quantify your accuracy more directly, you could try downloading the original data used to define whatever taxonomy you are working with via abc_atlas_access, reshaping that data to "look like" yours (similar gene panel; similar noise characteristics, if you feel ambitious), mapping the reshaped data with MapMyCells, and running an analysis similar to the one I did above (comparing the MapMyCells output with the ground truth annotations from the downloaded data).
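The final comparison step in that recipe is just a join between the downloaded ground-truth annotations and the MapMyCells output, keyed on cell ID. A hedged sketch, with invented cell IDs, labels, and column names standing in for whatever your actual files contain:

```python
import pandas as pd

# Hypothetical ground-truth annotations (e.g. taxonomy metadata you
# downloaded with abc_atlas_access); values are made up for illustration.
truth = pd.DataFrame({
    "cell_id": ["a", "b", "c", "d"],
    "true_label": ["L5 IT", "Sst", "Pvalb", "Sst"],
})

# Hypothetical MapMyCells output for the same cells.
mapped = pd.DataFrame({
    "cell_id": ["a", "b", "c", "d"],
    "assigned_label": ["L5 IT", "Sst", "Sst", "Sst"],
    "bootstrapping_probability": [0.95, 0.88, 0.41, 0.73],
})

# Join on cell ID and flag each cell as correctly or incorrectly labeled.
merged = truth.merge(mapped, on="cell_id")
merged["is_correct"] = merged["true_label"] == merged["assigned_label"]
accuracy = merged["is_correct"].mean()
print(accuracy)  # 0.75
```

Once you have that `is_correct` column, you can split the quality metrics into correct/incorrect populations and reproduce the distribution comparison from my figure on your own reshaped data.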
Please reach out if you have any more questions or anything is unclear. As I said: this is an open question in which we are very interested.