Hi @Sim
I am one of the developers who worked on MapMyCells (as opposed to someone who has published a scientific paper based on results from MapMyCells; just setting expectations here).
The question “how do I know which cell type labels to trust?” is an open one, from my perspective. I can give you a qualitative answer. Unfortunately, I do not know that I can give you a very quantitative answer.
The executive summary is that you should be performing a quality cut on the bootstrapping_probability metric, not the average_correlation metric. The figure below illustrates my point.
I took a sample of 800,000 cells from the Whole Mouse Brain data used to define our Whole Mouse Brain taxonomy (this data is available for download via the abc_atlas_access tool). I ran it through MapMyCells. Because this is the data used to define the taxonomy, it comes annotated with “ground truth” cell labels derived by the original, taxonomy-defining analysis. I compared the outputs of MapMyCells with these ground truth labels. Specifically, I compared the distribution of average correlation and bootstrapping probability between two populations: correctly labeled cells and incorrectly labeled cells. This is what I saw.
As you can see, the distributions of average correlation for correctly and incorrectly labeled cells appear very similar. This means that any cut on average correlation that you make will discard a comparable number of incorrectly and correctly labeled cells. The distributions of bootstrapping_probability, however, are different enough that you can get rid of roughly half the incorrectly labeled cells while only sacrificing 10-15% of correctly labeled cells.
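If it helps to make the quality cut concrete, here is a minimal sketch of how you might measure what a `bootstrapping_probability` threshold costs you, assuming you have a per-cell table that records whether each mapped label was correct. Everything below is synthetic and illustrative: the column names, the 0.5 threshold, and the beta distributions are assumptions I made up for the example, not real MapMyCells output.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for a MapMyCells result joined to ground-truth labels.
# Column names are assumptions; check them against your actual output CSV.
n = 10_000
df = pd.DataFrame({"is_correct": rng.random(n) < 0.8})

# For illustration, give correctly labeled cells higher bootstrapping
# probabilities than incorrectly labeled cells (this mimics the figure).
df["bootstrapping_probability"] = np.where(
    df["is_correct"],
    rng.beta(8, 2, n),   # skewed toward 1.0
    rng.beta(3, 3, n),   # broader, centered lower
)

def cut_report(df: pd.DataFrame, threshold: float) -> dict:
    """Fraction of correct / incorrect cells a quality cut would discard."""
    discarded = df["bootstrapping_probability"] < threshold
    return {
        "frac_correct_discarded": float(discarded[df["is_correct"]].mean()),
        "frac_incorrect_discarded": float(discarded[~df["is_correct"]].mean()),
    }

report = cut_report(df, threshold=0.5)
print(report)
```

Sweeping `threshold` over a grid and plotting the two fractions against each other is a cheap way to pick a cut that matches your own tolerance for losing correctly labeled cells.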
Another factor that bears on this question is the gene panel. MapMyCells works from a pre-defined lookup table of marker genes. What happens if you do not have all of those genes? The second row of the figure takes my test data and intentionally downsamples it so that it contains only 1000 of the ~6000 expected marker genes. You can see that, qualitatively, the relative shapes of the distributions remain unchanged.
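For reference, the downsampling step is just a column subset on the expression matrix. Here is a toy sketch with pandas; the matrix, gene names, and counts below are fabricated stand-ins, not real marker genes or real panel sizes:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Toy expression matrix: 5 cells x 20 "genes" (stand-ins for real markers).
genes = [f"gene_{i}" for i in range(20)]
expr = pd.DataFrame(
    rng.poisson(2.0, size=(5, 20)),
    columns=genes,
    index=[f"cell_{i}" for i in range(5)],
)

# Keep only a random subset of genes, mimicking a smaller gene panel
# (in my test this was 1000 of the ~6000 expected marker genes).
n_keep = 10
kept = rng.choice(genes, size=n_keep, replace=False)
downsampled = expr.loc[:, sorted(kept)]

print(downsampled.shape)  # (5, 10)
```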
I wish I could say that there was a direct quantitative interpretation of the quality metrics (i.e. “40% of cells with bootstrapping_probability == 0.6 are wrong”). So far, we haven’t been able to find such a direct interpretation.
If you need to quantify your accuracy more directly, you could try downloading the original data used to define whatever taxonomy you are working with via abc_atlas_access, reshaping that data to "look like" yours (similar gene panel; similar noise characteristics, if you feel ambitious), mapping the reshaped data with MapMyCells, and running an analysis similar to the one I did above (comparing the MapMyCells output with the ground truth annotations from the downloaded data).
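The final comparison step in that recipe is just a join between the downloaded ground-truth annotations and the MapMyCells output, keyed on cell ID. A hedged sketch, with invented cell IDs, labels, and column names standing in for whatever your actual files contain:

```python
import pandas as pd

# Hypothetical ground-truth annotations (e.g. taxonomy metadata you
# downloaded with abc_atlas_access); values are made up for illustration.
truth = pd.DataFrame({
    "cell_id": ["a", "b", "c", "d"],
    "true_label": ["L5 IT", "Sst", "Pvalb", "Sst"],
})

# Hypothetical MapMyCells output for the same cells.
mapped = pd.DataFrame({
    "cell_id": ["a", "b", "c", "d"],
    "assigned_label": ["L5 IT", "Sst", "Sst", "Sst"],
    "bootstrapping_probability": [0.95, 0.88, 0.41, 0.73],
})

# Join on cell ID and flag each cell as correctly or incorrectly labeled.
merged = truth.merge(mapped, on="cell_id")
merged["is_correct"] = merged["true_label"] == merged["assigned_label"]
accuracy = merged["is_correct"].mean()
print(accuracy)  # 0.75
```

Once you have that `is_correct` column, you can split the quality metrics into correct/incorrect populations and reproduce the distribution comparison from my figure on your own reshaped data.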
Please reach out if you have any more questions or anything is unclear. As I said: this is an open question in which we are very interested.