Cell typing different based on clustering

Okay. So they both ran on the same version of the code (1.5.1). That’s good.

At this point, without seeing your data, I can only suggest things you might want to examine to try to diagnose why the mappings are different.

  1. You can look at the distributions of the quality metrics in the two runs. For each cell at each level of the taxonomy, the JSON output contains an avg_correlation, which is the correlation coefficient between the cell and the cell type to which it was assigned, and an aggregate_probability, which is a measure of the confidence that MapMyCells has in that cell type assignment (you can see this Jupyter notebook for a walkthrough of the contents of the JSON output file). I would look at the distribution of those two statistics in your two mapping results to see if either or both of them are much lower in the 22 cluster mapping. That wouldn’t tell you why, but it would tell you whether MapMyCells is simply less confident in the mappings it performed on that data than in the mappings it performed on the 16 cluster data.

  2. Full disclosure, I did my PhD in physics and don’t entirely know what you mean when you say the two datasets were run at different resolutions. Is this a question of the gene panels used in the two datasets, or of how the spot images (?) were processed? If it is a difference in the gene panels, you can look at the marker_genes entry in the JSON output file. This will tell you which marker genes were actually used for your mapping run. Maybe the 22 cluster data had fewer marker genes in it for some reason.

  3. Absent any of that, it would be interesting to look at the differences between the two input datasets. Do the distributions of

    • counts per cell
    • non-zero genes per cell
    • variance in counts across genes per cell

differ significantly between the two datasets? Maybe there is just a weaker signal in the 22 cluster dataset.
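
If it helps, here is a rough sketch of how checks 1 and 3 could be scripted. It is only an illustration under some assumptions: I am assuming the per-cell entries sit in a list (in the usage comment I call the top-level key "results") and that each entry has one sub-dictionary per taxonomy level containing avg_correlation and aggregate_probability. The level name "class", the file names, and the helper names are all hypothetical, so check them against your own output file. The count matrix is taken as a plain list of rows (one row of gene counts per cell) just to keep the example free of extra dependencies.

```python
import json
import statistics


def summarize(values):
    """Return the quartiles of a list of numbers (needs at least 2 values)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"q1": q1, "median": median, "q3": q3}


def quality_metrics(results, level):
    """Summarize the two quality metrics at one taxonomy level.

    ASSUMPTION: each entry of `results` looks like
    {"cell_id": ..., level: {"avg_correlation": ..., "aggregate_probability": ...}}
    """
    corr = [cell[level]["avg_correlation"] for cell in results]
    prob = [cell[level]["aggregate_probability"] for cell in results]
    return {
        "avg_correlation": summarize(corr),
        "aggregate_probability": summarize(prob),
    }


def per_cell_stats(counts):
    """Compute the three per-cell statistics from a cell-by-gene count matrix.

    `counts` is a list of rows; each row holds the gene counts for one cell.
    """
    return {
        "counts_per_cell": [sum(row) for row in counts],
        "nonzero_genes_per_cell": [sum(1 for c in row if c > 0) for row in counts],
        "variance_per_cell": [statistics.pvariance(row) for row in counts],
    }


# Hypothetical usage (file names and the "results"/"class" keys are assumptions):
#
#   with open("mapping_16_clusters.json") as f:
#       run_16 = json.load(f)
#   with open("mapping_22_clusters.json") as f:
#       run_22 = json.load(f)
#   print(quality_metrics(run_16["results"], "class"))
#   print(quality_metrics(run_22["results"], "class"))
```

Comparing the quartiles side by side for the two runs (or, better, plotting histograms of the underlying lists) should show whether the 22 cluster data is systematically lower in either the quality metrics or the raw signal.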