Hello All,
Thank you for providing the community with both the resources and the tools for taxonomy assignment. Our lab is interested in developing a region-specific reference for our analyses. I have been using the command-line version of MapMyCells, and @danielsf has been very helpful in solving roadblocks thus far. During the process, I realized that the original taxonomy assigned to the region of interest had many subclasses with very few cells assigned (n < 20). Closer examination of these subclasses revealed that they do not originally belong to the region of interest, i.e., the hippocampus in this case.
Were these assigned through computational approximation, or are these cells really part of the hippocampal formation? It was suggested that I pool cells across the taxonomy for better marker signal detection. Before doing that, I want to confirm whether these subclasses are biologically correct or should be filtered out. Thank you for your help!
Hi @scanchi,
I’d suggest removing any cluster that seems unreasonable and then grabbing all (or more) of the cells from clusters that seem reasonable. In this case the key question is how to define “reasonable”, which ultimately comes down to a neuroscience question. As a very rough rule of thumb (i.e., you should confirm this for your use case): non-neuronal types, and to a lesser extent GABAergic types, are more likely to span regions than other neuronal types and could be retained. When in doubt, I’d recommend keeping the cluster. If the mapping is working correctly, then even if the reference includes a cell type from the wrong region, no “high quality” cells should map to it.
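As a rough illustration of the first step (finding the sparsely populated subclasses to review), here is a minimal Python sketch. The subclass names and counts are made-up placeholders, not real taxonomy labels, and the threshold of 20 simply mirrors the n < 20 mentioned in the question. It only flags candidates; whether to drop them is still the neuroscience call described above.

```python
from collections import Counter

# Hypothetical per-cell subclass labels for the region of interest.
# In practice these would come from the reference's cell metadata.
cell_subclasses = (
    ["subclass_A"] * 150
    + ["subclass_B"] * 12
    + ["subclass_C"] * 5
)

MIN_CELLS = 20  # assumption: mirrors the n < 20 cutoff from the question


def flag_small_subclasses(labels, min_cells=MIN_CELLS):
    """Return {subclass: n_cells} for subclasses with fewer than min_cells cells."""
    counts = Counter(labels)
    return {name: n for name, n in counts.items() if n < min_cells}


small = flag_small_subclasses(cell_subclasses)
for name, n in sorted(small.items()):
    print(f"{name}: {n} cells, review before dropping")
```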
Best,
Jeremy
Thanks Jeremy! This sounds like a reasonable approach, and it is what I was leaning towards as well. Does this also imply that when we run a dataset against WMB using the MapMyCells tool, we are likely to see a portion of the cells map to taxonomy nodes that are likely inaccurate? However, we can determine confidence in an assignment based on the number of cells called, as well as perhaps mean expression values.
MapMyCells/the cell_type_mapper code reports some confidence metrics to help you judge the quality of each cell’s mapping. They are:
- avg_correlation is the average correlation coefficient between the cell and its chosen cell type (class, subclass, cluster, etc.), i.e., how well correlated the cell is with its chosen type in the space of marker genes.
- bootstrapping_probability is the fraction of the bootstrapping iterations that actually chose the assigned cell type, i.e., when we randomly subsampled the marker genes, how frequently the mapping result stayed the same.
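To make the use of these two metrics concrete, here is a minimal Python sketch that keeps only cells passing both cutoffs. The record layout (one dict per cell with these keys), the example values, and the cutoffs of 0.9 and 0.5 are all assumptions for illustration; the actual MapMyCells output files are organized differently, and sensible thresholds depend on your data.

```python
# Hypothetical per-cell mapping results at one taxonomy level.
# Keys and values are illustrative, not the real output schema.
results = [
    {"cell_id": "c1", "assignment": "subclass_A",
     "avg_correlation": 0.82, "bootstrapping_probability": 1.00},
    {"cell_id": "c2", "assignment": "subclass_B",
     "avg_correlation": 0.35, "bootstrapping_probability": 0.95},
    {"cell_id": "c3", "assignment": "subclass_C",
     "avg_correlation": 0.71, "bootstrapping_probability": 0.55},
]


def high_confidence(rows, min_prob=0.9, min_corr=0.5):
    """Keep rows whose two quality metrics both meet the (assumed) cutoffs."""
    return [
        r for r in rows
        if r["bootstrapping_probability"] >= min_prob
        and r["avg_correlation"] >= min_corr
    ]


kept_ids = [r["cell_id"] for r in high_confidence(results)]
print(kept_ids)
```

Counting how many cells survive per assigned type is then one way to check whether an out-of-region type is attracting any well-mapped cells at all.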
This Jupyter notebook does a deep dive on the contents of the files output by MapMyCells when run on real data and shows how the quality metrics correlate with the actual quality of mapping.
Note: since you are running the code locally, I would recommend setting type_assignment.bootstrap_factor=0.5. I have found that this gives the bootstrapping_probability metric more informative values than the default type_assignment.bootstrap_factor=0.9.
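For intuition about why the subsampling fraction matters, here is a toy Python sketch of marker-gene bootstrapping. It is not the cell_type_mapper implementation: the function `toy_bootstrap`, the random profiles, and the use of plain Pearson correlation with a majority vote are all assumptions made for illustration. The idea is that keeping a smaller fraction of the marker genes per iteration makes the iterations differ more from one another, so an ambiguous cell is more likely to show a probability below 1.

```python
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def toy_bootstrap(cell, type_profiles, bootstrap_factor, n_iter=100, seed=0):
    """Vote over random marker-gene subsets; return (winning type, vote fraction)."""
    rng = random.Random(seed)
    n_genes = len(cell)
    # Number of marker genes kept per iteration (at least 2, at most all).
    n_keep = min(n_genes, max(2, round(bootstrap_factor * n_genes)))
    votes = Counter()
    for _ in range(n_iter):
        idx = rng.sample(range(n_genes), n_keep)
        sub_cell = [cell[i] for i in idx]
        best = max(
            type_profiles,
            key=lambda t: pearson(sub_cell, [type_profiles[t][i] for i in idx]),
        )
        votes[best] += 1
    assigned, n_votes = votes.most_common(1)[0]
    return assigned, n_votes / n_iter


# Two made-up type profiles and a cell that sits roughly between them.
data_rng = random.Random(1)
type_profiles = {
    "type_X": [data_rng.random() for _ in range(30)],
    "type_Y": [data_rng.random() for _ in range(30)],
}
cell = [
    0.5 * x + 0.5 * y + data_rng.gauss(0, 0.05)
    for x, y in zip(type_profiles["type_X"], type_profiles["type_Y"])
]

for factor in (0.9, 0.5):
    assigned, prob = toy_bootstrap(cell, type_profiles, factor)
    print(f"bootstrap_factor={factor}: {assigned}, vote fraction {prob:.2f}")
```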