Scrattch.hicat tutorial


I followed the scrattch.hicat tutorial with my own scRNA-seq dataset (instead of the Tasic et al. data), and the results say that my data has 6 clusters, denoted by 6 different numbers (not 1-6). I looked up these numbers in the primary cell type ID column of the Tasic et al. annotations file and identified the corresponding primary cell types. I thought that perhaps these cluster numbers correspond to the Allen Brain cell types and that these are the 6 cell types in my data. Then I looked up the same numbers in the Yao et al. MOp mouse cortex dataset annotations file, which gave 6 different cell types.

What does the user gain from these 6 cluster numbers if they do not correspond to Allen Brain cell types? How does the user arrive at nomenclature such as “Vip Lmo1” or “Sst Cbln4” for these 6 clusters? I do not see any output that gives the top 2 marker genes used for this naming convention.

If possible, I would like to classify my cell types the same way the Allen Brain does, so that they are comparable with the cell types published by the Allen Brain. I obtained my 6 clusters with the same pipeline the Allen Brain folks use (scrattch.hicat), but I need help figuring out how to name these clusters to obtain nomenclature compatible with “Pvalb Tpbg”, etc.

Thank you,
Eden Hornung

The clustering pipeline does not automatically produce cluster labels; this is true of almost all existing tools. Our labels were generated by manually curating markers from differential gene expression analysis. If you just want to map your cells to our reference, the easiest approach is probably Seurat's label transfer function. However, your dataset seems to include only a subset of the clusters present in our taxonomy, and Seurat sometimes has issues when the cell type composition differs substantially between the query and reference datasets. If you run into that, you can try the “map_sampling” function in the scrattch.hicat package with the option method="mean". It requires the training and test data matrices (both logCPM-transformed), cluster labels for the training dataset, and marker genes.
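To make those inputs concrete, here is a rough Python sketch of the general idea behind mean-based (centroid) mapping: transform counts to logCPM, average each reference cluster over the marker genes, and assign each query cell to the cluster whose mean profile it correlates with best. This is not the scrattch.hicat implementation, and it leaves out any repeated sampling of markers. It assumes logCPM means log2(CPM + 1), and the cell names, gene names, cluster labels, and counts are all made up for illustration.

```python
import math

def log_cpm(counts):
    """Convert one cell's raw counts (gene -> count) to log2(CPM + 1).
    The log2(CPM + 1) form is an assumption about what "logCPM" means here."""
    total = sum(counts.values())
    return {g: math.log2(c / total * 1e6 + 1) for g, c in counts.items()}

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cluster_means(train, labels, markers):
    """Mean logCPM of each marker gene within each reference cluster."""
    sums, sizes = {}, {}
    for cell, expr in train.items():
        cl = labels[cell]
        vec = sums.setdefault(cl, [0.0] * len(markers))
        for i, g in enumerate(markers):
            vec[i] += expr[g]
        sizes[cl] = sizes.get(cl, 0) + 1
    return {cl: [v / sizes[cl] for v in vec] for cl, vec in sums.items()}

def map_cells(test, means, markers):
    """Assign each query cell to the reference cluster with the highest correlation."""
    assigned = {}
    for cell, expr in test.items():
        vec = [expr[g] for g in markers]
        assigned[cell] = max(means, key=lambda cl: pearson(vec, means[cl]))
    return assigned

# Hypothetical toy data: four reference cells in two clusters, two query cells.
ref_counts = {
    "r1": {"g1": 100, "g2": 80, "g3": 5, "g4": 4, "hk": 50},
    "r2": {"g1": 90, "g2": 110, "g3": 6, "g4": 3, "hk": 60},
    "r3": {"g1": 4, "g2": 6, "g3": 120, "g4": 90, "hk": 55},
    "r4": {"g1": 5, "g2": 3, "g3": 95, "g4": 100, "hk": 45},
}
ref_labels = {"r1": "ref_A", "r2": "ref_A", "r3": "ref_B", "r4": "ref_B"}
query_counts = {
    "q1": {"g1": 60, "g2": 50, "g3": 2, "g4": 3, "hk": 30},
    "q2": {"g1": 2, "g2": 4, "g3": 70, "g4": 40, "hk": 25},
}
markers = ["g1", "g2", "g3", "g4"]  # hypothetical marker genes

train = {cell: log_cpm(c) for cell, c in ref_counts.items()}
test = {cell: log_cpm(c) for cell, c in query_counts.items()}
means = cluster_means(train, ref_labels, markers)
assigned = map_cells(test, means, markers)
print(assigned)
```

Because every query cell is compared against each reference cluster on its own, a query that contains only a few of the reference cell types can still be mapped. Note that every cell receives some label, so it is worth checking how strong the best correlation actually is before trusting an assignment.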

The cluster membership can be found in Supplementary Table 10 of the 2018 Tasic et al. paper.