Comparing gene expression between the two datasets, I noticed that the Zhuang dataset has consistently lower values. This can be seen, for example, by looking at “Slc17a7” here and here. Can I join these two datasets using some scaling factor, or is this to be avoided? I am looking at a series of genes; some are present in both datasets and some in only one of the two (either Zhuang or Zeng). What would be the best way to combine these data? Thanks!
Upon further analysis, I noticed that the percentage of cells expressing a given gene (expression > 0) is also very different. Here is the expression of Htr2a in Isocortex structures.
Those two datasets were acquired on different systems in different labs, so some difference is expected. The Zhuang lab also used a gene panel of over 1000 genes, which can sometimes lower detection efficiency. We also used different segmentation algorithms, which can affect the number of genes assigned to a cell. I think it’s totally fine to integrate the two datasets using a scaling factor. You can always standardize gene expression levels to a fixed value for all cells.
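To make the scaling idea concrete, here is a minimal sketch in Python with NumPy. It is only one possible reading of the suggestion above, not the method used by either lab: it assumes "standardize to a fixed value" means rescaling each cell so its total expression sums to the same number, and it adds a per-gene factor that matches the mean expression of one dataset to the other. The function names, the `target_sum` value, and the toy matrices are all hypothetical. Both matrices are assumed to be cells x genes and already restricted to the genes measured in both datasets, since a per-gene factor can only be estimated for shared genes.

```python
import numpy as np

def normalize_total(counts, target_sum=1e4):
    """Rescale each cell (row) so its expression sums to target_sum.

    Hypothetical helper; cells with zero total expression stay all-zero.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    safe_totals = np.where(totals > 0, totals, 1.0)  # avoid division by zero
    return counts / safe_totals * target_sum

def gene_scaling_factors(reference, other):
    """Per-gene factor that matches the mean of `other` to `reference`.

    Genes with zero mean in `other` get a factor of 1 (left unchanged).
    """
    ref_mean = np.asarray(reference, dtype=float).mean(axis=0)
    other_mean = np.asarray(other, dtype=float).mean(axis=0)
    factors = np.ones_like(ref_mean)
    nonzero = other_mean > 0
    factors[nonzero] = ref_mean[nonzero] / other_mean[nonzero]
    return factors

# Toy matrices (made-up numbers, 2 cells x 3 shared genes), not real data.
zeng = np.array([[4.0, 0.0, 6.0],
                 [2.0, 2.0, 6.0]])
zhuang = np.array([[1.0, 0.0, 1.0],
                   [1.0, 1.0, 2.0]])

# Step 1: put every cell on the same total.
zeng_norm = normalize_total(zeng, target_sum=100)
zhuang_norm = normalize_total(zhuang, target_sum=100)

# Step 2: rescale the Zhuang genes so their means match the Zeng means.
factors = gene_scaling_factors(zeng_norm, zhuang_norm)
zhuang_scaled = zhuang_norm * factors
```

Note that a multiplicative factor like this only changes expression levels. It cannot change the fraction of cells with expression > 0, because zeros stay zeros.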
As for the different number of cells expressing: in both cases, sections were taken at intervals, so the two datasets are unlikely to hit exactly the same regions. Also, I noticed that you broke up the data by region, and alignment between brains is never perfect.