Single cell Gene Expression by Cluster - Do 0s mean no expression?

Hello,

I would like to use the WHOLE CORTEX & HIPPOCAMPUS - 10X GENOMICS (2020) WITH 10X-SMART-SEQ TAXONOMY (2021) mouse data to study the expression of specific genes in the cell clusters as defined by the Allen Institute. I have a few questions though.

  1. In the median and in the trimmed means expression datasets, do 0s mean no expression of that gene in that specific cluster?
  2. Is it best to use the median or the trimmed mean expression to determine expression vs. no expression of a gene in a cluster? On the transcriptomic explorer, trimmed mean expression is used, so should I consider that more appropriate?
  3. I understand that with trimmed means I will be able to see expression without any extreme values but in that case, would I have to use a cut-off value to determine expression? I figure it’s more straightforward with medians as there are no values very close to 0, which is often the case with trimmed means.

Any suggestions would be greatly appreciated.
Thank you.

Regarding your questions:

  1. Yes. By the definition of the trimmed mean and the median, 0 means not expressed. However, because of dropout, it is common for lowly expressed genes to have a median of 0, or even a trimmed mean of 0 (the highest and lowest 25% of values are cut off), more so in the 10x dataset than in the Smart-seq dataset.
  2. Trimmed means are more appropriate than medians because the cutoff is more conservative.
  3. I would recommend that, after exploring the whole dataset, you subset the genes and cells of interest to you and plot the whole distribution (violin plot, dot plot, or a gene expression heatmap at the single-cell level). We hope to provide such functionality in future releases, but unfortunately it is not available yet. The download page is here:
    Mouse Whole Cortex and Hippocampus 10x - brain-map.org
    You can download the HDF5 file; many R/Python packages allow you to operate on the matrix.
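
To make the point about dropout concrete, here is a minimal sketch in Python of how a median and a 25% trimmed mean behave on a small cluster. The cell values and gene names are made up for illustration, and the `trimmed_mean` helper is only an assumption about how such a summary could be computed (drop the lowest and highest 25% of cells, then average the rest); it is not taken from the Allen pipeline.

```python
import numpy as np

def trimmed_mean(values, trim=0.25):
    """Mean after dropping the lowest and highest `trim` fraction of values.

    Illustrative helper only; the exact procedure used for the published
    trimmed means may differ.
    """
    v = np.sort(np.asarray(values, dtype=float))
    k = int(np.floor(trim * v.size))  # number of cells cut from each end
    return float(v[k:v.size - k].mean())

def summarize(values):
    """Return the per-cluster summaries discussed above for one gene."""
    v = np.asarray(values, dtype=float)
    return {
        "median": float(np.median(v)),
        "trimmed_mean": trimmed_mean(v),
        "fraction_detected": float((v > 0).mean()),  # cells with any signal
    }

# Hypothetical expression values for two genes in one cluster of 10 cells.
low_gene = [0, 0, 0, 0, 0, 0, 1, 2, 2, 3]   # detected in 4 of 10 cells
rare_gene = [0, 0, 0, 0, 0, 0, 0, 0, 3, 5]  # detected in 2 of 10 cells

for name, values in [("low_gene", low_gene), ("rare_gene", rare_gene)]:
    print(name, summarize(values))
```

In this toy example the first gene has a median of 0 but a non-zero trimmed mean, while the second has 0 for both summaries even though two cells express it. That is why looking at the full distribution, or at least the fraction of cells with non-zero values, is more informative than either summary alone.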

A cool idea would be for every piece of data posted by the Allen Institute to have a quick link to the scientific publication where the data came from. This would avoid confusion about the procedures and analysis methods used in each dataset. I've noticed that many of the questions asked on this forum would be solved by having such quick links.