I was using the Transcriptomic Explorer for Mouse brain (both 10X and SMART-seq) before the summer, and recently I noticed some changes in how the data is visualised in the heatmap. Many genes I search for in the 10X dataset show no expression, whereas most of them do in the SMART-seq dataset. In addition, some genes that show no expression in the heatmap for either dataset do appear in the scatterplot visualisation. However, in the downloadable dataset the expression of those genes is 0.0. Finally, quite a few columns (cell types) in the Mouse SMART-seq heatmap now have N/A as their data value, which wasn’t the case a few months ago (early June).
So I was wondering whether you have changed the data of the sequencing datasets somehow? And which data is used to show gene expression in the scatterplot, given that certain genes appear in the scatterplot but have 0.0 expression in the downloadable data and in the heatmap?
You raised a few topics, so I’ll try to address each. Feel free to follow up if I’ve misunderstood something. For additional general information about the Transcriptomics Explorer, please refer to this post.
We have not intentionally changed either dataset between June and now. We did, however, change the taxonomy that’s displayed with the SMART-seq dataset. The Transcriptomics Explorer now reflects a more recent taxonomy that was generated using both the SMART-seq and 10x data.
As you note, there are many genes that were measured in the SMART-seq data that have no expression readings in the 10x data. That’s an expected outcome: gene detection in 10x is typically lower than in SMART-seq, although 10x has higher throughput. We use both methods because they complement each other.
You noted that some genes are “0.0” in the heatmap but appear >0 in the scatterplot. In the heatmap for the mouse datasets, each value is the median expression across all cells in that cell type. In the scatterplot, each dot represents one cell. As long as more than half of the cells in a cell type have an expression value of 0, the median is 0, even if many individual cells have values >0. If that doesn’t explain what you’re seeing, please share one or two genes and cell types where you’re observing this issue and we’ll take a closer look.
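To make the median point concrete, here is a minimal sketch in Python. The numbers are made up for illustration and are not taken from either dataset; the variable names are hypothetical as well.

```python
from statistics import median

# Hypothetical expression values for one gene across the nine cells
# of a single cell type (illustrative numbers, not real data).
cell_values = [0.0, 0.0, 0.0, 0.0, 2.3, 0.0, 5.1, 0.0, 1.7]

# Heatmap-style summary: one median value per cell type.
heatmap_value = median(cell_values)

# Scatterplot-style view: every cell is its own dot, so the cells
# with non-zero expression remain visible individually.
cells_with_expression = [v for v in cell_values if v > 0]

print(heatmap_value)               # 0.0
print(len(cells_with_expression))  # 3
```

Three of the nine cells express the gene, so they would show up as dots in a scatterplot, but because six of the nine values are 0 the median for the cell type is still 0.0.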
The N/A readings in the heatmap appear because not all cell types are represented in that dataset. While most cell types are present in both the 10x and SMART-seq datasets, some cell types are present in only one or the other.
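As a rough illustration of how a shared taxonomy can produce N/A columns, the sketch below computes a per-cell-type median for one dataset and leaves a gap wherever that dataset has no cells of a given type. This is only an assumption about the general idea, not the Explorer’s actual implementation; the cell type names, values, and the `heatmap_row` helper are all hypothetical.

```python
from statistics import median

# Hypothetical shared taxonomy built from both datasets.
taxonomy_cell_types = ["Type A", "Type B", "Type C"]

# Hypothetical per-cell expression of one gene in a single dataset.
# "Type C" has no cells in this dataset.
cells_by_type = {
    "Type A": [0.0, 1.2, 3.4],
    "Type B": [0.0, 0.0, 0.5],
}

def heatmap_row(cell_types, cells_by_type):
    """Return the median per cell type, or None (shown as N/A) if no cells exist."""
    row = {}
    for cell_type in cell_types:
        values = cells_by_type.get(cell_type, [])
        row[cell_type] = median(values) if values else None
    return row

row = heatmap_row(taxonomy_cell_types, cells_by_type)
print(row)  # {'Type A': 1.2, 'Type B': 0.0, 'Type C': None}
```

Here "Type B" has a true median of 0.0 because cells were measured, while "Type C" has no value at all, which is the distinction between a 0.0 and an N/A entry.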