I am using the “WHOLE CORTEX & HIPPOCAMPUS - 10X GENOMICS (2020) WITH 10X-SMART-SEQ TAXONOMY (2020)” dataset, and I want to look at how certain genes are expressed across different clusters, regions, and subclasses, as well as at their co-expression.

In the full dataset, there is a `matrix.csv` file, which contains raw UMI counts for all cells and all genes, and a `trimmed_means.csv` file, which shows the normalized mean expression of all genes in all 378 clusters. I would like to look at normalized mean expression and its variability (e.g., standard deviation, standard error of the mean) not just for clusters, but also for subclasses and regions.
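To make the goal concrete, here is a minimal sketch of the per-group summary I have in mind. It assumes I already have a cells x genes array of normalized values and one group label per cell (cluster, subclass, or region); the function name `group_stats` and the toy inputs are hypothetical, not part of the dataset.

```python
import numpy as np

def group_stats(expr, labels):
    """Per-group mean, SD, and SEM for each gene.

    expr:   cells x genes array of normalized expression, e.g. log2(CPM + 1).
    labels: one group label per cell (cluster, subclass, or region).
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    stats = {}
    for group in np.unique(labels):
        rows = expr[labels == group]          # cells belonging to this group
        n = rows.shape[0]
        mean = rows.mean(axis=0)
        # Sample SD (ddof=1) is undefined for a single cell, so report NaN there.
        sd = rows.std(axis=0, ddof=1) if n > 1 else np.full(rows.shape[1], np.nan)
        stats[str(group)] = {"n": n, "mean": mean, "sd": sd, "sem": sd / np.sqrt(n)}
    return stats

# Toy example: 3 cells, 2 genes, 2 groups.
toy = group_stats([[1, 2], [3, 4], [10, 20]], ["A", "A", "B"])
print(toy["A"]["mean"], toy["A"]["sem"])
```

Note that this uses a plain mean rather than a trimmed mean, so its per-cluster values would not be expected to match `trimmed_means.csv`.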

Looking at the documentation and the Transcriptomics Explorer, it seems that `trimmed_means.csv` is calculated by taking all cells in a cluster and, for each gene, removing the top 25% and bottom 25% of the data, taking the log2(CPM + 1) of the data, and, I assume, taking the mean of these values?

I wish there were a `normalized_matrix.csv` file, but if I have to reverse-engineer their `trimmed_means.csv` process, any help is appreciated.

EDIT: I was able to get closer to the values in `trimmed_means.csv` by taking the log2(CPM + 1) of all cells, then excluding the top and bottom 25%, and taking the mean of the remaining values. As an example, for a single cluster and a single gene, the `trimmed_means.csv` value is 6.651905, and I’m getting 6.652420762158425.
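For reference, the calculation described in the EDIT can be sketched roughly as below. This is my reconstruction, not the documented procedure: the function names are hypothetical, and both the CPM scaling (per-cell total UMIs) and how the 25% cut is rounded when the cell count is not divisible by four are assumptions on my part.

```python
import numpy as np

def log2_cpm(counts):
    """Convert a cells x genes matrix of raw UMI counts to log2(CPM + 1)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)   # total UMIs per cell
    # Cells with zero total counts are left at 0 instead of dividing by zero.
    cpm = np.divide(counts * 1e6, totals,
                    out=np.zeros_like(counts), where=totals > 0)
    return np.log2(cpm + 1.0)

def trimmed_mean(values, trim=0.25):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    values = np.sort(np.asarray(values, dtype=float))
    k = int(np.floor(trim * values.size))        # values cut from each end (assumed rounding)
    return values[k:values.size - k].mean()

# Toy example for one gene in one cluster of 8 cells (made-up numbers).
one_gene = [1, 2, 3, 4, 5, 6, 7, 8]
print(trimmed_mean(one_gene))  # keeps the middle four values: 3, 4, 5, 6
```

For a real cluster, `trimmed_mean` would be applied to one gene's column of `log2_cpm(counts)` restricted to that cluster's cells.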