I am using the “WHOLE CORTEX & HIPPOCAMPUS - 10X GENOMICS (2020) WITH 10X-SMART-SEQ TAXONOMY (2020)” dataset, and I want to look at how certain genes are expressed across clusters, regions, and subclasses, and at their co-expression.
The full dataset includes a matrix.csv file, which contains raw UMI counts for all cells and all genes, and a trimmed_means.csv file, which gives the normalized mean expression of every gene in each of the 378 clusters. I would like to look at normalized mean expression and its variability (e.g., standard deviation, standard error of the mean) not just for clusters but also for subclasses and regions.
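For subclasses and regions, one option is to group cells by their label and compute per-gene statistics directly. Below is a minimal pandas sketch, assuming you already have a cells × genes DataFrame of log2(CPM + 1) values indexed by cell ID, plus a Series mapping each cell ID to a subclass or region (the metadata column names are not given here, so the toy labels are hypothetical). It computes plain, untrimmed mean, sample SD, and SEM = SD / sqrt(n); it does not reproduce the trimming used for trimmed_means.csv.

```python
import numpy as np
import pandas as pd


def group_stats(logcpm: pd.DataFrame, labels: pd.Series) -> pd.DataFrame:
    """Per-group mean, SD and SEM of every gene.

    logcpm: cells x genes DataFrame of log2(CPM + 1) values, indexed by cell ID.
    labels: Series mapping the same cell IDs to a group (e.g. subclass or region).
    Returns a groups x (statistic, gene) DataFrame.
    """
    # Align labels to the expression rows so each cell gets its own label.
    labels = labels.reindex(logcpm.index)
    grouped = logcpm.groupby(labels)
    mean = grouped.mean()
    sd = grouped.std(ddof=1)          # sample standard deviation
    n = grouped.size()                # number of cells per group
    sem = sd.div(np.sqrt(n), axis=0)  # standard error of the mean = SD / sqrt(n)
    return pd.concat({"mean": mean, "sd": sd, "sem": sem}, axis=1)


# Toy example with made-up cells, genes and group labels.
cells = ["c1", "c2", "c3"]
toy = pd.DataFrame({"gene_a": [1.0, 3.0, 5.0], "gene_b": [2.0, 2.0, 4.0]}, index=cells)
toy_labels = pd.Series(["group_1", "group_1", "group_2"], index=cells)
stats = group_stats(toy, toy_labels)
print(stats.loc["group_1", ("sem", "gene_a")])
```

Groups containing a single cell get NaN for SD and SEM, since the sample standard deviation is undefined there.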
Based on the documentation and the Transcriptomics Explorer, it seems that trimmed_means.csv is calculated by taking all cells in a cluster and, for each gene, removing the top 25% and bottom 25% of the values, taking log2(CPM + 1), and (I assume) averaging what remains?
I wish there were a normalized_matrix.csv file, but if I have to reverse-engineer their trimmed_means.csv process, any help is appreciated.
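In the absence of a normalized matrix, one can be built from matrix.csv. The sketch below streams the file in chunks with pandas so the whole matrix never has to fit in memory, and writes log2(CPM + 1) values to a new CSV. It assumes that the first column of matrix.csv holds cell IDs and every other column is a gene, and that CPM means each count divided by that cell's total UMI count, times one million; check both against the actual file before relying on it. The output file name is hypothetical.

```python
import numpy as np
import pandas as pd


def write_log2_cpm(counts_csv: str, out_csv: str, chunksize: int = 10_000) -> None:
    """Stream a cells x genes UMI-count CSV and write log2(CPM + 1) values.

    Assumes the first column holds cell IDs and every other column is a gene,
    so each row (cell) can be normalized on its own, one chunk at a time.
    """
    reader = pd.read_csv(counts_csv, index_col=0, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        totals = chunk.sum(axis=1)             # total UMIs per cell
        cpm = chunk.div(totals, axis=0) * 1e6  # counts per million
        logcpm = np.log2(cpm + 1)
        # Write the header with the first chunk only, then append.
        logcpm.to_csv(out_csv, mode="w" if i == 0 else "a", header=(i == 0))


# Usage (hypothetical output path):
# write_log2_cpm("matrix.csv", "normalized_matrix.csv")
```

Normalizing row by row is what makes chunking safe here: CPM for a cell depends only on that cell's own counts.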
EDIT: I was able to get closer to the values in trimmed_means.csv by first taking log2(CPM + 1) of all cells, then excluding the top and bottom 25%, and taking the mean of the remaining values. For example, for a single cluster and a single gene, the trimmed_means.csv value is 6.651905, and I’m getting 6.652420762158425.
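The ordering described in the EDIT can be sketched in NumPy: normalize every cell to log2(CPM + 1), then within each cluster sort each gene's values, drop the top and bottom 25%, and average the rest. This is a sketch under assumptions, not the Allen Institute's confirmed method: rows are cells, columns are genes, CPM uses each cell's total UMI count, and floor(0.25 × n) values are dropped from each end. Their exact trimming and rounding conventions are not confirmed here, so small differences like the one above may remain.

```python
import numpy as np


def log2_cpm(counts) -> np.ndarray:
    """log2(CPM + 1) for a cells x genes matrix of raw UMI counts."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # total UMIs per cell
    return np.log2(counts / totals * 1e6 + 1)


def trimmed_mean(values, cut: float = 0.25, axis: int = 0):
    """Mean after dropping floor(n * cut) values from each end of the sorted data."""
    x = np.sort(np.asarray(values, dtype=float), axis=axis)
    n = x.shape[axis]
    k = int(np.floor(n * cut))
    return np.take(x, np.arange(k, n - k), axis=axis).mean(axis=axis)


def cluster_trimmed_means(counts, labels, cut: float = 0.25) -> dict:
    """Per-cluster, per-gene trimmed mean, normalizing *before* trimming."""
    logcpm = log2_cpm(counts)  # step 1: normalize every cell
    labels = np.asarray(labels)
    result = {}
    for cluster in np.unique(labels):
        cells = logcpm[labels == cluster]
        # Steps 2-3: trim each gene's values within the cluster, then average.
        result[cluster] = trimmed_mean(cells, cut, axis=0)
    return result
```

Because the trimming is done per gene after sorting, each gene can keep a different subset of cells; that is what a per-gene trimmed mean implies, and it is why the result cannot be computed from one fixed set of "middle" cells per cluster.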