I am using the “WHOLE CORTEX & HIPPOCAMPUS - 10X GENOMICS (2020) WITH 10X-SMART-SEQ TAXONOMY (2020)” dataset, and I want to look at how certain genes are expressed across clusters, regions, and subclasses, and how they are co-expressed.
In the full dataset, there is a matrix.csv file, which contains raw UMI counts for all cells and all genes, and a trimmed_means.csv file, which gives the normalized mean expression of every gene in each of the 378 clusters. I would like to look at normalized mean expression and its variability (e.g., standard deviation, standard error of the mean) not just for clusters, but also for subclasses and regions.
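To make the goal concrete, here is a minimal sketch of the per-group summary I have in mind, assuming per-cell normalized values are already available. The column names (subclass, region, expr) and the numbers are made up for illustration and are not the dataset's actual metadata fields.

```python
import pandas as pd

# Hypothetical long-format table: one row per cell, holding a made-up
# normalized expression value for one gene plus the cell's labels.
cells = pd.DataFrame({
    "subclass": ["A", "A", "A", "B", "B", "B"],
    "region": ["R1", "R2", "R1", "R2", "R1", "R2"],
    "expr": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
})

def group_stats(df, by, value="expr"):
    """Mean, sample SD (ddof=1), and SEM of `value` within each level of `by`."""
    stats = df.groupby(by)[value].agg(["mean", "std", "count"])
    stats["sem"] = stats["std"] / stats["count"] ** 0.5
    return stats

by_subclass = group_stats(cells, "subclass")
by_region = group_stats(cells, "region")
```

The same function works for any grouping column, so clusters, subclasses, and regions only differ in the `by` argument.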
Looking at the documentation and the Transcriptomics Explorer, it seems that trimmed_means.csv is calculated by taking all cells in a cluster and, for each gene, removing the top 25% and bottom 25% of the values and taking log2(CPM + 1) of the data. I assume the mean of these values is then taken?
I wish there were a normalized_matrix.csv file, but if I have to reverse engineer the trimmed_means.csv process, any help is appreciated.
EDIT: I was able to get closer to the values in trimmed_means.csv by taking log2(CPM + 1) of all cells, then excluding the top and bottom 25%, and taking the mean of the remaining values. As an example, for a single cluster and a single gene, the trimmed_means.csv value is 6.651905, and I’m getting 6.652420762158425.
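The steps in the edit can be sketched roughly as follows. This is only my reconstruction, not the confirmed procedure behind trimmed_means.csv. The toy counts are made up, and two details are assumptions: CPM is computed from each cell's total UMI count, and when 25% of the cells is not a whole number, the number dropped from each end is rounded down.

```python
import numpy as np

def log2_cpm(counts):
    """Convert a cells x genes matrix of raw UMI counts to log2(CPM + 1)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # total UMIs per cell (assumed nonzero)
    cpm = counts / totals * 1e6
    return np.log2(cpm + 1.0)

def trimmed_mean(values, trim=0.25):
    """Mean of what remains after dropping the lowest and highest `trim` fraction."""
    values = np.sort(np.asarray(values, dtype=float))
    k = int(np.floor(trim * values.size))  # dropped from each end (assumed rounding)
    return values[k:values.size - k].mean()

# Toy cluster: 8 cells x 2 genes of made-up UMI counts.
counts = np.array([
    [10, 90], [20, 80], [30, 70], [40, 60],
    [50, 50], [60, 40], [70, 30], [80, 20],
])

log_expr = log2_cpm(counts)                  # normalize every cell first
cluster_value = trimmed_mean(log_expr[:, 0]) # then trim and average gene 0
```

With 8 cells and trim=0.25, the two lowest and two highest values are dropped and the middle four are averaged. The rounding choice for the trim count only comes into play when the cluster size is not a multiple of four.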