I would like to use the M1 10X dataset from @trygveb paper, ‘Evolution of cellular diversity in primary motor cortex of human, marmoset, monkey, and mouse’ as a reference for snRNA-seq. I obtained the data from the Allen Brain Map website: https://portal.brain-map.org/atlases-and-data/rnaseq/human-m1-10x. I see in this paper that the data were normalized and scaled. What I would like to know is whether the dataset provided on the website is the raw, unprocessed data. I am using Seurat and SingleR to annotate cell types, which require normalized and scaled expression matrices. Do I need to normalize and scale this dataset before using it as a reference, or is the provided expression matrix already processed? Thanks!
Hi @danielcgingerich. The human data at the link above represent total reads assigned to a given gene for a given nucleus (introns + exons). Typically, we normalize and scale snRNA-seq data by calculating counts per million (CPM) and then log-normalizing, e.g.
log2(CPM(introns+exons)+1). In theory the unique molecular identifier (UMI) values represent the actual number of transcripts in the cell and don’t need to be normalized, but in practice we (and many others) find that such normalization and scaling improves the results.
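For concreteness, the log2(CPM + 1) transform described above can be sketched in a few lines of NumPy. This is just an illustration of the formula on a toy genes × cells count matrix, not the pipeline used for the paper (the function name `log2_cpm` and the toy values are my own):

```python
import numpy as np

def log2_cpm(counts):
    """Normalize a genes x cells raw UMI count matrix:
    scale each cell (column) to counts per million, then take log2(CPM + 1)."""
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum(axis=0) * 1e6  # each column now sums to 1e6
    return np.log2(cpm + 1)

# toy example: 3 genes x 2 nuclei of raw UMI counts
raw = np.array([[10, 0],
                [30, 5],
                [60, 5]])
norm = log2_cpm(raw)
```

After this transform, undoing the log on any column and summing recovers one million, which is a quick sanity check that the per-cell scaling was applied correctly.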
The data in the gene expression matrix .csv file are raw UMI counts. The values are not normalized and represent the direct output matrix from Cell Ranger. For analysis in Seurat, we find that either the SCTransform normalization method or the standard NormalizeData function works well.
Code for reproducing our analyses will be provided upon peer-review publication of this manuscript.
Please let us know if you have additional questions!