Hello!
I am a master's student, and my teacher shared your work with me so that I can learn scRNA-seq analysis of certain brain regions we are interested in. I want to find data for those brain regions to practice QC, clustering, feature analysis, etc.
I found these two websites with 10x sequencing data:
Getting started — Allen Brain Cell Atlas - Data Access
GEO Accession viewer
Which one should I use for my research?
Also, the first website provides two types of data, raw and log2. What is the difference between them? If I use the “log2” data, how do I continue my analysis (skipping QC and normalization)?
Thank you for your interest in our work. The data at Getting started — Allen Brain Cell Atlas - Data Access is the final processed dataset after many rounds of QC. It can be used directly for different post-QC analyses.
The GEO Accession viewer entry contains the raw sequencing data. If you want to learn the preprocessing steps, including sequence alignment, gene quantification and QC, you can start with this dataset.
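As a rough illustration of the QC step, the sketch below computes two common per-cell metrics (total counts and number of detected genes) from a cells × genes raw count matrix and flags the cells that pass minimum thresholds. The function name `basic_cell_qc` and its default thresholds are hypothetical placeholders, not the criteria used for the Allen Brain Cell Atlas data. In scanpy, `sc.pp.filter_cells` and `sc.pp.filter_genes` do this kind of filtering.

```python
import numpy as np

def basic_cell_qc(counts, min_genes=200, min_counts=500):
    """Return a boolean mask of cells passing simple QC thresholds.

    counts: cells x genes array of raw counts.
    NOTE: the default thresholds are hypothetical placeholders; real
    cutoffs depend on the dataset and sequencing depth.
    """
    counts = np.asarray(counts)
    total_counts = counts.sum(axis=1)          # library size per cell
    genes_detected = (counts > 0).sum(axis=1)  # genes with at least one count
    return (genes_detected >= min_genes) & (total_counts >= min_counts)

# Toy matrix: 3 cells x 3 genes
raw = np.array([[5, 0, 3],
                [0, 0, 1],
                [2, 2, 2]])
keep = basic_cell_qc(raw, min_genes=2, min_counts=4)
filtered = raw[keep]  # drops the second cell (1 count, 1 gene detected)
```

Real pipelines usually add further metrics (for example the fraction of mitochondrial reads), which need gene annotations and are omitted here.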
For the first dataset, the raw count matrix contains the raw counts per gene per cell. The log2 matrix contains log-normalized values: the counts are first normalized to CPM (counts per million) and then transformed as log2(CPM+1). Log-normalized data is usually used for dimensionality reduction, visualization, etc., but many tools, such as Seurat and scanpy, prefer raw counts and apply their own normalization.
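To make the log2 normalization concrete, here is a minimal NumPy sketch, assuming a cells × genes raw count matrix in which every cell has at least one count. For each cell, CPM = count / (total counts of that cell) × 10^6, and the stored value is log2(CPM + 1). The `pca` helper shows one way the log-normalized matrix can go straight into dimensionality reduction without further QC or normalization. Both function names are illustrative, not part of any released pipeline.

```python
import numpy as np

def log2_cpm(counts):
    """log2(CPM + 1) for a cells x genes raw count matrix."""
    counts = np.asarray(counts, dtype=float)
    library_size = counts.sum(axis=1, keepdims=True)  # total counts per cell
    cpm = counts / library_size * 1e6                 # assumes no all-zero cells
    return np.log2(cpm + 1)

def pca(x, n_components=2):
    """Project cells onto the top principal components via SVD."""
    centered = x - x.mean(axis=0)                     # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]     # cell scores

raw = np.array([[1, 3, 0],
                [2, 2, 4],
                [0, 5, 5]])
logged = log2_cpm(raw)
cpm_back = 2 ** logged - 1   # undoes the log step: recovers CPM, not raw counts
scores = pca(logged, n_components=2)
```

Note that 2^x − 1 only recovers CPM; the original raw counts cannot be reconstructed from the log2 matrix without each cell's library size, which is one reason tools like Seurat and scanpy ask for the raw matrix. In scanpy, the same transform should roughly correspond to `sc.pp.normalize_total(adata, target_sum=1e6)` followed by `sc.pp.log1p(adata, base=2)`.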