Extracting data from whole cortex+hippocampus transcriptome data set

Thank you for making this great data set available. I have a couple of unrelated questions about data analysis.

  1. Are cells from prefrontal cortex included in the data? When I look through the ‘region_label’ entry of all 76,308 cells, I find the following brain-area labels: ‘ACA’, ‘AI’, ‘AUD’, ‘CA’, ‘CLA’, ‘CLA;EPd’, ‘ENTl’, ‘ENTm’, ‘GU;VISC;AIp’, ‘HIP’, ‘MOp’, ‘MOs’, ‘ORB’, ‘PAR;POST;PRE’, ‘PL;ILA’, ‘PTLp’, ‘RSP’, ‘RSPv’, ‘SSp’, ‘SSs’, ‘SSs;GU’, ‘SSs;GU;VISC’, ‘SUB;ProS’, ‘TEa;PERI;ECT’, ‘VISal;VISl;VISli’, ‘VISam;VISpm’, ‘VISp’, ‘VISpl;VISpor’. What does ‘CA’ stand for? Does ‘ORB’ stand for orbitofrontal cortex? And does ‘SSp’ include barrel cortex?

  2. Is there a recommended normalization procedure? The results can differ depending on whether I use the /data/exon or the /data/intron dataset. Is it better to pool the read counts from the two datasets, or to first calculate CPM values for each and then average them?
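Several of the labels in question 1 (e.g. ‘PL;ILA’, ‘SSs;GU;VISC’) combine more than one atlas area, so a cell labeled ‘PL;ILA’ cannot be assigned to PL or ILA alone. A minimal sketch for finding the cells that include a given area, assuming the ‘region_label’ values have already been read into a Python list of decoded strings (if they come out of HDF5 as bytes, decode them first). The helper names `split_region_label`, `cells_in_area`, and `cells_per_area` are hypothetical, not part of any released tooling.

```python
from collections import Counter

def split_region_label(label):
    """Split a combined label such as 'PL;ILA' into its individual atlas areas."""
    return label.split(";")

def cells_in_area(region_labels, area):
    """Return indices of cells whose region_label includes `area` (exact match)."""
    return [i for i, label in enumerate(region_labels)
            if area in split_region_label(label)]

def cells_per_area(region_labels):
    """Count cells per atlas area; a cell with a combined label counts toward each area in it."""
    counts = Counter()
    for label in region_labels:
        counts.update(split_region_label(label))
    return counts

# Toy example with made-up labels, not real data.
labels = ["PL;ILA", "ORB", "SSp", "CLA;EPd", "ORB"]
print(cells_in_area(labels, "ORB"))   # [1, 4]
print(cells_in_area(labels, "ILA"))   # [0]
print(cells_per_area(labels)["ORB"])  # 2
```

Matching on the split tokens rather than on substrings avoids false hits such as ‘CA’ matching ‘CLA’.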

Hi @Tuomo, thanks for your interest in this data set! All of the region abbreviations match our mouse reference atlas. If you look them up in the atlas, you’ll see that ORB stands for “Orbital area”. You can search for the other names using the search box in the upper left (CA stands for the CA, or Cornu Ammonis, region of the hippocampus). Regarding data normalization, we typically sum the intron and exon counts for each gene in each cell and then normalize: (normalized gene expression) = log2(CPM(intron counts + exon counts) + 1)
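The normalization above can be sketched in NumPy as follows. This assumes the exon and intron counts have already been loaded as dense matrices of the same shape with genes as rows and cells as columns; how they are stored in the /data/exon and /data/intron datasets is not covered here, so transpose if your matrices are cells × genes. The function names are hypothetical.

```python
import numpy as np

def cpm(counts):
    """Counts per million: scale each cell (column) so its counts sum to 1e6."""
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=0, keepdims=True)
    # Cells with zero total counts would divide by zero; leave them at 0 CPM.
    totals[totals == 0] = 1.0
    return counts / totals * 1e6

def log2_cpm_exon_intron(exon_counts, intron_counts):
    """log2(CPM(exon + intron) + 1): pool the raw counts first, then normalize."""
    exon = np.asarray(exon_counts)
    intron = np.asarray(intron_counts)
    if exon.shape != intron.shape:
        raise ValueError("exon and intron matrices must have the same shape")
    return np.log2(cpm(exon + intron) + 1.0)

# Toy example: 2 genes x 2 cells, made-up counts.
exon = np.array([[1, 0], [3, 2]])
intron = np.array([[0, 2], [0, 0]])
expr = log2_cpm_exon_intron(exon, intron)
```

Because CPM is computed independently for each cell, the full matrix can be processed in chunks of cells (columns) if it does not fit in memory.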