Human Multiple Cortical Areas SMART-seq (2019)

I am working with some of the datasets posted to the brain atlas. We would like to use them while developing a computational technique for denoising single cell data.

We are having some issues with the Human Multiple Cortical Areas SMART-seq dataset and would like to:

  1. know where it has been published, if anywhere, so that we may follow along from there - at first glance it appears to be from Hodge et al. 2019, however the number of cells does not match up,
  2. know if preprocessing and filtering have already been applied to the data. The criteria posted on the Allen Brain Atlas page do not appear to be immediately enforceable from the data provided (a count matrix),
  3. identify whether it is possible to obtain or process count matrices that merge introns and exons, as well as matrices that consider only introns or only exons. As it stands, it is unclear from the reference genome which features are considered introns vs. lncRNAs.

I appreciate your time.
Thank you
Jay S. Stanley III

Hi Jay, thanks for your interest in the dataset. In response to your questions:

  1. A preprint describing these data has not yet been published. Hodge et al. 2019 describes a subset of the data from the middle temporal gyrus (MTG) region.
  2. The count matrix should be the number of total reads without further processing or normalization.
  3. The count matrix includes the sum of intronic and exonic reads. lncRNAs are quantified separately if they are present in the genome annotation. We will work on posting separate exon and intron read count matrices.
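
To illustrate points 2 and 3, here is a minimal sketch of how a combined matrix relates to separate exon and intron matrices, assuming both share the same gene-by-cell layout. The toy values and variable names are hypothetical and do not reflect the actual files. The counts-per-million step is only one common normalization choice, not something the dataset prescribes.

```python
# Hypothetical toy matrices: rows are genes, columns are cells.
exon_counts = [[5, 0, 2],
               [1, 3, 0]]
intron_counts = [[2, 1, 0],
                 [0, 4, 1]]

# The combined matrix is the element-wise sum of exonic and intronic reads.
total_counts = [[e + i for e, i in zip(exon_row, intron_row)]
                for exon_row, intron_row in zip(exon_counts, intron_counts)]

# Raw counts are not normalized. As an assumed example of downstream
# processing, scale each cell (column) to counts per million (CPM).
library_sizes = [sum(col) for col in zip(*total_counts)]
cpm = [[count / size * 1e6 for count, size in zip(row, library_sizes)]
       for row in total_counts]
```

Intron-only or exon-only analyses would use the corresponding matrix in place of `total_counts`.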