Does the BrainSpan RNA-Seq data cover only protein-coding genes?


I am analysing the RNA-Seq data from the BrainSpan Atlas of the Developing Human Brain. I have seen that this dataset contains RNA-Seq RPKM values for 52376 genes.

I have read the documentation on how the RNA was extracted from the brain tissues. A section called ‘mRNA Library Preparation and Sequencing’ states that mRNA was purified from the total RNA. I have also read that the GENCODE gene annotations were used when aligning the reads to the reference genome.

If only mRNA was sequenced, why does the BrainSpan RNA-Seq dataset have RPKM values for 52376 genes? There are ~20,000 protein-coding genes, and the most recent GENCODE release has annotations for 19955 protein-coding genes.


All transcripts available in GENCODE v10 are included in the annotation used to align the RNA-Seq data (which can be downloaded here). This includes protein-coding genes as well as non-coding transcripts (e.g., lincRNAs) and pseudogenes, which is why the dataset has far more rows than there are protein-coding genes. Inclusion does not necessarily mean that a gene is expressed in the brain; however, we do find that some of the non-coding genes are expressed, and quite a few show cell type specificity in the Allen Cell Types Database.
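If you want to see how the 52376 rows split across gene categories, one option is to tally the gene types in the GENCODE annotation yourself. The following is only a minimal sketch: it assumes a GENCODE-style GTF in which gene records carry a `gene_type "..."` attribute, the function name `count_gene_types` is made up for this example, and the three sample records are invented placeholders rather than real annotation entries.

```python
import re
from collections import Counter

# Assumption: GENCODE-style GTF attributes contain gene_type "<category>";
GENE_TYPE_RE = re.compile(r'gene_type "([^"]+)"')


def count_gene_types(gtf_lines):
    """Count 'gene' records per gene_type in an iterable of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_TYPE_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


def make_record(feature, gene_id, gene_type):
    """Build one invented GTF line for the demonstration below."""
    attributes = f'gene_id "{gene_id}"; gene_type "{gene_type}";'
    return "\t".join(
        ["chr1", "EXAMPLE", feature, "1", "100", ".", "+", ".", attributes]
    )


# Invented placeholder records, not real GENCODE entries.
sample = [
    "##description: made-up example",
    make_record("gene", "GENE_A", "protein_coding"),
    make_record("transcript", "GENE_A", "protein_coding"),  # not counted
    make_record("gene", "GENE_B", "lincRNA"),
    make_record("gene", "GENE_C", "pseudogene"),
]

gene_type_counts = count_gene_types(sample)
for gene_type, n in sorted(gene_type_counts.items()):
    print(gene_type, n)
```

To run this on a real annotation, pass an open file handle for the GTF instead of `sample`. The category labels and totals you get depend on the GENCODE release you use.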