I am analysing RNA-Seq data from the Human Developmental Transcriptome dataset. I have seen that the RNA-Seq data was obtained from 42 brain samples from donors of different ages, and that there are 524 anatomically annotated samples in total.
I am making heatmaps using the RNA-Seq data to show the expression of some genes of interest in specific brain regions. I want to standardise the RPKM values using a Z-score transformation. However, I have read that the Z-score transformation should only be applied to values that were obtained within a single experiment.
Would it be possible to know if the RPKM values for all genes in all the brain samples in the dataset were obtained in a single RNA-Seq experiment?
Hi @miro, thanks for your question. These data were generated by collaborators, so to the best of my knowledge we don’t have specific information about data generation available beyond what is shown in the Documentation. Given the extent of this project, the RNA-seq samples were almost certainly processed in multiple batches. However, since RNA-seq data is not as prone to batch effects as microarray data, and since these samples were processed using identical experimental and computational methods, it is probably reasonable to treat everything as a single experiment when performing normalization and analysis.
I personally prefer using log-normalized RPKM values over a Z-score transformation, but I am not sure which of the many potential normalization strategies is most appropriate for the BrainSpan data – maybe others have thoughts on that?
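For anyone comparing the two options mentioned in this thread, here is a minimal sketch of a log transformation followed by a per-gene Z-score across samples. It is only an illustration, not the pipeline used to generate the BrainSpan data: the gene names and RPKM values are made up, the helper names `log2_rpkm` and `zscore` are hypothetical, and the pseudocount of 1 is an assumed (though common) choice to avoid taking the log of zero.

```python
import math
import statistics

def log2_rpkm(values, pseudocount=1.0):
    """Return log2(RPKM + pseudocount) for one gene across samples."""
    return [math.log2(v + pseudocount) for v in values]

def zscore(values):
    """Per-gene Z-score: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # requires at least two samples
    if sd == 0:
        # A gene with identical values in every sample has no variation to scale.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

# Hypothetical toy data: each key is a gene, each list holds RPKM values per sample.
rpkm = {
    "GENE_A": [0.0, 3.0, 15.0, 63.0],
    "GENE_B": [7.0, 7.0, 7.0, 7.0],
}

# Step 1: log-transform to compress the dynamic range.
log_rpkm = {gene: log2_rpkm(vals) for gene, vals in rpkm.items()}

# Step 2 (optional): Z-score each gene across samples for heatmap display.
z = {gene: zscore(vals) for gene, vals in log_rpkm.items()}

for gene in rpkm:
    print(gene, [round(x, 3) for x in z[gene]])
```

Note that the Z-score here is computed per gene across the samples you include, so the result depends on which samples are in the matrix. If you subset to specific brain regions or ages, the means and standard deviations (and therefore the heatmap colours) will change.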