Hello,
I have been looking into how to properly normalize and use the gene-level RNA-seq data (RNA-Seq Gencode v10 summarized to genes) provided at https://www.brainspan.org/static/download.html. I have gone through the documentation and the forums, but I cannot find how these data were pooled into one CSV file. My question is: was a between-sample, across-study normalization step applied to the RPKM data provided in the gene_matrix_csv file, or were the data from different study groups simply combined without further normalization? In other words, has the whole expression matrix provided for download been normalized so that values are directly comparable across samples and age categories as is?
I am interested in studying the expression trajectories of genes across developmental stages (from 8 pcw to 40 yrs). Since this requires comparing samples from different studies, I would like to know whether additional normalization steps are required and whether there is a guideline on how to do this.
This is what I found in the “microarray data analysis” section of the Miller et al. (2014) paper:
“Data for samples passing QC were normalized in three steps: 1) “within-batch” normalization to the 75th percentile expression values; 2) “cross-batch” bias reduction using ComBat [57]; and 3) “cross-brain” normalization as in step 1.”
However, this refers only to the four brains included in that paper, and only to the microarray data. So I would like to know whether a similar normalization was applied when the RNA-seq data from all 42 donors were aggregated into the gene-level expression matrix.
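To make concrete what I mean by a step like "normalization to the 75th percentile", here is a minimal sketch of how I understand it. It is my own illustration, not the BrainSpan pipeline: the function names, the choice to compute the percentile over non-zero values only, and the rescaling to the mean 75th percentile are all assumptions, and the ComBat step is not shown.

```python
from typing import Dict, List


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def upper_quartile_normalize(samples: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale each sample so that all samples share the same 75th percentile.

    `samples` maps a (hypothetical) sample name to its expression values,
    with genes in the same order for every sample.
    Assumption: the percentile is taken over expressed genes (value > 0),
    and every sample has at least one such gene.
    """
    uq = {
        name: percentile([v for v in vals if v > 0], 75)
        for name, vals in samples.items()
    }
    # Assumption: rescale every sample to the mean 75th percentile.
    target = sum(uq.values()) / len(uq)
    return {
        name: [v * target / uq[name] for v in vals]
        for name, vals in samples.items()
    }


# Toy data: sample "b" is sample "a" measured at twice the depth.
toy = {"a": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [2.0, 4.0, 6.0, 8.0, 10.0]}
normalized = upper_quartile_normalize(toy)
```

In this toy case the two samples differ only by a constant factor, so they end up on the same scale after normalization. What I cannot tell is whether something equivalent has already been applied to the downloadable RPKM matrix.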
I would really appreciate your help with this. Thank you very much.