Transcriptomics (RNA-seq/microarray) data normalization - FAQ


We get many questions about how we normalized data for a given atlas, and why we chose the method we did. For any atlas, the best place to find out what normalization we used is the “Documentation” tab. For example, in the Human Brain Atlas there is an entire white paper about "Microarray Data Normalization". In other atlases, there are sections about normalization in a more general white paper. If you’d like more information about why we chose that normalization method, it can often be found either in the same white paper or in the relevant "Primary Publication Citation" for the atlas. In many cases, we tried several normalization strategies, and the one presented was the one that provided the most reliable biological insights from the data.

As a general rule, we have different goals of normalization for different data modalities:

  • Microarray data - For most microarray data sets, normalization is designed to remove sources of unwanted variability (e.g., batch effects, RNA quality) without removing sources of desired biological variability (e.g., brain structure).
  • RNA-seq (bulk) - In this case the goal of normalization is to account for differences in sequencing depth as well as gene length, which is typically done using TPM or FPKM (explained clearly in this blog). If needed, additional normalization for unwanted variability is also applied.
  • Single cell/nucleus RNA-seq - We don’t see many of the quality issues in single cell/nucleus data that we find with bulk tissue, and therefore in most cases we only control for sequencing depth (e.g., by taking CPM, or counts per million). Much of the information is found in reads mapping to introns, so we report these along with reads mapping to exons.
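
To make the depth and gene-length corrections mentioned above concrete, here is a minimal sketch of CPM, FPKM, and TPM for a single sample, using the standard textbook definitions. The function names and the toy counts and gene lengths are made up for illustration; this is not the code used for any particular atlas, so please check the relevant white paper for the exact procedure.

```python
# Illustrative only: standard definitions of CPM, FPKM, and TPM for one sample.
# `counts` holds raw read (or fragment) counts per gene and `lengths_bp` holds
# the matching gene lengths in base pairs. Both are hypothetical toy inputs.

def cpm(counts):
    """Counts per million: corrects for sequencing depth only."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

def fpkm(counts, lengths_bp):
    """Fragments per kilobase per million: divide by gene length (kb),
    then by the total count in millions."""
    total_millions = sum(counts) / 1e6
    return [c / (length / 1e3) / total_millions
            for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale the
    per-kilobase rates so they sum to one million."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]

# Two toy samples with the same composition; `deep` has twice the depth.
lengths_bp = [1000, 2000, 3000]
shallow = [10, 30, 60]
deep = [20, 60, 120]

print(cpm(shallow), cpm(deep))
print(fpkm(shallow, lengths_bp), fpkm(deep, lengths_bp))
print(tpm(shallow, lengths_bp), tpm(deep, lengths_bp))
```

Because the two toy samples differ only in sequencing depth, each method returns the same values for both. TPM values always sum to one million within a sample, which is one reason they are often easier to compare across samples than FPKM.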

Please post any related questions or comments as responses and we can address them!
