Hi everyone,
I know this is a very basic question, but why is it that log2 gene expression values are sometimes very different even for similar z-score values? For example:
I noticed that log2 values are quite stable within a given probe, while z-score values are more stable between probes. Is this because they are normalized by different factors?
Which score would be better to use if I want to compare expression differences between genes?
Thank you.
Hi @spraguedawley. A z score is the number of standard deviations a value lies from the mean expression level, which makes z scores a good measure of differential expression. So if a gene has higher expression in hippocampus (for example) than in the rest of the brain, its z score there will be high regardless of what the gene's average expression in brain is. That is also why similar z scores can correspond to very different log2 values. I would not recommend comparing absolute expression levels between genes at all using microarray data. That said, if you do want to compare average expression levels between genes, the log2 expression values would be better to use than z scores. Differential expression values (e.g., how much higher gene x is in hippocampus than in frontal cortex) are more reliable and can be compared between genes. This thread on data normalization and z scores goes into a bit more detail about some components of this response, but feel free to reply if you still have questions.
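To make the point about z scores concrete, here is a minimal sketch with made-up numbers. The two "genes" and their log2 values are hypothetical, and the population standard deviation is an assumption chosen for illustration; the atlas's actual normalization may differ in its details.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return each value as a number of standard deviations from the mean."""
    m = mean(values)
    s = pstdev(values)  # population SD; an assumption made for this sketch
    return [(v - m) / s for v in values]

# Hypothetical log2 expression across three regions for two genes.
# gene_b is expressed at a much higher absolute level than gene_a,
# but both follow the same regional pattern.
gene_a_log2 = [4.0, 5.0, 6.0]
gene_b_log2 = [10.0, 11.0, 12.0]

z_a = z_scores(gene_a_log2)
z_b = z_scores(gene_b_log2)

print("gene_a z scores:", [round(z, 3) for z in z_a])
print("gene_b z scores:", [round(z, 3) for z in z_b])
```

The two genes end up with the same z scores even though their log2 values differ by 6 in every region, because the z score removes each gene's own mean and spread.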
Thank you @jeremyinseattle for your answer.
Just to make sure that I have understood, when you say:
I can compare these differential expression values between genes using z scores, right?
Best regards.
I think that would be fine. What you'd want to do is compare the expression in region A (e.g., hippocampus) with the expression in region B (e.g., frontal cortex): divide A by B if the values are linear, or subtract B from A if they are logarithmic. Here "expression" could be absolute values, normalized values, or z scores. This is what you get when you perform such a differential search in the Allen Human Brain Atlas.
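As a rough illustration of the divide-versus-subtract rule, the sketch below uses made-up linear intensities for one gene in two regions. The variable names and values are hypothetical and are not taken from the atlas.

```python
import math

# Hypothetical linear-scale expression of one gene in two regions.
hippocampus_linear = 64.0
frontal_cortex_linear = 16.0

# Linear values: divide region A by region B to get a fold change.
fold_change = hippocampus_linear / frontal_cortex_linear

# Log2 values: subtract region B from region A instead.
hippocampus_log2 = math.log2(hippocampus_linear)
frontal_cortex_log2 = math.log2(frontal_cortex_linear)
log2_difference = hippocampus_log2 - frontal_cortex_log2

print("fold change (linear):", fold_change)
print("log2 difference:", log2_difference)
print("log2 of fold change:", math.log2(fold_change))
```

The log2 difference is simply the log2 of the linear fold change, so the two calculations describe the same quantity on different scales.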