I am trying to use the z-score values from the GBM IVY atlas to determine whether several genes of interest are upregulated in GBM patients. I would like to use the gene expression data to do this, but I am confused about what the z-score is referencing. Is there a normal value it is being compared to? The same question applies to the log2 fold changes. What is the change relative to? Is it healthy tissue or cancerous tissue? It seems that to truly quantify overexpression, you would need to compare against normal tissue.
Thank you in advance for your help.
Hi @SB5. I would suggest starting with this community forum post about normalization and Z-scores. You are not the first to have questions about this data set (see this post). In general, the expression values are gene expression measures that are normalized within each sample and then log-normalized so that the data across samples approximate a normal distribution. Z-scores are then defined as usual: (x - mean(x)) / sd(x). Z-scoring is done irrespective of sample metadata, and any differential expression analysis compares whatever is selected as the source vs. target groups.
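To make the z-score definition concrete, here is a minimal sketch in Python. It assumes a genes-by-samples matrix of already log-normalized values and computes each gene's z-score across all samples in the matrix, with no reference to sample metadata. The function name, the toy matrix, and the choice of the sample standard deviation (`ddof=1`) are illustrative assumptions, not the atlas's actual pipeline.

```python
import numpy as np

def zscore_across_samples(expr):
    """Z-score each gene (row) across all samples (columns).

    expr: genes x samples array of log-normalized expression values.
    Assumption: sample standard deviation (ddof=1); the atlas may differ.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    return (expr - mean) / sd

# Toy data (made up): 2 genes measured in 4 samples.
toy = np.array([[1.0, 2.0, 3.0, 4.0],
                [5.0, 5.5, 6.0, 9.0]])
z = zscore_across_samples(toy)
```

Because the mean and standard deviation come from the same set of samples, a positive z-score here only says a sample is above the average of that set; it says nothing about any tissue outside it.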
Thanks @jeremyinseattle! I have read those other posts but found they didn’t quite answer my question. So to clarify, the z-scores included in the GBM database are relative to the mean of the GBM samples, not to healthy tissue, correct?
Correct. They are relative to all of the samples that are included in the data set, which does not include any samples from other data sets of normal tissue. To a first approximation you can probably treat “leading edge” samples as healthy tissue, with the caveat that there may be some (potentially very relevant!) tumor gene signatures in there as well.
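If you want to try the “leading edge as reference” idea, a rough sketch might look like the following. It assumes the expression values are already on a log2 scale, so that a difference of group means is a log2 fold change, and it assumes each sample carries a structure label. The function name, the label strings, and the toy numbers are hypothetical and only for illustration.

```python
import numpy as np

def log2_fold_change(log_expr, labels, target, reference="Leading Edge"):
    """Per-gene difference of mean log2 expression: target minus reference.

    log_expr: genes x samples array, assumed to be on a log2 scale.
    labels:   one structure label per sample (hypothetical label strings).
    """
    log_expr = np.asarray(log_expr, dtype=float)
    labels = np.asarray(labels)
    target_mask = labels == target
    ref_mask = labels == reference
    if not target_mask.any() or not ref_mask.any():
        raise ValueError("both groups need at least one sample")
    target_mean = log_expr[:, target_mask].mean(axis=1)
    ref_mean = log_expr[:, ref_mask].mean(axis=1)
    return target_mean - ref_mean

# Toy data (made up): 2 genes, 2 reference samples, 2 tumor samples.
toy_expr = np.array([[2.0, 2.0, 4.0, 4.0],
                     [6.0, 6.0, 5.0, 5.0]])
toy_labels = ["Leading Edge", "Leading Edge", "Cellular Tumor", "Cellular Tumor"]
lfc = log2_fold_change(toy_expr, toy_labels, target="Cellular Tumor")
```

A positive value means higher expression in the target group than in the leading edge samples. The same caveat applies: leading edge tissue may still carry tumor signal, so this is not a true comparison against normal tissue.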