Normalized Microarray Datasets

Hello everybody,

I want to use the following normalized microarray datasets (Microarray Data Download :: Allen Brain Atlas: Human Brain) to perform a differential gene expression analysis on specific anatomical regions of the brain. For this I am using Bioconductor’s limma package and developing an R script.
Can you tell me if the normalized microarray data is log-transformed?

Thanks in advance!

The short answer is yes, the data is log-transformed. A lot more information about data normalization in this (and other) atlases is in this post, if you are interested.

Dear jeremyinseattle,

Thanks for your answer! Your information was very helpful. I think it would be good to have it in the white paper or in the Readme.txt file.

I am still trying to find the best way to perform the Differential Gene Expression (DGE) analysis with the MicroarrayExpression data file and have some more questions:

  1. What threshold value was used in the microarrays to discriminate between signal and noise (background), and thereby to determine whether a value is significantly expressed above threshold (value of 1 in the PACall file)?
    The Normalization white paper says the threshold was roughly 5 (quoted below).
    However, I find expression values below 5 in the MicroarrayExpression file.

From the bottom of page 8 of 11: “In addition, analysis of contingency tables at log2 expression threshold = 5 (roughly equivalent to the present/absent threshold in the standard quality control report for Agilent arrays)”

Which value should I use as the threshold to determine whether a gene is expressed?

  2. Furthermore, to obtain a suitable data frame for subsequent DGE analysis with the limma package in R, do you recommend multiplying the MicroarrayExpression dataset by the PACall dataset, so that expression values below threshold are set to zero, and then performing the DGE? Or does this artificially introduced value of 0 for non-expressed values lead to problems in my DGE?

  3. Do you recommend using one representative probe (the one with the highest expression values?) and omitting the other probes for a specific gene, or do you recommend averaging across all probes before performing the DGE?

Thanks for your help so far!

Hi @waltheja. I don’t know the answer to 1 or 2 but will ask around and get back to you.

For #3, I actually wrote a paper on that topic and have an R function that does the calculation for you, whichever method you try. In short, taking the most highly expressed probe is the easiest way to do it and will probably work fine. I would not suggest averaging them. You could also use our table from this paper, where we compared each microarray probe to matched RNA-seq data to find the ones that are the most consistent for each gene.
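
As a rough illustration of the "highest expressed probe" approach, here is a minimal Python sketch. It is not the R function mentioned above. It assumes that "highest expressed" means the highest mean log2 expression across samples, which is only one reasonable reading. The names `pick_highest_probe`, `expression` and `probe_to_gene`, the probe and gene IDs, and the toy values are all hypothetical.

```python
from statistics import mean


def pick_highest_probe(expression, probe_to_gene):
    """Return {gene: probe_id}, keeping one probe per gene.

    expression:    {probe_id: [log2 values, one per sample]}  (hypothetical layout)
    probe_to_gene: {probe_id: gene_symbol}                    (hypothetical layout)

    Assumption: the representative probe is the one with the highest
    mean log2 expression across all samples.
    """
    best = {}  # gene -> (probe_id, mean expression)
    for probe_id, values in expression.items():
        gene = probe_to_gene.get(probe_id)
        if gene is None or not values:
            continue  # skip unannotated probes and probes without data
        score = mean(values)
        if gene not in best or score > best[gene][1]:
            best[gene] = (probe_id, score)
    return {gene: probe_id for gene, (probe_id, _) in best.items()}


# Toy data: two probes for GENE1, one probe for GENE2.
expression = {
    "probe_a": [6.1, 6.4, 5.9],
    "probe_b": [8.2, 8.0, 8.5],
    "probe_c": [4.3, 4.1, 4.6],
}
probe_to_gene = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}

selected = pick_highest_probe(expression, probe_to_gene)
print(selected)  # {'GENE1': 'probe_b', 'GENE2': 'probe_c'}
```

Only the rows for the selected probes would then be passed on to the DGE step, so each gene is tested once.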