Normalized Microarray Datasets

waltheja · October 4, 2022, 1:12pm

Hello everybody,

I want to use the following normalized microarray datasets (Microarray Data Download :: Allen Brain Atlas: Human Brain) to perform a differential gene expression analysis on specific anatomical regions of the brain. For this I am using Bioconductor’s limma package and developing an R script.
Can you tell me whether the normalized microarray data is log-transformed?

Thanks in advance!

jeremyinseattle · October 10, 2022, 2:58pm

The short answer is yes, the data is log-transformed. A lot more information about data normalization in this (and other) atlases is in this post, if you are interested.

waltheja · October 19, 2022, 1:10pm

Dear jeremyinseattle,

thanks for your answer! Your information was very helpful. I think it would be good to have it in the white paper or in the Readme.txt file.

I am still trying to find the best way to perform the Differential Gene Expression (DGE) analysis with the MicroarrayExpression data file and have some more questions:

1. What threshold value was used in the microarrays to discriminate between signal and noise (background), and thereby to determine whether a value is significantly expressed above threshold (value of 1 in the PACall file)?
The Normalization white paper says the threshold was roughly 5 (quoted below), yet I find expression values below 5 in the MicroarrayExpression file.

Bottom of page 8 of 11: “In addition, analysis of contingency tables at log2 expression threshold = 5 (roughly equivalent to the present/absent threshold in the standard quality control report for Agilent arrays)”

Which value should I use as the threshold to determine whether a gene is expressed?

2. To obtain a suitable data frame for subsequent DGE analysis with the limma package in R, do you recommend multiplying the MicroarrayExpression dataset by the PACall dataset, so that expression values below threshold are set to zero, before performing the DGE? Or does this artificially introduced value of 0 for non-expressed values cause problems in the DGE?

3. Do you recommend using one representative probe (the one with the highest expression values?) and omitting the other probes for a given gene, or do you recommend averaging across all probes before performing the DGE?

Thanks for your help so far!

jeremyinseattle · October 19, 2022, 3:21pm

Hi @waltheja. I don’t know the answer to 1 or 2 but will ask around and get back to you.

For #3, I actually wrote a paper on that topic and have an R function to do the calculation for you, whichever method you try. In short, taking the highest expressed probe is the easiest way to do it and will probably work fine. I would not suggest averaging them. You could also use our table from this paper, where we compared each microarray probe to matched RNA-seq data to find the ones that are the most consistent for each gene.
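
As a rough illustration of the "highest expressed probe" idea (this is not the R function mentioned above), the selection can be sketched in plain Python. The sketch assumes that "highest expressed" means the highest mean log2 value across samples. The probe IDs, gene names, and expression values are made up for the example, and `select_highest_probe` is a hypothetical helper name.

```python
from statistics import mean


def select_highest_probe(expression, probe_to_gene):
    """Pick one probe per gene: the probe with the highest mean expression.

    expression:    dict mapping probe ID -> list of log2 values (one per sample)
    probe_to_gene: dict mapping probe ID -> gene symbol
    Returns a dict mapping gene symbol -> selected probe ID.
    """
    best = {}  # gene -> (mean expression, probe ID) of the best probe so far
    for probe_id, values in expression.items():
        gene = probe_to_gene.get(probe_id)
        if gene is None or not values:
            continue  # skip unannotated probes and probes without data
        avg = mean(values)
        if gene not in best or avg > best[gene][0]:
            best[gene] = (avg, probe_id)
    return {gene: probe_id for gene, (_, probe_id) in best.items()}


# Made-up log2 expression values for three probes across three samples.
expression = {
    "probe_1": [6.1, 6.4, 5.9],
    "probe_2": [8.2, 7.9, 8.5],
    "probe_3": [4.0, 4.3, 3.8],
}
# Made-up annotation: two probes target GENE_X, one targets GENE_Y.
probe_to_gene = {
    "probe_1": "GENE_X",
    "probe_2": "GENE_X",
    "probe_3": "GENE_Y",
}

selected = select_highest_probe(expression, probe_to_gene)
print(selected)
```

Only the selected probe's row would then be kept for each gene before fitting the model, so every gene enters the DGE analysis exactly once.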
