How do I find present/marginal/absent calls in the NHP microarray data?



Currently, all of our microarray (and RNA-seq) atlases provide user-friendly, downloadable files with gene expression data, along with associated gene and sample metadata. In addition, we try to provide access to the raw data (or something as close to raw data as we are legally able to share) for each sample in the data set. In the case of the NIH Blueprint NHP Atlas, these raw data files are “.CEL” files, which contain the probe-level intensity data from which both the gene expression values and the detection (present/marginal/absent, or PMA) calls for each probe set in that sample can be computed. To obtain these PMA calls, please perform the following steps:

  1. Go to the Download page and click on the appropriate “xml link”, which provides the meta-information and the download URL for each of the raw microarray files.
  2. This will open an XML page, which can be saved and read (for example) in Microsoft Excel.
  3. Locate the “download-link” column in this table. This link will allow you to download the CEL file associated with each sample. If anyone knows how to download these files in batch rather than one by one, please post a comment accordingly.
  4. Once downloaded, these CEL files can be read into R, and the PMA calls can be computed with the affy package (available from Bioconductor).
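
On the batch-download question in step 3, one possible approach is to read the saved XML in R and loop over the links. The sketch below is only that: it assumes the xml2 package (from CRAN) is installed, that each link is stored as the text of an element named download-link, and that the files can be fetched without logging in. The function names, the base_url argument, the output folder and the sample_001.CEL naming are all hypothetical choices for illustration.

```r
library(xml2)

# Pull every download link out of the saved metadata XML.
# ASSUMPTION: links are the text of elements named "download-link".
extract_download_links <- function(xml_source) {
  doc <- read_xml(xml_source)            # accepts a file path or literal XML
  nodes <- xml_find_all(doc, ".//download-link")
  trimws(xml_text(nodes))
}

# Download each link into dest_dir, skipping files already present.
# ASSUMPTION: base_url is only needed if the links are relative paths.
download_cel_files <- function(links, dest_dir = "cel_files", base_url = "") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  for (i in seq_along(links)) {
    # Hypothetical naming scheme; adjust if the files arrive compressed.
    dest <- file.path(dest_dir, sprintf("sample_%03d.CEL", i))
    if (!file.exists(dest)) {
      download.file(paste0(base_url, links[i]), destfile = dest, mode = "wb")
    }
  }
  invisible(dest_dir)
}
```

Nothing is downloaded until download_cel_files() is called, for example with download_cel_files(extract_download_links("metadata.xml")), where metadata.xml stands for wherever you saved the XML page from step 2.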

Code snippet, found here:

library(affy)
dataIn <- ReadAffy() # read in all CEL files in the working directory
rmaData <- rma(dataIn) # normalise and summarise; could also use gcrma etc.
calls <- mas5calls(dataIn) # get PMA calls from the raw (un-normalised) data
calls <- exprs(calls) # matrix of 'P'/'M'/'A', probe sets x samples
absent <- rowSums(calls == 'A') # how many samples each gene is 'absent' in
absent <- absent == ncol(calls) # TRUE for genes 'absent' in all samples
rmaFiltered <- rmaData[!absent,] # filters out the genes 'absent' in all samples
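
To see what the filtering step does without needing any CEL files, here is a small sketch in base R. The calls matrix is a made-up stand-in for the output of exprs(mas5calls(...)), and the probe and sample names are hypothetical. It also tallies the P/M/A calls per sample, which can be a useful sanity check.

```r
# Toy detection-call matrix (hypothetical probe sets x samples),
# standing in for exprs(mas5calls(dataIn)).
calls <- matrix(
  c("P", "P", "A",
    "A", "A", "A",
    "M", "A", "P",
    "A", "A", "A"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("probe", 1:4), paste0("sample", 1:3))
)

# Count P/M/A calls per sample (rows = samples, columns = call type).
pma_levels <- c("P", "M", "A")
call_counts <- sapply(pma_levels, function(lv) colSums(calls == lv))

# Flag probe sets called 'A' in every sample, then keep the rest.
absent_everywhere <- rowSums(calls == "A") == ncol(calls)
kept <- rownames(calls)[!absent_everywhere]
```

A logical vector is used for the filter rather than negative indices from which(): if no probe set is absent in all samples, which() returns an empty vector, and indexing with its negation would drop every row instead of keeping them all.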