Whole mouse brain gene expression data


I’m doing a project comparing brain networks between mouse and human brains, and I’m trying to find the gene expression data for the whole mouse brain. I’ve found the human data no problem, but there doesn’t seem to be anywhere to get the whole brain gene expression values for the mouse brain? Any help would be appreciated. Thanks.

All the best,


Hi @Oliverdillon,

The Allen Mouse Brain Atlas has exactly this at ~200um resolution, which can be downloaded from the API. See this community forum post for some additional information on the topic. If you are looking for higher resolution, single cell RNA-seq data sets are becoming available (see end of this post for a few examples).


Hi @jeremyinseattle,

Is that the "Download the 200um density volume for the Mouse Brain Atlas SectionDataSet 69816930:" file? I've downloaded the files from that link and got an XML file titled data_set, which I assume contains the expression values. But I'm having trouble converting it to a readable format like CSV. Do you know the easiest way to open it?
I'm sure this is a fairly basic question, but my knowledge of things like this is a bit limited, so apologies, haha. Thanks.

All the best,


Hi Ollie,

I put together this how-to a while ago. It describes how to convert the XML output from Structure Unionization, which summarizes mouse gene expression by brain region rather than by voxel. The XML-to-CSV/XLS conversion steps it lists should apply in both cases.

How to structure unionize (mouse or aging mouse studies)

(If Excel fails, try this: http://www.convertcsv.com/xml-to-csv.htm)

This will give you a table with all of the columns you care about, plus about 50 extra columns that you can ignore. The important columns are:

  • expression-energy = the value you will likely want to use as the gene expression quantification (it is the product of expression intensity and expression density)
  • expression-density = roughly, the fraction of pixels in the structure that are covered by stain
  • acronym = structure acronym (for looking up in the reference atlas)
  • atlas-id = structure number in the atlas
  • safe-name & name = human-readable structure name (not sure of the difference between the two columns)
  • sphinx-id35 = structure number that you should use for programming
  • structure-id-path = position in the hierarchy of structures (I think you can get the level of the ontology by counting the ids separated by slashes)
  • st-level = I think this is the actual level of the ontology, but it is sometimes missing
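Once the XML is converted, the CSV can be worked with directly in R. Below is a minimal base-R sketch, assuming the CSV keeps the hyphenated column names listed above (read.csv would otherwise rewrite them as expression.energy and so on). `path_depth` and `rank_structures` are hypothetical helpers, not part of any Allen tool, and the depth they compute is simply the count of ids in structure-id-path, following the guess above, not an official level.

```r
# Minimal sketch (base R only): rank structures by expression energy in a CSV
# converted from the Structure Unionization XML. Column names are assumed to
# match the list above; the file name in the usage note is hypothetical.

# Count the ids in each structure-id-path (e.g. "/997/8/567/"). Splitting on "/"
# leaves an empty string for the leading slash, which nzchar() filters out.
path_depth <- function(paths) {
  vapply(strsplit(as.character(paths), "/", fixed = TRUE),
         function(ids) sum(nzchar(ids)),
         integer(1))
}

rank_structures <- function(csv_path) {
  # check.names = FALSE keeps "expression-energy" instead of "expression.energy"
  df <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)
  df$depth <- path_depth(df[["structure-id-path"]])
  keep <- c("acronym", "name", "depth", "expression-energy", "expression-density")
  ranked <- df[order(df[["expression-energy"]], decreasing = TRUE), keep]
  rownames(ranked) <- NULL
  ranked
}

# Usage (hypothetical file name):
# ranked <- rank_structures("unionizes.csv")
# head(ranked, 10)
```

Sorting on expression-energy rather than expression-density follows the suggestion above that energy is the quantity most people will want.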

Let me know if you have any additional questions, although I’ve just about exhausted my knowledge on this topic so hopefully others in the community will reply if needed!



Hi Jeremy,

The file conversion worked very well, thank you! However, I’m not sure if there is something I’m doing wrong, because the file I’m opening isn’t giving me the expression values. I downloaded the files from this link: download the 200um energy and intensity volumes for mouse brain atlas sectiondataset 69816930:

This gives me 3 files: data_set, energy, and intensity. Initially I thought the data_set file was the one I needed to open, but now that I have, I can see that it doesn't contain the expression values. The energy and intensity files are in RAW or MHD format; is one of them the one I should be looking at for the gene expression values?


All the best,


The file you want is energy.raw, a raw, uncompressed, 32-bit little-endian float volume representing average expression energy per voxel. A value of "-1" represents no data. This file is returned by default if the volumes parameter is null. Unfortunately, I don't know how to convert this to a usable format. Does anyone else in the community know?
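energy.raw holds only the voxel values; its grid dimensions live in the companion energy.mhd text header. A minimal base-R sketch for reading that header, assuming it follows the MetaImage convention of one `Key = Value` pair per line (DimSize, ElementType and so on). `read_mhd_header` is a hypothetical helper, not part of cocoframer or the Allen API; check the keys against your own file.

```r
# Minimal sketch: read the grid size from a MetaImage (.mhd) text header instead
# of hard-coding it. Assumes one "Key = Value" pair per line; every value is kept
# as a string except DimSize, which is split into one integer per axis.
read_mhd_header <- function(mhd_path) {
  lines <- readLines(mhd_path, warn = FALSE)
  lines <- lines[grepl("=", lines, fixed = TRUE)]  # skip lines without a key
  keys <- trimws(sub("=.*$", "", lines))           # text before the first "="
  vals <- trimws(sub("^[^=]*=", "", lines))        # text after the first "="
  names(vals) <- keys
  header <- as.list(vals)
  if (!is.null(header[["DimSize"]])) {
    # e.g. "67 41 58" -> c(67L, 41L, 58L)
    header[["DimSize"]] <- as.integer(strsplit(header[["DimSize"]], "[[:space:]]+")[[1]])
  }
  header
}

# Usage (assumes energy.mhd sits next to energy.raw):
# hdr <- read_mhd_header("energy.mhd")
# hdr[["DimSize"]]      # grid dimensions
# hdr[["ElementType"]]  # expected to be MET_FLOAT for 32-bit floats
```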

Another option is to use the cocoframer R library developed by @lucasg. Downloading the expression energy for SectionDataSet 69816930 takes a single line of R code: get_aba_ish_data("69816930")

Here’s a quick look at the read function under the hood in cocoframer in case you need to translate it to another language:

```r
raw_file <- file("energy.raw",  # open a connection to the .raw file
                 open = "rb")   # rb = read binary

vol_dims <- c(67, 41, 58) # brain volume dimensions (67 x 41 x 58 for 200 um grids)

vol_raw <- readBin(con = raw_file,      # file connection
                   what = "double",     # values are returned to R as doubles...
                   size = 4,            # ...read from 4-byte (32-bit) floats on disk
                   n = prod(vol_dims),  # number of voxels (x * y * z)
                   endian = "little")   # the file is little-endian

close(raw_file) # close file connection
```

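readBin returns a flat vector, so vol_raw needs its volume dimensions before it can be treated as a 3-dimensional array of energy values. In R, it can then be converted to a table of x, y, z coordinates and values using:

```r
dim(vol_raw) <- vol_dims  # reshape the flat vector into a 67 x 41 x 58 array
vol_df <- reshape2::melt(vol_raw,
                         varnames = c("x", "y", "z"))  # one row per voxel: x, y, z, value
```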
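If you'd rather not depend on reshape2, a base-R sketch that builds the same x, y, z, value table and also drops the voxels marked -1 (no data) could look like this. `volume_to_table` is a hypothetical helper; it assumes the voxels are stored with x varying fastest, the layout that reading the file into a 67 x 41 x 58 R array already relies on.

```r
# Minimal sketch (base R, no reshape2): convert the energy volume into an
# x, y, z, value table and drop voxels marked -1 ("no data").
# vol can be the flat vector from readBin or the same data with dims set.
volume_to_table <- function(vol, dims = c(67, 41, 58)) {
  stopifnot(length(vol) == prod(dims))
  # expand.grid varies its first column fastest, matching R's column-major order
  coords <- expand.grid(x = seq_len(dims[1]),
                        y = seq_len(dims[2]),
                        z = seq_len(dims[3]))
  coords$value <- as.vector(vol)           # strips any dim attribute
  coords <- coords[coords$value != -1, ]   # -1 marks voxels with no data
  rownames(coords) <- NULL
  coords
}

# Usage:
# vol_df <- volume_to_table(vol_raw)
# write.csv(vol_df, "energy_voxels.csv", row.names = FALSE)
```

Writing the result with write.csv gives the kind of plain table asked about earlier in the thread.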

There’s a similar example for Matlab on the API page here:

Hope that helps!