Whole mouse brain gene expression data

Hi,

I’m doing a project comparing brain networks between mouse and human brains, and I’m trying to find gene expression data for the whole mouse brain. I found the human data with no problem, but there doesn’t seem to be anywhere to get whole-brain gene expression values for the mouse. Any help would be appreciated. Thanks.

All the best,

Oliver

Hi @Oliverdillon,

The Allen Mouse Brain Atlas has exactly this at ~200um resolution, which can be downloaded from the API. See this community forum post for some additional information on the topic. If you are looking for higher resolution, single cell RNA-seq data sets are becoming available (see end of this post for a few examples).

Best,
Jeremy

Hi @jeremyinseattle,

Is that the “Download the 200um density volume for the Mouse Brain Atlas SectionDataSet 69816930” file? I’ve downloaded the files from that link and got an XML file titled data_set that I assume contains the expression values. But I’m having trouble converting it to a readable format like CSV. Do you know the easiest way to open it?
I’m sure this is a fairly basic question, but my knowledge of things like this is a bit limited, so apologies, haha. Thanks.

All the best,

Ollie

Hi Ollie,

I put together this how-to a while ago describing how to convert the XML output from Structure Unionization (a way to summarize mouse gene expression by brain region across the mouse brain, rather than by voxel). The XML-to-CSV/XLS conversion steps in it should apply in both cases.

How to structure unionize (mouse or aging mouse studies)

(If Excel fails, try this: http://www.convertcsv.com/xml-to-csv.htm)

This will give you a table with all of the columns you care about, plus about 50 extra columns that you can ignore. The important columns are:

  • expression-energy = the value you will likely want to use as the gene expression quantification (it is the product of expression intensity and expression density)
  • expression-density = roughly, the fraction of pixels in the structure that are covered by stain
  • acronym = structure acronym (for looking up in the reference atlas)
  • atlas-id = structure number in the atlas
  • safe-name & name = human-readable structure name (not sure what the difference is between the two columns)
  • sphinx-id35 = structure number that you should use for programming
  • structure-id-path = position in the hierarchy of structures (I think you can get the level of the ontology by counting the number of ids separated by /s)
  • st-level = I think this is the actual level of the ontology, but it is sometimes missing
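Following the structure-id-path suggestion above, the ontology depth can be computed by counting the ids in each path. This is a minimal base-R sketch, assuming the converted table has been read into a data frame that keeps the original hyphenated column names (pass check.names = FALSE to read.csv, otherwise R renames expression-energy to expression.energy). The three rows are illustrative only, the file name in the comment is hypothetical, and treating the root as depth 1 is an assumed convention that may not match st-level.

```r
# Illustrative rows only; in practice something like
#   unionize <- read.csv("unionize.csv", check.names = FALSE)
# where "unionize.csv" is a hypothetical name for the converted XML.
unionize <- data.frame(
  acronym = c("grey", "CH", "CTX"),
  `structure-id-path` = c("/997/8/", "/997/8/567/", "/997/8/567/688/"),
  `expression-energy` = c(1.2, 2.5, 3.1),
  check.names = FALSE
)

# Count the ids in a slash-delimited path such as "/997/8/567/".
# Assumed convention: a path with a single id (the root) has depth 1.
path_depth <- function(path) {
  trimmed <- gsub("^/+|/+$", "", path)           # strip leading/trailing slashes
  lengths(strsplit(trimmed, "/", fixed = TRUE))  # number of ids left in each path
}

unionize$depth <- path_depth(unionize$`structure-id-path`)

# Example: keep only structures at one depth before comparing energies.
depth_3 <- unionize[unionize$depth == 3, c("acronym", "expression-energy")]
print(depth_3)
```

Filtering on a single depth keeps parent and child structures from being counted twice when comparing regions.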

Let me know if you have any additional questions, although I’ve just about exhausted my knowledge on this topic so hopefully others in the community will reply if needed!

Best,
Jeremy

Hi Jeremy,

The file conversion worked very well, thank you! However, I’m not sure if there is something I’m doing wrong, because the file I’m opening isn’t giving me the expression values. I downloaded the files from this link: download the 200um energy and intensity volumes for mouse brain atlas sectiondataset 69816930:

This gives me 3 files: data_set, energy, and intensity. Initially I thought the data_set file was the one I needed to open, but now that I have opened it, I can see that it doesn’t contain the expression values. The energy and intensity files are in RAW or MHD format. Is one of them the file I should be looking at for the gene expression values?

Thanks.

All the best,

Ollie

The file you want is energy.raw: “A raw uncompressed float (32-bit) little-endian volume representing average expression energy per voxel. A value of “-1” represents no data. This file is returned by default if the volumes parameter is null.” Unfortunately, I don’t know how to convert this to a usable format. Does anyone else in the community know?
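Since each .raw file comes with an .mhd header, one option is to read the volume dimensions and byte order from that header instead of hard-coding them. This is a minimal base-R sketch, assuming the .mhd is a standard MetaImage text header with DimSize, ElementType, ElementDataFile and (optionally) BinaryDataByteOrderMSB keys. The function names are hypothetical, and -1 is converted to NA following the description of energy.raw above.

```r
# Parse a MetaImage (.mhd) header into a named list of "key = value" strings.
read_mhd_header <- function(mhd_path) {
  lines <- readLines(mhd_path, warn = FALSE)
  lines <- lines[grepl("=", lines, fixed = TRUE)]
  keys <- trimws(sub("=.*$", "", lines))     # text before the first "="
  vals <- trimws(sub("^[^=]*=", "", lines))  # text after the first "="
  setNames(as.list(vals), keys)
}

# Read the 32-bit float volume described by the header; -1 (no data) becomes NA.
read_mhd_volume <- function(mhd_path) {
  hdr <- read_mhd_header(mhd_path)
  stopifnot(identical(hdr[["ElementType"]], "MET_FLOAT"))  # expect 32-bit floats
  dims <- as.integer(strsplit(hdr[["DimSize"]], "\\s+")[[1]])
  msb <- hdr[["BinaryDataByteOrderMSB"]]
  endian <- if (!is.null(msb) && toupper(msb) == "TRUE") "big" else "little"

  raw_path <- file.path(dirname(mhd_path), hdr[["ElementDataFile"]])
  con <- file(raw_path, open = "rb")
  on.exit(close(con))
  vals <- readBin(con, what = "double", size = 4, n = prod(dims), endian = endian)
  vals[vals == -1] <- NA
  array(vals, dim = dims)  # first dimension varies fastest (column-major)
}

# Usage, assuming the unzipped download is in the working directory:
# energy <- read_mhd_volume("energy.mhd")
```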

Another option is to use the cocoframer R library developed by @lucasg. Downloading the expression energy for SectionDataSet 69816930 can be done with a single line of R code: get_aba_ish_data("69816930")

Here’s a quick look at the read function under the hood in cocoframer in case you need to translate it to another language:

```r
raw_file <- file("energy.raw",  # Open a connection to the .raw file
                 open = "rb")   # rb = Read Binary

vol_dims <- c(67, 41, 58) # brain volume dimensions (67 x 41 x 58 for 200 um grids)

vol_raw <- readBin(con = raw_file,      # File connection
                   what = "double",     # Read floating-point values...
                   size = 4,            # ...stored as 4-byte (32-bit) floats
                   n = prod(vol_dims),  # number of voxels (x * y * z)
                   endian = "little")   # the file is little-endian

close(raw_file) # close file connection
```

vol_raw is then a plain numeric vector of energy values (readBin does not keep the dimensions). In R, it can be reshaped to a 3-dimensional array and converted to a table of x, y, z coordinates and values using:

```r
vol_df <- reshape2::melt(array(vol_raw, dim = vol_dims),
                         varnames = c("x", "y", "z"))
```
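If reshape2 isn’t installed, the same x, y, z table can be built in base R and written straight to CSV, the readable format asked about earlier in the thread. This is a minimal sketch, assuming vol is the 67 x 41 x 58 energy array (for example array(vol_raw, dim = vol_dims)) with -1 or NA marking voxels without data. A random volume stands in for the real data here, the output file name is an assumption, and x, y, z are 1-based voxel indices rather than micrometres.

```r
# Convert a 3-D array to a long x, y, z, energy table, dropping no-data voxels.
volume_to_table <- function(vol, missing_value = -1) {
  d <- dim(vol)
  # expand.grid varies x fastest, matching R's column-major array order.
  coords <- expand.grid(x = seq_len(d[1]), y = seq_len(d[2]), z = seq_len(d[3]))
  coords$energy <- as.vector(vol)
  keep <- !is.na(coords$energy) & coords$energy != missing_value
  coords[keep, ]
}

# Stand-in volume; replace with the array read from energy.raw.
set.seed(1)
vol <- array(runif(67 * 41 * 58), dim = c(67, 41, 58))
vol[1, 1, 1] <- -1  # mark one voxel as "no data"

vol_table <- volume_to_table(vol)
out_csv <- file.path(tempdir(), "energy_69816930.csv")
write.csv(vol_table, out_csv, row.names = FALSE)
```

Dropping the -1 voxels keeps the CSV to in-brain voxels only, which also avoids averaging the no-data marker into any downstream statistics.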

There’s a similar example for Matlab on the API page here:
help.brain-map.org/display/mousebrain/API#API-Expression3DGridsz

Hope that helps!
-Lucas

Hi Jeremy,

That works great for a single gene, but is there a way to download the whole gene expression dataset?

Hi @Momo,

This has to be done through the API. I think this link addresses it, but please post if you need more information.

Jeremy

Dear @jeremyinseattle,

Thanks for your reply. I think the thread link you mentioned shows a way to download multiple genes, but it wouldn’t be easy to get 21k genes out of it. I was wondering if there is a code like

that would work for the whole genome. The gene IDs in the “mouse_expression_data_sets.csv” file are not in numerical order, which makes a simple ‘for’ loop over this code a bit suboptimal.

There is no single function like the one you want; you will have to make a series of calls to get this information for all of the genes. This request comes up every once in a while, and we are aware of it, but unfortunately we don’t currently have a push-button solution.
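A series of calls over the IDs listed in mouse_expression_data_sets.csv sidesteps the non-sequential ID problem, because the loop iterates over the IDs themselves rather than a numeric range. This is a minimal sketch, assuming the grid_data/download/&lt;id&gt;?include=energy URL pattern from the Allen API 3-D expression grid documentation (please check it against the current docs), a zip archive as the response, and a CSV column holding the SectionDataSet IDs. The column name data_set_id and both helper names are hypothetical. With dry_run = TRUE it only builds the list of destination files, so it can be checked before committing to thousands of downloads.

```r
# Hypothetical helper: build the grid-data download URL for one SectionDataSet id.
grid_url <- function(id, include = "energy") {
  sprintf("http://api.brain-map.org/grid_data/download/%s?include=%s", id, include)
}

# Hypothetical helper: download each id's zip into out_dir, skipping existing files.
download_grids <- function(ids, out_dir, dry_run = TRUE, pause = 0.5) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(out_dir, paste0(ids, ".zip"))
  for (i in seq_along(ids)) {
    if (dry_run || file.exists(dest[i])) next
    ok <- tryCatch({
      download.file(grid_url(ids[i]), dest[i], mode = "wb", quiet = TRUE)
      TRUE
    }, error = function(e) FALSE)
    if (!ok) {
      unlink(dest[i])  # remove any partial file so a rerun retries this id
      message("failed: ", ids[i])
    }
    Sys.sleep(pause)  # space out requests to the API
  }
  dest
}

# Usage (the column name is an assumption):
# ids  <- read.csv("mouse_expression_data_sets.csv")$data_set_id
# zips <- download_grids(ids, "grids", dry_run = FALSE)
```

Skipping files that already exist means an interrupted run over ~21k datasets can simply be restarted.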

Hi,

Gene expression data can easily be downloaded and visualised using brainrender if you’re using Python. It would be very easy to extend the current functionality to download data for all genes.
Let me know if you’d like to know more about how to achieve this!

Hi @FedeClaudi. This is really cool, thanks for sharing! Have you spoken to anyone at the Allen Institute about this? I think your tool would address several recurring questions we get (including this one) and it would be great to make sure the people who work on the relevant mouse atlases here are also aware of it. We have a list of useful community tools on the Forum and will add this to that list.

Thanks!

I haven’t personally, but @adamltyson (with whom I’ve developed brainglobe’s atlas API which brainrender uses) might have?
I’m more than happy for brainrender to be used by anyone and for it to supplement the visualisation software produced by Allen.

There are some people at the Allen aware of the BrainGlobe tools (e.g. Brian Long), but I don’t know that they’re being used.