Quantification of gene expression by in situ hybridization: finding and using the raw values

Many of the histological datasets that have been developed by the Allen Institute display gene expression by in situ hybridization (ISH), using colorimetric detection of RNA transcripts. Detailed methods describing how these experiments and analyses were performed are available in Technical Whitepapers that are customized to each dataset; some of those are listed below for reference.

People often want to use ‘raw’ expression values, or computed numbers that are generated as part of the analysis for visualization of data. The ISH process is only semi-quantitative due to signal amplification, so people who choose to use this information should be aware of that caveat and proceed with their analysis accordingly. Data quantification is sometimes presented as heatmaps, specifically to draw attention to the fact that these values are guides rather than absolute quantitative ‘truths’ reflecting transcript abundance.

That said, the ‘raw’ values are still useful and are available through the API, if they are not listed directly on the web page corresponding to an experiment.

For example, if you are interested in accessing raw expression values shown in this type of image from the Allen Developing Mouse Brain Atlas, you can find instructions on how to access this data through the API: http://help.brain-map.org/display/devmouse/API. There is a lot of useful information here, but the section on Structure Unionization provides information on how to access and download raw values that are calculated for a given gene of interest.

The API is a great way to dig into the data programmatically, and together with the Technical Documentation (we also call these whitepapers), you can learn so much more than by looking at the data images alone!

Technical Documents for:

We have also received several questions about how to perform structure unionization and return the output for more than one gene at a time (along with the associated gene symbols). The API call below shows how to do this for the genes Fam84b, Fam184a, and Rasa2 and return the results in JSON format. Several online JSON-to-CSV conversion tools are available (via a Google search) if you are uncomfortable with this format.

http://api.brain-map.org/api/v2/data/query.json?criteria=model::StructureUnionize,rma::criteria,section_data_set(genes[acronym$in'Fam84b','Fam184a','Rasa2']),rma::include,section_data_set(genes),rma::options[only$eq'genes.acronym,data_sets.id'][start_row$eq8000][num_rows$eq2000]
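For anyone who would rather script this than edit the URL by hand, here is a minimal Python sketch that assembles the same query for an arbitrary gene list and writes records out as CSV. The helper names `build_unionize_url` and `records_to_csv` are hypothetical, and the sample record in the usage note is made up for illustration (the field names are assumed to be the underscore versions of the fields in a structure unionize record), so check them against a real response before relying on this.

```python
import csv
import io

API_ROOT = "http://api.brain-map.org/api/v2/data/query.json"


def build_unionize_url(gene_acronyms, start_row=0, num_rows=2000):
    """Assemble the StructureUnionize query shown above for a list of gene acronyms."""
    gene_list = ",".join(f"'{acronym}'" for acronym in gene_acronyms)
    criteria = (
        "model::StructureUnionize,"
        f"rma::criteria,section_data_set(genes[acronym$in{gene_list}]),"
        "rma::include,section_data_set(genes),"
        "rma::options[only$eq'genes.acronym,data_sets.id']"
        f"[start_row$eq{start_row}][num_rows$eq{num_rows}]"
    )
    return f"{API_ROOT}?criteria={criteria}"


def records_to_csv(records, fieldnames):
    """Write a list of flat dicts to CSV text, ignoring keys not in fieldnames."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Calling `build_unionize_url(["Fam84b", "Fam184a", "Rasa2"], start_row=8000, num_rows=2000)` reproduces the URL above. Once the JSON has been downloaded and its list of records extracted, something like `records_to_csv(records, ["section_data_set_id", "structure_id", "expression_energy"])` would give a CSV table; nested fields would need flattening first.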

Hello,
I am trying to get the raw ISH expression values for a group of genes throughout development. The Manual Annotation gives values that were either annotated or predicted by down or up walks. I would like to use values that were actually annotated.
The “comments” column says whether it is an up or down walk annotation.
If the comment is blank (does not say anything), does it mean that the structure was actually manually annotated and not predicted?
Thank you
Juan

Hello Juan - that should be correct - if comment is blank it was directly annotated.

Hi,

I too am interested in the precise process which was used to convert the colorimetric pixel values in an ISH slice image into the corresponding heatmap pixel values in that slice’s expression data image. I can’t seem to find this described in any of the documentation, including supplementary document 2 of “Genome-wide atlas of gene expression in the adult mouse brain” by Lein et al (Nature, 2007), which has a section entitled “Automated Gene Expression Detection”.

One thing I couldn’t find was a mathematical statement of the heatmap used in the Allen brain map expression images.

I’ve correlated ISH and expression images (precisely, image 100873513 which is slice 99 from experiment 100041580 in the developing mouse atlas*) pixel by pixel to try to determine the mapping.

Here’s a plot in RGB space showing:

800 randomly selected pixels in an ISH image which correlate with pixels in the Allen expression image which have been deemed to be expressing the gene in question (Emx2) (purple outlines to points.)

and also

800 randomly selected pixels from the ISH image which correlate with NON expressing pixels from the expression image. (Green outlines to points.)

In each case, the fill of the circle is the RGB colour value from the ISH image pixel. It’s a little hard to get a feel for it in 2D, but I can only upload one view of the graph.

So, the general colour scheme is: yellower: non expressing; purpler/darker: expressing.

Why is there so much overlap in the points?

In the graph, I’ve included no information about the quantity of expression (because I don’t know the heatmap for the expression map).

thanks for reading!

Seb James

* (Full size) images were obtained with:

http://api.brain-map.org/api/v2/image_download/100873513

and

http://api.brain-map.org/api/v2/image_download/100873513?view=expression
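As a small convenience, the two download URLs above can be built for any image id with a few lines of Python. This is only a sketch: `image_download_url` is a hypothetical helper name, and the only `view` value taken from this thread is `expression`.

```python
from urllib.parse import urlencode

IMAGE_DOWNLOAD_ROOT = "http://api.brain-map.org/api/v2/image_download"


def image_download_url(section_image_id, view=None):
    """Return the download URL for an ISH image, or for a named view of it."""
    url = f"{IMAGE_DOWNLOAD_ROOT}/{section_image_id}"
    if view is not None:
        # e.g. view="expression" for the expression mask image
        url += "?" + urlencode({"view": view})
    return url
```

`image_download_url(100873513)` gives the ISH image URL and `image_download_url(100873513, view="expression")` gives the expression image URL used here.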


Hi @sebjames,
Thanks for this great question. A description of the data analysis methods is published in IEEE, by Ng, et al (2007). There’s a paywall on this - if you don’t have access, before you buy it, note that you won’t find the mathematical statements in there; it’s a more detailed description of the whitepapers that you can download from the Documentation tab of the Allen Mouse Brain Atlas portal. But hopefully one of the authors who worked on this project will weigh in here!

If someone could just answer this question, that alone would be helpful:

Seb,

Once the segmentation algorithm has identified expressing vs non-expressing pixels, the expressing pixels are colorized based on the luminosity of the ISH image (0.21 R + 0.72 G + 0.07 B) over some small local neighborhood. On the web application it is then colorized using the jet colormap. As such this is not a “quantity” of expression as you put it. Rather, we wanted to provide a view of the color intensity in the expression mask.

Segmentation takes into account the intensity of the local background, which may contribute to the overlap that you are observing.

Lydia
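To make the luminosity step concrete, here is a minimal Python sketch of the weighted sum quoted above, averaged over the expressing pixels in a small window. The window shape and size are not given in this thread, so the square window and `radius=1` are assumptions, and `local_mean_luminosity` is a hypothetical name rather than the function used in the actual pipeline.

```python
def luminosity(r, g, b):
    """Weighted grayscale value using the weights quoted above."""
    return 0.21 * r + 0.72 * g + 0.07 * b


def local_mean_luminosity(image, mask, row, col, radius=1):
    """Mean luminosity of expressing pixels in a square window around (row, col).

    image: 2D list of (R, G, B) tuples; mask: 2D list of booleans marking
    pixels that segmentation flagged as expressing. The square window and
    its radius are assumptions made for illustration.
    """
    total, count = 0.0, 0
    for rr in range(max(0, row - radius), min(len(image), row + radius + 1)):
        for cc in range(max(0, col - radius), min(len(image[0]), col + radius + 1)):
            if mask[rr][cc]:
                total += luminosity(*image[rr][cc])
                count += 1
    return total / count if count else 0.0
```

Because the three weights sum to 1, a neutral grey pixel (v, v, v) maps to v, so the formula is a grayscale conversion that weights the green channel most heavily, not a selection of greenish colors.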

Dear Lydia,

Thank you very much for responding, I do appreciate it. I didn’t write back immediately, as I wanted to make sure I had a clear set of questions for you.

I’m not sure I understand everything that you wrote, so I’ll add questions & commentary to quotes from your post:

“Once the segmentation algorithm has identified expressing vs non-expressing pixels…”

Ok, so this algorithm must be using the luminosity and/or some color information in the original ISH image to determine the expressing/non-expressing state for that pixel. I understand from the documentation that this algorithm is masking away non-tissue areas of the image and may be accounting for occlusions, tissue tears, etc. So does the algorithm use color information at this stage and if so, how?

“…the expressing pixels are colorized based on the luminosity of the ISH image (0.21 R + 0.72 G + 0.07 B) over some small local neighborhood”

Ok, so that sounds like for a given pixel that is determined to be expressing, you then go back to the ISH image, and perhaps position a 2-D Gaussian hill over the pixel and sample the ISH color information around this hill to gather a number for the expression image pixel. I don’t understand (0.21 R + 0.72 G + 0.07 B). That color axis in 3D color space is more or less green and none of the ISH images seem to have greenish color in them. When I plot expressing and non expressing ISH pixel colors, they seem to be arranged along a line in color space which is roughly grey. Here are the points, plotted as in my first post, along with the 0.21R + 0.72G + 0.07B line in thick green.

Ok, so the expression map is then plotted with the jet colormap, that is helpful. I notice that in the Brain Explorer application, you can actually find some numbers associated with the jet colormap. Are these accessible via the API? That is, is the range which the jet map represents available via the API? Do these numbers have any physical meaning, or are they arbitrary?

Finally, do you know if the luminosity of the stain in the ISH image is roughly linear with the number density of mRNA strands that code for the given protein, or is it some other relationship, such as exponential?

best regards,

Seb James

“So does the algorithm use color information at this stage and if so, how?”

It’s been a long while and people have come and gone. My best recollection is that the color information gets converted to a single “intensity” value during the segmentation process.

“Are these accessible via the API? That is, is the range which the jet map represents available via the API?”

Yes, they are. Have a look at these places first and follow up with questions
http://help.brain-map.org/display/mousebrain/API
http://api.brain-map.org/examples/foldchange/index.html

“Do these numbers have any physical meaning, or are they arbitrary?”
They are again a local average of the grayscale value, computed using that formula over “expression pixels” (see the whitepaper pdf).

“, do you know if the luminosity of the stain in the ISH image is roughly linear with the number density of mRNA strands that code for the given protein, or is it some other relationship, such as exponential?”

No, I do not. ISH is semi-quantitative (!): the quantitative part is the good spatial resolution, but the color aspect is not so much.

Hi everyone,

For the in situ hybridization (ISH) assay presented in the Allen Mouse Brain Atlas, colorimetric ISH signal intensity is not linear with mRNA quantity in a cell. No quantitative assessment of the amount of expression of a given transcript should be made from the appearance of the purple precipitate beyond qualitative terms such as ‘lots’, ‘some’, or ‘little’. Refer to the detailed technical documentation on the ISH process for an explanation of why. Additional experimental data would be required to establish the quantitative nature of transcript level, on a probe-by-probe basis.

Thanks for all the additional information. I believe I have enough information to help me to generate some expression heat maps from the ISH images now.

What I’ve decided to do is to take the ISH color information, and apply a transform in color space so that the “purplish” colours end up being aligned with one of the color space axes (it happens to be the one that was originally ‘blue’). I then make an elliptical tube around this axis, and all color pixels whose (transformed) color lands them inside the tube get a non-zero “expression” value taken from the projection of the point onto the axis. If the pixel is on this axis, but is nonetheless too ‘light’, then it gets expression=0 assigned to it.
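A rough Python sketch of the tube idea described above, for readers who want to try something similar. It is not the code from this post: instead of rotating the color space it projects each pixel onto an axis running from white to a reference stain color, and the stain color, the tube semi-axes, the orientation of the ellipse, and the lightness cutoff are all made-up values for illustration.

```python
import math

WHITE = (255.0, 255.0, 255.0)
STAIN = (90.0, 60.0, 140.0)  # hypothetical "purplish" reference color


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return tuple(c / norm for c in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def expression_value(pixel, semi_axes=(40.0, 25.0), min_depth=40.0):
    """Project an RGB pixel onto the white-to-stain axis.

    Returns the projection length if the pixel lies inside an elliptical
    tube around the axis and is dark enough, otherwise 0.0.
    """
    axis = _unit(tuple(s - w for s, w in zip(STAIN, WHITE)))
    # Two directions perpendicular to the axis span the tube cross-section.
    helper = (0.0, 0.0, 1.0) if abs(axis[2]) < 0.9 else (1.0, 0.0, 0.0)
    perp_a = _unit(_cross(axis, helper))
    perp_b = _cross(axis, perp_a)
    offset = tuple(p - w for p, w in zip(pixel, WHITE))
    depth = _dot(offset, axis)
    inside = ((_dot(offset, perp_a) / semi_axes[0]) ** 2
              + (_dot(offset, perp_b) / semi_axes[1]) ** 2) <= 1.0
    return depth if inside and depth >= min_depth else 0.0
```

With these made-up parameters, a white pixel and a pure green pixel both get 0.0, while a pixel equal to the reference stain color gets its full distance from white along the axis.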

It would still be great if the precise algorithm that was applied to turn ISH images into expression maps was published at some time. Is the code that was used open source or available by personal communication?


Hi @jeremyinseattle,

I tried the API query for the genes Fam84b, Fam184a, and Rasa2 and was wondering: do the returned energy and density values combine both coronal and sagittal experiments for each gene, since there is no place in the query for me to specify an experiment ID?

Thank you!

Hi,

Can you post your API query so I can see what you are seeing?

It’s just the example from above:

http://api.brain-map.org/api/v2/data/query.json?criteria=model::StructureUnionize,rma::criteria,section_data_set(genes[acronym$in'Fam84b','Fam184a','Rasa2']),rma::include,section_data_set(genes),rma::options[only$eq'genes.acronym,data_sets.id'][start_row$eq8000][num_rows$eq2000]

Each “structure unionize” record is for a specific experiment and structure.

http://api.brain-map.org/api/v2/data/query.xml?criteria=
model::StructureUnionize
,
rma::criteria,
section_data_set(genes[acronym$in'Fam84b','Fam184a','Rasa2'])
,
rma::include,
structure,section_data_set(plane_of_section)
,rma::options
[only$eq'structures.id,structures.acronym,structures.graph_order,genes.acronym,plane_of_sections.name,data_sets.id

This modification of the query may be helpful to see whether the experiment was sagittal or coronal.

This is an example of the XML for the first record:

<structure-unionize>
<expression-density>0.00384803</expression-density>
<expression-energy>0.454006</expression-energy>
<id>453370663</id>
<section-data-set-id>75081395</section-data-set-id>
<structure-id>17437</structure-id>
<sum-expressing-pixel-intensity>5350170.0</sum-expressing-pixel-intensity>
<sum-expressing-pixels>45346.6</sum-expressing-pixels>
<sum-pixel-intensity>247297000.0</sum-pixel-intensity>
<sum-pixels>11784400.0</sum-pixels>
<voxel-energy-cv>0.839586</voxel-energy-cv>
<voxel-energy-mean>0.454002</voxel-energy-mean>
<section-data-set>
<id type="integer">75081395</id>
<plane-of-section>
<name>coronal</name>
</plane-of-section>
</section-data-set>
<structure>
<acronym>r8B</acronym>
<graph-order type="integer">2118</graph-order>
<id type="integer">17437</id>
</structure>
</structure-unionize>

Does this help?
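As a quick sanity check on how the fields in this record relate to each other, the sketch below parses a trimmed copy of the XML and recomputes two ratios from the pixel sums. That density is expressing pixels over all pixels, and energy is expressing pixel intensity over all pixels, is my reading of these numbers; the ratios reproduce the values in the record to within rounding, but please confirm the definitions in the whitepaper. `recompute_from_sums` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example record above (only the fields used here).
RECORD_XML = """<structure-unionize>
<expression-density>0.00384803</expression-density>
<expression-energy>0.454006</expression-energy>
<sum-expressing-pixel-intensity>5350170.0</sum-expressing-pixel-intensity>
<sum-expressing-pixels>45346.6</sum-expressing-pixels>
<sum-pixels>11784400.0</sum-pixels>
</structure-unionize>"""


def recompute_from_sums(record_xml):
    """Recompute density and energy from the pixel sums in one record."""
    record = ET.fromstring(record_xml)

    def value(tag):
        return float(record.findtext(tag))

    return {
        # fraction of pixels in the structure flagged as expressing
        "density": value("sum-expressing-pixels") / value("sum-pixels"),
        # summed intensity of expressing pixels per pixel in the structure
        "energy": value("sum-expressing-pixel-intensity") / value("sum-pixels"),
        "reported_density": value("expression-density"),
        "reported_energy": value("expression-energy"),
    }
```

For the record above, the recomputed density and energy agree with the reported `expression-density` and `expression-energy` to about five significant figures; the small difference is consistent with the sums having been rounded for display.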

Hello, what does voxel_energy mean and how could I convert it to expression density?

Hi @caroline78,
More information about the term definitions and about calculating gene expression from image density measurements is in the documentation for the Allen Mouse Brain Atlas, downloadable as a PDF here:


You can also learn more about the API in the Help section:
help.brain-map.org/display/mousebrain/API

Hope this helps!


A very informative thread about how ISH intensities are colorized!

I am curious if there is a way to map the expression mask RGBs back to an intensity value? I understand a major caveat here is that probe to intensity signal is itself non-linear. Is the mapping from intensity to jet linear once a feature is detected in the mask?
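If the mask does use a standard jet colormap applied linearly to intensity (this thread does not confirm either point, so treat both as assumptions), one way to go back from RGB to a scalar is a nearest-color lookup against a jet table. The sketch below uses a common piecewise-linear approximation of jet rather than any table taken from the Allen web application, and `jet_to_scalar` is a hypothetical name.

```python
def jet(x):
    """Common piecewise-linear approximation of the jet colormap, x in [0, 1]."""
    def clamp(c):
        return max(0.0, min(1.0, c))
    return (clamp(1.5 - abs(4 * x - 3)),
            clamp(1.5 - abs(4 * x - 2)),
            clamp(1.5 - abs(4 * x - 1)))


# Lookup table of (scalar, rgb) pairs at 256 evenly spaced scalar values.
JET_TABLE = [(i / 255, jet(i / 255)) for i in range(256)]


def jet_to_scalar(rgb):
    """Return the table scalar whose jet color is closest to rgb (floats in 0..1)."""
    def squared_distance(entry):
        return sum((a - b) ** 2 for a, b in zip(entry[1], rgb))
    return min(JET_TABLE, key=squared_distance)[0]
```

For 8-bit mask pixels, divide each channel by 255 before calling `jet_to_scalar`. The result is a position along the colormap in the range 0 to 1, which would still need the colormap's original range to be turned back into an intensity.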


Hello,

My understanding is that the ISH data expressed as “expression energy” is not quantitative with respect to the actual amount of mRNA, because the relationship between color intensity and mRNA quantity is not linear.

  1. However, the data should be roughly linear within a certain range for the same probe. This would allow for relative quantitative comparisons of gene expression in different brain areas within a certain range of expression. Is there any estimate of the range of expression energy values within which the signal can be considered fairly linear?

  2. I found that for some genes different probes have been used at different developmental stages of the mouse brain. In the case of the gene Cdh5, one probe was used for embryonic samples ( RP_080912_03_B02 ) and another for P56 ( RP_070219_02_C05 ).
    In this case the expression energy is quite different between the embryonic and postnatal samples.
    If a different probe was used, it is not possible to compare expression levels between stages.
    Is there any reason why two different probes were used?
    Has this been a common practice and should be checked for each gene?

Thank you for your help
Juan
