- Experimental Overview And Metadata
- Informatics Data Processing
The Allen Mouse Brain Atlas provides genome-wide in situ hybridization (ISH) image data for approximately 20,000 genes in adult mice. Each data set is processed through an informatics analysis pipeline to obtain spatially mapped quantified expression information.
From the API, you can:
- Download images
- Download quantified expression values by structure
- Download quantified expression values as 3-D grids
- Query the differential and correlative search services
- Query the image synchronization service
- Download atlas images, drawings and structure ontology
This document provides a brief overview of the data, database organization and example queries. API database object names are in camel case. See the main API documentation for more information on data models and query syntax.
Experimental data from this atlas is associated with the “Mouse Brain” Product.
Multiple genes were assayed using each Specimen. Typically, the sectioning scheme divided each brain into eight interleaving SectionDataSets with 200 µm sampling density (= 8 x 25 µm thickness).
Each gene was assayed with at least one sagittal SectionDataSet. A subset of genes also has a coronal SectionDataSet and/or replicate experiments. A sagittal SectionDataSet spans the left hemisphere, starting with the most lateral section where the hippocampus begins to appear and ending just past the midline, yielding ~20 SectionImages at 200 µm sampling density. A coronal SectionDataSet spans both hemispheres, starting with the most posterior section showing the cerebellum and hindbrain and ending with the most anterior section showing the olfactory bulb, yielding ~60 SectionImages at 200 µm sampling density. The left side of a coronal SectionImage corresponds to the left hemisphere.
A manual QC protocol defines the criteria for failing experiments due to production issues, discarding damaged SectionImages, verifying and adjusting the tissue bounding boxes, as well as for identifying “dark” artifacts such as bubbles and tears.
From the API, detailed information about Genes, Probes, SectionDataSets and SectionImages can be obtained using RMA queries.
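For instance, an RMA query that lists the SectionDataSets for gene Pdyn might be assembled as in the sketch below. The host name, the `query.json` endpoint, the product abbreviation and the exact criteria string are assumptions made for illustration and should be checked against the main API documentation.

```matlab
% Sketch: build an RMA query URL listing SectionDataSets for one gene.
% ASSUMPTIONS: base URL, product abbreviation and criteria syntax below
% are illustrative; verify them against the main API documentation.
baseUrl  = 'http://api.brain-map.org/api/v2/data/query.json';
geneName = 'Pdyn';

criteria = ['model::SectionDataSet,' ...
            'rma::criteria,[failed$eq''false''],' ...
            'products[abbreviation$eq''Mouse''],' ...
            sprintf('genes[acronym$eq''%s'']', geneName)];

queryUrl = [baseUrl '?criteria=' criteria];
disp(queryUrl);

% To run the query (requires network access), uncomment:
% result = webread(queryUrl);   % webread decodes a JSON response
```

Building the criteria string from parts keeps the gene acronym easy to swap for another gene of interest.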
For example, gene Pdyn has one sagittal SectionDataSet (id=69782969) and one coronal SectionDataSet (id=71717084). In the web application, images from an experiment are visualized in an experiment detail page. All displayed information, images and structural expression values are also available through the API.
See the image download page to learn how to download images at different resolutions and regions of interest.
Figure: Experiment detail page of a sagittal SectionDataSet (id=69782969) for gene Pdyn showing meta-information, images and computed structure expression graph.
The informatics data processing pipeline produces results that enable navigation, analysis and visualization of the data. The pipeline consists of the following components:
- an annotated 3-D reference space,
- an alignment module,
- an expression detection module,
- an expression gridding module, and
- a structure unionizer module.
The output of the pipeline is quantified expression values at a grid voxel level and at a structure level according to the integrated reference atlas ontology. The grid level data are used downstream to provide a differential and correlative gene search service and to support visualization of spatial relationships. See the informatics processing whitepaper for more details.
The backbone of the automated pipeline is an annotated 3-D reference space based on the same Specimen used for the coronal plates of the integrated reference atlas. A brain volume was reconstructed from the SectionImages using a combination of high-frequency section-to-section histology registration and low-frequency histology to (ex-cranio) MRI registration. This first-stage reconstructed volume was then aligned with a sagittally sectioned Specimen. Once a straight mid-sagittal plane was achieved, a synthetic symmetric space was created by reflecting one hemisphere to the other side of the volume.
Over 800 Structures were extracted from the 2-D coronal reference atlas plates and interpolated to create symmetric 3-D annotations. Structures in the reference atlas are arranged in a hierarchical organization. Each structure has one parent and denotes a “part-of” relationship. Structures are assigned a color to visually emphasize their hierarchical positions in the brain.
In the May 2015 data release, we introduced a next generation common coordinate framework (CCF v3) based on a population average to support the integration of new mouse brain datasets in the Allen Brain Atlas Data Portal. See the Allen Mouse Common Coordinate Framework whitepaper for detailed construction information.
The Nissl volume and 3-D annotation from the Allen Reference Atlas were deformably registered to the new common coordinate framework to support potential cross-modality analysis of gene expression with new data modalities as they become available in the Data Portal.
Figure: The next generation Allen Mouse Common Coordinate Framework (CCF v3)
See the atlas drawings and ontologies page for more information.
All coronal data is registered to ReferenceSpace id = 9. All sagittal data is registered to ReferenceSpace id = 10.
ReferenceSpace id = 9 is in PIR orientation (+x = posterior, +y = inferior, +z = right). ReferenceSpace id = 10 is identical to ReferenceSpace id = 9. The reason for the two spaces is to allow left hemisphere sagittal data to correspond to the right hemisphere coronal reference atlas. This is implemented as a z-axis flip transform between the two reference spaces for the purposes of image synchronization (see below).
Figure: The common reference space is in PIR orientation where x axis = Anterior-to-Posterior, y axis = Superior-to-Inferior and z axis = Left-to-Right.
NOTE: 3-D annotation volumes were updated in the May 2015 release to reflect the introduction of the next generation of the Allen Mouse Common Coordinate Framework (CCFv3). Annotation volumes from the October 2014 release (mapped to CCFv2) can be accessed through our data download server (see instructions).
Three volumetric data files are available for download:
- atlasVolume : uchar (8bit) grayscale Nissl volume of the reconstructed brain at 25 µm resolution.
- annotation : uint (32bit) structural annotation volume at 25 µm resolution. The value represents the ID of the finest level structure annotated for the voxel. Note: the 3-D mask for any structure is composed of all voxels annotated for that structure and all of its descendants in the structure hierarchy.
- gridAnnotation : uint (32bit) structural annotation volume at grid (200 µm) resolution for gene expression analysis.
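The note on structure masks can be illustrated with a small sketch. The structure IDs and the parent/child relationships below are hypothetical toy values, not real atlas structures; in practice the list of descendants would come from the structure ontology.

```matlab
% Sketch: build a 3-D mask for a structure from an annotation volume.
% The IDs and hierarchy below are HYPOTHETICAL toy values, not real
% atlas structures.
ANO = zeros(4, 3, 2);          % toy annotation volume (0 = background)
ANO(1:2, :, :) = 11;           % voxels labeled with child structure 11
ANO(3,   :, :) = 12;           % voxels labeled with child structure 12
ANO(4,   :, :) = 20;           % voxels labeled with unrelated structure 20

% Parent structure 10 has descendants 11 and 12. No voxel carries the
% label 10 itself because each voxel stores its finest level structure.
parentId      = 10;
descendantIds = [11 12];

% Mask = voxels labeled with the structure or any of its descendants
mask = ismember(ANO, [parentId descendantIds]);

fprintf('Voxels in structure %d: %d\n', parentId, nnz(mask));
```

Testing only for the parent ID would return an empty mask here, which is why the descendants have to be included.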
All volumetric data is stored in an uncompressed format with a simple text header file in MetaImage format. The raw numerical data is stored as a 1-D array as shown in the figure below.
Figure: Packing of 3-D volumetric data into a 1-D numerical array.
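As a minimal sketch of this packing, the following computes where a voxel sits in the 1-D array. It assumes that x varies fastest, then y, then z, which is the ordering implied by the `reshape` calls in the Matlab snippets in this section; the voxel coordinates are arbitrary example values.

```matlab
% Sketch: relate a 3-D voxel position to its offset in the 1-D raw array.
% ASSUMPTION: x varies fastest, then y, then z, with dimensions taken
% from the 25 micron volume size.
dims = [528 320 456];            % [nx ny nz]

% Zero-based voxel coordinates (x, y, z), arbitrary example values
x = 263; y = 100; z = 219;

% Zero-based element offset into the raw array
offset = x + y*dims(1) + z*dims(1)*dims(2);

% MATLAB is one-based and column-major, so the same voxel is
linearIdx = sub2ind(dims, x+1, y+1, z+1);

fprintf('raw offset = %d, MATLAB linear index = %d\n', offset, linearIdx);
```

The offset counts array elements, so the byte position in the file is the offset multiplied by the element size (1 byte for uchar, 4 bytes for uint).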
Example Matlab code snippet to read in the 25µm atlas and annotation volume:
% Download and unzip the atlasVolume and annotation zip files
% 25 micron volume size (named volSize to avoid shadowing the built-in size)
volSize = [528 320 456];
% VOL = 3-D matrix of atlas Nissl volume
fid = fopen('atlasVolume/atlasVolume.raw', 'r', 'l' );
VOL = fread( fid, prod(volSize), 'uint8' );
fclose( fid );
VOL = reshape(VOL,volSize);
% ANO = 3-D matrix of annotation labels
fid = fopen('annotation.raw', 'r', 'l' );
ANO = fread( fid, prod(volSize), 'uint32' );
fclose( fid );
ANO = reshape(ANO,volSize);
% Display one coronal section
figure; imagesc(squeeze(VOL(264,:,:))); colormap(gray(256)); axis equal;
% Display one sagittal section
figure; imagesc(squeeze(VOL(:,:,220))'); colormap(gray(256)); axis equal;
Example Matlab code snippet to read in the 200 µm grid annotation volume:
% Download and unzip the gridAnnotation zip files
% 200 micron volume size
sizeGrid = [67 41 58];
% ANOGD = 3-D matrix of grid-level annotation labels
fid = fopen( 'gridAnnotation.raw', 'r', 'l' );
ANOGD = fread( fid, prod(sizeGrid), 'uint32' );
fclose( fid );
ANOGD = reshape(ANOGD,sizeGrid);
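% Display one coronal and one sagittal section
figure; imagesc(squeeze(ANOGD(34,:,:))); colormap(lines(256)); axis equal;
figure; imagesc(squeeze(ANOGD(:,:,28))'); colormap(lines(256)); axis equal;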
The aim of image alignment is to establish a mapping from each SectionImage to the 3-D reference space. The module reconstructs a 3-D Specimen volume from its constituent SectionImages and registers the volume to the 3-D reference model by maximizing image correlation.
Once registration is achieved, information from the 3-D reference model can be transferred to the reconstructed Specimen and vice versa. The resulting transform information is stored in the database. Each SectionImage has an Alignment2d object that represents the 2-D affine transform between an image pixel position and a location in the Specimen volume. Each SectionDataSet has an Alignment3d object that represents the 3-D affine transform between a location in the Specimen volume and a point in the 3-D reference model. Spatial correspondence between any two SectionDataSets from different Specimens can be established by composing these transforms.
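The composition of the two transforms can be sketched as follows. All matrix values, the section spacing and the way a section index becomes the third volume coordinate are hypothetical; the real parameters come from the Alignment2d and Alignment3d objects, whose field layout is not reproduced here.

```matlab
% Sketch: map an image pixel to the 3-D reference space by composing
% a 2-D affine (image -> Specimen volume) with a 3-D affine
% (Specimen volume -> reference space).
% ASSUMPTIONS: all matrix values, the section spacing and the way the
% section position becomes the third volume coordinate are HYPOTHETICAL.
A2 = [0.9  0.0  100;             % 2x3 affine: [xVol; yVol] = A2 * [px; py; 1]
      0.0  0.9   50];
A3 = [1 0 0  200;                % 3x4 affine: ref = A3 * [x; y; z; 1]
      0 1 0 -150;
      0 0 1   25];

pixel         = [1000; 2000];    % pixel position on the SectionImage
sectionNumber = 40;              % hypothetical section index
sectionStep   = 25;              % hypothetical spacing between sections

xyVol = A2 * [pixel; 1];                        % in-plane volume position
volPt = [xyVol; sectionNumber * sectionStep];   % add through-plane position
refPt = A3 * [volPt; 1];                        % location in reference space

fprintf('reference location: %.1f %.1f %.1f\n', refPt);
```

Going from the reference space to a second SectionDataSet would apply the inverse transforms of that data set in the opposite order.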
For convenience, a set of “Image Sync” API methods is available to find corresponding positions between SectionDataSets, the 3-D reference model and structures. Note that all locations on SectionImages are reported in pixel coordinates and all locations in 3-D ReferenceSpaces are reported in microns. These methods are used by the Web application to provide the image synchronization feature in the multiple image viewer (see Figure).
- Fetch alignment transforms parameters for the sagittal Pdyn SectionDataSet
- Sync a location between the sagittal and coronal Pdyn SectionDataSets
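A request for the second example might be assembled as sketched below. The endpoint name `image_to_image` and its parameter names are assumptions to verify against the image synchronization documentation, and the SectionImage id and pixel position are hypothetical placeholders.

```matlab
% Sketch: build an Image Sync request that maps a pixel on one
% SectionImage to the closest location in another SectionDataSet.
% ASSUMPTIONS: the endpoint name and parameter names are illustrative;
% the SectionImage id and pixel position are HYPOTHETICAL placeholders.
apiRoot        = 'http://api.brain-map.org/api/v2';
sectionImageId = 123456;         % hypothetical SectionImage id
px = 6000; py = 2300;            % pixel coordinates on that image
targetDataSet  = 71717084;       % coronal Pdyn SectionDataSet

syncUrl = sprintf('%s/image_to_image/%d.json?x=%d&y=%d&section_data_set_ids=%d', ...
                  apiRoot, sectionImageId, px, py, targetDataSet);
disp(syncUrl);
```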
Figure: Point-based image synchronization on the Web application. Multiple SectionDataSets in the Zoom-and-Pan (Zap) viewer can be synchronized to the same approximate location in both sagittal and coronal planes. Screenshots taken before and after synchronization show genes Dpp6 and Myo16 and the relevant coronal and sagittal plates of the reference atlas. Gene Myo16 shows enriched expression in the medial habenula (MH).
For every ISH SectionImage, a grayscale mask is generated that identifies pixels corresponding to gene expression. The detection algorithm is based on adaptive thresholding and mathematical morphology.
The expression mask image is the same size and pixel resolution as the primary ISH image and can be downloaded through the image download service.
Figure: Web application presentation of expression detection for gene Pde10a. Screenshot of expression detection mask for Pde10a showing dense high expression in the striatum and low expression in the isocortex. The intensity is color-coded to range from blue (low expression intensity), through green (medium intensity) to red (high intensity).
For each SectionDataSet, the Gridding module creates a low resolution 3-D summary of the gene expression and projects the data to the common coordinate space of the 3-D reference model. Casting all data into a canonical space allows for easy cross-comparison of gene expression data from every Product. The expression data grids can also be viewed directly as 3-D volumes or used for analysis (e.g. differential and correlative searches).
Each image in a SectionDataSet is divided into a 200 x 200 µm grid. Pixel-based gene expression statistics are computed using information from the primary ISH and the expression mask:
- expression density = sum of expressing pixels / sum of all pixels in division
- expression intensity = sum of expressing pixel intensity / sum of expressing pixels
- expression energy = expression intensity * expression density
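The three definitions above can be worked through for a single grid division. The pixel values and mask are toy data, and `ish` is assumed to already hold a per-pixel expression intensity.

```matlab
% Sketch: expression statistics for one grid division, following the
% three definitions above. Pixel values are TOY data.
ish  = [ 10  20  30;             % intensity of each pixel in the division
         40  50  60;
         70  80  90];
mask = logical([0 0 1;           % expression mask: true = expressing pixel
                0 1 1;
                0 0 1]);

nAll        = numel(mask);               % all pixels in the division
nExpressing = nnz(mask);                 % expressing pixels

density   = nExpressing / nAll;
intensity = sum(ish(mask)) / nExpressing;
energy    = intensity * density;         % equals sum(ish(mask)) / nAll

fprintf('density=%.3f intensity=%.1f energy=%.2f\n', density, intensity, energy);
```

Because the count of expressing pixels cancels out, energy reduces to the summed intensity of expressing pixels divided by the total pixel count. A division with no expressing pixels would need a guard against dividing by zero when computing intensity.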
Each per-image 2-D expression grid is smoothed and rotated to form a 3-D grid. Z-direction smoothing is applied to the 3-D grid which is then transformed into the standard reference space.
Grid data can be downloaded for each SectionDataSet using the 3-D Expression Grid Data Service. The service returns a zip file containing the volumetric data for expression density, intensity and/or energy in an uncompressed format with a simple text header file in MetaImage format. Structural annotation for each grid voxel can be obtained via the ReferenceSpace gridAnnotation volume file.
Note: while the reference space spans both hemispheres, sagittal SectionDataSets only span the left hemisphere. Voxels with no data are assigned a value of “-1”.
- Download expression energy grid file for the coronal Pdyn SectionDataSet
- Download expression density and intensity grid files for the same SectionDataSet
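The two downloads listed above might be requested as in this sketch. The endpoint path and the `include` parameter are assumptions to check against the 3-D Expression Grid Data Service documentation; the output file and folder names are arbitrary.

```matlab
% Sketch: build 3-D Expression Grid Data Service download URLs.
% ASSUMPTIONS: the endpoint path and the "include" parameter are
% illustrative; check them against the service documentation.
gridRoot  = 'http://api.brain-map.org/grid_data/download';
dataSetId = 71717084;            % coronal Pdyn SectionDataSet

energyUrl  = sprintf('%s/%d?include=energy', gridRoot, dataSetId);
densIntUrl = sprintf('%s/%d?include=density,intensity', gridRoot, dataSetId);
disp(energyUrl);
disp(densIntUrl);

% To download and unpack (requires network access), uncomment:
% zipFile = websave('Pdyn_energy.zip', energyUrl);
% unzip(zipFile, 'Pdyn_energy');
```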
NOTE: Grid data were updated in the May 2015 release to reflect the introduction of the next generation of the Allen Mouse Common Coordinate Framework (CCFv3). Grid data from the October 2014 release (mapped to CCFv2) can be accessed through our data download server (see instructions).
The expression data grid can be viewed in the Brain Explorer® 2 desktop program. Each grid voxel is rendered as a colorized sphere where the diameter represents expression energy and the color encodes expression intensity. In addition, a preview of the expression data grid is shown on the Web application as a series of maximum density projection images.
Example Matlab code snippet to read in the 200 µm energy grid volume:
% Download and unzip the energy grid file for Pdyn SectionDataSet
% 200 micron volume size
sizeGrid = [67 41 58];
% ENERGY = 3-D matrix of expression energy grid volume
fid = fopen('Pdyn_P56_coronal_71717084/energy.raw', 'r', 'l' );
ENERGY = fread( fid, prod(sizeGrid), 'float' );
fclose( fid );
ENERGY = reshape(ENERGY,sizeGrid);
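% Display one coronal and one sagittal section
figure; imagesc(squeeze(ENERGY(34,:,:))); colormap(gray(256)); axis equal;
figure; imagesc(squeeze(ENERGY(:,:,28))'); colormap(gray(256)); axis equal;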
Expression statistics can be computed for each structure delineated in the reference atlas by combining/unionizing grid voxels with the same 3-D structural label. While the reference atlas is typically annotated at the lowest level of the tree, statistics at upper level structures can be obtained by combining measurements of the hierarchical children. This process produces expression density, intensity and energy measurements for each experiment and structure of interest.
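A minimal sketch of the idea on toy data follows. It uses a plain mean over voxels that contain data, which is an assumption made for illustration; the pipeline's exact aggregation may differ (see the informatics processing whitepaper). The structure IDs are hypothetical.

```matlab
% Sketch: structure-level statistic from grid voxels (TOY data).
% ASSUMPTIONS: a plain mean over voxels that contain data is used here
% for illustration; the pipeline's exact aggregation may differ.
% Structure IDs are HYPOTHETICAL.
ANOGD  = [11 11 12 12 20 20];    % grid annotation label per voxel
ENERGY = [ 2  4  6 -1  8 10];    % expression energy; -1 marks "no data"

parentIds = [11 12];             % parent structure = union of children 11 and 12

inStructure = ismember(ANOGD, parentIds);
hasData     = ENERGY >= 0;       % skip voxels flagged with -1
meanEnergy  = mean(ENERGY(inStructure & hasData));

fprintf('mean energy over structure: %.2f\n', meanEnergy);
```

Excluding the -1 voxels matters most for sagittal SectionDataSets, where a large share of the grid carries no data.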
Expression statistics are encapsulated as a StructureUnionize object associated with one Structure and one SectionDataSet and can be downloaded via RMA.
StructureUnionize data is used in the web application to display expression summary bar graphs for a set of coarse structures.
An expression grid search service has been implemented to allow users to instantly search over the ~25,000 SectionDataSets to find genes with specific expression patterns:
- The Differential Search function allows users to find genes which have higher expression in one structure (or set of structures) compared to another structure (or set of structures).
- The Correlation Search function enables the user to find genes that have a similar spatial expression profile to a seed gene when compared over a user-specified domain.
The expression grid search service is available through both the Web application and API.
To perform a Differential Search, a user specifies a set of target structures and a set of contrast structures. In the service, the set of voxels belonging to any of the target structures forms the target voxel set, and voxels belonging to any of the contrast structures form the contrast voxel set. For each SectionDataSet a fold change is computed as the ratio of average expression energy in the target voxel set over the average expression energy in the contrast voxel set. The return list is sorted in descending order by fold-change.
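The fold-change ranking can be sketched on toy data as follows. The matrix layout (rows as SectionDataSets, columns as voxels) and the logical voxel masks are assumptions made for the example, and every voxel is assumed to contain valid data.

```matlab
% Sketch: fold change for a differential search (TOY data).
% Rows are SectionDataSets, columns are voxels. ASSUMPTION: voxel sets
% are given as logical masks and all voxels contain valid data.
energy = [ 8  6  1  1;           % data set 1
           2  2  2  2;           % data set 2
           1  3  4  4];          % data set 3
targetVox   = logical([1 1 0 0]);    % voxels in any target structure
contrastVox = logical([0 0 1 1]);    % voxels in any contrast structure

% Ratio of mean target energy to mean contrast energy, per data set
foldChange = mean(energy(:, targetVox), 2) ./ mean(energy(:, contrastVox), 2);

% Rank data sets by descending fold change
[sortedFold, order] = sort(foldChange, 'descend');
disp([order sortedFold]);
```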
Example: Differential search for genes with higher expression in the thalamus than the isocortex
- Pipe1: Set up the contrast structure list by finding the structure isocortex within the Mouse Brain ontology
- Pipe2: Set up the target structure list by finding the structure thalamus within the Mouse Brain ontology
- Connect the two pipes to service::mouse_differential to perform the differential search
- Visualize the same search result in the web application
See the connected service page for definitions of service::mouse_differential parameters.
Figure: Screenshot of top returns of a differential search for genes with higher expression in the thalamus than the isocortex. Mini-expression summary graphs show enrichment in the thalamus (red) compared to other brain regions.
To perform a Correlation search, a user selects a seed SectionDataSet and a domain over which the similarity comparison is to be made. All voxels belonging to any of the domain structures form the domain voxel set. Pearson’s correlation coefficient is computed between the domain voxel set from the seed SectionDataSet and every other SectionDataSet in the Product. The return list is sorted by descending correlation coefficient.
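A small sketch of this ranking on toy data is shown below, using the base function `corrcoef`. The matrix layout and the domain mask are assumptions made for the example; unlike the service, the loop also scores the seed against itself.

```matlab
% Sketch: correlation search over a domain voxel set (TOY data).
% Rows are SectionDataSets, columns are voxels; row 1 is the seed.
energy = [1 2 3 4 9;
          2 4 6 8 0;             % perfectly correlated with seed over domain
          4 3 2 1 5;             % perfectly anti-correlated over domain
          1 3 2 4 7];
domainVox = logical([1 1 1 1 0]);    % voxels in any domain structure

seed  = energy(1, domainVox);
nSets = size(energy, 1);
r = zeros(nSets, 1);
for k = 1:nSets
    c = corrcoef(seed, energy(k, domainVox));   % 2x2 correlation matrix
    r(k) = c(1, 2);                             % Pearson's r vs. the seed
end

% Rank by descending correlation (the seed itself scores 1)
[sortedR, order] = sort(r, 'descend');
disp([order sortedR]);
```

The last column of `energy` lies outside the domain, so it has no effect on the coefficients.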
Example: Correlation search for genes with similar expression to the sagittal Pdyn SectionDataSet
- Pipe: Set up the seed SectionDataSet by finding the sagittal SectionDataSet for gene Pdyn
- Connect the pipe to service::mouse_correlation to perform the correlation search
- Visualize the same search result in the Web application
See the connected service page for definitions of service::mouse_correlation parameters.
Figure: Screenshot of top returns of a correlation search for genes with similar expression as the sagittal Pdyn SectionDataSet.
In order to perform these computations quickly over the entire data set, only a subset of voxels is loaded in memory. The full expression grid is 67 x 41 x 58 = 159,326 voxels, spans both hemispheres and includes background voxels. Loading all voxels for all image series into memory would require 14 GB of RAM. To reduce memory requirements and increase the efficiency of calculations, voxels spanning over 80% of all experiments were identified. Only these ~26,000 voxels are used in the “full” search service, which requires 4 GB of RAM and partially spans one hemisphere.
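One way to picture the voxel selection is the sketch below. It reads "spanning" as "contains data" (a value other than -1), which is an assumption, and it uses a tiny toy matrix rather than real grid data.

```matlab
% Sketch: keep only voxels that contain data in most experiments.
% ASSUMPTIONS: "spanning" is read as "has data" (value not -1), and the
% tiny matrix below is TOY data. Rows = experiments, columns = voxels.
gridData = [ 1  2 -1  4 -1;
             1 -1 -1  3  2;
             2  2 -1  5  1;
             3  1  0  2  4;
             1  2 -1  1  3;
             2  3 -1  2  1];

coverage = mean(gridData ~= -1, 1);  % fraction of experiments with data, per voxel
keep     = coverage > 0.8;           % voxels retained for the search

fprintf('full grid voxels: %d\n', 67*41*58);
fprintf('toy voxels kept: %d of %d\n', nnz(keep), numel(keep));
```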
To take advantage of the data on both hemispheres in coronal data, a second “coronal only” search service is also available as an option. The coronal service spans both hemispheres covering 58,387 voxels and searches over the ~4,000 coronal image series.
It should be noted that this on-the-fly search service is derived from a fully automated processing pipeline. False positive and false negative results can occur due to artifacts on the tissue section or slide and/or algorithmic inaccuracies. Users should confirm results with visual inspection of the ISH images.