The Allen Institute for Brain Science is pleased to announce that, in collaboration with Amazon’s AWS Open Data initiative, we have released a set of more than 650,000 brain section images treated with in situ hybridization (ISH) to highlight the spatial distribution of over 20,000 genes. A full description of the dataset is available here:
Tools for accessing the images are provided and demonstrated here (you will need to clone the repository to run the notebook):
The data and metadata are accessible to anyone with AWS credentials. The Jupyter notebook referenced above uses boto3, the AWS SDK for Python, to retrieve both. Because the dataset is part of the AWS Open Data initiative, it can be downloaded free of charge.
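For readers who want a feel for the retrieval step before opening the notebook, here is a minimal sketch of fetching a metadata file and an image with boto3. It is not the notebook's code: the helper names (`make_s3_client`, `load_json_metadata`, `download_image`) are made up for illustration, and the bucket name and object keys are left as parameters because they are not stated in this post. Take the real values from the dataset description and the notebook.

```python
import json


def make_s3_client():
    """Create an S3 client from whatever AWS credentials are configured locally."""
    # Imported here so the helpers below can be used with any S3-like client.
    import boto3
    return boto3.client("s3")


def load_json_metadata(client, bucket, key):
    """Fetch one JSON metadata object and deserialize it into Python objects.

    `bucket` and `key` are placeholders; this post does not give the real ones.
    """
    response = client.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read().decode("utf-8"))


def download_image(client, bucket, key, local_path):
    """Download one TIFF image to `local_path` and return that path."""
    client.download_file(bucket, key, local_path)
    return local_path
```

Typical use would be to call `make_s3_client()` once and pass the resulting client to the other two helpers.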
The data is provided as TIFF images for maximum portability. Metadata is provided as JSON files, which can be deserialized into native data structures (such as Python dicts) for exploration in Python or R. The dataset is divided into a series of section_data_sets, each representing a series of images taken from a specific donor and treated to highlight a specific gene. The metadata for the entire dataset maps each section_data_set to its highlighted gene and includes some data about the geometry of the tissue sample. The metadata associated with an individual section_data_set provides greater detail about the tissue sample, as well as the size and resolution of each image. Images are provided at multiple resolutions so that users can preview small, low-quality images before downloading larger, high-quality images for analysis.
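To illustrate how the two levels of metadata might be used once deserialized, here is a rough sketch that groups section_data_sets by gene and picks the smallest available version of an image for previewing. The dict layout and the field names (`"gene"`, `"width"`, `"height"`, `"key"`) are assumptions made up for this example, not the actual schema. Check the real JSON files for the true structure.

```python
def index_by_gene(dataset_metadata):
    """Group section_data_set ids by the gene they highlight.

    Assumes (hypothetically) a dict of {section_data_set_id: {"gene": name, ...}}.
    """
    by_gene = {}
    for set_id, entry in dataset_metadata.items():
        by_gene.setdefault(entry["gene"], []).append(set_id)
    return by_gene


def pick_preview(image_variants):
    """Return the variant with the fewest pixels, suitable for a quick preview.

    Assumes (hypothetically) a list of dicts with "width", "height", and "key".
    """
    return min(image_variants, key=lambda v: v["width"] * v["height"])


# Made-up example values, for illustration only.
example_dataset = {
    "set_a": {"gene": "GeneX"},
    "set_b": {"gene": "GeneY"},
    "set_c": {"gene": "GeneX"},
}
example_variants = [
    {"width": 8000, "height": 6000, "key": "full.tif"},
    {"width": 500, "height": 375, "key": "small.tif"},
]

genes = index_by_gene(example_dataset)
preview = pick_preview(example_variants)
```

Previewing the smallest variant first keeps downloads light until you know which section_data_sets are worth fetching at full resolution.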