Injection coverage matrix for thalamus

Good day,

I was wondering what would be the easiest way to create an injection coverage matrix for the thalamus?
I understand it is possible to run a source search for the thalamus and look at the returned individual experiments and their associated ‘Inj site vol’, but I would like to aggregate this information and get a percentage coverage for each of the thalamic nuclei.

Many thanks!

Hi Polina,

Welcome to the community forums!

If I understand correctly, you would like to build a matrix whose:

  • rows are injection experiments
  • columns are thalamic substructures
  • values are the fraction of the substructure covered by the injection

You can do this using the AllenSDK, an open-source Python package for accessing and working with Allen Institute data. Here is a script that downloads the required data, builds the matrix, and saves it out as a CSV:

from allensdk.core.mouse_connectivity_cache import MouseConnectivityCache

import pandas as pd

mcc = MouseConnectivityCache()
structure_tree = mcc.get_structure_tree()

# obtain the unique numeric identifier for the thalamus
thalamus_id = structure_tree.get_structures_by_name(["Thalamus"])[0]["id"]

# obtain all experiments with injections into the thalamus
th_experiment_ids = [
    experiment["id"] 
    for experiment 
    in mcc.get_experiments(injection_structure_ids=[thalamus_id])
]

# get summary injection & projection values for each thalamic substructure on each thalamic injection
# note that these substructures may spatially contain one another!
th_unionizes = mcc.get_structure_unionizes(
    experiment_ids=th_experiment_ids,
    structure_ids=[thalamus_id],
    include_descendants=True
)

# For readability, add a column with full structure names
th_unionizes["structure_name"] = th_unionizes["structure_id"].apply(
    lambda stid: structure_tree.nodes([stid])[0]["name"]
)

# Obtain a matrix of injection signal volume
injection_volume = pd.pivot_table(
    th_unionizes[th_unionizes["is_injection"]], 
    values="projection_volume", 
    index="experiment_id", 
    columns="structure_name",
    fill_value=0.0
)

# obtain a matrix of total structure volume
structure_volume = pd.pivot_table(
    th_unionizes, 
    values="volume", 
    index="experiment_id", 
    columns="structure_name",
    fill_value=0.0
)

fraction_covered = injection_volume / structure_volume
fraction_covered.to_csv("fraction_covered.csv")
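If you want a quick look at the result, the matrix can be summarized with ordinary pandas operations. Here is a minimal sketch on a toy stand-in matrix (the structure names and experiment ids below are placeholders, not real AllenSDK output):

```python
import pandas as pd

# toy stand-in for the fraction_covered matrix produced above
# (rows: experiments, columns: structures, values: fraction covered)
fraction_covered = pd.DataFrame(
    {
        "Structure A": [0.10, 0.00, 0.45],
        "Structure B": [0.00, 0.80, 0.05],
    },
    index=[101, 102, 103],  # hypothetical experiment ids
)

# best single-experiment coverage achieved for each structure
best_coverage = fraction_covered.max(axis=0)

# number of experiments whose injection touched each structure at all
n_experiments = (fraction_covered > 0).sum(axis=0)

print(best_coverage)
print(n_experiments)
```

The same two lines work unchanged on the real matrix after reading it back with pd.read_csv("fraction_covered.csv", index_col=0).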

For more information on AllenSDK, please see the documentation, particularly the mouse connectivity section and notebook of examples. To learn more about the data and processing pipeline, check out the whitepapers. I also find the atlas viewer to be a good way to get a sense of these structures and their spatial relationships.

Cheers,
Nile

Dear Nile, many thanks for your reply; I will give this a go.
As a first step, I am trying to install allensdk (pip install allensdk) but am running into an error documented here: https://github.com/AllenInstitute/AllenSDK/issues/426

Do you happen to know if there is indeed a new release of the SDK addressing this issue?

Cheers,
Polina

Well, that’s no good :slightly_frowning_face:.

I couldn’t immediately reproduce this error (on Ubuntu 18.04 / Python 3.7). Could you reply with:

  1. the OS and Python version you are using
  2. a copy-paste of the error that pip install -U allensdk emits.

and we’ll investigate.

Thank you!

Thanks!

I am on macOS Catalina and use Python 3.8.2.
And this is the error I get:
"…
Collecting tables==3.5.1
Using cached tables-3.5.1.tar.gz (8.3 MB)
ERROR: Command errored out with exit status 1:
command: /Library/Frameworks/Python.framework/Versions/3.8/bin/python3.8 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/96/r5m6dh8s0kg68xddvd7llszx4v0phs/T/pip-install-dtginugz/tables/setup.py'"'"'; __file__='"'"'/private/var/folders/96/r5m6dh8s0kg68xddvd7llszx4v0phs/T/pip-install-dtginugz/tables/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/96/r5m6dh8s0kg68xddvd7llszx4v0phs/T/pip-pip-egg-info-pib5932i
cwd: /private/var/folders/96/r5m6dh8s0kg68xddvd7llszx4v0phs/T/pip-install-dtginugz/tables/
Complete output (12 lines):
/var/folders/96/r5m6dh8s0kg68xddvd7llszx4v0phs/T/H5closey8mcy32w.c:2:5: warning: implicit declaration of function 'H5close' is invalid in C99 [-Wimplicit-function-declaration]
H5close();
^
1 warning generated.
ld: library not found for -lhdf5
clang: error: linker command failed with exit code 1 (use -v to see invocation)
* Using Python 3.8.2 (v3.8.2:7b3ab5921f, Feb 24 2020, 17:52:18)
* USE_PKGCONFIG: False
… ERROR:: Could not find a local HDF5 installation.
You may need to explicitly state where your local HDF5 headers and
library can be found by setting the HDF5_DIR environment
variable or by using the --hdf5 command-line option.
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
WARNING: You are using pip version 20.1; however, version 20.1.1 is available.
You should consider upgrading via the '/Library/Frameworks/Python.framework/Versions/3.8/bin/python3.8 -m pip install --upgrade pip' command."

Here’s what I think is going on:

AllenSDK pins tables to version 3.5.1 in order to avoid the issue that you linked above. Version 3.5.1 of tables does not ship pre-built wheels for Python 3.8, so when you pip install tables into Python 3.8, the tables setup script tries to compile and link its C components. This is failing because the linker cannot find HDF5.

Probably the easiest way to work around this problem would be to create a Python 3.7 environment using venv or conda and install AllenSDK into that environment.
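For example, with conda (the environment name here is arbitrary):

```shell
# create and activate a fresh Python 3.7 environment
conda create -n allensdk-env python=3.7
conda activate allensdk-env

# install AllenSDK; pip can now use a pre-built tables wheel
# instead of compiling against a local HDF5
pip install allensdk
```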

Nile, huge thanks! The SDK is finally installed, and I could run the script and get back a .csv with the thalamic substructures and the fractions of injection coverage.

One question I have: is there a way to somehow add up the injection coverage fractions per substructure to get one total value representing the total coverage from all experiments for a particular region (to get a better feeling for how thorough the available data is)? I am aware that it is not straightforward to add up the fractions, given that the experiments cover spatially overlapping substructure regions. Maybe you have a solution!

Many thanks,
Polina

Glad to hear you were able to generate the matrix!

As you say, the region-level summary data (structure unionizes) won’t tell you about the within-region distribution of the injection signal. However, we also make available the voxel-level grid data on which these structure-level summaries are based. See this section of the mouse connectivity examples for some visuals and examples and the informatics whitepaper for more details on how these data are used in our pipeline.

You mentioned that your goal is to get a feel for the distribution of the available data. One potential starting point would be to ask, for each thalamic voxel, “how many experiments injected into this voxel?”:

from allensdk.core.mouse_connectivity_cache import MouseConnectivityCache

import numpy as np
import nrrd

# setting isometric resolution to 50 microns for speedy downloads & low memory use
# other options are 10, 25, & 100
mcc = MouseConnectivityCache(resolution=50)
structure_tree = mcc.get_structure_tree()

# obtain the unique numeric identifier for the thalamus
thalamus_id = structure_tree.get_structures_by_name(["Thalamus"])[0]["id"]

# obtain all experiments with injections into the thalamus
th_experiment_ids = [
    experiment["id"] 
    for experiment 
    in mcc.get_experiments(injection_structure_ids=[thalamus_id])
]

# count a voxel as "injection" if it is at least half-full of injection signal
inj_threshold = 0.5

# loop over the thalamic injection experiments, counting injections per voxel
inj_counts = None
for exp_id in th_experiment_ids:

    # injection density is "how much of this voxel is filled with injection signal"
    # it ranges from 0 to 1
    inj_density, header = mcc.get_injection_density(exp_id)
    
    inj_density[inj_density < inj_threshold] = 0.0
    inj_density[inj_density >= inj_threshold] = 1.0
    
    if inj_counts is None:
        inj_counts = inj_density
    else:
        inj_counts += inj_density
        
# write the counts out to a nrrd file
nrrd.write("injection_counts.nrrd", inj_counts, header)

# write itksnap-friendly files for overlaying the injection counts with structures
mcc.get_reference_space().write_itksnap_labels("annotation.nrrd", "label_description.txt")

This saves out a counts volume and a copy of the annotation; you can load these into a 3D visualization tool like ITK-SNAP in order to see the spatial distribution of thalamic injections at a glance:


(In this example, the voxel under the cursor is part of the mediodorsal nucleus and was at least 50% covered by the injection in 7 experiments.)

In addition to visualizing these data, you can also compute on them. For instance, if you masked a particular substructure, you could

  1. count the number of voxels within the structure that have been injected at least n times
  2. divide by the total number of voxels in the structure

in order to get a single “fraction of voxels covered across all experiments” number for the structure.
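The two steps above can be sketched on toy arrays like so (in a real analysis the structure mask would come from the reference space obtained via mcc.get_reference_space(), and inj_counts from the loop above; the shapes and values here are made up for illustration):

```python
import numpy as np

# toy 3x3x3 volumes standing in for the real 50-micron grids
inj_counts = np.zeros((3, 3, 3))
inj_counts[0, 0, 0] = 7  # this voxel was injected in 7 experiments
inj_counts[0, 0, 1] = 1  # this voxel was injected in 1 experiment

# boolean mask for a hypothetical substructure occupying one face of the cube
structure_mask = np.zeros((3, 3, 3), dtype=bool)
structure_mask[0] = True  # 9 voxels belong to the structure

n = 2  # require at least this many injections per voxel

# step 1: count masked voxels injected at least n times
covered = np.count_nonzero(inj_counts[structure_mask] >= n)

# step 2: divide by the total number of voxels in the structure
fraction = covered / structure_mask.sum()
print(fraction)
```

Here only one of the nine masked voxels reaches the threshold, so the fraction is 1/9.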