I am new to the Allen Brain Observatory, and I am trying to understand how the data are organized and how I can access them with the AllenSDK. I followed some of the instructions from this tutorial, and I got confused by the different IDs available. For example, I used:
# Download cells for a set of experiments and convert to DataFrame
cells = boc.get_cell_specimens()
cells = pd.DataFrame.from_records(cells)
In the cells dataframe, there are “cell_specimen_id”, “experiment_container_id”, and “specimen_id”.
In the dataframe pvalb_exp_table, there are “id”, “donor_name”, and “specimen_name”.
Can you tell me, or point me to a tutorial that explains, the difference between these IDs? In particular, I would like to get a list of cell IDs that I can group by Cre line and animal.
These fields are indeed confusing. I don’t know that there is convenient documentation I can point you to, but I will try to unpack them here.
In the cell specimens table (the output of boc.get_cell_specimens()):
“cell_specimen_id” is a unique ID for the individual cell. This ID will also be in the data object for the session(s) that cell is imaged in.
“experiment_container_id” is a unique ID for the experiment container, which consists of three sessions imaged from the same group of cells. (see this diagram: Brain Observatory — Allen SDK dev documentation ). Each of these sessions also has a unique ID, and I’ll show that in a moment.
“specimen_id” is a unique ID for the subject (i.e. mouse) that the data was collected from. This is redundant with other fields.
When you use boc.get_experiment_containers(), you are getting a list of experiment containers that meet the given criteria.
“id” here is the experiment_container_id. (see comment below)
“donor_name” is the id for the subject. (This is the field that I use for identifying data by animal)
“specimen_name” combines the genotype and donor_name for a subject. This is how subjects are tracked in our internal management systems, but we often reduce this to the donor_name for simplicity.
If you were to use boc.get_ophys_experiments() to get a list of individual sessions, you will find that each entry has an “id” field, which is the id for that individual session, as well as an “experiment_container_id” field, which is the id of the experiment container that it belongs to. The fact that “id” can be different things for different queries has to do with the relational database that organizes all these fields, and (to me) is very confusing.
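One way to keep the two meanings of “id” apart is to rename the field as soon as the records come back. The snippet below is only a sketch: the records are hypothetical stand-ins shaped like the lists of dicts that boc.get_experiment_containers() and boc.get_ophys_experiments() return, the ID values are made up, and the helper rename_id and the name “ophys_experiment_id” are my own, not part of the AllenSDK.

```python
# Hypothetical records shaped like the AllenSDK outputs (lists of dicts).
# The ID values are invented for illustration.
containers = [{"id": 100, "donor_name": "111111", "cre_line": "Pvalb-IRES-Cre"}]
sessions = [
    {"id": 201, "experiment_container_id": 100, "session_type": "three_session_A"},
    {"id": 202, "experiment_container_id": 100, "session_type": "three_session_B"},
]


def rename_id(records, new_name):
    """Return copies of the records with the ambiguous "id" key renamed."""
    renamed = []
    for record in records:
        record = dict(record)  # copy so the caller's records are untouched
        record[new_name] = record.pop("id")
        renamed.append(record)
    return renamed


# In the container query, "id" is the experiment container ID.
containers = rename_id(containers, "experiment_container_id")
# In the session query, "id" is the ID of the individual imaging session.
sessions = rename_id(sessions, "ophys_experiment_id")
```

After this, every table refers to the container as “experiment_container_id”, so the tables can be joined on that field without guessing what “id” means.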
To get a list of cells based on Cre-line and animal, I would either:
1. Use the cell specimens table and filter by your Cre line and the specimen_id field.
or
2. Select containers (or individual sessions) by Cre line, then by donor_name, and then get the cell_specimen_ids from the data object. This only gives you the cell specimens within that given session.
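A variant that combines the two approaches is to look up each cell's container and group by that container's Cre line and donor. This is a rough sketch rather than a tested recipe: it assumes the container records carry “id”, “cre_line” and “donor_name”, and the cell records carry “cell_specimen_id” and “experiment_container_id”. The function name and the example records are hypothetical.

```python
from collections import defaultdict


def group_cells_by_cre_and_animal(containers, cells):
    """Map (cre_line, donor_name) -> list of cell_specimen_ids."""
    # In the container table, "id" is the experiment container ID.
    lookup = {c["id"]: (c["cre_line"], c["donor_name"]) for c in containers}
    groups = defaultdict(list)
    for cell in cells:
        key = lookup.get(cell["experiment_container_id"])
        if key is not None:  # skip cells whose container was filtered out
            groups[key].append(cell["cell_specimen_id"])
    return dict(groups)


# Hypothetical records; all values are invented for illustration.
example_containers = [
    {"id": 100, "cre_line": "Pvalb-IRES-Cre", "donor_name": "111111"},
    {"id": 101, "cre_line": "Cux2-CreERT2", "donor_name": "222222"},
]
example_cells = [
    {"cell_specimen_id": 9001, "experiment_container_id": 100},
    {"cell_specimen_id": 9002, "experiment_container_id": 100},
    {"cell_specimen_id": 9003, "experiment_container_id": 101},
    {"cell_specimen_id": 9004, "experiment_container_id": 999},  # no matching container
]
example_groups = group_cells_by_cre_and_animal(example_containers, example_cells)
```

With real data, the two arguments would be the outputs of boc.get_experiment_containers() and boc.get_cell_specimens(), taken before converting them to DataFrames.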
and in summary_stats, notably for container 670396939 (Pvalb in VISp), I get 13 for N_cells.
However, when I try to extract the cell traces for this container:
experiment_container_id = 670396939
# Each container has 3 imaging sessions, each with its own data file,
# so we need to specify which session we want.
# boc.get_ophys_experiments lists the imaging sessions for a given container ID.
sessions = boc.get_ophys_experiments(experiment_container_ids=[experiment_container_id])
sessions_df = pd.DataFrame(sessions)
# Pick the session that includes the natural scenes stimulus.
session_id_natural_scene = boc.get_ophys_experiments(experiment_container_ids=[experiment_container_id], stimuli=["natural_scenes"])[0]['id']
# Get the data.
# get_ophys_experiment_data returns the data object that gives access to the NWB file for a SINGLE imaging session.
data_set = boc.get_ophys_experiment_data(session_id_natural_scene)
# From there we can access different traces or data (ROIs, maximum projection, DF/F traces...).
# get_dff_traces returns 2 objects: the timestamps and the DF/F traces.
ts, dff = data_set.get_dff_traces()
I only get 7 traces.
I therefore understand that, for a given container, sessions A, B and C (which correspond to different experimental protocols) do not all record the same cells. Am I right?
Sorry for the delay - just getting back from being out for a short bit. This sounds completely right: there were only 7 cells identified in this specific session, but 13 identified across all three sessions. When I have a minute later today I’ll confirm just to be sure.
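To check this kind of mismatch yourself, you can compare the number of cells in each session with the number of distinct cells across all sessions of the container. The sketch below is an assumption-laden illustration: summarize_cells is a hypothetical helper, and the session labels and cell IDs are made up rather than taken from container 670396939.

```python
def summarize_cells(cell_ids_by_session):
    """Return (cells per session, number of distinct cells across sessions)."""
    per_session = {
        session_id: len(set(cell_ids))
        for session_id, cell_ids in cell_ids_by_session.items()
    }
    all_cells = set()
    for cell_ids in cell_ids_by_session.values():
        all_cells.update(cell_ids)
    return per_session, len(all_cells)


# Hypothetical cell IDs for the three sessions of one container.
example_ids = {
    "session_A": [1, 2, 3],
    "session_B": [2, 3, 4, 5],
    "session_C": [1, 5, 6],
}
per_session, n_container_cells = summarize_cells(example_ids)
```

With the AllenSDK, the input could be built by calling boc.get_ophys_experiment_data(session["id"]).get_cell_specimen_ids() for each session returned by boc.get_ophys_experiments(experiment_container_ids=[...]). The container-level count can exceed every per-session count because a cell only needs to be identified in one of the sessions to be counted.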