I am new to the Allen Brain Observatory, and I am trying to understand how the data are organized and how I can access them with the AllenSDK. I followed some of the instructions from this tutorial, and I got confused by the different IDs available. For example, I used
# Download cells for a set of experiments and convert to DataFrame
cells = boc.get_cell_specimens()
cells = pd.DataFrame.from_records(cells)
and in the cells DataFrame there are “cell_specimen_id”, “experiment_container_id”, and “specimen_id”.
In the DataFrame pvalb_exp_table, there are “id”, “donor_name”, and “specimen_name”.
Can you tell me, or point me to a tutorial that explains, the difference between these IDs? In particular, I would like to have a list of cell IDs that I could group by Cre line and animal.
These fields are indeed confusing. I don’t know that there is convenient documentation I can point you to, but I will try to unpack them here.
In the cell specimens table (the output of boc.get_cell_specimens()):
“cell_specimen_id” is a unique ID for the individual cell. This ID will also be in the data object for the session(s) that cell is imaged in.
“experiment_container_id” is a unique ID for the experiment container, which consists of three sessions imaged from the same group of cells. (see this diagram: Brain Observatory — Allen SDK dev documentation ). Each of these sessions also has a unique ID, and I’ll show that in a moment.
“specimen_id” is a unique ID for the subject (i.e. the mouse) that the data were collected from. This is redundant with other fields.
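If you want to check the first point for yourself, a minimal sketch like the one below compares the cell IDs stored in one session's data object against the cell specimens table. It assumes `boc` is a BrainObservatoryCache instance you have already created and that `session_id` is a session ID taken from boc.get_ophys_experiments(). The helper name `session_cells_in_table` is made up for this illustration.

```python
def session_cells_in_table(boc, session_id, cell_records):
    """Split one session's cell IDs into those found / not found in the cell table.

    boc          -- a BrainObservatoryCache instance (assumed to be set up already)
    session_id   -- the "id" of one session from boc.get_ophys_experiments()
    cell_records -- the list of dicts returned by boc.get_cell_specimens()
    """
    # IDs listed in the cell specimens table
    table_ids = {rec["cell_specimen_id"] for rec in cell_records}

    # IDs stored in the session's data object (this downloads the NWB file)
    data_set = boc.get_ophys_experiment_data(session_id)
    session_ids = {int(i) for i in data_set.get_cell_specimen_ids()}

    found = sorted(session_ids & table_ids)
    missing = sorted(session_ids - table_ids)
    return found, missing

# Hypothetical usage, once boc and session_id exist:
# found, missing = session_cells_in_table(boc, session_id, boc.get_cell_specimens())
```

The function returns two sorted lists, so you can see directly whether any cell in the session is absent from the table.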
When you use boc.get_experiment_containers(), you get a list of the experiment containers that meet the given criteria.
“id” here is the experiment_container_id. (see comment below)
“donor_name” is the id for the subject. (This is the field that I use for identifying data by animal)
“specimen_name” combines the genotype and donor_name for a subject. This is how subjects are tracked in our internal management systems, but we often reduce this to the donor_name for simplicity.
If you were to use boc.get_ophys_experiments() to get a list of individual sessions, you will find that each entry has an “id” field, which is the id for that individual session, as well as an “experiment_container_id” field, which is the id of the experiment container that it belongs to. The fact that “id” can be different things for different queries has to do with the relational database that organizes all these fields, and (to me) is very confusing.
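To make the relationship between the two ID fields concrete, here is a small sketch that groups session IDs under the container they belong to. It works on a plain list of dicts shaped like the output of boc.get_ophys_experiments(). The records and ID values below are invented for illustration, and `sessions_by_container` is a hypothetical helper name, not part of the SDK.

```python
from collections import defaultdict

def sessions_by_container(experiments):
    """Map each experiment_container_id to the session IDs that belong to it.

    experiments -- list of dicts shaped like the output of
                   boc.get_ophys_experiments()
    """
    grouped = defaultdict(list)
    for exp in experiments:
        # In this table "id" is the session ID, not the container ID
        grouped[exp["experiment_container_id"]].append(exp["id"])
    return dict(grouped)

# Toy records with made-up IDs, only to show the shape of the result
toy_experiments = [
    {"id": 101, "experiment_container_id": 1, "session_type": "three_session_A"},
    {"id": 102, "experiment_container_id": 1, "session_type": "three_session_B"},
    {"id": 103, "experiment_container_id": 1, "session_type": "three_session_C"},
    {"id": 201, "experiment_container_id": 2, "session_type": "three_session_A"},
]

container_to_sessions = sessions_by_container(toy_experiments)
print(container_to_sessions)
```

With real data you would pass boc.get_ophys_experiments() in place of the toy list.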
To get a list of cells based on Cre-line and animal, I would either:
1. use the cell specimens table and filter by your Cre line and the specimen_id field,
or
2. select containers (or individual sessions) by Cre line, then by donor_name, and then get the cell_specimen_ids from the data object. This only gives you the specimens within that given session.
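Both options can be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe. The function names are made up. The first function assumes the Cre line is stored under "tld1_name" in the cell specimens table, so check the columns of your own table before relying on that. The second assumes `boc` is a BrainObservatoryCache instance and takes the union of cells across all sessions found for each animal.

```python
from collections import defaultdict

def group_cells_from_table(cell_records, cre_line):
    """Option 1: group cell_specimen_ids by specimen_id from the cell table.

    ASSUMPTION: the Cre line is stored under "tld1_name" in each record.
    cell_records -- list of dicts returned by boc.get_cell_specimens()
    """
    by_animal = defaultdict(list)
    for rec in cell_records:
        if rec["tld1_name"] == cre_line:
            by_animal[rec["specimen_id"]].append(rec["cell_specimen_id"])
    return dict(by_animal)

def group_cells_from_sessions(boc, cre_line):
    """Option 2: group cell_specimen_ids by donor_name via the data objects.

    Opens one data object per session, which downloads its NWB file,
    so this can be slow for a whole Cre line.
    """
    by_animal = defaultdict(set)
    for exp in boc.get_ophys_experiments(cre_lines=[cre_line]):
        data_set = boc.get_ophys_experiment_data(exp["id"])
        # Each data object only knows the cells of its own session,
        # so collect the union over all sessions of the same animal
        ids = data_set.get_cell_specimen_ids()
        by_animal[exp["donor_name"]].update(int(i) for i in ids)
    return {donor: sorted(ids) for donor, ids in by_animal.items()}

# Hypothetical usage, once boc exists:
# cells = boc.get_cell_specimens()
# per_animal = group_cells_from_table(cells, "Pvalb-IRES-Cre")
# per_animal = group_cells_from_sessions(boc, "Pvalb-IRES-Cre")
```

Note that the two sketches key the result differently: the first by specimen_id, the second by donor_name, matching the field each table exposes for the animal.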