Extracting data for all genes from an ROI

I was writing to ask about how we might obtain a full gene expression dataset for a handful of brain regions from the adult mouse (e.g. the DR and PAG).

We’re interested in getting data by anatomical region as opposed to voxel. Because the ABA interface already permits looking at differentially expressed genes between regions, it would be great if we could obtain the data used as the basis for that tool. My understanding is that the API does have quantified data by structure that we can access.

This information would be incredibly valuable to us. For instance, we’d want to leverage the expression data to refine the way we classify cell types in single-cell RNA sequencing.

A good starting point would be Structure Unionization (see this post). It gives the converse of what you want (all regions for one gene), but the values are already summarized by region, as you requested. I’m not sure how to get all genes for a few regions, but maybe someone else on here knows.

Here is the API call that would get you structure unionization data by region.

http://api.brain-map.org/api/v2/data/query.json?criteria=model::StructureUnionize,rma::criteria,structure[acronym$eq'DR']

This returns an entry for each of the tens of thousands of section data sets (experiments) in the atlas, so you will likely want to refine it further to get just what you need. There is a lot of documentation available, such as the links in Jeremy’s post above and these:
http://help.brain-map.org/display/api/Allen+Brain+Atlas+API
http://help.brain-map.org/display/api/RESTful+Model+Access+(RMA)
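
Since the query above matches tens of thousands of records, it usually has to be fetched in pages. Here is a minimal Python sketch of one way to do that, assuming the JSON response carries "success", "total_rows" and "msg" fields and that rma::options accepts start_row and num_rows. The helper names (build_query_url, fetch_json, fetch_all) and the page size of 2000 are my own choices for illustration, not part of the API.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_ROOT = "http://api.brain-map.org/api/v2/data/query.json"
# Characters that RMA syntax uses and that we leave unescaped for readability.
RMA_SAFE = ":,$[]()'="


def build_query_url(criteria, start_row=0, num_rows=2000):
    """Append paging options to an RMA criteria string and return the full URL."""
    options = f"rma::options[start_row$eq{start_row}][num_rows$eq{num_rows}]"
    return f"{API_ROOT}?criteria={quote(criteria + ',' + options, safe=RMA_SAFE)}"


def fetch_json(url):
    """Download one page and parse it as JSON (makes a network request)."""
    with urlopen(url) as resp:
        return json.load(resp)


def fetch_all(criteria, page_size=2000, fetch=fetch_json):
    """Collect every row for a query by walking start_row forward page by page."""
    rows = []
    start = 0
    while True:
        page = fetch(build_query_url(criteria, start, page_size))
        if not page.get("success"):
            raise RuntimeError(f"query failed: {page.get('msg')}")
        rows.extend(page["msg"])
        start += len(page["msg"])
        # Stop on an empty page or once the reported total has been reached.
        if not page["msg"] or start >= page["total_rows"]:
            break
    return rows
```

To pull the DR records you would call fetch_all("model::StructureUnionize,rma::criteria,structure[acronym$eq'DR']"). The fetch argument is only there so the paging logic can be exercised without hitting the server.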

Thank you! In this JSON, is there information about the gene being looked at in each section dataset? Or am I misunderstanding what you’re doing here? Really appreciate the help.

No, as written, that information is not in the response, but it could be added. Perhaps you can describe more explicitly what it is you need. So far we have identified expression per structure with gene information added. Anything else?

This is very close! The last step would be to find out what gene is being assayed in each of the thousands of “section_data_set_ids” (and perhaps other information about each experiment) so that we can look at gene expression in the region, and potentially restrict our analyses to certain experiments.

How about this one?
http://api.brain-map.org/api/v2/data/query.json?criteria=model::SectionDataSet,rma::criteria,structure_unionizes(structure[acronym$eq'DR']),genes,rma::include,genes,structure_unionizes
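
Once the "msg" list from that query has been parsed, a small Python helper can turn the nested records into a flat gene-by-experiment table. This is only a sketch: I am assuming each SectionDataSet record has an "id", a "genes" list whose items have an "acronym", and a "structure_unionizes" list whose items have "structure_id", "expression_energy" and "expression_density". Check the field names against a real response. The function name flatten_datasets and the optional structure_id filter are hypothetical; the filter is there in case the include returns unionize records for structures other than the one you asked about.

```python
def flatten_datasets(datasets, structure_id=None):
    """Return one dict per (experiment, gene, unionize record)."""
    rows = []
    for ds in datasets:
        genes = ds.get("genes") or []
        for su in ds.get("structure_unionizes") or []:
            # Optionally keep only the unionize records for one structure.
            if structure_id is not None and su.get("structure_id") != structure_id:
                continue
            for gene in genes:
                rows.append({
                    "section_data_set_id": ds.get("id"),
                    "gene_acronym": gene.get("acronym"),
                    "structure_id": su.get("structure_id"),
                    "expression_energy": su.get("expression_energy"),
                    "expression_density": su.get("expression_density"),
                })
    return rows
```

Experiments with an empty "genes" list produce no rows here, so the flattened table can be shorter than the list of experiments.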

So this seems great! I’m assuming these are the Structure Unionize attributes for all the experiments specifically in DR?

Just a couple of quick q’s: I appended “rma::options[num_rows$eqall]” to the end of that last query, but both the max number of displayed rows (25971) and total rows (26038) are fewer than the total (26069) in the first query that you sent me:
http://api.brain-map.org/api/v2/data/query.json?criteria=model::StructureUnionize,rma::criteria,structure[acronym$eq'DR']

Any idea what might be going on here, with both the row display and overall number of rows?

Adding these additional fields is the equivalent of doing a database “INNER JOIN”. It is likely that not every record will have a corresponding record in another table.

If you are really interested in tracking it down further, I am going to refer you to the API documentation to write queries that return more information that you can use to help your investigation.
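
If the gap does come from the join, one way to see which records dropped out is to compare IDs from the two result sets. A rough Python sketch, assuming each StructureUnionize record carries a "section_data_set_id" and each SectionDataSet record carries an "id" (the function name missing_from_join is made up for this example):

```python
def missing_from_join(unionize_rows, joined_datasets):
    """List experiment IDs present in the unionize results but absent after the join."""
    unionize_ids = {row["section_data_set_id"] for row in unionize_rows}
    joined_ids = {ds["id"] for ds in joined_datasets}
    return sorted(unionize_ids - joined_ids)
```

Any IDs it returns are experiments that have a unionize record for the structure but were not returned by the joined query, for example because no gene record is linked to them. Those can then be queried one at a time to see what is different about them.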