How to find specific cell types that express a certain gene?

Hi everyone! My name is Ramsi and I am looking at the function of Opsin 3, a non-visual opsin (GPCR), in the brain. Specifically, I am looking to find cells in the hippocampus, cortex, and thalamus that express OPN3. From there, I would like to find out what other GPCRs are coexpressed with OPN3 in certain cell types. So as a summary, I would need to:

  1. Identify the cells that express OPN3
  2. In those cells, identify which other GPCRs are coexpressed

Any recommendations on how I can achieve that? Ideally I would like to download the data and analyze it. Thank you in advance!

Hi Ramsi,

I can show you how to answer these questions using the ABC Atlas GUI, while others can demonstrate how to do this programmatically.

By viewing OPN3 gene expression side by side with cell types in the thalamus (supertypes are shown), I can see that supertype 0662 TH Prkcd Grin2c Glut_9 has high OPN3 expression, as does supertype 0663 TH Prkcd Grin2c Glut_10. Hovering the cursor over the MERFISH dots enlarges them, so you can follow the same cells across the different filters. Click here to see these images in ABC Atlas.

To see which other GPCRs are coexpressed, you can type in different GPCRs in the gene search box and color the plot for gene expression, and compare to see if expression is present in the same cell types. Click here to see the image in ABC Atlas.
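If you later want to reproduce this side-by-side comparison outside the GUI, the underlying question ("in which cell types is each gene detected?") reduces to a group-by over cells. A minimal pandas sketch, assuming you already have a cells × genes table of raw counts and one cell-type label (e.g. supertype) per cell; the gene names `Gpcr_a` and the counts are made up, and the detection threshold `min_count` is an arbitrary choice:

```python
import pandas as pd


def fraction_expressing_by_type(counts, cell_types, min_count=1):
    """Fraction of cells in each cell type with >= min_count counts for each gene.

    counts: DataFrame of raw counts, rows = cells, columns = genes.
    cell_types: Series of cell-type labels (e.g. supertypes), same index as `counts`.
    """
    detected = counts >= min_count              # boolean cells x genes
    return detected.groupby(cell_types).mean()  # rows = cell types, values in [0, 1]


# Made-up counts for six cells in three hypothetical cell types
counts = pd.DataFrame(
    {"Opn3": [3, 0, 2, 0, 1, 0], "Gpcr_a": [1, 0, 4, 0, 0, 2]},
    index=[f"cell{i}" for i in range(6)],
)
cell_types = pd.Series(["A", "A", "A", "B", "B", "C"], index=counts.index, name="supertype")

summary = fraction_expressing_by_type(counts, cell_types)
print(summary.sort_values("Opn3", ascending=False))
```

Sorting by the `Opn3` column gives a ranked list of cell types. The per-cell labels would have to come from the atlas's cluster annotations, which this sketch does not load.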

Hi Ramsi,

Here is the programmatic solution that Rachel alluded to (sorry for the delay).

Opn3 is not in the gene panel used for our Whole Mouse Brain spatial transcriptomics data, so I’m going to assume you want to look at the single cell transcriptomics data (which, I’ll say up front, is very large). If I am wrong and you do want to download and look at the spatial transcriptomics data (Opn3 is one of the genes that were imputed for our spatial data), the examples I am pointing you towards ought to work for that too, with some minor modifications.

The Python API for downloading this data is provided by the abc_atlas_access library. The official documentation for that (and a very thorough set of example Jupyter notebooks) can be found at this site.

Cutting straight to the chase, this GitHub comment shows how to use abc_atlas_access to download all of our single cell transcriptomics data. Again, as pointed out in the issue, this is 400 GB of data, so make sure you have enough space for it.

If you follow the code in that GitHub post, you will have downloaded 4 million cells profiled across 32,000 genes, spread out over a couple dozen .h5ad files.
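Since you only need the hippocampus, cortex, and thalamus, you may not need every file. The file names appear to encode the dissected region (e.g. `WMB-10Xv2-HY/raw`, used in the path example in this post), so a small filter over file names could narrow the download. A sketch, assuming names of the form `WMB-10Xv2-<region>[-<n>]/<raw|log2>` and the region acronyms `HPF`, `Isocortex`, and `TH`; the example list is hypothetical, so check the real names for your version of the data:

```python
def select_region_files(file_names, regions=("HPF", "Isocortex", "TH"), kind="raw"):
    """Keep names like 'WMB-10Xv2-<region>[-<n>]/<kind>' whose region is in `regions`.

    The naming pattern is an assumption based on names such as 'WMB-10Xv2-HY/raw'.
    """
    selected = []
    for name in file_names:
        stem, _, suffix = name.partition("/")
        if suffix != kind:          # e.g. skip 'log2' files when kind='raw'
            continue
        parts = stem.split("-", 2)  # 'WMB-10Xv2-Isocortex-1' -> ['WMB', '10Xv2', 'Isocortex-1']
        if len(parts) < 3:
            continue
        region = parts[2]
        # exact match, or a numbered chunk such as 'Isocortex-1'
        if any(region == r or region.startswith(r + "-") for r in regions):
            selected.append(name)
    return selected


# Hypothetical list of file names, modeled on 'WMB-10Xv2-HY/raw'
example_names = [
    "WMB-10Xv2-HY/raw",
    "WMB-10Xv2-HPF/raw",
    "WMB-10Xv2-HPF/log2",
    "WMB-10Xv2-Isocortex-1/raw",
    "WMB-10Xv2-TH/raw",
    "WMB-10Xv2-CTXsp/raw",
]
print(select_region_files(example_names))
# ['WMB-10Xv2-HPF/raw', 'WMB-10Xv2-Isocortex-1/raw', 'WMB-10Xv2-TH/raw']
```

If your version of abc_atlas_access provides `abc_cache.list_data_files('WMB-10Xv2')`, its output could be passed to this filter, and each selected name then handed to `abc_cache.get_data_path`.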

Once the data is downloaded, you can get dataframes describing the genes and cells in the dataset with

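```python
from pathlib import Path
from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache

# Hypothetical path: use the directory you downloaded the data into
download_base = Path('abc_cache')
abc_cache = AbcProjectCache.from_cache_dir(download_base)

gene_metadata = abc_cache.get_metadata_dataframe(
        directory='WMB-10X',
        file_name='gene')
cell_metadata = abc_cache.get_metadata_dataframe(
        directory='WMB-10X',
        file_name='cell_metadata')
```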
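The columns of the expression matrices may be keyed by gene identifier rather than gene symbol (check `adata.var` on your copy), so a symbol-to-identifier lookup helps when you want the columns for Opn3 and your GPCRs. A sketch, assuming `gene_metadata` has `gene_symbol` and `gene_identifier` columns (check `gene_metadata.columns`); the toy table and its identifiers are made up:

```python
import pandas as pd


def symbols_to_ids(gene_metadata, symbols):
    """Map gene symbols to gene identifiers; also return symbols with no match."""
    # Keep the first row if a symbol appears more than once
    lookup = (gene_metadata.drop_duplicates("gene_symbol")
              .set_index("gene_symbol")["gene_identifier"])
    found = {s: lookup[s] for s in symbols if s in lookup.index}
    missing = [s for s in symbols if s not in lookup.index]
    return found, missing


# Made-up stand-in for gene_metadata, with made-up identifiers
toy_genes = pd.DataFrame({
    "gene_identifier": ["ID_0001", "ID_0002", "ID_0003"],
    "gene_symbol": ["Opn3", "Gpcr_a", "Actb"],
})
found, missing = symbols_to_ids(toy_genes, ["Opn3", "Gpcr_a", "Gpcr_b"])
print(found)    # {'Opn3': 'ID_0001', 'Gpcr_a': 'ID_0002'}
print(missing)  # ['Gpcr_b']
```

With the real table you would call `symbols_to_ids(gene_metadata, ['Opn3'] + gpcr_symbols)`, where `gpcr_symbols` is a hypothetical list of GPCR gene symbols you would need to assemble from an external source; the metadata does not label genes as GPCRs.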

You can find the path to a specific .h5ad file with

```python
h5ad_path = abc_cache.get_data_path(directory='WMB-10Xv2', file_name='WMB-10Xv2-HY/raw')
print(h5ad_path)
# /Users/scott.daniel/KnowledgeEngineering/SAC/2025/abc_cache/expression_matrices/WMB-10Xv2/20230630/WMB-10Xv2-HY-raw.h5ad
```

From there, you can use anndata, numpy, and whatever other Python libraries you like to explore the data.
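Once an expression matrix is loaded, the two steps in the original question (find Opn3-expressing cells, then check which other GPCRs those cells also express) can be sketched with numpy and scipy. This assumes a cells × genes matrix of raw counts plus its column labels, for example `adata.X` and `adata.var.index` from an AnnData object read with `anndata.read_h5ad`; the threshold `min_count=1` and the gene names in the demo are illustrative, not a recommendation:

```python
import numpy as np
import pandas as pd
from scipy import sparse


def coexpression_with_target(X, gene_ids, target_id, candidate_ids, min_count=1):
    """Among cells that express `target_id`, count how many also express each candidate.

    X: cells x genes raw counts (numpy array or scipy sparse matrix).
    gene_ids: column labels of X, in column order.
    A gene counts as "expressed" in a cell if it has >= min_count counts.
    """
    col = {g: i for i, g in enumerate(gene_ids)}
    idx = [col[target_id]] + [col[g] for g in candidate_ids]
    # Pull only the needed columns, then binarize
    sub = sparse.csr_matrix(X)[:, idx].toarray() >= min_count
    target = sub[:, 0]                   # step 1: cells expressing the target
    n_target = int(target.sum())
    both = sub[target, 1:].sum(axis=0)   # step 2: co-expressing cells per candidate
    frac = both / n_target if n_target else np.full(len(candidate_ids), np.nan)
    table = pd.DataFrame(
        {"n_coexpressing": both, "frac_of_target_cells": frac},
        index=list(candidate_ids),
    )
    return n_target, table.sort_values("frac_of_target_cells", ascending=False)


# Made-up 4-cell example
X = np.array([
    [2, 1, 0],  # Opn3+, Gpcr_a+
    [1, 0, 3],  # Opn3+, Gpcr_b+
    [0, 5, 5],  # Opn3- (ignored in step 2)
    [4, 2, 0],  # Opn3+, Gpcr_a+
])
n_target, table = coexpression_with_target(X, ["Opn3", "Gpcr_a", "Gpcr_b"], "Opn3", ["Gpcr_a", "Gpcr_b"])
print(n_target)
print(table)
```

A count threshold of 1 is a crude detection call; given dropout in single-cell data, you may prefer a stricter cutoff or to aggregate within cell types before comparing genes.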

One more caveat: these h5ad files are structured to be easy to slice by cell and difficult to slice by gene. If you end up trying to grab data for all of the cells in a subset of genes, you should expect the operation to take a while.
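If you do need a handful of genes across all cells, one way to keep memory bounded is to read the file in blocks of cells (the easy direction to slice) and keep only the gene columns you need from each block. This still reads every row, so it will still take a while. A minimal sketch; `load_rows` is a hypothetical callback you would supply, and `chunk_size` is arbitrary:

```python
import numpy as np
from scipy import sparse


def extract_columns_in_row_chunks(load_rows, n_rows, col_idx, chunk_size=100_000):
    """Collect the columns `col_idx` for all rows, reading `chunk_size` rows at a time.

    load_rows(start, stop) is a caller-supplied function returning rows
    [start, stop) as a scipy sparse matrix or numpy array.
    """
    pieces = []
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        block = sparse.csr_matrix(load_rows(start, stop))
        pieces.append(block[:, col_idx])        # keep only the genes of interest
    if not pieces:
        return sparse.csr_matrix((0, len(col_idx)))
    return sparse.vstack(pieces, format="csr")  # cells x len(col_idx)


# Toy check: a 10 x 6 count matrix read 3 rows at a time
dense = np.arange(60).reshape(10, 6) % 5
full = sparse.csr_matrix(dense)
subset = extract_columns_in_row_chunks(lambda a, b: full[a:b], full.shape[0], [1, 4], chunk_size=3)
print(subset.shape)  # (10, 2)
```

With anndata's backed mode (`anndata.read_h5ad(path, backed='r')`), something like `lambda a, b: adata.X[a:b]` could serve as `load_rows`; what that slice returns depends on your anndata version, so check it on a small range first.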

I know this is a somewhat high-level answer to your question. Let me know if any particular point is unclear or difficult.