There seem to be a few different ways to get to the RNA-seq data (the Cell Features Search Tool, the complete datasets, GEO, or even BICCN), so I’m hoping someone can shed a bit of light on the best way to do the following:
For a subset of genes (~10), identify whether expression differs between inhibitory (say, all GAD-Cre lines) and excitatory (e.g., Emx-Cre) cells, across all brain regions. Is it possible to run such a differential comparison using the online tools? Or do we need to work with the raw data (fpkm_table.csv?) and compute the differential expression ourselves (e.g., with DESeq)?
(This is for a project in my class, and I’d like to identify the most straightforward way for students to do this! We’re working in Python, and so any tools or pointers to work with this data in Python would be much appreciated!)
Hi @ajuavinett,
This is a complicated question, but I’ll do my best to answer. In short, we currently have several different data sets which have been put in different places with different web tools, depending on their source, but we are working to centralize all of this in the relatively near future. For now, here are your options:
- Doing this differential search in mouse V1/ALM or human MTG is relatively straightforward from these links. This does not include all brain regions, but for the question of inhibitory vs. excitatory genes will likely provide a very similar answer.
- Download the data from the complete data sets and perform such an analysis yourself. Most people at the Allen Institute use R, and there are several methods for differential expression there (like DESeq). This can likely be done in Python as well, but I don’t know how.
- Download data from BICCN and perform the analysis yourself. The advantage of this is that there is a lot more data from several labs, but I would NOT suggest using it for your purpose because most of these data sets do not have cell type assignments, which would make the analysis much more complicated.
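For the second option in Python, a very rough first pass is a per-gene log2 fold change between the two cell classes. The sketch below is only an effect size, not a statistical test, and is not a substitute for DESeq. The expression values, cell IDs, and the `log2_fold_change` helper are made up for illustration; real values would come from the downloaded expression table and its cell annotations.

```python
from math import log2
from statistics import mean

# Hypothetical expression values (e.g., CPM or FPKM): gene -> {cell ID -> value}
expression = {
    "Gad1":    {"cell1": 120.0, "cell2": 95.0, "cell3": 0.0,  "cell4": 2.0},
    "Slc17a7": {"cell1": 0.0,   "cell2": 1.0,  "cell3": 80.0, "cell4": 60.0},
}

# Hypothetical class label for each cell, taken from the annotation file in practice
cell_class = {
    "cell1": "GABAergic",
    "cell2": "GABAergic",
    "cell3": "Glutamatergic",
    "cell4": "Glutamatergic",
}

def log2_fold_change(values_a, values_b, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount avoids division by zero."""
    return log2((mean(values_a) + pseudocount) / (mean(values_b) + pseudocount))

lfc = {}
for gene, per_cell in expression.items():
    # Split this gene's values by cell class
    gaba = [v for cell, v in per_cell.items() if cell_class[cell] == "GABAergic"]
    glut = [v for cell, v in per_cell.items() if cell_class[cell] == "Glutamatergic"]
    # Positive values mean higher expression in GABAergic cells
    lfc[gene] = log2_fold_change(gaba, glut)

for gene, value in lfc.items():
    print(f"{gene}: log2FC (GABAergic / glutamatergic) = {value:.2f}")
```

With only ~10 genes of interest, a table like this is easy for students to inspect by hand before moving on to a proper differential expression method.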
I would not suggest getting data from GEO unless you have questions related to differential splicing or something else where the raw data (FASTQ files, BAM files) are needed. Let us know if you’d like additional details about any of these options and best of luck in your course. I would also not suggest using the Cell Features Search Tool for this question yet, as we currently don’t have differential expression capabilities (but stay tuned…).
Jeremy
Thanks @jeremyinseattle! Yeah, I realize that this is a rather involved question, and I’m a bit new to transcriptomics, so I appreciate your patience.
I’m wondering how valid it is to work with the calculated medians (from here) for the mouse cortex & hippocampus data as a way of comparing expression of the inhibitory and excitatory clusters for specific genes. For example, if we pulled out all of the excitatory clusters vs. the inhibitory clusters for a specific gene, would that be a valid sort of ‘meta comparison’?
Yes, that is a valid meta-comparison. I had forgotten that those files were available. Depending on how specific of a marker you want, you could compare the average median value in the GABAergic set of clusters vs. the glutamatergic set of clusters, or you could find more specific markers by comparing the minimum median value in GABAergic vs. the maximum median value in glutamatergic clusters. A more intermediate level of specificity could be obtained by comparing percentiles (e.g., 10th percentile GABAergic vs. 90th percentile glutamatergic). To get glutamatergic markers, swap the two sets above. However you do this analysis, you should find a LOT of genes (e.g., dozens to thousands, depending on thresholding).
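Here is a minimal Python sketch of the three comparisons described above, for a single gene. The per-cluster median values are invented for illustration, and `percentile` and `compare_gene` are hypothetical helper names; in practice the medians would be read from the downloaded medians file, with each cluster assigned to the GABAergic or glutamatergic set from the cluster annotations.

```python
from statistics import mean

def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0..100) of a list of numbers."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * pct / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def compare_gene(gaba_medians, glut_medians):
    """GABAergic-minus-glutamatergic differences, from least to most stringent."""
    return {
        # Average median in each set of clusters
        "mean_diff": mean(gaba_medians) - mean(glut_medians),
        # 10th percentile GABAergic vs. 90th percentile glutamatergic
        "percentile_diff": percentile(gaba_medians, 10) - percentile(glut_medians, 90),
        # Minimum GABAergic vs. maximum glutamatergic
        "minmax_diff": min(gaba_medians) - max(glut_medians),
    }

# Hypothetical per-cluster medians for one gene
gaba = [8.0, 9.0, 10.0, 7.0]
glut = [0.0, 1.0, 0.5, 2.0]

result = compare_gene(gaba, glut)
for name, diff in result.items():
    print(f"{name}: {diff:.3f}")
```

A gene counts as a GABAergic marker under a given criterion when the corresponding difference exceeds whatever threshold you choose; calling `compare_gene(glut, gaba)` instead gives the glutamatergic direction.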