Question on Patch-seq data

We are looking at Patch-seq transcriptomic data. Do we need to use a housekeeping gene like GAPDH to compare the values of a specific RNA in different cells?

There are different normalization methods that can be used with single-cell RNA-seq data, but we typically analyze the Patch-seq transcriptomic data using a counts per million reads (CPM) normalization (rather than normalizing to specific genes or sets of genes).

You can find the CPM-normalized gene expression matrix here:
http://data.nemoarchive.org/other/AIBS/AIBS_patchseq/transcriptome/scell/SMARTseq/processed/analysis/20200611/20200513_Mouse_PatchSeq_Release_cpm.v2.csv.tar
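
In case it helps to see what CPM normalization does, here is a minimal sketch in Python. The gene counts and the `counts_to_cpm` helper are made up for illustration (they are not taken from the Patch-seq data or from any particular package); the point is only that each cell's counts are scaled by that cell's total reads, so cells sequenced at different depths become comparable.

```python
def counts_to_cpm(counts):
    """Convert one cell's raw read counts to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cell has no reads")
    # Scale every gene by the cell's own sequencing depth.
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


# Hypothetical raw counts: cell_b was sequenced twice as deeply as cell_a.
cell_a = {"Gapdh": 500, "Dyrk1a": 20, "other_genes": 999_480}
cell_b = {"Gapdh": 1000, "Dyrk1a": 40, "other_genes": 1_998_960}

cpm_a = counts_to_cpm(cell_a)
cpm_b = counts_to_cpm(cell_b)

# The raw counts differ two-fold, but the CPM values match.
print(cpm_a["Gapdh"], cpm_b["Gapdh"])
print(cpm_a["Dyrk1a"], cpm_b["Dyrk1a"])
```

Note that the file linked above is already CPM-normalized, so you would not need to apply this step to it yourself.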

Thank you for the quick response! I, along with some other high school students, am using your database to examine the differential expression of genes known to regulate dendrite morphology. We have looked at the CPM values for some of these genes (minibrain (Dyrk1a), Tubulin (Tubb2a), Cut (Cux1)), and we do see some differences between cells with different morphologies. However, we also looked at genes that should be ‘housekeeping’ genes and expected their CPM values to be the same across cells. Because this was not the case, is it valid to say that any difference in the expression of genes such as minibrain (expressed as CPM) is meaningful?

Yes, it sounds like what you’re asking is how to evaluate whether a given difference in gene expression in the data set is meaningful. Single-cell RNA-seq data is variable, so you’re unlikely to find exactly the same level of a gene from cell to cell (even for genes you wouldn’t expect to differ across a given set of cells). So, some typical criteria for identifying differentially expressed genes are (1) how big the difference in expression is and (2) how often a gene is expressed at all in one set of cells vs. another.
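
To make those two criteria concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the CPM values are invented, `summarize_gene` is a hypothetical helper, and the detection threshold of 1 CPM is an arbitrary choice. It is not how scrattch.hicat or any other package computes its statistics.

```python
import math


def summarize_gene(cpm_group1, cpm_group2, detect_threshold=1.0):
    """Summarize one gene's expression in two groups of cells.

    Returns the difference in mean log2(CPM + 1) and the fraction of
    cells in each group where the gene is detected above the threshold.
    """
    def mean_log(values):
        return sum(math.log2(v + 1) for v in values) / len(values)

    def frac_detected(values):
        return sum(v > detect_threshold for v in values) / len(values)

    return {
        "log2_diff": mean_log(cpm_group1) - mean_log(cpm_group2),
        "frac_group1": frac_detected(cpm_group1),
        "frac_group2": frac_detected(cpm_group2),
    }


# Hypothetical CPM values for one gene in two sets of cells.
group1 = [7, 15, 0, 31]
group2 = [1, 0, 3, 0]

result = summarize_gene(group1, group2)
print(result)
```

You could run the same summary on a housekeeping gene and on a morphology-related gene and compare them: a gene worth following up on would show a larger `log2_diff` and a bigger gap between the two detection fractions than the housekeeping gene does.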

The scrattch.hicat R package has methods for identifying differentially expressed genes (and there are other packages with similar functionality in R and other languages). You could check out its vignette for some information about how it identifies the sets of differentially expressed genes used for clustering (in particular, the “Parameter Specification” section discusses several of the criteria used).

And if you want to use scrattch.hicat for this, you could look at functions like de_stats_selected_pairs or de_selected_pairs to help you identify differentially expressed genes between the sets of neurons that you’re interested in. You could, for example, apply those functions to sets of morphology-related genes and/or housekeeping genes and see whether the difference scores/statistics are higher for the morphology-related genes.