I am fairly new to working with transcriptomics data. I am analyzing the gene expression matrix from the Mouse Whole Cortex and Hippocampus SMART-seq data. I am wondering what each matrix value (the count of introns and exons) means in terms of the expression level of the gene. Additionally, is there a threshold above which we can eliminate noise from the data and say “a value of X or above means the cell is expressing the gene”?
Hi @AKHILAS! Each matrix value corresponds to the number of RNA molecules (e.g., “reads” or “fragments”) measured for a given cell anywhere within the genomic boundaries of a given gene. This is an unnormalized measure of gene expression, so it can be higher with greater sequencing depth and for longer genes. Typically we normalize this value by taking log2([counts per million]+1), although note that many computational algorithms take the count matrix itself as input and apply their own normalization. Unfortunately, there is no single cutoff value that can be used to eliminate noise, although measurements for more highly expressed genes are typically more reliable.
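In case it is useful, here is a minimal sketch of the log2([counts per million]+1) calculation in Python with NumPy. It assumes the matrix is arranged as genes (rows) by cells (columns); the function name `log2_cpm` and the toy numbers are made up for illustration and are not part of the dataset or any particular package.

```python
import numpy as np

def log2_cpm(counts):
    """Return log2(CPM + 1) for a genes x cells count matrix (assumed orientation)."""
    counts = np.asarray(counts, dtype=float)
    # Total counts per cell (column sums), i.e. that cell's sequencing depth.
    lib_sizes = counts.sum(axis=0, keepdims=True)
    # Guard against division by zero for cells with no counts at all.
    safe_sizes = np.where(lib_sizes == 0, 1.0, lib_sizes)
    # Scale each cell so that its counts sum to one million.
    cpm = counts / safe_sizes * 1e6
    # The +1 keeps zero counts at zero after the log transform.
    return np.log2(cpm + 1.0)

# Toy example: 3 genes x 2 cells (illustrative values only).
toy_counts = np.array([[10, 0],
                       [90, 50],
                       [0, 50]])
normalized = log2_cpm(toy_counts)
print(normalized)
```

Note that this only corrects for sequencing depth per cell; it does not adjust for gene length, so comparisons between long and short genes still need care.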
Hi Jeremy! Thank you so much for your explanation. This was very helpful in understanding the data I am working with.