How gene-level RPKM is calculated (Brainspan)

JacquelineHeighway · June 24, 2020, 5:11am

Hi Allen Brain Map community,

I am trying to understand how exon- and gene-level RPKM have been calculated for the BrainSpan project. I am after some clarification, so please correct me if I am wrong about the following.

For exon-level expression, a program will count the number of reads mapping to an exon, then you do the RPKM calculation for the exon. In this way, reads that cross an exon-exon boundary (eg between exons 1 and 2 of gene A) will be counted twice, once in exon 1, once in exon 2.

For gene-level expression, a program will count the number of reads mapping to gene A, then you do the RPKM calculation as before. In this way, a read will only be counted once.

If you do the normalisation as suggested above for both gene- and exon-level expression, there is a bias for exons to have higher RPKM than the genes they come from (as it is possible for a single read to map to multiple exons). This is especially a concern for genes which contain many short exons. Yet, when I look at the data, the bias is not there (eg exon 1 RPKM is approx equal to gene A RPKM). I am wondering, how is this possible? To calculate the gene-level RPKM, did you simply add the counts from the list of exons in that gene?
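To make the concern concrete, here is a minimal sketch of the arithmetic with made-up numbers. The `rpkm` helper, the exon lengths, and the read counts are all hypothetical and are not taken from the BrainSpan pipeline; the sketch only shows what whole-read counting at the exon level would do.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

# Hypothetical gene A with two 100 bp exons (illustrative numbers only).
total_mapped_reads = 1_000_000
exon_lengths = {"exon1": 100, "exon2": 100}
reads_only_exon1 = 10
reads_only_exon2 = 10
reads_spanning_junction = 10  # each of these overlaps both exons

# Exon level, whole-read counting: a junction read is counted in both exons.
exon_counts = {
    "exon1": reads_only_exon1 + reads_spanning_junction,
    "exon2": reads_only_exon2 + reads_spanning_junction,
}

# Gene level: every read is counted exactly once.
gene_count = reads_only_exon1 + reads_only_exon2 + reads_spanning_junction
gene_length = sum(exon_lengths.values())

exon_rpkm = {
    name: rpkm(exon_counts[name], exon_lengths[name], total_mapped_reads)
    for name in exon_counts
}
gene_rpkm = rpkm(gene_count, gene_length, total_mapped_reads)

print("exon RPKM:", exon_rpkm)
print("gene RPKM:", gene_rpkm)
```

With these toy numbers, each exon's RPKM comes out higher than the gene's, which is the bias I would expect but do not see in the data.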

Thank you for your time

Kind regards,
Jacqui

jeremyinseattle · July 24, 2020, 5:12pm

Hi @JacquelineHeighway. Sorry for the delayed response. I would encourage you to visit the Documentation link on the main BrainSpan website for “Developmental Transcriptome,” as all the information you requested should be included there. I say “would encourage” because it sounds like you already did, and that is where your question comes from. If you still have questions after reviewing the documentation, I would suggest contacting the Sestan lab at Yale, as they (along with several other collaborators listed on the main BrainSpan website) collaborated on this project and generated these data files. What I can say is that in most normalization methods, reads are scaled when they cross exon boundaries, such that if half the read is in each exon, each exon gets half a read for the RPKM calculation rather than a full read. Such scaling avoids the issue you bring up.
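As a rough illustration of that kind of scaling (a sketch only, not the actual BrainSpan code), the snippet below splits each read across exons in proportion to the bases overlapping each one. The function names, the transcript-coordinate intervals, and the reads are assumptions made up for the example.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def fractional_exon_counts(reads, exons):
    """Split each read across exons in proportion to its overlapping bases."""
    counts = {name: 0.0 for name in exons}
    for read_start, read_end in reads:
        overlaps = {
            name: overlap(read_start, read_end, start, end)
            for name, (start, end) in exons.items()
        }
        covered = sum(overlaps.values())
        if covered == 0:
            continue  # read does not touch any exon of this gene
        for name, bases in overlaps.items():
            counts[name] += bases / covered
    return counts

def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

# Hypothetical gene: two adjacent 100 bp exons in spliced transcript coordinates.
exons = {"exon1": (0, 100), "exon2": (100, 200)}
# Hypothetical reads; the middle one sits half in each exon.
reads = [(0, 50), (75, 125), (150, 200)]
total_mapped_reads = 1_000_000  # assumed library size

counts = fractional_exon_counts(reads, exons)
gene_count = sum(counts.values())  # fractional counts add up to one per read
gene_length = sum(end - start for start, end in exons.values())

exon_rpkm = {
    name: rpkm(counts[name], end - start, total_mapped_reads)
    for name, (start, end) in exons.items()
}
gene_rpkm = rpkm(gene_count, gene_length, total_mapped_reads)

print("fractional counts:", counts)
print("exon RPKM:", exon_rpkm)
print("gene RPKM:", gene_rpkm)
```

Because the fractional exon counts sum to the gene-level count, the gene RPKM becomes the length-weighted average of its exon RPKMs, so an evenly covered gene shows exon values close to the gene value.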
