I’m using the Allen Brain Human Cortex scRNA data as a reference set in my analysis, and I’m trying to integrate my data with it. However, the gene symbols used in the Allen Brain data seem to be incompatible with mine (which are based on Ensembl GRCh38.92). Is there a way to convert the Allen Brain scRNA gene symbols to common IDs (e.g. Ensembl gene ID, GENCODE ID, etc.)? And may I know which reference version (e.g. GRCh38 version 100) they refer to?
Thanks a lot!
Information about the gene expression component of the Allen Cell Types Database is available in the “Transcriptomics Overview” documentation. For your question: “For human, raw read files were aligned to the GRCh38 human genome sequence (Genome Reference Consortium, 2011) with the RefSeq transcriptome version GRCh38.p2 (current as of 4/13/2015) and likewise updated by removing duplicate Entrez gene entries from the gtf reference file.”
Regarding gene conversion, if you want to do this one gene at a time, I would suggest using GeneCards, where external IDs are listed near the top of the page for a given gene. If you want to do this for many genes at once, I would suggest BioMart, which has both a web portal and an R language library (biomaRt) for gene ID conversion.
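For the many-genes case, here is a minimal sketch of one way to apply a BioMart export once you have it. It assumes you have downloaded a tab-separated table from the BioMart web portal with the columns "Gene stable ID" and "Gene name" (check the header of your own export, since the column names are an assumption here), and the function names `build_symbol_map` and `convert_symbols` are made up for illustration. It uses only the Python standard library and separates symbols that map to one ID, symbols that map to several, and symbols that are not found, since the last two groups usually need manual review.

```python
import csv
import io
from collections import defaultdict


def build_symbol_map(tsv_text, symbol_col="Gene name", id_col="Gene stable ID"):
    """Parse a tab-separated export into {symbol: set of gene IDs}.

    The default column names are assumptions about the export header;
    pass different names if your file uses other labels.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    symbol_to_ids = defaultdict(set)
    for row in reader:
        symbol = (row.get(symbol_col) or "").strip()
        gene_id = (row.get(id_col) or "").strip()
        if symbol and gene_id:  # skip rows with a missing symbol or ID
            symbol_to_ids[symbol].add(gene_id)
    return symbol_to_ids


def convert_symbols(symbols, symbol_to_ids):
    """Split symbols into uniquely mapped, ambiguous, and unmatched."""
    unique, ambiguous, unmatched = {}, {}, []
    for symbol in symbols:
        ids = symbol_to_ids.get(symbol)
        if not ids:
            unmatched.append(symbol)
        elif len(ids) == 1:
            unique[symbol] = next(iter(ids))
        else:
            ambiguous[symbol] = sorted(ids)
    return unique, ambiguous, unmatched


# Tiny illustrative table. The TP53 and GAPDH rows are real Ensembl IDs;
# the DUPGENE rows are hypothetical placeholders that show a symbol
# mapping to more than one ID.
EXAMPLE_TSV = (
    "Gene stable ID\tGene name\n"
    "ENSG00000141510\tTP53\n"
    "ENSG00000111640\tGAPDH\n"
    "PLACEHOLDER_ID_A\tDUPGENE\n"
    "PLACEHOLDER_ID_B\tDUPGENE\n"
)

if __name__ == "__main__":
    mapping = build_symbol_map(EXAMPLE_TSV)
    unique, ambiguous, unmatched = convert_symbols(
        ["TP53", "GAPDH", "DUPGENE", "NOT_IN_TABLE"], mapping
    )
    print("unique:", unique)
    print("ambiguous:", ambiguous)
    print("unmatched:", unmatched)
```

In practice you would read the export from disk (for example `open(path).read()`) instead of the inline string. Because the Allen data were aligned against a RefSeq annotation from 2015, expect some symbols to land in the unmatched group; those are likely renamed or retired symbols, and matching on Entrez gene IDs may work better for them if your table includes that column.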
I really appreciate your guidance. This is very helpful, and I can work from here to convert all the gene names to the same reference.