I’m using Allen Brain’s Human Cortex scRNA-seq data as a reference set and am trying to integrate my own data with it. However, the gene symbols in Allen Brain’s data seem to be incompatible with mine (which are based on Ensembl GRCh38.92). Is there a way to convert Allen Brain’s scRNA-seq gene symbols to common IDs (e.g. Ensembl gene IDs, GENCODE IDs)? And which annotation version (e.g. GRCh38, Ensembl release 100) do they refer to?
Information about the gene expression component of the Allen Cell Types Database is available in the “Transcriptomics Overview” documentation. On the annotation version, it states: “For human, raw read files were aligned to the GRCh38 human genome sequence (Genome Reference Consortium, 2011) with the RefSeq transcriptome version GRCh38.p2 (current as of 4/13/2015) and likewise updated by removing duplicate Entrez gene entries from the gtf reference file.”
Regarding gene conversion: if you want to convert one gene at a time, I would suggest GeneCards, which lists external IDs near the top of each gene's page. If you want to convert many genes at once, I would suggest BioMart, which has both a web portal and an R package (biomaRt) for gene ID conversion.
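If you would rather script the BioMart lookup outside R, here is a minimal Python sketch that sends an XML query to BioMart's REST endpoint. It assumes the human dataset `hsapiens_gene_ensembl` and the `external_gene_name` (filter and attribute) and `ensembl_gene_id` fields. The default ensembl.org host serves the current Ensembl release; to match GRCh38.92 exactly you would need to point `BIOMART_URL` at the corresponding Ensembl archive host (not shown here). Because the Allen annotation is RefSeq-based, some symbols may not match Ensembl gene names exactly, so the sketch reports unmapped and ambiguous symbols rather than silently dropping them. The helper names are illustrative, not part of any library.

```python
import csv
import io
import urllib.parse
import urllib.request
from collections import defaultdict

# BioMart REST endpoint; serves the CURRENT Ensembl release (assumption: swap in
# an archive host if you need a specific release such as 92).
BIOMART_URL = "https://www.ensembl.org/biomart/martservice"


def build_query(symbols):
    """Build a BioMart XML query mapping gene symbols to Ensembl gene IDs."""
    values = ",".join(symbols)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1">'
        '<Dataset name="hsapiens_gene_ensembl" interface="default">'
        f'<Filter name="external_gene_name" value="{values}"/>'
        '<Attribute name="external_gene_name"/>'
        '<Attribute name="ensembl_gene_id"/>'
        "</Dataset>"
        "</Query>"
    )


def fetch_mapping_tsv(symbols):
    """POST the query to BioMart and return the raw TSV text (needs network access)."""
    data = urllib.parse.urlencode({"query": build_query(symbols)}).encode()
    with urllib.request.urlopen(BIOMART_URL, data=data) as resp:
        return resp.read().decode()


def parse_mapping(tsv_text):
    """Parse TSV with a header row into {symbol: sorted list of Ensembl gene IDs}."""
    mapping = defaultdict(set)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) >= 2 and row[0] and row[1]:
            mapping[row[0]].add(row[1])
    return {sym: sorted(ids) for sym, ids in mapping.items()}


def convert(symbols, mapping):
    """Split symbols into unique matches, ambiguous (multi-ID) matches, and unmapped."""
    unique, ambiguous, unmapped = {}, {}, []
    for sym in symbols:
        ids = mapping.get(sym, [])
        if len(ids) == 1:
            unique[sym] = ids[0]
        elif ids:
            ambiguous[sym] = ids
        else:
            unmapped.append(sym)
    return unique, ambiguous, unmapped
```

Typical use would be `convert(allen_symbols, parse_mapping(fetch_mapping_tsv(allen_symbols)))`; for very long symbol lists it is probably safer to query in batches. The `ambiguous` and `unmapped` results are the ones worth checking by hand, e.g. in GeneCards.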