M1 10X Reference does not map to GRCh38.p2

I need to align the human M1 10X dataset to GRCh38.p13 (Ensembl v98), which I can do using ENSG IDs. However, the dataset on your website has no ENSG IDs. The corresponding publication states that the M1 dataset is from GRCh38.p2 (Ensembl 79), so I created an EnsDb using AnnotationHub from Homo_sapiens.GRCh38.79.gtf and checked whether the gene names in M1 10X are present in the Ensembl v79 annotations. There are 21305 genes in M1 10X that are not included in Ensembl 79. Can someone please tell me why this is happening? It would be a significant improvement to provide ENSG IDs for each feature (which cellranger does) so that people can use the M1 10X dataset with data mapped to more recent references.

I found the zip file with gene info and used biomaRt to get the ENSG IDs from the Entrez IDs. Of the 50281 genes in the M1 dataset, only 27994 ENSG IDs were retrieved from biomaRt. Would somebody please tell me what is going on?
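For anyone reproducing this check, here is a minimal Python sketch of how the mapped and unmapped genes could be counted once an Entrez-to-ENSG mapping is in hand. The function name `mapping_coverage` and the toy IDs are hypothetical placeholders, not values from the M1 dataset.

```python
def mapping_coverage(dataset_ids, id_map):
    """Split dataset IDs into mapped and unmapped lists and return the mapped fraction."""
    mapped = [g for g in dataset_ids if g in id_map]
    unmapped = [g for g in dataset_ids if g not in id_map]
    fraction = len(mapped) / len(dataset_ids) if dataset_ids else 0.0
    return mapped, unmapped, fraction


# Toy input: four made-up Entrez IDs, two of which have an ENSG entry.
toy_entrez_ids = ["101", "202", "303", "404"]
toy_entrez_to_ensg = {"101": "ENSG_PLACEHOLDER_A", "202": "ENSG_PLACEHOLDER_B"}

mapped, unmapped, fraction = mapping_coverage(toy_entrez_ids, toy_entrez_to_ensg)
print(f"mapped={len(mapped)} unmapped={len(unmapped)} fraction={fraction:.3f}")

# The counts reported above work out the same way: 27994 of 50281 genes mapped.
reported_fraction = 27994 / 50281
print(f"reported fraction mapped: {reported_fraction:.3f}")
```

With the numbers in this post, roughly 56% of genes received an ENSG ID and 22287 did not. Inspecting the `unmapped` list is usually the quickest way to see whether the missing entries share a pattern.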

Hello Daniel

Apologies for the delay.

The GRCh38.p2 reference is an NCBI reference and uses Entrez IDs instead of Ensembl IDs. You will need to find a lookup table linking Entrez IDs to Ensembl IDs.

The “Entrez gene ids” file in the “Metadata files” section of this site should work:
https://www.gencodegenes.org/human/release_22.html
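
As a rough sketch of how such a lookup table could be used, the Python below reads a two-column, tab-separated table and builds an Entrez-to-Ensembl dictionary. The column order (Ensembl ID first, Entrez ID second), the function name `load_entrez_to_ensembl`, and the toy rows are assumptions for illustration. Check the actual file layout first: if the table is keyed by transcript IDs rather than gene IDs, a transcript-to-gene step from the matching GTF would also be needed.

```python
import csv
import io
from collections import defaultdict


def load_entrez_to_ensembl(handle):
    """Build {entrez_id: set of Ensembl IDs} from a two-column TSV.

    Assumed layout (verify against the real file): column 1 is an Ensembl ID,
    possibly with a version suffix, and column 2 is the Entrez gene ID.
    """
    lookup = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank, short, or comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        ensembl_id, entrez_id = row[0].strip(), row[1].strip()
        # Drop the version suffix (e.g. ".2") so IDs compare across releases.
        lookup[entrez_id].add(ensembl_id.split(".")[0])
    return lookup


# Toy table with made-up identifiers; one Entrez ID maps to two Ensembl IDs.
toy_table = io.StringIO(
    "ENSX0000000001.2\t101\n"
    "ENSX0000000002.1\t101\n"
    "ENSX0000000003.4\t202\n"
)
lookup = load_entrez_to_ensembl(toy_table)
for entrez_id in sorted(lookup):
    print(entrez_id, sorted(lookup[entrez_id]))
```

Values are kept as sets because one Entrez ID can correspond to more than one Ensembl ID, and that ambiguity is worth seeing before collapsing to a single ID per gene.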

Best regards,
Wayne

Wayne,

The Entrez IDs are expired, making it hard to find them in other databases. I have tried several different R packages. It would be helpful if the FASTQ data or raw sequence files were provided so that the transcript sequences could be remapped, but I do not see these on the Allen website or in the links provided in the original publication.

EDIT: I just found some FASTQs here: http://data.nemoarchive.org/biccn/lab/lein/lein/transcriptomic/sncell/raw/
However, I do not know whether they are from the M1 data; neither the file names nor the path description says.

Thanks,

Dan

Hi Dan,

The files you referenced are the L5 dissected Macaque M1 .fastqs. Attached is a table of the 10xV3 M1 .fastq file locations for all species (https://github.com/AllenInstitute/BICCN_M1_Evo/blob/master/NeMO_location_all_species_FINAL.xlsx).

NeMO is currently working to aggregate these and our processed data into a single bucket, which will be located below in the future.

Nik