Gene ID Formatting to match Reference

mmond · October 4, 2024, 5:17pm

Hi Everyone,

I’m getting an error when trying to map my genes to those in the Mouse whole brain reference: “RuntimeError: After comparing query data to reference data, no valid marker genes could be found at any level in the taxonomy.”

I’ve confirmed that the reference JSON uses Ensembl IDs and that those are contained in my H5AD. However, I think my issue arises from the way they are saved under gene_ids. The Ensembl IDs are present, but the mapper is only referencing the first column. I feel like it’s an easy fix, but I’m stumped. Any guidance?

danielsf · October 4, 2024, 5:55pm

Hello @mmond,

The MapMyCells code looks for gene identifiers in the index of the var dataframe. Right now, it looks like your dataframe is such that

var.index.values = ['Xkr4', 'Gm1992', 'Gm19938', ...]

and you need to be in a state where

var.index.values = ['ENSMUSG00000051951', 'ENSMUSG00000089699', ...]

Off-the-cuff, the easiest way to make this transformation would be

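import anndata

# Load the original file, whose var is indexed on gene symbols
src = anndata.read_h5ad('/path/to/original_file.h5ad')

# Move the gene symbols into a regular column and index var on the Ensembl IDs instead
src_var = src.var
dst_var = src_var.reset_index().set_index('gene_ids')

# Build a new AnnData with the re-indexed var and write it to a new file
dst = anndata.AnnData(X=src.X, obs=src.obs, var=dst_var)
dst.write_h5ad('/path/to/reformatted_file.h5ad')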
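To see what the reset_index().set_index('gene_ids') step does, here is a minimal sketch on a toy dataframe standing in for var. It uses pandas only, and it assumes (as in the snippet above) that the Ensembl IDs live in a column named gene_ids. The two-row table and the pairing of symbols to IDs are illustrative, not taken from a real file.

```python
import pandas as pd

# Toy stand-in for src.var (hypothetical data): indexed on gene symbols,
# with the Ensembl IDs stored in a 'gene_ids' column.
toy_var = pd.DataFrame(
    {"gene_ids": ["ENSMUSG00000051951", "ENSMUSG00000089699"]},
    index=["Xkr4", "Gm1992"],
)

# reset_index() moves the old index (gene symbols) into a regular column;
# set_index('gene_ids') then makes the Ensembl IDs the new index.
reindexed = toy_var.reset_index().set_index("gene_ids")

print(list(reindexed.index))    # the Ensembl IDs are now the index
print(list(reindexed.columns))  # the gene symbols survive as a column

# Worth checking on real data as well: the new index should have no duplicates.
assert reindexed.index.is_unique
```

Because the toy index has no name, the gene symbols end up in a column called index. If the original index is named, the column takes that name instead.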

I am a little confused that you are having this problem. The online MapMyCells tool has a step that infers Ensembl IDs from gene symbols in the event that var is indexed on gene symbols. Are you using the online app, or running the code locally*? If you are running the online MapMyCells tool, would you mind posting the full run-ID when/if you encounter this error again? I would be fascinated to see why it is not inferring Ensembl IDs from your gene symbols.

*If you are running the code locally, I am not surprised. The step that transforms gene symbols to Ensembl IDs is in a separate data validator module that isn’t a default part of the pipeline when running the code locally.
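When running locally, one quick sanity check before mapping is to count how many entries in var.index look like mouse Ensembl gene IDs. The sketch below is not part of the MapMyCells code. The helper name count_ensembl_like is made up for illustration, and the pattern assumes unversioned mouse gene IDs of the form ENSMUSG followed by 11 digits.

```python
import re

# Assumed format: unversioned mouse Ensembl gene IDs, e.g. ENSMUSG00000051951
ENSMUSG_PATTERN = re.compile(r"ENSMUSG\d{11}")


def count_ensembl_like(gene_ids):
    """Return how many identifiers fully match the mouse Ensembl gene ID pattern.

    Hypothetical helper for illustration; not a MapMyCells function.
    """
    return sum(1 for g in gene_ids if ENSMUSG_PATTERN.fullmatch(str(g)))


# Example with a mix of gene symbols and Ensembl IDs
example_ids = ["Xkr4", "ENSMUSG00000051951", "Gm1992", "ENSMUSG00000089699"]
n_ok = count_ensembl_like(example_ids)
print(f"{n_ok} of {len(example_ids)} identifiers look like Ensembl IDs")
```

On a real file, the same check could be applied as count_ensembl_like(src.var.index). A count of zero would suggest that var is still indexed on gene symbols.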

mmond · October 4, 2024, 6:19pm

That seems to have done it. Running now, thanks. I am running locally, so that must be the issue.
