Error in data output in MapMyCells

mrazzoli · March 11, 2025, 9:14pm

I’m adapting a spatial transcriptomics dataset to load into MapMyCells. It’s mouse data and I’m working in the RStudio environment. I downloaded the R script, but I hit a roadblock when attempting the first step for generating output data. This is the error that I’m getting, which to the best of my knowledge seems to refer to a Python-type environment:
Error in scipy.sparse.csr_matrix(count_matrix) :
could not find function "scipy.sparse.csr_matrix"
Can you please help?

danielsf · March 11, 2025, 10:15pm

I am not an R user and cannot specifically speak to this bug (though I admit you are right, that line of code does look suspiciously like Python).

However…

As of this week, MapMyCells will accept CSV files as inputs. If you input a file that looks like

,ENSMUSG00001,ENSMUSG000002,ENSMUSG000003,....
cell_label_a,1,0,3,....
cell_label_b,2,2,0,....
cell_label_c,0,0,4,....
....

where that first row is the ENSEMBL IDs of your genes and the numbers are the raw counts of each gene in each cell, MapMyCells will be able to map it.
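For anyone producing such a file programmatically, here is a minimal Python sketch of that layout using only the standard library. The helper name `write_counts_csv` is hypothetical, and the gene IDs, cell labels and counts are the placeholder values from the example above, not real data. The only thing taken from the description is the shape of the file: an empty first header field, one gene per column, one cell per row, raw counts as values.

```python
import csv
import io


def write_counts_csv(handle, gene_ids, cell_labels, counts):
    """Write a cell-by-gene table of raw counts as CSV.

    gene_ids    -- column identifiers (ENSEMBL IDs), one per gene
    cell_labels -- row identifiers, one per cell
    counts      -- list of rows; counts[i][j] is the raw count of
                   gene j in cell i
    """
    writer = csv.writer(handle, lineterminator="\n")
    # Header row: empty first field, then one gene identifier per column.
    writer.writerow([""] + list(gene_ids))
    for label, row in zip(cell_labels, counts):
        if len(row) != len(gene_ids):
            raise ValueError(
                f"row {label!r} has {len(row)} counts, expected {len(gene_ids)}"
            )
        writer.writerow([label] + list(row))


# Placeholder values copied from the example in this post (not real IDs).
gene_ids = ["ENSMUSG00001", "ENSMUSG000002", "ENSMUSG000003"]
cell_labels = ["cell_label_a", "cell_label_b", "cell_label_c"]
counts = [[1, 0, 3], [2, 2, 0], [0, 0, 4]]

buffer = io.StringIO()
write_counts_csv(buffer, gene_ids, cell_labels, counts)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` as `handle`.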

I’ll ping some of our R mavens and see what they have to say about the bug you actually encountered.

jeremyinseattle · March 11, 2025, 11:16pm

Yep, that is Python code. It looks like I missed a line in converting the script to R. This is how that section of the script should read.

# Import libraries
library(anndata)
library(Matrix)

# Convert count matrix to a CSR (row-based access) sparse matrix before saving in anndata format.
count_matrix = as(as(as(count_matrix, "dMatrix"), "generalMatrix"), "RsparseMatrix")

We’ll fix the script as well. Thanks for pointing out the bug!

mrazzoli · March 12, 2025, 1:46pm

Thank you both,
yes, I can turn our data into a straight-shot CSV without needing to merge the data in R. I didn’t get that from the description on the website, and I was following all the highlighted steps believing I would still need to merge the count data, which come with entrez_ids, with another file containing the Ensembl and other identifiers…
If that fails, I will resort to what Jeremy suggested, and see if I get that to work.
Thank you so much for your help and for putting together this resource for the larger scientific community,
All the best
Maria

mrazzoli · March 13, 2025, 8:43pm

Hello there, after attempting both strategies, MapMyCells fails with this error Run ID: 1741898412745-2366de79-38d5-4bf3-ad10-e0d4fd4d21fc
I have mouse data from spatial transcriptomics, which I translated into EnsemblID as columns, and sample ID per row, in a csv format. Can you please help me understand where my mistake is? Thank you in advance, Maria

danielsf · March 13, 2025, 9:03pm

The error log you sent points to the failure of your CSV input. The job appears to be failing because some of your gene names end in a non-unicode space character encoded with bytes \xa0. I’ve never seen that before. This Stack Overflow post might provide some context for what I am talking about.

Are you able to open your CSV file with an arbitrary text editor? Again: I’m not 100% sure how this error could have been introduced into the file. I just want to figure out how fatal this error is (i.e. is this an edge case we really should be supporting).

FYI as discussed in this post, every failed MapMyCells run ought to give you the ability to download the log file containing the specific error that caused your job to fail. We’ve tried to make the error messages clear enough that users can understand what happened (but sometimes we fall short of this goal).

danielsf · March 13, 2025, 9:06pm

The solution might be as simple as re-generating your CSV file, adding a step to make sure to remove any whitespace from the ENSEMBL IDs of genes.
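As a rough illustration of that cleaning step, here is a minimal Python sketch. The function names `clean_gene_id` and `clean_header` are made up for this example, and the header is a placeholder. It assumes the header has already been read as text (for instance by opening the file with an encoding such as `latin-1` that can decode a lone `\xa0` byte) and that no field contains a quoted comma.

```python
def clean_gene_id(raw):
    """Return the identifier with every whitespace character removed."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # which includes the non-breaking space U+00A0 as well as ordinary
    # spaces and tabs; joining the pieces drops all of it.
    return "".join(raw.split())


def clean_header(header_line):
    """Clean each comma-separated field of the CSV header line."""
    # Assumption: fields are plain identifiers with no quoted commas.
    fields = header_line.rstrip("\r\n").split(",")
    return ",".join(clean_gene_id(field) for field in fields)


# Hypothetical header with a trailing non-breaking space and a stray space.
dirty_header = ",ENSMUSG00001\u00a0, ENSMUSG000002,ENSMUSG000003\n"
cleaned_header = clean_header(dirty_header)
print(repr(cleaned_header))
```

Only the first line of the file needs this treatment if the problem is confined to the gene identifiers; the remaining lines can be copied through unchanged.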

mrazzoli · March 13, 2025, 9:20pm

Thanks for the prompt reply.
I will check my csv as you instructed to see if I can get past this error.
I also tried to read the failure log, but while I can see where the problem lines are, it’s beyond my ability to locate them in the CSV file… unless the faulty lines match the order in which the Ensembl IDs are listed as columns in my CSV.

danielsf · March 14, 2025, 7:08am

If it helps, I am pretty sure the problem is confined to the first line of your file (the line defining the gene identifiers). Several gene identifiers have this problem, but, helpfully, it looks like the first gene identifier is one of them.

For what it’s worth, I’m pretty sure the odd character is the non-breaking space character.
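One way to locate the affected identifiers is to look at the raw bytes of the first line. The following is a minimal sketch, assuming the odd character shows up as the byte `\xa0` as reported above; the function name `find_nbsp_fields` and the example header are hypothetical. Because it works on bytes, it also catches the UTF-8 encoding of the non-breaking space (`\xc2\xa0`), which contains the same byte.

```python
NBSP_BYTE = b"\xa0"


def find_nbsp_fields(first_line):
    """Return (column_index, field) pairs for header fields containing 0xA0."""
    fields = first_line.rstrip(b"\r\n").split(b",")
    return [
        (index, field)
        for index, field in enumerate(fields)
        if NBSP_BYTE in field
    ]


# Hypothetical header line; with a real file it could be read with
#     with open(path, "rb") as src:
#         first_line = src.readline()
example_header = b",ENSMUSG00001\xa0,ENSMUSG000002,ENSMUSG000003\xa0\n"
offending = find_nbsp_fields(example_header)
for index, field in offending:
    print(index, field)
```

The column index counts the empty first field as column 0, so index 1 is the first gene identifier.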

mrazzoli · March 18, 2025, 4:34pm

Hello again
I finally got it to work!
Now, though, I realize that there’s a broken link:

Step 2: Choose a reference taxonomy and mapping algorithm
Learn about available cell type references, algorithms, and output files. (https://portal.brain-map.org/explore/cell-type-references-and-algorithms)

I was trying to put a face to a name (i.e., the cluster name 5224 Astro-TE NN_3) and figure out the exact meaning of the various column headings in my output, but like I said, the above link (Learn about…) isn’t working. Can you please tell me where to find this information?
Thanks in advance
Maria

danielsf · March 18, 2025, 4:51pm

Hi @mrazzoli

Glad you got your data through! Sorry about the broken link. There are a limited number of people with permission to edit that page. We are pinging them. However, I think we can answer your question here.

If you are asking, what is the significance of the columns in the output files I got from MapMyCells, you can consult this page or these example Jupyter notebooks (here and here; the second is in support of a talk I have not yet given and is maybe not as well documented as the first).

If you are asking “what do ‘Astro’, ‘TE’ and ‘NN’ mean in ‘5224 Astro-TE NN_3’”, I think you want to go to this page, scroll down to “How to use ABC Atlas” and click the blue buttons for Whole Mouse Brain Acronyms (or Whole Human Brain Acronyms, if that becomes relevant), which will download spreadsheets defining the acronyms in our cell type names.

Does this answer your questions?

danielsf · March 18, 2025, 4:52pm

For what it’s worth:

The link you were trying to access has been superseded by this page, which now treats algorithms, taxonomies, and output files separately.

mrazzoli · March 18, 2025, 7:37pm

Thank you so much!
you are a great resource!
Maria

Mvai · April 2, 2025, 10:02pm

Hello! I tried the solution here of outputting a CSV, and everything seems to look like the CSV example provided, but I get this error log. Do you have any suggestions?
9.16958e-04 seconds == WARNING: Input data is in CSV format; converting to h5ad file at count_matrix-2025-04-02-21-31-24.h5ad
5.51898e+01 seconds == Mapping genes to mouse genes
5.62806e+01 seconds == an ERROR occurred ====
Traceback (most recent call last):
File cell_type_mapper/cli/validate_h5ad.py, line 228, in run
result_path, has_warnings = validate_h5ad(
File cell_type_mapper/validation/validate_h5ad.py, line 105, in validate_h5ad
result = _validate_h5ad(
File cell_type_mapper/validation/validate_h5ad.py, line 362, in _validate_h5ad
update_uns(
File cell_type_mapper/utils/anndata_utils.py, line 73, in update_uns
uns = read_uns_from_h5ad(h5ad_path)
File cell_type_mapper/utils/anndata_utils.py, line 45, in read_uns_from_h5ad
with h5py.File(h5ad_path, 'r') as src:
File h5py/_hl/files.py, line 564, in __init__
fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
File h5py/_hl/files.py, line 238, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 102, in h5py.h5f.open
OSError: Unable to synchronously open file (file signature not found)

5.62807e+01 seconds == CLEANING UP
5.63643e+01 seconds == Mapping algorithm failed because of application errors.
5.63643e+01 seconds == Validation error: e=OSError('Unable to synchronously open file (file signature not found)'), type(e)=<class 'OSError'>, fname='run.py', lineno=145
Traceback (most recent call last):
File "/apps/run.py", line 145, in run
runner.run()
File "/usr/local/lib/python3.10/site-packages/cell_type_mapper/cli/validate_h5ad.py", line 228, in run
result_path, has_warnings = validate_h5ad(
File "/usr/local/lib/python3.10/site-packages/cell_type_mapper/validation/validate_h5ad.py", line 105, in validate_h5ad
result = _validate_h5ad(
File "/usr/local/lib/python3.10/site-packages/cell_type_mapper/validation/validate_h5ad.py", line 362, in _validate_h5ad
update_uns(
File "/usr/local/lib/python3.10/site-packages/cell_type_mapper/utils/anndata_utils.py", line 73, in update_uns
uns = read_uns_from_h5ad(h5ad_path)
File "/usr/local/lib/python3.10/site-packages/cell_type_mapper/utils/anndata_utils.py", line 45, in read_uns_from_h5ad
with h5py.File(h5ad_path, 'r') as src:
File "/usr/local/lib/python3.10/site-packages/h5py/_hl/files.py", line 564, in __init__
fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
File "/usr/local/lib/python3.10/site-packages/h5py/_hl/files.py", line 238, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 102, in h5py.h5f.open
OSError: Unable to synchronously open file (file signature not found)
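For readers who hit the same message: "file signature not found" generally means the file being opened does not begin with the 8-byte signature that marks an HDF5 file, so it is not a valid h5ad file. The sketch below only illustrates what the message means; it is not a diagnosis of this particular run, where the file was generated on the server side. The function name is hypothetical, and the check looks at offset 0 only, which is an assumption (HDF5 also permits the signature at later offsets when a user block is present).

```python
import os
import tempfile

# The 8-byte signature that opens an HDF5 file with no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def starts_with_hdf5_signature(path):
    """Return True if the file at `path` begins with the HDF5 signature."""
    with open(path, "rb") as src:
        return src.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Demonstration with two throwaway files: one plain CSV, and one that
# merely starts with the signature bytes (it is not a complete HDF5 file).
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "counts.csv")
    with open(csv_path, "wb") as dst:
        dst.write(b",ENSMUSG00001\ncell_label_a,1\n")

    fake_h5_path = os.path.join(tmp_dir, "fake.h5ad")
    with open(fake_h5_path, "wb") as dst:
        dst.write(HDF5_SIGNATURE + b"\x00" * 8)

    csv_has_signature = starts_with_hdf5_signature(csv_path)
    fake_has_signature = starts_with_hdf5_signature(fake_h5_path)

print(csv_has_signature, fake_has_signature)
```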

danielsf · April 2, 2025, 10:17pm

Can you please re-run your data and then supply us with the Run ID (see this post) of your failed run? That will allow us to find your job in the cloud backend and do a more detailed exploration of what went wrong.

We are looking for something like this

Run ID: 12346-abcde-78910-fghij-1121314

(or, if you still have the Run ID of your original failure, that is great, too)

danielsf · April 2, 2025, 10:48pm

I forgot that I get emailed whenever a MapMyCells job goes through. I was able to find your job and reproduce the bug. It appears, embarrassingly, that something goes wrong when you send through a CSV file that actually has ENSEMBL IDs (in my heady overconfidence, I spent most of my time testing CSV files with gene symbols).

I will try to figure out what is going on and issue a bugfix. I will let you know when it has been deployed.

danielsf · April 3, 2025, 2:36pm

@Mvai

Okay. I have pushed the fix. Try again. Your data should map successfully.

Thanks for catching this bug!

Mvai · April 3, 2025, 2:53pm

Thank you so much, it works now!
