I’m adapting a spatial transcriptomics dataset to load into MapMyCells. It’s mouse data and I’m working in the RStudio environment. I downloaded the R script, but I hit a roadblock when attempting the first step for generating output data. This is the error that I’m getting, which to the best of my knowledge seems to refer to a Python-type environment:
Error in scipy.sparse.csr_matrix(count_matrix) :
could not find function “scipy.sparse.csr_matrix”
Can you please help?
I am not an R user and cannot specifically speak to this bug (though I admit you are right, that line of code does look suspiciously like Python).
However…
As of this week, MapMyCells will accept CSV files as inputs. If you input a file that looks like
,ENSMUSG00001,ENSMUSG000002,ENSMUSG000003,....
cell_label_a,1,0,3,....
cell_label_b,2,2,0,....
cell_label_c,0,0,4,....
....
where that first row is the ENSEMBL IDs of your genes and the numbers are the raw counts of each gene in each cell, MapMyCells will be able to map it.
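For anyone building such a file by hand, here is a minimal sketch of the layout described above, using only the Python standard library. The function name `write_counts_csv` and the in-memory buffer are illustrative assumptions, not part of MapMyCells; the gene identifiers and counts are the placeholders from the example above.

```python
import csv
import io


def write_counts_csv(handle, gene_ids, counts_by_cell):
    """Write raw counts as a cell-by-gene CSV (hypothetical helper).

    The header row is an empty first cell followed by one gene ID per column;
    every later row is a cell label followed by that cell's raw counts.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([""] + list(gene_ids))
    for cell_label, counts in counts_by_cell.items():
        if len(counts) != len(gene_ids):
            raise ValueError(
                f"{cell_label}: expected {len(gene_ids)} counts, got {len(counts)}"
            )
        writer.writerow([cell_label] + list(counts))


# Placeholder identifiers and counts taken from the example layout above.
gene_ids = ["ENSMUSG00001", "ENSMUSG000002", "ENSMUSG000003"]
counts_by_cell = {
    "cell_label_a": [1, 0, 3],
    "cell_label_b": [2, 2, 0],
    "cell_label_c": [0, 0, 4],
}

buffer = io.StringIO()
write_counts_csv(buffer, gene_ids, counts_by_cell)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the buffer.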
I’ll ping some of our R mavens and see what they have to say about the bug you actually encountered.
Yep, that is Python code. It looks like I missed a line in converting the script to R. This is how that section of the script should read.
# Import libraries
library(anndata)
library(Matrix)
# Convert count matrix to a CSR (row-based access) sparse matrix and save in anndata format.
count_matrix = as(as(as(count_matrix, "dMatrix"), "generalMatrix"), "RsparseMatrix")
We’ll fix the script as well. Thanks for pointing out the bug!
Thank you both,
Yes, I can convert our data straight to a CSV without needing to merge the data in R. I didn’t get that from the description on the website, and I was following all the highlighted steps, believing I would still need to merge count data that come with entrez_ids with another file containing the Ensembl and other identifiers…
If that fails, I will resort to what Jeremy suggested, and see if I get that to work.
Thank you so much for your help and for putting together this resource for the larger scientific community,
All the best
Maria
Hello there, after attempting both strategies, MapMyCells fails with an error. Run ID: 1741898412745-2366de79-38d5-4bf3-ad10-e0d4fd4d21fc
I have mouse data from spatial transcriptomics, which I translated into a CSV file with Ensembl IDs as columns and one sample ID per row. Can you please help me understand where my mistake is? Thank you in advance, Maria
The error log you sent points to a failure reading your CSV input. The job appears to be failing because some of your gene names end in a non-ASCII space character encoded with the byte \xa0. I’ve never seen that before. This Stack Overflow post might provide some context for what I am talking about.
Are you able to open your CSV file with an arbitrary text editor? Again: I’m not 100% sure how this error could have been introduced into the file. I just want to figure out how fatal this error is (i.e. is this an edge case we really should be supporting).
FYI as discussed in this post, every failed MapMyCells run ought to give you the ability to download the log file containing the specific error that caused your job to fail. We’ve tried to make the error messages clear enough that users can understand what happened (but sometimes we fall short of this goal).
The solution might be as simple as re-generating your CSV file, adding a step to make sure to remove any whitespace from the ENSEMBL IDs of genes.
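A minimal sketch of that clean-up step, using only the Python standard library. It assumes the CSV has already been read into a string (if the stray byte really is a lone \xa0, decoding the file as Latin-1 rather than UTF-8 may be needed; that is a guess about the file, not something the log confirms). The names `strip_whitespace` and `clean_header` are hypothetical helpers for illustration.

```python
import csv
import io


def strip_whitespace(identifier):
    """Remove every whitespace character, including the non-breaking space U+00A0."""
    # str.split() with no arguments splits on any Unicode whitespace.
    return "".join(identifier.split())


def clean_header(csv_text):
    """Return csv_text with whitespace removed from the first row only."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    rows[0] = [strip_whitespace(cell) for cell in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Illustrative input: one ID ends in a non-breaking space, one starts with a plain space.
dirty = ",ENSMUSG00001\u00a0,ENSMUSG000002, ENSMUSG000003\ncell_label_a,1,0,3\n"
cleaned = clean_header(dirty)
```

Only the header row is touched here, so cell labels and counts pass through unchanged.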
Thanks for the prompt reply.
I will check my CSV as you instructed to see if I can get past this error.
I also tried to read the failure log, but while I can see where the problem lines are, it’s beyond my ability to locate them in the CSV file… unless the faulty lines match the order in which the Ensembl IDs are listed as columns in my CSV.
If it helps, I am pretty sure the problem is confined to the first line of your file (the line defining the gene identifiers). Several gene identifiers have this problem, but, helpfully, it looks like the first gene identifier is one of them.
For what it’s worth, I’m pretty sure the odd character is the non-breaking space character.
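One way to locate the affected identifiers is to scan only the first row and report any cell containing whitespace or non-ASCII characters. This is a rough sketch with the Python standard library; `find_suspect_ids` is a hypothetical helper, and it assumes the file contents are already available as a string.

```python
import csv
import io


def find_suspect_ids(csv_text):
    """Return (column_index, repr_of_cell) for header cells that look suspicious.

    A cell is flagged if it contains any whitespace or any non-ASCII character.
    repr() makes invisible characters such as U+00A0 show up as escapes.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    suspects = []
    for index, cell in enumerate(header):
        if any(ch.isspace() or not ch.isascii() for ch in cell):
            suspects.append((index, repr(cell)))
    return suspects


# Illustrative input: the first gene identifier ends in a non-breaking space.
sample = ",ENSMUSG00001\u00a0,ENSMUSG000002\ncell_label_a,1,2\n"
suspects = find_suspect_ids(sample)
```

The column index counts from 0 with the empty corner cell at position 0, so the first gene identifier is at index 1.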
Hello again
I finally got it to work!
Now though I realize that there’s a broken link
Step 2: Choose a reference taxonomy and mapping algorithm
[Learn about available cell type references, algorithms, and output files.](https://portal.brain-map.org/explore/cell-type-references-and-algorithms)
I was trying to put a face to a name (i.e., cluster name 5224 Astro-TE NN_3) and figure out the exact meaning of the various column headings in my output, but like I said, the above link (Learn about…) isn’t working. Can you please tell me where to find this information?
Thanks in advance
Maria
Hi @mrazzoli
Glad you got your data through! Sorry about the broken link. There are a limited number of people with permission to edit that page. We are pinging them. However, I think we can answer your question here.
If you are asking, what is the significance of the columns in the output files I got from MapMyCells, you can consult this page or these example Jupyter notebooks (here and here; the second is in support of a talk I have not yet given and is maybe not as well documented as the first).
If you are asking “what do ‘Astro’, ‘TE’ and ‘NN’ mean in ‘5224 Astro-TE NN_3’”, I think you want to go to this page, scroll down to “How to use ABC Atlas” and click the blue buttons for Whole Mouse Brain Acronyms (or Whole Human Brain Acronyms, if that becomes relevant), which will download spreadsheets defining the acronyms in our cell type names.
Does this answer your questions?
For what it’s worth:
The link you were trying to access has been superseded by this page, which now treats algorithms, taxonomies, and output files separately.
Thank you so much!
You are a great resource!
Maria