Error uploading my H5ad file to MapMyCells

hansol · May 28, 2024, 2:36pm

Dear community,
I tried to use MapMyCells
and got the error described below.

I have an RDS file produced with Seurat (the screenshot shows what my object looks like).

[Screenshot 2024-05-28 163431]
So I converted it to h5ad with the commands below:
SaveH5Seurat(seu_scratch_p56, filename = "seu_scratch_p56.h5Seurat")
Convert("seu_scratch_p56.h5Seurat", dest = "h5ad")
Then I uploaded this h5ad file to MapMyCells
and got this error: Mapping Failed

Use log files for troubleshooting MapMyCells issues. Post them in the community forums for further assistance.

Mapping algorithm failed because of application errors.

Please confirm that your input data is in cell (rows) by gene (columns) format.

Run ID: 1716906071516-d79e81b5-b84d-4e21-a70f-4b49c71bfcbb

Post in the community forum for help

May I ask for your help fixing this error?
Thanks.

danielsf · May 28, 2024, 3:45pm

Hi @hansol,

It looks like the full text of your error message is

The ‘X’ field in this h5ad file lacks the ‘encoding-type’ metadata field. That field is necessary for this software to determine if this data is stored as a sparse or dense matrix. Please see the anndata specification here On-disk format — anndata 0.1.dev50+g77581cc documentation

This means that your H5AD file is missing some metadata that is a part of the AnnData specification. I suspect that, for whatever reason, the Convert function you are invoking is not writing that metadata out to the output h5ad file.
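
One way to check this before uploading is to look at the attributes stored on X directly. The sketch below is a hedged illustration, not the check MapMyCells itself runs: it assumes the h5py package is installed, and the helper names classify_x_encoding and inspect_h5ad are made up for this example. The encoding names it recognises (array for dense, csr_matrix and csc_matrix for sparse) are the ones listed in the anndata on-disk format specification.

```python
SPARSE_ENCODINGS = {"csr_matrix", "csc_matrix"}


def classify_x_encoding(attrs):
    """Classify X from its HDF5 attributes (any dict-like mapping)."""
    encoding = attrs.get("encoding-type")
    if encoding is None:
        # This is the situation the error message above describes.
        return "missing"
    if isinstance(encoding, bytes):
        # Some writers store the attribute as bytes rather than str.
        encoding = encoding.decode("utf-8")
    if encoding in SPARSE_ENCODINGS:
        return "sparse"
    if encoding == "array":
        return "dense"
    return "unknown"


def inspect_h5ad(path):
    """Open an h5ad file read-only and classify its X entry."""
    # Assumption: h5py is installed. It is imported here so that
    # classify_x_encoding can be used on its own without it.
    import h5py

    with h5py.File(path, "r") as handle:
        return classify_x_encoding(dict(handle["X"].attrs))
```

If the Convert output really lacks the attribute, inspect_h5ad("seu_scratch_p56.h5ad") would be expected to return "missing".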

I’ve consulted with some R users on our team. They recommend you write out the H5AD file directly following the example here. Specifically, you will need to do something like

library(anndata) # Load library
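# (First, extract the count matrix from the Seurat object and ensure it has row and column names.
#  It must be cells (rows) by genes (columns); Seurat stores genes by cells, so transpose with t() if needed.)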
genes   <- colnames(counts) 
samples <- rownames(counts)
sparse_counts <- as(counts, "dgCMatrix")  # Convert to sparse matrix, if not already
countAD <- AnnData(X   = sparse_counts,   # Create the anndata object
                   var = data.frame(genes=genes,row.names=genes),
                   obs = data.frame(samples=samples,row.names=samples))
write_h5ad(countAD, "counts.h5ad", compression='gzip') # Write it out as h5ad

Does this help?

hansol · May 28, 2024, 4:08pm

Thanks, I almost got it working, but running anndata failed with a python311.dll module error:

python311.dll - The specified module could not be found.
I am using R, not Python, so I am wondering why this happened.
Thanks a lot.

danielsf · May 28, 2024, 5:29pm

anndata is a python library. Running it in R means using the R reticulate library to call python from R so that you have access to anndata’s functionality. The error message you quoted makes it sound (to me) like python did not get installed on your system.

How did you install anndata in your R environment?

Here is the documentation I usually consult on running anndata from R. It gives a few pointers about making sure everything is installed correctly. Is this what you followed (more or less)?

hansol · May 28, 2024, 5:59pm

thanks for quick reply

Yes, I have Python on my system. I also used the link you sent me to reinstall anndata:

You can install anndata for R from CRAN as follows:

install.packages("anndata")

Normally, reticulate should take care of installing Miniconda and the Python anndata.

If not, try running:

reticulate::install_miniconda()
anndata::install_anndata()

I still get the same error. Is there another way for me to convert a CSV file to h5ad so I can upload it to MapMyCells? My CSV has genes in columns and cells in rows.

hansol · May 28, 2024, 6:38pm

countAD <- AnnData(X   = sparse_counts,   # Create the anndata object
                   var = data.frame(genes=genes,row.names=genes),
                   obs = data.frame(samples=samples,row.names=samples))

Error in py_get_attr_impl(x, name, silent) :
AttributeError: 'module' object has no attribute 'remap_output_streams'
In addition: Warning message:
In py_initialize(config$python, config$libpython, config$pythonhome, :
Python 2 reached EOL on January 1, 2020. Python 2 compatability be removed in an upcoming reticulate release.

additionally… when I used another PC

hansol · May 28, 2024, 7:03pm

I also tried with Python (Jupyter notebook),
but it is not working either. I think I am stuck.
I don't know why converting a CSV file to h5ad is so hard.

danielsf · May 28, 2024, 7:37pm

Sorry. I did not realize you were comfortable using Python.

Can you regenerate your Jupyter notebook error, but grab a screenshot that includes the entire pink window? You clipped off the important part of the error message, unfortunately.

Thanks.

hansol · May 28, 2024, 7:42pm

danielsf · May 28, 2024, 7:43pm

Though, if you don’t want to go to that trouble, there is a read_csv function in the anndata library.

If, for instance, your data is in a file called junk.txt that looks like

,g1,g2,g3
c1,1.0,2.0,3.0
c2,4.0,5.0,6.0

you could run

>>> import anndata
>>> a = anndata.read_csv('junk.txt', first_column_names=True)
>>> a.write_h5ad('junk.h5ad')
>>>

hansol · May 28, 2024, 7:46pm

danielsf · May 28, 2024, 7:52pm

the problem with your Jupyter notebook is here

The cell-by-gene array which you store as result has strings in the first column. It should just be the numerical values. The NaN values in that matrix aren’t ideal, either. Is this log-normalized data?

I’m also concerned that you created your anndata.AnnData object here

without specifying the obs or var dataframes. obs is a dataframe that identifies each cell (row) in your cell-by-gene matrix. var is a dataframe that identifies each gene (column) in your cell-by-gene matrix. It may be illuminating to look at the second “in Python” example on this page (I know I already pointed you to this page, but the first example starts from a more complicated data model than I think you are working with). Here is the part of that page that does the simplest creation of obs and var.

The creation of the AnnData object then looks like
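
As a stand-in sketch of the two points above (keep the labels out of the numeric matrix, and pass obs and var explicitly), something along these lines should work. It is not the snippet from the linked page. It assumes a CSV laid out like the junk.txt example earlier in the thread (cell labels in the first column, gene labels in the header row), and the helper names split_labels_from_values, count_nans and to_anndata are hypothetical.

```python
import csv
import io
import math


def split_labels_from_values(csv_text):
    """Return (cell_ids, gene_ids, rows); rows contains floats only."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    gene_ids = header[1:]  # header[0] is the empty corner cell
    cell_ids, rows = [], []
    for record in reader:
        cell_ids.append(record[0])  # the label belongs in obs, not in X
        # float() raises ValueError on text, so stray labels surface here
        rows.append([float(value) for value in record[1:]])
    return cell_ids, gene_ids, rows


def count_nans(rows):
    """Count NaN entries so they can be dealt with before mapping."""
    return sum(math.isnan(value) for row in rows for value in row)


def to_anndata(cell_ids, gene_ids, rows):
    """Build a cell-by-gene AnnData with explicit obs and var."""
    # Assumption: anndata, numpy and pandas are installed. They are
    # imported here so the parsing helpers above work without them.
    import anndata
    import numpy as np
    import pandas as pd

    obs = pd.DataFrame(index=cell_ids)  # one row per cell (row of X)
    var = pd.DataFrame(index=gene_ids)  # one row per gene (column of X)
    return anndata.AnnData(X=np.asarray(rows, dtype=np.float32), obs=obs, var=var)
```

The result of to_anndata(...) can then be written out with its write_h5ad method, as in the read_csv example above.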

hansol · May 28, 2024, 8:09pm

It worked. I will try to upload now! Thanks.

hansol · May 28, 2024, 8:57pm

This worked perfectly.
So next I made a comparison heatmap:

comparison = table(mapping$subclass_name, seu_scratch_p56@meta.data[["seurat_clusters"]])
heatmap(comparison, ylab = "Mapping", xlab = "Clustering", margins = c(10,10))

And then:

dataSeurat <- CreateSeuratObject(counts = seu_scratch_p56@assays[["RNA"]]@counts, meta.data = mapping)

library(Seurat)

# Standard Seurat pipeline

dataSeurat <- NormalizeData(dataSeurat, verbose = FALSE)
dataSeurat <- FindVariableFeatures(dataSeurat, verbose = FALSE)
dataSeurat <- ScaleData(dataSeurat, verbose = FALSE)
dataSeurat <- RunPCA(dataSeurat, verbose = FALSE)
dataSeurat <- RunUMAP(dataSeurat, dims = 1:10, verbose = FALSE)

DimPlot(dataSeurat, reduction = "umap", group.by = "subclass_name", label = TRUE) + NoLegend()
I tried this and got the following error:

Error in `[.data.frame`(data, , group) : undefined columns selected
In addition: Warning message:
The following requested variables were not found: subclass_name

May I ask what I did wrong? Thanks a lot.

danielsf · May 28, 2024, 9:20pm

This is just a guess, but there are 4-5 lines at the top of the file that you downloaded which are just metadata. These are marked with a # at the front, so you will probably need to tell Seurat to ignore lines that start with # (or open the CSV and delete those lines).
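
For anyone doing this step in Python, skipping those lines can be sketched with the standard library alone. This is a minimal example under assumptions: the function name read_mapping_rows is made up, and the sample text (including the cell_id column and the type_A/type_B labels) is invented for illustration rather than copied from a real MapMyCells output. Only subclass_name comes from this thread.

```python
import csv
import io


def read_mapping_rows(handle):
    """Parse a CSV from an open text handle, ignoring '#' metadata lines."""
    data_lines = (line for line in handle if not line.startswith("#"))
    # DictReader takes the first remaining line as the column header.
    return list(csv.DictReader(data_lines))


# Invented sample that mimics the layout described above.
sample = io.StringIO(
    "# metadata line\n"
    "# another metadata line\n"
    "cell_id,subclass_name\n"
    "cell_1,type_A\n"
    "cell_2,type_B\n"
)
rows = read_mapping_rows(sample)
```

The same effect is available through pandas.read_csv(path, comment="#") in Python and read.csv(path, comment.char = "#") in R.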

Did that work?

hansol · May 29, 2024, 7:58am

Hi Daniel, I don't think this is about the # comment lines.

My dataSeurat object does not have subclass_name.

After mapping,
I have the mapping data from MapMyCells,
and I went through your tutorial (MapMyCells Use Case: Single Nucleus RNA-seq from Human MTG - brain-map.org) as suggested,
especially, → #7 To visualize the mapping results, we need both the mapping results and the original query cellxgene matrix for comparison. If this is not already read in, you can read it in from the anndata object uploaded to MapMyCells.

### Since the query data corresponds to dataQC above, we will call it dataQC again
dataQC_h5ad <- read_h5ad('Hodge2019.h5ad')
dataQC <- t(as.matrix(dataQC_h5ad$X))
rownames(dataQC) <- rownames(dataQC_h5ad$var)
colnames(dataQC) <- rownames(dataQC_h5ad$obs)

Here, since anndata did not work in my R anyway, I just used my original Seurat object.

#Create the Seurat object
dataSeurat <- CreateSeuratObject(counts = dataQC, meta.data = mapping)

Here, instead of dataQC, I used seu_scratch_p56@assays[["RNA"]]@counts.
seu_scratch_p56 is my Seurat object, which I also used to create the anndata file with the command below:

gene_matrix <- seu_scratch_p56[["RNA"]]$data
write.csv(gene_matrix, file = "gene_matrix.csv", row.names = TRUE)

thanks for helping me!

hansol · May 29, 2024, 1:33pm

Instead of the AnnData format, what can I extract directly from the Seurat object to replace dataQC_h5ad$X here (and what does it contain)? Thanks!

danielsf · May 29, 2024, 3:11pm

I’m an exclusive python user, so I don’t have any specific knowledge of R or Seurat. My understanding is that this line

dataSeurat <- CreateSeuratObject(counts = dataQC, meta.data = mapping)

should have joined the original cell-by-gene data with the mapping from MapMyCells. What columns are in your dataSeurat object?

hansol · May 29, 2024, 3:34pm

I don't know whether it is a matter of language, and the tutorial is also written for R.

Then could you let me know:

What is the content of dataQC (what is in X)? dataQC <- t(as.matrix(dataQC_h5ad$X))
Is there a better tutorial or vignette for Python users? After mapping, how can they compare the mapping with Allen?

thanks a lot

danielsf · May 29, 2024, 4:32pm

The X matrix is a convention in h5ad files. Specifically, in an h5ad file

X is the cell-by-gene expression matrix (just an array of floats or integers where each row is a cell and each column is a gene)
obs is a dataframe containing the metadata for each cell. Each row in this dataframe corresponds to a row in X.
var is a dataframe containing the metadata for each gene. Each row in this dataframe corresponds to a column in X.

I’m not sure we have any visualization tutorials written in python, for better or worse.

MapMyCells should have given you a CSV file in which each row is a cell in your dataset and each column is either a cell type assignment or a quality metric for a cell type assignment (see documentation here). I’d recommend comparing that CSV file to your original dataset using whatever data analysis tools you are most comfortable with.
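
A comparison of that kind can be as simple as a cross-tabulation of mapped labels against your own cluster assignments, joined on the cell identifier. The sketch below uses only the Python standard library. The function name cross_tabulate and all cell IDs, labels and cluster numbers in it are invented for illustration.

```python
from collections import Counter


def cross_tabulate(label_by_cell, cluster_by_cell):
    """Count cells for each (mapped label, original cluster) pair."""
    # Only cells present in both inputs are counted, so a mismatch in
    # cell identifiers shows up as a smaller total instead of a misalignment.
    shared_cells = label_by_cell.keys() & cluster_by_cell.keys()
    return Counter(
        (label_by_cell[cell], cluster_by_cell[cell]) for cell in shared_cells
    )


# Invented example: mapped labels and original clusters keyed by cell ID.
labels = {"cell_1": "type_A", "cell_2": "type_A", "cell_3": "type_B"}
clusters = {"cell_1": "0", "cell_2": "0", "cell_3": "1", "cell_4": "1"}
table = cross_tabulate(labels, clusters)
```

Joining on the cell identifier rather than on row order is a deliberate choice here: if the total count is much smaller than the number of cells, the identifiers in the two sources probably do not match.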
