Hi! I am trying to use 10X Genomics Spatial Gene Expression data as an input for the MapMyCells celltype mapping tool. I am having an issue with getting the input in the correct format.
These are the steps I am following but it still appears my .h5ad file is not in the correct format. Any recommendations? Thanks!
sceasy::convertFormat(seurat_spatial_object, from = "seurat", to = "anndata", assay = "Spatial",
                      outFile = 'CCIvSham.h5ad')
my_too_large_adata = read_h5ad('~/filename.h5ad')
minimal_adata = my_too_large_adata$X
ad <- AnnData(
  X = minimal_adata,
  obs = data.frame(group = rownames(minimal_adata), row.names = rownames(minimal_adata)),
  var = data.frame(type = colnames(minimal_adata), row.names = colnames(minimal_adata))
)
write_h5ad(ad, '~file.h5ad',
           compression = 'gzip')
ad
AnnData object with n_obs × n_vars = 34217 × 32272
    obs: 'group'
    var: 'type'
When I upload this final .h5ad file it fails at recognizing the input file due to incorrect formatting.
@skouneli thank you for your interest in MapMyCells.
It appears you’re doing the following:
- You start off with a Seurat object.
- You convert it to an h5ad file.
- You reduce the size of said h5ad file.
Could you confirm that you’ve transposed your Seurat object, if necessary, so that rows are cells and columns are genes? (The opposite tends to be true for default Seurat objects).
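As a rough illustration of the orientation check, here is a minimal Python sketch that uses only NumPy. The helper name `to_cells_by_genes` and the toy identifiers are hypothetical and are not part of MapMyCells or anndata. It assumes you already have the lists of cell (spot) and gene identifiers and want the matrix arranged as cells × genes.

```python
import numpy as np


def to_cells_by_genes(matrix, cell_ids, gene_ids):
    """Return `matrix` oriented as (cells, genes), transposing if needed.

    Hypothetical helper: orientation is inferred from the shape alone, so a
    square matrix (same number of cells and genes) is returned unchanged.
    """
    matrix = np.asarray(matrix)
    wanted = (len(cell_ids), len(gene_ids))
    if matrix.shape == wanted:
        return matrix
    if matrix.shape == wanted[::-1]:
        # Rows are genes and columns are cells, so flip the axes.
        return matrix.T
    raise ValueError(
        f"shape {matrix.shape} matches neither {wanted} nor {wanted[::-1]}"
    )


# Toy data: 2 genes (rows) x 3 spots (columns), i.e. the "wrong" orientation.
gene_ids = ["Gene_A", "Gene_B"]
cell_ids = ["spot_1", "spot_2", "spot_3"]
genes_by_cells = np.array([[1, 0, 2],
                           [0, 3, 4]])

cells_by_genes = to_cells_by_genes(genes_by_cells, cell_ids, gene_ids)
print(cells_by_genes.shape)
```

The same idea applies to a sparse matrix, where `.T` also transposes without changing the stored values.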
Related documentation can be found under “Creating h5ad input files in R & Python > 1.) If your data is stored as a csv file with sample names as columns and gene names in the first row”.
Alternatively, did the manual copy-over of gene and cell identifiers described in “Reducing size of h5ad files in R & Python” offer any resolution? That step appears to be missing from your write-up.
Also: if it fails again, can you post your run ID in this chat? The run ID should be a semi-random string like “1701448506122-6fa0c2b5-88b5-45ea-b02f-4071ea7bfe87”; it will show up in the “your run has failed” message on the MapMyCells website. Having that information will allow us to get the detailed error message and provide a more precise diagnosis of what is happening. Thanks.
The RunID is 1701802295018-6aa01aea-f605-4e28-a148-2afc22b1a979
And it looks like my column names are genes and my rows are the spot/“cell” identifiers (see picture)
Thanks for your help!
I downloaded your input file. I was unable to open it with the anndata library. I get:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/allen/aibs/technology/danielsf/miniconda3_230814/envs/cell_type_mapper/lib/python3.9/site-packages/anndata/_io/h5ad.py", line 197, in read_h5ad
    return read_h5ad_backed(filename, mode)
  File "/allen/aibs/technology/danielsf/miniconda3_230814/envs/cell_type_mapper/lib/python3.9/site-packages/anndata/_io/h5ad.py", line 128, in read_h5ad_backed
    f = h5py.File(filename, mode)
  File "/allen/aibs/technology/danielsf/miniconda3_230814/envs/cell_type_mapper/lib/python3.9/site-packages/h5py/_hl/files.py", line 567, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/allen/aibs/technology/danielsf/miniconda3_230814/envs/cell_type_mapper/lib/python3.9/site-packages/h5py/_hl/files.py", line 231, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 106, in h5py.h5f.open
OSError: Unable to open file (truncated file: eof = 101677282, sblock->base_addr = 0, stored_eof = 369387362)
which, based on cursory googling, is an indication that the file is somehow corrupted.
Possible causes I can think of:
- Maybe the buffer had not been totally flushed when you saved the file to disk (the anndata library doesn’t always flush and close file objects when you think it ought to).
- File upload might have terminated prematurely.
Can you please:
1. Close out of your R/Python session. Start a new session, and verify that you can open this file on your local machine?
2. Assuming that (1) works fine, just try again in case something went wrong during file upload the first time?
Thanks for all the recommendations. I did 1 and 2 and I am still getting an error saying the input file is not in the correct format (Run ID: 1701889390195-a3a109e1-3a09-4f6d-839d-c151566963c2).
A user above suggested manually copying over the gene and cell identifiers, but I am not sure how to do that with my spatial Seurat object.
Thanks for any help/insights.
Same error on our end. h5py and anndata believe that your file is corrupted.
How large is the file on your system? The file that got uploaded is 166 MB.
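One way to check this locally is sketched below in Python, using only the standard library. It compares the size of the file on disk with the end-of-file address recorded in the HDF5 superblock, which is the same comparison behind the "truncated file" message. The function names are hypothetical. The sketch assumes the superblock sits at byte 0 (no user block) and follows the version 0-3 layouts in the HDF5 file format specification. The demo file is a fake 56-byte header, not a real h5ad file, with the `stored_eof` value from the traceback reused as a toy number.

```python
import os
import struct
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def stored_eof(path):
    """Return the end-of-file address recorded in the HDF5 superblock.

    Assumption: the superblock starts at byte 0 and is version 0, 1, 2 or 3.
    """
    with open(path, "rb") as f:
        head = f.read(128)
    if head[:8] != HDF5_SIGNATURE:
        raise ValueError("HDF5 signature not found at byte 0")
    version = head[8]
    if version in (0, 1):
        size_of_offsets = head[13]
        base_pos = 24 if version == 0 else 28
    elif version in (2, 3):
        size_of_offsets = head[9]
        base_pos = 12
    else:
        raise ValueError(f"unsupported superblock version {version}")

    def read_addr(pos):
        # Addresses are little-endian integers of `size_of_offsets` bytes.
        return int.from_bytes(head[pos:pos + size_of_offsets], "little")

    base_addr = read_addr(base_pos)
    # The end-of-file address is the third address field after the base.
    eof_addr = read_addr(base_pos + 2 * size_of_offsets)
    return base_addr + eof_addr


def is_truncated(path):
    """True if the file on disk is shorter than the superblock says."""
    return os.path.getsize(path) < stored_eof(path)


def fake_v0_superblock(eof_addr):
    """Build a bare version-0 superblock header for the demo below."""
    undefined = 0xFFFFFFFFFFFFFFFF
    return (
        HDF5_SIGNATURE
        + bytes([0, 0, 0, 0, 0, 8, 8, 0])  # versions, 8-byte offsets/lengths
        + struct.pack("<HHI", 4, 16, 0)    # group node K values, flags
        + struct.pack("<QQQQ", 0, undefined, eof_addr, undefined)
    )


# Demo: a header that claims a much larger file than is actually on disk.
demo_path = os.path.join(tempfile.mkdtemp(), "fake_truncated.h5ad")
with open(demo_path, "wb") as f:
    f.write(fake_v0_superblock(369387362))

print(os.path.getsize(demo_path), stored_eof(demo_path), is_truncated(demo_path))
```

If `is_truncated` returns True for the file on your own machine, the file was already incomplete before the upload. If it returns False locally, the upload is the more likely culprit.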