Good afternoon,
I am trying to upload a 3.6 GB .h5ad file with 79,263 cells and 25,262 genes. I converted the gene names to Ensembl IDs and the data type to a CSR matrix of NumPy float32 values, the same as the example mouse .h5ad files. The error I’m getting is “Upload data failed” with “File Upload Error: Failed” but no further explanation. I checked all the input requirements carefully and am quite sure I’m meeting them. Any idea what else could be causing the error?
Many thanks,
Margaret
Hi @mschro
There is a subtlety to modern browsers we did not appreciate when writing our documentation. Currently, the file upload limit is 2 GB, not 4 GB. Sorry for the confusion. Let us know if you need help splitting your file into two chunks to get it down under the limit.
Again: apologies for the confusion.
Cheers,
Scott Daniel
Thanks for the quick response, Scott! One more question: have you found that it makes a difference in the mapping if the genes are downsampled so that only the X (5,000 or 10,000) most highly variable genes are used? I am considering shrinking the gene list rather than cutting down on the number of cells mapped at once.
MapMyCells compares your gene list to a list of blessed marker genes and only uses genes that occur in both lists. We haven’t publicized the list of marker genes, so I would not recommend cutting genes from your dataset, just in case the genes you cut are marker genes.
I just looked over your original message. That amount of data does not seem very large to me. It feels like you should be able to get it into the 2GB limit.
If you are using Python, have you tried writing to h5ad with compression, i.e.
    my_anndata_object.write_h5ad(
        'path/to/output.h5ad',
        compression='gzip',
        compression_opts=4)
Also, if your data is raw counts (i.e. just integers), you can try saving it as a matrix of unsigned 16-bit integers (np.uint16 in Python). That’s a quick 25% savings: the way CSR matrices are stored, half of your data (the index arrays) won’t benefit from this change.
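For anyone trying the uint16 suggestion, here is a minimal sketch of the conversion using NumPy and SciPy. The helper name to_uint16_csr is made up for illustration and is not part of MapMyCells or anndata. It assumes the matrix really does hold raw counts and refuses to convert otherwise, since np.uint16 can only represent whole numbers from 0 to 65535.

```python
import numpy as np
from scipy import sparse


def to_uint16_csr(matrix):
    """Return a CSR copy of `matrix` with uint16 values, or raise if unsafe.

    Hypothetical helper for illustration only.
    """
    csr = sparse.csr_matrix(matrix)
    values = csr.data
    if values.size > 0:
        # Only raw counts (non-negative whole numbers) survive the cast intact.
        if not np.all(values == np.floor(values)):
            raise ValueError("matrix holds non-integer values")
        if values.min() < 0 or values.max() > np.iinfo(np.uint16).max:
            raise ValueError("values fall outside the uint16 range 0..65535")
    return csr.astype(np.uint16)


# Toy example: 3 cells x 4 genes of float32 counts.
counts = sparse.csr_matrix(
    np.array([[0, 2, 0, 1], [5, 0, 0, 0], [0, 0, 3, 0]], dtype=np.float32)
)
small = to_uint16_csr(counts)

# The stored values shrink from 4 bytes to 2 bytes each;
# the column-index array is the same size before and after.
print("value bytes:", counts.data.nbytes, "->", small.data.nbytes)
print("index bytes:", counts.indices.nbytes, "->", small.indices.nbytes)
```

With an AnnData object this would presumably be applied as my_anndata_object.X = to_uint16_csr(my_anndata_object.X) before calling write_h5ad, assuming X is already a sparse matrix of counts.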
Good to know about the marker genes. I’ll keep all the genes then.
I have not tried with compression, but I did switch to uint16 and now I’m able to process my data in manageable batches of <40K cells each (it’s working quite nicely!).
Thanks for all the help!
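For readers who want to reproduce the batching described above, here is a rough sketch of one way to split a dataset into near-equal batches that each stay under a cell limit. The function names batch_bounds and write_batches and the output file naming are hypothetical; the thread does not say how the batches were actually made. The second function assumes an AnnData object, which supports row slicing, copy() and write_h5ad().

```python
def batch_bounds(n_cells, max_cells):
    """Yield (start, stop) row ranges that cover n_cells in near-equal
    batches of at most max_cells rows each."""
    if max_cells <= 0:
        raise ValueError("max_cells must be positive")
    if n_cells <= 0:
        return
    n_batches = -(-n_cells // max_cells)  # ceiling division
    base, extra = divmod(n_cells, n_batches)
    start = 0
    for i in range(n_batches):
        # The first `extra` batches take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        yield start, stop
        start = stop


def write_batches(adata, prefix, max_cells):
    """Write each batch of `adata` to its own compressed .h5ad file.

    `adata` is assumed to be an AnnData object; the file naming
    scheme here is an arbitrary choice for illustration.
    """
    paths = []
    for i, (start, stop) in enumerate(batch_bounds(adata.n_obs, max_cells)):
        path = f"{prefix}_batch{i}.h5ad"
        # Copy the slice so a standalone object is written, not a view.
        adata[start:stop].copy().write_h5ad(path, compression="gzip")
        paths.append(path)
    return paths


# The dataset in this thread has 79,263 cells; keep each batch under 40,000.
bounds = list(batch_bounds(79263, 39999))
print(bounds)
```

Splitting by cells rather than by genes keeps every gene in each batch, which matches the advice above not to drop genes that might be marker genes.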