MapMyCells gives an error after Input File Validation

@danielsf @jeremyinseattle
I created an h5ad file; it is less than 500 MB. I wanted to run the deep generative model to get cell type annotations from the human dataset, but I still get an error after the Input File Validation step. Can you help? I can send the file via email if you like. I created it following the guidelines; a rough sketch of how it was assembled is below.
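For context, this is roughly how such a file can be put together (a minimal sketch only; the gene IDs, matrix contents, and file name are placeholders, and a real file needs the identifiers specified in the guidelines):

```python
# Minimal sketch of writing a cell-by-gene h5ad file with anndata,
# assuming the standard AnnData layout: cells as rows (obs), genes as
# columns (var). All names and values below are placeholders.
import anndata as ad
import numpy as np
import scipy.sparse as sp

n_cells, n_genes = 1000, 30000
# Sparse storage keeps the written file small; real data would be
# integer counts rather than random values.
counts = sp.random(n_cells, n_genes, density=0.05,
                   format="csr", dtype=np.float32)

adata = ad.AnnData(X=counts)
adata.obs_names = [f"cell_{i}" for i in range(n_cells)]
adata.var_names = [f"gene_{i}" for i in range(n_genes)]

adata.write_h5ad("my_cells.h5ad", compression="gzip")
```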

Run ID: 1703029070342-bbfaa9ea-3c28-4c12-8cbf-bc4b2ee46905

Okay, I looked at the log files and fixed the file to resolve this and some subsequent errors. I chose the Deep Generative Algorithm to extract labels for the 692,922 cells. I have healthy and diseased cell-by-gene data from human prefrontal cortex, while the reference dataset here is from the MTG of mostly Alzheimer’s patients. Is this still applicable to my data? Also, some cell types (Sst Chodl and L5 ET) were assigned to as few as 30-35 cells, which makes me wonder whether the deep generative method is the best algorithm here. Do you have a recommendation?

For the same file (283 MB), both the Hierarchical Mapping and the Correlation Mapping algorithms error out with:

Mapping algorithm failed because of application errors.
Unexpected e=OutOfMemoryError('CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 21.99 GiB total capacity; 1024 bytes already allocated; 2.56 MiB free; 2.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF'), type(e)=<class 'torch.cuda.OutOfMemoryError'>, fname='run.py', lineno=414
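For reference, the allocator setting the traceback suggests would be applied like this. This is only a sketch for anyone running PyTorch themselves; users of the hosted tool cannot set it, and the MiB-scale numbers above (1024 bytes allocated on a ~22 GiB GPU) suggest the GPU was not actually full:

```python
# Sketch: the PYTORCH_CUDA_ALLOC_CONF tweak named in the error message.
# It must be set before CUDA is initialized, and it only helps when
# fragmentation (reserved >> allocated) is the real problem. The value
# 128 is an arbitrary example, not a recommendation.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import torch only after the variable is set
```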

@aj95b Thank you for your interest in MapMyCells.

We’ll try to get back to you as soon as possible.

Please note that due to vacation and office closures, responses may be slower than usual at this time of year.

Thank you for your patience.

@aj95b we resolved an issue in our algorithm infrastructure code that led to the errors you were experiencing for Hierarchical Mapping and Correlation Mapping.

Please try to rerun your files. They should now map correctly.

Regarding what constitutes reasonable results for certain cell types with different algorithms, I’ll have to defer to my scientific colleagues. They’ll return in the new year.

Thank you for your continued patience :slight_smile:

Regarding your scientific questions:

  • It should be reasonable to map data from prefrontal cortex to MTG, particularly for glia and inhibitory neurons, but most likely for excitatory neurons as well. We have found that the same cell types are present in both regions; the regional differences that do exist largely involve genes other than the cell type markers.
  • Sst Chodl and L5 ET cells are quite rare in humans and are also quite distinct, so the mapping for these (and the low numbers you find) is likely correct.
  • The reference here does include data from Alzheimer’s donors; however, it is originally based on a reference from healthy adult donors and was extended with aged donors, with and without AD, to include additional non-neuronal types not present in the younger donors. You should be safe mapping data from any adult or aged human population.
  • At the Allen Institute we use a deep generative method for mapping human cortical data and find that it works best in our hands. That said, we’d encourage you to try multiple mapping algorithms and compare the results (e.g., for the mouse whole brain we used hierarchical mapping). At the “class” and “subclass” levels the choice more than likely won’t make a difference, but at the finest-resolution “supertype” level you will see some differences; a rough comparison sketch follows this list.
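For anyone who wants to quantify that comparison, a rough sketch follows. The file names and the level column names (class_name, subclass_name, supertype_name) are assumptions; check the headers of your actual MapMyCells output CSVs:

```python
# Sketch: measure agreement between two mapping runs at each taxonomy
# level. File and column names are assumed placeholders.
import pandas as pd

deep = pd.read_csv("deep_generative_results.csv", comment="#", index_col=0)
hier = pd.read_csv("hierarchical_results.csv", comment="#", index_col=0)
hier = hier.loc[deep.index]  # align rows on the shared cell IDs

for level in ["class_name", "subclass_name", "supertype_name"]:
    agreement = (deep[level] == hier[level]).mean()
    print(f"{level}: {agreement:.1%} of cells assigned identically")
```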

Let us know if you have any other questions/challenges and we’ll get back to you next year.


Hello,

Happy new year, and thank you for making this tool! I am also experiencing an “OutOfMemoryError” after my h5ad file, which follows the cell-by-gene guidelines, has been validated. My file is only 448 MB, so I’m not sure why I’m receiving this error. I chose Hierarchical Mapping with the 10x Whole Mouse Brain as my reference taxonomy. I can email my file if that would help solve the issue.
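In case it helps, a quick local sanity check before re-uploading might look like this (a sketch assuming only the standard AnnData layout; the file name is a placeholder):

```python
# Sketch: inspect an h5ad file locally before (re-)uploading it.
import anndata as ad

adata = ad.read_h5ad("my_mouse_cells.h5ad")  # placeholder name
print(adata.shape)           # (n_cells, n_genes)
print(adata.var_names[:5])   # gene identifiers the reference taxonomy expects
print(type(adata.X))         # backing matrix type (sparse or dense)
```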

Run ID: 1704388085790-e8c46ec9-e5c4-4355-abfb-d570137238d6

Thank you and I look forward to hearing from you!

Hi @kdang6 ,

Thank you for your patience. There was a bug in the infrastructure code calling the MapMyCells algorithm. That bug has now been fixed. I have tested MapMyCells with data of an appropriate size and I no longer get the OutOfMemoryError you were seeing. You should be clear to re-run your dataset. I apologize that it took us so long to fix this error.

Thank you for helping us make MapMyCells a more reliable tool.