Run failed with .csv.gz input file, access denied to output log

I recently submitted a .csv.gz input file, with cells as rows and genes as columns, to MapMyCells. The run failed (run ID 1771365448267-9bf9ae75-bafb-475c-8529-785359b31b74), and when I tried to download the logs, it opened a new tab with this message (after I disabled my popup blocker):

“This XML file does not appear to have any style information associated with it. The document tree is shown below.

AccessDenied

Access Denied”

Below is a snippet of the count matrix .csv file, taken before I gzipped it (it is now .csv.gz). I tried re-running the job, since that worked for another user, but it failed again and I got the same access denied message for the logs (run ID 1771365448267-9bf9ae75-bafb-475c-8529-785359b31b74 for the second try).

                                MIR1302-2HG FAM138A OR4F5 AL627309.1 AL627309.3 AL627309.4
EZ087_seurat_AAACAGCCATAGCGAG.1           0       0     0          0          0          0
EZ087_seurat_AAACATGCAAGTGTCC.1           0       0     0          0          0          0
EZ087_seurat_AAACATGCAATAATGG.1           0       0     0          1          0          0
EZ087_seurat_AAACATGCAATGAATG.1           0       0     0          0          0          0
EZ087_seurat_AAACATGCACCTGCTC.1           0       0     0          0          0          0
EZ087_seurat_AAACCAACAAACCTTG.1           0       0     0          0          0          0

It’s possible that my .csv.gz file has quotation marks for the column giving the cell names, but a previous post led me to understand that MapMyCells would be configured to allow this.

For some reason the first five column names of the .csv file snippet copied in as code, but they should be text.

Hi @ddressman91

I acknowledge that the answer I’m about to give you is not very satisfactory.

Failing and then not allowing you to download the logs is generally a sign that the cloud instance running MapMyCells ran out of memory. Crashing because of an “out of memory” error does not give the machine a chance to write any output for download.

Something about your data (~131,000 cells by ~27,000 genes) provoked such a crash. Memory use in MapMyCells is a bit complicated. It is governed by the number of processors assigned to a job, the number of cells each processor is told to map at a time, and the number of marker genes in the taxonomy (this is probably what got you; the Whole Human Brain taxonomy involves more marker genes than Whole Mouse Brain).

Anyway: I downloaded your data, grabbed the first 50,000 cells, wrote them to a CSV file, gzipped it, and was able to successfully map them. I would recommend that you just split up your data into slightly smaller chunks and map each chunk individually (again: 50,000 cells seems fine).
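For anyone who wants to script that chunking, here is a minimal sketch in Python using only the standard library. It assumes the input is a gzipped CSV with one header row of gene names followed by one row per cell; the function name `split_csv_gz`, the output file naming, and the default of 50,000 cells per chunk are illustrative choices, not anything MapMyCells requires.

```python
import csv
import gzip
from pathlib import Path


def split_csv_gz(src, out_dir, cells_per_chunk=50_000):
    """Split a gzipped cell-by-gene CSV into smaller gzipped CSVs.

    Every output file repeats the header row (gene names) and holds at most
    `cells_per_chunk` data rows (cells). Returns the list of paths written.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme: <input name>.chunk<N>.csv.gz
    name = src.name
    stem = name[: -len(".csv.gz")] if name.endswith(".csv.gz") else src.stem

    written = []
    out_file = None
    writer = None
    try:
        with gzip.open(src, "rt", newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader)
            for i, row in enumerate(reader):
                # Start a new chunk file every `cells_per_chunk` rows.
                if i % cells_per_chunk == 0:
                    if out_file is not None:
                        out_file.close()
                    path = out_dir / f"{stem}.chunk{len(written) + 1}.csv.gz"
                    out_file = gzip.open(path, "wt", newline="")
                    writer = csv.writer(out_file)
                    writer.writerow(header)
                    written.append(path)
                writer.writerow(row)
    finally:
        if out_file is not None:
            out_file.close()
    return written
```

Calling `split_csv_gz("counts.csv.gz", "chunks")` (both paths are placeholders) streams the file row by row, so the full matrix never has to fit in memory. Each resulting file can then be submitted as its own run.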

We probably need to do some work on MapMyCells so that it manages memory a little better (dynamically setting the number of cells processed by each processor based on how many genes are in the data). That will take some time. In the meantime: smaller sized inputs are your friend.

I hope that gets you unblocked.

Cheers,

Scott

That’s fine, thanks for the info. I was going by the GB limit for input files. Before making my full gzipped file, I had split the count matrix into 4 parts, so I will probably submit those individually and try again.

Hi Scott,

Chunks 1-3 ran fine, but I got an error on chunk 4. I pasted the text of the validation log below if you can’t see it, run ID is 1771449255044-12064d00-2ec9-4c2d-a3f7-0ceb6ad26034. It looks like there’s an NA somewhere (or several of them) that is causing the problem, but I checked the cell names and the count matrix in R and couldn’t find any NA’s. Could NA’s be introduced in the conversion to an .h5ad file or conversion between gene symbols and ENSEMBL IDs?

Log text:

5.10693e-04 seconds == WARNING: Input data is in CSV format; converting to h5ad file at MSA.ctrl.PD.snRNAseq.countmatrix.slice4.csv-2026-02-18-21-19-51.h5ad
7.94722e+01 seconds == an ERROR occurred ====
Traceback (most recent call last):
File cell_type_mapper/validation/csv_utils.py, line 99, in convert_csv_to_h5ad
adata = anndata.io.read_csv(
File anndata/_io/read.py, line 49, in read_csv
return read_text(filename, delimiter, first_column_names, dtype)
File anndata/_io/read.py, line 351, in read_text
return _read_text(f, delimiter, first_column_names, dtype)
File anndata/_io/read.py, line 440, in _read_text
data.append(np.array(line_list[1:], dtype=dtype))
ValueError: could not convert string to float: ‘NA’

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File cell_type_mapper/cli/validate_h5ad.py, line 242, in run
result_path, has_warnings = validate_h5ad(
File cell_type_mapper/validation/validate_h5ad.py, line 97, in validate_h5ad
result = _validate_h5ad(
File cell_type_mapper/validation/validate_h5ad.py, line 136, in _validate_h5ad
write_to_new_path) = convert_csv_to_h5ad(
File cell_type_mapper/validation/csv_utils.py, line 110, in convert_csv_to_h5ad
raise RuntimeError(full_msg)
RuntimeError: =======An error occurred when reading your CSV with anndata:
Traceback (most recent call last):
File cell_type_mapper/validation/csv_utils.py, line 99, in convert_csv_to_h5ad
adata = anndata.io.read_csv(
File anndata/_io/read.py, line 49, in read_csv
return read_text(filename, delimiter, first_column_names, dtype)
File anndata/_io/read.py, line 351, in read_text
return _read_text(f, delimiter, first_column_names, dtype)
File anndata/_io/read.py, line 440, in _read_text
data.append(np.array(line_list[1:], dtype=dtype))
ValueError: could not convert string to float: ‘NA’

Please confirm that your CSV is a table in which each row is a cell and each column is a gene.

7.94722e+01 seconds == CLEANING UP
7.95488e+01 seconds == Mapping algorithm failed because of application errors.
7.95488e+01 seconds == Validation error: e=RuntimeError(‘=======An error occurred when reading your CSV with anndata:\nTraceback (most recent call last):\n File “/usr/local/lib/python3.10/site-packages/cell_type_mapper/validation/csv_utils.py”, line 99, in convert_csv_to_h5ad\n adata = anndata.io.read_csv(\n File “/usr/local/lib/python3.10/site-packages/anndata/_io/read.py”, line 49, in read_csv\n return read_text(filename, delimiter, first_column_names, dtype)\n File “/usr/local/lib/python3.10/site-packages/anndata/_io/read.py”, line 351, in read_text\n return _read_text(f, delimiter, first_column_names, dtype)\n File “/usr/local/lib/python3.10/site-packages/anndata/_io/read.py”, line 440, in _read_text\n data.append(np.array(line_list[1:], dtype=dtype))\nValueError: could not convert string to float: ‘NA’\n\nPlease confirm that your CSV is a table in which each row is a cell and each column is a gene.’), type(e)=<class ‘RuntimeError’>, fname=‘run.py’, lineno=153
Traceback (most recent call last):
File “/usr/local/lib/python3.10/site-packages/cell_type_mapper/validation/csv_utils.py”, line 99, in convert_csv_to_h5ad
adata = anndata.io.read_csv(
File “/usr/local/lib/python3.10/site-packages/anndata/_io/read.py”, line 49, in read_csv
return read_text(filename, delimiter, first_column_names, dtype)
File “/usr/local/lib/python3.10/site-packages/anndata/_io/read.py”, line 351, in read_text
return _read_text(f, delimiter, first_column_names, dtype)
File “/usr/local/lib/python3.10/site-packages/anndata/_io/read.py”, line 440, in _read_text
data.append(np.array(line_list[1:], dtype=dtype))
ValueError: could not convert string to float: ‘NA’

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/apps/run.py”, line 153, in run
runner.run()
File “/usr/local/lib/python3.10/site-packages/cell_type_mapper/cli/validate_h5ad.py”, line 242, in run
result_path, has_warnings = validate_h5ad(
File “/usr/local/lib/python3.10/site-packages/cell_type_mapper/validation/validate_h5ad.py”, line 97, in validate_h5ad
result = _validate_h5ad(
File “/usr/local/lib/python3.10/site-packages/cell_type_mapper/validation/validate_h5ad.py”, line 136, in _validate_h5ad
write_to_new_path) = convert_csv_to_h5ad(
File “/usr/local/lib/python3.10/site-packages/cell_type_mapper/validation/csv_utils.py”, line 110, in convert_csv_to_h5ad
raise RuntimeError(full_msg)
RuntimeError: =======An error occurred when reading your CSV with anndata:
Traceback (most recent call last):
File “/usr/local/lib/python3.10/site-packages/cell_type_mapper/validation/csv_utils.py”, line 99, in convert_csv_to_h5ad
adata = anndata.io.read_csv(
File “/usr/local/lib/python3.10/site-packages/anndata/_io/read.py”, line 49, in read_csv
return read_text(filename, delimiter, first_column_names, dtype)
File “/usr/local/lib/python3.10/site-packages/anndata/_io/read.py”, line 351, in read_text
return _read_text(f, delimiter, first_column_names, dtype)
File “/usr/local/lib/python3.10/site-packages/anndata/_io/read.py”, line 440, in _read_text
data.append(np.array(line_list[1:], dtype=dtype))
ValueError: could not convert string to float: ‘NA’

Please confirm that your CSV is a table in which each row is a cell and each column is a gene.

Hi,

I downloaded your slice4.csv.gz file. For some reason, the last line of your CSV is entirely made up of NA (even the cell label is “NA”). I wonder if R “just knows” to skip that line, which is why it didn’t turn up in your debugging efforts. Anyway: clip that line off and you should be fine.
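If it helps, here is a minimal sketch of one way to clip such a line without going back through R. It assumes a gzipped CSV with a single header row; the function name `drop_all_na_rows` and the set of strings treated as missing are illustrative assumptions. It only drops rows in which every field, including the cell label, is an NA token; rows with some real values are copied through unchanged.

```python
import csv
import gzip

# Strings treated as "missing" for this sketch (an assumption; adjust as needed).
NA_TOKENS = {"NA", "NaN", ""}


def drop_all_na_rows(src, dst):
    """Copy a gzipped CSV to `dst`, skipping rows made up entirely of NA tokens.

    Returns the 1-based row numbers (header is row 1) that were dropped,
    so you can see where in the file the bad rows were.
    """
    dropped = []
    with gzip.open(src, "rt", newline="") as fin, \
            gzip.open(dst, "wt", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        writer.writerow(next(reader))  # keep the header (gene names) as-is
        for row_no, row in enumerate(reader, start=2):
            # csv.reader strips surrounding quotes, so a quoted "NA" label
            # compares equal to the bare token NA.
            if all(field.strip() in NA_TOKENS for field in row):
                dropped.append(row_no)
                continue
            writer.writerow(row)
    return dropped
```

Calling `drop_all_na_rows("slice4.csv.gz", "slice4.clean.csv.gz")` (the output name is a placeholder) returns the dropped row numbers, which makes it easy to confirm that only the trailing line was removed.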

Cheers,

Scott
