Working with imputed gene dataset

Hello. I am trying to work with annotated data containing imputed gene expression values, as found here:

print(abc_cache.list_data_files('MERFISH-C57BL6J-638850-imputed'))

['C57BL6J-638850-imputed/log2']

However, this dataset appears to contain cells for all 69 sections, is almost 50 GB in size, and is very slow to work with. I am looking for files broken down by individual section, which for non-imputed genes are listed by abc_cache.list_data_files('MERFISH-C57BL6J-638850-sections'), but I am unable to find anything equivalent for imputed gene expression.

Does such a thing exist? Am I looking in the wrong place? I can theoretically generate the files myself, but it would take hours, and before doing that, I wanted to make sure I am not missing something simple.

Thanks again!

  • Tom

Hi Tom,

For better or worse, the data you are working with does not exist in a per-section form. The single monolithic file is all we have for the Yao et al. 2023 MERFISH data with imputed gene expression.

Cheers,

Scott

Thanks Scott. Based on that feedback, I bit the bullet and wrote code to generate per-section files on my own. I’m still learning the AnnData format, so I had to read the documentation on that first. It seems to work now, and takes about 10 minutes to generate the file for each individual section.

  • Tom