SEA-AD snRNA-seq dataset format and loading in R for analysis using Seurat v5

I am the Scientific Project Manager for the SEA-AD Consortium. I am posting a community question received via email related to working with SEA-AD data:

I am trying to use the dataset SEAAD_MTG_RNAseq_final-nuclei.2024-02-13.h5ad from Gabitto MI et al., 2023, but I have not been able to load the data in R for analysis using Seurat v5. As an alternative, I downloaded the dataset from the cellxgene.cziscience repository as an rds object. However, in this SeuratObject the genes are labeled with Ensembl IDs, which prevents me from integrating it with other datasets, including my own, where the genes are labeled by Gene ID.

I am not sure whether I should contact you or the authors directly, but I was wondering whether this dataset is also available in a different format, such as rds with Gene_ID?

Hello,

Unfortunately, R is unable to read in the entire dataset: there are more than 2^32 non-zero elements in the sparse matrix, and R is limited to 32-bit integers for now, so a sparse matrix there can index at most 2^31 - 1 non-zero elements. I would recommend using scanpy to load the dataset, subset it (using something like adata[adata.obs["Subclass"] == "Microglia", :], which would select only Microglia cells), and write out the subset to load into R.

I would not recommend using the cellxgene version of the data because (1) there is considerably less metadata associated with that object and (2) the counts matrix is different, since cellxgene is frozen on a different version of the genome annotation than the official 10x reference that we mapped to. If you did want to map their Ensembl IDs to gene names, you would have to look at the GTF file that cellxgene uses (noted in schema/3.0.0/schema.md on the main branch of the chanzuckerberg/single-cell-curation GitHub repository).
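
If that mapping is needed anyway, the lookup can be sketched roughly as follows. This assumes a standard GTF whose "gene" records carry gene_id and gene_name attributes. The function names and the version-stripping default are hypothetical, and the GTF path is whichever file the cellxgene schema specifies.

```python
import gzip
import re

_GENE_ID = re.compile(r'gene_id "([^"]+)"')
_GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def read_gene_names(gtf_path, strip_version=True):
    """Return {Ensembl gene ID: gene name} from the 'gene' records of a GTF."""
    opener = gzip.open if str(gtf_path).endswith(".gz") else open
    mapping = {}
    with opener(gtf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header / comment lines
            fields = line.rstrip("\n").split("\t")
            # Column 3 is the feature type, column 9 the attribute string.
            if len(fields) < 9 or fields[2] != "gene":
                continue
            gene_id = _GENE_ID.search(fields[8])
            gene_name = _GENE_NAME.search(fields[8])
            if gene_id is None or gene_name is None:
                continue
            key = gene_id.group(1)
            if strip_version:
                # Assumption: IDs in the object carry no ".N" version suffix.
                key = key.split(".")[0]
            mapping[key] = gene_name.group(1)
    return mapping


def to_gene_names(ensembl_ids, mapping):
    """Replace each ID by its gene name, keeping the ID when no name is known."""
    return [mapping.get(gene_id, gene_id) for gene_id in ensembl_ids]
```

Several Ensembl IDs can share one gene name, so the translated names may contain duplicates that need to be resolved before they are used as feature names.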

Best,
Kyle