A9 all nuclei consolidated dataset

Dear team,
I noticed that the SEA-AD consortium recently released an updated version of the consolidated A9 dataset (https://sea-ad-single-cell-profiling.s3.amazonaws.com/PFC/RNAseq/SEAAD_DFC_RNAseq_all-nuclei.2026-06-22.h5ad). Compared with the 2024 dataset, the cell counts differ for several donors, particularly H20.33.001, H20.33.005, H20.33.008, H20.33.016, H20.33.019, H20.33.026, and H20.33.027. In every case except H20.33.016, the new dataset appears to include all previously available cells plus additional cells that were not present before. I also noticed that the updated dataset no longer has three useful metadata columns: “Class Confidence,” “Sub Class Confidence,” and “Supertype Confidence.” Could you help me understand the differences between these datasets and advise which version is recommended for use?
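A per-donor comparison like the one described above could be sketched with pandas on the two `obs` tables (for example, `anndata.read_h5ad(path, backed="r").obs` for each release). This is a minimal sketch, assuming both tables are indexed by a cell ID that stays stable across releases and that the donor column is named `Donor ID`. That column name is an assumption, so check it against the actual objects.

```python
import pandas as pd


def compare_donor_counts(old_obs: pd.DataFrame, new_obs: pd.DataFrame,
                         donor_col: str = "Donor ID") -> pd.DataFrame:
    """Per-donor cell counts in two releases, plus cells added and dropped.

    Assumes both tables are indexed by a cell ID that is stable across
    releases and share a donor column (the default name is an assumption).
    """
    rows = []
    # astype(str) avoids comparison issues when the donor column is categorical.
    old_donors = old_obs[donor_col].astype(str)
    new_donors = new_obs[donor_col].astype(str)
    for donor in sorted(set(old_donors) | set(new_donors)):
        old_ids = set(old_obs.index[(old_donors == donor).to_numpy()])
        new_ids = set(new_obs.index[(new_donors == donor).to_numpy()])
        rows.append({
            "donor": donor,
            "n_old": len(old_ids),
            "n_new": len(new_ids),
            "added": len(new_ids - old_ids),    # present only in the new release
            "dropped": len(old_ids - new_ids),  # present only in the old release
        })
    return pd.DataFrame(rows).set_index("donor")
```

Rows with `dropped > 0` would flag donors such as H20.33.016, where some previously available cells are absent. If cell IDs are preserved across releases (again an assumption), a cell reassigned to a different donor would show up as dropped under one donor and added under another.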

Thanks in advance.

Best regards,
Rajesh

Hello @Rajesh ,

The new objects should be used. They differ because we re-analyzed the MTG and DFC datasets in the context of the additional allocortical and neocortical regions. I can share the manuscript describing the changes when it comes out, but the codebase is here: AllenInstitute/SEA-AD_Multiregion_2026 on GitHub. As noted in the Changelog, the cell type taxonomy is the same, except that it has been expanded to include types found outside of MTG. Because we re-mapped the data, a small number of cells will switch types (mostly those on the border between two groups), and some cells will shift between passing and failing the quality control cutoffs.
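To see how many shared cells switched types between releases, one option is to cross-tabulate old versus new labels. This is a sketch, assuming both `obs` tables are indexed by the same cell IDs and carry a label column named `Supertype` (the column name is an assumption).

```python
import pandas as pd


def label_changes(old_obs: pd.DataFrame, new_obs: pd.DataFrame,
                  label_col: str = "Supertype") -> pd.DataFrame:
    """Cross-tabulate old vs. new labels for cells present in both releases.

    Only cells whose label differs are counted, so each nonzero entry is a
    type-to-type move. The default column name is an assumption.
    """
    shared = old_obs.index.intersection(new_obs.index)
    # astype(str): categoricals with different category sets cannot be compared directly.
    old_labels = old_obs.loc[shared, label_col].astype(str)
    new_labels = new_obs.loc[shared, label_col].astype(str)
    changed = old_labels != new_labels
    return pd.crosstab(old_labels[changed].rename("old"),
                       new_labels[changed].rename("new"))
```

The same function should work for any per-cell categorical column present in both objects, such as a QC pass/fail flag, if one exists under the same name in both releases.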

While compiling the dataset, we found that a small fraction of libraries (12 of ~900) appeared to be swapped when we examined the whole genome sequencing data and the variants in the RNA reads. They are listed in SEA-AD_Multiregion_2026/00_data_curation/00_single_nucleus_multiome/06_finalize_data_assets/mixup_investigation_02-14-2025.csv on the main branch of AllenInstitute/SEA-AD_Multiregion_2026 on GitHub. The new MTG and DFC objects reassign these libraries to their correct donors and brain regions. The others were caught prior to release.
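To check whether an analysis built on the 2024 objects includes cells from any of the swapped libraries, one could count cells per affected library. This is a sketch under assumptions: the column layout of `mixup_investigation_02-14-2025.csv` is not described here, and `library_id` is a hypothetical name for the per-cell library column.

```python
import pandas as pd


def count_cells_from_swapped_libraries(obs: pd.DataFrame, swapped_libraries,
                                       library_col: str = "library_id") -> pd.Series:
    """Count cells per swapped library present in an obs table.

    `library_col` is a hypothetical column name: substitute whichever column
    records each cell's sequencing library in your object.
    """
    swapped = {str(lib) for lib in swapped_libraries}
    libraries = obs[library_col].astype(str)
    hits = libraries[libraries.isin(swapped)]
    # Swapped libraries with no cells in this object do not appear in the result.
    return hits.value_counts()
```

The list of library IDs would come from the mixup CSV (for example via `pd.read_csv`). The CSV column holding the IDs is not stated in this thread.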

Finally, we decided to remove the Class/Subclass/Supertype confidence scores from the main objects because we have found they are not particularly well calibrated. They do tend to be lower for mis-called types in reference benchmarking, but cell type abundance in the reference has a big influence on the final number. They are still available, for anyone who wants them, in the iterative_scANVI output files at the two AWS S3 Explorer locations.
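For anyone who wants the confidence scores back alongside the new objects, a left join on cell ID is one way to attach them. This is a sketch, assuming the iterative_scANVI output can be loaded as a table indexed by the same cell IDs. The three column names below are assumptions and should be checked against the actual files.

```python
import pandas as pd

# Assumed column names in the iterative_scANVI output; verify before use.
CONFIDENCE_COLS = ["Class confidence", "Subclass confidence", "Supertype confidence"]


def attach_confidence(obs: pd.DataFrame, scanvi_obs: pd.DataFrame,
                      cols=CONFIDENCE_COLS) -> pd.DataFrame:
    """Join confidence columns from the scANVI output onto a main obs table."""
    missing = [c for c in cols if c not in scanvi_obs.columns]
    if missing:
        raise KeyError(f"columns not found in scANVI output: {missing}")
    # A left join keeps every cell in the main object, in its original order;
    # cells absent from the scANVI table get NaN.
    return obs.join(scanvi_obs[list(cols)], how="left")
```

Given the calibration caveat above, it seems unwise to read these as absolute probabilities or to apply a single cutoff across all types.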

Best, Kyle

Dear Kyle,

Thanks for the prompt and detailed response.

I have a related query: Is the 2024 consolidated ATAC MTG object still good to use? Will you be releasing a consolidated PFC ATAC object as well?

Thanks.

Best regards,

Rajesh

Dear Kyle,

Thanks for the response. I have a related query: I see that the iterative_scANVI file is still labelled “2024” even though it is in the 2026 folder and its timestamp shows that it was recently modified. Should I assume that this file has been updated with the corrected library assignments?
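One empirical check, pending a reply, is to compare donor assignments for cells shared between the iterative_scANVI output and the new object. This is a sketch, assuming both tables are indexed by the same cell IDs and both carry a donor column named `Donor ID` (an assumption).

```python
import pandas as pd


def donor_mismatches(main_obs: pd.DataFrame, other_obs: pd.DataFrame,
                     donor_col: str = "Donor ID") -> pd.Index:
    """Return IDs of shared cells whose donor differs between two tables."""
    shared = main_obs.index.intersection(other_obs.index)
    # astype(str) so categoricals with different category sets still compare.
    main_donor = main_obs.loc[shared, donor_col].astype(str)
    other_donor = other_obs.loc[shared, donor_col].astype(str)
    return main_donor.index[(main_donor != other_donor).to_numpy()]
```

An empty result among cells from the swapped libraries would be consistent with the file having been updated. It would not prove it, since the check only covers cells present in both tables.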

Thanks.

Best regards,

Rajesh