SEA-AD snATAC-seq data

I am the Scientific Project Manager for the SEA-AD Consortium. I am posting a community question received via email related to working with SEA-AD data:

I am interested in using the SEA-AD dataset. Do you have AnnData or Seurat objects of processed snATAC-seq data, like the ones you have made publicly available for the snRNA-seq data? Specifically, I am looking for the processed count data for the snATAC-seq, as well as for the snATAC-seq portion of the Multiome data, so that I can follow standard integration workflows such as those used in Signac. Could someone help me with this? Any help is much appreciated.

For the middle temporal gyrus (MTG), we have an AnnData object on our AWS Open Data site (AWS S3 Explorer) that includes UMIs per peak for all cells.
We do not yet have this available for the frontal cortex (DFC data).
Note that the MTG AnnData object has UMIs per peak for all cells; it does not include the peaks called per subclass. For per-subclass peaks, you would need to subset the fragment file from our raw data on the Sage Bionetworks AD Knowledge Portal (https://adknowledgeportal.synapse.org/Explore/Studies/DetailsPage/StudyData?Study=syn26223298) and use the cell annotations in the AnnData object to run the analysis.
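The subsetting step can be sketched as splitting one sample's fragment file into one file per annotated group. This is a minimal sketch, not the SEA-AD pipeline. It assumes 10x-style fragment files (tab-separated, cell barcode in the 4th column, `#` comment lines) and a hypothetical `barcode_to_group` dictionary that maps barcodes in that file to subclass labels taken from the AnnData annotations.

```python
import gzip
import os
import re
from collections import Counter
from typing import Dict


def _safe_name(group: str) -> str:
    # Turn labels such as "L2/3 IT" or "Sst Chodl" into filesystem-safe names
    return re.sub(r"[^A-Za-z0-9._-]+", "_", group)


def split_fragment_file(in_path: str, barcode_to_group: Dict[str, str], out_dir: str) -> Dict[str, int]:
    """Write one plain-text fragment file per annotated group.

    Fragments whose barcode is not in `barcode_to_group` are skipped.
    Returns the number of fragments written per group.
    """
    opener = gzip.open if in_path.endswith(".gz") else open
    handles = {}
    counts: Counter = Counter()
    try:
        with opener(in_path, "rt") as fin:
            for line in fin:
                if line.startswith("#") or not line.strip():
                    continue  # header/comment lines
                barcode = line.split("\t", 4)[3]  # 4th column: cell barcode
                group = barcode_to_group.get(barcode)
                if group is None:
                    continue  # unannotated barcode
                if group not in handles:
                    path = os.path.join(out_dir, f"{_safe_name(group)}.fragments.tsv")
                    handles[group] = open(path, "w")
                handles[group].write(line)
                counts[group] += 1
    finally:
        for fh in handles.values():
            fh.close()
    return dict(counts)
```

Run it once per sample with a separate output directory, since each call overwrites its output files. The mapping might come from something like `dict(zip(adata.obs_names, adata.obs["Subclass"]))`, but the column name and whether `obs_names` match the fragment-file barcodes exactly are assumptions to check. Tools such as Signac expect bgzip-compressed, tabix-indexed fragments, so the outputs would still need `bgzip` and `tabix -p bed`.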

Follow-up Q:
I understand that you’ve divided the data from each brain tissue into separate cell-type-specific objects. I would like to integrate one cell type across the two tissues. Before doing this, I couldn’t find information on the specific preprocessing steps for the DLPFC, as your paper focuses on the MTG. Were both tissues integrated first and then split into individual objects? I’m asking so that I know whether I need to integrate them with Harmony or can simply merge the two datasets.

A: Both datasets are mapped to a common taxonomy (e.g. Sst_25 in one corresponds to Sst_25 in the other), which we describe in the updated manuscript on bioRxiv (version 3). They were integrated with scVI for the visualization/UMAP in the ABC atlas https://knowledge.brain-map.org/data/UMSVXTDIAZTAFKGE43T/explore, but are provided as separate datasets (with separate region-specific UMAP embeddings) in our AWS bucket/on cellxgene. We do not have any joint embeddings of a specific cell type.

Hello Eitan,

I successfully downloaded the fragment files from the Synapse directory. How do I annotate the files with donor IDs? Ideally, Synapse would include a manifest file with the download, but that doesn’t seem to be the case.

I’d appreciate any help you can offer.

Regards,

I have resolved this using syn.get_annotations(synapse_id). Thank you.
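For anyone with the same question, the `syn.get_annotations` approach can be turned into a small manifest. This is a minimal sketch under assumptions: the annotation keys `individualID` and `specimenID` are guesses at what the portal uses (check the keys on one file first), and the getter is passed in so the logic can be tried without a Synapse login.

```python
import csv
from typing import Callable, Dict, Iterable, List, Mapping, Sequence, TextIO

# Assumed annotation keys; check which keys the fragment files actually carry.
DEFAULT_KEYS = ("individualID", "specimenID")


def _as_text(value) -> str:
    # Annotation values typically come back as lists; join them into one string.
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return ";".join(str(v) for v in value)
    return str(value)


def collect_annotations(
    synapse_ids: Iterable[str],
    get_annotations: Callable[[str], Mapping],
    keys: Sequence[str] = DEFAULT_KEYS,
) -> List[Dict[str, str]]:
    """Fetch annotations for each Synapse ID and keep only the requested keys."""
    rows = []
    for syn_id in synapse_ids:
        annotations = get_annotations(syn_id)
        row = {"synapse_id": syn_id}
        for key in keys:
            row[key] = _as_text(annotations.get(key))  # "" if the key is absent
        rows.append(row)
    return rows


def write_manifest(rows: List[Dict[str, str]], handle: TextIO, keys: Sequence[str] = DEFAULT_KEYS) -> None:
    """Write the rows as a tab-separated manifest with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=["synapse_id", *keys], delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
```

With synapseclient installed and a logged-in client (`syn = synapseclient.Synapse(); syn.login(...)`), passing `syn.get_annotations` as the getter along with the fragment files' Synapse IDs should give one row per file.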

Hello Eitan,

I’m currently working on the peak-calling analysis, following the recommended approach of running it per cell type: generating pseudobulk profiles and calling peaks per cell type, then merging the results into a consensus peak set (as is done in the SCENIC+ tutorial). However, I have a couple of questions:

  1. Was the AnnData deposited on AWS generated by running the analysis across all cells (pseudobulk profiles across all cells), or by running it per cell type as is done in the SCENIC+ tutorial? I’m asking because I may not have the compute infrastructure needed to re-run the analysis if the latter approach was used.

  2. When generating pseudobulks per cell type, I’ve noticed that for some samples, certain cell types have no matching barcodes in the corresponding fragment file, even though they are annotated in the AnnData object. For example, when I use the cell type annotations from the AnnData object in the export_pseudobulk step of the SCENIC+ pipeline, I get an error. After further manual investigation, I found that for a cell type like “Sst Chodl” from sample ID 114XXXXXX, there are no matching barcodes in the corresponding fragment file. Do you have any idea why this might be the case?
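One way to narrow down a mismatch like the one in question 2 is to compare, per cell type, the annotated barcodes for one sample against the barcodes actually present in that sample's fragment file. This is a minimal sketch, assuming 10x-style fragment files (barcode in the 4th column) and a hypothetical `barcode_to_group` mapping built from the AnnData annotations for that single sample. If every group comes back fully missing, a barcode format difference (for example, a sample prefix or suffix added when objects were combined) is a common cause to rule out, and the optional `normalize` function can test that guess. If only specific groups are missing once the formats agree, the cause lies elsewhere and the summary is useful to include when reporting it.

```python
import gzip
from typing import Callable, Dict, Optional, Set, Tuple


def fragment_barcodes(path: str) -> Set[str]:
    """Collect the distinct cell barcodes (4th column) in a fragment file."""
    opener = gzip.open if path.endswith(".gz") else open
    barcodes: Set[str] = set()
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # header/comment lines
            barcodes.add(line.split("\t", 4)[3])
    return barcodes


def match_summary(
    barcode_to_group: Dict[str, str],
    frag_barcodes: Set[str],
    normalize: Optional[Callable[[str], str]] = None,
) -> Dict[str, Tuple[int, int]]:
    """Return {group: (n_missing, n_annotated)} for one sample.

    `normalize` optionally rewrites annotation barcodes before matching,
    e.g. to strip a sample prefix (hypothetical; the real format may differ).
    """
    summary: Dict[str, Tuple[int, int]] = {}
    for barcode, group in barcode_to_group.items():
        key = normalize(barcode) if normalize else barcode
        missing, total = summary.get(group, (0, 0))
        summary[group] = (missing + int(key not in frag_barcodes), total + 1)
    return summary
```

A group whose first number equals its second has no barcodes in the fragment file at all, which is the situation that makes `export_pseudobulk` fail for that cell type.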

I would greatly appreciate any insights you might have.

Thank you in advance for your assistance!
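For reference on the per-cell-type peak calling and consensus merging mentioned in this post: as I understand it, the SCENIC+ tutorial builds its consensus set with iterative overlap filtering (Corces et al., 2018). Each peak is resized to a fixed width around its summit, scores are normalized within each cell type, and peaks are then kept greedily from highest to lowest normalized score, discarding any that overlap an already-kept peak. The following is a minimal pure-Python sketch of that idea, not pycisTopic's implementation and not a description of how the SEA-AD AnnData was built. The `Peak` fields, taking summits from MACS2 output, and `half_width=250` are assumptions.

```python
from bisect import bisect_left, insort
from typing import Dict, List, NamedTuple, Tuple


class Peak(NamedTuple):
    chrom: str
    summit: int   # absolute summit position (e.g., MACS2 start + summit offset)
    score: float  # e.g., MACS2 -log10(q-value); an assumption


def consensus_peaks(
    peaks_by_group: Dict[str, List[Peak]], half_width: int = 250
) -> List[Tuple[str, int, int]]:
    """Greedy iterative-overlap filtering over fixed-width, summit-centred peaks."""
    # 1. Normalize scores within each group to "score per million".
    candidates = []
    for peaks in peaks_by_group.values():
        total = sum(p.score for p in peaks)
        if total <= 0:
            continue
        for p in peaks:
            candidates.append((p.score / total * 1e6, p.chrom, p.summit))

    # 2. Visit candidates from highest to lowest normalized score.
    candidates.sort(key=lambda c: c[0], reverse=True)
    width = 2 * half_width
    kept: Dict[str, List[int]] = {}  # sorted kept summits per chromosome
    for _, chrom, summit in candidates:
        summits = kept.setdefault(chrom, [])
        # Two fixed-width half-open intervals overlap iff their summits are
        # closer than `width`; find the first kept summit > summit - width.
        i = bisect_left(summits, summit - width + 1)
        if i < len(summits) and summits[i] < summit + width:
            continue  # overlaps a higher-scoring peak already kept
        insort(summits, summit)

    # 3. Emit BED-style (chrom, start, end) intervals, sorted.
    return sorted(
        (chrom, max(0, s - half_width), s + half_width)
        for chrom, summits in kept.items()
        for s in summits
    )
```

Normalizing within each cell type before ranking is meant to put cell types with different sequencing depths and peak counts on a comparable scale, so that the greedy pass is not dominated by whichever pseudobulk happens to have the largest raw scores.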