I am quite new to RNA-seq analysis, and recently I started working with ASAP Human Postmortem-Derived Brain Sequencing Collection data. I wanted to ask two questions regarding this dataset:
- Is there a protocol available that describes how the log2-normalised dataset (ASAP-PMDBS-10X-log2.h5ad) was generated? Was it scaling to a fixed library size followed by a log2(CPM + 1)?
- I noticed that the metadata doesn’t include explicit cell type annotations. Are cell types meant to be inferred from the cluster numbers? If so, could you please advise on how to map clusters to specific cell types?
Thank you in advance for your help.
Hi @HG_58828,
Welcome to the community and to RNA-seq analysis! The project you reference is a recent collaboration and we’re excited to be sharing data and results at an early stage of analysis. With that in mind, we performed only a basic log2-normalization, as you said, with no additional scaling:
values = log2(cell_counts / sum(cell_counts) * 10^6 + 1)
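For readers new to this transform, here is a minimal, dependency-free sketch of a per-cell log2(CPM + 1) calculation of the kind asked about in the question. It is an illustration only, not the code used to generate the released file; the function name `log2_cpm` and the example counts are made up.

```python
import math


def log2_cpm(cell_counts):
    """Return log2(CPM + 1) for the raw gene counts of a single cell.

    CPM (counts per million) rescales each gene's count by the cell's
    total count, so cells with different sequencing depth are comparable.
    """
    total = sum(cell_counts)
    if total == 0:
        # An empty cell has no signal; avoid dividing by zero.
        return [0.0 for _ in cell_counts]
    return [math.log2(count / total * 1e6 + 1) for count in cell_counts]


# Hypothetical raw counts for four genes in one cell (total = 100).
example_cell = [0, 3, 7, 90]
normalized = log2_cpm(example_cell)
```

Genes with zero counts stay at zero after the transform, which is the reason for adding 1 before taking the logarithm.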
For your second question, you are correct that the clusters have not yet been formally annotated. For inferring cell types, I would recommend using Annotation Comparison Explorer, which directly allows comparison of cell type (and other) annotations for individual clusters. First, go to the website and find the correct dataset (as below). You can also filter the data if you’d like (e.g., to focus only on a single data set or to omit clusters that likely contain lower quality cells), but this probably isn’t necessary in your case.
Second, you can directly visualize the annotations for individual clusters by configuring the bottom panel as follows:
I’m showing cluster_005 as an example, but if you change what is shown in (1), you can see different clusters. You can also view additional/different metadata columns by editing what is in the filter in (2) (e.g., to see higher/lower resolution cell types). In general, for clusters found mostly in neocortex, I’d trust the “SEAAD” annotations, and for clusters found elsewhere in the brain, I’d trust the “WHB” annotations.
ACE has lots of resources for getting started on the left panel, or feel free to reach out with additional questions about it (or if this didn’t address your questions about the ASAP-PMDBS data sets).
Best,
Jeremy
Alternatively, if you are accessing the data through the abc_atlas_access tool, the Python API will allow you to access dataframes that link each cell to the cell type with which it was annotated (using the MapMyCells tool).
The code you want to run is something like
>>> from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache
>>> data_dir = "/path/to/dir/where/you/are/downloading/data/"
>>> cache = AbcProjectCache.from_cache_dir(data_dir)
>>> annotation_files = cache.list_metadata_files(
...     directory="ASAP-PMDBS-taxonomy"
... )
>>> annotation_files
['mmc_results_seaad', 'mmc_results_siletti_whb']
>>> # get dataframe of SEA-AD annotations
>>> seaad_df = cache.get_metadata_dataframe(
...     directory="ASAP-PMDBS-taxonomy",
...     file_name="mmc_results_seaad"
... )
>>> # get dataframe of Whole Human Brain annotations
>>> whb_df = cache.get_metadata_dataframe(
...     directory="ASAP-PMDBS-taxonomy",
...     file_name="mmc_results_siletti_whb"
... )
There are Jupyter notebooks discussing this process and how to interpret the data you get.
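Once each cell has both a cluster and a mapped cell type, one simple way to label clusters is a majority vote per cluster. The sketch below uses only the standard library and entirely made-up records; the cell labels, cell type names, and the `majority_cell_type` helper are hypothetical, and the real annotation dataframes may use different column names, so check them before adapting this.

```python
from collections import Counter, defaultdict

# Hypothetical per-cell records: (cell label, cluster, mapped cell type).
cells = [
    ("cell_a", "cluster_005", "Astrocyte"),
    ("cell_b", "cluster_005", "Astrocyte"),
    ("cell_c", "cluster_005", "Astrocyte"),
    ("cell_d", "cluster_005", "Oligodendrocyte"),
    ("cell_e", "cluster_012", "Microglia"),
    ("cell_f", "cluster_012", "Microglia"),
]


def majority_cell_type(records):
    """Map each cluster to (most common cell type, fraction of its cells)."""
    by_cluster = defaultdict(Counter)
    for _cell, cluster, cell_type in records:
        by_cluster[cluster][cell_type] += 1

    summary = {}
    for cluster, counts in by_cluster.items():
        cell_type, n_cells = counts.most_common(1)[0]
        summary[cluster] = (cell_type, n_cells / sum(counts.values()))
    return summary


cluster_summary = majority_cell_type(cells)
```

The fraction gives a rough sense of how mixed a cluster is; a low value suggests looking at the cluster more closely before assigning it a single label.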