Normalisation protocol and cell types

HG_58828 · July 28, 2025, 12:26pm

I am quite new to RNA-seq analysis, and recently I started working with ASAP Human Postmortem-Derived Brain Sequencing Collection data. I wanted to ask two questions regarding this dataset:

1. Is there a protocol available that describes how the log2-normalised dataset (ASAP-PMDBS-10X-log2.h5ad) was generated? Was it scaling to a fixed library size followed by a log2(CPM + 1)?
2. I noticed that the metadata doesn’t include explicit cell type annotations. Are cell types meant to be inferred from the cluster numbers? If so, could you please advise on how to map clusters to specific cell types?

Thank you in advance for your help.

jeremyinseattle · July 28, 2025, 5:37pm

Hi @HG_58828,

Welcome to the community and to RNA-seq analysis! The project you reference is a recent collaboration and we’re excited to be sharing data and results at an early stage of analysis. With that in mind, we performed only a basic log2-normalization, as you said, with no additional scaling:

values = log2(cell_counts / sum(cell_counts) * 10^6 + 1)
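For anyone who wants to reproduce this on raw counts, here is a minimal sketch of log2(CPM + 1) in NumPy. It assumes a dense cells × genes count matrix; the function name `log2_cpm` and the toy matrix are made up for illustration and are not taken from the actual processing pipeline.

```python
import numpy as np


def log2_cpm(counts):
    """Return log2(CPM + 1) for a cells x genes matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    # Total counts per cell (library size), kept as a column for broadcasting.
    library_size = counts.sum(axis=1, keepdims=True)
    # Guard against empty cells so we never divide by zero.
    safe_size = np.where(library_size == 0, 1.0, library_size)
    cpm = counts / safe_size * 1e6
    return np.log2(cpm + 1.0)


# Toy example: 2 cells x 3 genes (illustrative values only).
toy_counts = np.array([[1, 3, 0],
                       [0, 0, 10]])
normalised = log2_cpm(toy_counts)
print(normalised)
```

Zero counts stay at zero after the transform, and each cell is scaled by its own total, so cells with different sequencing depths become comparable.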

For your second question, you are correct that the clusters have not yet been formally annotated. For inferring cell types, I would recommend using Annotation Comparison Explorer, which directly allows comparison of cell type (and other) annotations for individual clusters. First, go to the website and find the correct dataset (as below). You can also filter the data if you’d like (e.g., to focus only on a single data set or to omit clusters that likely contain lower quality cells), but this probably isn’t necessary in your case.

Second, you can directly visualize the annotations for individual clusters by configuring the bottom panel as follows:

I’m showing cluster_005 as an example, but if you change what is shown in (1), you can see different clusters. You can also view additional/different metadata columns by editing what is in the filter in (2) (e.g., to see higher/lower resolution cell types). In general, for clusters found mostly in neocortex, I’d trust the “SEAAD” annotations, and for clusters found elsewhere in the brain, I’d trust the “WHB” annotations.

ACE has lots of resources for getting started on the left panel, or feel free to reach out with additional questions about it (or if this didn’t address your questions about the ASAP-PMDBS data sets).

Best,
Jeremy

danielsf · July 28, 2025, 6:06pm

Alternatively, if you are accessing the data through the abc_atlas_access tool, the Python API will allow you to access dataframes that link each cell to the cell type with which it was annotated (using the MapMyCells tool).

The code you want to run is something like:

>>> from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache
>>> data_dir = "/path/to/dir/where/you/are/downloading/data/"
>>> cache = AbcProjectCache.from_cache_dir(data_dir)
>>> annotation_files = cache.list_metadata_files(
...     directory="ASAP-PMDBS-taxonomy"
... )
>>> annotation_files
['mmc_results_seaad', 'mmc_results_siletti_whb']

>>> # get dataframe of SEA-AD annotations
>>> seaad_df = cache.get_metadata_dataframe(
...     directory="ASAP-PMDBS-taxonomy",
...     file_name="mmc_results_seaad"
... )

>>> # get dataframe of Whole Human Brain annotations
>>> whb_df = cache.get_metadata_dataframe(
...     directory="ASAP-PMDBS-taxonomy",
...     file_name="mmc_results_siletti_whb"
... )

There are Jupyter notebooks discussing this process and how to interpret the data you get.
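Once you have a per-cell annotation dataframe, one way to map clusters to cell types is to tabulate, for each cluster, what fraction of its cells received each label. The sketch below uses small made-up dataframes; the column names `cell_label`, `cluster` and `subclass_name` are assumptions for illustration, so check `.columns` on the real dataframes and substitute the actual names.

```python
import pandas as pd

# Hypothetical per-cell metadata carrying the cluster assignment.
cell_metadata = pd.DataFrame({
    "cell_label": ["c1", "c2", "c3", "c4", "c5"],
    "cluster": ["cluster_005", "cluster_005", "cluster_005",
                "cluster_012", "cluster_012"],
})

# Hypothetical per-cell annotations (stand-in for seaad_df or whb_df).
annotations = pd.DataFrame({
    "cell_label": ["c1", "c2", "c3", "c4", "c5"],
    "subclass_name": ["Astrocyte", "Astrocyte", "Microglia",
                      "Oligodendrocyte", "Oligodendrocyte"],
})

# Attach the annotation to each cell, keeping only cells present in both tables.
merged = cell_metadata.merge(annotations, on="cell_label", how="inner")

# Rows are clusters, columns are cell types, values are within-cluster fractions.
fractions = pd.crosstab(
    merged["cluster"], merged["subclass_name"], normalize="index"
)

# Most frequent label per cluster, with the fraction of cells supporting it.
summary = pd.DataFrame({
    "top_label": fractions.idxmax(axis=1),
    "top_fraction": fractions.max(axis=1),
})
print(summary)
```

A cluster whose top fraction is low is probably mixed or lower quality, so it is worth looking at the full row of `fractions` rather than only the top label.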
