SEA-AD Cell types transcriptomic comparative viewer

abhinav · February 10, 2023, 5:07pm

Hello,
I would like to access the underlying data used to make the dotplot on the Allen Brain Map.

It’s not practical to explore all possible genes through the web. Is there a way to download this data?

kyle.travaglini · February 23, 2023, 5:16pm

Hello,

The raw data used to generate the dotplots can be downloaded from AWS here: AWS S3 Explorer. After reading in the AnnData object with scanpy (Scanpy 1.9.2 documentation), we used sc.get.obs_df() (scanpy.get.obs_df in the Scanpy 1.9.2 documentation) to pull each gene of interest (normalized expression values are stored in .X), its subclass or supertype, and the metadata variable of interest (e.g. Cognitive Status) into a single data frame. We then grouped by the combination of metadata and subclass/supertype using pandas and calculated, within each group, the fraction of non-zero values and the mean expression. Hope this helps!

abhinav · February 23, 2023, 5:44pm

Thanks so much Kyle.

This is very helpful.

Would it be possible to share a sample python script using scanpy? It will be deeply appreciated.

kyle.travaglini · February 23, 2023, 6:23pm

Sure, something like this should work for a single gene/metadata combination. You’ll have to iterate with for loops over all genes/metadata you’re interested in, so some parallelization may be beneficial (I’d recommend something like joblib.Parallel — joblib 1.3.0.dev0 documentation)

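import scanpy as sc
import pandas as pd
adata = sc.read_h5ad(...)
i = "Cognitive Status"  # metadata variable (a column in adata.obs)
j = "APOE"              # gene of interest (an entry in adata.var_names)
splitby = "subclass"    # grouping column in adata.obs; check the exact name in your file
# One row per cell: metadata value, normalized expression of the gene, and subclass
df = sc.get.obs_df(adata, [i, j, splitby])
# Fraction of cells in each group with non-zero expression
df["fraction_expressed"] = df[j] > 0
fraction = df.loc[:, ["fraction_expressed", i, splitby]].groupby([i, splitby]).mean()
# Mean expression among the non-zero cells only; groups with no such cells become 0
expression = df.loc[df[j] != 0, [j, i, splitby]].groupby([i, splitby]).mean().fillna(0)
df = pd.concat([fraction, expression], axis=1).reset_index()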
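For looping over several genes, a minimal sketch along the same lines is below. It is an illustration rather than the exact code behind the viewer: the helper name dotplot_stats, the mean_expression column name, and the toy data frame (standing in for the output of sc.get.obs_df) are all made up for this example.

```python
import pandas as pd

def dotplot_stats(df, gene, meta, splitby):
    """Per (meta, splitby) group: fraction of expressing cells and mean among them."""
    tmp = df[[meta, splitby, gene]].copy()
    tmp["fraction_expressed"] = tmp[gene] > 0
    fraction = tmp.groupby([meta, splitby], observed=True)["fraction_expressed"].mean()
    # Mean over cells with non-zero expression only, as in the script above
    expression = (
        tmp.loc[tmp[gene] != 0]
        .groupby([meta, splitby], observed=True)[gene]
        .mean()
        .rename("mean_expression")
    )
    # Groups without any expressing cell get a mean of 0
    out = fraction.to_frame().join(expression).fillna(0).reset_index()
    out.insert(0, "gene", gene)
    return out

# Toy stand-in (hypothetical values) for sc.get.obs_df(adata, [meta, *genes, splitby])
toy = pd.DataFrame({
    "Cognitive Status": ["Dementia", "Dementia", "No dementia", "No dementia"],
    "Subclass": ["Astrocyte", "Astrocyte", "Astrocyte", "Astrocyte"],
    "APOE": [0.0, 2.0, 1.0, 3.0],
    "GFAP": [0.0, 0.0, 4.0, 0.0],
})

genes = ["APOE", "GFAP"]
result = pd.concat(
    [dotplot_stats(toy, g, "Cognitive Status", "Subclass") for g in genes],
    ignore_index=True,
)
print(result)
```

With many genes, the list comprehension is the part that could be handed to joblib.Parallel, one task per gene.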

abhinav · February 24, 2023, 9:15pm

Hi Kyle,

Thanks for the sample script.

I tried using it and ran into an error.

My code:

file = "SEAAD_MTG_RNAseq_final-nuclei.2022-08-18.h5ad"
adata = sc.read_h5ad(file)
cog_status = "Cognitive Status"
gene = "APOE"
splitby = "subclass"
df = sc.get.obs_df(adata, [cog_status, gene, splitby])

Error:

KeyError: "Could not find keys '['subclass']' in columns of adata.obs or in adata.var_names."

kyle.travaglini · February 24, 2023, 9:32pm

I think we saved subclass as "subclass_label". You can check with: adata.obs.columns[adata.obs.columns.str.contains("subclass")]

abhinav · February 25, 2023, 1:23am

Hi Kyle,

The label was “Subclass” (upper case S).

I am able to run the script.

One last question: what is the unit for gene expression? The website mentions ln(UP10K+1). Could you please explain what this means and how this may relate to TPM or any other regular metric?

kyle.travaglini · February 28, 2023, 10:46pm

The expression values are the natural log of [(the number of unique molecular identifiers (UMIs) for each gene in a given cell, divided by the total number of UMIs in the same cell, multiplied by 10,000) plus 1]. In other words, UP10K is UMIs per 10,000 UMIs, and the stored value is ln(UP10K + 1). Transcripts per million and the related counts per million apply to RNAseq experiments without UMIs. 10x Genomics has more: https://kb.10xgenomics.com/hc/en-us/articles/115003684783-Should-I-calculate-TPM-RPKM-or-FPKM-instead-of-counts-for-10x-Genomics-data-
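As a worked example of that arithmetic, here is a small sketch with numpy. The count matrix is made up for illustration and is not taken from the SEA-AD data.

```python
import numpy as np

# Hypothetical raw UMI counts: 2 cells (rows) x 3 genes (columns)
counts = np.array([[5.0, 0.0, 995.0],
                   [20.0, 30.0, 1950.0]])

# Total UMIs per cell (1000 and 2000 here)
total_umis = counts.sum(axis=1, keepdims=True)

# UMIs per 10,000: each cell's counts rescaled so that they sum to 10,000
up10k = counts / total_umis * 1e4

# ln(UP10K + 1); log1p keeps zeros at exactly zero
log_expr = np.log1p(up10k)

# Going back from a stored value to UP10K
recovered = np.expm1(log_expr)

print(log_expr.round(3))
```

For the first gene in the first cell this gives 5 / 1000 * 10,000 = 50 UP10K, so the stored value is ln(51).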

abhinav · February 28, 2023, 11:44pm

Thanks Kyle.

abhinav · March 2, 2023, 1:58am

Hi Kyle,

One more question:

In the cognitive status in the dot plot, there are three categories: Reference, No dementia, and Dementia.

I was unable to find a clear explanation of this on the website.

What are the differences between these?

kyle.travaglini · May 8, 2023, 6:25pm

Hi again,

Apologies for missing this! Reference is applied to the young neurotypical reference donors described in https://www.biorxiv.org/content/10.1101/2022.09.19.508480v1.full. No dementia/Dementia is applied to the aged SEA-AD cohort and indicates whether each donor had a clinical diagnosis of dementia at the time of death.

Best,
Kyle

ejgkelvin · May 25, 2023, 6:12pm

Hi Kyle,

This is Chen from Dana-Farber Cancer Institute.
I have a question regarding understanding the SEA-AD dot plot.
The color of the dots represents the expression level. My question is whether that is computed from 'all cells in a certain cluster' or 'all POSITIVE cells in a certain cluster', where 'POSITIVE' means the cells expressing a certain gene.

My understanding is that it is from POSITIVE cells, because I see this in many cases. Take CCND1: in the Reference sample, only 10% of cells express CCND1 (judging by the size of the circle), but the level (color) is very high (about 1.5 on the color scale). If the expression level were from all cells, then the CCND1 level in POSITIVE cells would have to be extremely high for the signal from 10% of cells to remain that strong after being diluted by the 90% of negative cells.

Can you please let me know if I understand correctly?

Many thanks!

Chen
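Not an official answer, but the sample script posted earlier in this thread averages only over cells with non-zero expression (it filters on df[j] != 0 before taking the mean), which would match the POSITIVE-cells reading. Whether the web viewer does exactly the same is an assumption. The sketch below, using made-up values, shows how far apart the two means are when only 10% of cells express a gene.

```python
import numpy as np

# Hypothetical ln(UP10K+1) values for 10 cells in one group; only 1 cell is non-zero
expr = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.5])

fraction_expressed = (expr > 0).mean()   # dot size: share of expressing cells
mean_positive = expr[expr > 0].mean()    # mean over expressing cells only
mean_all = expr.mean()                   # mean over every cell in the group

print(fraction_expressed, mean_positive, mean_all)
```

The two are linked by mean_all = fraction_expressed * mean_positive, so a color value of 1.5 with 10% of cells expressing would correspond to an all-cell mean of only 0.15.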
