SEA-AD Cell types transcriptomic comparative viewer

Hello,
I would like to access the underlying data used to make the dotplot in Allen Brain Map.

It’s not practical to explore all possible genes through the web. Is there a way to download this data?

Hello,

The raw data used to generate the dotplots can be downloaded from AWS here: AWS S3 Explorer. After reading in the AnnData object with scanpy, we used sc.get.obs_df() (see the Scanpy 1.9.2 documentation for scanpy.get.obs_df) to pull each gene of interest (normalized expression values are stored in .X), its subclass or supertype, and the metadata variable of interest (e.g. Cognitive Status) into a single data frame. We then grouped by the combination of metadata and subclass/supertype using pandas and calculated, within each group, the fraction of non-zero values and the mean of those non-zero values. Hope this helps!

Thanks so much Kyle.

This is very helpful.

Would it be possible to share a sample python script using scanpy? It will be deeply appreciated.

Sure, something like this should work for a single gene/metadata combination. You’ll have to iterate with for loops over all the genes and metadata variables you’re interested in, so some parallelization may be beneficial (I’d recommend something like joblib.Parallel).

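import scanpy as sc
import pandas as pd

adata = sc.read_h5ad(...)
i = "Cognitive Status"  # metadata variable of interest (a column in adata.obs)
j = "APOE"  # gene of interest (an entry in adata.var_names)
splitby = "subclass"  # cell type column in adata.obs
# One row per cell: metadata value, normalized expression from .X, cell type
df = sc.get.obs_df(adata, [i, j, splitby])
# Fraction of cells in each group with non-zero expression
df["fraction_expressed"] = df[j] > 0
fraction = df.loc[:, ["fraction_expressed", i, splitby]].groupby([i, splitby]).mean()
# Mean expression among expressing cells; groups with no expressing cells become 0
expression = df.loc[df[j] != 0, [j, i, splitby]].groupby([i, splitby]).mean().fillna(0)
df = pd.concat([fraction, expression], axis=1).reset_index()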
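
For the iteration over several genes mentioned above, here is a minimal sketch. It assumes the per-cell data frame has already been pulled (for example with sc.get.obs_df); a small made-up table stands in for it so the example runs without the h5ad file. The helper name dotplot_stats, the toy values, and the second gene are illustrative assumptions, not part of the original script.

```python
import pandas as pd

def dotplot_stats(df, gene, meta, splitby):
    """Hypothetical helper: per-group fraction of expressing cells and mean non-zero expression."""
    work = df[[meta, splitby, gene]].copy()
    work["fraction_expressed"] = work[gene] > 0
    # Fraction of cells with non-zero expression in each (metadata, cell type) group
    fraction = work.groupby([meta, splitby])["fraction_expressed"].mean()
    # Mean over expressing cells only
    mean_expr = work.loc[work[gene] != 0].groupby([meta, splitby])[gene].mean()
    out = fraction.to_frame().join(mean_expr.rename("mean_expression"))
    # Groups with no expressing cells have no mean; report 0 as in the script above
    out = out.fillna(0).reset_index()
    out.insert(0, "gene", gene)
    return out

# Made-up stand-in for the output of sc.get.obs_df (values are not real data)
toy = pd.DataFrame({
    "Cognitive Status": ["Dementia", "Dementia", "No dementia", "No dementia"],
    "Subclass": ["Astrocyte", "Astrocyte", "Astrocyte", "Astrocyte"],
    "APOE": [0.0, 2.0, 1.0, 3.0],
    "GFAP": [0.0, 0.0, 4.0, 0.0],
})

genes = ["APOE", "GFAP"]
results = pd.concat(
    [dotplot_stats(toy, g, "Cognitive Status", "Subclass") for g in genes],
    ignore_index=True,
)
print(results)
```

Each call is independent of the others, so the list comprehension is the part that could be handed to joblib.Parallel if the gene list is long.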

Hi Kyle,

Thanks for the sample script.

I tried using it and ran into an error.

My code:

file = "SEAAD_MTG_RNAseq_final-nuclei.2022-08-18.h5ad"
adata = sc.read_h5ad(file)
cog_status = "Cognitive Status"
gene = "APOE"
splitby = "subclass"
df = sc.get.obs_df(adata, [cog_status, gene, splitby])

Error:

KeyError: "Could not find keys '['subclass']' in columns of adata.obs or in adata.var_names."

I think we saved subclass as “subclass_label”. You can check with: adata.obs.columns[adata.obs.columns.str.contains("subclass")]
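
Because the check above is case-sensitive, it can miss a column that differs only in capitalization. A minimal sketch of a case-insensitive variant follows; the empty data frame and its column names are a hypothetical stand-in for adata.obs, so the real file may contain different columns.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs; the column names are assumptions
obs = pd.DataFrame(columns=["Cognitive Status", "Subclass", "Supertype"])

# case=False makes the substring match ignore capitalization
matches = obs.columns[obs.columns.str.contains("subclass", case=False)]
print(list(matches))
```

With the real object, replacing obs with adata.obs should list every column whose name contains "subclass" in any capitalization.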

Hi Kyle,

The label was “Subclass” (upper case S).

I am able to run the script.

One last question: what is the unit for gene expression? The website mentions ln(UP10K+1). Could you please explain what this means and how this may relate to TPM or any other regular metric?

The expression values are ln(UP10K + 1), where UP10K means UMIs per 10,000: take the number of unique molecular identifiers (UMIs) for a gene in a given cell, divide it by the total number of UMIs in that cell, multiply by 10,000, add 1, and take the natural log. Transcripts per million and the related counts per million apply to RNAseq experiments without UMIs. 10x Genomics has more: https://kb.10xgenomics.com/hc/en-us/articles/115003684783-Should-I-calculate-TPM-RPKM-or-FPKM-instead-of-counts-for-10x-Genomics-data-
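
As a worked illustration of that definition, the sketch below computes ln(UP10K + 1) from a small matrix of made-up UMI counts (cells in rows, genes in columns) and then inverts the transform. The counts are invented for the example. The last step only rescales from a per-10,000 to a per-million basis, which is plain arithmetic and does not make the values equivalent to TPM from protocols without UMIs.

```python
import numpy as np

# Made-up UMI counts: 2 cells (rows) x 3 genes (columns)
counts = np.array([[0.0, 5.0, 95.0],
                   [10.0, 0.0, 190.0]])

# Total UMIs per cell, kept as a column so it broadcasts across genes
totals = counts.sum(axis=1, keepdims=True)

# UMIs per 10,000: each cell's values now sum to 10,000
up10k = counts / totals * 1e4

# ln(UP10K + 1); log1p computes ln(1 + x)
log_up10k = np.log1p(up10k)

# Inverting the transform recovers UP10K; multiplying by 100 rescales
# from a per-10,000 basis to a per-million basis
per_million = np.expm1(log_up10k) * 100

print(log_up10k)
```

A gene with zero UMIs stays at exactly 0 after the transform, which is why the fraction of non-zero values can be read directly from the stored matrix.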

Thanks Kyle.

Hi Kyle,

One more question:

In the cognitive status in the dot plot, there are three categories: Reference, No dementia, and Dementia.

I was unable to find a clear explanation of this on the website.

What are the differences between these?