Thank you for the Allen Brain cell atlas data access tool! Is there a way to download a subset of the gene expression raw data? For example, looking at a specific cell type of interest such as Astrocytes. I saw that you have the capability to download a subset of genes, but I’d like to download the whole gene expression raw data for a given cell type. Thanks for your time/help.
Hi,
Good question. Unfortunately, the data files aren’t structured in a way that lets you download only a given cell type. The human brain data are stored as two files (neurons vs non-neurons), while the mouse brain data are stored by assay and dissection. You’ll have to download the full dataset to access what you are looking for.
That said, you should be able to access the cells once you have the files downloaded. Subsetting by gene is a bit more involved, which is why there are examples for it, but subsetting by cell is more straightforward. If you follow the WHB part 2 tutorial to create the cell_extended table, you can build a table of the cells in the Astrocyte supercluster by selecting only the rows whose supercluster value is Astrocyte. You can then use the cell_id/indexes of the cell table you’ve just created to index into the anndata file. There’s an example of loading the anndata and slicing the file in the first few cells of this notebook. You’ll just need to change the files you are loading and change the slicing in the 10th cell of that notebook (the one that creates gene_subset) from selecting columns, [:, gene_index], to selecting rows, [cell_index, :].
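As a rough sketch of the two steps above (filter the cell table, then select rows instead of columns), here is a small example on synthetic data. The cell labels, gene names, and the exact column name "supercluster" are made up for illustration, and a pandas DataFrame stands in for the expression matrix so the snippet runs without any downloads. Check the names in your own cell_extended table before reusing it.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the cell_extended table from the WHB part 2 tutorial.
# The column name "supercluster" and the cell labels are assumptions.
cell_extended = pd.DataFrame(
    {"supercluster": ["Astrocyte", "Microglia", "Astrocyte", "Oligodendrocyte"]},
    index=pd.Index(["cell_a", "cell_b", "cell_c", "cell_d"], name="cell_label"),
)

# Synthetic stand-in for the expression matrix: rows are cells, columns are genes.
expression = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=cell_extended.index,
    columns=["gene_x", "gene_y", "gene_z"],
)

# Step 1: keep only the rows of the cell table for the cell type of interest.
astro_cells = cell_extended[cell_extended["supercluster"] == "Astrocyte"]

# Step 2: use those cell labels to select rows (cells) and keep all columns (genes).
cell_index = astro_cells.index
astro_expression = expression.loc[cell_index, :]

print(astro_expression.shape)

# With the real files, the same labels would index the AnnData object instead.
# Not executed here; the file path is a placeholder:
#   import anndata
#   adata = anndata.read_h5ad("path/to/downloaded_file.h5ad", backed="r")
#   mask = adata.obs_names.isin(cell_index)
#   astro = adata[mask, :]
```

Since the human data are split across two files, you would repeat the row selection on whichever file contains your cell type, keeping only the labels that are actually present in that file.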
The operations I described above are pretty standard in pandas, NumPy, and anndata. Check out the tutorials and documentation for those libraries if you are not already familiar with them.
Good luck!
Yep, that is exactly what I am doing now to get a cell-type specific gene expression matrix. Just wanted to double check there wasn’t functionality to do this already. Thanks so much for your help/time!