Open for (neuro)science symposium: Day 1, Allen Cell Types Database

Thanks for attending the webinar or viewing the video of the Open for (neuro)science symposium day 1: Allen Cell Types Database, featuring @bosiljkatasic, @nik.jorstad, @rhngla, @RebeccaH, @stripathy1, and Fenna Krienen! The speakers discussed their research using the publicly available Allen Cell Types Database.

Please feel free to chime in here with additional questions or follow-up questions to those asked at the live symposium. You can find the recording of this symposium and accompanying tutorials here.

Additional questions to those answered during the symposium Q&A:

Question: Does the noise have a function in the cell, or is it just a byproduct?
Bosiljka: This is a deep question. I don’t know. You may imagine that some genes need to be really tightly controlled, so their noise is low, but some may be very noisy, for example to increase neuronal diversity! So, I think during evolution, some genes have been differentially ‘noised’ to suit their function.

Question: Are the subclass/cell type labels of the Allen Institute Smart-seq datasets informed by the 10x data, and vice versa (I think it was mentioned they were integrated)?
Bosiljka: In the latest datasets, we assign these names based on integrated 10x and Smart-seq data.

Question: How do you account for extremes of cell states that lie outside training data?
Bosiljka: Not sure I fully understand the question. We define types/states (I would not want to say what they are, but we call them types for short) based on the data. And if anything stands out, we describe it as standing out.

Question: Is the dendrogram clustering completely unsupervised? Or are some genes preselected? If so, which?
Bosiljka: Dendrogram clustering is unsupervised, but genes are iteratively selected at different levels of clustering in our hierarchical + iterative clustering approach (see GitHub). Genes are selected to be above technical noise and are redefined at different levels of the hierarchy during clustering.
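The per-level gene reselection Bosiljka describes can be illustrated with a toy sketch. This is not the Allen Institute pipeline: the variance-based gene filter, the principal-component split, and all parameter values below are illustrative stand-ins for the real above-technical-noise criterion and clustering step.

```python
import numpy as np

def select_genes(X, n_genes=10):
    """Keep the genes with the highest variance in this subset of cells
    (a stand-in for a real above-technical-noise criterion)."""
    return np.argsort(X.var(axis=0))[::-1][:n_genes]

def split_cells(X):
    """Split cells in two along the leading principal component
    (a stand-in for a real clustering step)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[0]
    return proj > np.median(proj)

def iterative_cluster(X, min_cells=20, depth=0, max_depth=3):
    """Recursively split cells, reselecting informative genes at
    every level of the hierarchy before each split."""
    if len(X) < min_cells or depth >= max_depth:
        return [np.arange(len(X))]  # leaf: local indices of one cluster
    right = split_cells(X[:, select_genes(X)])
    clusters = []
    for mask in (~right, right):
        idx = np.where(mask)[0]
        for sub in iterative_cluster(X[idx], min_cells, depth + 1, max_depth):
            clusters.append(idx[sub])  # map local indices back to parent
    return clusters

# Toy data: two well-separated populations, 100 cells x 50 genes each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 50)), rng.normal(5, 1, (100, 50))])
clusters = iterative_cluster(X)
```

The key point is that `select_genes` is called anew on each subset, so the genes driving a split deep in the hierarchy can differ from those driving the top-level split.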

Question: Are there plans to sample natural mouse populations in addition to common lab lines?
Bosiljka: Not now. But would be really interesting.

Question: Do you create new gene annotations based on all the transcriptomic datasets?
Bosiljka: We should! We just didn’t have time to do that, but the single cell data is full of information that can help annotate the genome!

Question: Did you analyze language-related cortical regions in comparison to monkey?
Nik: For this dataset we only sampled MTG. One interesting note is that half our chimpanzee donors were language trained and the other half were not. So we are looking for transcriptomic signatures in MTG that may be associated with language training!

Question: Are you significantly better going in one direction (e.g., t-to-e, transcriptomic to electrophysiological) than in the other?
Rohan: In terms of just the overall reconstruction errors, we do slightly better on predicting transcriptomic data. However, in my opinion, this depends crucially on what one considers as better. Knowing the appropriate noise models and gene (or feature) sets that reflect cellular identity are a big part of this puzzle. We provide a feature-wise comparison of cross-modal reconstructions in our paper to help the user understand whether the reconstructions would be useful for the features of interest.
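Rohan's point about a feature-wise comparison can be illustrated with a minimal sketch. A least-squares linear map stands in here for the actual cross-modal model, and the data are synthetic; the point is the per-feature R² report, which lets a user judge whether the reconstructions are useful for their features of interest.

```python
import numpy as np

def per_feature_r2(y_true, y_pred):
    """R^2 computed separately for each output feature (column)."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

# Synthetic paired data: e-features are a noisy linear image of t-features.
rng = np.random.default_rng(1)
T = rng.normal(size=(300, 20))                 # transcriptomic features
W = rng.normal(size=(20, 5))
E = T @ W + 0.1 * rng.normal(size=(300, 5))    # electrophysiological features

# Least-squares linear predictors in each direction (stand-ins for the model).
W_te, *_ = np.linalg.lstsq(T, E, rcond=None)   # t-to-e
W_et, *_ = np.linalg.lstsq(E, T, rcond=None)   # e-to-t

r2_te = per_feature_r2(E, T @ W_te)  # how well each e-feature is predicted
r2_et = per_feature_r2(T, E @ W_et)  # how well each t-feature is predicted
```

An overall reconstruction error would collapse `r2_te` and `r2_et` to two scalars; keeping them per-feature shows, for instance, that a lower-dimensional modality can be predicted well even when the reverse direction cannot recover every feature.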

Question: Do you expect the trained models to generalize across cortical and subcortical regions?
Rohan: The current model was only trained on visual cortex GABAergic neurons. The models we have at present would work for cells from other regions to the extent that cellular identity in these regions is similar. We’ve done some tests (also included in our paper) with data from motor cortex inhibitory neurons that suggest it might work well there, but more data from other regions will help us understand this better.

Question: Do you think that these modality alignment techniques could be applied to methods such as Neuropixels or calcium imaging where you have fewer dimensions to align to?
Rohan: An advantage of our optimization approach is that it can include additional modalities, while allowing flexibility for modality-specific complexity. I imagine models like LFADS (Pandarinath et al. 2018) could be adapted to help us integrate in vivo single neuron activity data as well, although one would have to be careful to extract task specific components from cell type specific contributions to variation in the data.

Question: We know sensory experience can change morphology of neurons and also function. What should one expect for transcriptomic changes?
Rohan: This is an exciting frontier. While datasets we’ve used so far constitute a snapshot of transcriptomic and electrophysiological signatures in adult mouse brains, it would be very interesting to look at datasets that focus on development and sensory experience in the future!

Question: How robust is the algorithm to hyperparameters and stochasticity? Also, is it possible for the algorithm to align modalities when they differ in what they individually define as a cell type?
Rohan: Thanks! We’ve done tests (included in the paper) to show robustness of results to latent space dimensionality and to interpretable hyperparameters that are directly involved in the loss function. The second part of your question is at the heart of the argument about what a “cell type” is. There is variation in the data that is influenced by (and possibly independent of) cellular “identity”. For example, some features may be a function of the cell “state” and its recent history rather than a reflection of stereotypical properties related to its role in the functioning brain circuit. In the absence of other information, such heterogeneity can be interpreted as a modality-specific cell type. Our intuition is that including different modalities helps us regularize the representations, and consequently come up with robust definitions of what useful cell types might be.

Question: It seems a major limitation to working with human tissue - for example in the case of tissue surgically resected from an individual with an intractable condition like epilepsy - is that we can’t always choose which type of material we are getting. Based upon our understanding of cellular heterogeneity across cortical regions in the human, can existing datasets in one part of cortex still be informative for analysis in another part of cortex? Or are the regions so heterogeneous as to make it difficult to compare?
Rebecca: I think that analyses in one region of cortex can still be informative for another. Interneurons are generally shared across different cortical regions, for example. And while excitatory neurons do differ across regions, there are still a lot of shared gene expression signatures between those types, so cell types can be matched up at a subclass level.

Question: What do you think the big-picture purpose of cell types is? From a computational perspective, you only need 2 types, E and I, and there’s a lot of evidence that, once wired into a recurrent network, cells should exhibit shared noise and variance.
Shreejoy: Different cell types do different things, just like members of a sports team. I think with recent methods we’ve gotten really great at articulating what the types are but are really only beginning to start to understand the function of the different cell types in health and disease.