I read the article “Consistent cross-modal identification of cortical neurons with coupled autoencoders.” I am unclear about how the 24 electrophysiological features mentioned in the article are obtained with the ipfx package. I would like to know how to derive these features from the NWB data file using ipfx, and whether they correspond to variables in the output.json file. Thank you for your help!
Thank you for your interest in our work.
Features are calculated using the feature calculation submodules under IPFX: https://ipfx.readthedocs.io/en/latest/ipfx.html
Example feature calculations from NWB data are provided here: https://ipfx.readthedocs.io/en/latest/auto_examples/index.html
Feature names used in the paper are identical to the outputs of the corresponding calculation submodules.
Thank you very much for your response. I ran the “All analysis” part of the Stimulus-specific Analysis from the Gallery of Examples in the ipfx 1.0.8 documentation. Based on the GitHub script data_proc_E.ipynb from the article, I believe the following 24 features should be included:
ap_1_threshold_v_short_square
ap_1_peak_v_short_square
ap_1_upstroke_short_square
ap_1_downstroke_short_square
ap_1_upstroke_downstroke_ratio_short_square
ap_1_width_short_square
ap_1_fast_trough_v_short_square
short_square_current
input_resistance
tau
v_baseline
sag_nearest_minus_100
sag_measured_at
rheobase_i
ap_1_threshold_v_0_long_square
ap_1_peak_v_0_long_square
ap_1_upstroke_0_long_square
ap_1_downstroke_0_long_square
ap_1_upstroke_downstroke_ratio_0_long_square
ap_1_width_0_long_square
ap_1_fast_trough_v_0_long_square
avg_rate_0_long_square
latency_0_long_square
stimulus_amplitude_0_long_square
When comparing these features with my obtained results, I encountered the following issues:
- For some features, such as ap_1_threshold_v_short_square, I found a variable called threshold_v_short_square in the output’s "cell_record" section, but it doesn’t include ap_1. Are these two equivalent?
- For features like input_resistance, I found many sweeps in the cell_features section that contain this value. Which sweep should I choose?
- For certain values such as sag_nearest_minus_100, sag_measured_at, ap_1_upstroke_short_square, and ap_1_width_short_square, I couldn’t find exact matches in the results.
The data_proc_E.ipynb script can be found in the notebooks folder on the cplAE-TE branch of the AllenInstitute/coupledAE-patchseq repository on GitHub. I would be very grateful if you could tell me how to find these features.
Hi,
Glad you were able to get that to run. The scientist who wrote the paper and who is most familiar with the work is on vacation until early November. He is the best person to answer this detailed question. When he returns, I will bring this to his attention and we will get back to you.
Thank you,
Susan
Given the naming convention of the features, I think the features mentioned in the data_proc_E.ipynb notebook probably came from running the run_feature_collection script in IPFX. Its output includes a similar set of features to the “all analysis” example notebook (many of them directly equivalent) but also has a number of additional ones.
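As a rough way to see which of the notebook's feature names appear in a given output, the key names can be compared programmatically. The sketch below is not part of IPFX: the helper match_feature_names and the example dictionary are hypothetical, and a match found after stripping a prefix such as ap_1_ is only a candidate to inspect by hand, not a confirmation that the two features are equivalent.

```python
def match_feature_names(wanted, available, prefixes=("ap_1_",)):
    """Sort wanted names into exact matches, prefix-stripped candidates, and missing."""
    available = set(available)
    exact, candidates, missing = [], {}, []
    for name in wanted:
        if name in available:
            exact.append(name)
            continue
        # Try the name with a known prefix removed, e.g.
        # "ap_1_threshold_v_short_square" -> "threshold_v_short_square".
        # This only suggests a key to check manually.
        stripped = next(
            (name[len(p):] for p in prefixes
             if name.startswith(p) and name[len(p):] in available),
            None,
        )
        if stripped is not None:
            candidates[name] = stripped
        else:
            missing.append(name)
    return exact, candidates, missing


# Hypothetical example: three names from the notebook list and a made-up output record.
wanted = ["ap_1_threshold_v_short_square", "input_resistance", "sag_nearest_minus_100"]
example_output = {"threshold_v_short_square": -40.0, "input_resistance": 150.0}  # made-up values

exact, candidates, missing = match_feature_names(wanted, example_output)
print("exact:", exact)
print("candidates:", candidates)
print("missing:", missing)
```

Names that end up in the missing list are the ones to look for in the run_feature_collection output instead.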
Hi Susan,
Thank you for your prompt response and for letting me know. I’m happy to wait until the scientist returns. I appreciate your help and will look forward to hearing from you.
Best regards,
beilouer
Thank you for the clarification! I’ll take a closer look at the run_feature_collection script and compare the features with those in the data_proc_E.ipynb notebook.
I’m sorry to bother you again. I have other related questions, and I wonder if you can answer them.
- Have the NWB data files stored in the DANDI Archive, as used in the article, already undergone quality control, standardization, and other preprocessing steps? I noticed that the article mentions 3,411 cells, but the DANDI Archive website shows 4,435 files. Does this mean that some data have already been filtered out? Can I directly use these data for sPCA analysis, electrophysiological feature extraction, and other related operations with the ipfx package? If not, what preprocessing steps do I still need to perform?
- What is the principle behind the SPCA analysis in the ipfx package? Is it performing Sparse Principal Component Analysis (sPCA) on the raw voltage and current time-series data? I’m not entirely clear on this part.
I would be very grateful if you could provide answers. Thank you!
The NWB files in the DANDI archive have passed electrophysiology quality control checks. Those Patch-seq data were originally published in an earlier 2020 Cell paper, so you may want to look at that paper for more information about data processing. There is also more information about the data in that DANDI set on this Allen Institute web site page.
There are more cells in the data set than were used in the “Consistent cross-modal identification” paper for two main reasons. First, the data set on DANDI contains a set of excitatory mouse visual cortex neurons that were used to compare to human excitatory cortex neurons in another publication. Second, the “Consistent cross-modal identification” paper only uses cells that were judged to have consistent transcriptomic mapping by the criteria described in the 2020 Cell paper. But again, the published files on DANDI pass the electrophysiology-related QC criteria, so you could use them for IPFX operations as-is.
The sPCA analysis performed by the IPFX package is described in more detail in a 2019 Nature Neuroscience paper as well as the 2020 Cell paper. Sparse principal component analysis is performed on a combination of voltage traces (of the action potential waveform, responses to hyperpolarizing current steps, and interspike voltage trajectories) as well as on binned and interpolated time series of action potential features (AP height, width, threshold, etc.) across multiple depolarizing current steps. The components from the different types of data sets are collated together for additional analysis (clustering, UMAPs, etc.).
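To make the “binned and interpolated time series of action potential features” idea concrete, here is a minimal pure-Python sketch. It is not the IPFX implementation: the function name binned_feature, the 0.2 s bin width, the edge handling, and all the numbers are assumptions chosen for illustration. For each current step, a per-spike feature (for example AP threshold) is averaged in fixed time bins, empty bins are filled by linear interpolation, and the results for several steps are concatenated into one vector per cell.

```python
def binned_feature(spike_times, values, start, end, bin_width):
    """Average a per-spike feature in fixed time bins; fill empty bins by interpolation."""
    n_bins = int(round((end - start) / bin_width))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in zip(spike_times, values):
        i = min(int((t - start) / bin_width), n_bins - 1)
        sums[i] += v
        counts[i] += 1
    means = [s / c if c else None for s, c in zip(sums, counts)]

    filled = [i for i, m in enumerate(means) if m is not None]
    if not filled:
        return [0.0] * n_bins  # assumption: a step with no spikes becomes all zeros
    out = []
    for i, m in enumerate(means):
        if m is not None:
            out.append(m)
            continue
        left = max((j for j in filled if j < i), default=None)
        right = min((j for j in filled if j > i), default=None)
        if left is None:
            out.append(means[right])   # before the first filled bin: copy nearest value
        elif right is None:
            out.append(means[left])    # after the last filled bin: copy nearest value
        else:
            frac = (i - left) / (right - left)
            out.append(means[left] + frac * (means[right] - means[left]))
    return out


# Made-up AP thresholds (mV) for two depolarizing steps of a single cell.
step_a = binned_feature([0.05, 0.15, 0.55], [-40.0, -38.0, -34.0], 0.0, 1.0, 0.2)
step_b = binned_feature([0.1, 0.3, 0.5, 0.7, 0.9],
                        [-41.0, -40.0, -39.0, -38.0, -37.0], 0.0, 1.0, 0.2)

threshold_vector = step_a + step_b  # one concatenated vector for this cell
print(threshold_vector)
```

Stacking such vectors for all cells gives the matrix on which sparse PCA would then be run, separately for each type of data set.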
Thank you very much for your response; it has truly helped me a lot. I apologize as this is my first time working with electrophysiological data, so I’m not very familiar with it yet. I still have a few more questions:
- I observed that sPCA is obtained through feature vectors. I still don’t quite understand what this feature vector represents. Could you help explain the principle behind extracting these feature vectors? For instance, taking the feature vector for ‘first_ap_dv’ as an example.
- The article mentioned that, in addition to data processing and quality control, normalization was also performed. I would like to ask whether the NWB files in the DANDI archive have also undergone this normalization. If so, could you briefly explain the principle behind this normalization?
- The data from the article ‘Phenotypic variation of transcriptomic cell types in mouse motor cortex’ were also used in ‘Consistent cross-modal identification of cortical neurons with coupled autoencoders’. These data are stored as NWB files in the DANDI archive as well. Have they also undergone quality control and normalization, and can they be used directly?
Thank you again for your time and assistance!
A “feature vector” can be a couple of different types of time series. I would look at Supplementary Figure 4 in the 2019 Nature Neuroscience paper I referenced above for an illustration of the sPCA analysis process. Figure 1a-d also show other examples of different feature vectors. For “first_ap_dv” (referred to as AP dV/dt in that paper), it’s the time series of the derivative of the voltage waveform during action potentials evoked by different stimulus types (e.g., a short current pulse, a 1 sec long current step). Those are concatenated into one “vector” and then sPCA is performed on the vectors of all cells. In other cases, it’s a time series of the binned averages of something like an AP feature (like AP threshold in Fig. 1c). Responses to different amplitude current steps are concatenated together for a given cell.
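As a toy illustration of the first_ap_dv case described above, the sketch below takes the finite-difference derivative of two action potential snippets (one from a short pulse, one from a long step) and concatenates them into a single vector. The voltage values, the 0.5 ms sampling interval, and the helper dv_dt are made up for illustration; how IPFX actually selects, aligns, and windows the snippets is not shown here.

```python
def dv_dt(voltage_mv, dt_ms):
    """Finite-difference derivative of a voltage snippet, in mV/ms."""
    return [(b - a) / dt_ms for a, b in zip(voltage_mv, voltage_mv[1:])]


# Made-up AP snippets (mV); real snippets would be cut from the NWB sweeps.
short_square_ap = [-60.0, -50.0, -10.0, 30.0, 0.0, -55.0]
long_square_ap = [-62.0, -52.0, -15.0, 25.0, -5.0, -58.0]
dt_ms = 0.5  # hypothetical sampling interval

# One cell's vector: the derivatives for each stimulus type, concatenated.
first_ap_dv = dv_dt(short_square_ap, dt_ms) + dv_dt(long_square_ap, dt_ms)
print(first_ap_dv)
```

Each cell contributes one such vector of the same length, and sPCA is then performed on the vectors of all cells together.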
I’m not 100% sure about all the details of the normalization you’re referring to (you may have to wait until the lead author of that study is back, as Susan mentioned). But it would have been done on the features themselves (i.e., the outputs of the IPFX analysis), rather than on the raw electrophysiological data (the NWB files). I think the same applies for the files from the “Phenotypic variation…” paper, so you should be able to just analyze those files, as well.
Hi Gouwens,
I’m sorry to bother you again. I have some additional questions. I have read the article “Integrated Morphoelectric and Transcriptomic Classification of Cortical GABAergic Cells,” and I greatly admire this work. I am currently conducting some related research, and I would like to ask about the methods used to extract the morphological features mentioned in the article. The article states that “The code for electrophysiological and morphological feature analysis and clustering is available as part of the open-source Allen SDK repository, IPFX repository, and DRCME.” The article “Classification of electrophysiological and morphological neuron types in the mouse visual cortex” makes a similar statement. I have the following questions:
- If I want to extract morphological features, which specific package and submodule should I use to calculate them, and can morphological features be extracted directly from raw SWC data files?
- If I use raw SWC data from other articles, can I extract morphological features using the same methods?
- Do you know if there is a database available that contains both transcriptomic and morphological data for the same cells?
I deeply appreciate your help. Thank you very much!
For getting morphological features from SWC files, I would use the packages skeleton_keys and neuron_morphology. The skeleton_keys package was written to perform a morphological analysis pipeline using neuron_morphology for the underlying feature analysis. You can use it to analyze any SWCs, but certain morphological features may require additional information (like drawings of layer boundaries for cortical neurons) to calculate.
There isn’t a single database that has all the data for the same cells together, but there are instructions and links for accessing the different data modalities. You can download cell-by-gene expression matrices as well as the SWC files for the morphologies and use the metadata and file manifests to determine which files belong to which cells.
You can also browse those data in an interactive web viewer.
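The matching of files to cells via a manifest, mentioned above, can be sketched in a few lines. The manifest layout here is an assumption: the column names cell_id, modality, and file_name and all the rows are hypothetical, so adapt them to whatever the actual metadata and manifest files contain.

```python
import csv
import io
from collections import defaultdict

# Hypothetical manifest; in practice this would be read from the downloaded file.
manifest_csv = """cell_id,modality,file_name
1001,morphology,cell_1001.swc
1001,ephys,cell_1001.nwb
1002,ephys,cell_1002.nwb
"""

# Group file names by cell, keyed by modality.
files_by_cell = defaultdict(dict)
for row in csv.DictReader(io.StringIO(manifest_csv)):
    files_by_cell[row["cell_id"]][row["modality"]] = row["file_name"]

# Cells for which both a morphology and an electrophysiology file are listed.
both = sorted(
    cell for cell, files in files_by_cell.items()
    if {"morphology", "ephys"} <= set(files)
)
print(both)
```

The same grouping extends to the expression matrix by checking whether each cell identifier also appears among its columns or rows.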