A question regarding data processing in the article "Consistent cross-modal identification of cortical neurons with coupled autoencoders."

The NWB files in the DANDI archive have passed electrophysiology quality control checks. These Patch-seq data were originally published in an earlier 2020 Cell paper, so that paper is the best place to look for more information about data processing. There is also more information about the data in that DANDI set on this Allen Institute web page.

There are more cells in the data set than were used in the “Consistent cross-modal identification” paper, for two main reasons. First, the data set on DANDI also contains a set of excitatory mouse visual cortex neurons that were compared with human excitatory cortical neurons in another publication. Second, the “Consistent cross-modal identification” paper only uses cells that were judged to have consistent transcriptomic mapping by the criteria described in the 2020 Cell paper. In both cases, the published files on DANDI pass the electrophysiology-related QC criteria, so you could use them for IPFX operations as-is.

The sPCA analysis performed by the IPFX package is described in more detail in a 2019 Nature Neuroscience paper as well as in the 2020 Cell paper. Sparse principal component analysis is applied to two kinds of input: voltage traces (the action potential waveform, responses to hyperpolarizing current steps, and interspike voltage trajectories) and binned, interpolated time series of action potential features (AP height, width, threshold, etc.) across multiple depolarizing current steps. The components from the different types of data sets are then collated for additional analysis (clustering, UMAPs, etc.).
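
To make the "binned and interpolated time series" idea more concrete, here is a minimal sketch in plain Python. It is not the IPFX code: the function names (`bin_and_interpolate`, `collate_steps`), the 20 ms bin width, and the example values are all assumptions chosen for illustration. It averages a per-spike feature within fixed-width time bins, fills empty bins by linear interpolation, and then joins the vectors from several current steps into one vector per cell.

```python
def bin_and_interpolate(spike_times, values, start, end, width):
    """Average a per-spike feature in fixed-width time bins, then fill
    empty bins by linear interpolation between the nearest filled bins.
    (Illustrative helper, not part of IPFX.)"""
    n_bins = int(round((end - start) / width))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in zip(spike_times, values):
        if start <= t < end:
            idx = min(int((t - start) / width), n_bins - 1)
            sums[idx] += v
            counts[idx] += 1

    binned = [s / c if c else None for s, c in zip(sums, counts)]
    filled = [i for i, b in enumerate(binned) if b is not None]
    if not filled:
        raise ValueError("no spikes fall inside the requested window")

    out = []
    for i, b in enumerate(binned):
        if b is not None:
            out.append(b)
            continue
        left = max((j for j in filled if j < i), default=None)
        right = min((j for j in filled if j > i), default=None)
        if left is None:
            # Before the first filled bin: carry the first value backward.
            out.append(binned[right])
        elif right is None:
            # After the last filled bin: carry the last value forward.
            out.append(binned[left])
        else:
            frac = (i - left) / (right - left)
            out.append(binned[left] + frac * (binned[right] - binned[left]))
    return out


def collate_steps(per_step_vectors):
    """Concatenate the binned vectors from several current steps into
    one feature vector for a cell. (Illustrative helper.)"""
    return [x for vec in per_step_vectors for x in vec]


# Hypothetical feature values for three spikes at 10, 14 and 90 ms.
example = bin_and_interpolate([10, 14, 90], [1.0, 3.0, 6.0],
                              start=0, end=100, width=20)
print(example)  # [2.0, 3.0, 4.0, 5.0, 6.0]
```

Stacking one such vector per cell gives a cells-by-features matrix, which could then be passed to a sparse PCA implementation (for example `SparsePCA` in scikit-learn). Whether that matches the exact settings used in the papers is something to check against their methods sections.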