Questions on Timescale of Cell Types EPHYS data

Hello, my name is Tyler Brassel with the QUON-titative biology lab. Our team is interested in applying deep learning approaches to analyze the Allen Cell Types Database and we have a few questions regarding the EPHYS data. When we began examining long square datasets, we noticed two different sampling rates were used, 50,000 Hz and 200,000 Hz, most likely corresponding to an improvement in experimental apparatus. In order to use a neural network to better understand the features of this data, we need to downsample the data such that all the points correspond to a common timescale. We initially believed that we could simply find the common factor between the two sampling rates and downsample accordingly.
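The common-factor idea can be sketched as below. This is only an illustration and not part of the AllenSDK or IPFX: `decimation_factor` and `block_average` are hypothetical helper names, and block averaging is a crude stand-in for a proper anti-aliasing filter before decimation.

```python
from math import gcd


def decimation_factor(rate_hz, target_hz):
    """Integer factor that takes rate_hz down to target_hz."""
    if rate_hz % target_hz != 0:
        raise ValueError("target rate must divide the source rate evenly")
    return rate_hz // target_hz


def block_average(samples, factor):
    """Downsample by averaging consecutive blocks of `factor` samples.

    Trailing samples that do not fill a whole block are dropped.
    """
    n_blocks = len(samples) // factor
    return [
        sum(samples[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


rates = [50_000, 200_000]
common_rate = gcd(*rates)  # highest rate that divides both sampling rates
factors = {rate: decimation_factor(rate, common_rate) for rate in rates}
```

With these two rates the common rate is 50,000 Hz, so the 200,000 Hz sweeps are reduced by a factor of 4 and the 50,000 Hz sweeps are left as they are.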

Upon closer examination, we can see that the datasets do not share a consistent number of data points, even when the different sampling rates are taken into account. For example, with a long square stimulus of -110 pA, 323 sweeps have 401,000 data points, 6 sweeps have 401,050 data points, 8 sweeps have 1,604,001 data points, and 1 sweep has 1,604,000 data points. Similar differences are seen across the dataset. It would make sense that samples with 4 times the sampling rate would also have about 4 times the number of data points, but this does not explain the additional 1 or 50 data points.
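Converting the counts above to durations makes the size of the discrepancy concrete. The sketch assumes the roughly 401,000-point sweeps were sampled at 50,000 Hz and the roughly 1,604,000-point sweeps at 200,000 Hz; that pairing is inferred from the factor of 4, not stated in the data.

```python
# (sample count, assumed sampling rate in Hz) for the -110 pA sweeps
sweeps = [
    (401_000, 50_000),
    (401_050, 50_000),
    (1_604_000, 200_000),
    (1_604_001, 200_000),
]


def duration_s(n_samples, rate_hz):
    """Sweep duration in seconds."""
    return n_samples / rate_hz


durations = [duration_s(n, rate) for n, rate in sweeps]
nominal = durations[0]
# Extra recording time relative to the most common sweep, in milliseconds
extra_ms = [(d - nominal) * 1000 for d in durations]

for (n, rate), d, extra in zip(sweeps, durations, extra_ms):
    print(f"{n:>9} samples at {rate:>6} Hz: {d:.6f} s (+{extra:.3f} ms)")
```

Under that assumption, both 401,000 and 1,604,000 points correspond to 8.02 s, the 50 extra points add 1 ms, and the single extra point adds 5 microseconds.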

A look at the documentation suggests that some sweeps include a test pulse. We wonder if the discrepancy in the data ranges is meant to convey the presence of a test pulse. In addition, a team member found that one of the sweeps had an index_range of (150000, 1604000), meaning the index_range does not start at 0. Even after accounting for this range, we still got several different output lengths corresponding to the counts I mentioned earlier. Our questions are as follows:

  1. Why do some sweeps have 1 or even 50 more data points than others?
  2. What are the characteristics of a test pulse and how can we best detect a test pulse at the beginning of a sweep?
  3. Does the index_range attribute include or exclude the test pulse?
  4. Why does the index_range start at a number other than 0?
  5. Given this information, what do you feel is the best way to extrapolate a common timescale for this data? We have unsuccessfully tried using functions like allensdk.ephys.extract_cell_features.get_square_stim_characteristics(), which relies on knowing beforehand if a test pulse has occurred.

Thank you for your time. Any help answering these questions is appreciated.

Hi Tyler,

Thanks for your interest.
The earlier experiments used a higher sampling rate, which was later reduced once it became apparent that a 50 kHz rate was sufficient for this type of data.

The sweeps of a particular type (e.g. Long Squares) should have the same duration of the stimulus pulse, but the stimulus may start at different times on different sweeps. To deal with this, IPFX detects and keeps track of the different epochs within the sweep. After creating a sweep object, you can get the epochs as sweep.epochs, which returns a dict with epoch names as keys and (start, end) tuples of indices as values. You can see an example of using epochs in the recently updated tutorial page: https://ipfx.readthedocs.io/en/latest/tutorial.html
Sweeps may have different durations depending on the duration of the test epoch and when the recording was cut off. However, the “experiment” epoch (that is, the “stimulus” epoch padded on both sides with 0.5 s) should have the same length for the Long Squares.
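A minimal sketch of how such an epochs dict can be used to cut every sweep down to the same window. It does not call IPFX: a plain dict with made-up indices stands in for sweep.epochs, `select_epoch` and `time_axis` are hypothetical helpers, and treating the end index as inclusive is an assumption to check against your IPFX version.

```python
def select_epoch(samples, epochs, name):
    """Return the samples inside the named epoch.

    Assumes `epochs` maps names to (start, end) index tuples and that
    `end` is inclusive.
    """
    start, end = epochs[name]
    return samples[start:end + 1]


def time_axis(n_samples, rate_hz):
    """Time in seconds relative to the first sample of the window."""
    return [i / rate_hz for i in range(n_samples)]


# Toy sweep: 20 samples at a made-up 10 Hz rate, with made-up epoch indices
voltage = [float(i) for i in range(20)]
epochs = {"test": (1, 3), "experiment": (5, 15)}

window = select_epoch(voltage, epochs, "experiment")
t = time_axis(len(window), rate_hz=10)
```

Because the time axis restarts at zero for each window, sweeps whose stimulus begins at different absolute times end up on the same relative timescale.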

With this here are answers to your specific questions:

  1. Entire sweeps do not need to have the same number of data points. Only the stimulus epoch and the experiment epoch should have the same number of data points, assuming the same sampling rate. A difference of 1 in the stimulus epoch may still occur, though, as a result of uncertainty in the acquisition.

  2. The test pulse is a low-amplitude square pulse. It is already detected by IPFX when creating a sweep. To get the test pulse for a given sweep object, use sweep.epochs["test"].

  3. We dropped the index_range attribute in favor of the more informative and comprehensive epochs dict. The version of IPFX you were dealing with was an earlier pre-release version. We are closing in on a v1.0.0, which you will be able to get from PyPI. Or just get the latest master branch from GitHub.

  4. I believe the index_range excluded the test pulse, hence it started from a nonzero value.

  5. Accessing the epochs attribute is the way to go. If you also want to get the amplitude, then you should use ipfx.stimulus_features.get_stim_characteristics. We are essentially deprecating the use of ephys analysis from the AllenSDK in favor of IPFX, so I encourage you to use functionality from IPFX only.
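One way to absorb the off-by-one differences mentioned in answer 1 is to trim windows that already share a sampling rate to the shortest one before stacking them for a network. This is a sketch under that assumption; `trim_to_common_length` is a hypothetical helper, not an IPFX function, and the default tolerance of 1 sample is a choice made for illustration.

```python
def trim_to_common_length(windows, tolerance=1):
    """Trim equally sampled windows to the length of the shortest one.

    Raises ValueError if lengths differ by more than `tolerance` samples,
    since a larger gap suggests the windows were not cut from the same
    epoch or do not share a sampling rate.
    """
    lengths = [len(w) for w in windows]
    shortest, longest = min(lengths), max(lengths)
    if longest - shortest > tolerance:
        raise ValueError(
            f"window lengths differ by {longest - shortest} samples, "
            f"more than the allowed {tolerance}"
        )
    return [w[:shortest] for w in windows]


# Two toy windows whose lengths differ by a single sample
aligned = trim_to_common_length([[0.1, 0.2, 0.3], [0.1, 0.2, 0.3, 0.4]])
```

Failing loudly on larger mismatches keeps a misaligned sweep from being silently truncated.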

The new version of IPFX supports data in the nwb2 format (previously IPFX supported nwb1). The data in the nwb2 format is publicly available on the DANDI archive, but we have yet to provide documentation on how to access it.
Please let me know if you have further questions.

Hi @tbrassel,

Great to hear the data is useful for your work - and it looks like there’s a response to your questions here on GitHub as well.