Hi again,

Regarding the `first_ap_vectors` extraction, according to the code,

```python
import numpy as np


def first_ap_vectors(sweeps_list, spike_info_list,
                     target_sampling_rate=50000, window_length=0.003,
                     skip_clipped=False):
    """Average waveforms of first APs from sweeps

    Parameters
    ----------
    sweeps_list: list
        List of Sweep objects
    spike_info_list: list
        List of spike info DataFrames
    target_sampling_rate: float (optional, default 50000)
        Desired sampling rate of output (Hz)
    window_length: float (optional, default 0.003)
        Length of AP waveform (seconds)

    Returns
    -------
    ap_v: array of shape (target_sampling_rate * window_length)
        Waveform of average AP
    ap_dv: array of shape (target_sampling_rate * window_length - 1)
        Waveform of first derivative of ap_v
    """
    if skip_clipped:
        # Keep only the sweeps whose first spike is not clipped
        nonclipped_sweeps_list = []
        nonclipped_spike_info_list = []
        for swp, si in zip(sweeps_list, spike_info_list):
            if not si["clipped"].values[0]:
                nonclipped_sweeps_list.append(swp)
                nonclipped_spike_info_list.append(si)
        sweeps_list = nonclipped_sweeps_list
        spike_info_list = nonclipped_spike_info_list

    if len(sweeps_list) == 0:
        # No usable sweep: return an all-zero waveform and its derivative
        length_in_points = int(target_sampling_rate * window_length)
        zero_v = np.zeros(length_in_points)
        return zero_v, np.diff(zero_v)

    # ... (rest of the function, which averages the first-AP waveforms, omitted)
```

if there is no sweep with at least one spike (for example, no short-square sweep), the vectors are replaced by arrays of zeros (and, with `skip_clipped=True`, the same happens when every first spike is clipped). This can be seen in this plot, for a cell that had no short-square or ramp protocols.
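To check that I read this early exit correctly, here is a minimal sketch that reproduces only that branch. `first_ap_fallback` is a hypothetical helper of mine, and a plain list of booleans stands in for the `si["clipped"].values[0]` lookups on the spike info DataFrames. With the defaults, the zero waveform has 50000 × 0.003 = 150 points and its derivative 149.

```python
import numpy as np


def first_ap_fallback(first_spike_clipped, target_sampling_rate=50000,
                      window_length=0.003, skip_clipped=False):
    """Hypothetical helper reproducing only the early-exit branch of
    first_ap_vectors.

    first_spike_clipped: one bool per spiking sweep, standing in for
        si["clipped"].values[0] on each spike info DataFrame.
    Returns (zero_v, zero_dv) when no sweep is left, otherwise None.
    """
    # With skip_clipped=True, drop the sweeps whose first spike is clipped
    kept = [c for c in first_spike_clipped if not (skip_clipped and c)]
    if len(kept) == 0:
        length_in_points = int(target_sampling_rate * window_length)  # 150 with the defaults
        zero_v = np.zeros(length_in_points)
        return zero_v, np.diff(zero_v)  # derivative is one point shorter
    return None  # the real function would go on to average the waveforms


v, dv = first_ap_fallback([])  # no spiking sweep at all
print(v.shape, dv.shape)  # (150,) (149,)
```

Calling `first_ap_fallback([True, True], skip_clipped=True)` returns the same zero arrays, because both sweeps are dropped before the length check.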

Similarly, I have a question about the binned features (peak, threshold, etc.). I saw in your documentation that when a bin has no data, its value is filled in from the neighboring bins that contain data:

- If the first data-containing bin is not bin 0, the preceding bins take the value of the first data-containing bin (and likewise for the last bins).
- If there are empty bins between two data-containing bins, their values are linearly interpolated from those two bins.

This can be seen in this plot of sweep 29 of cell 517330781: the first spike occurs in the 5th bin (triangle), so the first four bins, which contain no spike (circles), take the same value as the 5th. Similarly, the empty bins (circles) between two data-containing bins (triangles) are linearly interpolated.
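If I understand the two rules correctly, they behave like `np.interp`, which interpolates linearly between known points and, by default, holds the first and last known values beyond them. Here is a sketch under that assumption; `fill_empty_bins` is a hypothetical helper of mine (empty bins marked as NaN, made-up values), not your implementation:

```python
import numpy as np


def fill_empty_bins(bin_values):
    """Hypothetical helper: fill the empty (NaN) bins of one binned feature."""
    y = np.asarray(bin_values, dtype=float)
    x = np.arange(len(y))
    has_data = ~np.isnan(y)
    if not has_data.any():
        return y  # assumption: with no data at all, leave every bin as NaN
    # np.interp interpolates linearly between data-containing bins and, by
    # default, copies the first/last known value into leading/trailing bins
    return np.interp(x, x[has_data], y[has_data])


nan = float("nan")
# Made-up values: first spike in the 5th bin (index 4), bin index 5 empty.
# The four leading bins become -40.0 and bin index 5 becomes -42.0.
print(fill_empty_bins([nan, nan, nan, nan, -40.0, nan, -44.0, -45.0]))
```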

I also understood that if a sweep is missing, it is filled in the same way, bin by bin, by interpolating between the two neighboring sweeps that contain data.
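As I understand it, the sweep-wise filling would then be the same operation along the sweep axis, one bin at a time. In this sketch, `fill_missing_sweeps` is hypothetical, a missing sweep is a row of NaN, and I assume the rows are ordered by stimulus amplitude (made-up amplitudes below) and already have their empty bins filled:

```python
import numpy as np


def fill_missing_sweeps(features, amplitudes):
    """Hypothetical helper.

    features: 2-D array (n_sweeps, n_bins); a missing sweep is a row of NaN,
        and present rows are assumed to have no empty bins left.
    amplitudes: stimulus amplitude of each row, in increasing order.
    """
    features = np.asarray(features, dtype=float)
    amps = np.asarray(amplitudes, dtype=float)
    present = ~np.isnan(features).all(axis=1)
    filled = features.copy()
    if not present.any():
        return filled  # nothing to interpolate from
    # Interpolate each bin independently across the sweeps that have data
    for b in range(features.shape[1]):
        filled[:, b] = np.interp(amps, amps[present], features[present, b])
    return filled


nan = float("nan")
# Made-up example: the 40 pA sweep is missing, so its row becomes [2.0, 4.0]
print(fill_missing_sweeps([[1.0, 2.0], [nan, nan], [3.0, 6.0]], [20, 40, 60]))
```

A missing first or last sweep would, as with the bins, simply copy its nearest neighbor.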

My question is: what is the motivation for filling in values (with arrays of zeros, or by linear interpolation from neighboring bins or sweeps) rather than leaving the array, bin, or sweep empty (with NaN values, as I think you do when a cell has no subthreshold trace)?
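To make the question concrete, here is a toy comparison with made-up numbers: averaging a feature vector across two cells, one of which has no qualifying sweep. A zero placeholder pulls the average toward 0, whereas a NaN placeholder combined with `np.nanmean` simply leaves that cell out:

```python
import numpy as np

# Made-up first-AP feature values (3 points) for two cells; cell B has no
# qualifying sweep, so it only gets a placeholder vector
cell_a = np.array([-40.0, 10.0, 30.0])
cell_b_zeros = np.zeros(3)        # zero placeholder, as in first_ap_vectors
cell_b_nans = np.full(3, np.nan)  # NaN placeholder, the alternative

mean_with_zeros = np.mean([cell_a, cell_b_zeros], axis=0)
mean_with_nans = np.nanmean([cell_a, cell_b_nans], axis=0)

print(mean_with_zeros)  # pulled halfway toward 0
print(mean_with_nans)   # equal to cell A, since cell B is ignored
```

On the other hand, I suppose finite values are easier to pass to tools that do not accept NaN, so I would be curious whether that was part of the reasoning.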

I realize this is a lot of questions, so thank you very much for all your help!

Best,

Julien