Using the IPFX module to calculate the ephys features

Hello!

I’m trying to calculate the ephys features of some cells from the Allen Cell Types Database, as is done in the paper by Gouwens, Sorensen, Berg et al. (2019): “Classification of electrophysiological and morphological neuron types in the mouse visual cortex”.

To do so, I’m first using the IPFX module to extract the feature vectors, but I’m encountering some problems. Here are my steps:

1. I took the cell ids from Supplementary Dataset 3 of the paper and split them into an inhibitory csv and an excitatory csv (one id per line). Since some ids in this dataset are not present in the Allen Cell Types Database, my inhibitory csv has 972 cells (instead of 1010) and my excitatory csv has 885 cells (instead of 923).

2. Then I tried to run the run_feature_vector_extraction.py script, changing only two defaults in the CollectFeatureVectorParameters class: output_dir (set to the destination of my future output file) and input (set to my inhibitory or excitatory csv file). I show the edit below, after the traceback.

3. While the script was running, it downloaded the ephys.nwb and ephys_sweeps.json files for each cell id. But then I got this error:

Traceback (most recent call last):
  File "E:/MARGAUX/…/ipfx_test_2.py", line 339, in <module>
    if __name__ == "__main__": main()
  File "E:/MARGAUX/…/ipfx_test_2.py", line 334, in main
    run_feature_vector_extraction(ids=ids, **module.args)
  File "E:/MARGAUX/…/ipfx_test_2.py", line 311, in run_feature_vector_extraction
    used_ids, results, error_set = su.filter_results(specimen_ids, results)
TypeError: cannot unpack non-iterable NoneType object
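
For reference, here is roughly what my edit looks like (the field names are the ones from the script; the paths are just mine):

import argschema as ags

class CollectFeatureVectorParameters(ags.ArgSchema):
    # Only the two defaults I changed; every other field is left
    # as in the original run_feature_vector_extraction.py script.
    output_dir = ags.fields.OutputDir(
        default="E:/MARGAUX/output")              # destination of the output files
    input = ags.fields.InputFile(
        default="E:/MARGAUX/inhibitory_ids.csv",  # one cell id per line
        allow_none=True)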

Can someone explain what I’m doing wrong?

Thanks !
Margaux.

I think you hit on a bug that’s producing the TypeError instead of a more informative message (I just created an issue to fix that). But the underlying problem is that all of the cells failed to process correctly.

My guess as to why they are all failing is that you are analyzing NWB version 1 files from the Allen Cell Types Database instead of NWB version 2 files (which are what is currently being produced). The current version of IPFX supports only NWB version 2.
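
If you want to double-check which version a given file is, you can inspect it directly with h5py: NWB 1 files store the version string as a top-level "nwb_version" dataset, while NWB 2 files keep it as a root attribute of the same name. A minimal check ("ephys.nwb" is a placeholder for one of your downloaded files):

import h5py

# Print the NWB version string of a file; "ephys.nwb" is a placeholder path.
with h5py.File("ephys.nwb", "r") as f:
    if "nwb_version" in f:                 # NWB 1.x: top-level dataset
        print(f["nwb_version"][()])
    else:                                  # NWB 2.x: root attribute
        print(f.attrs.get("nwb_version"))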

However, you can install a version of IPFX that supports NWB version 1 files like this:

$ git clone --branch=nwb1-support https://github.com/AllenInstitute/ipfx.git
$ cd ipfx
$ pip install -e .

That should get you an IPFX that will handle the older NWB files.
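
As a quick sanity check that the editable install is the one Python picks up, you can print the import path (it should point into the cloned directory):

$ python -c "import ipfx; print(ipfx.__file__)"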

When I tried to run the "$ pip install -e ." line, I got this error:

Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [3 lines of output]
    C:\Users\marga\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\setuptools\installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
      warnings.warn(
    error in ipfx setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Parse error at "'+https:/'": Expected stringEnd
[end of output]

I think it comes from the "git+https://github.com/neurodatawithoutborders/pynwb@dev" line in the requirements.txt file, but I don’t know what to change to avoid this error.

Thanks,
Margaux.

I was able to reproduce your issue on my machine. From trying to get it to work, I think that more recent Python installations have some incompatibilities with this older version of the ipfx code.
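
Part of the problem is the setuptools that ships with newer Python installations: it only accepts PEP 508 requirement specifiers in install_requires, so the bare VCS URL that setup.py reads out of requirements.txt no longer parses (that is the Parse error at "'+https:/'" you saw). The modern spelling of that requirement would be something like:

pynwb @ git+https://github.com/neurodatawithoutborders/pynwb@dev

though I’m not certain that fixing that one line is enough by itself, which is why I went the patching route described below.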

To get around that, I was able to get it working through a combination of using an older Python (3.7) and patching the IPFX code.

If you’re using Anaconda, you can set up and start using an environment with Python 3.7 (named ipfxnwb1 in this example) by:

$ conda create -n ipfxnwb1 python=3.7
$ conda activate ipfxnwb1

Next, you need to make a few changes to the IPFX code. I made a patch file that has those changes (which I can send to you - I can’t upload it in this forum, unfortunately). Once you have that file, you can apply it by navigating into the ipfx code directory and using the command:

$ git apply nwb1_install.patch

Once you do that, you can proceed with:

$ pip install -e .

That will install a bunch of older versions of IPFX’s dependencies, as well as ipfx itself.

I tested running the feature vector extraction script after that, and it worked on my machine. So I’m hopeful it will work for you, as well.

Using the nwb1_install.patch, the pip install -e . line worked. Then I ran the feature extraction script, and it downloaded the ephys.nwb and ephys_sweeps.json files for 805 of the 972 cells I wanted before getting this error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\script_utils.py", line 72, in dataset_for_specimen_id
    nwb_file=nwb_path, sweep_info=sweep_info, ontology=ontology)
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\aibs_data_set.py", line 16, in __init__
    self._nwb_data = nwb_reader.create_nwb_reader(nwb_file)
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\nwb_reader.py", line 685, in create_nwb_reader
    nwb_version = get_nwb_version(nwb_file)
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\nwb_reader.py", line 624, in get_nwb_version
    with h5py.File(nwb_file, 'r') as f:
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\site-packages\h5py\_hl\files.py", line 408, in __init__
    swmr=swmr)
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\site-packages\h5py\_hl\files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (truncated file: eof = 34531238, sblock->base_addr = 0, stored_eof = 83224070)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "E:\MARGAUX\new_ipfx_2\ipfx\extractor_2.py", line 82, in data_for_specimen_id
    data_set = su.dataset_for_specimen_id(specimen_id, data_source, ontology)
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\script_utils.py", line 76, in dataset_for_specimen_id
    return {"error": {"type": "dataset", "details": traceback.format_exc(limit=None)}}
NameError: name 'traceback' is not defined
"""

The first time I ran the script, I managed to download around 700 cells before getting this error. The second time, I downloaded 100 more cells, but now every time I run it, no new cells are downloaded and I just get this error.

Thanks,
Margaux.

I’m glad the installation worked. I think there might be two issues going on here. The first is more straightforward: the script isn’t handling the error correctly. When it encounters an error, it should log it and move on to the next cell, but there is a missing import statement in the file script_utils.py. If you edit the file to add

import traceback

after the rest of the import statements but before the function definitions, I think it will log the error and move to the next cell.
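
In other words, the top of ipfx/script_utils.py just needs one extra line (the surrounding imports stay whatever they already are):

import traceback  # needed by the error handlers that call traceback.format_exc()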

I’m not sure why you’re getting the error in the first place, but I suspect a partially downloaded NWB file is causing the trouble. The API tries not to re-download files you already have, so the reason you hit the error on every run is probably that it keeps trying to use the same incomplete file. The analysis also runs in parallel, which would explain why the second run downloaded more files before failing: other worker processes finished their cells and fetched new files before this error was encountered.

I’m hoping that with the traceback import fixed, the script will just log the problem file, skip it, and continue with the rest of the cells. At that point, you can check the error log JSON file it should produce to see which file is the problem (and then delete that NWB file and re-download it).
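
Alternatively, you can hunt for the bad file directly by trying to open every downloaded NWB file with h5py; a truncated file will raise the same OSError you saw. A minimal sketch (the "downloads" directory is a placeholder; point it at wherever the script saved the files):

import os
import h5py

# Try to open each .nwb file under the download directory and report
# any that h5py cannot read (e.g., truncated partial downloads).
for root, _, files in os.walk("downloads"):
    for name in files:
        if name.endswith(".nwb"):
            path = os.path.join(root, name)
            try:
                with h5py.File(path, "r"):
                    pass                     # opened cleanly; file looks intact
            except OSError as err:
                print("problem file:", path, "->", err)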

It took me a few hours, but I finally managed to download all the cells I wanted. Unfortunately, when creating the HDF5 file I got this error:

Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
Traceback (most recent call last):
  File "extractor_2.py", line 263, in <module>
    if __name__ == "__main__": main()
  File "extractor_2.py", line 258, in main
    run_feature_vector_extraction(ids=ids, **module.args)
  File "extractor_2.py", line 241, in run_feature_vector_extraction
    su.save_results_to_h5(used_ids, results_dict, output_dir, output_code)
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\script_utils.py", line 305, in save_results_to_h5
    compression="gzip")
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\site-packages\h5py\_hl\group.py", line 136, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\site-packages\h5py\_hl\dataset.py", line 118, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5py\h5t.pyx", line 1634, in h5py.h5t.py_create
  File "h5py\h5t.pyx", line 1656, in h5py.h5t.py_create
  File "h5py\h5t.pyx", line 1711, in h5py.h5t.py_create
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Does that mean I shouldn’t put "h5" in output_file_type but rather some other type?

You should be able to output in the H5 format (that’s what I used when testing on just a couple of cells, and it worked for them). So there’s probably something different about your output: the h5py library used to be more flexible about saving that kind of ragged data, but as the error message says, that’s no longer supported.

In any case, probably the fastest way to figure out what’s different is to save to the numpy format instead (use the option npy for output_file_type) and look at those results to see which one has entries with inconsistent lengths. (My guess is that it’s some kind of rounding issue, and that there’s an extra point in some cases but not others.)
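
Once you have the .npy files, something like this will point at the offender (the "fv_*.npy" glob pattern is a guess at the output naming; adjust it to match your output files):

import glob
import numpy as np

# Report any saved feature file that numpy had to store as an object
# array, i.e., one whose rows have inconsistent shapes.
for path in sorted(glob.glob("fv_*.npy")):
    arr = np.load(path, allow_pickle=True)
    if arr.dtype == object:
        shapes = sorted({np.asarray(row).shape for row in arr})
        print(path, "has inconsistent entry shapes:", shapes)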

Hello, sorry for the delay!

I changed the output format to npy and was able to access the files, as well as the fv_errors_test.json, which listed errors for some cell ids (I removed these cells for the rest of the procedure). And indeed, one id had a shorter length in some files, so I removed this id too, and it finally worked, thanks! Now I’ll try to do the sparse principal component analysis with the drcme package.

But how can I avoid this length issue in the future? Should I run the script with npy output each time, check for ids whose entries don’t all have the same lengths, remove them (which is not optimal), and then run the script again with h5 as the output format? That can be time consuming when there are a lot of cells to process.

Best,

Margaux