Using the IPFX module to calculate the ephys features

Hello!

I’m trying to calculate the ephys features of some cells from the Allen Cell Types Database, as done in the paper by Gouwens, Sorensen, Berg et al. (2019): “Classification of electrophysiological and morphological neuron types in the mouse visual cortex”.

To do so, I’m first using the IPFX module to extract the feature vectors, but I’m encountering some problems. Here are my steps:

1/ I took the cell ids from Supplementary Dataset 3 of the paper and split them into an inhibitory csv and an excitatory csv (one id per line). Since some ids in this dataset are not present in the Allen Cell Types Database, my inhibitory csv has 972 cells (instead of 1010) and my excitatory csv has 885 cells (instead of 923).

2/ Then I tried to run the run_feature_vector_extraction.py script, changing only, in the CollectFeatureVectorParameters class, the default value of output_dir to the destination of my future output file and the default value of input to my inhibitory (or excitatory) csv file (a rough sketch of this edit is shown after the error message below).

3/ While the script was running, it downloaded the ephys.nwb and ephys_sweeps.json files for each cell id, but then I got this error:

Traceback (most recent call last):
  File "E:/MARGAUX/…/ipfx_test_2.py", line 339, in <module>
    if __name__ == "__main__": main()
  File "E:/MARGAUX/…/ipfx_test_2.py", line 334, in main
    run_feature_vector_extraction(ids=ids, **module.args)
  File "E:/MARGAUX/…/ipfx_test_2.py", line 311, in run_feature_vector_extraction
    used_ids, results, error_set = su.filter_results(specimen_ids, results)
TypeError: cannot unpack non-iterable NoneType object
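
For reference, here is roughly what the edit in step 2 looked like (the exact argschema field types are from memory and the paths are just examples; only the default= values were changed):

import argschema as ags

class CollectFeatureVectorParameters(ags.ArgSchema):
    output_dir = ags.fields.OutputDir(
        default="E:/MARGAUX/feature_vector_output",   # destination for the output files (example path)
        description="output directory")
    input = ags.fields.InputFile(
        default="E:/MARGAUX/inhibitory_ids.csv",      # one specimen id per line (example path)
        description="input file of specimen ids",
        allow_none=True)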

Can someone explain what I’m doing wrong?

Thanks !
Margaux.

I think you hit on a bug that’s producing the TypeError instead of a more informative message (I just created an issue to fix that). But the underlying problem is that all of the cells failed to process correctly.

My guess as to why they are all failing is because you are analyzing NWB version 1 files from the Allen Cell Types Database instead of NWB version 2 files (which are currently being produced). The current version of IPFX supports only NWB version 2.
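
If you want to confirm which version a downloaded file is, a quick check like this should work (a sketch; the file path is a placeholder):

import h5py

with h5py.File("ephys.nwb", "r") as f:       # path to one of the downloaded files
    if "nwb_version" in f:                   # NWB 1.x stores the version as a top-level dataset
        print(f["nwb_version"][()])
    else:                                    # NWB 2.x stores it as a root attribute
        print(f.attrs.get("nwb_version"))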

However, you can install a version of IPFX that supports NWB version 1 files like this:

$ git clone --branch=nwb1-support https://github.com/AllenInstitute/ipfx.git
$ cd ipfx
$ pip install -e .

That should get you an IPFX that will handle the older NWB files.

When I tried to run the "$ pip install -e ." step, I got this error:

Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [3 lines of output]
    C:\Users\marga\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\setuptools\installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
      warnings.warn(
    error in ipfx setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Parse error at "'+https:/'": Expected stringEnd
    [end of output]

I think it comes from the “git+https://github.com/neurodatawithoutborders/pynwb@dev” line in the requirements.txt file but I don’t know what to change to avoid this error.

Thanks,
Margaux.

I was able to reproduce your issue on my machine. From trying to get it to work, I think that more recent Python installations have some incompatibilities with this older version of the ipfx code.

To get around that, I was able to get it working through a combination of using an older Python (3.7) and patching the IPFX code.

If you’re using Anaconda, you can set up and start using an environment with Python 3.7 (named ipfxnwb1 in this example) by:

$ conda create -n ipfxnwb1 python=3.7
$ conda activate ipfxnwb1

Next, you need to make a few changes to the IPFX code. I made a patch file that has those changes (which I can send to you - I can’t upload it in this forum, unfortunately). Once you have that file, you can apply it by navigating into the ipfx code directory and using the command:

$ git apply nwb1_install.patch

Once you do that, you can proceed with:

$ pip install -e .

That will install a bunch of older versions of IPFX’s dependencies, as well as ipfx itself.

I tested running the feature vector extraction script after that, and it worked on my machine. So I’m hopeful it will work for you, as well.


After applying nwb1_install.patch, the "pip install -e ." line worked. I then ran the feature extraction script, and it downloaded the ephys.nwb and ephys_sweeps.json files for 805 of the 972 cells I wanted before I got this error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\script_utils.py", line 72, in dataset_for_specimen_id
    nwb_file=nwb_path, sweep_info=sweep_info, ontology=ontology)
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\aibs_data_set.py", line 16, in __init__
    self._nwb_data = nwb_reader.create_nwb_reader(nwb_file)
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\nwb_reader.py", line 685, in create_nwb_reader
    nwb_version = get_nwb_version(nwb_file)
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\nwb_reader.py", line 624, in get_nwb_version
    with h5py.File(nwb_file, 'r') as f:
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\site-packages\h5py\_hl\files.py", line 408, in __init__
    swmr=swmr)
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\site-packages\h5py\_hl\files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (truncated file: eof = 34531238, sblock->base_addr = 0, stored_eof = 83224070)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "E:\MARGAUX\new_ipfx_2\ipfx\extractor_2.py", line 82, in data_for_specimen_id
    data_set = su.dataset_for_specimen_id(specimen_id, data_source, ontology)
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\script_utils.py", line 76, in dataset_for_specimen_id
    return {"error": {"type": "dataset", "details": traceback.format_exc(limit=None)}}
NameError: name 'traceback' is not defined
"""

The first time I ran the script, I managed to download around 700 cells before getting this error. Then I ran it a second time and managed to download 100 more cells, but now every time I run it no more cells are downloaded and I just get this error.

Thanks,
Margaux.

I’m glad the installation worked. I think there might be two issues going on here. The first is more straightforward, which is that the error isn’t being handled correctly by the script - when it encounters an error, it should log it and move on to the next cell. However, there is a missing import statement in the file script_utils.py. If you edit the file to add

import traceback

after the rest of the import statements but before the function definitions, I think it will log the error and move to the next cell.
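
To show why that import matters, here is a much-simplified sketch of the pattern the script uses (load_data_set is just a stand-in I made up; the real code builds an AibsDataSet and takes more arguments):

import traceback

def load_data_set(specimen_id):
    # stand-in for the real AibsDataSet creation, which can raise (e.g. on a truncated NWB file)
    raise OSError("Unable to open file")

def dataset_for_specimen_id(specimen_id):
    try:
        return load_data_set(specimen_id)
    except Exception:
        # return an error record and move on instead of crashing the pool worker
        return {"error": {"type": "dataset", "details": traceback.format_exc(limit=None)}}

print(dataset_for_specimen_id(480952106))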

I’m not sure why you’re getting the error in the first place. I think it may be a partially downloaded NWB file that is causing the trouble. The API tries not to download files you already have, so I think the reason you get the error each time you run it is because it tries to use an already downloaded file that is incomplete and hits the error. It runs the analysis in parallel, so that might be why you downloaded more files the second time you ran the script before hitting the error (because other threads completed their work and got more files before this error was encountered).

I’m hoping that with the traceback import issue fixed, the script will just log and skip over the problem file and continue with the rest of the cells. At that point, you could check the error log JSON file that it should produce and see which file is the problem (and then delete the NWB file and re-download it).
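
As a sketch (the cell_types/ layout is the Allen SDK default; adjust the pattern if your files live elsewhere), something like this will also list the downloaded NWB files that h5py cannot open:

import glob
import h5py

for path in glob.glob("cell_types/specimen_*/ephys.nwb"):
    try:
        with h5py.File(path, "r"):
            pass
    except OSError as err:
        # likely a partial download - delete this file and let the script re-fetch it
        print(path, "->", err)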

It took me a few hours, but I finally managed to download all the cells I wanted. Unfortunately, when creating the HDF5 file I got this error:

Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
Traceback (most recent call last):
  File "extractor_2.py", line 263, in <module>
    if __name__ == "__main__": main()
  File "extractor_2.py", line 258, in main
    run_feature_vector_extraction(ids=ids, **module.args)
  File "extractor_2.py", line 241, in run_feature_vector_extraction
    su.save_results_to_h5(used_ids, results_dict, output_dir, output_code)
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\script_utils.py", line 305, in save_results_to_h5
    compression="gzip")
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\site-packages\h5py\_hl\group.py", line 136, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\site-packages\h5py\_hl\dataset.py", line 118, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5py\h5t.pyx", line 1634, in h5py.h5t.py_create
  File "h5py\h5t.pyx", line 1656, in h5py.h5t.py_create
  File "h5py\h5t.pyx", line 1711, in h5py.h5t.py_create
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Does it mean that I shouldn’t put “h5” in output_file_type but rather some other type?

You should be able to output in the H5 format (that’s what I used when testing on just a couple cells, and it worked for them). So there’s probably something different about your output; the h5py library used to be more flexible with saving that kind of thing, but as the error message says, that is no longer supported.

In any case, probably the fastest way to figure that out is to save to the numpy format instead (use the option npy for output_file_type) and look at those results to see which one has entries with inconsistent lengths. (My guess is that it’s probably some kind of rounding issue and that there’s an extra point in some cases but not others.)
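
If it helps, here is a sketch of that check (the file names are guesses based on the fv_ prefix and your output code; adjust them to the files the script actually writes):

import numpy as np

ids = np.load("fv_ids_test.npy", allow_pickle=True)
psth = np.load("fv_psth_test.npy", allow_pickle=True)

lengths = [len(v) for v in psth]
expected = max(set(lengths), key=lengths.count)   # the most common length
for specimen_id, n in zip(ids, lengths):
    if n != expected:
        print(specimen_id, "has length", n, "instead of", expected)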

Hello, sorry for the delay!

I changed the output format to npy and I was able to access the files as well as fv_errors_test.json, which listed errors for some cell ids (I removed these cells for the rest of the procedure). And indeed one id had a shorter length in some files, so I also removed this id and it finally worked, thanks! Now I’ll try to do the sparse principal component analysis with the drcme package.

But how can I avoid this length issue in the future? Should I run the script each time with npy output, check for ids that do not have the same lengths, remove them (which is not optimal), and then run the script again with h5 as the output format? That can be time consuming when there are a lot of cells to process.

Best,

Margaux

Can you share the error you got for the cell that had the shorter length (along with its specimen ID)? I am wondering if it’s a bug that might be fixed in newer versions of IPFX. If that’s the case, then you could likely reprocess those cells that currently have errors and not have to go through the npy output to find them.

It is id 326774520. To know that it was this id, I checked the lengths for each id in each npy file. I didn’t have any particular error pointing me to this id; it was just the only one that didn’t have the same lengths everywhere.

When I launched the extraction with h5 as the output format, I got this error:

Traceback (most recent call last):
  File "extractor_2.py", line 263, in <module>
    if __name__ == "__main__": main()
  File "extractor_2.py", line 258, in main
    run_feature_vector_extraction(ids=ids, **module.args)
  File "extractor_2.py", line 241, in run_feature_vector_extraction
    su.save_results_to_h5(used_ids, results_dict, output_dir, output_code)
  File "E:\MARGAUX\new_ipfx_2\ipfx\ipfx\script_utils.py", line 305, in save_results_to_h5
    compression="gzip")
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\site-packages\h5py\_hl\group.py", line 136, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "C:\Users\marga\anaconda3\envs\ipfxnwb2\lib\site-packages\h5py\_hl\dataset.py", line 118, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5py\h5t.pyx", line 1634, in h5py.h5t.py_create
  File "h5py\h5t.pyx", line 1656, in h5py.h5t.py_create
  File "h5py\h5t.pyx", line 1711, in h5py.h5t.py_create
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

When I launched the script with npy as the output format, I only got this warning, which signaled a length problem:

Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray

The ids that I had previously removed because they appeared in fv_errors_test.json had no length problem. Here are the error messages for some of those ids:

    {
        "id": 480952106,
        "error": {
            "error": {
                "type": "dataset",
                "details": "Traceback (most recent call last):\n  File \"E:\\MARGAUX\\new_ipfx_2\\ipfx\\ipfx\\script_utils.py\", line 72, in dataset_for_specimen_id\n    nwb_file=nwb_path, sweep_info=sweep_info, ontology=ontology)\n  File \"E:\\MARGAUX\\new_ipfx_2\\ipfx\\ipfx\\aibs_data_set.py\", line 16, in __init__\n    self._nwb_data = nwb_reader.create_nwb_reader(nwb_file)\n  File \"E:\\MARGAUX\\new_ipfx_2\\ipfx\\ipfx\\nwb_reader.py\", line 685, in create_nwb_reader\n    nwb_version = get_nwb_version(nwb_file)\n  File \"E:\\MARGAUX\\new_ipfx_2\\ipfx\\ipfx\\nwb_reader.py\", line 624, in get_nwb_version\n    with h5py.File(nwb_file, 'r') as f:\n  File \"C:\\Users\\marga\\anaconda3\\envs\\ipfxnwb2\\lib\\site-packages\\h5py\\_hl\\files.py\", line 408, in __init__\n    swmr=swmr)\n  File \"C:\\Users\\marga\\anaconda3\\envs\\ipfxnwb2\\lib\\site-packages\\h5py\\_hl\\files.py\", line 173, in make_fid\n    fid = h5f.open(name, flags, fapl=fapl)\n  File \"h5py\\_objects.pyx\", line 54, in h5py._objects.with_phil.wrapper\n  File \"h5py\\_objects.pyx\", line 55, in h5py._objects.with_phil.wrapper\n  File \"h5py\\h5f.pyx\", line 88, in h5py.h5f.open\nOSError: Unable to open file (truncated file: eof = 34531238, sblock->base_addr = 0, stored_eof = 83224070)\n"
            }
        }
    },
    {
        "id": 490278904,
        "error": {
            "error": {
                "type": "processing",
                "details": "Traceback (most recent call last):\n  File \"E:\\MARGAUX\\new_ipfx_2\\ipfx\\extractor_2.py\", line 182, in data_for_specimen_id\n    lsq_features, target_amplitudes, shift=10)\n  File \"E:\\MARGAUX\\new_ipfx_2\\ipfx\\ipfx\\feature_vectors.py\", line 612, in identify_suprathreshold_spike_info\n    features, target_amplitudes, shift, amp_tolerance)\n  File \"E:\\MARGAUX\\new_ipfx_2\\ipfx\\ipfx\\feature_vectors.py\", line 695, in _identify_suprathreshold_indices\n    raise er.FeatureError(\"Could not find at least two spiking sweeps matching requested amplitude levels\")\nipfx.error.FeatureError: Could not find at least two spiking sweeps matching requested amplitude levels\n"
            }
        }
    }

Thanks for the additional information.

My hunch that it was related to a bug that’s been fixed in more recent versions of IPFX was correct - it’s related to floating point number representations (i.e., when the duration is calculated to determine the number of bins for the feature vector, a sweep that runs from 1.02 to 2.02 can be considered to have a slightly different length than one that runs from 1.01 to 2.01, which leads to different-length feature vectors).

In the current version, we get around this issue by rounding to the nearest millisecond, so this patch for the NWB v1 code will do the same thing:

diff --git a/ipfx/feature_vectors.py b/ipfx/feature_vectors.py
index 38a1ae8..384c895 100644
--- a/ipfx/feature_vectors.py
+++ b/ipfx/feature_vectors.py
@@ -726,7 +726,10 @@ def psth_vector(spike_info_list, start, end, width=50):
         thresh_t = si["threshold_t"]
         spike_count = np.ones_like(thresh_t)
         one_ms = 0.001
-        duration = end - start
+
+        # round to nearest ms to deal with float approximations
+        duration = np.round(np.round(end, decimals=3) - np.round(start, decimals=3), decimals=3)
+
         n_bins = int(duration / one_ms) // width
         bin_edges = np.linspace(start, end, n_bins + 1) # includes right edge, so adding one to desired bin number
         bin_width = bin_edges[1] - bin_edges[0]
@@ -772,7 +775,10 @@ def inst_freq_vector(spike_info_list, start, end, width=20):
         inst_freq, inst_freq_times = _inst_freq_feature(thresh_t, start, end)
 
         one_ms = 0.001
-        duration = end - start
+
+        # round to nearest ms to deal with float approximations
+        duration = np.round(np.round(end, decimals=3) - np.round(start, decimals=3), decimals=3)
+
         n_bins = int(duration / one_ms) // width
         bin_edges = np.linspace(start, end, n_bins + 1) # includes right edge, so adding one to desired bin number
         bin_width = bin_edges[1] - bin_edges[0]
@@ -829,7 +835,10 @@ def spike_feature_vector(feature, spike_info_list, start, end, width=20):
             feature_values = feature_values[mask]
 
         one_ms = 0.001
-        duration = end - start
+
+        # round to nearest ms to deal with float approximations
+        duration = np.round(np.round(end, decimals=3) - np.round(start, decimals=3), decimals=3)
+
         n_bins = int(duration / one_ms) // width
         bin_edges = np.linspace(start, end, n_bins + 1) # includes right edge, so adding one to desired bin number
         bin_width = bin_edges[1] - bin_edges[0]

You can save that to a patch file and apply it to your local code. When I do that, saving to HDF5 files works.
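
To illustrate, here is a tiny standalone version of the rounded duration/bin calculation that the patch uses (the start/end/width values are made up):

import numpy as np

# made-up sweep window: 1 s, from 1.02 s to 2.02 s, with 50 ms bins
start, end, width = 1.02, 2.02, 50
one_ms = 0.001

# round to the nearest ms before computing the duration, as in the patch
duration = np.round(np.round(end, decimals=3) - np.round(start, decimals=3), decimals=3)
n_bins = int(duration / one_ms) // width
print(n_bins)   # 20 bins, even if start/end carry tiny floating point noise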

For your other errors, it looks like some of them are related to NWB files that can’t be opened because they are truncated - you may want to delete the files that produce those errors and re-download them.
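
If it helps, here is a sketch of that cleanup driven by the error log (the fv_errors_test.json name comes from your earlier post; the cell_types/specimen_<id>/ layout is the Allen SDK default, so adjust the paths if yours differ):

import json
import os

with open("fv_errors_test.json") as f:
    errors = json.load(f)

for entry in errors:
    details = json.dumps(entry["error"])
    if "truncated file" in details or "Unable to open file" in details:
        nwb_path = os.path.join("cell_types", "specimen_{}".format(entry["id"]), "ephys.nwb")
        if os.path.exists(nwb_path):
            print("removing", nwb_path)   # the next run will re-download it
            os.remove(nwb_path)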

Thanks, it finally worked for all the ids!


Thank you for posting this! I was also having trouble installing ipfx because errors kept popping up about its dependencies not being met… Could you please also message me the patch file nwb1_install.patch? Thank you in advance!

Sure thing - I just sent you a direct message about getting an address to email the file.

Hi,

Thank you for the helpful discussion! We are also trying to extract ephys features from .nwb files. A question we have is about extracting features from multiple recording sessions of a cell. We are wondering:

  1. if it makes sense to do so, or if the ephys features of a cell are extracted from just one session
  2. if there is a way of doing so, since the create_ephys_data_set function seems to take only one nwb file (roughly how we call it now is shown after this list)
  3. if it’s a good idea to manually combine sessions into a new nwb file for each cell and extract the ephys features from there
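
For context, this is roughly how we are calling it for a single file at the moment (the file name below is just a placeholder):

from ipfx.dataset.create import create_ephys_data_set

# one nwb file -> one data set; we are not sure how to combine several sessions here
data_set = create_ephys_data_set(nwb_file="sub-01_ses-01_icephys.nwb")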

Thanks for the help!

Best,
Alex

We were able to solve this problem: each neuron has only one session in the dataset we are looking into. Forum administrators, please feel free to ignore/delete this post, thank you.

Best,
Alex

Hi,

Thank you for this helpful discussion! I am also trying to compute these different ephys features for some cells (10 cells to begin with), and I have a few questions.
I downloaded the ipfx package. Then, from Spyder, I ran run_feature_vector_extraction.py after modifying the default arguments for 1) output_dir and 2) input, the latter pointing to a csv containing the 10 cell ids.
Here are my questions:

  1. When I first ran the script in Spyder, the IPython console was “busy”, but nothing happened for 2 hours. I then interrupted the console and got this message:
Traceback (most recent call last):

  File "/Users/julienballbe/My_Work/ipfx-master/ipfx/bin/run_feature_vector_extraction.py", line 342, in <module>
    if __name__ == "__main__": main()

  File "/Users/julienballbe/My_Work/ipfx-master/ipfx/bin/run_feature_vector_extraction.py", line 337, in main
    run_feature_vector_extraction(ids=ids, **module.args)

  File "/Users/julienballbe/My_Work/ipfx-master/ipfx/bin/run_feature_vector_extraction.py", line 310, in run_feature_vector_extraction
    results = pool.map(get_data_partial, specimen_ids)

  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()

  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 651, in get
    self.wait(timeout)

  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 648, in wait
    self._event.wait(timeout)

  File "/opt/anaconda3/envs/Allen/lib/python3.7/threading.py", line 552, in wait
    signaled = self._cond.wait(timeout)

  File "/opt/anaconda3/envs/Allen/lib/python3.7/threading.py", line 296, in wait
    waiter.acquire()

KeyboardInterrupt
Output from spyder call 'get_cwd':
Process ForkPoolWorker-1480:
Process ForkPoolWorker-1488:
Process ForkPoolWorker-1481:
Process ForkPoolWorker-1485:
Process ForkPoolWorker-1484:
Process ForkPoolWorker-1483:
Process ForkPoolWorker-1482:
Process ForkPoolWorker-1487:
Process ForkPoolWorker-1489:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/queues.py", line 352, in get
    res = self._reader.recv_bytes()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/queues.py", line 351, in get
    with self._rlock:
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/queues.py", line 351, in get
    with self._rlock:
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/queues.py", line 351, in get
    with self._rlock:
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/queues.py", line 351, in get
    with self._rlock:
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/queues.py", line 351, in get
    with self._rlock:
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/queues.py", line 351, in get
    with self._rlock:
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/queues.py", line 351, in get
    with self._rlock:
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/queues.py", line 351, in get
    with self._rlock:
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/queues.py", line 351, in get
    with self._rlock:
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
  File "/opt/anaconda3/envs/Allen/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt

I then changed the default value of run_parallel from True to False, and then the script actually did something and started downloading the nwb files, etc. Do you have an idea why it was blocking at this step?

  2. As the script was running (after I changed run_parallel to False), the nwb files were being downloaded again, even though I had already downloaded them into another folder. Do you know how I can point the script to the nwb files that are already downloaded?

  3. At first, when I tried to run the script (after changing the output folder), I got an error saying that the H5 file did not exist, coming from this line:

    if output_file_type == "h5":
        # Check that we can access the specified file before processing everything
        h5_file = h5py.File(os.path.join(output_dir, "fv_{}.h5".format(output_code)))
        h5_file.close()

The file indeed did not exist, as I thought the script would generate it. I then set output_dir and output_code so that they pointed to an already existing H5 file containing feature vectors, and the error disappeared. Is that normal, or is there an option to activate so that the H5 file is created if it does not already exist?

Thank you very much for any help you can give me!
Best wishes!
Julien

I’m not sure exactly why it didn’t work, but I have sometimes run into trouble using multiprocessing features inside an interactive environment (like Jupyter), so perhaps it’s a similar issue with Spyder, too. You could see if running the script directly also hangs.

The script does make use of the CellTypesCache feature of Allen SDK - when it downloads files, it uses a local manifest file to keep track of already-downloaded files. I think the default is to put it in a cell_types directory and create a manifest.json file within that. It’s possible that something got moved around and it can no longer find them, so it thinks it needs to find them again. I’d check to see if there is a manifest file and what it’s tracking within the file.
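
For example, something like this should show what the cache thinks it already has (the path is the default one; adjust it if your manifest lives elsewhere):

import json

with open("cell_types/manifest.json") as f:   # default location created by the script
    manifest = json.load(f)

# print the raw manifest to see which files/paths the cache is tracking
print(json.dumps(manifest, indent=2))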

This might be due to h5py version differences - I think newer versions of h5py may only default to read-only (and require the file to already exist) instead of opting for a read/write mode if the file doesn’t already exist. Could you try changing the line to

h5_file = h5py.File(os.path.join(output_dir, "fv_{}.h5".format(output_code)), 'a')

and see if that fixes the error? If so, I’ll update the repository with that change.

Thank you for your replies!

I’m not sure exactly why it didn’t work, but I have sometimes run into trouble using multiprocessing features inside an interactive environment (like Jupyter), so perhaps it’s a similar issue with Spyder, too. You could see if running the script directly also hangs.

I tried as you suggested, and multiprocessing does work when I run the script directly from the Terminal.

This might be due to h5py version differences - I think newer versions of h5py may only default to read-only (and require the file to already exist) instead of opting for a read/write mode if the file doesn’t already exist. Could you try changing the line to

Yes, that works perfectly - it now creates the file if it does not already exist.

It’s possible that something got moved around and it can no longer find them, so it thinks it needs to find them again. I’d check to see if there is a manifest file and what it’s tracking within the file.

I tried to move 1) my original manifest file to the same folder as the run_feature_vector_extraction.py script, and then 2) run_feature_vector_extraction.py to the folder where the cell_types directory is (with the manifest file and all the cell files). But in either situation, the script keeps downloading the files into a new cell_types directory.
I found a temporary solution: I searched the code for where CellTypesCache is called (in script_utils.py) and modified line 52 to

ctc = CellTypesCache(manifest_file="/Users/julienballbe/My_Work/Allen_Data/Common_Script/Full_analysis_cell_types/manifest.json")

It seems to work, yet I don’t know whether hardcoding it like this is a good solution…
When I ran the script I then encountered another error:

Traceback (most recent call last):
  File "run_feature_vector_extraction.py", line 345, in <module>
    if __name__ == "__main__": main()
  File "run_feature_vector_extraction.py", line 340, in main
    run_feature_vector_extraction(ids=ids, **module.args)
  File "run_feature_vector_extraction.py", line 317, in run_feature_vector_extraction
    used_ids, results, error_set = su.filter_results(specimen_ids, results)
TypeError: cannot unpack non-iterable NoneType object

I tried to print the results object and I got this:

[{'error': {'type': 'dataset', 'details': 'Traceback (most recent call last):
  File "/Users/julienballbe/My_Work/ipfx-master/ipfx/script_utils.py", line 76, in dataset_for_specimen_id
    nwb_file=nwb_path, sweep_info=sweep_info, ontology=ontology)
  File "/Users/julienballbe/My_Work/ipfx-master/ipfx/dataset/create.py", line 101, in create_ephys_data_set
    is_mies = is_file_mies(nwb_file)
  File "/Users/julienballbe/My_Work/ipfx-master/ipfx/dataset/create.py", line 33, in is_file_mies
    generated_by = dict(fil["general"]["generated_by"][:])
ValueError: dictionary update sequence element #0 has length 8; 2 is required
'}}, {'error': ...

with the same element (error: {'type': … }) repeating for each id.
Do you have an idea why this happens?