Slow loading of VBN data via AllenSDK

Hi,

Calling cache.get_ecephys_session(id) is quite slow for me on Visual Behavior Neuropixels (VBN) data that has already been downloaded: about 1 minute per call. That is roughly 20 times longer than loading the same file directly with pynwb via NWBHDF5IO.

Is something in the way I am calling the function causing this slowdown? Or does the AllenSDK simply do a lot of time-consuming extra computation after loading the dataset from NWB in order to add extra information?

Further context

I have tried it on my own system with predownloaded data (Windows 10, Python 3.8.10, allensdk 2.14.1, pynwb 2.2.0): NWB took about 3 seconds, AllenSDK about 1 minute. I also tried it on Google Colab with allensdk 2.14.1 on data mounted from my Google Drive: NWB took about 3 seconds, AllenSDK about 30 seconds.

I have included a code snippet that I used to test the times.

```python
from functools import wraps
from pathlib import Path
from time import perf_counter

from allensdk.brain_observatory.behavior.behavior_project_cache import (
    VisualBehaviorNeuropixelsProjectCache,
)
from pynwb import NWBHDF5IO

ALLEN_DIR = Path(r"D:\example-data\allen-data")
ALLEN_MANIFEST = "visual-behavior-neuropixels_project_manifest_v0.4.0.json"
EXAMPLE_SESSION = 1044385384


def timeit(name):
    """Decorator factory: print how long each call to the wrapped function takes."""

    def inner(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            t1 = perf_counter()
            result = func(*args, **kwargs)
            t2 = perf_counter()

            print(f"{name} took {t2 - t1:.2f} seconds")
            # Pass the loaded object through so callers can still use it.
            return result

        return wrapper

    return inner


@timeit("Allen SDK loader")
def allen_example(cache):
    return cache.get_ecephys_session(EXAMPLE_SESSION)


@timeit("NWB loader")
def nwb_example(nwb_path):
    nwb_io = NWBHDF5IO(nwb_path, "r", load_namespaces=True)
    return nwb_io.read()


def main(cache_is_s3=True):
    # Build the cache either from the S3-backed layout or from a local copy.
    if cache_is_s3:
        cache = VisualBehaviorNeuropixelsProjectCache.from_s3_cache(cache_dir=ALLEN_DIR)
    else:
        cache = VisualBehaviorNeuropixelsProjectCache.from_local_cache(
            cache_dir=ALLEN_DIR
        )
    cache.load_manifest(ALLEN_MANIFEST)

    # Path to the already-downloaded NWB file for the example session.
    nwb_path = (
        ALLEN_DIR
        / "visual-behavior-neuropixels-0.4.0"
        / "behavior_ecephys_sessions"
        / str(EXAMPLE_SESSION)
        / f"ecephys_session_{EXAMPLE_SESSION}.nwb"
    )

    # Time both loaders in both orders, so that running first or second
    # does not favor either one.
    nwb_example(nwb_path)
    allen_example(cache)

    allen_example(cache)
    nwb_example(nwb_path)


main(cache_is_s3=True)
main(cache_is_s3=False)
```

And the output of timing:

```
NWB loader took 2.76 seconds
Allen SDK loader took 56.84 seconds
Allen SDK loader took 50.09 seconds
NWB loader took 2.80 seconds
NWB loader took 2.36 seconds
Allen SDK loader took 51.56 seconds
Allen SDK loader took 56.49 seconds
NWB loader took 7.52 seconds
```

Thanks for running this comparison; it’s very helpful! Do you mind re-posting this info here? That’s our preferred place to track software-related issues.

Hi @joshs, sure thing, I’ll post it over on GitHub now. Thanks! Edit: posted here.