Is it possible to pull data from the AllenSDK API into Google Cloud Storage directly from a cloud notebook instance?

My computer is about 7 years old. It has a GPU and an SSD, which is great, but it isn't enough for a holiday project I came up with: I ran into a wall on both the data storage and the processing power I need.
I need cloud computing for my little neural data visualization project idea!

I went to Google Cloud and created a Jupyter notebook instance on Vertex AI. I wanted to use this instance to pull the Allen data into Google Cloud Storage. So I found Cloud Storage and created a bucket with a folder for the data. I then uploaded a manifest.json file that I had previously generated on my own computer with Allen Brain Observatory commands.
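(For completeness, the manifest can also be uploaded from code with the storage client instead of through the console; the bucket and folder names below are just the ones I chose for this project, so substitute your own.)

from google.cloud import storage

# Upload the locally generated manifest.json into the bucket/folder.
# Bucket and folder names here are the ones I made up for this project.
client = storage.Client()
bucket = client.bucket("allen-neuropixesl")
blob = bucket.blob("AllenNeuroPixelsCache/manifest.json")
blob.upload_from_filename("manifest.json")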

Here’s the code for pulling the data that I ran on the Jupyter cloud instance:

Don't forget to install allensdk (for some reason you also need the --user flag, otherwise it fails): !pip install --user allensdk

import os

import numpy as np
import pandas as pd
from google.cloud import storage
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache

client = storage.Client()


def configure_allensdk():
    # Configure the cache to use Google Cloud Storage.
    # Replace the bucket and folder names below with the ones you created
    # in Google Cloud Storage.
    # EcephysProjectCache.cache_data = True
    # EcephysProjectCache.manifest_uri = f'gs://{bucket_name}/{storage_path}manifest.json'
    manifest_path = "gs://allen-neuropixesl/AllenNeuroPixelsCache/manifest.json"

    # Initialize the cache
    cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)

    sessions = cache.get_session_table().index
    print(sessions)  # see the session identifiers
    return sessions, cache


sessions, cache = configure_allensdk()


def make_csd_images():
    for s in sessions:
        session = cache.get_session_data(
            s,
            isi_violations_maximum=np.inf,
            amplitude_cutoff_maximum=np.inf,
            presence_ratio_minimum=-np.inf,
        )
        # There are 6 probes for each session.
        probes = session.probes.index
        print(probes)
        for p in probes:
            session.get_lfp(p)
            session.get_current_source_density(p)


make_csd_images()

The files loaded faster in the notebook after I ran the script, but I can't see them in Cloud Storage. The Jupyter instance has 100 GB of local storage, so I suppose the data could have been pulled there instead. Now the instance seems to be frozen: I can't get it to finish provisioning, and it's taking a long time. Maybe that's related to the fact that I filled up the 100 GB with the data.
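Once the instance responds again, a quick disk check like the one below should show where the space actually went (the /home/jupyter path is just my guess at the notebook's home directory):

# Rough check of where the 100 GB went; /home/jupyter is an assumed home directory.
!df -h /home/jupyter
!du -sh /home/jupyter/* 2>/dev/null | sort -h | tail -n 10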

I was just wondering if you could comment on what I might be doing wrong, or whether it's possible to pull the data into Cloud Storage somehow. Perhaps there's a better way than what I tried? I'm also working through the documentation on Google Cloud (support is $29 a month).

Best,
Maria

PS Love the work you are doing!

Hi Maria,

I think the issue is that the cache object you're using expects a file system path, not a URI to Google Cloud Storage, so there are likely problems with how the code parses that path. I'm not sure why you aren't seeing some sort of failure, but perhaps the cache object is parsing the link in an unexpected way and writing to the 100 GB available on your notebook instance.
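If you mainly need the files to end up in the bucket, one workaround is to let the cache write to an ordinary local directory and then copy that directory up to Cloud Storage with the storage client. A rough, untested sketch (the local path, bucket name, and folder prefix are placeholders; adjust to your setup):

import os
from google.cloud import storage
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache

# Let the AllenSDK cache write to a normal local directory first.
local_cache = "/home/jupyter/ecephys_cache"  # placeholder path on the notebook's disk
manifest_path = os.path.join(local_cache, "manifest.json")
cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)

# ... download the sessions/LFP here, as in your script ...

# Then walk the cache directory and upload everything to the bucket.
client = storage.Client()
bucket = client.bucket("allen-neuropixesl")  # your bucket name
for root, _, files in os.walk(local_cache):
    for name in files:
        local_file = os.path.join(root, name)
        blob_name = os.path.relpath(local_file, local_cache)
        bucket.blob(f"AllenNeuroPixelsCache/{blob_name}").upload_from_filename(local_file)

With only 100 GB of local disk, you would probably want to upload and delete one session at a time rather than downloading everything before copying it up.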

If you can mount the Google Cloud Storage bucket as a local drive/folder inside the notebook, that would likely be your best bet. It looks like this might be a good place for you to start: Cloud Storage FUSE | Google Cloud. I'm not familiar with this package or with Google Cloud, so you'll have to refer to help elsewhere for that.
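I haven't tried it myself, but from the docs the rough idea would be something like the cell below; whether gcsfuse is preinstalled on your image, the mount point, and the bucket name are all assumptions on my part:

# Mount the bucket as a local folder (install gcsfuse first if your image doesn't have it).
!mkdir -p /home/jupyter/gcs
!gcsfuse allen-neuropixesl /home/jupyter/gcs

# The AllenSDK cache can then treat the mounted bucket like a normal path.
from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache
manifest_path = "/home/jupyter/gcs/AllenNeuroPixelsCache/manifest.json"
cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)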

Good luck!