How to download raw data from Neuropixels public datasets

I am using the Jupyter notebook documentation to access the data. As the documentation says: "Using unit_id and channel_id, we can select the spikes and LFP within an arbitrary time interval. Note that we need to use method='nearest' when selecting the LFP data channel, since not every electrode is included in the LFP DataArray." I want to confirm my understanding of the Python notebooks: is the LFP data selected for the units_of_interest the LFP associated with those units' spike times? Also, units_of_interest contains no more than 10 units, but we want to work with a large number of channels that have spike times.
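In the AllenSDK notebooks the LFP for one probe is an xarray DataArray (returned by `session.get_lfp(probe_id)`), and `.sel(channel=..., method='nearest')` maps each requested channel id to the closest id that is actually present. Because `.sel` accepts a whole array of ids, the selection is not limited to 10 units. The NumPy sketch below reimplements that nearest-id lookup so the behaviour is visible. `nearest_channels` is a hypothetical helper, the channel ids in the example are made up, and ties are resolved toward the higher id here, which may differ from xarray in edge cases.

```python
import numpy as np

def nearest_channels(available_ids, requested_ids):
    """For each requested channel id, return the closest id in available_ids.

    available_ids: channel ids that have LFP data (hypothetical values here).
    requested_ids: e.g. the peak channel ids of many units of interest.
    Ties are resolved toward the higher channel id.
    """
    available = np.sort(np.asarray(available_ids))
    requested = np.asarray(requested_ids)
    # Index of the first available id >= each requested id (the right neighbour)
    right = np.clip(np.searchsorted(available, requested, side="left"), 0, len(available) - 1)
    # The left neighbour is the previous available id (clipped at the start)
    left = np.clip(right - 1, 0, len(available) - 1)
    left_dist = np.abs(requested - available[left])
    right_dist = np.abs(available[right] - requested)
    # Use the left neighbour only when it is strictly closer
    return np.where(left_dist < right_dist, available[left], available[right])

# Hypothetical example: LFP stored for only some channel ids
print(nearest_channels([0, 4, 8, 12], [1, 2, 3, 13, -1, 8]))
```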

Since I want to process large datasets, how can I access the raw electrode data and spike times for all probes in a particular session?


The raw data (all Neuropixels channels sampled at 30 kHz) is available as part of an AWS Public Dataset. The easiest way to interact with this data is by mounting the S3 bucket in an AWS SageMaker instance. It’s also possible to download these files via the AWS Command-Line Interface, but this will be quite slow, as there is ~1.2 TB of data for each experiment.
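To see why downloading is slow, it helps to work out the raw data rate from the numbers in this thread: 384 channels sampled at 30 kHz, stored as int16 (2 bytes per sample, matching the `dtype='int16'` used to read the files further down). A minimal sketch; the 50 MB/s bandwidth in the usage example is an assumption, not a measured figure.

```python
def raw_rate_bytes_per_s(num_channels=384, sample_rate_hz=30_000, bytes_per_sample=2):
    # Bytes written per second of recording for one probe (int16 -> 2 bytes)
    return num_channels * sample_rate_hz * bytes_per_sample

def download_hours(total_bytes, bandwidth_bytes_per_s):
    # Time to transfer total_bytes at a sustained bandwidth, in hours
    return total_bytes / bandwidth_bytes_per_s / 3600

print(raw_rate_bytes_per_s())          # bytes per second per probe
print(download_hours(1.2e12, 50e6))    # ~1.2 TB at an assumed 50 MB/s
```

This gives 23,040,000 bytes/s (about 23 MB/s) per probe, and about 6.7 hours to transfer ~1.2 TB at a sustained 50 MB/s.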

See this thread for information on how to align the raw data with the spike times in the NWB files.
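Once the alignment parameters are known, extracting raw waveforms around spikes is mostly index arithmetic. The sketch below only assumes that sample i of the raw .dat file was recorded at time `t0_s + i / fs_hz`; `t0_s` and `fs_hz` are hypothetical per-probe values you would obtain from the alignment procedure in the linked thread, and `times_to_samples` and `extract_snippets` are hypothetical helpers, not dataset or AllenSDK functions.

```python
import numpy as np

def times_to_samples(spike_times_s, t0_s, fs_hz):
    """Convert spike times (seconds) to sample indices in the raw .dat file.

    Assumes a linear clock: sample i was recorded at t0_s + i / fs_hz.
    """
    return np.round((np.asarray(spike_times_s) - t0_s) * fs_hz).astype(np.int64)

def extract_snippets(data, sample_idx, channel, before=30, after=60):
    """Cut a window of samples around each spike on one channel.

    data: (num_samples, num_channels) array, e.g. a memmapped spike_band.dat.
    Spikes whose window would run past either end of the recording are dropped.
    """
    sample_idx = np.asarray(sample_idx)
    keep = (sample_idx - before >= 0) & (sample_idx + after <= data.shape[0])
    valid = sample_idx[keep]
    if valid.size == 0:
        return np.empty((0, before + after), dtype=data.dtype)
    # One row per spike, each row before + after samples long
    return np.stack([data[s - before:s + after, channel] for s in valid])
```

With a memmapped array, only the windows you slice are read from disk, so this works on the full file.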

Hi Josh, thank you.

Is the correct approach to download the spike_band.dat file using the download option? I assume this is the file that contains the raw data for a particular probe in a session. I would also like to know how to read such a large file on my system.

Does this file also contain the spike times?


Hi Shaihan,

You can read very large .dat files in Python using memory-mapping, so the data is not all loaded at once:

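import os
import numpy as np

directory = '/path/to/file'
num_channels = 384  # channels per Neuropixels probe
# Memory-map the int16 samples; data are only read from disk when accessed
data = np.memmap(os.path.join(directory, 'spike_band.dat'), mode='r', dtype='int16')
# Samples are stored time point by time point (all channels for t=0, then t=1, ...)
data = data.reshape((len(data) // num_channels, num_channels))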
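Because the memmapped array behaves like a regular (num_samples, num_channels) array, whole-recording computations can be done in blocks so that only one block is in memory at a time. A minimal sketch computing a per-channel mean; `channel_means_chunked` and the default chunk size are illustrative choices, not part of the dataset tools.

```python
import numpy as np

def channel_means_chunked(data, chunk_samples=1_000_000):
    """Per-channel mean of a (num_samples, num_channels) array, read in blocks."""
    num_samples, num_channels = data.shape
    totals = np.zeros(num_channels, dtype=np.float64)
    for start in range(0, num_samples, chunk_samples):
        # Converting a memmap slice pages in only this block of samples
        chunk = np.asarray(data[start:start + chunk_samples], dtype=np.float64)
        totals += chunk.sum(axis=0)
    return totals / num_samples
```

Applied to the `data` array above, `channel_means_chunked(data)` returns 384 values without loading the whole file into RAM.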

The spike times are accessible via the NWB files for each session.
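To work with the spike times of every probe in a session rather than a few units_of_interest, the per-unit spike times can be grouped by probe. A minimal sketch, assuming inputs shaped like AllenSDK's: `session.spike_times` is a dict from unit id to spike times in seconds, and the probe id for each unit is assumed to come from a `probe_id` column of `session.units`. `spike_times_by_probe` is a hypothetical helper.

```python
from collections import defaultdict

import numpy as np

def spike_times_by_probe(spike_times, unit_probe_ids):
    """Group per-unit spike times by probe.

    spike_times: dict mapping unit id -> spike times in seconds.
    unit_probe_ids: dict mapping unit id -> probe id.
    Returns {probe_id: {unit_id: np.ndarray of spike times}}.
    """
    grouped = defaultdict(dict)
    for unit_id, times in spike_times.items():
        probe_id = unit_probe_ids.get(unit_id)
        if probe_id is None:
            continue  # skip units with no probe information
        grouped[probe_id][unit_id] = np.asarray(times)
    return dict(grouped)
```

With AllenSDK this would be called roughly as `spike_times_by_probe(session.spike_times, session.units['probe_id'].to_dict())`, where `session` comes from `EcephysProjectCache.get_session_data(session_id)`; check the column name against your AllenSDK version.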