How to download raw data from Neuropixels public datasets

smshaihan · October 26, 2022, 8:12am

I am using the Jupyter notebook documentation to access the data. As the documentation describes: "Using unit_id and channel_id, we can select the spikes and LFP within an arbitrary time interval. Note that we need to use method='nearest' when selecting the LFP data channel, since not every electrode is included in the LFP DataArray." I want to confirm what I understood from the Python notebooks: does units_of_interest refer to the units whose spike times are paired with the LFP data? In the example, units_of_interest contains no more than 10 units, but we want to work with a large number of channels that have spike times.
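To make the quoted point about method='nearest' concrete: when a channel_id is not present in the LFP array, nearest selection picks the stored channel whose id is numerically closest. The sketch below reproduces that idea in plain NumPy on synthetic data. The (time, channel) layout, the helper names, and the example ids are assumptions for illustration, not the AllenSDK API.

```python
import numpy as np

def nearest_channel_index(lfp_channel_ids, channel_id):
    """Column index of the LFP channel whose id is closest to channel_id.

    This mirrors what method='nearest' does on a 1-D numeric coordinate.
    """
    ids = np.asarray(lfp_channel_ids)
    return int(np.argmin(np.abs(ids - channel_id)))

def lfp_window(lfp, times, lfp_channel_ids, channel_id, t0, t1):
    """Slice a (time, channel) LFP array to [t0, t1) on the nearest channel.

    Hypothetical helper: the array layout is assumed, not taken from the docs.
    """
    col = nearest_channel_index(lfp_channel_ids, channel_id)
    mask = (times >= t0) & (times < t1)
    return times[mask], lfp[mask, col]

# Synthetic data: only every 4th channel id is stored in the LFP array
lfp_channel_ids = np.array([100, 104, 108, 112])
times = np.arange(10) / 10.0                  # 10 samples, 0.0 .. 0.9 s
lfp = np.arange(40).reshape(10, 4)            # (time, channel)

t, trace = lfp_window(lfp, times, lfp_channel_ids, channel_id=107, t0=0.2, t1=0.5)
print(trace)  # values from the column for channel 108
```

Because the selection only depends on a list of unit or channel ids, nothing limits it to 10 entries; the same call can be looped over any number of channels.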

Since I want to process large datasets, I was wondering how I can access the raw electrode data and spike times for all probes in a particular session.

Regards,
Shaihan

joshs · October 27, 2022, 5:58pm

The raw data (all Neuropixels channels sampled at 30 kHz) is available as part of an AWS Public Dataset. The easiest way to interact with this data is by mounting the S3 bucket in an AWS SageMaker instance. It’s also possible to download these files via the AWS Command-Line Interface, but this will be quite slow, as there is ~1.2 TB of data for each experiment.
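A quick back-of-the-envelope check shows why downloading is slow. The sketch below estimates the size of the 30 kHz spike-band data from the channel count and int16 sample format used later in this thread, then the transfer time at a given bandwidth. The 6 probes, 2.5 h duration, and 100 Mbit/s link are hypothetical illustration values, and the estimate ignores any other files in the dataset, so treat it as an order-of-magnitude figure only.

```python
def raw_size_bytes(n_probes, hours, n_channels=384, fs_hz=30_000, bytes_per_sample=2):
    """Approximate spike-band size: probes x seconds x samples/s x channels x bytes."""
    return n_probes * hours * 3600 * fs_hz * n_channels * bytes_per_sample

def download_hours(n_bytes, mbit_per_s):
    """Hours needed to transfer n_bytes at a sustained rate in megabits per second."""
    return n_bytes * 8 / (mbit_per_s * 1e6) / 3600

# Hypothetical session: 6 probes recorded for 2.5 hours
size = raw_size_bytes(n_probes=6, hours=2.5)
print(f"{size / 1e12:.2f} TB")                             # about 1.24 TB
print(f"{download_hours(size, mbit_per_s=100):.1f} h at 100 Mbit/s")
```

One probe alone produces 384 x 2 bytes x 30,000 samples/s, about 23 MB per second of recording, which is why working next to the data (for example in SageMaker) is usually more practical than copying it.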

See this thread for information on how to align the raw data with the spike times in the NWB files.

smshaihan · November 4, 2022, 7:45am

Hi Joshs, thank you.

Is downloading the spike_band.dat file through the download option the correct approach? I assume this file contains the raw data for a particular probe in a session. I would also like to know how to read such a large file on my system.

And does this file contain spike times too?

Regards
Shaihan

joshs · November 8, 2022, 6:29pm

Hi Shaihan,

You can read very large .dat files in Python using memory-mapping, so the data is not all loaded at once:

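import os
import numpy as np
directory = '/path/to/file'  # folder containing spike_band.dat
num_channels = 384  # channels per probe, stored interleaved as int16
# Memory-map the file: samples are only read from disk when accessed
data = np.memmap(os.path.join(directory, 'spike_band.dat'), mode='r', dtype='int16')
# Reshape the flat array to (samples, channels)
data = data.reshape((len(data) // num_channels, num_channels))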
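Memory-mapping only helps if the analysis also avoids touching the whole file at once. Below is a minimal sketch, assuming the same int16, 384-channel interleaved layout as the snippet above and a 30 kHz sample rate, that derives the recording duration from the file size and computes a per-channel RMS one chunk at a time. The helper names and the chunk size are hypothetical; RMS here is of the raw integer values, with no offset removal or scaling to volts.

```python
import os
import numpy as np

def duration_seconds(path, num_channels=384, fs_hz=30_000.0):
    """Recording length implied by the file size (2 bytes per int16 sample)."""
    n_samples = os.path.getsize(path) // (2 * num_channels)
    return n_samples / fs_hz

def channel_rms(path, num_channels=384, chunk_samples=300_000):
    """Per-channel RMS of raw int16 values, reading one chunk at a time."""
    data = np.memmap(path, mode='r', dtype='int16').reshape((-1, num_channels))
    n_samples = data.shape[0]
    sum_sq = np.zeros(num_channels, dtype=np.float64)
    for start in range(0, n_samples, chunk_samples):
        # Only this slice is pulled from disk into memory
        chunk = np.asarray(data[start:start + chunk_samples], dtype=np.float64)
        sum_sq += np.sum(chunk ** 2, axis=0)
    return np.sqrt(sum_sq / n_samples)

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'spike_band.dat')
        # Tiny synthetic file: 1000 samples x 2 channels with constant values
        np.tile(np.array([3, -4], dtype=np.int16), (1000, 1)).tofile(path)
        print(channel_rms(path, num_channels=2, chunk_samples=64))  # [3. 4.]
        print(duration_seconds(path, num_channels=2))
```

Choosing chunk_samples trades memory for speed: 300,000 samples x 384 channels of float64 is roughly 0.9 GB, so a smaller chunk may suit a laptop.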

The spike times are accessible via the NWB files for each session.
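Once spike times are loaded from the NWB file, a common next step is to cut short raw snippets around each spike from the memory-mapped array. The sketch below assumes a simple linear mapping, sample = round((t - start_time_s) * fs_hz), where start_time_s is a hypothetical offset; the actual alignment between raw samples and NWB spike times is the subject of the thread linked earlier and may not be a plain offset. Spikes too close to either end of the recording are dropped.

```python
import numpy as np

def spike_snippets(data, spike_times_s, channel, fs_hz=30_000.0,
                   start_time_s=0.0, pre=30, post=60):
    """Raw snippets around spike times from a (samples, channels) array.

    Assumes a linear time-to-sample mapping; start_time_s is a hypothetical
    alignment offset, not a value taken from the dataset.
    """
    centers = np.round((np.asarray(spike_times_s) - start_time_s) * fs_hz).astype(np.int64)
    # Keep only spikes whose full window fits inside the recording
    keep = (centers - pre >= 0) & (centers + post <= data.shape[0])
    idx = centers[keep][:, None] + np.arange(-pre, post)[None, :]
    return np.asarray(data[idx, channel])  # shape (n_spikes, pre + post)

# Synthetic (samples, channels) array where data[i, c] == 2 * i + c
data = np.arange(200).reshape(100, 2)
snips = spike_snippets(data, [0.0, 0.001, 0.002, 0.0099], channel=1,
                       fs_hz=10_000.0, pre=2, post=3)
print(snips.shape)  # the first and last spikes fall outside the valid range
```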

jongore · July 8, 2023, 1:26pm

In the Jupyter notebooks, you can use unit_id and channel_id to select spikes and LFP (local field potential) data within a specific time interval. When selecting LFP data, it's important to use method='nearest', because not all electrodes are included in the LFP DataArray. "units_of_interest" refers to the units (neurons) you want to analyze, each of which is associated with its own spike times. Spike times are typically provided as an array of times for each unit.
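Selecting spikes within a time interval scales to any number of units if each unit's spike times are sorted. The sketch below assumes a dictionary mapping unit_id to a sorted array of spike times in seconds (an assumed structure; the unit ids are made up) and uses binary search rather than a full boolean mask per unit.

```python
import numpy as np

def spikes_in_window(spike_times, unit_ids, t0, t1):
    """Spike times in [t0, t1) for each unit, via binary search on sorted arrays."""
    out = {}
    for uid in unit_ids:
        times = spike_times[uid]
        lo = np.searchsorted(times, t0, side='left')
        hi = np.searchsorted(times, t1, side='left')
        out[uid] = times[lo:hi]
    return out

# Hypothetical unit ids and spike times
spike_times = {
    950: np.array([0.1, 0.5, 1.2, 2.0]),
    951: np.array([0.9, 1.0, 1.9]),
}
window = spikes_in_window(spike_times, unit_ids=[950, 951], t0=0.5, t1=2.0)
for uid, times in window.items():
    print(uid, len(times))
```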

If you want to access the raw electrode data and spike times for all probes in a session, you’ll need to refer to the specific data source or library you’re using, following the documentation or examples provided. The method or function to access this information depends on the data format and tools you’re working with.
