Manage DATA.NWB files of human synapses

Thank you for the clarification Stephanie.

I have one more question:
Considering that the experiments performed at 50 Hz could have different times between the 8th stimulus and the 9th stimulus (recovery train of 4 pulses), how can I get this time value?

Thanks

The easiest place to get this is in the MultiPatchProbe table of the database. There is a column called “recovery_delay” which specifies the amount of time between the 8th and 9th pulse of any stimulus frequency. Our documentation shows what is available in each table of the database and how they can be linked to one another. If you have the recording object that you are interested in you can get to multi_patch_probe via recording.patch_clamp_recording.multi_patch_probe.recovery_delay

Dear Stephanie,

I want to create a csv file with the data that I want from your database.
I would like to have: ‘pair’, ‘syn_type’, ‘pre_exp_id’, ‘pre_id’, ‘pre_class’, ‘pre_layer’,‘post_exp_id’, ‘post_id’,
‘post_class’, ‘post_layer’, ‘PSP_mean (mV)’, ‘stim_freq (Hz)’, ‘time_rec (ms)’

Here I send you my code:

import numpy as np
import scipy.stats
from aisynphys.database import default_db as db
from aisynphys.cell_class import CellClass, classify_cells, classify_pairs
from aisynphys.connectivity import measure_connectivity, pair_was_probed
from aisynphys.database import SynphysDatabase
import matplotlib.pyplot as plt
from aisynphys.ui.notebook import plot_stim_sorted_pulse_amp
from aisynphys.dynamics import stim_sorted_pulse_amp 
import csv

# Download and cache the sqlite file for the requested database
# Load all synapses associated with human projects
# First load the data base
path_to_db = '/Users/natalibarros/ai_synphys_cache/database/synphys_r2.0_medium.sqlite'
db = SynphysDatabase.load_version(path_to_db)

pairs = db.pair_query(project_name=db.human_projects, synapse=True).all()

print("loaded %d synapses" % len(pairs))

print("number of experiments %d" % len(set([pair.experiment for pair in pairs])))

print("number of experiments with recordings %d" %len(set([pair.experiment.data for pair in pairs])))

# We can do a pair_query to return pairs with stp valid values

query = db.pair_query(
    experiment_type='standard multipatch',   # filter: just multipatch experiments
    species='human',                         # filter: only human data
    synapse=True,                            # filter: only cell pairs connected by synapse
)
pairs = query.all()
print('number of pairs with patch experiments', len(pairs))

pairs_with_stp = [p for p in pairs if p.dynamics.stp_induction_50hz is not None] # Is that ok?
print('number of pairs with STP dynamics', len(pairs_with_stp))

N = 1
# create a csv file object to save useful values
path_to_csv_file = '/Users/natalibarros/Desktop/EPFL_BBP/HUMAN_CORTEX/CIRCUIT_BUILDING/DATA/DATA_Allen_Brain/PSP_values_DBmedium_04.csv'
with open(path_to_csv_file, 'wt') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', lineterminator='\n', )
    
    # Add the header row
    writer.writerow(['pair', 'syn_type', 'pre_exp_id', 'pre_id', 'pre_class', 'pre_layer','post_exp_id', 'post_id', 
                     'post_class', 'post_layer', 'PSP_mean (mV)', 'stim_freq (Hz)', 'time_rec (ms)'])
    for pair in pairs_with_stp:
        print('#', N)
        print('working on pair', pair)
        N = N+1
        # syn and cells information
        try:
            syn_type = pair.synapse.synapse_type
            pre_exp_id = pair.pre_cell
            pre_id = pair.pre_cell_id
            pre_class = pair.pre_cell.cell_class
            pre_layer = pair.pre_cell.cortical_location.cortical_layer
            post_exp_id = pair.post_cell
            post_id = pair.post_cell_id
            post_class = pair.post_cell.cell_class
            post_layer = pair.post_cell.cortical_location.cortical_layer
        except AttributeError:
            print('missing attribute')
            continue
        
        # PSPs information
        qc_pass_data = stim_sorted_pulse_amp(pair, db=db)

        # Stim freq information (ind_f)
        try:
            ind_f = int(qc_pass_data['induction_frequency'][0])
            # scatter plots of event amplitudes sorted by pulse number 
            mask = qc_pass_data['induction_frequency'] == ind_f
        except KeyError:
            ind_f = 50 # assumption: if there is no stim freq, we suppose is 50 Hz
            mask = qc_pass_data['induction_frequency'] == ind_f
            print('no ind_f vector')
            
        filtered = qc_pass_data[mask].copy()
        sign = 1 if pair.synapse.synapse_type == 'ex' else -1
        try:
            filtered['dec_fit_reconv_amp'] *= sign * 1000         # the data are in Volts, so I convert them into mV.
        except KeyError:
            print('No fit amps for pair: %s' % pair)
        PSP_means = filtered.groupby('pulse_number').mean()['dec_fit_reconv_amp'].to_list()
        
        # Find recovery_delay???????
        
    
        # Add the data row
        writer.writerow([pair, syn_type, pre_exp_id, pre_id, pre_class, pre_layer, 
                         post_exp_id, post_id, post_class, post_layer, PSP_means, ind_f, recov_time])
    

My questions are:

  1. Is it ok to use pair.dynamics.stp_induction_50hz to know which pairs have stp dynamics at all frequencies?
  2. I don’t know how to ask for the recovery_delay. I know that I need a recording object, but I don’t know how to do it inside my loop.

Could you help me please?

Hi Natali,

  1. No, it is not safe to assume that if a pair has 50hz data that is has all of the other frequencies. Similarly, in your try: except where you are masking qc_pass_data your assumption that if the KeyError is caught that ind_f == 50 also seems incorrect
  2. In this case the recovery delay can be found in the qc_pass_data dataframe that you have as a column (named recovery_delay) in the dataframe
qc_pass_data.head()

    fit_amp  dec_fit_reconv_amp  baseline_dec_fit_reconv_amp  qc_pass  \
0  0.000136            0.000234                    -0.000004     True   
1  0.000508            0.000492                     0.000032     True   
2  0.000019           -0.000011                    -0.000015     True   
3  0.001213            0.000295                    -0.000018     True   
4  0.000276            0.000073                    -0.000010     True   

   pulse_number  induction_frequency  recovery_delay  sync_rec_ext_id  \
0             1                 50.0           0.128               41   
1             2                 50.0           0.128               41   
2             3                 50.0           0.128               41   
3             4                 50.0           0.128               41   
4             5                 50.0           0.128               41   

         id  
0  14331746  
1  14331747  
2  14331749  
3  14331751  
4  14331753 

Nominally our recovery delays are (in seconds) 0.125, 0.25, 0.5, 1.0, 4.0 but due to acquisition sampling rate the values that are populated in the Database are a little different. You can see what recovery delays are present if you do:

qc_pass_data['recovery_delay'].unique()
>>> array([0.128, 0.253, 0.503, 1.003, 4.003]

Note that it’s only the 50 Hz stimulus that has multiple recovery delays, all other frequencies are only the 0.25s recovery delay. So if you wanted to filter your qc_pass_data to only include a particular recovery you could do something like:

mask = (qc_pass_data['induction_frequency'] == 50) & (qc_pass_data['recovery_delay'] == 0.503)
filtered = qc_pass_data[mask]
filtered.head()

         fit_amp  dec_fit_reconv_amp  baseline_dec_fit_reconv_amp  qc_pass  \
24  6.519115e-05            0.000028                    -0.000036     True   
25  1.434500e-03            0.000820                    -0.000021     True   
26  1.217622e-03            0.000187                     0.000020     True   
27  2.364060e-04           -0.000149                     0.000027     True   
28  3.436510e-12           -0.000027                     0.000014     True   

    pulse_number  induction_frequency  recovery_delay  sync_rec_ext_id  \
24             1                 50.0           0.503               43   
25             2                 50.0           0.503               43   
26             3                 50.0           0.503               43   
27             4                 50.0           0.503               43   
28             5                 50.0           0.503               43   

          id  
24  14333817  
25  14333819  
26  14333821  
27  14333823  
28  14333825

Thank you very much for the answer.
I think I got it now, mean PSP amps separated by frequencies and with recovery_delay (important for 50Hz)

After the line:

# PSPs information
        qc_pass_data = stim_sorted_pulse_amp(pair, db=db)

I included this:

# stim freq and recovery delay information
        ind_f = qc_pass_data['induction_frequency'].unique()
        rec_del = qc_pass_data['recovery_delay'].unique()
        ind_f_list = ind_f.tolist()
        rec_del_list = rec_del.tolist()

        for i, r in zip(ind_f_list,rec_del_list):
            mask = (qc_pass_data['induction_frequency'] == i) & (qc_pass_data['recovery_delay'] == r)
            filtered = qc_pass_data[mask]
            
            # detect ex or ihb and flip trace if inb
            sign = 1 if pair.synapse.synapse_type == 'ex' else -1
            try:
                filtered['dec_fit_reconv_amp'] *= sign * 1000  # convert values into mV
            except KeyError:
                print('No fit amps for pair: %s' % pair)

            PSP_means = filtered.groupby('pulse_number').mean()['dec_fit_reconv_amp'].to_list()  

    
            # Add the data row
            writer.writerow([pair, syn_type, pre_exp_id, pre_id, pre_class, pre_layer, 
                             post_exp_id, post_id, post_class, post_layer, PSP_means, i, r])

The zip(ind_f_list, rec_del_list) assumes that the induction frequency and recovery delay are paired (that’s what zip() does) which isn’t the case. If you want all combinations of induction frequencies and recovery delays I would do a nested loop, but since the only induction frequency that has multiple recovery delays is the 50Hz stimulus you could just create a nested if loop that goes through the recovery delays when the induction frequency is 50Hz. Based on exactly what you want there are multiple ways you could construct this but you’ve at least got the frequencies and delays pulled out correctly.

Thank you Stepahnie for the clarification,

I changed the code to this.

        # stim freq and recovery delay information
        ind_f = qc_pass_data['induction_frequency'].unique()
        #rec_del = qc_pass_data['recovery_delay'].unique()
        ind_f_list = ind_f.tolist()
        #rec_del_list = rec_del.tolist()

        for i in ind_f_list:
            if i == 50:
                print('50 Hz stimulation')
                rec_del = qc_pass_data['recovery_delay'].unique()
                rec_del_list = rec_del.tolist()
                for r in rec_del_list:
                    mask = (qc_pass_data['induction_frequency'] == i) & (qc_pass_data['recovery_delay'] == r)
                    filtered = qc_pass_data[mask]
                    print(r)

                    sign = 1 if pair.synapse.synapse_type == 'ex' else -1
                    try:
                        filtered['dec_fit_reconv_amp'] *= sign * 1000
                    except KeyError:
                        print('No fit amps for pair: %s' % pair)

                    PSP_means = filtered.groupby('pulse_number').mean()['dec_fit_reconv_amp'].to_list() 
                    if PSP_means == []:
                        print('PSP empty')
                        pass
                    else:
                        print('PSP')
                        # Add the data row
                        writer.writerow([pair, syn_type, pre_exp_id, pre_id, pre_class, pre_layer, 
                                         post_exp_id, post_id, post_class, post_layer, PSP_means, i, r])
            else: 
                print('other freq')
                mask = (qc_pass_data['induction_frequency'] == i)
                filtered = qc_pass_data[mask]

                sign = 1 if pair.synapse.synapse_type == 'ex' else -1
                try:
                    filtered['dec_fit_reconv_amp'] *= sign * 1000
                except KeyError:
                    print('No fit amps for pair: %s' % pair)

                PSP_means = filtered.groupby('pulse_number').mean()['dec_fit_reconv_amp'].to_list() 
                if PSP_means == []:
                    print('PSP empty')
                    pass
                else:
                    print('PSP')
                    writer.writerow([pair, syn_type, pre_exp_id, pre_id, pre_class, pre_layer, 
                                         post_exp_id, post_id, post_class, post_layer, PSP_means, i, r])

Now instead of having around 200 pair recordings with PSP values I have 1248!!!
Does this number sound correct?

That doesn’t sound unreasonable. You’re writing enough info to your file that you could check for duplicates for the same pair. I would double check your r variable for the “other freq” loop. You’re writing it to your file but you aren’t actually tracking it there.

To plot all synaptic traces, you’ll need to iterate through your recordings and pairs. Here’s a simplified example:
import matplotlib.pyplot as plt
from aisynphys.ui.notebook import plot_stim_sorted_pulse_amp

Assuming ‘recordings’ is a set of your data.nwb files

for recording_file in recordings:
pair = db.experiment_from_ext_id(‘#exp_id’).pairs[(‘#pre’, ‘#post’)]

fig, ax = plt.subplots()
plot_stim_sorted_pulse_amp(pair, ax)
plt.show()

The PyNWB error ValueError: No data_type found for builder root indicates that PyNWB is having trouble reading the NWB file correctly for CRM enrichment. Ensure that you are using the correct PyNWB version that is compatible with your NWB file version.