Manage DATA.NWB files of human synapses

Hello,

I would like to plot and analyze the synaptic traces in your medium-size database for human data.

I managed to load the sqlite database file and download the data.nwb files with the recordings, 86 in total. Then I created a Python set with them. I can check how many sweeps and devices there are per recording. However, when I try to plot one of them like this:

import matplotlib.pyplot as plt
from aisynphys.ui.notebook import plot_stim_sorted_pulse_amp

fig, ax = plt.subplots()

pair = db.experiment_from_ext_id('#exp_id').pairs[('#pre', '#post')]

plot_stim_sorted_pulse_amp(pair, ax)

I always get an AttributeError. My questions are: how can I plot all of them, and how can I tell which ones are suitable for plotting?

I also tried to open the data.nwb files with PyNWB but I get this error: ValueError: No data_type found for builder root.

Hi Natali,

For the plotting error, it would help to see the full traceback to understand where the code is breaking. Once we sort that out, you are right that not all of the data will be suitable for plotting in this way. There is no straightforward way of knowing other than to try; the code should give you an error saying that there is no data to plot if that is the case.
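The "try it and see" approach described above can be sketched as a small loop that records which pairs plot and which raise. This is only an illustration: `try_each_pair`, `fake_plot` and `demo_pairs` are hypothetical names, and the stand-in plotting function is not part of aisynphys.

```python
def try_each_pair(pairs, plot_fn):
    """Call plot_fn on every pair; return (keys that worked, {key: error text})."""
    worked, failed = [], {}
    for key, pair in pairs.items():
        try:
            plot_fn(pair)
        except Exception as exc:  # deliberately broad: any failure means "not plottable here"
            failed[key] = f"{type(exc).__name__}: {exc}"
        else:
            worked.append(key)
    return worked, failed


# Stand-in plotting function and data (hypothetical), only to show the behaviour.
def fake_plot(pair):
    if pair is None:
        raise AttributeError("'NoneType' object has no attribute 'synapse_type'")


demo_pairs = {("1", "2"): "pair with data", ("2", "1"): None}
worked, failed = try_each_pair(demo_pairs, fake_plot)
print(worked)
print(failed)
```

With a loaded database, the same helper could presumably be called on the `pairs` dictionary of an experiment, passing something like `lambda p: plot_stim_sorted_pulse_amp(p, ax)` as `plot_fn`.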

I am actually not as familiar with using PyNWB so I can’t say what is causing that error. I’ll see if my colleague @luke has more info.

Thanks!

Here is the code:

# Create plot_pairs

import matplotlib.pyplot as plt
from aisynphys.ui.notebook import plot_stim_sorted_pulse_amp

fig, ax = plt.subplots()

pair = db.experiment_from_ext_id('1536790789.296').pairs[('5', '3')]

plot_stim_sorted_pulse_amp(pair, ax)

And here is the error output:


Exception Traceback (most recent call last)
Input In [17], in
5 fig, ax = plt.subplots()
7 pair = db.experiment_from_ext_id('1536790789.296').pairs[('5', '3')]
----> 9 plot_stim_sorted_pulse_amp(pair, ax)

File ~/aisynphys/aisynphys/ui/notebook.py:754, in plot_stim_sorted_pulse_amp(pair, ax, ind_f, color)
753 def plot_stim_sorted_pulse_amp(pair, ax, ind_f=50, color='k'):
--> 754 qc_pass_data = stim_sorted_pulse_amp(pair)
756 # scatter plots of event amplitudes sorted by pulse number
757 mask = qc_pass_data['induction_frequency'] == ind_f

File ~/aisynphys/aisynphys/dynamics.py:333, in stim_sorted_pulse_amp(pair)
330 def stim_sorted_pulse_amp(pair):
331 qc_field = pair.synapse.synapse_type + '_qc_pass'
--> 333 q = db.query(
334 db.PulseResponseFit.fit_amp,
335 db.PulseResponseFit.dec_fit_reconv_amp,
336 db.PulseResponseFit.baseline_dec_fit_reconv_amp,
337
338 getattr(db.PulseResponse, qc_field).label('qc_pass'),
339 db.StimPulse.pulse_number,
340 db.MultiPatchProbe.induction_frequency,
341 db.MultiPatchProbe.recovery_delay,
342 db.SyncRec.ext_id.label('sync_rec_ext_id'),
343 )
344 q = q.join(db.PulseResponse, db.PulseResponseFit.pulse_response)
345 q = q.join(db.Recording, db.PulseResponse.recording)

File ~/aisynphys/aisynphys/database/__init__.py:12, in NoDatabase.__getattr__(self, attr)
11 def __getattr__(self, attr):
---> 12 raise self.exception

Exception: No database was specified in config.synphys_db_host or with CLI flags --db-version or --db-host

Hi Natali,

Apologies for that error. You need a config.yml file that points at the database you have downloaded, and then hopefully it should work. In any text editor, enter the following 2 lines:

synphys_db_host = "sqlite:///"
synphys_db = "path/to/database.sqlite"

Leave synphys_db_host as is and set synphys_db to the location of the downloaded sqlite database file on your machine. That should allow the code in that function to point at your copy of the database and execute the query that is raising the error. Save the file as config.yml in the top-level directory of the aisynphys repository. Let me know if that doesn’t work.

I created the file ‘config.yml’ as you suggested and now I have a new error.

I run this:

from aisynphys.database import SynphysDatabase

# Create plot_pairs

import matplotlib.pyplot as plt
from aisynphys.ui.notebook import plot_stim_sorted_pulse_amp

# Load all synapses associated with human projects

# First load the database

path_to_db = '/Users/natalibarros/ai_synphys_cache/database/synphys_r2.0-pre4_medium.sqlite'
db = SynphysDatabase.load_version(path_to_db)

fig, ax = plt.subplots()

pair = db.experiment_from_ext_id('1536790789.296').pairs[('5', '3')]

plot_stim_sorted_pulse_amp(pair, ax)

and I get this:


AttributeError Traceback (most recent call last)
Input In [1], in
----> 1 from aisynphys.database import SynphysDatabase
3 # Create plot_pairs
4 import matplotlib.pyplot as plt

File ~/aisynphys/aisynphys/__init__.py:2, in
1 # import config to be sure that we get early access to some command line flags
----> 2 from . import config

File ~/aisynphys/aisynphys/config.py:49, in
46 if config is None:
47 config = {}
---> 49 for k,v in config.items():
50 locals()[k] = v
53 # intercept specific command line args

AttributeError: 'str' object has no attribute 'items'
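One possible reading of this traceback, offered as an assumption rather than something confirmed in this thread: if config.yml is parsed as YAML, lines written as `key = value` are read as plain text, so the loaded config is a single string instead of a mapping, and `config.items()` fails. YAML mappings use `key: value`. The sketch below reproduces only the failure pattern with the standard library; `render_config` and `copy_settings` are hypothetical helpers, not aisynphys functions.

```python
def render_config(db_host, db_path):
    """Return config text using YAML's `key: value` mapping syntax."""
    return f'synphys_db_host: "{db_host}"\nsynphys_db: "{db_path}"\n'


def copy_settings(config):
    """Same pattern as the loop in the traceback: needs a mapping, not a string."""
    return {k: v for k, v in config.items()}


# What a parser would hand back for a proper mapping versus for plain text.
as_mapping = {"synphys_db_host": "sqlite:///", "synphys_db": "path/to/database.sqlite"}
as_string = 'synphys_db_host = "sqlite:///" synphys_db = "path/to/database.sqlite"'

print(copy_settings(as_mapping))
try:
    copy_settings(as_string)
except AttributeError as exc:
    print("AttributeError:", exc)

# Text that could be saved as config.yml if the YAML assumption holds.
print(render_config("sqlite:///", "path/to/database.sqlite"))
```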

Sorry, I think I had placed the 'config.yml' file in the wrong place. Now it looks like it is being read, but I get a KeyError.

Now I tried this:

pair = db.experiment_from_ext_id('1515610888.871').pairs[('0', '1')]

I know that I have the data.nwb file, because it is downloaded and I know that the sweep.devices are [0, 1]. However, I get this error:


KeyError Traceback (most recent call last)
Input In [3], in
10 db = SynphysDatabase.load_version(path_to_db)
12 fig, ax = plt.subplots()
---> 14 pair = db.experiment_from_ext_id('1515610888.871').pairs[('0', '1')]
16 plot_stim_sorted_pulse_amp(pair, ax)

KeyError: ('0', '1')

If I try another experiment like this (that worked for you):

pair = db.experiment_from_ext_id('1521004040.059').pairs[('2', '3')]

I get this:


AttributeError Traceback (most recent call last)
Input In [4], in
14 #pair = db.experiment_from_ext_id('1515610888.871').pairs[('0', '1')]
16 pair = db.experiment_from_ext_id('1521004040.059').pairs[('2', '3')]
---> 18 plot_stim_sorted_pulse_amp(pair, ax)

File ~/aisynphys/aisynphys/ui/notebook.py:754, in plot_stim_sorted_pulse_amp(pair, ax, ind_f, color)
753 def plot_stim_sorted_pulse_amp(pair, ax, ind_f=50, color='k'):
--> 754 qc_pass_data = stim_sorted_pulse_amp(pair)
756 # scatter plots of event amplitudes sorted by pulse number
757 mask = qc_pass_data['induction_frequency'] == ind_f

File ~/aisynphys/aisynphys/dynamics.py:331, in stim_sorted_pulse_amp(pair)
330 def stim_sorted_pulse_amp(pair):
--> 331 qc_field = pair.synapse.synapse_type + '_qc_pass'
333 q = db.query(
334 db.PulseResponseFit.fit_amp,
335 db.PulseResponseFit.dec_fit_reconv_amp,
(...)
342 db.SyncRec.ext_id.label('sync_rec_ext_id'),
343 )
344 q = q.join(db.PulseResponse, db.PulseResponseFit.pulse_response)

AttributeError: 'NoneType' object has no attribute 'synapse_type'

How can I know whether a specific experiment can be plotted?

Thank you

Hi Natali,

Unfortunately, I think the short answer is that finding suitable pairs will just take some trial and error. There are a few ways, though, to narrow your search. We will also do what we can on our end to produce more graceful and informative error messages that tell users whether data is available or not.

For your initial example experiment 1515610888.871 we can first see how many pairs there are:

pairs = db.experiment_from_ext_id('1515610888.871').pairs
pairs
>>{('1', '2'): <Pair 1515610888.871 1 2>,
 ('2', '1'): <Pair 1515610888.871 2 1>,
 ('1', '3'): <Pair 1515610888.871 1 3>,
 ('1', '4'): <Pair 1515610888.871 1 4>,
 ('2', '3'): <Pair 1515610888.871 2 3>,
 ('2', '4'): <Pair 1515610888.871 2 4>,
 ('3', '1'): <Pair 1515610888.871 3 1>,
 ('3', '2'): <Pair 1515610888.871 3 2>,
 ('3', '4'): <Pair 1515610888.871 3 4>,
 ('4', '1'): <Pair 1515610888.871 4 1>,
 ('4', '2'): <Pair 1515610888.871 4 2>,
 ('4', '3'): <Pair 1515610888.871 4 3>}

From that list, not all pairs will have a connection. To figure that out we can do:

pairs_with_synapse = [pair for pair in pairs.values() if pair.has_synapse is True]
pairs_with_synapse
>>[<Pair 1515610888.871 1 2>]

From this we see that just one pair had a connection. We only expect connected pairs to have data that will plot PSP amplitudes in the function you are trying to use.
However, not all connections have PSP amplitudes that we could reliably fit and thus quantify as in those plots. These amplitudes are used to measure STP, so a shortcut for seeing whether this pair has plottable data is to check the STP measurements in the dynamics table as follows:

pairs_with_STP_data = [pair for pair in pairs_with_synapse if np.isfinite(pair.dynamics.stp_induction_50hz)]
pairs_with_STP_data
>>[]

Unfortunately this returns an empty list, so there aren’t any pairs in this experiment that would have data to plot in this way.
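The two checks above (connected pair, finite STP value) can be folded into one guard that also tolerates missing values. This is a minimal sketch, assuming that `pair.dynamics` or `stp_induction_50hz` may be `None` for some pairs; `has_stp_data` is a hypothetical helper, and the stub objects only stand in for real pairs.

```python
import math
from types import SimpleNamespace


def has_stp_data(pair):
    """True only for connected pairs with a finite 50 Hz STP measurement."""
    if getattr(pair, "has_synapse", None) is not True:
        return False
    dynamics = getattr(pair, "dynamics", None)            # may be missing entirely
    value = getattr(dynamics, "stp_induction_50hz", None)  # may be None or NaN
    return value is not None and math.isfinite(value)


# Stub pairs (hypothetical) covering the cases discussed above.
connected_with_stp = SimpleNamespace(
    has_synapse=True, dynamics=SimpleNamespace(stp_induction_50hz=-0.12))
connected_no_stp = SimpleNamespace(
    has_synapse=True, dynamics=SimpleNamespace(stp_induction_50hz=None))
connected_nan_stp = SimpleNamespace(
    has_synapse=True, dynamics=SimpleNamespace(stp_induction_50hz=float("nan")))
not_connected = SimpleNamespace(has_synapse=False, dynamics=None)

candidates = [connected_with_stp, connected_no_stp, connected_nan_stp, not_connected]
print([has_stp_data(p) for p in candidates])
```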

Instead of going through all of these experiments one by one, you could use the pair_query to return pairs that do have this data:

query = db.pair_query(
    experiment_type='standard_multipatch',   # filter: just multipatch experiments
    species='human',                         # filter: only human data
    synapse=True,                            # filter: only cell pairs connected by synapse
    filter_exprs = [db.Dynamics.stp_induction_50hz != np.nan] # filter: only connections that have STP data
)
pairs_with_stp = query.all()
print(len(pairs_with_stp))
>> 264
pair = pairs_with_stp[0]       # let's just grab the first pair
pair
>> <Pair 1488403059.445 2 8>
# Now we can try plotting this and see if it works
fig, ax = plt.subplots()
pair = db.experiment_from_ext_id('1488403059.445').pairs[('2', '8')]
plot_stim_sorted_pulse_amp(pair, ax, avg_line=True)

[image: output plot]

Hello Stephanie,

Thank you very much for this explanation.

Now I see that I have a big issue, when I do this:

query = db.pair_query(
    experiment_type='standard_multipatch',   # filter: just multipatch experiments
    species='human',                         # filter: only human data
    synapse=True,                            # filter: only cell pairs connected by synapse
    filter_exprs = [db.Dynamics.stp_induction_50hz != np.nan] # filter: only connections that have STP data
)
pairs_with_stp = query.all()

print(len(pairs_with_stp))

The output is: 0

Which means that there is no STP data in the database that I downloaded (medium size)…
How is this possible? Could it be that I don’t have the same version of the database as you do?

Hmmm, that definitely shouldn’t be the case. Also, apologies: there shouldn’t be an underscore in 'standard multipatch'. So the start of the query should be

query = db.pair_query(
experiment_type='standard multipatch'
....
)

This shouldn’t be the cause of not returning any pairs though. Can you remind me what version of the database you are using? (I know it’s the medium size)
Thanks

I don’t know exactly how to check the database version.

This is the file name:
synphys_r2.0-pre4_medium.sqlite

I re-did the entire process again, just in case I had an old version of the git repository:

git clone https://github.com/AllenInstitute/aisynphys.git
cd aisynphys

conda env create --name aisynphys --file desktop-environment.yml
conda activate aisynphys

python setup.py develop
cd ..

And now when I do:
from aisynphys.database import SynphysDatabase

I get an error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/var/folders/gk/m4t4s77d71773t401jmnwyw00000gn/T/ipykernel_34233/3531584690.py in <module>
----> 1 from aisynphys.database import SynphysDatabase

~/aisynphys/aisynphys/database/__init__.py in <module>
      1 from .. import config
----> 2 from .database import Database
      3 from .synphys_database import SynphysDatabase
      4 
      5 

~/aisynphys/aisynphys/database/database.py in <module>
     29 
     30 from .. import config
---> 31 from neuroanalysis.util.optional_import import optional_import
     32 pandas = optional_import('pandas')
     33 

ModuleNotFoundError: No module named 'neuroanalysis'

Hello Stephanie,

I solved the previous issue (ModuleNotFoundError) by installing the neuroanalysis package in the virtual environment, and I downloaded a new database. The file name is now: synphys_r2.0_medium.sqlite.

However, I still have the same problem as before. When doing this:

query = db.pair_query(
    experiment_type='standard_multipatch',   # filter: just multipatch experiments
    species='human',                         # filter: only human data
    synapse=True,                            # filter: only cell pairs connected by synapse
    filter_exprs = [db.Dynamics.stp_induction_50hz != np.nan] # filter: only connections that have STP data
)
pairs_with_stp = query.all()
print(len(pairs_with_stp))

I again get 0.

Hi Natali,

I’m not sure why that is happening, but it appears to be the way filter_exprs is working. I can assure you that there are data with STP in the dataset. While we investigate that you can do this same filtering but outside of the query. To do this I would use your query here but remove the filter_exprs (also note that there should be no underscore in standard multipatch). Then you can use a list comprehension to get just pairs from that query which have STP data like so:

query = db.pair_query(
    experiment_type='standard multipatch',   # filter: just multipatch experiments
    species='human',                         # filter: only human data
    synapse=True,                            # filter: only cell pairs connected by synapse
)
pairs = query.all()
print(len(pairs))
>>329
pairs_with_stp = [p for p in pairs if p.dynamics.stp_induction_50hz is not None]
print(len(pairs_with_stp))
>> 185
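One possible explanation for the empty result, not confirmed in this thread: SQLite has no NaN value and treats a bound NaN parameter as NULL, and any comparison with NULL (including `!=`) is never true, so a `!= nan` filter matches no rows. The sketch below shows this with the standard-library `sqlite3` module on a made-up table; the table and its contents are hypothetical and only mimic the column name used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dynamics (pair_id INTEGER, stp_induction_50hz REAL)")
conn.executemany(
    "INSERT INTO dynamics VALUES (?, ?)",
    [(1, -0.2), (2, None), (3, 0.1)],  # pair 2 has no STP measurement
)

# Comparing against NaN: the parameter reaches SQLite as NULL, and
# `value != NULL` is never true, so no row passes the filter.
ne_nan = conn.execute(
    "SELECT pair_id FROM dynamics WHERE stp_induction_50hz != ?",
    (float("nan"),),
).fetchall()

# Asking for non-NULL values selects the rows that do have a measurement.
not_null = conn.execute(
    "SELECT pair_id FROM dynamics WHERE stp_induction_50hz IS NOT NULL ORDER BY pair_id"
).fetchall()

print(ne_nan)
print(not_null)
conn.close()
```

If `filter_exprs` accepts ordinary SQLAlchemy expressions, an expression such as `db.Dynamics.stp_induction_50hz != None` (which SQLAlchemy renders as `IS NOT NULL`) might work inside the query, but that is untested here; the list comprehension above is the approach known to work.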

Hello Stephanie,

Thank you! Now it works!
I just have three more questions, for each pair:

  1. How can I get the pre_cell and post_cell class
  2. How can I get the values of the averaged pulse amplitudes
  3. How can I get the stimulation times or at least the stimulation frequency?

Thank you.

Hello,

I think I managed to get the data that I need.

Would you mind checking whether it looks OK?

Get pre and post cell classes:

pair = db.experiment_from_ext_id('1488403059.445').pairs[('2', '8')]

pre_class = pair.pre_cell.cell_class
pre_layer = pair.pre_cell.target_layer
post_class = pair.post_cell.cell_class
post_layer = pair.post_cell.target_layer

print('pre_cell class: %s, layer: %s, cell_exp_id: %s, cell_id: %s' %(pre_class, pre_layer, pair.pre_cell, pair.pre_cell_id))
print('post_cell class: %s, layer: %s, cell_exp_id: %s, cell_id: %s' %(post_class, post_layer, pair.post_cell, pair.post_cell_id))
print('synapse type: ', pair.synapse.synapse_type)

OUTPUT:
pre_cell class: ex, layer: 2, cell_exp_id: <Cell 1488403059.445 2>, cell_id: 1650
post_cell class: ex, layer: 2, cell_exp_id: <Cell 1488403059.445 8>, cell_id: 1656
synapse type:  ex

Get mean PSP amps:

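from aisynphys.dynamics import stim_sorted_pulse_amp

qc_pass_data = stim_sorted_pulse_amp(pair, db=db)

#print(qc_pass_data)

# keep only the responses recorded at the chosen induction frequency (Hz)
ind_f = 50
mask = qc_pass_data['induction_frequency'] == ind_f

filtered = qc_pass_data[mask].copy()

# flip the sign for inhibitory synapses and scale the amplitudes by 1000
sign = 1 if pair.synapse.synapse_type == 'ex' else -1
try:
    filtered['dec_fit_reconv_amp'] *= sign * 1000
except KeyError:
    print('No fit amps for pair: %s' % pair)

# mean amplitude per pulse number; the column is selected before averaging
# so that non-numeric columns (e.g. sync_rec_ext_id) are not averaged
pulse_means = filtered.groupby('pulse_number')['dec_fit_reconv_amp'].mean().to_list()

#pulses = np.arange(1, 13)
#plt.plot(pulses, pulse_means)
print(pulse_means)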

OUTPUT:
[0.3202318140359256, 0.503264249598427, 0.4523513450421059, 0.39244920739261635, 0.2576733228294735, 0.32470229641458237, 0.2734045214946149, 0.3265600061698008, 0.5724556953954355, 0.4246104297168553, 0.4142333743441119, 0.30915497784425194]

I have a couple of questions:

  • Is the frequency always 50 Hz?
  • What are the units for pulse_means? mA? mV?

Thank you!

Hi Natali,
One note about your code above. I would replace:

pre_layer = pair.pre_cell.target_layer

with

pre_layer = pair.pre_cell.cortical_location.cortical_layer

…because target_layer is simply the layer that the experimenter was targeting during the experiment, whereas cortical_layer is the location later verified by histological staining. These fields are documented on the Database Schema page of the aisynphys documentation.
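A small sketch of how the two fields might be combined: prefer the histology-verified layer and fall back to the targeted layer when it is unavailable. It assumes, without confirmation from the schema, that `cortical_location` or `cortical_layer` can be `None` for some cells; `best_layer` is a hypothetical helper, and the stub objects only stand in for real cells.

```python
from types import SimpleNamespace


def best_layer(cell):
    """Return (layer, source field), preferring the histology-verified layer."""
    location = getattr(cell, "cortical_location", None)
    layer = getattr(location, "cortical_layer", None)
    if layer is not None:
        return layer, "cortical_layer"
    # Fall back to the layer the experimenter was aiming for.
    return getattr(cell, "target_layer", None), "target_layer"


# Stub cells (hypothetical): one with a verified layer, one without.
verified = SimpleNamespace(
    target_layer="2", cortical_location=SimpleNamespace(cortical_layer="3"))
unverified = SimpleNamespace(target_layer="2", cortical_location=None)

print(best_layer(verified))
print(best_layer(unverified))
```

Returning the source field alongside the value keeps it visible which cells rely on the less reliable targeted layer.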

Is the frequency always 50 Hz?

No; you can see this in the dataframe returned by stim_sorted_pulse_amp:

np.unique(qc_pass_data['induction_frequency'])

array([ 10., 20., 25., 50., 100.])

However, note that most data will have been acquired at 50 Hz with varying recovery delays. Our stimuli are described here: Synaptic Physiology Methods: Experimental Stimuli - brain-map.org

What are the units for pulse_means? mA? mV?

The function stim_sorted_pulse_amp really just generates a database query and returns a dataframe of results, filtered for quality control. One of the query criteria listed inside that function is

q = q.filter(db.PatchClampRecording.clamp_mode=='ic')

So this function returns only data from current-clamp experiments, and thus the values are returned in volts (not mV; all values in the database are in unscaled SI units). Since your code multiplies dec_fit_reconv_amp by 1000, the pulse_means you printed are in mV.
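To make the unit handling explicit, here is a dependency-free sketch of the same per-pulse averaging with the volts-to-millivolts conversion written out. The input rows are made-up numbers, and `mean_amp_by_pulse_mv` is a hypothetical helper, not part of aisynphys.

```python
from collections import defaultdict
from statistics import mean

MV_PER_VOLT = 1000.0


def mean_amp_by_pulse_mv(rows, sign=1):
    """Average amplitudes (given in volts) per pulse number; return millivolts.

    rows: iterable of (pulse_number, amplitude_in_volts)
    sign: 1 for excitatory, -1 for inhibitory synapses
    """
    grouped = defaultdict(list)
    for pulse_number, amp_volts in rows:
        grouped[pulse_number].append(amp_volts)
    return {n: sign * MV_PER_VOLT * mean(amps) for n, amps in sorted(grouped.items())}


# Made-up amplitudes in volts for two pulses.
demo_rows = [(1, 0.0003), (1, 0.0005), (2, 0.0004)]
demo_means = mean_amp_by_pulse_mv(demo_rows)
print(demo_means)
```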

Thank you very much Luke.

Just one last question, when are you planning to release the full version of the database?

This is already released! You should be able to download it with

db = SynphysDatabase.load_current('full')

It’s a few hundred GB so make sure you have enough space for it and be prepared to wait. Also you can stop / restart the download at any time and it should resume wherever it left off.

Hello Luke,

I downloaded the full-size DB, but I’m surprised that when I check the number of synapses, experiments and recordings, I get exactly the same numbers as with the medium-size DB.

# Download and cache the sqlite file for the requested database
# (for available versions, see SynphysDatabase.list_versions)
# db = SynphysDatabase.load_version('synphys_r1.0_2019-08-29_small.sqlite')

# Load all synapses associated with human projects
# First load the data base
path_to_db = '/Users/natalibarros/ai_synphys_cache/database/synphys_r2.0_full.sqlite'
db = SynphysDatabase.load_version(path_to_db)

pairs = db.pair_query(project_name=db.human_projects, synapse=True).all()

print("loaded %d synapses" % len(pairs))

print("number of experiments %d" % len(set([pair.experiment for pair in pairs])))

print("number of recordings %d" %len(set([pair.experiment.data for pair in pairs])))

loaded 329 synapses
number of experiments 133
number of recordings 86

Am I doing something wrong? I was expecting higher numbers.

Hi Natali,

The size of the different database versions refers not to the number of experiments but to the type (and therefore volume) of data included, so the numbers you calculated should be exactly the same across all of the databases. At the bottom of this page on our website you can see what kind of data is included in each database, which is what determines its “size”. For example, the “Full” database includes time-series data for each stimulus; these snippets of recordings are very large and account for the > 160 GB size of that database. We did not anticipate most people needing this raw data, so we made the smaller databases, which contain all of the same experiments and synapses along with our pre-calculated features, for easier exploration.