Residue-residue distance in EDS on native structures.

  • EDS variants differ in how the Euclidean distance between (amino-acid) residues is calculated when finding EDS paths and identifying shortcut edges.
  • The effects of five EDS variants on shortcut edges and the initial fold step are examined:
    • CA-CA distance ('a', default, studied so far),
    • CB-CB distance ('b'),
    • shorter of the CA-CA and CB-CB distances ('s'),
    • longer of the CA-CA and CB-CB distances ('l'), and
    • mean of the CA-CA and CB-CB distances ('v').
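
The five variants listed above can be sketched as one small function. This is a minimal illustration, not the EDS implementation: the function name `eds_distance`, its arguments, and the toy coordinates are assumptions made for this example. How residues without a CB atom (glycine) are handled is not stated in this notebook and is not covered here.

```python
import math

def eds_distance(ca_i, ca_j, cb_i, cb_j, variant="a"):
    """Residue-residue distance for one EDS variant (illustrative sketch)."""
    d_ca = math.dist(ca_i, ca_j)  # CA-CA Euclidean distance
    d_cb = math.dist(cb_i, cb_j)  # CB-CB Euclidean distance
    if variant == "a":
        return d_ca
    if variant == "b":
        return d_cb
    if variant == "s":
        return min(d_ca, d_cb)
    if variant == "l":
        return max(d_ca, d_cb)
    if variant == "v":
        return (d_ca + d_cb) / 2.0
    raise ValueError(f"unknown EDS variant: {variant!r}")

# Made-up coordinates, chosen only so the distances are easy to check.
ca_i, ca_j = (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)  # CA-CA distance is 5
cb_i, cb_j = (0.0, 0.0, 1.0), (6.0, 8.0, 1.0)  # CB-CB distance is 10
for v in ["a", "b", "s", "l", "v"]:
    print(v, eds_distance(ca_i, ca_j, cb_i, cb_j, v))
```

Because 's' and 'l' take the minimum and maximum of the same two numbers, 'v' always lies between them, and 'a' and 'b' each coincide with either 's' or 'l' for any given residue pair.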
In [1]:
EDV = ['a', 'b', 's', 'l', 'v']
In [2]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Canonical set of single-domain protein chains used to study folding pathways.

In [3]:
dirname = "SingleDomain"
PIDs = [ "2f4k", "1bdd", "2abd_0", "1gb1_0", "1mi0_swissmin", "1mhx_swissmin", "2ptl_0", 
        "1shg", "1srm_1", "2ci2", "1aps_0", "2kjv_0", "2kjw_0", "1qys" ]

Effect on number of shortcuts and number of long-range shortcuts.

  • The number of edges and the PRN0 contact map are identical for all EDS variants. The PRN0 construction method is unchanged; only the distance used by EDS to construct paths is varied.
In [4]:
fn = os.path.join(dirname, "NS_stats.out")
df = pd.read_table(fn, sep=r'\s+')  # raw string avoids an invalid-escape warning
df = df[["pid", "numNodes", "numLinks", "numSC", "numSCLE"]]
x = df.pid.values
In [5]:
fig, axes = plt.subplots(1, 2, figsize=(10, 4), constrained_layout=True)
ax1 = axes[0]; ax2 = axes[1];

ax1.set_title("Number of shortcuts per structure by EDS variant.")
ax1.plot(x, df.numSC.values, '.--', label='a' + ' ' + str(np.round(df.numSC.mean(), 3)))
ax2.set_title("Number of long-range shortcuts per structure by EDS variant.")
ax2.plot(x, df.numSCLE.values, '.--', label='a' + ' ' + str(np.round(df.numSCLE.mean(), 3)))

for v in EDV[1:]:
    fn = os.path.join(dirname, "NS_stats_" + v + ".out")
    df = pd.read_table(fn, sep=r'\s+')
    ax1.plot(x, df.numSC.values, '.--', label = v + ' ' + str(np.round(df.numSC.mean(), 3)))
    ax2.plot(x, df.numSCLE.values, '.--', label= v + ' ' + str(np.round(df.numSCLE.mean(), 3)))

# matplotlib 3.3.0
# https://stackoverflow.com/questions/63723514/userwarning-fixedformatter-should-only-be-used-together-with-fixedlocator
ax1.set_xticks(x); ax1.set_xticklabels(x, rotation=90); ax1.grid()
ax1.legend(title="EDV mean", loc='upper left', bbox_to_anchor=(1.0, 1.0))
ax2.set_xticks(x); ax2.set_xticklabels(x, rotation=90); ax2.grid()
_ = ax2.legend(title="EDV mean", loc='upper left', bbox_to_anchor=(1.0, 1.0))
  • EDS variants 'a', 's', and 'v' produce roughly the same number of (long-range) shortcuts.
  • EDS variants 'l' and 'b' produce fewer (long-range) shortcuts.
  • Using CB-CB instead of CA-CA distances yields fewer (long-range) shortcuts.
In [6]:
from scipy.stats import ttest_rel
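def onesided_paired_ttest(a, b):
    """Paired t-test of a against b; returns (direction, one-sided p-value).

    The direction (' < ' or ' > ') is taken from the sign of the t statistic,
    and the one-sided p-value is half the two-sided p-value.
    """
    stats, pval = ttest_rel(a, b) # two-sided paired ttest
    onesided_ttest_alt = ' < ' if stats < 0 else ' > '
    onesided_ttest_pval = pval/2
    return (onesided_ttest_alt, onesided_ttest_pval)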
In [7]:
# EDS may use distance between any residue pair, not just edges!
# compare CA-CA with CB-CB distances of edges (a sample)
CA_eds, CB_eds = [], []
for pid in PIDs:
    fn = os.path.join(dirname, pid + "_edges.out")
    da = pd.read_table(fn, sep=r'\s+')
    CA_eds.append(da.ed.mean())
    fn = os.path.join(dirname, pid + "_edges_b.out")
    db = pd.read_table(fn, sep=r'\s+')
    CB_eds.append(db.ed.mean())
    print(pid, np.round([da.ed.mean(), da.ed.std(), db.ed.mean(), db.ed.std()], 3))
2f4k [8.234 2.389 8.362 2.103]
1bdd [8.281 2.2   8.455 2.035]
2abd_0 [8.232 2.099 8.289 1.994]
1gb1_0 [8.079 2.113 8.147 1.966]
1mi0_swissmin [7.494 1.983 7.674 1.871]
1mhx_swissmin [7.271 1.803 7.454 1.682]
2ptl_0 [8.12  2.109 8.251 2.079]
1shg [7.308 1.798 7.385 1.68 ]
1srm_1 [8.068 1.983 8.077 1.881]
2ci2 [7.577 1.999 7.588 1.8  ]
1aps_0 [7.737 1.986 7.893 1.913]
2kjv_0 [8.269 2.148 8.316 1.956]
2kjw_0 [8.373 2.26  8.35  2.082]
1qys [7.294 1.834 7.445 1.656]
In [8]:
# On average, CA-CA distances are significantly shorter than CB-CB distances.
print(onesided_paired_ttest(CB_eds, CA_eds))
plt.plot(CA_eds, label='CA')
plt.plot(CB_eds, label='CB')
_ = plt.legend();
(' > ', 9.609018755856016e-05)
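  • On average, the CA-CA distances of edges are significantly shorter than their CB-CB distances (one-sided paired t-test on the per-structure means of the 14 structures, p ≈ 9.6e-05).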
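
As a cross-check that needs no input files, the paired comparison can be redone by hand from the per-structure means printed by cell [7]. The sketch below uses only the Python standard library and computes the paired t statistic (mean difference divided by its standard error); it does not compute a p-value, and the variable names are chosen for this example only.

```python
import math
import statistics

# Mean edge distances per structure, copied from the output of cell [7]
# (same order as PIDs).
ca_means = [8.234, 8.281, 8.232, 8.079, 7.494, 7.271, 8.120,
            7.308, 8.068, 7.577, 7.737, 8.269, 8.373, 7.294]
cb_means = [8.362, 8.455, 8.289, 8.147, 7.674, 7.454, 8.251,
            7.385, 8.077, 7.588, 7.893, 8.316, 8.350, 7.445]

# Paired differences: CB-CB mean minus CA-CA mean, one per structure.
diffs = [cb - ca for ca, cb in zip(ca_means, cb_means)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(n)  # sample std. dev. / sqrt(n)
t_stat = mean_diff / std_err                      # paired t statistic, n - 1 d.o.f.
n_cb_longer = sum(d > 0 for d in diffs)

print("mean difference (CB - CA):", round(mean_diff, 4))
print("paired t statistic:", round(t_stat, 3), "with", n - 1, "degrees of freedom")
print("structures with longer CB-CB mean:", n_cb_longer, "of", n)
```

The CB-CB mean is the larger one in 13 of the 14 structures; 2kjw_0 is the only exception.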

Effect on SCN0 contact map.

In [9]:
# Plot the native-state contact map on the given axis,
# Black and red cells denote PRN0 edges and shortcuts, respectively.
import matplotlib.colors as matcol
def plot_NS_contactmap(cmwsc, ax):
    n, m = cmwsc.shape
    ax.imshow(cmwsc, origin="lower", cmap=matcol.ListedColormap(['white', 'black', 'red']))    
    ax.set_xticks(np.arange(0, n, 10))    
    ax.set_yticks(np.arange(0, m, 10)) 
In [10]:
def plot_EDV_NScontactmaps(pid):
    fig1, axes1 = plt.subplots(2, 3, figsize=(9, 6), constrained_layout=True)
    fig1.suptitle("Native contact maps by EDS variant " + pid, fontsize=14)
    axes1[1][2].remove()
    for i, v in enumerate(EDV):
        if (v == 'a'):
            fn = os.path.join(dirname, pid + "_cmwsc.out")
        else:
            fn = os.path.join(dirname, pid + "_cmwsc_" + v + ".out")
        df = pd.read_table(fn, sep=' ', header=None)
        r = i//3; c = i%3; ax = axes1[r][c] 
        plot_NS_contactmap(df.values, ax)
        ax.set_title("EDS variant " + v)
    plt.savefig("EDV_NAcontactmaps_" + pid + ".png")
    return
In [11]:
plot_EDV_NScontactmaps("2ptl_0")
In [12]:
# for interactivity
%matplotlib inline
from ipywidgets import interact
In [13]:
_ = interact(plot_EDV_NScontactmaps, pid = PIDs)
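
To compare the maps numerically rather than by eye, the coloured cells can be counted directly. The sketch below is an assumption-laden illustration: it assumes the `_cmwsc` matrix is symmetric and encodes 0 for no contact (white), 1 for a PRN0 edge drawn in black, and 2 for a shortcut drawn in red, which matches the colormap order used in `plot_NS_contactmap` but is not stated explicitly in this notebook. The helper name `count_cells` and the toy matrix are made up.

```python
def count_cells(cm):
    """Count black (value 1) and red (value 2) cells in the upper triangle.

    Assumed encoding (not confirmed by the notebook): 0 = no contact,
    1 = PRN0 edge drawn in black, 2 = shortcut drawn in red. The matrix is
    assumed symmetric, so only cells with i < j are scanned.
    """
    n = len(cm)
    black = sum(1 for i in range(n) for j in range(i + 1, n) if cm[i][j] == 1)
    red = sum(1 for i in range(n) for j in range(i + 1, n) if cm[i][j] == 2)
    return {"black": black, "red": red}

# Toy 4-residue matrix, invented for illustration.
toy = [
    [0, 1, 2, 0],
    [1, 0, 1, 0],
    [2, 1, 0, 2],
    [0, 0, 2, 0],
]
print(count_cells(toy))
```

With real data, `pd.read_table(fn, sep=' ', header=None).values.tolist()` would give a nested list in the shape this helper expects.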

Effect on initial $C_{SCN0}$ fold step.

In [14]:
def initial_foldstep(pid, edv):
    """Returns the initial pair of SSEs as a comma-separated string."""
    if edv == 'a':
        fn = os.path.join(dirname, pid + "_scn0_pathway.out")
    else:
        fn = os.path.join(dirname, pid + "_scn0_pathway_" + edv + ".out")    
    df = pd.read_table(fn, sep='\t')
    configs = df.config.values
    combi = configs[1].split() 
    # pick the first SSE pair; testing for ',' (rather than len(e) > 1) also skips
    # unpaired two-digit SSE ids such as '11'
    first = [ e for e in combi if ',' in e ][0]
    return first
In [15]:
# initial fold step by EDS variant
firsts_dict = {}
for pid in PIDs:
    firsts = [ initial_foldstep(pid, v) for v in EDV ]
    # print(pid, firsts)
    firsts_dict[pid] = firsts
df = pd.DataFrame(firsts_dict).transpose()
df.columns = EDV
df    
Out[15]:
                  a     b     s     l     v
2f4k            1,3   3,5   3,5   1,3   3,5
1bdd            3,5   3,5   3,5   3,5   3,5
2abd_0          5,7   3,5   1,3   3,5   1,3
1gb1_0          6,8   0,2   0,2   6,8   0,2
1mi0_swissmin   1,3   1,3   1,3   1,3   1,3
1mhx_swissmin   1,3   7,9   7,9   1,3   1,3
2ptl_0          0,2   6,8   0,2   6,8   6,8
1shg            5,7   5,7   5,7   5,7   5,7
1srm_1          3,5   3,5   5,7   3,5   5,7
2ci2            5,7   5,7   5,7   5,7   5,7
1aps_0          5,7  9,11   5,7   5,7   5,7
2kjv_0          5,7   5,7   5,7   5,7   5,7
2kjw_0          5,7   5,7   5,7   5,7   5,7
1qys            1,3   1,3   1,3   1,3   1,3
In [16]:
# agreement with initial foldstep of EDS variant 'a'
agreefirst_df = pd.DataFrame(df.a)
for v in EDV[1:]:
    b = (df.a == df[v]) # series
    b.name = v
    agreefirst_df = agreefirst_df.join(b)
    print(v, b.sum())
agreefirst_df
b 8
s 9
l 12
v 9
Out[16]:
                  a      b      s      l      v
2f4k            1,3  False  False   True  False
1bdd            3,5   True   True   True   True
2abd_0          5,7  False  False  False  False
1gb1_0          6,8  False  False   True  False
1mi0_swissmin   1,3   True   True   True   True
1mhx_swissmin   1,3  False  False   True   True
2ptl_0          0,2  False   True  False  False
1shg            5,7   True   True   True   True
1srm_1          3,5   True  False   True  False
2ci2            5,7   True   True   True   True
1aps_0          5,7  False   True   True   True
2kjv_0          5,7   True   True   True   True
2kjw_0          5,7   True   True   True   True
1qys            1,3   True   True   True   True
  • In terms of the initial fold step, EDS variant 'l' behaves most similarly to the default EDS variant ('a').
    • The initial fold steps determined with EDS variant 'l' are identical to those determined with 'a' for all but two structures (2abd and 2ptl), i.e. for 12 of the 14 structures.
  • Structures with initial fold steps unaffected by EDS variant: 1bdd, 1mi0, 1shg, 2ci2, 2kjv, 2kjw and 1qys.

  • No single EDS variant reproduces the initial fold steps of 'a' for all four of the structures 1gb1, 2ptl, 1mi0 and 1mhx: 'b', 's' and 'v' differ on 1gb1, and 'l' differs on 2ptl.
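
The agreement statements above can be re-derived from the Out[15] table alone. The sketch below hard-codes that table (values copied from Out[15]) so it runs without the pathway files; the variable names are chosen for this example.

```python
EDV = ["a", "b", "s", "l", "v"]

# Initial fold step per structure and EDS variant, copied from Out[15].
firsts = {
    "2f4k":          ["1,3", "3,5", "3,5", "1,3", "3,5"],
    "1bdd":          ["3,5", "3,5", "3,5", "3,5", "3,5"],
    "2abd_0":        ["5,7", "3,5", "1,3", "3,5", "1,3"],
    "1gb1_0":        ["6,8", "0,2", "0,2", "6,8", "0,2"],
    "1mi0_swissmin": ["1,3", "1,3", "1,3", "1,3", "1,3"],
    "1mhx_swissmin": ["1,3", "7,9", "7,9", "1,3", "1,3"],
    "2ptl_0":        ["0,2", "6,8", "0,2", "6,8", "6,8"],
    "1shg":          ["5,7", "5,7", "5,7", "5,7", "5,7"],
    "1srm_1":        ["3,5", "3,5", "5,7", "3,5", "5,7"],
    "2ci2":          ["5,7", "5,7", "5,7", "5,7", "5,7"],
    "1aps_0":        ["5,7", "9,11", "5,7", "5,7", "5,7"],
    "2kjv_0":        ["5,7", "5,7", "5,7", "5,7", "5,7"],
    "2kjw_0":        ["5,7", "5,7", "5,7", "5,7", "5,7"],
    "1qys":          ["1,3", "1,3", "1,3", "1,3", "1,3"],
}

# Number of structures on which each variant matches 'a' (column 0).
agree_with_a = {
    v: sum(row[0] == row[k] for row in firsts.values())
    for k, v in enumerate(EDV) if k > 0
}
# Structures whose initial fold step is the same under every variant.
unaffected = [pid for pid, row in firsts.items() if len(set(row)) == 1]
# Structures on which no other variant matches 'a'.
no_variant_agrees = [pid for pid, row in firsts.items()
                     if all(x != row[0] for x in row[1:])]

print(agree_with_a)
print(unaffected)
print(no_variant_agrees)
```

The counts match the values printed by cell [16], and the only structure on which every other variant disagrees with 'a' is 2abd_0.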

End of notebook

SKLC September 2020