Calculated phi-values from ND PRNs.

  • ND PRNs generated with EDS variant = 'a'.
  • Explore implication of ND node statistics normed with native-state statistics: NS node degree, and total number of EDS paths (N*N-N).
  • Observations:
    • Since EDS paths can have a backtrack segment, a node can be traversed by an EDS path more than once. This means node centrality can exceed the total number of paths (1lmb4).
      • An alternative is to ignore backtrack segments and count appearance of each node on an EDS path only once. Still EDS path?
    • Less variation between node statistics, and hence calculated phi-values.
    • Variation decreases as networks become more native-like since all calculated phi-values converge to 1.0 (or more).
    • As $Q$ increases, MAD from exp_phi increases, and $Q$ where minimum MAD from exp_phi is located is pushed towards smaller $Q$s.
    • Min MAD depends on average of exp_phi values (whether majority is closer to 1.0 or 0.0)?
    • Average MAD from exp_phi is smaller, compared to normalization with max. This could be the result of better matching with smaller exp_phi values when $Q$ is small.
    • MAD between min$Q$s and peak $Q$s is much smaller, compared to normalization with max.
    • Less difference between biased ND variants and the other ('r', 'y') ND variants.
    • 1lmb4 seems to benefit most (has better results) with this form of normalization.
In [1]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
from scipy.stats import spearmanr, ttest_rel
from scipy.stats import ttest_ind_from_stats as tts
In [2]:
Qs =[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
dirname = "Paci_ND3"
PIDs = ["1imq", "1lmb4", "1bf4", "2ptl", "2ci2", "1aps", "1fmk", "1bk2", "1shf", "1ten"]
In [3]:
EDVs = ['a'] # EDVs = ['a', 'b', 's', 'l', 'v']
NDVs = ['c', 'x', 'r', 'a', 'y']
ndv_colors = ['tab:blue', 'tab:orange', 'tab:green', 'tab:red', 'tab:purple']
In [4]:
# computed previously
c_peak_Qs = np.array([0.2, 0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.3])
x_peak_Qs = np.array([0.2, 0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.2])
r_peak_Qs = np.array([0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.1, 0.2, 0.2])
a_peak_Qs = np.array([0.2, 0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.2])
y_peak_Qs = np.array([0.2, 0.1, 0.2, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1])

# cxray
peak_Qs = np.column_stack([c_peak_Qs, x_peak_Qs, r_peak_Qs, a_peak_Qs, y_peak_Qs])
peak_Qs
Out[4]:
array([[0.2, 0.2, 0.2, 0.2, 0.2],
       [0.3, 0.3, 0.2, 0.3, 0.1],
       [0.3, 0.3, 0.2, 0.3, 0.2],
       [0.3, 0.3, 0.2, 0.3, 0.2],
       [0.3, 0.3, 0.2, 0.3, 0.2],
       [0.3, 0.3, 0.2, 0.3, 0.2],
       [0.2, 0.2, 0.2, 0.2, 0.1],
       [0.2, 0.2, 0.1, 0.2, 0.1],
       [0.2, 0.2, 0.2, 0.2, 0.1],
       [0.3, 0.2, 0.2, 0.2, 0.1]])

Mean Absolute Difference/Errror between experimental $\phi$ values and calculated $\phi$ values.

In [5]:
def mad_from_phi(ndv, edv):        
    dirn = os.path.join(dirname, "edv_a") # ND data
    ofn = os.path.join(dirname, ndv + "_phi_mad.out")
    fout = open(ofn, "w"); fout.close() # truncates existing file
    fout = open(ofn, "a") # appends to file 
    fout.write("pid Q mad_deg_avg mad_deg_avg2 mad_cent_avg mad_cent_avg2\n")
    for pid in PIDs:
        fn0 = os.path.join(dirname, pid + "_nodestats.out") # NS
        df0 = pd.read_table(fn0, sep="\s+") # nid exp_phi deg0 cent0
        N = len(df0)
        fn1 = os.path.join(dirn, pid + "_" + ndv + "_nodestats.out") # ND
        df1 = pd.read_table(fn1, sep="\s+") # Q nid deg_avg cent_avg
        for q in Qs:
            dq = df1.query("Q == @q")                        
            m = dq.deg_avg.max() # normalize then select phi residues
            dq = dq.assign(deg_avg_norm = dq.deg_avg.values/m)
            # prop. of native degree
            dq = dq.assign(deg_avg_norm2 = dq.deg_avg.values/df0.deg0.values) 
            m = dq.cent_avg.max()
            dq = dq.assign(cent_avg_norm = dq.cent_avg.values/m)
            # fraction of all paths
            dq = dq.assign(cent_avg_norm2 = dq.cent_avg.values/(N*N-N))
            dq = dq.assign(exp_phi = df0.exp_phi.values)
            de = dq.query("exp_phi > -1") 
            a = np.round(np.mean(np.abs(de.deg_avg_norm.values - de.exp_phi.values)), 3)
            b = np.round(np.mean(np.abs(de.cent_avg_norm.values - de.exp_phi.values)), 3)
            a2 = np.round(np.mean(np.abs(de.deg_avg_norm2.values - de.exp_phi.values)), 3)
            b2 = np.round(np.mean(np.abs(de.cent_avg_norm2.values - de.exp_phi.values)), 3)
            fout.write("{0} {1} {2} {3} {4} {5}\n".format(pid, q, a, a2, b, b2))
    fout.close() 
    df = pd.read_table(ofn, sep=' ')
    # in general a2 > a
    # count = np.sum(df.mad_deg_avg.values > df.mad_deg_avg2.values) # 21, 22, 19, 21, 14 
    # in general b2 > b
    # count = np.sum(df.mad_cent_avg.values > df.mad_cent_avg2.values) # 16, 16, 56, 20, 64
    # in general a2 > b2, desire to minimize mad, but cannot conclude to use b2 at this point!
    count = np.sum(df.mad_deg_avg2.values > df.mad_cent_avg2.values) # 80, 78, 88, 86, 89    
    # print(ndv, edv, count, len(df))
    return count
In [6]:
# this cell takes some time to run, only need to do this once to create ndv_phi_mad.out files
counts = [] # deg > cent mad across all Qs for each pid
for edv in EDVs:
    row = []
    for ndv in NDVs:
        count = mad_from_phi(ndv, edv)
        row.append(count)
    counts.append(row)

counts 
Out[6]:
[[80, 78, 88, 86, 89]]
In [7]:
# formats counts and do summation
df = pd.DataFrame(counts).assign(EDV=EDVs).set_index("EDV")
df.columns = NDVs
s = df.sum(axis=0) # col sum
s.name = "sum"
df = df.append(s)
df = df.assign(sum = df.sum(axis=1)) # row sum
df
Out[7]:
c x r a y sum
EDV
a 80 78 88 86 89 421
sum 80 78 88 86 89 421
  • Max per cell in the table is 100: 10 pids with 10 Qs each = 100 trials per NDV,EDV pair. Smaller MAD is better.
  • At least 78% of the time (over all $Q$s), calculating phi-values with (normalized) node degree produces larger MAD than doing so with (normalised) node centrality.
    • ND phi-values calculated with node centrality correspond better with experimental phi-values than ND phi-values calculated with node degree.
    • Count results suggest to calculate ND phi-values from node centrality. But cannot conclude this yet; results with minimum MAD is suggest otherwise!
  • Another consideration beside MAD from exp. $\phi$: $Q$ where MAD is minimum is inside ND-TS region?
    • $\phi$ values are a property of TS structures.
In [8]:
# load the ndv_phi_mad.out files
edv = 'a' 
cxray_dfs = []
for ndv in NDVs:
    fn = os.path.join(dirname, ndv + "_phi_mad.out")
    df = pd.read_table(fn, sep=' ')
    cxray_dfs.append(df)
In [9]:
import random
random.seed()
# mean_95CI with resampling/bootstrap
def bootstrap(vals):
    means = np.zeros(10000); stdevs = np.zeros(10000)
    for i in np.arange(10000):
        resample = random.choices(vals, k=10) # with replacement
        means[i], stdevs[i] = np.mean(resample), np.std(resample, ddof=1)
    a, z = np.percentile(means, 2.5), np.percentile(means, 97.5) 
    moe = (z-a)/2.0
    return moe, a, z

# calc the 95%CI of the mean, with sample stdev, n >= 30
def mean_95CI(sample_mean, sample_stdev, sample_size):
    moe = (1.96 * sample_stdev)/np.sqrt(sample_size) # margin of error
    return moe, sample_mean - moe, sample_mean + moe

def summary_stats(vals):
    mu = np.mean(vals); snu = np.std(vals, ddof=1)
    gmoe, ga, gz = mean_95CI(mu, snu, 10) # Gaussian estimated
    bmoe, ba, bz = bootstrap(vals) # bootstrap
    return np.round([mu, snu, np.ptp(vals), gmoe, ga, gz, bmoe, ba, bz], 4)
In [10]:
def plot_CIs(MADstats_df, minQstats_df):
    methods = MADstats_df.columns.values; n = len(methods)
    plt_colors = plt.get_cmap("tab20")
    gcolors = plt_colors(list(range(1, n*2, 2))) # odd
    bcolors = plt_colors(list(range(0, n*2, 2))) # even
    fig, axes = plt.subplots(2, 1, figsize=(6, 8), constrained_layout=True)
    ax = axes[0]
    mu = MADstats_df.loc['sample_avg'].values    
    ga = MADstats_df.loc['95meanCI_gleft'].values
    gz = MADstats_df.loc['95meanCI_gright'].values    
    ba = MADstats_df.loc['95meanCI_bleft'].values
    bz = MADstats_df.loc['95meanCI_bright'].values    
    for i, p in enumerate(zip(ga, gz, ba, bz)):
        ax.plot([i, i], [p[0], p[1]], marker='o', color=gcolors[i])
        ax.plot([i+0.2, i+0.2], [p[2], p[3]], marker='o', color=bcolors[i], label=str(p[3])) 
        ax.scatter(i+0.2, mu[i], marker='^')
    ax.legend(title="97.5 percentile", loc="upper left", bbox_to_anchor=(1.0, 1.0))
    ax.set_title("mean MAD from exp_phi", fontsize=14)
    ax.set_ylabel("95% confidence interval", fontsize=14)
    ax.set_xticks(list(range(0, n, 1))); ax.set_xticklabels(methods, fontsize=14)    
    ax.grid()
    ax = axes[1]
    mu = minQstats_df.loc['sample_avg'].values
    ga = minQstats_df.loc['95meanCI_gleft'].values
    gz = minQstats_df.loc['95meanCI_gright'].values    
    ba = minQstats_df.loc['95meanCI_bleft'].values
    bz = minQstats_df.loc['95meanCI_bright'].values    
    for i, p in enumerate(zip(ga, gz, ba, bz)):
        ax.plot([i, i], [p[0], p[1]], marker='o', color=gcolors[i])
        ax.plot([i+0.2, i+0.2], [p[2], p[3]], marker='o', color=bcolors[i], label=str(p[3])) 
        ax.scatter(i+0.2, mu[i], marker='^')
    ax.legend(title="97.5 percentile", loc="upper left", bbox_to_anchor=(1.0, 1.0))
    ax.set_title("mean minQ", fontsize=14)
    ax.set_ylabel("95% confidence interval", fontsize=14)
    ax.set_xticks(list(range(0, n, 1))); ax.set_xticklabels(methods, fontsize=14)    
    ax.grid()
    return plt

Node centrality

In [11]:
# compare mads of ND variants
fig, axes = plt.subplots(4, 3, figsize=(12, 16), constrained_layout=True, sharey=True)
plt.suptitle("Mean absolute difference between phi-values from experiments and \n \
phi-values calculated with NS normalized node centrality from ND nets, EDS variant " + edv)
plt.delaxes(axes[3][1])
plt.delaxes(axes[3][2])

min_mads, min_Qs = [], []
for i, pid in enumerate(PIDs):
    r = i//3; c = i%3; ax = axes[r][c]    
    res = []
    for i, ndv in enumerate(NDVs):
        dp = cxray_dfs[i].query("pid == @pid")
        # mad = dp.mad_cent_avg.values 
        mad = dp.mad_cent_avg2.values 
        ax.plot(Qs, mad, 'o--', label=ndv)    
        res.append(mad)            
    ax.set_title(pid)
    ax.set_xlabel('Q'); ax.set_xticks(np.arange(0, 1.1, 0.1))
    ax.legend(); ax.grid()        
    min_mad = np.round(np.min(res, axis=1), 3) # row-wise per ND variant, over Qs for a pid
    min_q = (np.round(np.argmin(res, axis=1), 3) + 1)/10
    min_mads.append(min_mad)
    min_Qs.append(min_q)
    print(pid, min_mad, min_q)

# ofn = os.path.join(dirname, "ND_centphi_" + edv + ".png")
# print(ofn)
# plt.savefig(ofn)
1imq [0.186 0.18  0.169 0.187 0.164] [0.2 0.2 0.2 0.2 0.2]
1lmb4 [0.29  0.237 0.303 0.377 0.304] [0.3 0.3 0.9 0.4 1. ]
1bf4 [0.221 0.23  0.212 0.232 0.214] [0.1 0.1 0.2 0.1 0.2]
2ptl [0.221 0.222 0.208 0.236 0.207] [0.1 0.1 0.2 0.1 0.2]
2ci2 [0.155 0.15  0.162 0.175 0.165] [0.1 0.1 0.2 0.1 0.2]
1aps [0.193 0.21  0.192 0.211 0.195] [0.1 0.1 0.2 0.1 0.2]
1fmk [0.26  0.257 0.29  0.276 0.295] [0.1 0.1 0.2 0.1 0.2]
1bk2 [0.177 0.175 0.166 0.203 0.154] [0.1 0.1 0.3 0.1 0.3]
1shf [0.283 0.267 0.293 0.304 0.305] [0.1 0.1 0.2 0.1 0.2]
1ten [0.174 0.187 0.149 0.151 0.146] [0.1 0.1 0.2 0.1 0.3]
In [12]:
# can skip if file already created
cent_minmad_df = pd.DataFrame(min_mads)
cent_minmad_df.columns = NDVs
cent_minmad_df = cent_minmad_df.assign(pid=PIDs).set_index('pid')
ofn = os.path.join(dirname, "ND_centphi_minmad_" + edv + ".out")
cent_minmad_df.to_csv(ofn, sep=' ')

cent_MADstats_df = pd.DataFrame({ "MAD_stats": ["sample_avg", "sample_stdev", "range", \
                                  "gMOE", "95meanCI_gleft", "95meanCI_gright",
                                  "bMOE", "95meanCI_bleft", "95meanCI_bright",],
                "c": summary_stats(cent_minmad_df['c'].values), 
                "x": summary_stats(cent_minmad_df['x'].values),
                "r": summary_stats(cent_minmad_df['r'].values), 
                "a": summary_stats(cent_minmad_df['a'].values),
                "y": summary_stats(cent_minmad_df['y'].values) })
cent_MADstats_df = cent_MADstats_df.set_index("MAD_stats")

# ofn = os.path.join(dirname, "ND_cent_MADstats_" + edv + ".out")
# cent_MADstats_df.to_csv(ofn, sep=' ')
In [13]:
# can skip if file already created
cent_minQ_df = pd.DataFrame(min_Qs)
cent_minQ_df.columns = NDVs
cent_minQ_df = cent_minQ_df.assign(pid=PIDs).set_index('pid')
ofn = os.path.join(dirname, "ND_centphi_minQ_" + edv + ".out")
cent_minQ_df.to_csv(ofn, sep=' ')

cent_minQstats_df = pd.DataFrame({ "minQ_stats": ["sample_avg", "sample_stdev", "range", \
                                "gMOE", "95meanCI_gleft", "95meanCI_gright",
                                "bMOE", "95meanCI_bleft", "95meanCI_bright"],
                "c": summary_stats(cent_minQ_df['c'].values), 
                "x": summary_stats(cent_minQ_df['x'].values),
                "r": summary_stats(cent_minQ_df['r'].values), 
                "a": summary_stats(cent_minQ_df['a'].values),
                "y": summary_stats(cent_minQ_df['y'].values)})
cent_minQstats_df = cent_minQstats_df.set_index("minQ_stats")

# ofn = os.path.join(dirname, "ND_cent_minQstats_" + edv + ".out")
# cent_minQstats_df.to_csv(ofn, sep=' ')

Which ND variant produces the best calculated $\phi$ values with node centrality?

  • Two considerations:
    • min MAD from exp_phi; and
    • $Q$ where MAD from exp_phi is min is within ND-TS region.
In [14]:
# edv = 'a'
# fn = os.path.join(dirname, "ND_centphi_minmad_" + edv + ".out")
# cent_minmad_df = pd.read_table(fn, sep=' ').set_index('pid')
print(cent_minmad_df.describe()) # sample std
cent_minmad_df
               c          x          r          a          y
count  10.000000  10.000000  10.000000  10.000000  10.000000
mean    0.216000   0.211500   0.214400   0.235200   0.214900
std     0.047593   0.037951   0.059366   0.067639   0.063656
min     0.155000   0.150000   0.149000   0.151000   0.146000
25%     0.179250   0.181750   0.166750   0.191000   0.164250
50%     0.207000   0.216000   0.200000   0.221500   0.201000
75%     0.250250   0.235250   0.270500   0.266000   0.274750
max     0.290000   0.267000   0.303000   0.377000   0.305000
Out[14]:
c x r a y
pid
1imq 0.186 0.180 0.169 0.187 0.164
1lmb4 0.290 0.237 0.303 0.377 0.304
1bf4 0.221 0.230 0.212 0.232 0.214
2ptl 0.221 0.222 0.208 0.236 0.207
2ci2 0.155 0.150 0.162 0.175 0.165
1aps 0.193 0.210 0.192 0.211 0.195
1fmk 0.260 0.257 0.290 0.276 0.295
1bk2 0.177 0.175 0.166 0.203 0.154
1shf 0.283 0.267 0.293 0.304 0.305
1ten 0.174 0.187 0.149 0.151 0.146
In [15]:
# fn = os.path.join(dirname, "ND_centphi_minQ_" + edv + ".out")
# cent_minQ_df = pd.read_table(fn, sep=' ').set_index('pid')
print(cent_minQ_df.describe())
cent_minQ_df
               c          x          r          a          y
count  10.000000  10.000000  10.000000  10.000000  10.000000
mean    0.130000   0.130000   0.280000   0.140000   0.300000
std     0.067495   0.067495   0.220101   0.096609   0.249444
min     0.100000   0.100000   0.200000   0.100000   0.200000
25%     0.100000   0.100000   0.200000   0.100000   0.200000
50%     0.100000   0.100000   0.200000   0.100000   0.200000
75%     0.100000   0.100000   0.200000   0.100000   0.275000
max     0.300000   0.300000   0.900000   0.400000   1.000000
Out[15]:
c x r a y
pid
1imq 0.2 0.2 0.2 0.2 0.2
1lmb4 0.3 0.3 0.9 0.4 1.0
1bf4 0.1 0.1 0.2 0.1 0.2
2ptl 0.1 0.1 0.2 0.1 0.2
2ci2 0.1 0.1 0.2 0.1 0.2
1aps 0.1 0.1 0.2 0.1 0.2
1fmk 0.1 0.1 0.2 0.1 0.2
1bk2 0.1 0.1 0.3 0.1 0.3
1shf 0.1 0.1 0.2 0.1 0.2
1ten 0.1 0.1 0.2 0.1 0.3
In [16]:
fig, axes = plt.subplots(1, 2, figsize=(8, 4), constrained_layout=True)
plt.suptitle("Calculating $\phi$ values with NS normalized node centrality from ND nets.", 
             fontsize=14)

ax0 = axes[0]
cent_minmad_df.boxplot(ax = ax0, showmeans=True)
ax0.set_title('Minimum MAD from exp_phi')
ax0.set_xticklabels(NDVs, fontsize=14); ax0.set_xlabel('ND variant')
# ax0.set_yticks(np.arange(0.2, 0.55, 0.05))

ax1 = axes[1]
cent_minQ_df.boxplot(ax = ax1, showmeans=True)
ax1.set_title('$Q$ with min MAD');
ax1.set_xticklabels(NDVs, fontsize=14); ax1.set_xlabel('ND variant')
_ = ax1.set_yticks(np.arange(0, 1.1, 0.1))
In [17]:
# MAD between min_Qs (Q where MAD from exp_phi is minimum) and peak Qs for cxray NDVs
np.mean(np.abs(cent_minQ_df.values - peak_Qs), axis=0)
Out[17]:
array([0.13, 0.12, 0.09, 0.13, 0.15])
In [18]:
print("MAD between experimental and calculated phi-values")
cent_MADstats_df
MAD between experimental and calculated phi-values
Out[18]:
c x r a y
MAD_stats
sample_avg 0.2160 0.2115 0.2144 0.2352 0.2149
sample_stdev 0.0476 0.0380 0.0594 0.0676 0.0637
range 0.1350 0.1170 0.1540 0.2260 0.1590
gMOE 0.0295 0.0235 0.0368 0.0419 0.0395
95meanCI_gleft 0.1865 0.1880 0.1776 0.1933 0.1754
95meanCI_gright 0.2455 0.2350 0.2512 0.2771 0.2544
bMOE 0.0280 0.0220 0.0349 0.0398 0.0369
95meanCI_bleft 0.1890 0.1893 0.1815 0.1980 0.1786
95meanCI_bright 0.2449 0.2334 0.2513 0.2775 0.2524
In [19]:
print("Q with minimum MAD between experimental and calculated phi-values")
cent_minQstats_df
Q with minimum MAD between experimental and calculated phi-values
Out[19]:
c x r a y
minQ_stats
sample_avg 0.1300 0.1300 0.2800 0.1400 0.3000
sample_stdev 0.0675 0.0675 0.2201 0.0966 0.2494
range 0.2000 0.2000 0.7000 0.3000 0.8000
gMOE 0.0418 0.0418 0.1364 0.0599 0.1546
95meanCI_gleft 0.0882 0.0882 0.1436 0.0801 0.1454
95meanCI_gright 0.1718 0.1718 0.4164 0.1999 0.4546
bMOE 0.0400 0.0350 0.1100 0.0500 0.1300
95meanCI_bleft 0.1000 0.1000 0.2000 0.1000 0.2000
95meanCI_bright 0.1800 0.1700 0.4200 0.2000 0.4600
In [20]:
plt = plot_CIs(cent_MADstats_df, cent_minQstats_df)
_ = plt.suptitle("Calculating $\phi$ values with NS normalized ND node centrality.", 
                 fontsize=14)
In [21]:
ndc_cent = cent_minmad_df['c'].values
ndx_cent = cent_minmad_df['x'].values
nda_cent = cent_minmad_df['a'].values
ndr_cent = cent_minmad_df['r'].values
ndy_cent = cent_minmad_df['y'].values
In [22]:
print("c a")
print(ttest_rel(ndc_cent, nda_cent))
print(tts(ndc_cent.mean(), ndc_cent.std(ddof=1), 10, 
    nda_cent.mean(), nda_cent.std(ddof=1), 10, equal_var=False))
print("\nx a")
print(ttest_rel(ndx_cent, nda_cent))
print(tts(ndx_cent.mean(), ndx_cent.std(ddof=1), 10, 
    nda_cent.mean(), nda_cent.std(ddof=1), 10, equal_var=False))
print("\nr a")
print(ttest_rel(ndr_cent, nda_cent))
print(tts(ndr_cent.mean(), ndr_cent.std(ddof=1), 10, 
    nda_cent.mean(), nda_cent.std(ddof=1), 10, equal_var=False))
print("\ny a")
print(ttest_rel(ndy_cent, nda_cent))
print(tts(ndy_cent.mean(), ndy_cent.std(ddof=1), 10, 
    nda_cent.mean(), nda_cent.std(ddof=1), 10, equal_var=False))
c a
Ttest_relResult(statistic=-2.1998838344328835, pvalue=0.055351057363337036)
Ttest_indResult(statistic=-0.7341208082163616, pvalue=0.47339483909234537)

x a
Ttest_relResult(statistic=-1.645582049961906, pvalue=0.13425842442991975)
Ttest_indResult(statistic=-0.9663136086020867, pvalue=0.35010731334375245)

r a
Ttest_relResult(statistic=-2.825410134954505, pvalue=0.019870724386820936)
Ttest_indResult(statistic=-0.7308675811574417, pvalue=0.4744235544825969)

y a
Ttest_relResult(statistic=-2.476735503154165, pvalue=0.035179359273354756)
Ttest_indResult(statistic=-0.6911334993886235, pvalue=0.4983310110150768)
In [23]:
minMad_c = cent_minmad_df['c'].values
minMad_a = cent_minmad_df['a'].values
print('minMad c', minMad_c)
print('minMad a', minMad_a)
minQ_c = cent_minQ_df['c'].values
minQ_a = cent_minQ_df['a'].values
print('minQ c', minQ_c)
print('minQ a', minQ_a)
minMad c [0.186 0.29  0.221 0.221 0.155 0.193 0.26  0.177 0.283 0.174]
minMad a [0.187 0.377 0.232 0.236 0.175 0.211 0.276 0.203 0.304 0.151]
minQ c [0.2 0.3 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]
minQ a [0.2 0.4 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]
In [24]:
fig, axes = plt.subplots(4, 3, figsize=(12, 18), constrained_layout=True)
plt.suptitle("Calculated $\phi$ values from NS normalized node centrality, \
ND variants 'c' and 'a' with EDS variant 'a'.", fontsize=14)
plt.delaxes(axes[3, 1])
plt.delaxes(axes[3, 2])
dirn = os.path.join(dirname, "edv_a") # ND data

for i, pid in enumerate(PIDs):    
    r = i//3; c = i%3; ax = axes[r][c]
    fn0 = os.path.join(dirname, pid + "_nodestats.out")
    df0 = pd.read_table(fn0, sep="\s+") # nid exp_phi deg0 cent0  
    N = len(df0); m = N*N-N
    fn1 = os.path.join(dirn, pid + "_c_nodestats.out")
    df1 = pd.read_table(fn1, sep="\s+")    
    q_c = minQ_c[i]
    dq = df1.query("Q == @q_c")                           # 1lmb4 > 1.0?
    dq = dq.assign(cent_avg_norm2 = dq.cent_avg.values/m) # node centality is pre-averaged!
    dq = dq.assign(exp_phi = df0.exp_phi.values)
    de = dq.query("exp_phi > -1")    
    mad_c = np.round(np.mean(np.abs(de.cent_avg_norm2.values - de.exp_phi.values)), 2)    
    ax.plot(dq.nid.values+1, dq.cent_avg_norm2.values, color="tab:blue", alpha=0.5, 
            label="'c' cent " + str(mad_c))
        
    fn1 = os.path.join(dirn, pid + "_a_nodestats.out")
    df1 = pd.read_table(fn1, sep="\s+")
    q_a = minQ_a[i]
    dq = df1.query("Q == @q_a")
    dq = dq.assign(cent_avg_norm2 = dq.cent_avg.values/m)    
    dq = dq.assign(exp_phi = df0.exp_phi.values)
    de = dq.query("exp_phi > -1")    
    mad_a = np.round(np.mean(np.abs(de.cent_avg_norm2.values - de.exp_phi.values)), 2)    
    ax.plot(dq.nid.values+1, dq.cent_avg_norm2.values, color="tab:red", alpha=0.5, 
            label="'a' cent " + str(mad_a))
    
    ax.plot(de.nid.values+1, de.exp_phi.values, "o--g", alpha=0.5, label='exp_phi')
    ax.set_title(pid + " " + str(q_c) + " " + str(q_a), loc='left')
    ax.set_xlabel('Residue'); ax.set_yticks(np.arange(0, 1.3, 0.1))
    ax.legend(loc="lower left", bbox_to_anchor=(0.5, 1.0)) 
    # ax.grid()
# ofn = os.path.join(dirname, "ND_centphi_ca_" + edv + ".png")
# plt.savefig(ofn)
  • Figure above: Calculated phi-values as normalized node centrality (cent) of 'c' and 'a' ND generated PRNs at the $Q$ where the mean absolute difference (MAD) between experimental phi-values and cent is minimum. The numbers in the title of each subplot is this $Q$ for 'c' and 'a' respectively. The numbers in the legend is the MAD associated with these $Q$ values. ND node centrality for a $Q$ is the average over 100 independent ND generated PRNs at the $Q$.

Node degree

In [25]:
# compare mads of ND variants
fig, axes = plt.subplots(4, 3, figsize=(12, 16), constrained_layout=True, sharey=True)
plt.suptitle("Mean absolute difference between phi-values from experiments and \n \
phi-values calculated with NS normalized node degree from ND nets, EDS variant " + edv)
plt.delaxes(axes[3][1])
plt.delaxes(axes[3][2])

min_mads, min_Qs = [], []
for i, pid in enumerate(PIDs):
    r = i//3; c = i%3; ax = axes[r][c]
    res = []
    for i, ndv in enumerate(NDVs):
        dp = cxray_dfs[i].query("pid == @pid")        
        mad = dp.mad_deg_avg2.values
        ax.plot(Qs, mad, 'o--', label=ndv)    
        res.append(mad)            
    ax.set_title(pid)
    ax.set_xlabel('Q'); ax.set_xticks(np.arange(0, 1.1, 0.1))
    ax.legend(); ax.grid()        
    min_mad = np.round(np.min(res, axis=1), 3) # row-wise per ND variant, over Qs for a pid
    min_q = (np.round(np.argmin(res, axis=1), 3) + 1)/10
    min_mads.append(min_mad)
    min_Qs.append(min_q)
    print(pid, min_mad, min_q)

# ofn = os.path.join(dirname, "ND_degphi_" + edv + ".png")
# plt.savefig(ofn)
1imq [0.143 0.165 0.174 0.158 0.155] [0.2 0.2 0.1 0.2 0.1]
1lmb4 [0.241 0.228 0.243 0.247 0.312] [0.3 0.3 0.2 0.3 0.2]
1bf4 [0.181 0.192 0.191 0.191 0.205] [0.1 0.1 0.1 0.1 0.1]
2ptl [0.205 0.203 0.199 0.212 0.183] [0.2 0.2 0.1 0.2 0.1]
2ci2 [0.132 0.15  0.148 0.141 0.177] [0.1 0.1 0.1 0.1 0.1]
1aps [0.219 0.231 0.22  0.209 0.215] [0.1 0.1 0.1 0.1 0.1]
1fmk [0.265 0.262 0.285 0.266 0.293] [0.2 0.2 0.1 0.1 0.1]
1bk2 [0.223 0.237 0.211 0.206 0.197] [0.1 0.1 0.1 0.1 0.1]
1shf [0.263 0.217 0.268 0.282 0.275] [0.2 0.2 0.1 0.1 0.1]
1ten [0.149 0.152 0.163 0.155 0.16 ] [0.1 0.1 0.1 0.1 0.1]
In [26]:
# can skip if file already exists
deg_minmad_df = pd.DataFrame(min_mads)
deg_minmad_df.columns = NDVs
deg_minmad_df = deg_minmad_df.assign(pid=PIDs).set_index('pid')
ofn = os.path.join(dirname, "ND_degphi_minmad_" + edv + ".out")
deg_minmad_df.to_csv(ofn, sep=' ')

deg_MADstats_df = pd.DataFrame({ "MAD_stats": ["sample_avg", "sample_stdev", "range", \
                                  "gMOE", "95meanCI_gleft", "95meanCI_gright",
                                  "bMOE", "95meanCI_bleft", "95meanCI_bright",],
                "c": summary_stats(deg_minmad_df['c'].values), 
                "x": summary_stats(deg_minmad_df['x'].values),
                "r": summary_stats(deg_minmad_df['r'].values), 
                "a": summary_stats(deg_minmad_df['a'].values),
                "y": summary_stats(deg_minmad_df['y'].values) })
deg_MADstats_df = deg_MADstats_df.set_index("MAD_stats")

# ofn = os.path.join(dirname, "ND_deg_MADstats_" + edv + ".out")
# deg_MADstats_df.to_csv(ofn, sep=' ')
In [27]:
# can skip if file already exists
deg_minQ_df = pd.DataFrame(min_Qs)
deg_minQ_df.columns = NDVs
deg_minQ_df = deg_minQ_df.assign(pid=PIDs).set_index('pid')
ofn = os.path.join(dirname, "ND_degphi_minQ_" + edv + ".out")
deg_minQ_df.to_csv(ofn, sep=' ')

deg_minQstats_df = pd.DataFrame({ "minQ_stats": ["sample_avg", "sample_stdev", "range", \
                                "gMOE", "95meanCI_gleft", "95meanCI_gright",
                                "bMOE", "95meanCI_bleft", "95meanCI_bright"],
                "c": summary_stats(deg_minQ_df['c'].values), 
                "x": summary_stats(deg_minQ_df['x'].values),
                "r": summary_stats(deg_minQ_df['r'].values), 
                "a": summary_stats(deg_minQ_df['a'].values),
                "y": summary_stats(deg_minQ_df['y'].values)})
deg_minQstats_df = deg_minQstats_df.set_index("minQ_stats")

# ofn = os.path.join(dirname, "ND_deg_minQstats_" + edv + ".out")
# deg_minQstats_df.to_csv(ofn, sep=' ')
In [28]:
# fn = os.path.join(dirname, "ND_degphi_minmad_" + edv + ".out")
# deg_minmad_df = pd.read_table(fn, sep=' ').set_index('pid')
print(deg_minmad_df.describe())
deg_minmad_df
               c          x          r         a          y
count  10.000000  10.000000  10.000000  10.00000  10.000000
mean    0.202100   0.203700   0.210200   0.20670   0.217200
std     0.048968   0.038315   0.044733   0.04767   0.056334
min     0.132000   0.150000   0.148000   0.14100   0.155000
25%     0.157000   0.171750   0.178250   0.16625   0.178500
50%     0.212000   0.210000   0.205000   0.20750   0.201000
75%     0.236500   0.230250   0.237250   0.23825   0.260000
max     0.265000   0.262000   0.285000   0.28200   0.312000
Out[28]:
c x r a y
pid
1imq 0.143 0.165 0.174 0.158 0.155
1lmb4 0.241 0.228 0.243 0.247 0.312
1bf4 0.181 0.192 0.191 0.191 0.205
2ptl 0.205 0.203 0.199 0.212 0.183
2ci2 0.132 0.150 0.148 0.141 0.177
1aps 0.219 0.231 0.220 0.209 0.215
1fmk 0.265 0.262 0.285 0.266 0.293
1bk2 0.223 0.237 0.211 0.206 0.197
1shf 0.263 0.217 0.268 0.282 0.275
1ten 0.149 0.152 0.163 0.155 0.160
In [29]:
# fn = os.path.join(dirname, "ND_degphi_minQ_" + edv + ".out")
# deg_minQ_df = pd.read_table(fn, sep=' ').set_index('pid')
print(deg_minQ_df.describe())
deg_minQ_df
               c          x          r          a          y
count  10.000000  10.000000  10.000000  10.000000  10.000000
mean    0.160000   0.160000   0.110000   0.140000   0.110000
std     0.069921   0.069921   0.031623   0.069921   0.031623
min     0.100000   0.100000   0.100000   0.100000   0.100000
25%     0.100000   0.100000   0.100000   0.100000   0.100000
50%     0.150000   0.150000   0.100000   0.100000   0.100000
75%     0.200000   0.200000   0.100000   0.175000   0.100000
max     0.300000   0.300000   0.200000   0.300000   0.200000
Out[29]:
c x r a y
pid
1imq 0.2 0.2 0.1 0.2 0.1
1lmb4 0.3 0.3 0.2 0.3 0.2
1bf4 0.1 0.1 0.1 0.1 0.1
2ptl 0.2 0.2 0.1 0.2 0.1
2ci2 0.1 0.1 0.1 0.1 0.1
1aps 0.1 0.1 0.1 0.1 0.1
1fmk 0.2 0.2 0.1 0.1 0.1
1bk2 0.1 0.1 0.1 0.1 0.1
1shf 0.2 0.2 0.1 0.1 0.1
1ten 0.1 0.1 0.1 0.1 0.1
In [30]:
fig, axes = plt.subplots(1, 2, figsize=(8, 4), constrained_layout=True)
plt.suptitle("Calculating $\phi$ values with NS normalized node degree from ND nets.", 
             fontsize=14)

ax0 = axes[0]
deg_minmad_df.boxplot(ax = ax0, showmeans=True)
ax0.set_title('Minimum MAD from exp_phi')
ax0.set_xticklabels(NDVs, fontsize=14); ax0.set_xlabel('ND variant')
# ax0.set_yticks(np.arange(0.2, 0.55, 0.05))

ax1 = axes[1]
deg_minQ_df.boxplot(ax = ax1, showmeans=True)
ax1.set_title('$Q$ with min MAD');
ax1.set_xticklabels(NDVs, fontsize=14); ax1.set_xlabel('ND variant')
_ = ax1.set_yticks(np.arange(0, 1.1, 0.1))
In [31]:
# MAD between min_Qs (Q where MAD from exp_phi is minimum) and peak Qs for cxray NDVs
np.mean(np.abs(deg_minQ_df.values - peak_Qs), axis=0)
Out[31]:
array([0.1 , 0.09, 0.08, 0.11, 0.06])
In [32]:
print("MAD between experimental and calculated phi-values")
deg_MADstats_df
MAD between experimental and calculated phi-values
Out[32]:
c x r a y
MAD_stats
sample_avg 0.2021 0.2037 0.2102 0.2067 0.2172
sample_stdev 0.0490 0.0383 0.0447 0.0477 0.0563
range 0.1330 0.1120 0.1370 0.1410 0.1570
gMOE 0.0304 0.0237 0.0277 0.0295 0.0349
95meanCI_gleft 0.1717 0.1800 0.1825 0.1772 0.1823
95meanCI_gright 0.2325 0.2274 0.2379 0.2362 0.2521
bMOE 0.0290 0.0220 0.0260 0.0280 0.0331
95meanCI_bleft 0.1727 0.1819 0.1851 0.1787 0.1857
95meanCI_bright 0.2307 0.2260 0.2370 0.2346 0.2520
In [33]:
print("Q with minimum MAD between experimental and calculated phi-values")
deg_minQstats_df
Q with minimum MAD between experimental and calculated phi-values
Out[33]:
c x r a y
minQ_stats
sample_avg 0.1600 0.1600 0.1100 0.1400 0.1100
sample_stdev 0.0699 0.0699 0.0316 0.0699 0.0316
range 0.2000 0.2000 0.1000 0.2000 0.1000
gMOE 0.0433 0.0433 0.0196 0.0433 0.0196
95meanCI_gleft 0.1167 0.1167 0.0904 0.0967 0.0904
95meanCI_gright 0.2033 0.2033 0.1296 0.1833 0.1296
bMOE 0.0400 0.0400 0.0150 0.0400 0.0150
95meanCI_bleft 0.1200 0.1200 0.1000 0.1000 0.1000
95meanCI_bright 0.2000 0.2000 0.1300 0.1800 0.1300
In [34]:
plt = plot_CIs(deg_MADstats_df, deg_minQstats_df)
_ = plt.suptitle("Calculating $\phi$ values with NS normalized ND node degree.", 
                 fontsize=14)
In [35]:
minMad_a = deg_minmad_df['a'].values
print('minMad a', minMad_a)
minQ_a = deg_minQ_df['a'].values
print('minQ a', minQ_a)
minMad a [0.158 0.247 0.191 0.212 0.141 0.209 0.266 0.206 0.282 0.155]
minQ a [0.2 0.3 0.1 0.2 0.1 0.1 0.1 0.1 0.1 0.1]
In [36]:
fig, axes = plt.subplots(4, 3, figsize=(12, 18), constrained_layout=True)
plt.suptitle("Calculated $\phi$ values from NS normalized node degree, \
ND variant 'a' with EDS variant 'a'.", fontsize=14)
plt.delaxes(axes[3, 1])
plt.delaxes(axes[3, 2])
dirn = os.path.join(dirname, "edv_a") # ND data

for i, pid in enumerate(PIDs):    
    r = i//3; c = i%3; ax = axes[r][c]
    fn0 = os.path.join(dirname, pid + "_nodestats.out")
    df0 = pd.read_table(fn0, sep="\s+") # nid exp_phi deg0 cent0        
        
    fn1 = os.path.join(dirn, pid + "_a_nodestats.out")
    df1 = pd.read_table(fn1, sep="\s+")
    q_a = minQ_a[i]
    dq = df1.query("Q == @q_a")
    dq = dq.assign(deg_avg_norm2 = dq.deg_avg.values/df0.deg0.values)    
    dq = dq.assign(exp_phi = df0.exp_phi.values)
    de = dq.query("exp_phi > -1")    
    mad_a = np.round(np.mean(np.abs(de.deg_avg_norm2.values - de.exp_phi.values)), 2)    
    ax.plot(dq.nid.values+1, dq.deg_avg_norm2.values, color="tab:red", alpha=0.5, 
            label="'a' deg " + str(mad_a))
    
    ax.plot(de.nid.values+1, de.exp_phi.values, "o--g", alpha=0.5, label='exp_phi')
    ax.set_title(pid + " " + str(q_a), loc='left')
    ax.set_xlabel('Residue'); ax.set_yticks(np.arange(0, 1.1, 0.1))
    ax.legend(loc="lower left", bbox_to_anchor=(0.5, 1.0)) 

# ofn = os.path.join(dirname, "ND_degphi_a_" + edv + ".png")
# plt.savefig(ofn)
  • Figure above: Calculated phi-values as normalized node degree (deg) of 'a' ND generated PRNs at the $Q$ where the mean absolute difference (MAD) between experimental phi-values and deg is minimum. The numbers in the title of each subplot is this $Q$ for 'a'. The numbers in the legend is the MAD associated with these $Q$ values. ND node degree for a $Q$ is the average over 100 independent ND generated PRNs at the $Q$.

Summary

In [37]:
fig, axes = plt.subplots(1, 2, figsize=(10, 5), constrained_layout=True, sharey=True)
plt.suptitle("Calculating $\phi$ values with ND PRNs, EDS variant " + edv, fontsize=14)

ax = axes[0]
ax.set_title("Node centrality", fontsize=14)
for ndv in NDVs:
    vals = cent_minmad_df[ndv].values
    ax.plot(PIDs, vals, 'o-', label=ndv + ' ' + str(np.round(np.mean(vals), 3)))
ax.set_xticks(PIDs); ax.set_xticklabels(PIDs, rotation=90, fontsize=14)
ax.set_ylabel("Minimum MAD from exp_phi", fontsize=14)
ax.legend(); ax.grid()

ax = axes[1]
ax.set_title("Node degree", fontsize=14)
for ndv in NDVs:
    vals = deg_minmad_df[ndv].values
    ax.plot(PIDs, vals, 'o--', label=ndv + ' ' + str(np.round(np.mean(vals), 3)))
ax.set_xticks(PIDs); ax.set_xticklabels(PIDs, rotation=90, fontsize=14)
ax.legend(); ax.grid()
# ax.set_ylabel("Minimum MAD ", fontsize=14)

# ofn = os.path.join(dirname, "ND_minMadphi_" + edv + ".png")
# plt.savefig(ofn) 
In [38]:
fig, axes = plt.subplots(2, 2, figsize=(8, 8), constrained_layout=True)
plt.suptitle("Calculating $\phi$ values with ND PRNs, EDS variant " + edv, fontsize=14)

ax0 = axes[0][0]
cent_minmad_df.boxplot(ax = ax0, showmeans=True)
ax0.set_title('Node centrality', fontsize=14)
ax0.set_xticklabels(NDVs, fontsize=14)
ax0.set_ylabel('Minimum MAD from exp_phi', fontsize=12)
ax0.set_yticks(np.arange(0.15, 0.6, 0.05))

ax1 = axes[0][1]
deg_minmad_df.boxplot(ax = ax1, showmeans=True)
ax1.set_title('Node degree', fontsize=14)
ax1.set_xticklabels(NDVs, fontsize=14)
ax1.set_yticks(np.arange(0.15, 0.6, 0.05))

ax2 = axes[1][0]
cent_minQ_df.boxplot(ax = ax2, showmeans=True)
ax2.set_xticklabels(NDVs, fontsize=14)
ax2.set_ylabel('$Q$ with minimum MAD', fontsize=12); ax2.set_yticks(np.arange(0, 1.1, 0.1))
ax2.set_xlabel('ND variant', fontsize=14)

ax3 = axes[1][1]
deg_minQ_df.boxplot(ax = ax3, showmeans=True)
ax3.set_yticks(np.arange(0, 1.1, 0.1))
ax3.set_xticklabels(NDVs, fontsize=14)
_ = ax3.set_xlabel('ND variant', fontsize=14)

# ofn = os.path.join(dirname, "ND_phi_" + edv + ".png")
# plt.savefig(ofn)
In [39]:
methods = cent_MADstats_df.columns.values; n = len(methods)
plt_colors = plt.get_cmap("tab20")
gcolors = plt_colors(list(range(1, n*2, 2))) # odd
bcolors = plt_colors(list(range(0, n*2, 2))) # even

fig, axes = plt.subplots(2, 2, figsize=(8, 8), constrained_layout=True)
plt.suptitle("Calculating $\phi$ values from ND nets.", fontsize=14)

ax = axes[0][0]
mu = cent_MADstats_df.loc['sample_avg'].values
ga = cent_MADstats_df.loc['95meanCI_gleft'].values
gz = cent_MADstats_df.loc['95meanCI_gright'].values
ba = cent_MADstats_df.loc['95meanCI_bleft'].values
bz = cent_MADstats_df.loc['95meanCI_bright'].values
for i, p in enumerate(zip(ga, gz, ba, bz)):
    ax.plot([i, i], [p[0], p[1]], marker='o', color=gcolors[i])
    ax.plot([i+0.2, i+0.2], [p[2], p[3]], marker='o', color=bcolors[i], label=str(p[3])) 
    ax.scatter(i+0.2, mu[i], marker='^')
ax.set_title("Node centrality", fontsize=14)
ax.set_yticks(np.arange(0.12, 0.32, 0.02))
ax.set_ylabel("95% CI mean MAD from exp_phi", fontsize=14)
ax.set_xticks(list(range(0, n, 1))); ax.set_xticklabels(methods, fontsize=14)
ax.legend(); ax.grid()

ax = axes[1][0]
mu = cent_minQstats_df.loc['sample_avg'].values
ga = cent_minQstats_df.loc['95meanCI_gleft'].values
gz = cent_minQstats_df.loc['95meanCI_gright'].values
ba = cent_minQstats_df.loc['95meanCI_bleft'].values
bz = cent_minQstats_df.loc['95meanCI_bright'].values
for i, p in enumerate(zip(ga, gz, ba, bz)):
    ax.plot([i, i], [p[0], p[1]], marker='o', color=gcolors[i])
    ax.plot([i+0.2, i+0.2], [p[2], p[3]], marker='o', color=bcolors[i], label=str(p[3])) 
    ax.scatter(i+0.2, mu[i], marker='^')
ax.set_yticks(np.arange(0.0, 0.5, 0.1))
ax.set_ylabel("95% CI mean minQ", fontsize=14)
ax.set_xticks(list(range(0, n, 1))); ax.set_xticklabels(methods, fontsize=14)
ax.set_xlabel("ND variant", fontsize=14)
ax.legend(); ax.grid()

ax = axes[0][1]
mu = deg_MADstats_df.loc['sample_avg'].values
ga = deg_MADstats_df.loc['95meanCI_gleft'].values
gz = deg_MADstats_df.loc['95meanCI_gright'].values
ba = deg_MADstats_df.loc['95meanCI_bleft'].values
bz = deg_MADstats_df.loc['95meanCI_bright'].values
for i, p in enumerate(zip(ga, gz, ba, bz)):
    ax.plot([i, i], [p[0], p[1]], marker='o', color=gcolors[i])
    ax.plot([i+0.2, i+0.2], [p[2], p[3]], marker='o', color=bcolors[i], label=str(p[3])) 
    ax.scatter(i+0.2, mu[i], marker='^')
ax.set_title("Node degree", fontsize=14)
ax.set_yticks(np.arange(0.12, 0.32, 0.02))
# ax.set_ylabel("95% confidence interval mean MAD from exp_phi", fontsize=14)
ax.set_xticks(list(range(0, n, 1))); ax.set_xticklabels(methods, fontsize=14)
ax.legend(); ax.grid()

ax = axes[1][1]
mu = deg_minQstats_df.loc['sample_avg'].values
ga = deg_minQstats_df.loc['95meanCI_gleft'].values
gz = deg_minQstats_df.loc['95meanCI_gright'].values
ba = deg_minQstats_df.loc['95meanCI_bleft'].values
bz = deg_minQstats_df.loc['95meanCI_bright'].values
for i, p in enumerate(zip(ga, gz, ba, bz)):
    ax.plot([i, i], [p[0], p[1]], marker='o', color=gcolors[i])
    ax.plot([i+0.2, i+0.2], [p[2], p[3]], marker='o', color=bcolors[i], label=str(p[3])) 
    ax.scatter(i+0.2, mu[i], marker='^')
ax.set_yticks(np.arange(0.0, 0.5, 0.1))
ax.set_xticks(list(range(0, n, 1))); ax.set_xticklabels(methods, fontsize=14)
ax.set_xlabel("ND variant", fontsize=14)
ax.legend(); ax.grid()

# ofn = os.path.join(dirname, "ND_95CIstats_" + edv + ".png")
# print(ofn)
# plt.savefig(ofn)

How do calculated $\phi$ values from ND PRNs fare against TSE PRNs?

In [40]:
# MAD values computed previously
ns_deg = np.array([0.48, 0.34, 0.49, 0.35, 0.34, 0.37, 0.45, 0.35, 0.51, 0.41])
ns_cent = np.array([0.44, 0.3, 0.41, 0.29, 0.32, 0.3, 0.43, 0.35, 0.44, 0.36])
ns_locent = np.array([0.33, 0.5, 0.2, 0.19, 0.24, 0.22, 0.3, 0.15, 0.24, 0.16])

ts_deg = np.array([0.46, 0.24, 0.35, 0.29, 0.28, 0.26, 0.3, 0.28, 0.3, 0.3])
ts_deg2 = np.array([0.25, 0.21, 0.4, 0.29, 0.36, 0.35, 0.28, 0.31, 0.22, 0.32])
ts_cent = np.array([0.41, 0.34, 0.37, 0.34, 0.29, 0.26, 0.42, 0.26, 0.43, 0.4])

ndc_cent = np.array([0.394, 0.286, 0.36, 0.327, 0.26, 0.26, 0.304, 0.23, 0.391, 0.252])
nda_cent = np.array([0.274, 0.372, 0.301, 0.287, 0.251, 0.242, 0.302, 0.204, 0.399, 0.313])
nda_deg = np.array([0.33, 0.315, 0.402, 0.318, 0.271, 0.295, 0.355, 0.268, 0.444, 0.299])

ndc_cent2 = np.array([0.186, 0.29, 0.221, 0.221, 0.155, 0.193, 0.26, 0.177, 0.283, 0.174])
nda_cent2 = np.array([0.187, 0.377, 0.232, 0.236, 0.175, 0.211, 0.276, 0.203, 0.304, 0.151])
nda_deg2 = np.array([0.158, 0.247, 0.191, 0.212, 0.141, 0.209, 0.266, 0.206, 0.282, 0.155])
In [41]:
# sample size is too small for CLT which 95%meanCI assumes (should actual do resampling?)
print("MAD between experimental and calculated phi-values")
df = pd.DataFrame({ "MAD_stats": ["sample_avg", "sample_stdev", "range", \
                                  "gMOE", "95meanCI_gleft", "95meanCI_gright", 
                                  "bMOE", "95meanCI_bleft", "95meanCI_bright"],
                "NS_deg": summary_stats(ns_deg), "NS_cent": summary_stats(ns_cent),
                "NS_locent": summary_stats(ns_locent), "TSE_deg": summary_stats(ts_deg),
                "TSE_deg2": summary_stats(ts_deg2), "TSE_cent": summary_stats(ts_cent),
                "NDc_cent": summary_stats(ndc_cent), "NDa_cent": summary_stats(nda_cent),
                "NDa_deg": summary_stats(nda_deg),
                "NDc_cent2": summary_stats(ndc_cent2), "NDa_cent2": summary_stats(nda_cent2),
                "NDa_deg2": summary_stats(nda_deg2)})
df = df.set_index("MAD_stats")
df
MAD between experimental and calculated phi-values
Out[41]:
NS_deg NS_cent NS_locent TSE_deg TSE_deg2 TSE_cent NDc_cent NDa_cent NDa_deg NDc_cent2 NDa_cent2 NDa_deg2
MAD_stats
sample_avg 0.4090 0.3640 0.2530 0.3060 0.2990 0.3520 0.3064 0.2945 0.3297 0.2160 0.2352 0.2067
sample_stdev 0.0679 0.0613 0.1036 0.0613 0.0615 0.0648 0.0593 0.0584 0.0565 0.0476 0.0676 0.0477
range 0.1700 0.1500 0.3500 0.2200 0.1900 0.1700 0.1640 0.1950 0.1760 0.1350 0.2260 0.1410
gMOE 0.0421 0.0380 0.0642 0.0380 0.0381 0.0401 0.0368 0.0362 0.0350 0.0295 0.0419 0.0295
95meanCI_gleft 0.3669 0.3260 0.1888 0.2680 0.2609 0.3119 0.2696 0.2583 0.2947 0.1865 0.1933 0.1772
95meanCI_gright 0.4511 0.4020 0.3172 0.3440 0.3371 0.3921 0.3432 0.3307 0.3647 0.2455 0.2771 0.2362
bMOE 0.0400 0.0360 0.0590 0.0355 0.0360 0.0380 0.0348 0.0347 0.0333 0.0278 0.0386 0.0282
95meanCI_bleft 0.3700 0.3280 0.2000 0.2750 0.2630 0.3140 0.2724 0.2604 0.2994 0.1891 0.1986 0.1788
95meanCI_bright 0.4500 0.4000 0.3180 0.3460 0.3350 0.3900 0.3420 0.3298 0.3659 0.2447 0.2757 0.2351
In [42]:
plt.figure(figsize=(8, 6))

methods = df.columns.values; n = len(methods)
plt_colors = plt.get_cmap("tab20")
gcolors = plt_colors(list(range(1, n*2, 2))) # odd
bcolors = plt_colors(list(range(0, n*2, 2))) # even

mu = df.loc['sample_avg'].values
ga = df.loc['95meanCI_gleft'].values; gz = df.loc['95meanCI_gright'].values
ba = df.loc['95meanCI_bleft'].values; bz = df.loc['95meanCI_bright'].values

for i, p in enumerate(zip(ga, gz, ba, bz)):
    plt.plot([i, i], [p[0], p[1]], color=gcolors[i], marker='o')
    plt.scatter(i+0.2, mu[i], marker='^')
    plt.plot([i+0.2, i+0.2], [p[2], p[3]], color=bcolors[i], marker='o', label=str(p[3]))
    
plt.title("Calculating $\phi$ values", fontsize=14)
plt.xticks(list(range(0, n, 1)), methods, rotation=45, fontsize=12)
plt.xlabel("Method: PRN type and node statistic combination", fontsize=14)
plt.ylabel("95% confidence interval around mean MAD", fontsize=14)
plt.legend(title="97.5 percentile", loc="upper left", bbox_to_anchor=(1.0, 1.0)); plt.grid()

# ofn = os.path.join(dirname, "NSTSEND_95CIstats_" + edv + ".png")
# print(ofn)
# plt.savefig(ofn)

End of notebook

SKLC December 2020