neuroharmony.ks_test_grid

neuroharmony.ks_test_grid(df, features, sampling_variable='scanner')

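Calculate Kolmogorov-Smirnov test p-values for every pair of scanners (more generally, for every pair of groups defined by sampling_variable), separately for each feature.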
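The pairwise comparison described above can be sketched with SciPy's two-sample KS test. This is a hypothetical re-implementation, not the neuroharmony source. `ks_grid_sketch`, the synthetic `volume` column and the scanner labels `A`/`B`/`C` are illustrative assumptions. It only mirrors the documented inputs, the returned dict of square p-value grids (diagonal and upper triangle left as NaN, as in the example output), and the ValueError for unknown features.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def ks_grid_sketch(df, features, sampling_variable="scanner"):
    """Hypothetical sketch: pairwise two-sample KS p-values per feature.

    Not the neuroharmony implementation; it only mirrors the documented
    inputs, outputs, and ValueError.
    """
    missing = [f for f in features if f not in df.columns]
    if missing:
        raise ValueError(f"Features not found in df: {missing}")

    groups = sorted(df[sampling_variable].unique())
    ks_by_variable = {}
    for feature in features:
        # Square grid; diagonal and upper triangle stay NaN, as in the example output.
        grid = pd.DataFrame(np.nan, index=groups, columns=groups)
        for i, row_group in enumerate(groups):
            a = df.loc[df[sampling_variable] == row_group, feature]
            for col_group in groups[:i]:  # only groups listed before row_group
                b = df.loc[df[sampling_variable] == col_group, feature]
                grid.loc[row_group, col_group] = ks_2samp(a, b).pvalue
        ks_by_variable[feature] = grid
    return ks_by_variable


# Synthetic data: scanners A and B share a distribution, C is shifted.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "scanner": ["A"] * 50 + ["B"] * 50 + ["C"] * 50,
    "volume": np.concatenate([
        rng.normal(0.0, 1.0, 50),
        rng.normal(0.0, 1.0, 50),
        rng.normal(3.0, 1.0, 50),
    ]),
})
ks = ks_grid_sketch(df, ["volume"])
print(ks["volume"])
```

Filling only the strictly lower triangle avoids running each test twice, since the two-sample KS p-value is symmetric in its arguments.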

Parameters
df: NDFrame of shape [n_subjects, n_features]

DataFrame with the subjects data.

features: list

List of the features to be considered in the Kolmogorov-Smirnov test.

sampling_variable: str, default=’scanner’

Name of the column in df used to group subjects (by default, the scanner).

Returns
KS_by_variable: dict of NDFrames

Kolmogorov-Smirnov p-values for all pairs of groups in the sampling_variable column. The keys of the dictionary are the variables in ‘features’, and each value is a square NDFrame of shape [n_groups, n_groups], where n_groups is the number of distinct values in the sampling_variable column.
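To illustrate how the returned dictionary can be consumed, the sketch below rebuilds the Left-Lateral-Ventricle grid from the example output by hand and lists the group pairs whose p-value falls below a chosen threshold. `significant_pairs` and its `alpha` parameter are hypothetical helpers, not part of neuroharmony. NaN cells (the diagonal and upper triangle) are skipped.

```python
import numpy as np
import pandas as pd


def significant_pairs(ks_by_variable, alpha=0.05):
    """Hypothetical helper: list group pairs whose KS p-value is below alpha."""
    rows = []
    for feature, grid in ks_by_variable.items():
        for row_group in grid.index:
            for col_group in grid.columns:
                p = grid.loc[row_group, col_group]
                if pd.notna(p) and p < alpha:  # NaN cells (diagonal/upper) are skipped
                    rows.append((feature, row_group, col_group, float(p)))
    return pd.DataFrame(rows, columns=["feature", "group_a", "group_b", "p_value"])


# Rebuild the Left-Lateral-Ventricle grid from the example output by hand.
labels = ["SCANNER01-SCANNER01", "SCANNER02-SCANNER01", "SCANNER03-SCANNER01"]
grid = pd.DataFrame(np.nan, index=labels, columns=labels)
grid.loc["SCANNER02-SCANNER01", "SCANNER01-SCANNER01"] = 0.000759473
grid.loc["SCANNER03-SCANNER01", "SCANNER01-SCANNER01"] = 0.0539998
grid.loc["SCANNER03-SCANNER01", "SCANNER02-SCANNER01"] = 0.625887
ks = {"Left-Lateral-Ventricle": grid}

print(significant_pairs(ks, alpha=0.05))
```

With alpha=0.05 only the SCANNER02-SCANNER01 vs SCANNER01-SCANNER01 pair is reported, since 0.0539998 lies just above the threshold.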

Raises
ValueError:

If features contains any variable that is not a column of df.

Examples

>>> from neuroharmony import ks_test_grid
>>> ixi = DataSet('data/raw/IXI').data
>>> features = ['Left-Lateral-Ventricle', 'Left-Inf-Lat-Vent']
>>> KS = ks_test_grid(ixi, features, 'scanner')
>>> KS[features[0]]
+--------------------------+----------------------+------------------------+--------------------+
|                          | SCANNER01-SCANNER01  | SCANNER02-SCANNER01    | SCANNER03-SCANNER01|
+==========================+======================+========================+====================+
|SCANNER01-SCANNER01       | NaN                  | NaN                    | NaN                |
+--------------------------+----------------------+------------------------+--------------------+
|SCANNER02-SCANNER01       | 0.000759473          | NaN                    | NaN                |
+--------------------------+----------------------+------------------------+--------------------+
|SCANNER03-SCANNER01       | 0.0539998            | 0.625887               | NaN                |
+--------------------------+----------------------+------------------------+--------------------+