neuroharmony.Neuroharmony
- class neuroharmony.Neuroharmony(features, regression_features, covariates, eliminate_variance, estimator=RandomForestRegressor(), scaler=StandardScaler(), decomposition=PCA(), model_strategy='single', param_distributions={'RandomForestRegressor__criterion': ['mse', 'mae'], 'RandomForestRegressor__n_estimators': [100, 200, 500], 'RandomForestRegressor__warm_start': [False, True]}, estimator_args={'criterion': 'mae', 'n_jobs': 1, 'random_state': 42, 'verbose': False}, scaler_args={}, randomized_search_args={}, pipeline_args={})
Harmonization tool to mitigate scanner bias.
- Parameters
- features: list
Target features to be harmonized, for example, ROIs.
- regression_features: list
Features used to derive harmonization rules, for example, IQMs.
- covariates: list
Variables for which we want to eliminate the bias, for example, age, sex, and scanner.
- estimator: sklearn estimator, default=RandomForestRegressor()
Model used for the harmonization regression.
- scaler: sklearn scaler, default=StandardScaler()
Scaler used as the first step of the harmonization regression.
- model_strategy: {"single", "full"}, default="single"
If "single", one model is trained for each feature in features. If "full", a single model regresses all the features at once.
- param_distributions: dict, default=dict(RandomForestRegressor__n_estimators=[100, 200, 500], RandomForestRegressor__warm_start=[False, True])
Distributions of parameters to be tested by the RandomizedSearchCV.
- estimator_args: dict
Parameters for the estimator.
- scaler_args: dict
Parameters for the scaler.
- randomized_search_args: dict
Parameters for the RandomizedSearchCV. See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html
- pipeline_args: dict
Parameters for the sklearn Pipeline. See https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
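The keys of param_distributions follow the scikit-learn pipeline convention `<step name>__<parameter>`. The sketch below uses only the standard library to mimic how a randomized search might draw candidate settings from such a dictionary. `sample_candidates` is a hypothetical helper, not part of Neuroharmony or scikit-learn, and it is simplified: it samples with replacement and does not fit any model.

```python
import random

# Search space in the same shape as the param_distributions default above.
param_distributions = {
    "RandomForestRegressor__n_estimators": [100, 200, 500],
    "RandomForestRegressor__warm_start": [False, True],
}


def sample_candidates(distributions, n_iter, seed=0):
    """Draw n_iter random parameter settings (hypothetical helper)."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_iter):
        candidate = {}
        for key, values in distributions.items():
            # "step__param" names a pipeline step and one of its parameters.
            step, param = key.split("__", 1)
            candidate.setdefault(step, {})[param] = rng.choice(values)
        candidates.append(candidate)
    return candidates


candidates = sample_candidates(param_distributions, n_iter=3)
for candidate in candidates:
    print(candidate)
```

When adding `RandomForestRegressor__criterion` to the search space, note that recent scikit-learn releases renamed the values 'mse' and 'mae' to 'squared_error' and 'absolute_error', so the accepted names may depend on the installed version.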
- Attributes
- X_harmonized_: NDFrame of shape [n_subjects, n_features]
Harmonized input data.
- leaveonegroupout_
Leave One Group Out cross-validator.
- models_by_feature_
Estimators by features.
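The leaveonegroupout_ attribute refers to leave-one-group-out cross-validation, in which each group is held out once while the remaining groups are used for training. A minimal standard-library sketch of that splitting scheme follows. Grouping subjects by scanner is an assumption made for illustration, and `leave_one_group_out` is a hypothetical helper rather than the validator Neuroharmony stores.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each group."""
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    for held_out, test_indices in by_group.items():
        # Every subject outside the held-out group is used for training.
        train_indices = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_indices, test_indices


# Hypothetical scanner label for each of six subjects.
scanners = ["A", "A", "B", "B", "C", "C"]
splits = list(leave_one_group_out(scanners))
for held_out, train, test in splits:
    print(held_out, train, test)
```

Each subject appears in exactly one test fold, so the number of folds equals the number of distinct groups.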
- fit(df)
Fit the model.
Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.
- Parameters
- df: NDFrame of shape [n_subjects, n_features]
Training data. Must fulfil input requirements of the first step of the pipeline.
- Returns
- self: Neuroharmony
This estimator.
- refit(df)
Refit an already trained model on a new dataset.
- Parameters
- df: NDFrame of shape [n_samples, n_features]
Pandas dataframe with features, regression_features and covariates.
- fit_transform(df)
Fit to data, then transform it.
Fits the transformer to df and returns a transformed version of df.
- Parameters
- df: NDFrame of shape [n_subjects, n_features]
Training set.
- Returns
- harmonized_: NDFrame of shape [n_samples, n_features_new]
Data harmonized with ComBat.
- transform(df)
Predict the regression target for df.
With a random forest estimator such as the default RandomForestRegressor, the predicted regression target of an input sample is the mean of the regression targets predicted by the trees in the forest.
- Parameters
- df: NDFrame of shape [n_samples, n_features]
Pandas dataframe with features, regression_features and covariates.
- Returns
- y: NDFrame of shape [n_samples, n_features]
Data harmonized with Neuroharmony.
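The methods above suggest a workflow of fitting on a training dataframe and then transforming unseen data. The following is a hedged sketch of that workflow, not a definitive recipe. All column names are hypothetical, and passing a list of covariate names as eliminate_variance is an assumption, since that parameter is not described on this page. The import is deferred inside `build_model` so the sketch loads even when the neuroharmony package is not installed.

```python
def build_model():
    """Construct a Neuroharmony model (requires the neuroharmony package)."""
    # Imported lazily so the rest of the sketch loads without the package.
    from neuroharmony import Neuroharmony

    return Neuroharmony(
        features=["roi_1", "roi_2"],           # hypothetical ROI columns
        regression_features=["cnr", "snr"],    # hypothetical IQM columns
        covariates=["age", "sex", "scanner"],  # hypothetical covariate columns
        eliminate_variance=["scanner"],        # assumption: list of covariates
    )


def harmonize_unseen(model, train_df, new_df):
    """Fit the model on train_df, then harmonize new_df with it."""
    model.fit(train_df)  # fit(df) is documented to return the estimator
    return model.transform(new_df)
```

With pandas dataframes that contain the columns named above, `harmonize_unseen(build_model(), train_df, new_df)` would return the harmonized version of `new_df`.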