osl_dynamics.analysis.prediction

Classes/functions for building machine learning pipelines, performing hyperparameter tuning, and evaluating model performance.

Classes

PipelineBuilder

A class to handle the design of machine learning pipelines with options for scaling, dimensionality reduction, and model selection.

ModelSelection

Model selection class for hyperparameter tuning and cross-validated evaluation of a scikit-learn estimator.

Module Contents

class osl_dynamics.analysis.prediction.PipelineBuilder

A class to handle the design of machine learning pipelines with options for scaling, dimensionality reduction, and model selection.

DEFAULT_SCALER = None
DEFAULT_DIM_REDUCTION = None
DEFAULT_PREDICTOR = 'ols'
scaler_dict
dim_reduction_dict
predictor_dict
property available_scalers: List[str]
Return type:

List[str]

property available_dim_reductions: List[str]
Return type:

List[str]

property available_predictors: List[str]
Return type:

List[str]

validate_model(scaler=None, dim_reduction=None, predictor=None)

Validates the provided model components (scaler, dimensionality reduction, and predictor).

Parameters:
  • scaler (str or None, optional) – Name of the scaler to use. If None, the default scaler (DEFAULT_SCALER) is used.

  • dim_reduction (str or None, optional) – Name of the dimensionality reduction technique to use. If None, no reduction is applied.

  • predictor (str or None, optional) – Name of the predictor to use. If None, the default predictor (DEFAULT_PREDICTOR, 'ols') is used.

Raises:

ValueError – If a provided name is not one of the available scalers, dimensionality reduction techniques, or predictors.

Return type:

None
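The name check that validate_model performs can be sketched with plain dictionaries. This is a minimal illustration, not the osl_dynamics implementation. The registries SCALERS, DIM_REDUCTIONS and PREDICTORS are hypothetical stand-ins for scaler_dict, dim_reduction_dict and predictor_dict. Their keys are also assumptions, except 'ols', which is the documented DEFAULT_PREDICTOR.

```python
from typing import Dict, Optional

# Hypothetical registries standing in for scaler_dict, dim_reduction_dict and
# predictor_dict; only 'ols' is taken from the documentation.
SCALERS: Dict[str, object] = {"standard": None, "minmax": None}
DIM_REDUCTIONS: Dict[str, object] = {"pca": None}
PREDICTORS: Dict[str, object] = {"ols": None, "ridge": None}


def validate_model(
    scaler: Optional[str] = None,
    dim_reduction: Optional[str] = None,
    predictor: Optional[str] = None,
) -> None:
    """Raise ValueError if a non-None component name is not registered."""
    checks = [
        ("scaler", scaler, SCALERS),
        ("dim_reduction", dim_reduction, DIM_REDUCTIONS),
        ("predictor", predictor, PREDICTORS),
    ]
    for label, name, registry in checks:
        # None means "use the default or skip this step", so it always passes
        if name is not None and name not in registry:
            raise ValueError(
                f"Invalid {label} {name!r}; available: {sorted(registry)}"
            )
```

Under these assumptions, validate_model(scaler="standard", predictor="ols") returns silently. validate_model(predictor="svm") raises a ValueError that lists the registered predictors.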

build_model(scaler=None, dim_reduction=None, predictor=None)

Constructs and returns a scikit-learn pipeline with the specified components.

Parameters:
  • scaler (str or None, optional) – Name of the scaler to use. If None, the default scaler (DEFAULT_SCALER) is used.

  • dim_reduction (str or None, optional) – Name of the dimensionality reduction technique to use. If None, no reduction is applied.

  • predictor (str or None, optional) – Name of the predictor to use. If None, the default predictor (DEFAULT_PREDICTOR, 'ols') is used.

Returns:

A scikit-learn Pipeline object with the specified components.

Return type:

Pipeline
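A sketch of how build_model could assemble a scikit-learn Pipeline. It makes three assumptions, none confirmed by this page:

- Components left as None are simply omitted.
- The steps are named "scaler", "dim_reduction" and "predictor".
- 'ols' maps to LinearRegression.

The registry keys besides 'ols' are illustrative.

```python
from typing import Optional

from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical name-to-class registries (assumed, not the library's own dicts)
SCALERS = {"standard": StandardScaler}
DIM_REDUCTIONS = {"pca": PCA}
PREDICTORS = {"ols": LinearRegression, "ridge": Ridge}


def build_model(
    scaler: Optional[str] = None,
    dim_reduction: Optional[str] = None,
    predictor: Optional[str] = None,
) -> Pipeline:
    """Assemble a Pipeline, skipping the scaler/reduction steps when None."""
    predictor = predictor or "ols"  # fall back to the default predictor
    steps = []
    if scaler is not None:
        steps.append(("scaler", SCALERS[scaler]()))
    if dim_reduction is not None:
        steps.append(("dim_reduction", DIM_REDUCTIONS[dim_reduction]()))
    steps.append(("predictor", PREDICTORS[predictor]()))
    return Pipeline(steps)


pipe = build_model(scaler="standard", dim_reduction="pca")
print([name for name, _ in pipe.steps])  # ['scaler', 'dim_reduction', 'predictor']
```

Called with no arguments, this sketch returns a one-step pipeline that contains only the default predictor.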

get_params_grid(scalar_params=None, dim_reduction_params=None, predictor_params=None)

Returns a combined parameter grid for use in hyperparameter optimization (e.g., GridSearchCV).

Parameters:
  • scalar_params (dict, optional) – A dictionary of parameters to be passed to the scaler.

  • dim_reduction_params (dict, optional) – A dictionary of parameters to be passed to the dimensionality reduction technique.

  • predictor_params (dict, optional) – A dictionary of parameters to be passed to the model.

Returns:

A dictionary combining the parameters for all the components in the pipeline.

Return type:

dict
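scikit-learn addresses a parameter of a Pipeline step as `<step>__<param>`. A combined grid for GridSearchCV can therefore be built by prefixing each component's keys with its step name. The sketch below assumes the hypothetical step names scaler, dim_reduction and predictor. It keeps the documented argument name scalar_params.

```python
from typing import Any, Dict, List, Optional


def get_params_grid(
    scalar_params: Optional[Dict[str, List[Any]]] = None,
    dim_reduction_params: Optional[Dict[str, List[Any]]] = None,
    predictor_params: Optional[Dict[str, List[Any]]] = None,
) -> Dict[str, List[Any]]:
    """Merge per-step grids into one grid keyed by '<step>__<param>'."""
    grid: Dict[str, List[Any]] = {}
    # Step names are assumptions; they must match the Pipeline's step names
    for step, params in [
        ("scaler", scalar_params),
        ("dim_reduction", dim_reduction_params),
        ("predictor", predictor_params),
    ]:
        for key, values in (params or {}).items():
            grid[f"{step}__{key}"] = values
    return grid


grid = get_params_grid(
    dim_reduction_params={"n_components": [2, 5]},
    predictor_params={"alpha": [0.1, 1.0]},
)
print(grid)  # {'dim_reduction__n_components': [2, 5], 'predictor__alpha': [0.1, 1.0]}
```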

class osl_dynamics.analysis.prediction.ModelSelection(model, params_grid=None, search_type='grid', cv=5, scoring=None, n_iter=10, random_state=None, n_jobs=1, verbose=0)

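Hyperparameter tuning and cross-validated evaluation of a scikit-learn estimator, using GridSearchCV or RandomizedSearchCV.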

Parameters:
  • model (sklearn.base.BaseEstimator) – The estimator to tune and evaluate, for example a Pipeline returned by PipelineBuilder.build_model.

  • params_grid (dict, optional) – The hyperparameter grid for tuning.

  • search_type (str, optional) – The type of search: ‘grid’ for GridSearchCV or ‘random’ for RandomizedSearchCV. Defaults to ‘grid’.

  • cv (int, optional) – Number of cross-validation folds. Defaults to 5.

  • scoring (str, optional) – Scoring metric to optimize. Defaults to None, in which case the estimator's default score method is used.

  • n_iter (int, optional) – Number of iterations for RandomizedSearchCV. Defaults to 10.

  • random_state (int, optional) – Random seed for reproducibility.

  • n_jobs (int, optional) – Number of CPU cores to use for parallel processing. Defaults to 1.

  • verbose (int, optional) – Verbosity level for model selection methods. Defaults to 0.
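The search_type argument selects between GridSearchCV and RandomizedSearchCV. Below is a sketch of that dispatch using make_search, a hypothetical helper that forwards the constructor arguments. How ModelSelection actually forwards them is an assumption.

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


def make_search(model, params_grid, search_type="grid", cv=5, scoring=None,
                n_iter=10, random_state=None, n_jobs=1, verbose=0):
    """Map ModelSelection-style arguments onto a scikit-learn search object."""
    if search_type == "grid":
        # Exhaustive search: n_iter and random_state have no role here
        return GridSearchCV(model, params_grid, cv=cv, scoring=scoring,
                            n_jobs=n_jobs, verbose=verbose)
    if search_type == "random":
        # Samples n_iter parameter settings from params_grid
        return RandomizedSearchCV(model, params_grid, n_iter=n_iter, cv=cv,
                                  scoring=scoring, random_state=random_state,
                                  n_jobs=n_jobs, verbose=verbose)
    raise ValueError("search_type must be 'grid' or 'random'")


grid_search = make_search(Ridge(), {"alpha": [0.1, 1.0]})
random_search = make_search(Ridge(), {"alpha": [0.1, 1.0]},
                            search_type="random", n_iter=2, random_state=0)
```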

model
params_grid = None
search_type = 'grid'
cv = 5
scoring = None
n_iter = 10
random_state = None
n_jobs = 1
verbose = 0
best_model = None
best_params = None
set_params_grid(params_grid)

Sets the hyperparameter grid for tuning.

Parameters:

params_grid (dict) – The hyperparameter grid for tuning.

Return type:

None

set_cv(cv)

Sets the number of cross-validation folds.

Parameters:

cv (int) – Number of cross-validation folds.

Return type:

None

set_scoring(scoring)

Sets the scoring metric.

Parameters:

scoring (str) – The scoring metric to use.

Return type:

None

set_n_iter(n_iter)

Sets the number of iterations for RandomizedSearchCV.

Parameters:

n_iter (int) – Number of iterations; must be a positive integer.

Return type:

None
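For random search, n_iter bounds how many parameter settings are sampled. The sketch below pairs check_n_iter, a hypothetical validator based on an assumed reading of "must be a positive integer", with RandomizedSearchCV. When the grid is larger than n_iter, exactly n_iter candidates are evaluated.

```python
import numbers

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV


def check_n_iter(n_iter):
    """Return n_iter if it is a positive integer, else raise ValueError."""
    # bool is a subclass of int, so exclude it explicitly
    if (isinstance(n_iter, bool) or not isinstance(n_iter, numbers.Integral)
            or n_iter < 1):
        raise ValueError(f"n_iter must be a positive integer, got {n_iter!r}")
    return n_iter


# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)

# Ten candidate alphas, but only n_iter=4 of them are sampled and evaluated
search = RandomizedSearchCV(
    Ridge(),
    {"alpha": np.logspace(-3, 2, 10).tolist()},
    n_iter=check_n_iter(4),
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(len(search.cv_results_["params"]))  # 4
```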

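validate_data(X, y)

Validates the input data X and y.

Parameters:
  • X (Union[numpy.ndarray, list]) – Feature matrix of shape (n_samples, n_features).

  • y (Union[numpy.ndarray, list]) – Target variable of shape (n_samples,).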

Return type:

None
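This page gives only the signature of validate_data. A plausible minimal version is sketched below. It assumes the method checks the shapes stated for model_selection: X of shape (n_samples, n_features) and y of shape (n_samples,). The real checks may differ.

```python
import numpy as np


def validate_data(X, y) -> None:
    """Check that X is 2-D, y is 1-D, and both have the same number of samples."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D (n_samples, n_features), got ndim={X.ndim}")
    if y.ndim != 1:
        raise ValueError(f"y must be 1-D (n_samples,), got ndim={y.ndim}")
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"X has {X.shape[0]} samples but y has {y.shape[0]}")


validate_data([[1.0, 2.0], [3.0, 4.0]], [0, 1])  # passes silently
```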

model_selection(X, y, override_best_model=True)

Performs hyperparameter tuning using cross-validation.

Parameters:
  • X (array-like) – Feature matrix of shape (n_samples, n_features).

  • y (array-like) – Target variable of shape (n_samples,).

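  • override_best_model (bool, optional) – If True, the search results are stored in the best_model and best_params attributes. Defaults to True.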

Returns:

  • best_params_ (dict) – Best hyperparameters found.

  • best_score_ (float) – Best cross-validation score achieved.
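model_selection is documented as returning the best hyperparameters and the best cross-validation score. These correspond to the best_params_ and best_score_ attributes of a fitted scikit-learn search. The sketch below runs GridSearchCV on synthetic data. The pipeline, grid and scoring choice are illustrative assumptions, and so is the way best_model would be populated.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

pipe = Pipeline([("scaler", StandardScaler()), ("predictor", Ridge())])
search = GridSearchCV(pipe, {"predictor__alpha": [0.01, 0.1, 1.0, 10.0]},
                      cv=5, scoring="r2")
search.fit(X, y)

best_model = search.best_estimator_  # refit on all of X, y (refit=True by default)
best_params, best_score = search.best_params_, search.best_score_
print(best_params, round(best_score, 3))
```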

nested_cross_validation(X, y, split_type='kfold', outer_cv=5, shuffle=True)

Performs nested cross-validation to evaluate model performance.

Parameters:
  • X (array-like) – Feature matrix of shape (n_samples, n_features).

  • y (array-like) – Target variable of shape (n_samples,).

  • split_type (str, optional) – Type of cross-validation split to use. Must be ‘kfold’ or ‘stratified_kfold’. Defaults to ‘kfold’.

  • outer_cv (int, optional) – Number of outer cross-validation folds. Defaults to 5.

  • shuffle (bool, optional) – Whether to shuffle the data before splitting. Defaults to True.

Returns:

outer_scores – Array of test scores for each outer fold.

Return type:

np.ndarray
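In nested cross-validation, the hyperparameter search is refit inside each outer training split, so the outer test folds never influence tuning. The sketch below wraps a GridSearchCV in cross_val_score. The helper nested_cv and its random_state argument are hypothetical. In ModelSelection, the seed would presumably come from the random_state attribute.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV, KFold, StratifiedKFold, cross_val_score,
)


def nested_cv(search, X, y, split_type="kfold", outer_cv=5, shuffle=True,
              random_state=None):
    """Score a hyperparameter search on outer folds it never tuned on."""
    splitters = {"kfold": KFold, "stratified_kfold": StratifiedKFold}
    if split_type not in splitters:
        raise ValueError("split_type must be 'kfold' or 'stratified_kfold'")
    # scikit-learn rejects a random_state when shuffle=False, so drop it then
    outer = splitters[split_type](
        n_splits=outer_cv, shuffle=shuffle,
        random_state=random_state if shuffle else None,
    )
    # cross_val_score clones and refits the whole search on each outer
    # training split, so the inner CV only ever sees training data
    return cross_val_score(search, X, y, cv=outer)


X, y = make_classification(n_samples=200, n_features=10, random_state=0)
inner = GridSearchCV(LogisticRegression(max_iter=1000),
                     {"C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
outer_scores = nested_cv(inner, X, y, split_type="stratified_kfold",
                         random_state=0)
print(outer_scores.round(3), outer_scores.mean().round(3))
```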

cross_validation_scores(X, y, cv=None, scoring=None)

Computes cross-validation scores for the best model.

Parameters:
  • X (array-like) – Feature matrix of shape (n_samples, n_features).

  • y (array-like) – Target variable of shape (n_samples,).

  • cv (int, optional) – Number of cross-validation folds. Defaults to the instance’s cv attribute.

  • scoring (str, optional) – Scoring metric to use. Defaults to the instance’s scoring attribute.

Returns:

scores – Cross-validation scores.

Return type:

np.ndarray
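cross_validation_scores falls back to the instance's cv and scoring attributes when its arguments are None. The sketch below shows that fallback with a hypothetical stand-in class. The guard against a missing best model is an assumption, not documented behaviour.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


class ScoringSketch:
    """Hypothetical stand-in holding ModelSelection-style defaults."""

    def __init__(self, best_model=None, cv=5, scoring=None):
        self.best_model = best_model
        self.cv = cv
        self.scoring = scoring

    def cross_validation_scores(self, X, y, cv=None, scoring=None):
        # Assumed guard: there is nothing to score before a model is selected
        if self.best_model is None:
            raise RuntimeError("No best model; run model selection first.")
        # Arguments left as None fall back to the instance attributes
        cv = self.cv if cv is None else cv
        scoring = self.scoring if scoring is None else scoring
        return cross_val_score(self.best_model, X, y, cv=cv, scoring=scoring)


X, y = make_regression(n_samples=60, n_features=3, noise=5.0, random_state=0)
sketch = ScoringSketch(best_model=LinearRegression(), cv=5, scoring="r2")
print(sketch.cross_validation_scores(X, y).shape)        # (5,)
print(sketch.cross_validation_scores(X, y, cv=3).shape)  # (3,)
```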