osl_dynamics.analysis.prediction

Classes/functions for building machine learning pipelines, performing hyperparameter tuning, and evaluating model performance.

Classes

`PipelineBuilder`	A class to handle the design of machine learning pipelines with options for scaling, dimensionality reduction, and model selection.
`ModelSelection`	Model selection class.

Module Contents

class osl_dynamics.analysis.prediction.PipelineBuilder

A class to handle the design of machine learning pipelines with options for scaling, dimensionality reduction, and model selection.

DEFAULT_SCALER = None

DEFAULT_DIM_REDUCTION = None

DEFAULT_PREDICTOR = 'ols'

scaler_dict

dim_reduction_dict

predictor_dict

property available_scalers: List[str]

Return type: List[str]

property available_dim_reductions: List[str]

Return type: List[str]

property available_predictors: List[str]

Return type: List[str]

validate_model(scaler=None, dim_reduction=None, predictor=None)

Validates the provided model components (scaler, dimensionality reduction, and predictor).

Parameters:

scaler (str or None, optional) – The scaler name to use. If None, the default scaler is used.
dim_reduction (str or None, optional) – The dimensionality reduction technique to use. If None, no reduction is applied.
predictor (str or None, optional) – The model name to use. If None, the default predictor is used.

Raises:

ValueError – If any of the provided models or techniques are invalid.

Return type:

None
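The validation described above amounts to checking each supplied name against a set of known components. The following is a minimal sketch of that idea in plain Python, not the osl-dynamics implementation: the registries `SCALERS`, `DIM_REDUCTIONS`, and `PREDICTORS`, the helper `validate_components`, and every key except 'ols' are hypothetical names chosen for illustration.

```python
# Hypothetical registries; only 'ols' appears in the documentation above.
SCALERS = {"standard", "minmax"}
DIM_REDUCTIONS = {"pca"}
PREDICTORS = {"ols", "ridge"}


def validate_components(scaler=None, dim_reduction=None, predictor=None):
    """Raise ValueError if a supplied name is not in its registry."""
    checks = [
        ("scaler", scaler, SCALERS),
        ("dim_reduction", dim_reduction, DIM_REDUCTIONS),
        ("predictor", predictor, PREDICTORS),
    ]
    for label, name, allowed in checks:
        # None means "fall back to the default", so it is always accepted.
        if name is not None and name not in allowed:
            raise ValueError(
                f"Unknown {label} '{name}'. Available: {sorted(allowed)}"
            )


# Valid names pass silently; an unknown name would raise ValueError.
validate_components(scaler="standard", predictor="ols")
```

In the real class, the `available_scalers`, `available_dim_reductions`, and `available_predictors` properties are the place to look up which names are accepted.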

build_model(scaler=None, dim_reduction=None, predictor=None)

Constructs and returns a scikit-learn pipeline with the specified components.

Parameters:

scaler (str or None, optional) – The scaler name to use. If None, the default scaler is used.
dim_reduction (str or None, optional) – The dimensionality reduction technique to use. If None, no reduction is applied.
predictor (str or None, optional) – The model name to use. If None, the default predictor is used.

Returns:

A scikit-learn Pipeline object with the specified components.

Return type:

Pipeline
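To make the returned object concrete, here is a sketch that assembles a comparable pipeline directly with scikit-learn. It is an illustration under assumptions rather than the library's code: the step names ("scaler", "dim_reduction", "predictor"), the helper `build_pipeline`, and the mapping of 'ols' to `LinearRegression` are all assumed.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(use_scaler=True, n_components=None):
    """Assemble a Pipeline, skipping optional steps that are switched off."""
    steps = []
    if use_scaler:
        steps.append(("scaler", StandardScaler()))
    if n_components is not None:
        steps.append(("dim_reduction", PCA(n_components=n_components)))
    # Ordinary least squares as the final estimator (assumed meaning of 'ols').
    steps.append(("predictor", LinearRegression()))
    return Pipeline(steps)


pipeline = build_pipeline(use_scaler=True, n_components=2)
step_names = [name for name, _ in pipeline.steps]
```

Leaving a component out simply drops that step, which matches the documented behaviour that `None` means no scaling or reduction is applied when the defaults are `None`.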

get_params_grid(scalar_params=None, dim_reduction_params=None, predictor_params=None)

Returns a combined parameter grid for use in hyperparameter optimization (e.g., GridSearchCV).

Parameters:

scalar_params (dict, optional) – A dictionary of parameters to be passed to the scaler.
dim_reduction_params (dict, optional) – A dictionary of parameters to be passed to the dimensionality reduction technique.
predictor_params (dict, optional) – A dictionary of parameters to be passed to the model.

Returns:

A dictionary combining the parameters for all the components in the pipeline.

Return type:

dict
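scikit-learn addresses the parameters of a pipeline step as `<step>__<parameter>`, so a combined grid typically prefixes each key with its step name. The sketch below shows one way such a merge could work. Whether `get_params_grid` uses these exact prefixes is an assumption, and `combine_params` is a hypothetical helper.

```python
def combine_params(scaler_params=None, dim_reduction_params=None,
                   predictor_params=None):
    """Merge per-step parameter dicts into one grid with step prefixes."""
    groups = {
        "scaler": scaler_params,
        "dim_reduction": dim_reduction_params,
        "predictor": predictor_params,
    }
    grid = {}
    for step, params in groups.items():
        # A missing dict contributes nothing to the grid.
        for key, values in (params or {}).items():
            grid[f"{step}__{key}"] = values
    return grid


grid = combine_params(
    dim_reduction_params={"n_components": [2, 5]},
    predictor_params={"alpha": [0.1, 1.0]},
)
```

A dictionary in this form can be passed as `param_grid` to `GridSearchCV`, provided the prefixes match the step names of the pipeline being tuned.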

class osl_dynamics.analysis.prediction.ModelSelection(model, params_grid=None, search_type='grid', cv=5, scoring=None, n_iter=10, random_state=None, n_jobs=1, verbose=0)

Model selection class.

Parameters:

model (sklearn.base.BaseEstimator) – The machine learning model to be used.
params_grid (dict, optional) – The hyperparameter grid for tuning.
search_type (str, optional) – The type of search: ‘grid’ for GridSearchCV or ‘random’ for RandomizedSearchCV. Defaults to ‘grid’.
cv (int, optional) – Number of cross-validation folds. Defaults to 5.
scoring (str, optional) – Scoring metric to optimize. Defaults to None.
n_iter (int, optional) – Number of iterations for RandomizedSearchCV. Defaults to 10.
random_state (int, optional) – Random seed for reproducibility.
n_jobs (int, optional) – Number of CPU cores to use for parallel processing. Defaults to 1.
verbose (int, optional) – Verbosity level for model selection methods. Defaults to 0.

model

params_grid = None

search_type = 'grid'

cv = 5

scoring = None

n_iter = 10

random_state = None

n_jobs = 1

verbose = 0

best_model = None

best_params = None

set_params_grid(params_grid)

Sets the hyperparameter grid for tuning.

Parameters: params_grid (dict) – The hyperparameter grid for tuning.
Return type: None

set_cv(cv)

Sets the number of cross-validation folds.

Parameters: cv (int) – Number of cross-validation folds.
Return type: None

set_scoring(scoring)

Sets the scoring metric.

Parameters: scoring (str) – The scoring metric to use.
Return type: None

set_n_iter(n_iter)

Sets the number of iterations for RandomizedSearchCV.

Parameters: n_iter (int) – Number of iterations (must be a positive integer).
Return type: None

validate_data(X, y)

Parameters:

X (Union[numpy.ndarray, list])
y (Union[numpy.ndarray, list])

Return type:

None

model_selection(X, y, override_best_model=True)

Performs hyperparameter tuning using cross-validation.

Parameters:

X (array-like) – Feature matrix of shape (n_samples, n_features).
y (array-like) – Target variable of shape (n_samples,).
override_best_model (bool)

Returns:

best_params_ (dict) – Best hyperparameters found.
best_score_ (float) – Best cross-validation score achieved.
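The two search types correspond to scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. The sketch below runs both directly on synthetic data to show where the returned best parameters and best score come from. The data, the `Ridge` estimator, and the helper `run_search` are illustrative assumptions, not part of osl-dynamics.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic regression data: 60 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=60)

params_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}


def run_search(search_type="grid", cv=5, n_iter=3, random_state=0):
    """Tune Ridge's alpha with an exhaustive or a randomized search."""
    if search_type == "grid":
        search = GridSearchCV(Ridge(), params_grid, cv=cv)
    elif search_type == "random":
        # n_iter and random_state only matter for the randomized search.
        search = RandomizedSearchCV(
            Ridge(), params_grid, n_iter=n_iter, cv=cv,
            random_state=random_state,
        )
    else:
        raise ValueError("search_type must be 'grid' or 'random'")
    search.fit(X, y)
    return search.best_params_, search.best_score_


best_params, best_score = run_search("grid")
```

With `scoring=None`, scikit-learn falls back to the estimator's own `score` method, which is R² for regressors such as `Ridge`.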

nested_cross_validation(X, y, split_type='kfold', outer_cv=5, shuffle=True)

Performs nested cross-validation to evaluate model performance.

Parameters:

X (array-like) – Feature matrix of shape (n_samples, n_features).
y (array-like) – Target variable of shape (n_samples,).
split_type (str, optional) – Type of cross-validation split to use. Must be ‘kfold’ or ‘stratified_kfold’. Defaults to ‘kfold’.
outer_cv (int, optional) – Number of outer cross-validation folds. Defaults to 5.
shuffle (bool, optional) – Whether to shuffle the data before splitting. Defaults to True.

Returns:

outer_scores – Array of test scores for each outer fold.

Return type:

np.ndarray
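Nested cross-validation wraps the hyperparameter search in an outer loop, so each outer test fold is scored by a model tuned only on the remaining folds. A minimal scikit-learn sketch of the pattern follows. The synthetic data, the `Ridge` estimator, and the grid values are assumptions, and the method above may organise its loops differently.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic regression data: 50 samples, 3 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

# Inner loop: hyperparameter search, refit on each outer training split.
inner_search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=3)

# Outer loop: shuffled k-fold split used only for performance estimation.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)

# One test score per outer fold.
outer_scores = cross_val_score(inner_search, X, y, cv=outer_cv)
```

For classification targets, `StratifiedKFold` can replace `KFold` to keep class proportions similar across folds, which is the usual reason for choosing a stratified split.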

cross_validation_scores(X, y, cv=None, scoring=None)

Computes cross-validation scores for the best model.

Parameters:

X (array-like) – Feature matrix of shape (n_samples, n_features).
y (array-like) – Target variable of shape (n_samples,).
cv (int, optional) – Number of cross-validation folds. Defaults to the instance’s cv attribute.
scoring (str, optional) – Scoring metric to use. Defaults to the instance’s scoring attribute.

Returns:

scores – Cross-validation scores.

Return type:

np.ndarray
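As a rough equivalent in plain scikit-learn, `cross_val_score` takes the same kind of `cv` and `scoring` arguments and returns one score per fold. In this sketch a `LinearRegression` stands in for the best model, and the data and the choice of metric are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data: 40 samples, 2 features.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.5, -0.5]) + 0.1 * rng.normal(size=40)

# Stand-in for the tuned model; scored by negated mean squared error.
model = LinearRegression()
scores = cross_val_score(model, X, y, cv=4, scoring="neg_mean_squared_error")
```

scikit-learn scorers follow a "higher is better" convention, so error metrics are negated and these scores are never positive.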