osl_dynamics.analysis.prediction
Classes/functions for building machine learning pipelines, performing hyperparameter tuning, and evaluating model performance.
Classes
| PipelineBuilder | A class to handle the design of machine learning pipelines with options for scaling, dimensionality reduction, and model selection. |
| ModelSelection | Model selection class. |
Module Contents
- class osl_dynamics.analysis.prediction.PipelineBuilder
A class to handle the design of machine learning pipelines with options for scaling, dimensionality reduction, and model selection.
- validate_model(scaler=None, dim_reduction=None, predictor=None)
Validates the provided model components (scaler, dimensionality reduction, and predictor).
- Parameters:
scaler (str or None, optional) – The scaler name to use. If None, the default scaler is used.
dim_reduction (str or None, optional) – The dimensionality reduction technique to use. If None, no reduction is applied.
predictor (str or None, optional) – The model name to use. If None, the default predictor is used.
- Raises:
ValueError – If any of the provided models or techniques are invalid.
- Return type:
None
- build_model(scaler=None, dim_reduction=None, predictor=None)
Constructs and returns a scikit-learn pipeline with the specified components.
- Parameters:
scaler (str or None, optional) – The scaler name to use. If None, the default scaler is used.
dim_reduction (str or None, optional) – The dimensionality reduction technique to use. If None, no reduction is applied.
predictor (str or None, optional) – The model name to use. If None, the default predictor is used.
- Returns:
A scikit-learn Pipeline object with the specified components.
- Return type:
Pipeline
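As a rough illustration of the kind of object `build_model` returns, the sketch below assembles a comparable pipeline directly with scikit-learn. The step names (`scaler`, `dim_reduction`, `predictor`) and the choice of `StandardScaler`, `PCA`, and `Ridge` are assumptions made for this example; they are not taken from `PipelineBuilder`'s own defaults.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical step names and components; PipelineBuilder may choose differently.
pipeline = Pipeline(
    steps=[
        ("scaler", StandardScaler()),            # standardise each feature
        ("dim_reduction", PCA(n_components=5)),  # keep 5 principal components
        ("predictor", Ridge(alpha=1.0)),         # final estimator
    ]
)

# Synthetic data: 100 samples, 20 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = X[:, 0] + 0.1 * rng.standard_normal(100)

# Fitting runs each transformer in order, then fits the predictor.
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Because the steps are chained in a single estimator, the scaler and the dimensionality reduction are refitted on each training split when the pipeline is passed to a cross-validation routine.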
- get_params_grid(scalar_params=None, dim_reduction_params=None, predictor_params=None)
Returns a combined parameter grid for use in hyperparameter optimization (e.g., GridSearchCV).
- Parameters:
scalar_params (dict, optional) – A dictionary of parameters to be passed to the scaler.
dim_reduction_params (dict, optional) – A dictionary of parameters to be passed to the dimensionality reduction technique.
predictor_params (dict, optional) – A dictionary of parameters to be passed to the model.
- Returns:
A dictionary combining the parameters for all the components in the pipeline.
- Return type:
dict
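scikit-learn addresses the parameters of a pipeline step as `<step>__<param>`, so a combined grid is typically a flat dictionary with prefixed keys. The sketch below shows one way such a merge could look. The helper `combine_param_grids` and the step names are hypothetical, and whether `get_params_grid` applies this prefixing itself is an assumption.

```python
def combine_param_grids(**step_grids):
    """Merge per-step grids into one dict keyed as '<step>__<param>'.

    Hypothetical helper for illustration; not part of osl_dynamics.
    """
    combined = {}
    for step_name, grid in step_grids.items():
        # A step with no grid (None) contributes nothing.
        for param, values in (grid or {}).items():
            combined[f"{step_name}__{param}"] = values
    return combined


params_grid = combine_param_grids(
    scaler=None,
    dim_reduction={"n_components": [5, 10]},
    predictor={"alpha": [0.1, 1.0, 10.0]},
)
print(params_grid)
# {'dim_reduction__n_components': [5, 10], 'predictor__alpha': [0.1, 1.0, 10.0]}
```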
- class osl_dynamics.analysis.prediction.ModelSelection(model, params_grid=None, search_type='grid', cv=5, scoring=None, n_iter=10, random_state=None, n_jobs=1, verbose=0)
Model selection class for hyperparameter tuning (grid or randomized search) and cross-validated performance evaluation.
- Parameters:
model (sklearn.base.BaseEstimator) – The machine learning model to be used.
params_grid (dict, optional) – The hyperparameter grid for tuning.
search_type (str, optional) – The type of search: ‘grid’ for GridSearchCV or ‘random’ for RandomizedSearchCV. Defaults to ‘grid’.
cv (int, optional) – Number of cross-validation folds. Defaults to 5.
scoring (str, optional) – Scoring metric to optimize. Defaults to None.
n_iter (int, optional) – Number of iterations for RandomizedSearchCV. Defaults to 10.
random_state (int, optional) – Random seed for reproducibility.
n_jobs (int, optional) – Number of CPU cores to use for parallel processing. Defaults to 1.
verbose (int, optional) – Verbosity level for model selection methods. Defaults to 0.
- set_params_grid(params_grid)
Sets the hyperparameter grid for tuning.
- Parameters:
params_grid (dict) – The hyperparameter grid for tuning.
- Return type:
None
- set_cv(cv)
Sets the number of cross-validation folds.
- Parameters:
cv (int) – Number of cross-validation folds.
- Return type:
None
- set_scoring(scoring)
Sets the scoring metric.
- Parameters:
scoring (str) – The scoring metric to use.
- Return type:
None
- set_n_iter(n_iter)
Sets the number of iterations for RandomizedSearchCV.
- Parameters:
n_iter (int) – Number of iterations (must be a positive integer).
- Return type:
None
- validate_data(X, y)
- Parameters:
X (Union[numpy.ndarray, list])
y (Union[numpy.ndarray, list])
- Return type:
None
- model_selection(X, y, override_best_model=True)
Performs hyperparameter tuning using cross-validation.
- Parameters:
X (array-like) – Feature matrix of shape (n_samples, n_features).
y (array-like) – Target variable of shape (n_samples,).
override_best_model (bool)
- Returns:
best_params_ (dict) – Best hyperparameters found.
best_score_ (float) – Best cross-validation score achieved.
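For orientation, the search that `search_type='grid'` refers to can be run directly with scikit-learn's `GridSearchCV`, which exposes the same `best_params_` and `best_score_` values. This is a minimal sketch on synthetic data, not the internals of `model_selection`; the pipeline, step names, and grid values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data: 60 samples, 8 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(60)

# Hypothetical two-step pipeline and grid.
pipeline = Pipeline([("scaler", StandardScaler()), ("predictor", Ridge())])
params_grid = {"predictor__alpha": [0.01, 1.0, 100.0]}

# 5-fold cross-validated search over every grid point, scored by R^2.
search = GridSearchCV(pipeline, params_grid, cv=5, scoring="r2", n_jobs=1)
search.fit(X, y)

best_params = search.best_params_  # dict of the winning hyperparameters
best_score = search.best_score_    # mean cross-validated score of that setting
```

With `search_type='random'`, the corresponding scikit-learn class is `RandomizedSearchCV`, which samples `n_iter` candidate settings instead of evaluating the full grid and accepts `random_state` for reproducibility.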
- nested_cross_validation(X, y, split_type='kfold', outer_cv=5, shuffle=True)
Performs nested cross-validation to evaluate model performance.
- Parameters:
X (array-like) – Feature matrix of shape (n_samples, n_features).
y (array-like) – Target variable of shape (n_samples,).
split_type (str, optional) – Type of cross-validation split to use. Must be ‘kfold’ or ‘stratified_kfold’. Defaults to ‘kfold’.
outer_cv (int, optional) – Number of outer cross-validation folds. Defaults to 5.
shuffle (bool, optional) – Whether to shuffle the data before splitting. Defaults to True.
- Returns:
outer_scores – Array of test scores for each outer fold.
- Return type:
np.ndarray
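Nested cross-validation wraps a hyperparameter search (inner loop) inside a second set of splits (outer loop), so each outer test fold is scored by a model whose hyperparameters were tuned without seeing it. The sketch below shows the general technique in plain scikit-learn. It is an illustration under assumed settings (a `Ridge` estimator, 3 inner folds, a fixed seed), not the code of `nested_cross_validation` itself.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic regression data: 80 samples, 6 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 6))
y = X @ rng.standard_normal(6) + 0.1 * rng.standard_normal(80)

# Inner loop: hyperparameter search, refitted on each outer training set.
inner_search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=3)

# Outer loop: 5 shuffled folds used only to score the tuned model.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_scores = cross_val_score(inner_search, X, y, cv=outer_cv)

print(outer_scores.mean(), outer_scores.std())
```

For classification targets, swapping `KFold` for `StratifiedKFold` keeps class proportions similar across folds, which is what the ‘stratified_kfold’ option name suggests.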
- cross_validation_scores(X, y, cv=None, scoring=None)
Computes cross-validation scores for the best model.
- Parameters:
X (array-like) – Feature matrix of shape (n_samples, n_features).
y (array-like) – Target variable of shape (n_samples,).
cv (int, optional) – Number of cross-validation folds. Defaults to the instance’s cv attribute.
scoring (str, optional) – Scoring metric to use. Defaults to the instance’s scoring attribute.
- Returns:
scores – Cross-validation scores.
- Return type:
np.ndarray