osl_dynamics.utils.sklearn_wrappers

Wrappers for scikit-learn.

Functions

linear_regression(X, y, fit_intercept[, normalize, ...])

Wrapper for sklearn.linear_model.LinearRegression.

fit_gaussian_mixture(X[, logit_transform, ...])

Fits a two-component Gaussian Mixture Model (GMM).

Module Contents

osl_dynamics.utils.sklearn_wrappers.linear_regression(X, y, fit_intercept, normalize=False, log_message=False, n_jobs=-1)

Wrapper for sklearn.linear_model.LinearRegression.

Parameters:
  • X (np.ndarray) – Regressors. Should be a 2D array: (n_targets, n_regressors). The first axis must match the first axis of y.

  • y (np.ndarray) – Targets. Should be a 2D array: (n_targets, n_features). If a higher-dimensional array is passed, the extra dimensions are concatenated (flattened) into the feature axis before fitting.

  • fit_intercept (bool) – Should we fit an intercept?

  • normalize (bool, optional) – Should we z-score the regressors?

  • log_message (bool, optional) – Should we log a message?

  • n_jobs (int, optional) – Number of parallel jobs passed to sklearn.linear_model.LinearRegression. -1 uses all available processors.
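The parameter handling above can be sketched with scikit-learn directly. This is a minimal illustration, not the osl_dynamics source: linear_regression_sketch is a hypothetical name, log_message is omitted, and the sketch assumes that concatenating the extra dimensions of y means flattening them into one feature axis and reshaping the coefficients back afterwards.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def linear_regression_sketch(X, y, fit_intercept, normalize=False, n_jobs=-1):
    """Hypothetical re-implementation of the documented behaviour (not the library source)."""
    # Flatten trailing dimensions of y into one feature axis:
    # (n_targets, d1, d2, ...) -> (n_targets, d1 * d2 * ...)
    original_shape = y.shape
    y_2d = y.reshape(original_shape[0], -1)

    # Optionally z-score each regressor (each column of X)
    if normalize:
        X = StandardScaler().fit_transform(X)

    reg = LinearRegression(fit_intercept=fit_intercept, n_jobs=n_jobs)
    reg.fit(X, y_2d)

    # sklearn stores coef_ as (n_outputs, n_regressors); transpose to
    # (n_regressors, n_features) and restore the trailing dimensions of y
    coefs = reg.coef_.T.reshape(X.shape[1], *original_shape[1:])
    if fit_intercept:
        intercept = np.asarray(reg.intercept_).reshape(original_shape[1:])
        return coefs, intercept
    return coefs


# Example: 3 regressors, targets with two trailing dimensions (4, 2)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_coefs = rng.normal(size=(3, 4, 2))
y = np.einsum("nr,rab->nab", X, true_coefs) + 1.5  # noiseless, intercept 1.5
coefs, intercept = linear_regression_sketch(X, y, fit_intercept=True)
print(coefs.shape, intercept.shape)  # (3, 4, 2) (4, 2)
```

With normalize=True the coefficients refer to z-scored regressors, so each one is scaled by the standard deviation of its regressor.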

Returns:

  • coefs (np.ndarray) – Regression coefficients with shape (n_regressors, n_features). If y had more than two dimensions, its trailing dimensions are restored, giving a higher-dimensional array.

  • intercept (np.ndarray) – Regression intercept with shape (n_features,), or the trailing dimensions of y if y had more than two dimensions. Only returned if fit_intercept=True.

Return type:

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]
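As a check on what the returned arrays contain, the snippet below fits sklearn.linear_model.LinearRegression directly and compares the result with an ordinary least-squares solve that uses an explicit column of ones for the intercept. It assumes, consistent with the documented (n_regressors, n_features) shape, that coefs is sklearn's coef_ transposed; it does not call osl_dynamics.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n_targets, n_regressors, n_features = 100, 2, 3
X = rng.normal(size=(n_targets, n_regressors))
y = rng.normal(size=(n_targets, n_features))

# Documented layout: coefs (n_regressors, n_features), intercept (n_features,)
reg = LinearRegression(fit_intercept=True).fit(X, y)
coefs = reg.coef_.T
intercept = reg.intercept_

# Same model by ordinary least squares, with a column of ones for the intercept
design = np.column_stack([np.ones(n_targets), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

print(np.allclose(beta[0], intercept), np.allclose(beta[1:], coefs))  # True True
```

The first row of beta is the intercept and the remaining rows are the coefficients, one row per regressor and one column per feature.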

osl_dynamics.utils.sklearn_wrappers.fit_gaussian_mixture(X, logit_transform=False, standardize=True, p_value=None, one_component_percentile=None, n_sigma=0, label_order='mean', sklearn_kwargs=None, return_statistics=False, show_plot=False, plot_filename=None, plot_kwargs=None, log_message=True)

Fits a two-component Gaussian Mixture Model (GMM).

Parameters:
  • X (np.ndarray) – Data to fit GMM to. Must be 1D.

  • logit_transform (bool, optional) – Should we logit transform X? Values must lie strictly between 0 and 1.

  • standardize (bool, optional) – Should we standardize X?

  • p_value (float, optional) – Used to determine a threshold. We ensure the data points assigned to the ‘on’ component have a probability of less than p_value of belonging to the ‘off’ component.

  • one_component_percentile (float, optional) – Percentile threshold if only one component is found. Should be between 0 and 100. E.g. for the 95th percentile, one_component_percentile=95.

  • n_sigma (float, optional) – Minimum separation between the means of the ‘on’ and ‘off’ components, measured in standard deviations of the ‘off’ component, for the fit to be considered to have two components.

  • label_order (str, optional) – How do we order the inferred classes? Defaults to label_order='mean', which orders the components by their means.

  • sklearn_kwargs (dict, optional) – Dictionary of keyword arguments to pass to sklearn.mixture.GaussianMixture.

  • return_statistics (bool, optional) – Should we return statistics of the Gaussian mixture components?

  • show_plot (bool, optional) – Should we show the GMM fit to the distribution of X?

  • plot_filename (str, optional) – Filename to save a plot of the Gaussian mixture model.

  • plot_kwargs (dict, optional) – Keyword arguments to pass to osl_dynamics.utils.plotting.plot_gmm(). Only used if plot_filename is not None.

  • log_message (bool, optional) – Should we log a message?
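A rough sketch of how the preprocessing and n_sigma parameters could fit together, using sklearn.mixture.GaussianMixture. two_component_check is a hypothetical helper, not part of osl_dynamics; it assumes the ‘off’ component is the one with the lower mean (as with label_order='mean'), assumes the default covariance_type="full", and omits p_value, plotting and logging.

```python
import numpy as np
from scipy.special import logit
from sklearn.mixture import GaussianMixture


def two_component_check(X, logit_transform=False, standardize=True,
                        n_sigma=0, sklearn_kwargs=None):
    """Hypothetical sketch of the preprocessing and one-vs-two component test."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 1:
        raise ValueError("X must be 1D")
    if logit_transform:
        X = logit(X)  # only finite for 0 < X < 1
    if standardize:
        X = (X - X.mean()) / X.std()

    # sklearn expects a 2D (n_samples, 1) array
    gmm = GaussianMixture(n_components=2, **(sklearn_kwargs or {}))
    gmm.fit(X.reshape(-1, 1))

    # Order components by mean: index 0 = 'off' (lower), 1 = 'on' (higher).
    # Assumes covariance_type="full", so covariances_ has shape (2, 1, 1).
    means = gmm.means_[:, 0]
    stds = np.sqrt(gmm.covariances_.reshape(2))
    order = np.argsort(means)
    means, stds = means[order], stds[order]

    # Treat the fit as two components only if the 'on' mean lies more than
    # n_sigma 'off' standard deviations above the 'off' mean
    two_components = bool(means[1] - means[0] > n_sigma * stds[0])
    return two_components, means, stds


rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 900), rng.normal(8, 1, 100)])
two, means, stds = two_component_check(X, n_sigma=2, sklearn_kwargs={"random_state": 0})
print(two, means, stds)
```

With the default n_sigma=0 the check reduces to the two means differing, so any two-component fit passes; larger values demand a clearer separation before the ‘on’ component is trusted.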

Returns:

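threshold (float) – Threshold for the ‘on’ class. If return_statistics=True, statistics of the Gaussian mixture components are also returned.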

Return type:

float
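The threshold itself could be read off the fitted mixture's posterior probabilities, as in the hypothetical gmm_threshold_sketch below. How osl_dynamics picks the threshold when p_value is None is not stated here, so the rule used in that case (the first point where the ‘on’ component becomes more probable) is an assumption; the p_value branch follows the parameter description above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_threshold_sketch(X, p_value=None, n_grid=10_000, random_state=0):
    """Hypothetical sketch: 'on' threshold from a two-component GMM on 1D data."""
    X = np.asarray(X, dtype=float)
    mu, sigma = X.mean(), X.std()
    Z = ((X - mu) / sigma).reshape(-1, 1)  # standardize before fitting

    gmm = GaussianMixture(n_components=2, random_state=random_state).fit(Z)
    off, on = np.argsort(gmm.means_[:, 0])  # 'off' = lower-mean component

    # Posterior probabilities on a grid from the 'off' mean to the largest value
    grid = np.linspace(gmm.means_[off, 0], Z.max(), n_grid)
    post = gmm.predict_proba(grid.reshape(-1, 1))

    if p_value is None:
        # Assumed rule: first point where 'on' is the more probable component
        is_on = post[:, on] > post[:, off]
    else:
        # First point where P(off | x) < p_value
        is_on = post[:, off] < p_value
    if not is_on.any():
        raise ValueError("no grid point satisfies the threshold criterion")
    z_threshold = grid[np.argmax(is_on)]  # argmax returns the first True

    # Undo the standardization so the threshold is on the scale of X
    return float(z_threshold * sigma + mu)


rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, 1, 1000), rng.normal(6, 1, 1000)])
print(gmm_threshold_sketch(X), gmm_threshold_sketch(X, p_value=0.01))
```

For these equal-weight, equal-variance components centred at 0 and 6, the default threshold should fall near 3 and the p_value=0.01 threshold somewhat above it, since a stricter p_value pushes the threshold further into the ‘on’ component.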