osl_dynamics.data.tf

Functions related to TensorFlow datasets.

Module Contents

Functions

get_n_sequences(arr, sequence_length[, step_size])

Calculate the number of sequences an array will be split into.

concatenate_datasets(datasets)

Concatenates a list of TensorFlow datasets.

create_dataset(data, sequence_length, step_size)

Creates a TensorFlow dataset of batched time series data.

save_tfrecord(data, sequence_length, step_size, filepath)

Save dataset to a TFRecord file.

load_tfrecord_dataset(tfrecord_dir, batch_size[, ...])

Load a TFRecord dataset.

_validate_tf_dataset(dataset)

Check if the input is a valid TensorFlow dataset.

get_range(dataset)

The range (max-min) of values contained in a batched TensorFlow dataset.

get_n_channels(dataset)

Get the number of channels in a batched TensorFlow dataset.

get_n_batches(dataset)

Get the number of batches in a TensorFlow dataset.

osl_dynamics.data.tf.get_n_sequences(arr, sequence_length, step_size=None)

Calculate the number of sequences an array will be split into.

Parameters:
  • arr (np.ndarray) – Time series data.

  • sequence_length (int) – Length of the sequences the data will be segmented into.

  • step_size (int, optional) – The number of samples by which to move the sliding window between sequences.

Returns:

n – Number of sequences.

Return type:

int
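As a rough illustration of the counting, the sketch below counts how many full windows of sequence_length samples fit along the first axis when consecutive windows start step_size samples apart. It assumes that step_size defaults to sequence_length (non-overlapping sequences) and that a trailing partial window is discarded; n_sequences_sketch is a hypothetical name, and the library's exact convention may differ.

```python
import numpy as np


def n_sequences_sketch(arr, sequence_length, step_size=None):
    """Hypothetical helper: count full sliding windows along axis 0.

    Assumes step_size defaults to sequence_length (non-overlapping windows)
    and that trailing samples too few to fill a window are dropped.
    """
    if step_size is None:
        step_size = sequence_length
    n_samples = np.asarray(arr).shape[0]
    if n_samples < sequence_length:
        return 0
    # Window starts are 0, step_size, 2 * step_size, ... up to n_samples - sequence_length.
    return (n_samples - sequence_length) // step_size + 1


x = np.zeros((100, 3))                         # 100 samples, 3 channels
print(n_sequences_sketch(x, 10))               # 10 (non-overlapping)
print(n_sequences_sketch(x, 10, step_size=5))  # 19 (50% overlap)
```

Overlapping windows (step_size < sequence_length) give more sequences from the same data at the cost of reusing samples.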

osl_dynamics.data.tf.concatenate_datasets(datasets)

Concatenates a list of TensorFlow datasets.

Parameters:

datasets (list) – List of TensorFlow datasets.

Returns:

full_dataset – Concatenated dataset.

Return type:

tf.data.Dataset
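Concatenating a list of datasets can be expressed as a left fold with tf.data.Dataset.concatenate. The sketch below is a minimal version of that idea; concatenate_sketch is a hypothetical name and is not necessarily how concatenate_datasets is implemented.

```python
import functools

import tensorflow as tf


def concatenate_sketch(datasets):
    """Hypothetical helper: join datasets end to end, in list order."""
    # a.concatenate(b) yields every element of a, then every element of b;
    # folding the list left to right therefore preserves the original order.
    # All datasets must have compatible element structures and dtypes.
    return functools.reduce(lambda a, b: a.concatenate(b), datasets)


full = concatenate_sketch([tf.data.Dataset.range(3), tf.data.Dataset.range(10, 12)])
print([int(x) for x in full.as_numpy_iterator()])  # [0, 1, 2, 10, 11]
```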

osl_dynamics.data.tf.create_dataset(data, sequence_length, step_size)

Creates a TensorFlow dataset of batched time series data.

Parameters:
  • data (dict) – Dictionary containing the data to batch. Keys correspond to the model's input names and values are the corresponding data arrays.

  • sequence_length (int) – Length of the sequences the data is split into.

  • step_size (int) – Number of samples by which the sliding window moves between sequences.

Returns:

dataset – TensorFlow dataset.

Return type:

tf.data.Dataset
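The segmentation step behind a function like this can be sketched with NumPy alone. The sketch below cuts every array in the dictionary into windows of sequence_length samples whose starts are step_size samples apart, dropping any trailing partial window. segment_sketch is a hypothetical name; this illustrates the technique and does not show how create_dataset itself builds its windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment_sketch(data, sequence_length, step_size):
    """Hypothetical helper: split each (n_samples, ...) array into sequences.

    Returns a dict with the same keys; each value has shape
    (n_sequences, sequence_length, ...). Trailing partial windows are dropped.
    """
    sequences = {}
    for name, arr in data.items():
        arr = np.asarray(arr)
        # Shape (n_samples - sequence_length + 1, ..., sequence_length):
        # sliding_window_view appends the window axis last.
        windows = sliding_window_view(arr, sequence_length, axis=0)
        # Keep every step_size-th window start.
        windows = windows[::step_size]
        # Move the window axis next to the sequence index.
        sequences[name] = np.moveaxis(windows, -1, 1)
    return sequences


data = {"data": np.arange(20, dtype=np.float32).reshape(10, 2)}  # 10 samples, 2 channels
seqs = segment_sketch(data, sequence_length=4, step_size=2)
print(seqs["data"].shape)  # (4, 4, 2)
```

The resulting dictionary could then be wrapped with tf.data.Dataset.from_tensor_slices(seqs) and batched with .batch(batch_size), giving a dataset whose elements are dictionaries keyed by input name.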

osl_dynamics.data.tf.save_tfrecord(data, sequence_length, step_size, filepath)

Save dataset to a TFRecord file.

Parameters:
  • data (dict) – Dictionary containing the data to batch. Keys correspond to the model's input names and values are the corresponding data arrays.

  • sequence_length (int) – Length of the sequences the data is split into.

  • step_size (int) – Number of samples by which the sliding window moves between sequences.

  • filepath (str) – Path to save the TFRecord file.
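One common way to store sequences in a TFRecord file is one tf.train.Example per sequence, with each array serialized using tf.io.serialize_tensor. The sketch below writes and then reads back such a file. write_sequences and read_sequences are hypothetical names, and this feature layout is an assumption, not necessarily the layout save_tfrecord produces.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def write_sequences(sequences, filepath):
    """Hypothetical helper: write one tf.train.Example per sequence.

    sequences is a list of dicts mapping an input name to a NumPy array.
    """
    with tf.io.TFRecordWriter(filepath) as writer:
        for seq in sequences:
            feature = {
                name: tf.train.Feature(
                    # Serialize the whole array (dtype and shape included) to bytes.
                    bytes_list=tf.train.BytesList(value=[tf.io.serialize_tensor(arr).numpy()])
                )
                for name, arr in seq.items()
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())


def read_sequences(filepath, names, dtype=tf.float32):
    """Hypothetical helper: read the file back as a dataset of dicts."""
    description = {name: tf.io.FixedLenFeature([], tf.string) for name in names}

    def parse(record):
        parsed = tf.io.parse_single_example(record, description)
        # out_type must match the dtype the arrays were written with.
        return {name: tf.io.parse_tensor(parsed[name], out_type=dtype) for name in names}

    return tf.data.TFRecordDataset(filepath).map(parse)


path = os.path.join(tempfile.mkdtemp(), "sequences.tfrecord")
seqs = [{"data": np.full((4, 2), i, dtype=np.float32)} for i in range(3)]
write_sequences(seqs, path)
for element in read_sequences(path, ["data"]).as_numpy_iterator():
    print(element["data"].shape)  # (4, 2)
```

Storing each array as a serialized tensor keeps its shape and dtype in the record, at the cost of needing the dtype again when parsing.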

osl_dynamics.data.tf.load_tfrecord_dataset(tfrecord_dir, batch_size, shuffle=True, validation_split=None, concatenate=True, drop_last_batch=False, buffer_size=100000, keep=None)

Load a TFRecord dataset.

Parameters:
  • tfrecord_dir (str) – Directory containing the TFRecord datasets.

  • batch_size (int) – Number of sequences in each mini-batch used to train the model.

  • shuffle (bool, optional) – Should we shuffle sequences (within a batch) and batches?

  • validation_split (float, optional) – Ratio to split the dataset into a training and validation set.

  • concatenate (bool, optional) – Should we concatenate the datasets for each array?

  • drop_last_batch (bool, optional) – Should we drop the last batch if it is smaller than the batch size?

  • buffer_size (int, optional) – Buffer size for shuffling a TensorFlow Dataset. Smaller values will lead to less random shuffling but will be quicker. Default is 100000.

  • keep (list of int, optional) – List of session indices to keep. If None, then all sessions are kept.

Returns:

dataset – Dataset for training or evaluating the model. If validation_split is passed, a tuple containing the training and validation datasets is returned instead.

Return type:

tf.data.Dataset or tuple
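The shuffle, batch_size, drop_last_batch, buffer_size and validation_split options map onto standard tf.data operations. The sketch below shows one way to combine them on an already-loaded dataset of sequences; batch_and_split is a hypothetical name, and splitting at the batch level with round(validation_split * n_batches) is an assumption rather than the documented behaviour of load_tfrecord_dataset.

```python
import numpy as np
import tensorflow as tf


def batch_and_split(dataset, batch_size, shuffle=True, validation_split=None,
                    drop_last_batch=False, buffer_size=100000, seed=None):
    """Hypothetical helper: shuffle, batch and optionally split a dataset of sequences."""
    if shuffle:
        # Fix the shuffled order across iterations so that take/skip below
        # always see the same order and produce disjoint splits.
        dataset = dataset.shuffle(buffer_size, seed=seed, reshuffle_each_iteration=False)
    dataset = dataset.batch(batch_size, drop_remainder=drop_last_batch)
    if validation_split is None:
        return dataset

    # Count batches with one pass (works even when cardinality is unknown;
    # assumes a finite dataset).
    n_batches = int(dataset.reduce(np.int64(0), lambda count, _: count + 1))
    n_train = n_batches - int(round(validation_split * n_batches))
    training = dataset.take(n_train)
    validation = dataset.skip(n_train)
    if shuffle:
        # Shuffle the order of the training batches on every epoch.
        training = training.shuffle(max(n_train, 1))
    return training, validation


train, val = batch_and_split(tf.data.Dataset.range(10), batch_size=2, validation_split=0.2, seed=0)
```

Fixing the shuffle order before take/skip matters: with reshuffle_each_iteration=True, each pass over the training and validation pipelines could draw a different order, so the two sets could overlap.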

osl_dynamics.data.tf._validate_tf_dataset(dataset)

Check if the input is a valid TensorFlow dataset.

Parameters:

dataset (tf.data.Dataset or list) – TensorFlow dataset or list of datasets.

Returns:

dataset – TensorFlow dataset.

Return type:

tf.data.Dataset
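A validator of this kind typically accepts either a single dataset or a list of datasets and always hands back one tf.data.Dataset. The sketch below assumes that a list is merged by concatenation and that anything else is rejected with a TypeError; validate_dataset_sketch is a hypothetical name, and the private function's actual handling may differ.

```python
import functools

import tensorflow as tf


def validate_dataset_sketch(dataset):
    """Hypothetical helper: return a single tf.data.Dataset or raise TypeError."""
    if isinstance(dataset, tf.data.Dataset):
        return dataset
    if (
        isinstance(dataset, list)
        and len(dataset) > 0
        and all(isinstance(d, tf.data.Dataset) for d in dataset)
    ):
        # Assumption: a list is merged by concatenating its datasets in order.
        return functools.reduce(lambda a, b: a.concatenate(b), dataset)
    raise TypeError("Expected a tf.data.Dataset or a non-empty list of tf.data.Dataset.")
```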

osl_dynamics.data.tf.get_range(dataset)

The range (max-min) of values contained in a batched TensorFlow dataset.

Parameters:

dataset (tf.data.Dataset) – TensorFlow dataset.

Returns:

range – Range of each channel.

Return type:

np.ndarray
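A per-channel range over a batched dataset can be computed in a single pass by keeping running minima and maxima, so only one batch needs to be in memory at a time. The sketch below works on any iterable of array-like batches with channels on the last axis; range_sketch is a hypothetical name, not the library function.

```python
import numpy as np


def range_sketch(batches):
    """Hypothetical helper: per-channel max - min over an iterable of batches.

    Each batch is array-like with channels on the last axis, e.g. shape
    (batch_size, sequence_length, n_channels). Assumes at least one batch.
    """
    data_min = data_max = None
    for batch in batches:
        batch = np.asarray(batch)
        flat = batch.reshape(-1, batch.shape[-1])  # (n_values, n_channels)
        batch_min, batch_max = flat.min(axis=0), flat.max(axis=0)
        if data_min is None:
            data_min, data_max = batch_min, batch_max
        else:
            # Update the running extrema with this batch.
            data_min = np.minimum(data_min, batch_min)
            data_max = np.maximum(data_max, batch_max)
    return data_max - data_min


batches = [
    np.array([[[0.0, 5.0], [1.0, 2.0]]]),  # shape (1, 2, 2)
    np.array([[[-3.0, 4.0], [2.0, 9.0]]]),
]
print(range_sketch(batches))  # [5. 7.]
```

For a tf.data.Dataset whose elements are plain tensors, dataset.as_numpy_iterator() could be passed as batches; if the elements are dictionaries, the relevant key would need to be selected first.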

osl_dynamics.data.tf.get_n_channels(dataset)

Get the number of channels in a batched TensorFlow dataset.

Parameters:

dataset (tf.data.Dataset) – TensorFlow dataset.

Returns:

n_channels – Number of channels.

Return type:

int
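For a batched dataset, the channel count can usually be read from the static element_spec without iterating over any data. The sketch below assumes elements shaped (batch_size, sequence_length, n_channels), optionally stored under a key of a dictionary of inputs; n_channels_sketch and its key parameter are hypothetical.

```python
import numpy as np
import tensorflow as tf


def n_channels_sketch(dataset, key=None):
    """Hypothetical helper: read the channel count from the dataset's element_spec.

    Assumes batched elements of shape (batch_size, sequence_length, n_channels),
    optionally stored under `key` in a dict of inputs.
    """
    spec = dataset.element_spec
    if key is not None:
        spec = spec[key]
    # This is the static shape: no data is read, and the result is None
    # if the last dimension is not known statically.
    return spec.shape[-1]


x = np.zeros((10, 5, 3), dtype=np.float32)  # 10 sequences, length 5, 3 channels
print(n_channels_sketch(tf.data.Dataset.from_tensor_slices(x).batch(4)))  # 3
```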

osl_dynamics.data.tf.get_n_batches(dataset)

Get the number of batches in a TensorFlow dataset.

Parameters:

dataset (tf.data.Dataset) – TensorFlow dataset.

Returns:

n_batches – Number of batches.

Return type:

int
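TensorFlow can often report the number of elements through Dataset.cardinality(), but it returns -2 (unknown) after operations such as filter or for a TFRecordDataset, and -1 for an infinite dataset. The sketch below uses the cardinality when it is known and otherwise counts batches with one full pass; n_batches_sketch is a hypothetical name and assumes a finite dataset.

```python
import numpy as np
import tensorflow as tf


def n_batches_sketch(dataset):
    """Hypothetical helper: number of elements (batches) in a finite dataset."""
    n = int(dataset.cardinality())
    if n < 0:
        # -1 means infinite and -2 unknown; fall back to one full pass,
        # which only terminates if the dataset is finite.
        n = int(dataset.reduce(np.int64(0), lambda count, _: count + 1))
    return n


batched = tf.data.Dataset.range(10).batch(4)               # batches of 4, 4 and 2 elements
print(n_batches_sketch(batched))                           # 3
filtered = batched.filter(lambda b: tf.reduce_sum(b) > 10)  # cardinality becomes unknown
print(n_batches_sketch(filtered))                          # 2
```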