osl_dynamics.data.tf#
Functions related to TensorFlow datasets.
Functions#
| get_n_sequences | Calculate the number of sequences an array will be split into. |
| concatenate_datasets | Concatenates a list of TensorFlow datasets. |
| create_dataset | Creates a TensorFlow dataset of batched time series data. |
| save_tfrecord | Save dataset to a TFRecord file. |
| load_tfrecord_dataset | Load a TFRecord dataset. |
| get_range | The range (max-min) of values contained in a batched TensorFlow dataset. |
| get_n_channels | Get the number of channels in a batched TensorFlow dataset. |
| get_n_batches | Get number of batches in a TensorFlow dataset. |
| get_n_sequences_and_range | Get number of sequences and range (max-min) of values. |
Module Contents#
- osl_dynamics.data.tf.get_n_sequences(arr, sequence_length, step_size=None)[source]#
Calculate the number of sequences an array will be split into.
- Parameters:
arr (np.ndarray) – Time series data.
sequence_length (int) – Length of the sequences into which the data will be segmented.
step_size (int, optional) – The number of samples by which to move the sliding window between sequences.
- Returns:
n – Number of sequences.
- Return type:
int
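The count follows from sliding-window arithmetic. The sketch below is a minimal illustration, assuming the standard formula and assuming that `step_size` falls back to `sequence_length` (non-overlapping windows) when it is omitted. `count_windows` is a hypothetical helper, not the library's implementation, which may trim or round the data differently.

```python
def count_windows(n_samples, sequence_length, step_size=None):
    """Count the full windows of length sequence_length in n_samples samples."""
    # Assumption: a missing step_size means non-overlapping windows.
    step_size = step_size or sequence_length
    if n_samples < sequence_length:
        return 0  # not even one full window fits
    # The first window starts at 0; each further window starts step_size later.
    return (n_samples - sequence_length) // step_size + 1


non_overlapping = count_windows(1000, sequence_length=200)
overlapping = count_windows(1000, sequence_length=200, step_size=100)
```

With 1000 samples, halving the step from 200 to 100 roughly doubles the number of sequences, because each window then shares half of its samples with the next one.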
- osl_dynamics.data.tf.concatenate_datasets(datasets)[source]#
Concatenates a list of TensorFlow datasets.
- Parameters:
datasets (list) – List of TensorFlow datasets.
- Returns:
full_dataset – Concatenated dataset.
- Return type:
tf.data.Dataset
- osl_dynamics.data.tf.create_dataset(data, sequence_length, step_size)[source]#
Creates a TensorFlow dataset of batched time series data.
- Parameters:
data (dict) – Dictionary containing data to batch. Keys correspond to the input name for the model and the value is the data.
sequence_length (int) – Sequence length to batch the data.
step_size (int) – Number of samples to slide the sequence across the data.
- Returns:
dataset – TensorFlow dataset.
- Return type:
tf.data.Dataset
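The roles of `sequence_length` and `step_size` can be illustrated without TensorFlow. The sketch below uses plain Python lists in place of arrays and tensors. `sliding_windows` and the input name `"data"` are hypothetical, chosen only for illustration, and the sketch assumes that windows running past the end of the data are dropped.

```python
def sliding_windows(samples, sequence_length, step_size):
    """Split a sliceable time series into fixed-length windows."""
    last_start = len(samples) - sequence_length
    # Assumption: windows that would run past the end of the data are dropped.
    return [
        samples[start : start + sequence_length]
        for start in range(0, last_start + 1, step_size)
    ]


# Hypothetical input: one entry per model input name, as in the `data` dict.
data = {"data": list(range(10))}
windows = {
    name: sliding_windows(series, sequence_length=4, step_size=3)
    for name, series in data.items()
}
```

Here the 10 samples yield three windows starting at samples 0, 3 and 6, so consecutive windows overlap by one sample.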
- osl_dynamics.data.tf.save_tfrecord(data, sequence_length, step_size, filepath)[source]#
Save dataset to a TFRecord file.
- Parameters:
data (dict) – Dictionary containing data to batch. Keys correspond to the input name for the model and the value is the data.
sequence_length (int) – Sequence length to batch the data.
step_size (int) – Number of samples to slide the sequence across the data.
filepath (str) – Path to save the TFRecord file.
- Return type:
None
- osl_dynamics.data.tf.load_tfrecord_dataset(tfrecord_dir, batch_size, shuffle=True, concatenate=True, drop_last_batch=False, buffer_size=4000, keep=None)[source]#
Load a TFRecord dataset.
- Parameters:
tfrecord_dir (str) – Directory containing the TFRecord datasets.
batch_size (int) – Number of sequences in each mini-batch used to train the model.
shuffle (bool, optional) – Should we shuffle sequences (within a batch) and batches?
concatenate (bool, optional) – Should we concatenate the datasets for each array?
drop_last_batch (bool, optional) – Should we drop the last batch if it is smaller than the batch size?
buffer_size (int, optional) – Buffer size for shuffling a TensorFlow Dataset. Smaller values will lead to less random shuffling but will be quicker. Default is 4000.
keep (list of int, optional) – List of session indices to keep. If None, then all sessions are kept.
- Returns:
dataset – Dataset for training or evaluating the model, along with the validation set if validation_split is present in the config.
- Return type:
tf.data.TFRecordDataset or tuple of tf.data.TFRecordDataset
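The interaction between `batch_size` and `drop_last_batch` can be sketched with plain lists. This is only an illustration of the batching arithmetic: `make_batches` is a hypothetical helper, and it does not reproduce shuffling, concatenation or TFRecord parsing.

```python
def make_batches(sequences, batch_size, drop_last_batch=False):
    """Group sequences into mini-batches of at most batch_size items."""
    batches = [
        sequences[i : i + batch_size]
        for i in range(0, len(sequences), batch_size)
    ]
    # Optionally discard a trailing batch that is smaller than batch_size.
    if drop_last_batch and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


sequences = list(range(10))  # stand-ins for 10 sequences
kept = make_batches(sequences, batch_size=4)
dropped = make_batches(sequences, batch_size=4, drop_last_batch=True)
```

With 10 sequences and a batch size of 4, keeping the last batch gives three batches (the final one holding two sequences), while dropping it gives two full batches.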
- osl_dynamics.data.tf.get_range(dataset)[source]#
The range (max-min) of values contained in a batched TensorFlow dataset.
- Parameters:
dataset (tf.data.Dataset) – TensorFlow dataset.
- Returns:
range_ – Range of each channel.
- Return type:
np.ndarray
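A per-channel range can be accumulated one batch at a time, so the whole dataset never has to be held in memory. The sketch below assumes batches laid out as [sequence][time step][channel] and uses nested lists in place of tensors so that it has no dependencies. `channel_range` is a hypothetical helper, not the library's code.

```python
def channel_range(batches):
    """Per-channel (max - min) over batches laid out as
    [sequence][time step][channel]."""
    lows, highs = None, None
    for batch in batches:
        for sequence in batch:
            for sample in sequence:
                if lows is None:
                    # First sample initialises the running extremes.
                    lows, highs = list(sample), list(sample)
                    continue
                lows = [min(lo, x) for lo, x in zip(lows, sample)]
                highs = [max(hi, x) for hi, x in zip(highs, sample)]
    if lows is None:
        raise ValueError("no samples found in batches")
    return [hi - lo for hi, lo in zip(highs, lows)]


# Two batches, each with one sequence of two time steps and two channels.
batches = [
    [[[0.0, 10.0], [1.0, 12.0]]],
    [[[-2.0, 11.0], [3.0, 18.0]]],
]
ranges = channel_range(batches)
```

The first channel spans -2.0 to 3.0 and the second spans 10.0 to 18.0, so the result has one entry per channel.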
- osl_dynamics.data.tf.get_n_channels(dataset)[source]#
Get the number of channels in a batched TensorFlow dataset.
- Parameters:
dataset (tf.data.Dataset) – TensorFlow dataset.
- Returns:
n_channels – Number of channels.
- Return type:
int
- osl_dynamics.data.tf.get_n_batches(dataset)[source]#
Get number of batches in a TensorFlow dataset.
- Parameters:
dataset (tf.data.Dataset) – TensorFlow dataset.
- Returns:
n_batches – Number of batches.
- Return type:
int
- osl_dynamics.data.tf.get_n_sequences_and_range(dataset)[source]#
Get number of sequences and range (max-min) of values.
- Parameters:
dataset (tf.data.Dataset) – TensorFlow dataset.
- Returns:
n_sequences (int) – Number of sequences.
range_ (np.ndarray) – Range of each channel.
- Return type:
Tuple[int, numpy.ndarray]