yokome.models.cross_validation
- yokome.models.cross_validation.cross_validate(seed_dir, language, n_samples, n_splits, evl_size, max_epochs, batch_size, max_generalization_loss, min_coverage, hyperparams, seed=None, verbose=False, dashboard_port=6006)
Perform cross-validation on the
The process is designed to resume with minimal additional effort after a crash, so it can be stopped and picked up again later.
TensorBoard is served during each training run.
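One way such crash-tolerant behavior can be obtained is sketched below. This is not the actual implementation: the function `run_folds`, the callback `train_fold` and the `fold_<i>.json` file layout are hypothetical names chosen for illustration. The idea is to persist the result of each finished fold in the seed directory and to skip folds whose result already exists.

```python
import json
import os


def run_folds(seed_dir, n_splits, train_fold):
    """Call train_fold(fold) for every fold, skipping finished folds.

    Hypothetical helper, not part of yokome. Returns the average loss.
    """
    os.makedirs(seed_dir, exist_ok=True)
    losses = []
    for fold in range(n_splits):
        # Assumed layout: one small result file per finished fold.
        result_path = os.path.join(seed_dir, 'fold_%d.json' % fold)
        if os.path.exists(result_path):
            # Finished in an earlier run: reuse the stored loss.
            with open(result_path) as f:
                losses.append(json.load(f)['loss'])
            continue
        loss = train_fold(fold)
        # Write to a temporary file, then rename it, so that a crash
        # never leaves a half-written result file behind.
        tmp_path = result_path + '.tmp'
        with open(tmp_path, 'w') as f:
            json.dump({'loss': loss}, f)
        os.replace(tmp_path, result_path)
        losses.append(loss)
    return sum(losses) / len(losses)
```

Calling `run_folds` a second time with the same directory retrains nothing and returns the same average, which is what makes stopping and resuming cheap.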
- Parameters
seed_dir (str) – Where to store model data for this seed. If cross-validation is performed for multiple seeds, multiple seed directories are needed.
language (yokome.language.Language) – The language to train on.
n_samples (int) – The number of sample sentences to load.
n_splits (int) – The number k of folds.
evl_size (float) – The portion of evaluation samples w.r.t. the non-validation part of all samples.
max_epochs (int) – The maximum number of epochs to train for. The actual number of epochs may be less if the training process stops early.
batch_size (int) – The number of sentences to estimate the probability for in parallel.
max_generalization_loss (float) – The maximum generalization loss at which the training process is still continued.
min_coverage – The portion of the corpus that must be covered by the minimal vocabulary of most frequent words used to encode incoming data.
hyperparams – The model parameters used in this pass of cross-validation.
seed (int) – The seed used for the pseudo-random number generator that generates the seeds for the models to be trained.
verbose (bool) – Whether to print progress indication.
dashboard_port (int) – On which port to serve TensorBoard.
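The role of min_coverage can be illustrated with a minimal sketch. It assumes that coverage is measured as the share of word occurrences in the corpus, which this page does not state explicitly; `minimal_vocabulary` is a hypothetical helper, not a yokome function.

```python
from collections import Counter


def minimal_vocabulary(sentences, min_coverage):
    """Return the most frequent words, in order of frequency, until they
    cover at least ``min_coverage`` of all word occurrences."""
    counts = Counter(word for sentence in sentences for word in sentence)
    total = sum(counts.values())
    vocabulary = []
    covered = 0
    for word, count in counts.most_common():
        # Stop as soon as the words chosen so far cover enough of the corpus.
        if covered >= min_coverage * total:
            break
        vocabulary.append(word)
        covered += count
    return vocabulary
```

A lower min_coverage yields a smaller vocabulary, at the cost of more words that have to be encoded as unknown.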
- Returns
The average loss over all folds.
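How max_epochs and max_generalization_loss might interact is sketched below. This page does not define generalization loss; the sketch assumes the common definition as the relative increase, in percent, of the current validation loss over the lowest validation loss seen so far. The function name and its input (an iterable of per-epoch validation losses, all positive) are illustrative only.

```python
def train_with_early_stopping(validation_losses, max_epochs,
                              max_generalization_loss):
    """Consume per-epoch validation losses until training would stop.

    Returns (epochs_run, best_loss). Illustrative sketch, not yokome code.
    """
    best_loss = float('inf')
    epochs_run = 0
    for loss in validation_losses:
        if epochs_run >= max_epochs:
            break
        epochs_run += 1
        best_loss = min(best_loss, loss)
        # Assumed definition: percentage by which the current loss
        # exceeds the best loss observed so far.
        generalization_loss = 100.0 * (loss / best_loss - 1.0)
        if generalization_loss > max_generalization_loss:
            # The model has started to generalize worse: stop early.
            break
    return epochs_run, best_loss
```

Under this assumption, a threshold of 5.0 tolerates a validation loss up to 5 % above the best one before training stops, so the actual number of epochs can be lower than max_epochs.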
- yokome.models.cross_validation.kfold(language, n_samples=None, n_splits=5, evl_size=0.25)
Create splits of corpus sentences to be used in cross-validation.
The sentences are loaded using the language's load method. The splits are performed randomly, and differently for different numbers of samples.
- Parameters
language (yokome.language.Language) – The language to train on.
n_samples (int) – The number of sample sentences to load.
n_splits (int) – The number k of folds.
evl_size (float) – The portion of evaluation samples w.r.t. the non-validation part of all samples.
- Returns
An iterable over triples of tuples over sentences. Each triple consists of the training, evaluation and validation splits, respectively.
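The shape of the returned splits can be illustrated with a self-contained sketch that works on a plain list of sentences instead of a Language object. `kfold_sketch` is a hypothetical stand-in, not the library function; seeding the shuffle with the number of samples is only one possible way to obtain splits that differ for different numbers of samples.

```python
import random


def kfold_sketch(sentences, n_splits=5, evl_size=0.25):
    """Yield (training, evaluation, validation) tuples for each fold."""
    sentences = list(sentences)
    # Assumption: derive the shuffle from the number of samples.
    rng = random.Random(len(sentences))
    rng.shuffle(sentences)
    # Fold i takes every n_splits-th sentence, starting at position i.
    folds = [sentences[i::n_splits] for i in range(n_splits)]
    for i, validation in enumerate(folds):
        # Everything outside the validation fold ...
        rest = [s for j, fold in enumerate(folds) if j != i for s in fold]
        # ... is divided into an evaluation part of relative size
        # evl_size and a training part with the remainder.
        n_evl = int(round(evl_size * len(rest)))
        yield tuple(rest[n_evl:]), tuple(rest[:n_evl]), tuple(validation)
```

With 20 sentences, 5 folds and evl_size=0.25, each triple in this sketch holds 12 training, 4 evaluation and 4 validation sentences, and every sentence appears in exactly one validation split.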