`yokome.models.cross_validation`

yokome.models.cross_validation.cross_validate(seed_dir, language, n_samples, n_splits, evl_size, max_epochs, batch_size, max_generalization_loss, min_coverage, hyperparams, seed=None, verbose=False, dashboard_port=6006)

Perform cross-validation on the

The process is designed to resume with minimal additional effort after a crash. It can therefore be stopped at any point and picked up again later.
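One common way to make a multi-fold run resumable is to store each fold's result on disk and skip folds whose result already exists. The following is a minimal sketch of that idea, not the actual implementation in `yokome`: the function `run_folds`, the callback `train_fold`, the `fold_<i>` directory layout and the `result.json` file name are all assumptions made for illustration.

```python
import json
from pathlib import Path


def run_folds(seed_dir, n_splits, train_fold):
    """Run ``train_fold(fold)`` for every fold, skipping finished folds.

    Hypothetical helper: a fold counts as finished once its ``result.json``
    exists, so calling this function again after an interruption continues
    with the first fold that has no stored result.
    """
    losses = []
    for fold in range(n_splits):
        fold_dir = Path(seed_dir) / f"fold_{fold}"
        result_file = fold_dir / "result.json"
        if result_file.exists():
            # Completed in an earlier run: reuse the stored loss.
            losses.append(json.loads(result_file.read_text())["loss"])
            continue
        fold_dir.mkdir(parents=True, exist_ok=True)
        loss = train_fold(fold)
        # Write to a temporary file first and rename afterwards, so that a
        # crash cannot leave a half-written result behind.
        tmp_file = result_file.with_suffix(".tmp")
        tmp_file.write_text(json.dumps({"loss": loss}))
        tmp_file.replace(result_file)
        losses.append(loss)
    # Average loss over all folds.
    return sum(losses) / len(losses)
```

With this layout, rerunning the same call with the same `seed_dir` after a crash only retrains the folds that did not finish.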

TensorBoard is served during each training run.

Parameters

seed_dir (str) – Where to store model data for this seed. If cross-validation is performed for multiple seeds, multiple seed directories are needed.
language (yokome.language.Language) – The language to train on.
n_samples (int) – The number of sample sentences to load.
n_splits (int) – The number k of folds.
evl_size (float) – The fraction of the non-validation samples that is set aside for evaluation.
max_epochs (int) – The maximum number of epochs to train for. The actual number of epochs may be less if the training process stops early.
batch_size (int) – The number of sentences to estimate the probability for in parallel.
max_generalization_loss (float) – The maximum generalization loss at which the training process is still continued.
min_coverage – The fraction of the corpus that must be covered by the vocabulary used to encode incoming data. The vocabulary is the smallest set of most frequent words that reaches this coverage.
hyperparams – The model parameters used in this pass of cross-validation.
seed (int) – The seed used for the pseudo-random number generator that generates the seeds for the models to be trained.
verbose (bool) – Whether to print progress indication.
dashboard_port (int) – The port on which to serve TensorBoard.
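Two of the parameters above, `max_generalization_loss` and `min_coverage`, may be easier to follow with a small example. The sketch below is illustrative only and is not taken from `yokome`. It assumes that generalization loss follows the common early-stopping definition (the relative increase of the current validation loss over the lowest validation loss seen so far, here in percent) and that coverage is measured over token occurrences. The function names and the percent scaling are assumptions.

```python
from collections import Counter


def generalization_loss(validation_losses):
    """Relative increase (in percent) of the latest validation loss over
    the lowest validation loss observed so far (assumed definition)."""
    best = min(validation_losses)
    return 100.0 * (validation_losses[-1] / best - 1.0)


def should_stop(validation_losses, max_generalization_loss):
    """Stop training once the generalization loss exceeds the threshold."""
    return generalization_loss(validation_losses) > max_generalization_loss


def minimal_vocabulary(tokens, min_coverage):
    """Smallest list of most frequent tokens whose occurrences make up at
    least ``min_coverage`` (a fraction in [0, 1]) of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vocabulary, covered = [], 0
    for token, count in counts.most_common():
        if covered >= min_coverage * total:
            break
        vocabulary.append(token)
        covered += count
    return vocabulary
```

Under these assumptions, a lower `max_generalization_loss` stops training sooner after the validation loss starts to rise, and a higher `min_coverage` yields a larger vocabulary.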

Returns

The average loss over all folds.

yokome.models.cross_validation.kfold(language, n_samples=None, n_splits=5, evl_size=0.25)

Create splits of corpus sentences to be used in cross-validation.

The sentences are loaded using the language's load method. The splits are made randomly, and they differ for different numbers of samples.

Parameters

language (yokome.language.Language) – The language to train on.
n_samples (int) – The number of sample sentences to load.
n_splits (int) – The number k of folds.
evl_size (float) – The fraction of the non-validation samples that is set aside for evaluation.

Returns

An iterable of triples, each consisting of three tuples of sentences: the training, evaluation and validation splits, in that order.
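To make the relationship between `n_splits` and `evl_size` concrete, here is a minimal sketch of a three-way k-fold split in plain Python. It does not reproduce the splitting logic of `kfold`: the function name `kfold_sketch`, the `seed` parameter and the way samples are assigned to folds are assumptions. It only illustrates that each fold holds out one k-th of the samples for validation and then takes `evl_size` of the remainder for evaluation.

```python
import random


def kfold_sketch(sentences, n_splits=5, evl_size=0.25, seed=0):
    """Yield (training, evaluation, validation) triples of tuples.

    Hypothetical stand-in for a three-way k-fold split; ``seed`` is an
    assumed parameter that makes the shuffle reproducible.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    for fold in range(n_splits):
        # Every n_splits-th sample, offset by the fold index, is held out
        # for validation.
        validation = tuple(shuffled[fold::n_splits])
        rest = [s for i, s in enumerate(shuffled) if i % n_splits != fold]
        # evl_size is a fraction of the non-validation part only.
        n_evl = round(evl_size * len(rest))
        evaluation = tuple(rest[:n_evl])
        training = tuple(rest[n_evl:])
        yield training, evaluation, validation
```

With the defaults `n_splits=5` and `evl_size=0.25`, 100 sentences are divided into 60 training, 20 evaluation and 20 validation sentences in every fold.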
