Test Combo

JudiLing.test_combo — Method

test_combo(test_mode;kwargs...)

A wrapper function that runs a full model for a specific combination of parameters. A detailed introduction is given in Test Combo Introduction.

Note

test_combo: test_combo is deprecated. While it will remain in the package, it is no longer actively maintained.

Obligatory Arguments

test_mode::Symbol: which test mode, currently supports :train_only, :pre_split, :careful_split and :random_split.
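The call shape `test_combo(test_mode; kwargs...)` can be sketched as follows. This is a minimal illustration, not the package's implementation: `run_combo` is a hypothetical stand-in so the snippet runs without JudiLing or any data files, the option values are arbitrary, and the mode names are assumed to be spelled with underscores. A real call to `JudiLing.test_combo` likely needs further arguments (for example, where the data lives) that are not listed in this section.

```julia
# Keyword options taken from the argument list in this section; values are illustrative.
opts = (
    train_sample_size = 1000,
    val_sample_size = 100,
    grams = 3,
    learn_mode = :cholesky,
    threshold_train = 0.1,
    threshold_val = 0.1,
    output_dir = "out",
    verbose = true,
)

# Hypothetical stand-in for test_combo: checks the mode and echoes the options.
# Assumption: the supported modes are spelled with underscores.
function run_combo(test_mode::Symbol; kwargs...)
    supported = (:train_only, :pre_split, :careful_split, :random_split)
    test_mode in supported ||
        throw(ArgumentError("unsupported test_mode: $test_mode"))
    return (mode = test_mode, options = (; kwargs...))
end

# The obligatory argument is positional; everything else is splatted as keywords.
result = run_combo(:random_split; opts...)
```

Collecting the options in a named tuple keeps one parameter combination in a single place, which is convenient when several combinations are run in turn.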

Optional Arguments

train_sample_size::Int64=0: the desired number of training data
val_sample_size::Int64=0: the desired number of validation data
val_ratio::Float64=0.0: the desired proportion of validation data; it takes effect only if val_sample_size is 0
extension::String=".csv": the file extension of the data files
n_grams_target_col::Union{String, Symbol}=:Word: the column name for target strings
n_grams_tokenized::Bool=false: if true, the dataset target is assumed to be tokenized
n_grams_sep_token::Union{Nothing, String}=nothing: the separator token (relevant when the target is tokenized)
grams::Int64=3: the number of grams for cues
n_grams_keep_sep::Bool=false: if true, keep separators in cues
start_end_token::String=":": start and end token in boundary cues
path_sep_token::String=":": path separator in the assembled path
random_seed::Int64=314: the random seed
sd_base_mean::Int64=1: the sd mean of base features
sd_inflection_mean::Int64=1: the sd mean of inflectional features
sd_base::Int64=4: the sd of base features
sd_inflection::Int64=4: the sd of inflectional features
isdeep::Bool=true: if true, the mean of each feature is also randomized
add_noise::Bool=true: if true, add additional Gaussian noise
sd_noise::Int64=1: the sd of the Gaussian noise
normalized::Bool=false: if true, most of the values range between -1 and 1; some may fall slightly outside this range, depending on the sd
if_combined::Bool=false: if true, features are combined across both training and validation data
learn_mode::Symbol=:cholesky: which learning mode, currently supports :cholesky and :wh
method::Symbol=:additive: whether :additive or :multiplicative decomposition is required
shift::Float64=0.02: shift value for :additive decomposition
multiplier::Float64=1.01: multiplier value for :multiplicative decomposition
output_format::Symbol=:auto: force the output format to dense (:dense) or sparse (:sparse), or let the program decide (:auto)
sparse_ratio::Float64=0.05: the ratio used to decide whether a matrix is sparse
wh_freq::Union{Nothing, Vector}=nothing: the learning sequence
init_weights::Union{Nothing, Matrix}=nothing: the initial weights
eta::Float64=0.1: the learning rate
n_epochs::Int64=1: the number of epochs to be trained
max_t::Int64=0: the maximum number of timesteps
A::Union{Nothing, Matrix}=nothing: the adjacency matrix
A_mode::Symbol=:combined: the adjacency matrix mode, currently supports :combined or :train_only
max_can::Int64=10: the maximum number of candidate paths to keep in the output
threshold_train::Float64=0.1: the support threshold for training data; an n-gram is taken into consideration if its support is higher than this value
is_tolerant_train::Bool=false: if true, up to max_tolerance_train n-grams whose supports are below threshold_train but above a second threshold (tolerance_train) may be added to the path for training data
tolerance_train::Float64=-0.1: the second threshold (in tolerant mode) for training data; an n-gram whose support lies between this value and threshold_train may be added to the path as long as max_tolerance_train has not been reached
max_tolerance_train::Int64=2: the maximum number of below-threshold n-grams allowed in a path for training data
threshold_val::Float64=0.1: the support threshold for validation data; an n-gram is taken into consideration if its support is higher than this value
is_tolerant_val::Bool=false: if true, up to max_tolerance_val n-grams whose supports are below threshold_val but above a second threshold (tolerance_val) may be added to the path for validation data
tolerance_val::Float64=-0.1: the second threshold (in tolerant mode) for validation data; an n-gram whose support lies between this value and threshold_val may be added to the path as long as max_tolerance_val has not been reached
max_tolerance_val::Int64=2: the maximum number of below-threshold n-grams allowed in a path for validation data
n_neighbors_train::Int64=10: the top n form neighbors to be considered for training data
n_neighbors_val::Int64=20: the top n form neighbors to be considered for validation data
issparse::Bool=false: if true, keep sparse matrix format when learning paths
output_dir::String="out": the output directory
verbose::Bool=false: if true, more information will be printed
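The interplay of the threshold, tolerance and max tolerance arguments can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the package's path-learning code: `select_ngrams` is a hypothetical helper, it looks at a flat list of support values rather than building paths timestep by timestep, and the support values are invented for the example. The defaults mirror the documented ones.

```julia
# Hypothetical helper, not part of JudiLing: returns the indices of the
# n-grams that would be accepted, given one support value per candidate.
function select_ngrams(supports::Vector{Float64};
                       threshold::Float64 = 0.1,
                       is_tolerant::Bool = false,
                       tolerance::Float64 = -0.1,
                       max_tolerance::Int = 2)
    kept = Int[]       # indices of accepted n-grams
    n_tolerated = 0    # below-threshold n-grams accepted so far
    for (i, s) in enumerate(supports)
        if s > threshold
            # Support above the main threshold: always accepted.
            push!(kept, i)
        elseif is_tolerant && s > tolerance && n_tolerated < max_tolerance
            # Tolerant mode: support between tolerance and threshold,
            # accepted only while the tolerance budget is not used up.
            push!(kept, i)
            n_tolerated += 1
        end
    end
    return kept
end

# Invented support values for six candidate n-grams.
supports = [0.35, 0.05, -0.02, 0.12, 0.0, -0.3]

strict = select_ngrams(supports)                       # only supports > 0.1
tolerant = select_ngrams(supports; is_tolerant = true) # plus up to 2 tolerated
```

In strict mode only the first and fourth candidates pass. In tolerant mode the second and third are admitted as well, after which the budget of two is spent, so the fifth candidate is rejected even though its support lies above the tolerance value.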
