JudiLing can call the Python package pyndl internally to compute NDL models. pyndl computes the mapping matrices from event files, which have to be generated manually or with pyndl in Python (see the pyndl documentation: https://pyndl.readthedocs.io/en/latest/). The advantage of calling pyndl from JudiLing is that the resulting weights, cue matrices and semantic matrices can be translated directly into JudiLing format, so that further processing can be done in JudiLing.
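As an illustration of what generating an event file manually involves, here is a minimal sketch in plain Julia. It assumes pyndl's usual event format: a tab-separated file with the header line "cues" and "outcomes", where the cues and the outcomes of each event are joined with underscores. The helper event_line and the example events (trigram cues, made-up semantic features) are hypothetical and only serve the illustration.

```julia
# Hypothetical helper: format one learning event as a line of an event file.
# Cues and outcomes are each joined with "_" and separated by a tab.
event_line(cues, outcomes) = join(cues, "_") * "\t" * join(outcomes, "_")

# Illustrative events: trigram cues with "#" as word boundary marker,
# and invented semantic features as outcomes.
events = [
    (["#vo", "voc", "oco", "co#"], ["vocare", "p1", "sg"]),
    (["#vo", "voc", "oca", "cas", "as#"], ["vocare", "p2", "sg"]),
]

# Collect the header and one line per event in memory.
io = IOBuffer()
println(io, "cues\toutcomes")
for (cues, outcomes) in events
    println(io, event_line(cues, outcomes))
end
event_text = String(take!(io))
print(event_text)
```

The sketch only builds the text. pyndl reads gzip-compressed event files (hence the .tab.gz extension used on this page), so the text still has to be written to disk and compressed before it can be passed to JudiLing.pyndl.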

Note

For pyndl to be available in JudiLing, PyCall has to be imported before JudiLing:

using PyCall
using JudiLing

Calling pyndl from JudiLing

JudiLing.Pyndl_Weight_Struct (Type)
Pyndl_Weight_Struct
    cues::Vector{String}
    outcomes::Vector{String}
    weight::Matrix{Float64}
  • cues::Vector{String}: Vector of cues, in the order that they appear in the weight matrix.
  • outcomes::Vector{String}: Vector of outcomes, in the order that they appear in the weight matrix.
  • weight::Matrix{Float64}: Weight matrix.
JudiLing.pyndl (Method)
pyndl(
    data_path::String;
    alpha::Float64 = 0.1,
    betas::Tuple{Float64,Float64} = (0.1, 0.1),
    method::String = "openmp"
)

Compute weights using pyndl. See the documentation of pyndl for more information: https://pyndl.readthedocs.io/en/latest/

Obligatory arguments

  • data_path::String: Path to an events file as generated by pyndl's preprocess.create_event_file.

Optional arguments

  • alpha::Float64 = 0.1: α learning rate.
  • betas::Tuple{Float64,Float64} = (0.1, 0.1): β1 and β2 learning rates
  • method::String = "openmp": One of {"openmp", "threading"}. "openmp" only works on Linux.
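
To make the roles of alpha and betas concrete, the following is a minimal sketch of a single Rescorla-Wagner update, the learning rule underlying NDL. It is not pyndl's implementation: the function name rw_update!, the outcomes-by-cues orientation of W, and the maximum association strength lambda = 1.0 are assumptions made for this illustration.

```julia
# Hypothetical helper: apply one Rescorla-Wagner learning step for a single event.
# W is assumed to be an outcomes × cues weight matrix.
# cue_idx:     indices of the cues present in the event
# outcome_idx: indices of the outcomes present in the event
function rw_update!(W::Matrix{Float64}, cue_idx::Vector{Int}, outcome_idx::Vector{Int};
                    alpha::Float64 = 0.1,
                    betas::Tuple{Float64,Float64} = (0.1, 0.1),
                    lambda::Float64 = 1.0)
    beta1, beta2 = betas
    for o in axes(W, 1)
        # Summed support for outcome o from the cues present in this event.
        activation = sum((W[o, c] for c in cue_idx); init = 0.0)
        present = o in outcome_idx
        # Present outcomes are pushed towards lambda with rate beta1,
        # absent outcomes towards 0 with rate beta2.
        target = present ? lambda : 0.0
        beta = present ? beta1 : beta2
        delta = alpha * beta * (target - activation)
        for c in cue_idx
            W[o, c] += delta
        end
    end
    return W
end

# Two outcomes, three cues; the event contains cues 1 and 2 and outcome 1.
W = zeros(2, 3)
rw_update!(W, [1, 2], [1])
```

With the default rates, the weights from the two present cues to the present outcome each grow by alpha * beta1 * lambda, while weights involving the absent cue are left untouched.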

Example

weights = JudiLing.pyndl("data/latin_train_events.tab.gz")

Translating output of pyndl to cue and semantic matrices in JudiLing

With the weights in hand, the cue and semantic matrices can be computed:

JudiLing.make_cue_matrix (Method)
make_cue_matrix(
    data::DataFrame,
    pyndl_weights::Pyndl_Weight_Struct;
    grams = 3,
    target_col = "Words",
    tokenized = false,
    sep_token = nothing,
    keep_sep = false,
    start_end_token = "#",
    verbose = false,
)

Make the cue matrix based on a dataframe and weights computed with pyndl. Practically this means that the cues are extracted from the weights object and translated to the JudiLing format.

Obligatory arguments

  • data::DataFrame: Dataset with all the word types on which the weights were trained.
  • pyndl_weights::Pyndl_Weight_Struct: Weights trained with JudiLing.pyndl

Optional arguments

  • grams = 3: N-gram size (has to match the n-gram granularity of the cues on which the weights were trained).
  • target_col = "Words": Column with target words.
  • tokenized = false: Whether the target words are already tokenized
  • sep_token = nothing: The string separating the tokens (only used if tokenized=true).
  • keep_sep = false: Whether the sep_token should be retained in the cues.
  • start_end_token = "#": The string with which to mark word boundaries.
  • verbose = false: Verbose mode.
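
The requirement that grams matches the granularity of the trained cues can be illustrated with a short sketch in plain Julia. The helper ngram_cues and the vector trained_cues (a stand-in for the cues field of a Pyndl_Weight_Struct) are hypothetical; the sketch assumes that cues are character n-grams of the word wrapped in the start_end_token, and it does not reproduce JudiLing's own implementation.

```julia
# Hypothetical helper: character n-grams of a word wrapped in boundary markers.
function ngram_cues(word::AbstractString; grams::Int = 3,
                    start_end_token::AbstractString = "#")
    chars = collect(start_end_token * word * start_end_token)
    return [String(chars[i:i+grams-1]) for i in 1:(length(chars) - grams + 1)]
end

# Stand-in for the cues of weights that were trained on trigrams.
trained_cues = ["#am", "ama", "mat", "at#", "#vo", "voc", "oco", "co#"]

# With the matching n-gram size, every cue of the word is known to the weights.
trigrams = ngram_cues("amat"; grams = 3)
println(trigrams, " all known: ", issubset(trigrams, trained_cues))

# With a different n-gram size, the cues cannot be found among the trained cues.
bigrams = ngram_cues("amat"; grams = 2)
println(bigrams, " all known: ", issubset(bigrams, trained_cues))
```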

Example

weights = JudiLing.pyndl("data/latin_train_events.tab.gz")
# `data` is the DataFrame with the training words (e.g. loaded from latin_train.csv)
cue_obj = JudiLing.make_cue_matrix(data, weights,
                                    grams = 3,
                                    target_col = "Word")
JudiLing.make_S_matrix (Method)
make_S_matrix(
    data::DataFrame,
    pyndl_weights::Pyndl_Weight_Struct,
    n_features_columns::Vector;
    tokenized::Bool=false,
    sep_token::String="_"
)

Create semantic matrix based on a dataframe and weights computed with pyndl. Practically this means that the semantic features are extracted from the weights object and translated to the JudiLing format.

Obligatory arguments

  • data::DataFrame: The dataset with word types.
  • pyndl_weights::Pyndl_Weight_Struct: Weights trained with JudiLing.pyndl.
  • n_features_columns::Vector: Vector of columns with the features in the dataset.

Optional arguments

  • tokenized=false: Whether the features in n_features_columns columns are already tokenized (e.g. "feature1_feature2_feature3")
  • sep_token="_": The string with which the features are separated (only used if tokenized=true).
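
As a rough illustration of how a tokenized feature value relates to the outcomes of the trained weights, the following sketch splits such a value on the separator and marks the corresponding outcomes. The helper feature_vector, the example outcomes, and the idea of representing a word as a 0/1 indicator vector over the outcomes are assumptions made for this illustration, not a description of JudiLing's internals.

```julia
# Stand-in for the outcomes field of a Pyndl_Weight_Struct.
outcomes = ["vocare", "p1", "p2", "sg", "pl"]
outcome_index = Dict(o => i for (i, o) in enumerate(outcomes))

# Hypothetical helper: 0/1 vector marking which outcomes a word's features match,
# in the order in which the outcomes appear in the weights.
function feature_vector(features, outcome_index)
    v = zeros(Int, length(outcome_index))
    for f in features
        v[outcome_index[f]] = 1
    end
    return v
end

# A tokenized column value is split on the separator before the lookup.
tokenized_value = "vocare_p1_sg"
features = String.(split(tokenized_value, "_"))
s = feature_vector(features, outcome_index)
println(features, " => ", s)
```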

Example

weights = JudiLing.pyndl("data/latin_train_events.tab.gz")
# `data` is the DataFrame with the word types on which the weights were trained
S = JudiLing.make_S_matrix(data,
                            weights,
                            ["Lexeme", "Person", "Number", "Tense", "Voice", "Mood"],
                            tokenized=false)
JudiLing.make_S_matrix (Method)
make_S_matrix(
    data_train::DataFrame,
    data_val::DataFrame,
    pyndl_weights::Pyndl_Weight_Struct,
    n_features_columns::Vector;
    tokenized::Bool=false,
    sep_token::String="_"
)

Create semantic matrix based on a training and validation dataframe and weights computed with pyndl. Practically this means that the semantic features are extracted from the weights object and translated to the JudiLing format.

Obligatory arguments

  • data_train::DataFrame: The training dataset.
  • data_val::DataFrame: The validation dataset.
  • pyndl_weights::Pyndl_Weight_Struct: Weights trained with JudiLing.pyndl.
  • n_features_columns::Vector: Vector of columns with the features in the training and validation datasets.

Optional arguments

  • tokenized=false: Whether the features in n_features_columns columns are already tokenized (e.g. "feature1_feature2_feature3")
  • sep_token="_": The string with which the features are separated (only used if tokenized=true).

Example

weights = JudiLing.pyndl("data/latin_train_events.tab.gz")
# `train` and `val` are the training and validation DataFrames
S_train, S_val = JudiLing.make_S_matrix(train,
                            val,
                            weights,
                            ["Lexeme", "Person", "Number", "Tense", "Voice", "Mood"],
                            tokenized=false)