Helpers

This page contains information on additional helper functions in this package.

JudiLingMeasures.compute_all_measures_train — Method

function compute_all_measures_train(data_train::DataFrame,
                                    cue_obj_train::JudiLing.Cue_Matrix_Struct,
                                    Chat_train::Union{JudiLing.SparseMatrixCSC, Matrix},
                                    S_train::Union{JudiLing.SparseMatrixCSC, Matrix},
                                    Shat_train::Union{JudiLing.SparseMatrixCSC, Matrix},
                                    F_train::Union{JudiLing.SparseMatrixCSC, Matrix},
                                    G_train::Union{JudiLing.SparseMatrixCSC, Matrix};
                                    res_learn_train::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing,
                                    gpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
                                    rpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
                                    sem_density_n::Int64=8,
                                    calculate_production_uncertainty::Bool=false,
                                    low_cost_measures_only::Bool=false)

Compute all measures currently available in JudiLingMeasures for the training data.

Arguments

data_train::DataFrame: The data for which measures should be calculated (the training data).
cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
Chat_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the training data.
S_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the training data.
Shat_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the training data.
F_train::Union{JudiLing.SparseMatrixCSC, Matrix}: Comprehension mapping matrix for the training data.
G_train::Union{JudiLing.SparseMatrixCSC, Matrix}: Production mapping matrix for the training data.
res_learn_train::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing: The first output of JudiLing.learn_paths_rpi (with check_gold_path=true).
gpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The second output of JudiLing.learn_paths_rpi (with check_gold_path=true).
rpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The third output of JudiLing.learn_paths_rpi (with check_gold_path=true).
sem_density_n::Int64=8: Number of neighbours to take into account in the Semantic Density measure.
calculate_production_uncertainty::Bool=false: Whether to compute "Production Uncertainty". This measure is computationally very heavy for large C matrices, so it is turned off by default.
low_cost_measures_only::Bool=false: Only compute measures which are not computationally heavy. Recommended for very large datasets.

Returns

results::DataFrame: A dataframe with all information in data_train plus all the computed measures.

JudiLingMeasures.compute_all_measures_train — Method

function compute_all_measures_train(data_train::DataFrame,
                                    cue_obj_train::JudiLing.Cue_Matrix_Struct,
                                    Chat_train::Union{JudiLing.SparseMatrixCSC, Matrix},
                                    S_train::Union{JudiLing.SparseMatrixCSC, Matrix},
                                    Shat_train::Union{JudiLing.SparseMatrixCSC, Matrix};
                                    res_learn_train::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing,
                                    gpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
                                    rpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
                                    sem_density_n::Int64=8,
                                    calculate_production_uncertainty::Bool=false,
                                    low_cost_measures_only::Bool=false)

Compute all measures currently available in JudiLingMeasures for the training data if F and G are not available (usually for DDL models).

Arguments

data_train::DataFrame: The data for which measures should be calculated (the training data).
cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
Chat_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the training data.
S_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the training data.
Shat_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the training data.
res_learn_train::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing: The first output of JudiLing.learn_paths_rpi (with check_gold_path=true).
gpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The second output of JudiLing.learn_paths_rpi (with check_gold_path=true).
rpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The third output of JudiLing.learn_paths_rpi (with check_gold_path=true).
sem_density_n::Int64=8: Number of neighbours to take into account in the Semantic Density measure.
calculate_production_uncertainty::Bool=false: Whether to compute "Production Uncertainty". This measure is computationally very heavy for large C matrices, so it is turned off by default.
low_cost_measures_only::Bool=false: Only compute measures which are not computationally heavy. Recommended for very large datasets.

Returns

results::DataFrame: A dataframe with all information in data_train plus all the computed measures.

JudiLingMeasures.compute_all_measures_val — Method

function compute_all_measures_val(data_val::DataFrame,
                                  cue_obj_train::JudiLing.Cue_Matrix_Struct,
                                  cue_obj_val::JudiLing.Cue_Matrix_Struct,
                                  Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix},
                                  S_train::Union{JudiLing.SparseMatrixCSC, Matrix},
                                  S_val::Union{JudiLing.SparseMatrixCSC, Matrix},
                                  Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix},
                                  F_train::Union{JudiLing.SparseMatrixCSC, Matrix},
                                  G_train::Union{JudiLing.SparseMatrixCSC, Matrix};
                                  res_learn_val::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing,
                                  gpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
                                  rpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
                                  sem_density_n::Int64=8,
                                  calculate_production_uncertainty::Bool=false,
                                  low_cost_measures_only::Bool=false)

Compute all measures currently available in JudiLingMeasures for the validation data.

Arguments

data_val::DataFrame: The data for which measures should be calculated (the validation data).
cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
cue_obj_val::JudiLing.Cue_Matrix_Struct: The cue object of the validation data.
Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the validation data.
S_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the training data.
S_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the validation data.
Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the validation data.
F_train::Union{JudiLing.SparseMatrixCSC, Matrix}: Comprehension mapping matrix for the training data.
G_train::Union{JudiLing.SparseMatrixCSC, Matrix}: Production mapping matrix for the training data.
res_learn_val::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing: The first output of JudiLing.learn_paths_rpi (with check_gold_path=true).
gpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The second output of JudiLing.learn_paths_rpi (with check_gold_path=true).
rpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The third output of JudiLing.learn_paths_rpi (with check_gold_path=true).
sem_density_n::Int64=8: Number of neighbours to take into account in the Semantic Density measure.
calculate_production_uncertainty::Bool=false: Whether to compute "Production Uncertainty". This measure is computationally very heavy for large C matrices, so it is turned off by default.
low_cost_measures_only::Bool=false: Only compute measures which are not computationally heavy. Recommended for very large datasets.

Returns

results::DataFrame: A dataframe with all information in data_val plus all the computed measures.

JudiLingMeasures.compute_all_measures_val — Method

function compute_all_measures_val(data_val::DataFrame,
                                  cue_obj_train::JudiLing.Cue_Matrix_Struct,
                                  cue_obj_val::JudiLing.Cue_Matrix_Struct,
                                  Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix},
                                  S_train::Union{JudiLing.SparseMatrixCSC, Matrix},
                                  S_val::Union{JudiLing.SparseMatrixCSC, Matrix},
                                  Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix};
                                  res_learn_val::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing,
                                  gpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
                                  rpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
                                  sem_density_n::Int64=8,
                                  calculate_production_uncertainty::Bool=false,
                                  low_cost_measures_only::Bool=false)

Compute all measures currently available in JudiLingMeasures for the validation data if F and G are not available (usually for DDL models).

Arguments

data_val::DataFrame: The data for which measures should be calculated (the validation data).
cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
cue_obj_val::JudiLing.Cue_Matrix_Struct: The cue object of the validation data.
Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the validation data.
S_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the training data.
S_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the validation data.
Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the validation data.
res_learn_val::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing: The first output of JudiLing.learn_paths_rpi (with check_gold_path=true).
gpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The second output of JudiLing.learn_paths_rpi (with check_gold_path=true).
rpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The third output of JudiLing.learn_paths_rpi (with check_gold_path=true).
sem_density_n::Int64=8: Number of neighbours to take into account in the Semantic Density measure.
calculate_production_uncertainty::Bool=false: Whether to compute "Production Uncertainty". This measure is computationally very heavy for large C matrices, so it is turned off by default.
low_cost_measures_only::Bool=false: Only compute measures which are not computationally heavy. Recommended for very large datasets.

Returns

results::DataFrame: A dataframe with all information in data_val plus all the computed measures.

JudiLingMeasures.correlation_diagonal_rowwise — Method

function correlation_diagonal_rowwise(S1, S2)

Compute the correlation between corresponding rows of S1 and S2 (row i of S1 with row i of S2), i.e. only the diagonal of the full row-wise correlation matrix.

Example

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> correlation_diagonal_rowwise(ma1, ma4)
3-element Array{Float64,1}:
 0.8660254037844387
 0.9607689228305228
 0.9819805060619657
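
The relationship to the full correlation matrix can be checked with a minimal sketch that uses only the Statistics standard library. The helper name diag_row_correlation is hypothetical and is not the package's implementation; it only illustrates the idea under that assumption.

```julia
using Statistics

# Hypothetical stand-in (not the package's code):
# correlate row i of A with row i of B.
function diag_row_correlation(A::AbstractMatrix, B::AbstractMatrix)
    size(A) == size(B) || throw(DimensionMismatch("A and B must have the same size"))
    return [cor(A[i, :], B[i, :]) for i in 1:size(A, 1)]
end

ma1 = [1 2 3; -1 -2 -3; 1 2 3]
ma4 = [1 2 2; 1 -2 -3; 0 2 3]

r = diag_row_correlation(ma1, ma4)

# cor works column-wise, so the matrices are transposed to correlate rows.
full = cor(permutedims(ma1), permutedims(ma4))
full_diag = [full[i, i] for i in 1:size(full, 1)]
```

Computing only the diagonal avoids building the full matrix when just the matching rows are of interest.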

JudiLingMeasures.correlation_rowwise — Method

correlation_rowwise(S1::Union{JudiLing.SparseMatrixCSC, Matrix},
                    S2::Union{JudiLing.SparseMatrixCSC, Matrix})

Compute the correlation between each row of S1 with all rows in S2.

Example

julia> ma2 = [[1 2 1 1]; [1 -2 3 1]; [1 -2 3 3]; [0 0 1 2]]
julia> ma3 = [[-1 2 1 1]; [1 2 3 1]; [1 2 0 1]; [0.5 -2 1.5 0]]
julia> correlation_rowwise(ma2, ma3)
4×4 Matrix{Float64}:
  0.662266   0.174078    0.816497  -0.905822
 -0.41762    0.29554    -0.990148   0.988623
 -0.308304   0.0368355  -0.863868   0.862538
  0.207514  -0.0909091  -0.426401   0.354787

JudiLingMeasures.cosine_similarity — Method

cosine_similarity(s_hat_collection, S)

Calculate the cosine similarity between all predicted and all target semantic vectors.

Example

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> cosine_similarity(ma1, ma4)
3×3 Array{Float64,2}:
  0.979958  -0.857143   0.963624
 -0.979958   0.857143  -0.963624
  0.979958  -0.857143   0.963624
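
As an illustration of what the numbers in this example mean, the following sketch computes row-by-row cosine similarity from scratch. The helper name cosine_rows is hypothetical and the code is an assumption about the computation, not the package's own source.

```julia
using LinearAlgebra

# Hypothetical helper: cosine similarity between every row of A and every row of B.
function cosine_rows(A::AbstractMatrix, B::AbstractMatrix)
    normA = [norm(A[i, :]) for i in 1:size(A, 1)]   # L2 norm of each row of A
    normB = [norm(B[j, :]) for j in 1:size(B, 1)]   # L2 norm of each row of B
    # Dot products of all row pairs, divided by the product of the row norms.
    return (A * B') ./ (normA * normB')
end

ma1 = [1 2 3; -1 -2 -3; 1 2 3]
ma4 = [1 2 2; 1 -2 -3; 0 2 3]

C = cosine_rows(ma1, ma4)
```

Entry C[i, j] compares row i of the first matrix with row j of the second, which is why the second row of the result is the negation of the first: the second row of ma1 is the negation of its first row.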

JudiLingMeasures.count_rows — Method

count_rows(dat::DataFrame)

Get the number of rows in dat.

Examples

julia> dat = DataFrame("text"=>[1,2,3])
julia> count_rows(dat)
 3

JudiLingMeasures.entropy — Method

entropy(ps::Union{Missing, Array, SubArray})

Compute the Shannon entropy of the values in ps that are greater than 0.

Note: the result of this entropy function differs from other entropy measures because a) the values are first scaled to lie between 0 and 1, and b) log2 is used instead of log.

Examples

julia> ps = [0.1, 0.2, 0.9]
julia> entropy(ps)
1.0408520829727552
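
The value in this example is consistent with the following sketch, which assumes that the scaling step divides the positive values by their sum. The function name entropy_sketch is hypothetical; this is an illustration under that assumption, not the package's implementation.

```julia
# Hypothetical re-implementation for illustration only.
function entropy_sketch(ps::AbstractVector{<:Real})
    positive = ps[ps .> 0]            # keep only values greater than 0
    p = positive ./ sum(positive)     # assumption: rescale so the values sum to 1
    return -sum(p .* log2.(p))        # Shannon entropy in bits (log base 2)
end

h = entropy_sketch([0.1, 0.2, 0.9])

# Four equally likely outcomes carry exactly 2 bits.
h_uniform = entropy_sketch([1, 1, 1, 1])
```

Because of the rescaling, the input does not need to be a probability distribution, and multiplying all values by a constant leaves the result unchanged.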

JudiLingMeasures.euclidean_distance_rowwise — Method

euclidean_distance_rowwise(Shat::Union{JudiLing.SparseMatrixCSC, Matrix},
                           S::Union{JudiLing.SparseMatrixCSC, Matrix})

Calculate the pairwise Euclidean distances between all rows in Shat and S.

Throws an error if either matrix contains missing values.

Examples

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> euclidean_distance_rowwise(ma1, ma4)
3×3 Matrix{Float64}:
 1.0     7.2111  1.0
 6.7082  2.0     7.28011
 1.0     7.2111  1.0
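
To make the layout of the result explicit, here is a small sketch that builds the same kind of distance matrix by hand. The helper name euclid_rows is hypothetical and the code is not taken from the package.

```julia
using LinearAlgebra

# Hypothetical helper: D[i, j] is the Euclidean distance between
# row i of A and row j of B.
function euclid_rows(A::AbstractMatrix, B::AbstractMatrix)
    return [norm(A[i, :] .- B[j, :]) for i in 1:size(A, 1), j in 1:size(B, 1)]
end

ma1 = [1 2 3; -1 -2 -3; 1 2 3]
ma4 = [1 2 2; 1 -2 -3; 0 2 3]

D = euclid_rows(ma1, ma4)
```

Rows of the result index the first argument and columns index the second, so D[1, 2] is the distance between the first row of ma1 and the second row of ma4.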

JudiLingMeasures.get_avg_levenshtein — Method

get_avg_levenshtein(targets::Array, preds::Array)

Get the average Levenshtein distance between two lists of strings.

Examples

julia> targets = ["abc", "abc", "abc"]
julia> preds = ["abd", "abc", "ebd"]
julia> get_avg_levenshtein(targets, preds)
 1.0
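
For readers unfamiliar with the measure, the sketch below spells out one standard way to compute it: a dynamic-programming Levenshtein distance, averaged over paired targets and predictions. The name levenshtein_sketch is hypothetical, and the package may compute the distance differently.

```julia
# Hypothetical helper: classic dynamic-programming Levenshtein distance.
function levenshtein_sketch(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)        # work on characters, not bytes
    prev = collect(0:length(t))          # distances from the empty prefix of a
    for i in 1:length(s)
        curr = similar(prev)
        curr[1] = i                      # delete the first i characters of a
        for j in 1:length(t)
            cost = s[i] == t[j] ? 0 : 1
            curr[j + 1] = min(prev[j + 1] + 1,   # deletion
                              curr[j] + 1,       # insertion
                              prev[j] + cost)    # substitution or match
        end
        prev = curr
    end
    return prev[end]
end

targets = ["abc", "abc", "abc"]
preds = ["abd", "abc", "ebd"]

# One distance per target/prediction pair, then the mean over pairs.
dists = [levenshtein_sketch(t, p) for (t, p) in zip(targets, preds)]
avg = sum(dists) / length(dists)
```

The three pairs differ by 1, 0 and 2 edits respectively, which gives the average of 1.0 shown in the example.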

JudiLingMeasures.get_nearest_neighbour_eucl — Method

get_nearest_neighbour_eucl(eucl_sims::Matrix)

Get the distance to the nearest neighbour for each row in eucl_sims, i.e. the smallest value in each row.

Examples

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> eucl_sims = euclidean_distance_rowwise(ma1, ma4)
julia> get_nearest_neighbour_eucl(eucl_sims)
3-element Vector{Float64}:
 1.0
 2.0
 1.0
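
The returned values are the row minima of the distance matrix. A minimal sketch in plain Julia, using the (rounded) distance matrix from the euclidean_distance_rowwise example as input; the variable names are illustrative assumptions, not part of the package.

```julia
# Distance matrix copied from the euclidean_distance_rowwise example (rounded values).
D = [1.0     7.2111  1.0;
     6.7082  2.0     7.28011;
     1.0     7.2111  1.0]

# Smallest distance in each row, flattened from a 3×1 matrix to a vector.
nearest_dist = vec(minimum(D, dims=2))

# Column index of the (first) nearest neighbour in each row.
nearest_idx = [argmin(D[i, :]) for i in 1:size(D, 1)]
```

Where a row contains ties, as in rows 1 and 3 here, argmin reports the first column that attains the minimum.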

JudiLingMeasures.get_res_learn_df — Method

get_res_learn_df(res_learn_val, data_val, cue_obj_train, cue_obj_val)

Wrapper for JudiLing.write2df for easier use.

JudiLingMeasures.l1_rowwise — Method

l1_rowwise(M::Union{JudiLing.SparseMatrixCSC, Matrix})

Compute the L1 Norm of each row of M.

Example

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> l1_rowwise(ma1)
3×1 Matrix{Int64}:
 6
 6
 6

JudiLingMeasures.l2_rowwise — Method

l2_rowwise(M::Union{JudiLing.SparseMatrixCSC, Matrix})

Compute the L2 Norm of each row of M.

Example

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> l2_rowwise(ma1)
3×1 Matrix{Float64}:
 3.7416573867739413
 3.7416573867739413
 3.7416573867739413
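
Both row-wise norms can be reproduced with reductions over the second dimension. This is a sketch in plain Julia under the assumption that the helpers behave like the standard L1 and L2 norms; it is not the package's source.

```julia
M = [1 2 3; -1 -2 -3; 1 2 3]

# L1 norm per row: sum of absolute values (3×1 Matrix{Int64}).
l1 = sum(abs, M, dims=2)

# L2 norm per row: square root of the sum of squares (3×1 Matrix{Float64}).
l2 = sqrt.(sum(abs2, M, dims=2))
```

Keeping dims=2 preserves the 3×1 matrix shape seen in the examples; wrapping the result in vec would give a plain vector instead.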

JudiLingMeasures.make_measure_preparations — Method

function make_measure_preparations(data_train, S_train, Shat_train,
                                   res_learn_train, cue_obj_train,
                                   rpi_learn_train)

Returns all additional objects needed for measure calculations if the data of interest is the training data.

Arguments

data_train: The data for which the measures are to be calculated (training data).
S_train: The semantic matrix of the training data.
Shat_train: The predicted semantic matrix of the training data.
res_learn_train: The first object returned by the learn_paths_rpi algorithm for the training data.
cue_obj_train: The cue object of the training data.
rpi_learn_train: The rpi object returned by the learn_paths_rpi algorithm for the training data (its third output when check_gold_path=true).

Returns

results::DataFrame: A deepcopy of data_train.
cor_s::Matrix: Correlation matrix between Shat_train and S_train.
df::DataFrame: The output of res_learn_train (of the training data) in the form of a dataframe.
rpi_df::DataFrame: Stores the path information about the predicted forms (from learn_paths), which is needed to compute things like PathSum, PathCounts and PathEntropies.

JudiLingMeasures.make_measure_preparations — Method

function make_measure_preparations(data_val, S_train, S_val, Shat_val,
                                   res_learn_val, cue_obj_train, cue_obj_val,
                                   rpi_learn_val)

Returns all additional objects needed for measure calculations if the data of interest is the validation data.

Arguments

data_val: The data for which the measures are to be calculated (validation data).
S_train: The semantic matrix of the training data.
S_val: The semantic matrix of the validation data.
Shat_val: The predicted semantic matrix of the validation data.
res_learn_val: The first object returned by the learn_paths_rpi algorithm for the validation data.
cue_obj_train: The cue object of the training data.
cue_obj_val: The cue object of the validation data.
rpi_learn_val: The rpi object returned by the learn_paths_rpi algorithm for the validation data (its third output when check_gold_path=true).

Returns

results::DataFrame: A deepcopy of data_val.
cor_s::Matrix: Correlation matrix between Shat_val and S_val.
df::DataFrame: The output of res_learn_val (of the validation data) in the form of a dataframe.
rpi_df::DataFrame: Stores the path information about the predicted forms (from learn_paths), which is needed to compute things like PathSum, PathCounts and PathEntropies.

JudiLingMeasures.max_rowwise — Method

max_rowwise(S::Union{JudiLing.SparseMatrixCSC, Matrix})

Get the maximum of each row in S.

Examples

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> max_rowwise(ma1)
3×1 Matrix{Int64}:
  3
 -1
  3

JudiLingMeasures.mean_rowwise — Method

mean_rowwise(S::Union{JudiLing.SparseMatrixCSC, Matrix})

Calculate the mean of each row in S.

Examples

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> mean_rowwise(ma1)
3×1 Matrix{Float64}:
  2.0
 -2.0
  2.0
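
The row-wise maximum and mean correspond to standard reductions over the second dimension. A short sketch in plain Julia, assuming the helpers behave like these reductions; it is not the package's source.

```julia
using Statistics

M = [1 2 3; -1 -2 -3; 1 2 3]

# Largest value in each row (3×1 Matrix{Int64}).
row_max = maximum(M, dims=2)

# Arithmetic mean of each row (3×1 Matrix{Float64}).
row_mean = mean(M, dims=2)
```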

JudiLingMeasures.safe_length — Method

safe_length(x::Union{Missing, String})

Compute the length of x. If x is missing, return missing.

Example

julia> safe_length(missing)
missing
julia> safe_length("abc")
3

JudiLingMeasures.safe_sum — Method

safe_sum(x::Array)

Compute the sum of all elements of x. If x is empty, return missing.

Example

julia> safe_sum([])
missing
julia> safe_sum([1,2,3])
6
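
The "safe" helpers follow a common Julia pattern: return missing instead of throwing when the input carries no usable value. A sketch of that pattern with hypothetical names (safe_length_sketch, safe_sum_sketch), not the package's own definitions:

```julia
# Dispatch on the argument type: missing in, missing out.
safe_length_sketch(x::Missing) = missing
safe_length_sketch(x::AbstractString) = length(x)

# An empty array has no meaningful sum here, so return missing instead.
safe_sum_sketch(x::AbstractArray) = isempty(x) ? missing : sum(x)

len_missing = safe_length_sketch(missing)
len_abc = safe_length_sketch("abc")
sum_empty = safe_sum_sketch([])
sum_123 = safe_sum_sketch([1, 2, 3])
```

Returning missing lets the result flow into a DataFrame column without special-casing incomplete rows.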

JudiLingMeasures.sem_density_mean — Method

sem_density_mean(s_cor::Union{JudiLing.SparseMatrixCSC, Matrix},
                 n::Int)

For each predicted semantic vector, compute its semantic density: the mean correlation with its n most highly correlated semantic neighbours.

Arguments

s_cor::Union{JudiLing.SparseMatrixCSC, Matrix}: The correlation matrix between S and Shat.
n::Int: The number of most highly correlated semantic neighbours to take into account.

Example

julia> ma2 = [[1 2 1 1]; [1 -2 3 1]; [1 -2 3 3]; [0 0 1 2]]
julia> ma3 = [[-1 2 1 1]; [1 2 3 1]; [1 2 0 1]; [0.5 -2 1.5 0]]
julia> cor_s = correlation_rowwise(ma2, ma3)
julia> sem_density_mean(cor_s, 2)
4-element Vector{Float64}:
 0.7393813797301239
 0.6420816485652429
 0.4496869233815781
 0.281150888376636
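
The values in this example match a simple "mean of the n largest entries per row" computation, sketched below. The helper name top_n_mean is hypothetical, and the input is the rounded correlation matrix printed in the correlation_rowwise example, so results agree with the example only up to rounding.

```julia
using Statistics

# Hypothetical helper: mean of the n largest values in each row of s_cor.
function top_n_mean(s_cor::AbstractMatrix, n::Int)
    return [mean(sort(s_cor[i, :], rev=true)[1:n]) for i in 1:size(s_cor, 1)]
end

# Rounded correlation matrix from the correlation_rowwise example.
cor_s = [ 0.662266   0.174078   -0.816497  -0.905822;
         -0.41762    0.29554    -0.990148   0.988623;
         -0.308304   0.0368355  -0.863868   0.862538;
          0.207514  -0.0909091  -0.426401   0.354787]
cor_s[1, 3] = 0.816497   # entry (1, 3) is positive in the example output

dens = top_n_mean(cor_s, 2)
```

For the first row, the two largest correlations are 0.816497 and 0.662266, whose mean is approximately 0.73938.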
