Helpers
This page contains information on additional helper functions in this package.
JudiLingMeasures.compute_all_measures_train — Method
function compute_all_measures_train(data_train::DataFrame,
cue_obj_train::JudiLing.Cue_Matrix_Struct,
Chat_train::Union{JudiLing.SparseMatrixCSC, Matrix},
S_train::Union{JudiLing.SparseMatrixCSC, Matrix},
Shat_train::Union{JudiLing.SparseMatrixCSC, Matrix},
F_train::Union{JudiLing.SparseMatrixCSC, Matrix},
G_train::Union{JudiLing.SparseMatrixCSC, Matrix};
res_learn_train::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing,
gpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
rpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
sem_density_n::Int64=8,
calculate_production_uncertainty::Bool=false,
low_cost_measures_only::Bool=false)
Compute all measures currently available in JudiLingMeasures for the training data.
Arguments
data_train::DataFrame: The data for which measures should be calculated (the training data).
cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
Chat_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the training data.
S_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the training data.
Shat_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the training data.
F_train::Union{JudiLing.SparseMatrixCSC, Matrix}: Comprehension mapping matrix for the training data.
G_train::Union{JudiLing.SparseMatrixCSC, Matrix}: Production mapping matrix for the training data.
res_learn_train::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing: The first output of JudiLing.learn_paths_rpi (with check_gold_path=true).
gpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The second output of JudiLing.learn_paths_rpi (with check_gold_path=true).
rpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The third output of JudiLing.learn_paths_rpi (with check_gold_path=true).
sem_density_n::Int64=8: Number of neighbours to take into account in Semantic Density measure.
calculate_production_uncertainty: "Production Uncertainty" is computationally very heavy for large C matrices, therefore its computation is turned off by default.
low_cost_measures_only::Bool=false: Only compute measures which are not computationally heavy. Recommended for very large datasets.
Returns
results::DataFrame: A dataframe with all information in data_train plus all the computed measures.
JudiLingMeasures.compute_all_measures_train — Method
function compute_all_measures_train(data_train::DataFrame,
cue_obj_train::JudiLing.Cue_Matrix_Struct,
Chat_train::Union{JudiLing.SparseMatrixCSC, Matrix},
S_train::Union{JudiLing.SparseMatrixCSC, Matrix},
Shat_train::Union{JudiLing.SparseMatrixCSC, Matrix};
res_learn_train::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing,
gpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
rpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
sem_density_n::Int64=8,
calculate_production_uncertainty::Bool=false,
low_cost_measures_only::Bool=false)
Compute all measures currently available in JudiLingMeasures for the training data if F and G are not available (usually for DDL models).
Arguments
data_train::DataFrame: The data for which measures should be calculated (the training data).
cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
Chat_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the training data.
S_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the training data.
Shat_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the training data.
res_learn_train::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing: The first output of JudiLing.learn_paths_rpi (with check_gold_path=true).
gpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The second output of JudiLing.learn_paths_rpi (with check_gold_path=true).
rpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The third output of JudiLing.learn_paths_rpi (with check_gold_path=true).
sem_density_n::Int64=8: Number of neighbours to take into account in Semantic Density measure.
calculate_production_uncertainty: "Production Uncertainty" is computationally very heavy for large C matrices, therefore its computation is turned off by default.
low_cost_measures_only::Bool=false: Only compute measures which are not computationally heavy. Recommended for very large datasets.
Returns
results::DataFrame: A dataframe with all information in data_train plus all the computed measures.
JudiLingMeasures.compute_all_measures_val — Method
function compute_all_measures_val(data_val::DataFrame,
cue_obj_train::JudiLing.Cue_Matrix_Struct,
cue_obj_val::JudiLing.Cue_Matrix_Struct,
Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix},
S_train::Union{JudiLing.SparseMatrixCSC, Matrix},
S_val::Union{JudiLing.SparseMatrixCSC, Matrix},
Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix},
F_train::Union{JudiLing.SparseMatrixCSC, Matrix},
G_train::Union{JudiLing.SparseMatrixCSC, Matrix};
res_learn_val::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing,
gpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
rpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
sem_density_n::Int64=8,
calculate_production_uncertainty::Bool=false,
low_cost_measures_only::Bool=false)
Compute all measures currently available in JudiLingMeasures for the validation data.
Arguments
data_val::DataFrame: The data for which measures should be calculated (the validation data).
cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
cue_obj_val::JudiLing.Cue_Matrix_Struct: The cue object of the validation data.
Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the validation data.
S_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the training data.
S_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the validation data.
Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the validation data.
F_train::Union{JudiLing.SparseMatrixCSC, Matrix}: Comprehension mapping matrix for the training data.
G_train::Union{JudiLing.SparseMatrixCSC, Matrix}: Production mapping matrix for the training data.
res_learn_val::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing: The first output of JudiLing.learn_paths_rpi (with check_gold_path=true).
gpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The second output of JudiLing.learn_paths_rpi (with check_gold_path=true).
rpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The third output of JudiLing.learn_paths_rpi (with check_gold_path=true).
sem_density_n::Int64=8: Number of neighbours to take into account in Semantic Density measure.
calculate_production_uncertainty::Bool=false: "Production Uncertainty" is computationally very heavy for large C matrices, therefore its computation is turned off by default.
low_cost_measures_only::Bool=false: Only compute measures which are not computationally heavy. Recommended for very large datasets.
Returns
results::DataFrame: A dataframe with all information in data_val plus all the computed measures.
JudiLingMeasures.compute_all_measures_val — Method
function compute_all_measures_val(data_val::DataFrame,
cue_obj_train::JudiLing.Cue_Matrix_Struct,
cue_obj_val::JudiLing.Cue_Matrix_Struct,
Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix},
S_train::Union{JudiLing.SparseMatrixCSC, Matrix},
S_val::Union{JudiLing.SparseMatrixCSC, Matrix},
Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix};
res_learn_val::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing,
gpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
rpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
sem_density_n::Int64=8,
calculate_production_uncertainty::Bool=false,
low_cost_measures_only::Bool=false)
Compute all measures currently available in JudiLingMeasures for the validation data if F and G are not available (usually for DDL models).
Arguments
data_val::DataFrame: The data for which measures should be calculated (the validation data).
cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
cue_obj_val::JudiLing.Cue_Matrix_Struct: The cue object of the validation data.
Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the validation data.
S_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the training data.
S_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the validation data.
Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the validation data.
res_learn_val::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing: The first output of JudiLing.learn_paths_rpi (with check_gold_path=true).
gpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The second output of JudiLing.learn_paths_rpi (with check_gold_path=true).
rpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The third output of JudiLing.learn_paths_rpi (with check_gold_path=true).
sem_density_n::Int64=8: Number of neighbours to take into account in Semantic Density measure.
calculate_production_uncertainty::Bool=false: "Production Uncertainty" is computationally very heavy for large C matrices, therefore its computation is turned off by default.
low_cost_measures_only::Bool=false: Only compute measures which are not computationally heavy. Recommended for very large datasets.
Returns
results::DataFrame: A dataframe with all information in data_val plus all the computed measures.
JudiLingMeasures.correlation_diagonal_rowwise — Method
function correlation_diagonal_rowwise(S1, S2)
Compute the correlation between each row of S1 and the corresponding row of S2, i.e. only the diagonal of the row-wise correlation matrix.
Example
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> correlation_diagonal_rowwise(ma1, ma4)
3-element Array{Float64,1}:
0.8660254037844387
0.9607689228305228
0.9819805060619657
JudiLingMeasures.correlation_rowwise — Method
correlation_rowwise(S1::Union{JudiLing.SparseMatrixCSC, Matrix},
S2::Union{JudiLing.SparseMatrixCSC, Matrix})
Compute the correlation between each row of S1 and every row of S2.
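The relation between the full row-wise correlation matrix and its diagonal can be illustrated with a minimal sketch. It assumes plain Pearson correlation from the Statistics standard library; the names rowwise_cor_sketch and diagonal_cor_sketch are hypothetical and are not the package's implementation.

```julia
using Statistics  # provides `cor`

# Hypothetical helper, not part of JudiLingMeasures: Pearson correlation of
# every row of S1 with every row of S2.
rowwise_cor_sketch(S1, S2) =
    [cor(S1[i, :], S2[j, :]) for i in axes(S1, 1), j in axes(S2, 1)]

# Hypothetical helper: only the matching row pairs (row i with row i).
diagonal_cor_sketch(S1, S2) =
    [cor(S1[i, :], S2[i, :]) for i in axes(S1, 1)]

ma2 = [[1 2 1 1]; [1 -2 3 1]; [1 -2 3 3]; [0 0 1 2]]
ma3 = [[-1 2 1 1]; [1 2 3 1]; [1 2 0 1]; [0.5 -2 1.5 0]]

full = rowwise_cor_sketch(ma2, ma3)       # 4×4 matrix
diagonal = diagonal_cor_sketch(ma2, ma3)  # 4-element vector
```

Here diagonal[i] equals full[i, i], so the diagonal variant avoids computing the off-diagonal entries when only matching rows are of interest.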
Example
julia> ma2 = [[1 2 1 1]; [1 -2 3 1]; [1 -2 3 3]; [0 0 1 2]]
julia> ma3 = [[-1 2 1 1]; [1 2 3 1]; [1 2 0 1]; [0.5 -2 1.5 0]]
julia> correlation_rowwise(ma2, ma3)
4×4 Matrix{Float64}:
0.662266 0.174078 0.816497 -0.905822
-0.41762 0.29554 -0.990148 0.988623
-0.308304 0.0368355 -0.863868 0.862538
0.207514 -0.0909091 -0.426401 0.354787
JudiLingMeasures.cosine_similarity — Method
cosine_similarity(s_hat_collection, S)
Calculate cosine similarity between all predicted and all target semantic vectors.
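As a rough illustration of what this measure computes, the sketch below evaluates the cosine of the angle between every pair of rows. The helper name cosine_sketch is hypothetical, and the code is an assumption about the definition rather than the package's implementation.

```julia
using LinearAlgebra  # provides `dot` and `norm`

# Hypothetical helper: cosine similarity between every row of Shat and
# every row of S. Rows of all zeros would yield NaN (division by zero).
cosine_sketch(Shat, S) =
    [dot(Shat[i, :], S[j, :]) / (norm(Shat[i, :]) * norm(S[j, :]))
     for i in axes(Shat, 1), j in axes(S, 1)]

ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]

sims = cosine_sketch(ma1, ma4)  # 3×3 matrix of similarities
```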
Example
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> cosine_similarity(ma1, ma4)
3×3 Array{Float64,2}:
0.979958 -0.857143 0.963624
-0.979958 0.857143 -0.963624
0.979958 -0.857143 0.963624
JudiLingMeasures.count_rows — Method
count_rows(dat::DataFrame)
Get the number of rows in dat.
Examples
julia> dat = DataFrame("text"=>[1,2,3])
julia> count_rows(dat)
3
JudiLingMeasures.entropy — Method
entropy(ps::Union{Missing, Array, SubArray})
Compute the Shannon entropy of the values in ps that are greater than 0.
Note: the result of this entropy function differs from other entropy measures because a) the values are scaled between 0 and 1 first, and b) log2 is used instead of log.
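The two points in the note can be sketched as follows. This is a minimal sketch under one assumption: that the scaling step divides the positive values by their sum, which reproduces the value in the example for this function. The name entropy_sketch is hypothetical and handling of missing input is left out.

```julia
# Hypothetical helper, not the package implementation.
# Assumption: "scaled between 0 and 1" means dividing by the sum, so the
# positive values form a probability distribution.
function entropy_sketch(ps)
    positive = filter(>(0), ps)          # keep only values greater than 0
    probs = positive ./ sum(positive)    # normalise so the values sum to 1
    return -sum(probs .* log2.(probs))   # Shannon entropy in bits (log base 2)
end

h = entropy_sketch([0.1, 0.2, 0.9])
```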
Examples
julia> ps = [0.1, 0.2, 0.9]
julia> entropy(ps)
1.0408520829727552
JudiLingMeasures.euclidean_distance_rowwise — Method
euclidean_distance_rowwise(Shat::Union{JudiLing.SparseMatrixCSC, Matrix},
S::Union{JudiLing.SparseMatrixCSC, Matrix})
Calculate the pairwise Euclidean distances between all rows in Shat and S.
Throws an error if either array contains missing.
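A minimal sketch of the pairwise distance computation, assuming dense matrices without missing values. The helper name euclid_sketch is hypothetical; the last line additionally takes the smallest distance in each row, which is the quantity a nearest-neighbour lookup on such a matrix would report.

```julia
using LinearAlgebra  # provides `norm`

# Hypothetical helper: Euclidean distance from every row of Shat to every
# row of S.
euclid_sketch(Shat, S) =
    [norm(Shat[i, :] .- S[j, :]) for i in axes(Shat, 1), j in axes(S, 1)]

ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]

dists = euclid_sketch(ma1, ma4)        # 3×3 matrix of distances

# Smallest distance in each row, i.e. the distance to the nearest neighbour.
nearest = vec(minimum(dists, dims=2))
```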
Examples
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> euclidean_distance_rowwise(ma1, ma4)
3×3 Matrix{Float64}:
1.0 7.2111 1.0
6.7082 2.0 7.28011
1.0 7.2111 1.0
JudiLingMeasures.get_avg_levenshtein — Method
get_avg_levenshtein(targets::Array, preds::Array)
Get the average Levenshtein distance between two lists of strings.
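To make the measure concrete, the sketch below computes the Levenshtein distance with the standard dynamic-programming recurrence and averages it over paired targets and predictions. The helper names are hypothetical, and the package may rely on a different implementation internally.

```julia
using Statistics  # provides `mean`

# Hypothetical helper: classic edit distance where insertion, deletion and
# substitution each cost 1.
function levenshtein_sketch(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    prev = collect(0:length(t))      # distances from the empty prefix of `a`
    for i in 1:length(s)
        curr = similar(prev)
        curr[1] = i                  # deleting the first i characters of `a`
        for j in 1:length(t)
            cost = s[i] == t[j] ? 0 : 1
            curr[j + 1] = min(prev[j + 1] + 1,   # deletion
                              curr[j] + 1,       # insertion
                              prev[j] + cost)    # substitution or match
        end
        prev = curr
    end
    return prev[end]
end

# Hypothetical helper: mean distance over element-wise pairs.
avg_levenshtein_sketch(targets, preds) = mean(levenshtein_sketch.(targets, preds))

avg = avg_levenshtein_sketch(["abc", "abc", "abc"], ["abd", "abc", "ebd"])
```

The three pairs have distances 1, 0 and 2, so the average is 1.0.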
Examples
julia> targets = ["abc", "abc", "abc"]
julia> preds = ["abd", "abc", "ebd"]
julia> get_avg_levenshtein(targets, preds)
1.0
JudiLingMeasures.get_nearest_neighbour_eucl — Method
get_nearest_neighbour_eucl(eucl_sims::Matrix)
Get the nearest neighbour for each row in eucl_sims.
Examples
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> eucl_sims = euclidean_distance_rowwise(ma1, ma4)
julia> get_nearest_neighbour_eucl(eucl_sims)
3-element Vector{Float64}:
1.0
2.0
1.0
JudiLingMeasures.get_res_learn_df — Method
get_res_learn_df(res_learn_val, data_val, cue_obj_train, cue_obj_val)
Wrapper for JudiLing.write2df for easier use.
JudiLingMeasures.l1_rowwise — Method
l1_rowwise(M::Union{JudiLing.SparseMatrixCSC, Matrix})
Compute the L1 Norm of each row of M.
Example
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> l1_rowwise(ma1)
3×1 Matrix{Int64}:
6
6
6
JudiLingMeasures.l2_rowwise — Method
l2_rowwise(M::Union{JudiLing.SparseMatrixCSC, Matrix})
Compute the L2 Norm of each row of M.
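Both row norms can be sketched with base Julia reductions. This is only an illustration of the definitions (sum of absolute values for L1, square root of the sum of squares for L2), not the package's code.

```julia
ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]

# `dims=2` reduces along each row and keeps the result as an n×1 matrix.
l1 = sum(abs.(ma1), dims=2)           # L1 norm: sum of absolute values per row
l2 = sqrt.(sum(abs2.(ma1), dims=2))   # L2 norm: sqrt of the sum of squares per row
```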
Example
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> l2_rowwise(ma1)
3×1 Matrix{Float64}:
3.7416573867739413
3.7416573867739413
3.7416573867739413
JudiLingMeasures.make_measure_preparations — Method
function make_measure_preparations(data_train, S_train, Shat_train,
res_learn_train, cue_obj_train,
rpi_learn_train)
Returns all additional objects needed for measure calculations if the data of interest is the training data.
Arguments
data_train: The data for which the measures are to be calculated (training data).
S_train: The semantic matrix of the training data.
Shat_train: The predicted semantic matrix of the training data.
res_learn_train: The first object returned by the learn_paths_rpi algorithm for the training data.
cue_obj_train: The cue object of the training data.
rpi_learn_train: The second object returned by the learn_paths_rpi algorithm for the training data.
Returns
results::DataFrame: A deepcopy of data_train.
cor_s::Matrix: Correlation matrix between Shat_train and S_train.
df::DataFrame: The content of res_learn_train (for the training data) in the form of a dataframe.
rpi_df::DataFrame: Stores the path information about the predicted forms (from learn_paths), which is needed to compute things like PathSum, PathCounts and PathEntropies.
JudiLingMeasures.make_measure_preparations — Method
function make_measure_preparations(data_val, S_train, S_val, Shat_val,
res_learn_val, cue_obj_train, cue_obj_val,
rpi_learn_val)
Returns all additional objects needed for measure calculations if the data of interest is the validation data.
Arguments
data_val: The data for which the measures are to be calculated (validation data).
S_train: The semantic matrix of the training data.
S_val: The semantic matrix of the validation data.
Shat_val: The predicted semantic matrix of the validation data.
res_learn_val: The first object returned by the learn_paths_rpi algorithm for the validation data.
cue_obj_train: The cue object of the training data.
cue_obj_val: The cue object of the validation data.
rpi_learn_val: The second object returned by the learn_paths_rpi algorithm for the validation data.
Returns
results::DataFrame: A deepcopy of data_val.
cor_s::Matrix: Correlation matrix between Shat_val and S_val.
df::DataFrame: The content of res_learn_val (for the validation data) in the form of a dataframe.
rpi_df::DataFrame: Stores the path information about the predicted forms (from learn_paths), which is needed to compute things like PathSum, PathCounts and PathEntropies.
JudiLingMeasures.max_rowwise — Method
max_rowwise(S::Union{JudiLing.SparseMatrixCSC, Matrix})
Get the maximum of each row in S.
Examples
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> max_rowwise(ma1)
3×1 Matrix{Int64}:
3
-1
3
JudiLingMeasures.mean_rowwise — Method
mean_rowwise(S::Union{JudiLing.SparseMatrixCSC, Matrix})
Calculate the mean of each row in S.
Examples
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> mean_rowwise(ma1)
3×1 Matrix{Float64}:
2.0
-2.0
2.0
JudiLingMeasures.safe_length — Method
safe_length(x::Union{Missing, String})
Compute the length of x; if x is missing, return missing.
Example
julia> safe_length(missing)
missing
julia> safe_length("abc")
3
JudiLingMeasures.safe_sum — Method
safe_sum(x::Array)
Compute the sum of all elements of x; if x is empty, return missing.
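The behaviour of the two safe_ helpers can be sketched in a few lines. The names safe_length_sketch and safe_sum_sketch are hypothetical re-implementations for illustration, not the package's code; the point is that missing or empty input yields missing instead of an error.

```julia
# Hypothetical re-implementations for illustration only.
safe_length_sketch(x::Missing) = missing            # propagate missing
safe_length_sketch(x::AbstractString) = length(x)   # number of characters

# Return missing for an empty array instead of attempting the reduction.
safe_sum_sketch(x::AbstractArray) = isempty(x) ? missing : sum(x)

len_missing = safe_length_sketch(missing)
len_abc = safe_length_sketch("abc")
sum_empty = safe_sum_sketch([])
sum_123 = safe_sum_sketch([1, 2, 3])
```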
Example
julia> safe_sum([])
missing
julia> safe_sum([1,2,3])
6
JudiLingMeasures.sem_density_mean — Method
sem_density_mean(s_cor::Union{JudiLing.SparseMatrixCSC, Matrix},
n::Int)
Compute the average semantic density of each predicted semantic vector, i.e. its mean correlation with its n most correlated semantic neighbours.
Arguments
s_cor::Union{JudiLing.SparseMatrixCSC, Matrix}: the correlation matrix between S and Shat
n::Int: the number of most correlated semantic neighbours to take into account
Example
julia> ma2 = [[1 2 1 1]; [1 -2 3 1]; [1 -2 3 3]; [0 0 1 2]]
julia> ma3 = [[-1 2 1 1]; [1 2 3 1]; [1 2 0 1]; [0.5 -2 1.5 0]]
julia> cor_s = correlation_rowwise(ma2, ma3)
julia> sem_density_mean(cor_s, 2)
4-element Vector{Float64}:
0.7393813797301239
0.6420816485652429
0.4496869233815781
0.281150888376636
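The values in this example can be reproduced with a short sketch that averages the n largest correlations in each row. The helper name top_n_mean_sketch is hypothetical, and the correlation matrix is rebuilt here with plain Pearson correlation from the Statistics standard library, which is an assumption about how correlation_rowwise behaves.

```julia
using Statistics  # provides `cor` and `mean`

# Hypothetical helper, not the package implementation: for each row, sort
# the correlations in descending order and average the first n.
top_n_mean_sketch(s_cor, n) =
    [mean(sort(collect(row); rev=true)[1:n]) for row in eachrow(s_cor)]

ma2 = [[1 2 1 1]; [1 -2 3 1]; [1 -2 3 3]; [0 0 1 2]]
ma3 = [[-1 2 1 1]; [1 2 3 1]; [1 2 0 1]; [0.5 -2 1.5 0]]

# Row-wise Pearson correlations between ma2 and ma3.
cor_s = [cor(ma2[i, :], ma3[j, :]) for i in axes(ma2, 1), j in axes(ma3, 1)]

density = top_n_mean_sketch(cor_s, 2)
```

For the first row, the two largest correlations are roughly 0.816 and 0.662, whose mean is the first reported value.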