Helpers
This page contains information on additional helper functions in this package.
JudiLingMeasures.compute_all_measures_train
— Method
function compute_all_measures_train(data_train::DataFrame,
cue_obj_train::JudiLing.Cue_Matrix_Struct,
Chat_train::Union{JudiLing.SparseMatrixCSC, Matrix},
S_train::Union{JudiLing.SparseMatrixCSC, Matrix},
Shat_train::Union{JudiLing.SparseMatrixCSC, Matrix},
F_train::Union{JudiLing.SparseMatrixCSC, Matrix},
G_train::Union{JudiLing.SparseMatrixCSC, Matrix};
res_learn_train::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing,
gpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
rpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
sem_density_n::Int64=8,
calculate_production_uncertainty::Bool=false,
low_cost_measures_only::Bool=false)
Compute all measures currently available in JudiLingMeasures for the training data.
Arguments
data_train::DataFrame: The data for which measures should be calculated (the training data).
cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
Chat_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the training data.
S_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the training data.
Shat_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the training data.
F_train::Union{JudiLing.SparseMatrixCSC, Matrix}: Comprehension mapping matrix for the training data.
G_train::Union{JudiLing.SparseMatrixCSC, Matrix}: Production mapping matrix for the training data.
res_learn_train::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing: The first output of JudiLing.learn_paths_rpi (with check_gold_path=true).
gpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The second output of JudiLing.learn_paths_rpi (with check_gold_path=true).
rpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The third output of JudiLing.learn_paths_rpi (with check_gold_path=true).
sem_density_n::Int64=8: Number of neighbours to take into account in the Semantic Density measure.
calculate_production_uncertainty::Bool=false: "Production Uncertainty" is computationally very heavy for large C matrices, so its computation is turned off by default.
low_cost_measures_only::Bool=false: Only compute measures which are not computationally heavy. Recommended for very large datasets.
Returns
results::DataFrame: A dataframe with all information in data_train plus all the computed measures.
JudiLingMeasures.compute_all_measures_train
— Method
function compute_all_measures_train(data_train::DataFrame,
cue_obj_train::JudiLing.Cue_Matrix_Struct,
Chat_train::Union{JudiLing.SparseMatrixCSC, Matrix},
S_train::Union{JudiLing.SparseMatrixCSC, Matrix},
Shat_train::Union{JudiLing.SparseMatrixCSC, Matrix};
res_learn_train::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing,
gpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
rpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
sem_density_n::Int64=8,
calculate_production_uncertainty::Bool=false,
low_cost_measures_only::Bool=false)
Compute all measures currently available in JudiLingMeasures for the training data if F and G are not available (usually for DDL models).
Arguments
data_train::DataFrame: The data for which measures should be calculated (the training data).
cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
Chat_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the training data.
S_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the training data.
Shat_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the training data.
res_learn_train::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing: The first output of JudiLing.learn_paths_rpi (with check_gold_path=true).
gpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The second output of JudiLing.learn_paths_rpi (with check_gold_path=true).
rpi_learn_train::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The third output of JudiLing.learn_paths_rpi (with check_gold_path=true).
sem_density_n::Int64=8: Number of neighbours to take into account in the Semantic Density measure.
calculate_production_uncertainty::Bool=false: "Production Uncertainty" is computationally very heavy for large C matrices, so its computation is turned off by default.
low_cost_measures_only::Bool=false: Only compute measures which are not computationally heavy. Recommended for very large datasets.
Returns
results::DataFrame: A dataframe with all information in data_train plus all the computed measures.
JudiLingMeasures.compute_all_measures_val
— Method
function compute_all_measures_val(data_val::DataFrame,
cue_obj_train::JudiLing.Cue_Matrix_Struct,
cue_obj_val::JudiLing.Cue_Matrix_Struct,
Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix},
S_train::Union{JudiLing.SparseMatrixCSC, Matrix},
S_val::Union{JudiLing.SparseMatrixCSC, Matrix},
Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix},
F_train::Union{JudiLing.SparseMatrixCSC, Matrix},
G_train::Union{JudiLing.SparseMatrixCSC, Matrix};
res_learn_val::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing,
gpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
rpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
sem_density_n::Int64=8,
calculate_production_uncertainty::Bool=false,
low_cost_measures_only::Bool=false)
Compute all measures currently available in JudiLingMeasures for the validation data.
Arguments
data_val::DataFrame: The data for which measures should be calculated (the validation data).
cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
cue_obj_val::JudiLing.Cue_Matrix_Struct: The cue object of the validation data.
Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the validation data.
S_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the training data.
S_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the validation data.
Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the validation data.
F_train::Union{JudiLing.SparseMatrixCSC, Matrix}: Comprehension mapping matrix for the training data.
G_train::Union{JudiLing.SparseMatrixCSC, Matrix}: Production mapping matrix for the training data.
res_learn_val::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing: The first output of JudiLing.learn_paths_rpi (with check_gold_path=true).
gpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The second output of JudiLing.learn_paths_rpi (with check_gold_path=true).
rpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The third output of JudiLing.learn_paths_rpi (with check_gold_path=true).
sem_density_n::Int64=8: Number of neighbours to take into account in the Semantic Density measure.
calculate_production_uncertainty::Bool=false: "Production Uncertainty" is computationally very heavy for large C matrices, so its computation is turned off by default.
low_cost_measures_only::Bool=false: Only compute measures which are not computationally heavy. Recommended for very large datasets.
Returns
results::DataFrame: A dataframe with all information in data_val plus all the computed measures.
JudiLingMeasures.compute_all_measures_val
— Method
function compute_all_measures_val(data_val::DataFrame,
cue_obj_train::JudiLing.Cue_Matrix_Struct,
cue_obj_val::JudiLing.Cue_Matrix_Struct,
Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix},
S_train::Union{JudiLing.SparseMatrixCSC, Matrix},
S_val::Union{JudiLing.SparseMatrixCSC, Matrix},
Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix};
res_learn_val::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing,
gpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
rpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing,
sem_density_n::Int64=8,
calculate_production_uncertainty::Bool=false,
low_cost_measures_only::Bool=false)
Compute all measures currently available in JudiLingMeasures for the validation data if F and G are not available (usually for DDL models).
Arguments
data_val::DataFrame: The data for which measures should be calculated (the validation data).
cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
cue_obj_val::JudiLing.Cue_Matrix_Struct: The cue object of the validation data.
Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the validation data.
S_train::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the training data.
S_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the validation data.
Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the validation data.
res_learn_val::Union{Array{Array{JudiLing.Result_Path_Info_Struct,1},1}, Missing}=missing: The first output of JudiLing.learn_paths_rpi (with check_gold_path=true).
gpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The second output of JudiLing.learn_paths_rpi (with check_gold_path=true).
rpi_learn_val::Union{Array{JudiLing.Gold_Path_Info_Struct,1}, Missing}=missing: The third output of JudiLing.learn_paths_rpi (with check_gold_path=true).
sem_density_n::Int64=8: Number of neighbours to take into account in the Semantic Density measure.
calculate_production_uncertainty::Bool=false: "Production Uncertainty" is computationally very heavy for large C matrices, so its computation is turned off by default.
low_cost_measures_only::Bool=false: Only compute measures which are not computationally heavy. Recommended for very large datasets.
Returns
results::DataFrame: A dataframe with all information in data_val plus all the computed measures.
JudiLingMeasures.correlation_diagonal_rowwise
— Method
function correlation_diagonal_rowwise(S1, S2)
Compute the correlation between each row of S1 and the corresponding row of S2, i.e. only the diagonal of the full row-wise correlation matrix.
Example
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> correlation_diagonal_rowwise(ma1, ma4)
3-element Array{Float64,1}:
0.8660254037844387
0.9607689228305228
0.9819805060619657
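The computation described above can be sketched with the Statistics standard library. This is a minimal illustration and not the package's implementation; the name diag_cor_sketch is hypothetical and dense, equally sized matrices are assumed.

```julia
using Statistics

# Hypothetical helper (not part of JudiLingMeasures):
# correlate row i of A with row i of B, for every i.
function diag_cor_sketch(A::AbstractMatrix, B::AbstractMatrix)
    size(A) == size(B) || throw(DimensionMismatch("A and B must have the same size"))
    return [cor(A[i, :], B[i, :]) for i in 1:size(A, 1)]
end

ma1 = [1 2 3; -1 -2 -3; 1 2 3]
ma4 = [1 2 2; 1 -2 -3; 0 2 3]
r = diag_cor_sketch(ma1, ma4)
```

Because only matching rows are compared, the cost grows linearly with the number of rows, whereas a full row-by-row correlation matrix grows quadratically.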
JudiLingMeasures.correlation_rowwise
— Method
correlation_rowwise(S1::Union{JudiLing.SparseMatrixCSC, Matrix},
S2::Union{JudiLing.SparseMatrixCSC, Matrix})
Compute the correlation between each row of S1 and all rows of S2.
Example
julia> ma2 = [[1 2 1 1]; [1 -2 3 1]; [1 -2 3 3]; [0 0 1 2]]
julia> ma3 = [[-1 2 1 1]; [1 2 3 1]; [1 2 0 1]; [0.5 -2 1.5 0]]
julia> correlation_rowwise(ma2, ma3)
4×4 Matrix{Float64}:
0.662266 0.174078 0.816497 -0.905822
-0.41762 0.29554 -0.990148 0.988623
-0.308304 0.0368355 -0.863868 0.862538
0.207514 -0.0909091 -0.426401 0.354787
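As a rough cross-check of the example above, the same matrix can be obtained from Statistics.cor, which correlates columns, by transposing both inputs first. The name rowwise_cor_sketch is hypothetical, and this is a sketch for dense matrices rather than the package's own code.

```julia
using Statistics

# Hypothetical helper (not part of JudiLingMeasures).
# cor(X, Y) correlates the columns of X with the columns of Y,
# so transposing turns this into a row-by-row correlation.
rowwise_cor_sketch(A::AbstractMatrix, B::AbstractMatrix) =
    cor(float.(permutedims(A)), float.(permutedims(B)))

ma2 = [1 2 1 1; 1 -2 3 1; 1 -2 3 3; 0 0 1 2]
ma3 = [-1 2 1 1; 1 2 3 1; 1 2 0 1; 0.5 -2 1.5 0]
R = rowwise_cor_sketch(ma2, ma3)   # R[i, j] = cor(row i of ma2, row j of ma3)
```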
JudiLingMeasures.cosine_similarity
— Method
cosine_similarity(s_hat_collection, S)
Calculate the cosine similarity between all predicted and all target semantic vectors.
Example
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> cosine_similarity(ma1, ma4)
3×3 Array{Float64,2}:
0.979958 -0.857143 0.963624
-0.979958 0.857143 -0.963624
0.979958 -0.857143 0.963624
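One common way to compute such a similarity matrix is to scale every row to unit length and then take all pairwise dot products. The sketch below follows that approach; cosine_sketch is a hypothetical name, and whether the package computes it the same way is an assumption.

```julia
# Hypothetical helper (not part of JudiLingMeasures).
function cosine_sketch(A::AbstractMatrix, B::AbstractMatrix)
    An = A ./ sqrt.(sum(abs2, A, dims=2))   # each row of A scaled to unit length
    Bn = B ./ sqrt.(sum(abs2, B, dims=2))   # each row of B scaled to unit length
    return An * Bn'                         # entry (i, j) = cosine of row i of A and row j of B
end

ma1 = [1 2 3; -1 -2 -3; 1 2 3]
ma4 = [1 2 2; 1 -2 -3; 0 2 3]
C = cosine_sketch(ma1, ma4)
```

Note that an all-zero row has no direction, so this sketch would produce NaN for it.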
JudiLingMeasures.count_rows
— Method
count_rows(dat::DataFrame)
Get the number of rows in dat.
Examples
julia> dat = DataFrame("text"=>[1,2,3])
julia> count_rows(dat)
3
JudiLingMeasures.entropy
— Method
entropy(ps::Union{Missing, Array, SubArray})
Compute the Shannon entropy of the values in ps that are greater than 0.
Note: the result of this entropy function differs from other entropy measures because a) the values are first rescaled so that they sum to 1 (and thus lie between 0 and 1), and b) log2 is used instead of log.
Examples
julia> ps = [0.1, 0.2, 0.9]
julia> entropy(ps)
1.0408520829727552
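The two points in the note can be made concrete with a small sketch: drop non-positive values, rescale the rest to sum to 1, and sum with a base-2 logarithm. The name entropy_sketch is hypothetical, and handling of missing input is left out here.

```julia
# Hypothetical helper (not part of JudiLingMeasures).
function entropy_sketch(ps::AbstractVector{<:Real})
    pos = ps[ps .> 0]            # keep strictly positive values only
    p = pos ./ sum(pos)          # rescale so that the values sum to 1
    return -sum(p .* log2.(p))   # Shannon entropy in bits
end

h = entropy_sketch([0.1, 0.2, 0.9])
```

With the input from the example above, the rescaled values are 1/12, 1/6 and 3/4, which gives roughly 1.0409 bits.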
JudiLingMeasures.euclidean_distance_rowwise
— Method
euclidean_distance_rowwise(Shat::Union{JudiLing.SparseMatrixCSC, Matrix},
S::Union{JudiLing.SparseMatrixCSC, Matrix})
Calculate the pairwise Euclidean distances between all rows in Shat and all rows in S.
Throws an error if either matrix contains missing.
Examples
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> euclidean_distance_rowwise(ma1, ma4)
3×3 Matrix{Float64}:
1.0 7.2111 1.0
6.7082 2.0 7.28011
1.0 7.2111 1.0
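The distance matrix in the example can be reproduced with a plain comprehension over row pairs. This is a sketch for small dense matrices; eucl_sketch is a hypothetical name and the package may use a faster routine.

```julia
# Hypothetical helper (not part of JudiLingMeasures).
# D[i, j] is the Euclidean distance between row i of A and row j of B.
function eucl_sketch(A::AbstractMatrix, B::AbstractMatrix)
    return [sqrt(sum(abs2, A[i, :] .- B[j, :]))
            for i in 1:size(A, 1), j in 1:size(B, 1)]
end

ma1 = [1 2 3; -1 -2 -3; 1 2 3]
ma4 = [1 2 2; 1 -2 -3; 0 2 3]
D = eucl_sketch(ma1, ma4)
```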
JudiLingMeasures.get_avg_levenshtein
— Method
get_avg_levenshtein(targets::Array, preds::Array)
Get the average Levenshtein distance between two lists of strings.
Examples
julia> targets = ["abc", "abc", "abc"]
julia> preds = ["abd", "abc", "ebd"]
julia> get_avg_levenshtein(targets, preds)
1.0
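To show where the value 1.0 comes from, the sketch below implements the standard dynamic-programming Levenshtein distance and averages it over paired strings. Both function names are hypothetical, and the package may rely on a different implementation.

```julia
# Hypothetical helpers (not part of JudiLingMeasures).
function levenshtein_sketch(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    prev = collect(0:length(t))              # distances from the empty prefix of a
    for i in 1:length(s)
        cur = similar(prev)
        cur[1] = i                           # delete the first i characters of a
        for j in 1:length(t)
            cost = s[i] == t[j] ? 0 : 1
            cur[j + 1] = min(prev[j + 1] + 1,    # deletion
                             cur[j] + 1,         # insertion
                             prev[j] + cost)     # substitution or match
        end
        prev = cur
    end
    return prev[end]
end

# Average distance over target/prediction pairs at the same position.
avg_levenshtein_sketch(targets, preds) =
    sum(levenshtein_sketch.(targets, preds)) / length(targets)

targets = ["abc", "abc", "abc"]
preds = ["abd", "abc", "ebd"]
avg = avg_levenshtein_sketch(targets, preds)
```

The three pairs have distances 1, 0 and 2, so the average is 1.0.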
JudiLingMeasures.get_nearest_neighbour_eucl
— Method
get_nearest_neighbour_eucl(eucl_sims::Matrix)
Get the nearest neighbour for each row in eucl_sims, i.e. the smallest distance in each row.
Examples
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> eucl_sims = euclidean_distance_rowwise(ma1, ma4)
julia> get_nearest_neighbour_eucl(eucl_sims)
3-element Vector{Float64}:
1.0
2.0
1.0
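Judging from the example output, the returned values are the row minima of the distance matrix. A minimal sketch of that reading, using the rounded distance matrix shown for euclidean_distance_rowwise, is given below; the variable names are illustrative, and the index lookup is an extra that the documented function is not stated to return.

```julia
# Distance matrix copied (rounded) from the euclidean_distance_rowwise example.
D = [1.0    7.2111 1.0;
     6.7082 2.0    7.28011;
     1.0    7.2111 1.0]

nearest_dist = vec(minimum(D, dims=2))              # smallest distance in each row
nearest_idx = [argmin(row) for row in eachrow(D)]   # column of the first minimum per row
```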
JudiLingMeasures.get_res_learn_df
— Method
get_res_learn_df(res_learn_val, data_val, cue_obj_train, cue_obj_val)
Wrapper for JudiLing.write2df for easier use.
JudiLingMeasures.l1_rowwise
— Method
l1_rowwise(M::Union{JudiLing.SparseMatrixCSC, Matrix})
Compute the L1 norm of each row of M.
Example
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> l1_rowwise(ma1)
3×1 Matrix{Int64}:
6
6
6
JudiLingMeasures.l2_rowwise
— Method
l2_rowwise(M::Union{JudiLing.SparseMatrixCSC, Matrix})
Compute the L2 norm of each row of M.
Example
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> l2_rowwise(ma1)
3×1 Matrix{Float64}:
3.7416573867739413
3.7416573867739413
3.7416573867739413
JudiLingMeasures.make_measure_preparations
— Method
function make_measure_preparations(data_train, S_train, Shat_train,
res_learn_train, cue_obj_train,
rpi_learn_train)
Returns all additional objects needed for measure calculations if the data of interest is the training data.
Arguments
data_train: The data for which the measures are to be calculated (training data).
S_train: The semantic matrix of the training data.
Shat_train: The predicted semantic matrix of the training data.
res_learn_train: The first object returned by the learn_paths_rpi algorithm for the training data.
cue_obj_train: The cue object of the training data.
rpi_learn_train: The second object returned by the learn_paths_rpi algorithm for the training data.
Returns
results::DataFrame: A deepcopy of data_train.
cor_s::Matrix: Correlation matrix between Shat_train and S_train.
df::DataFrame: The output of res_learn_train (of the training data) in the form of a dataframe.
rpi_df::DataFrame: Stores the path information about the predicted forms (from learn_paths), which is needed to compute measures such as PathSum, PathCounts and PathEntropies.
JudiLingMeasures.make_measure_preparations
— Method
function make_measure_preparations(data_val, S_train, S_val, Shat_val,
res_learn_val, cue_obj_train, cue_obj_val,
rpi_learn_val)
Returns all additional objects needed for measure calculations if the data of interest is the validation data.
Arguments
data_val: The data for which the measures are to be calculated (validation data).
S_train: The semantic matrix of the training data.
S_val: The semantic matrix of the validation data.
Shat_val: The predicted semantic matrix of the validation data.
res_learn_val: The first object returned by the learn_paths_rpi algorithm for the validation data.
cue_obj_train: The cue object of the training data.
cue_obj_val: The cue object of the validation data.
rpi_learn_val: The second object returned by the learn_paths_rpi algorithm for the validation data.
Returns
results::DataFrame: A deepcopy of data_val.
cor_s::Matrix: Correlation matrix between Shat_val and S_val.
df::DataFrame: The output of res_learn_val (of the validation data) in the form of a dataframe.
rpi_df::DataFrame: Stores the path information about the predicted forms (from learn_paths), which is needed to compute measures such as PathSum, PathCounts and PathEntropies.
JudiLingMeasures.max_rowwise
— Method
max_rowwise(S::Union{JudiLing.SparseMatrixCSC, Matrix})
Get the maximum of each row in S.
Examples
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> max_rowwise(ma1)
3×1 Matrix{Int64}:
3
-1
3
JudiLingMeasures.mean_rowwise
— Method
mean_rowwise(S::Union{JudiLing.SparseMatrixCSC, Matrix})
Calculate the mean of each row in S.
Examples
julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> mean_rowwise(ma1)
3×1 Matrix{Float64}:
2.0
-2.0
2.0
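The row-wise helpers on this page (l1_rowwise, l2_rowwise, max_rowwise and mean_rowwise) all reduce along the second dimension and return an n×1 matrix. The one-liners below are a sketch of equivalent reductions in base Julia; the _sketch names are hypothetical and dense input is assumed.

```julia
using Statistics

# Hypothetical stand-ins (not part of JudiLingMeasures).
l1_sketch(M::AbstractMatrix) = sum(abs, M, dims=2)           # sum of absolute values per row
l2_sketch(M::AbstractMatrix) = sqrt.(sum(abs2, M, dims=2))   # Euclidean norm per row
max_sketch(M::AbstractMatrix) = maximum(M, dims=2)           # largest entry per row
mean_sketch(M::AbstractMatrix) = mean(M, dims=2)             # arithmetic mean per row

ma1 = [1 2 3; -1 -2 -3; 1 2 3]
```

Using dims=2 keeps the result as a column matrix, which matches the 3×1 shapes shown in the examples.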
JudiLingMeasures.safe_length
— Method
safe_length(x::Union{Missing, String})
Compute the length of x; if x is missing, return missing.
Example
julia> safe_length(missing)
missing
julia> safe_length("abc")
3
JudiLingMeasures.safe_sum
— Method
safe_sum(x::Array)
Compute the sum of all elements of x; if x is empty, return missing.
Example
julia> safe_sum([])
missing
julia> safe_sum([1,2,3])
6
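The behaviour of the two safe_ helpers can be sketched with dispatch on Missing and an emptiness check. The _sketch names are hypothetical; this only mirrors the documented behaviour and is not taken from the package source.

```julia
# Hypothetical stand-ins (not part of JudiLingMeasures).
safe_length_sketch(x::Missing) = missing                 # propagate missing
safe_length_sketch(x::AbstractString) = length(x)        # number of characters

# An empty untyped array cannot be summed directly, so return missing instead.
safe_sum_sketch(x::AbstractArray) = isempty(x) ? missing : sum(x)

lens = [safe_length_sketch(w) for w in ["abc", missing]]
total = safe_sum_sketch([1, 2, 3])
```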
JudiLingMeasures.sem_density_mean
— Method
sem_density_mean(s_cor::Union{JudiLing.SparseMatrixCSC, Matrix},
n::Int)
Compute the average semantic density of the predicted semantic vector with its n most correlated semantic neighbours.
Arguments
s_cor::Union{JudiLing.SparseMatrixCSC, Matrix}: The correlation matrix between S and Shat.
n::Int: The number of most highly correlated semantic neighbours to take into account.
Example
julia> ma2 = [[1 2 1 1]; [1 -2 3 1]; [1 -2 3 3]; [0 0 1 2]]
julia> ma3 = [[-1 2 1 1]; [1 2 3 1]; [1 2 0 1]; [0.5 -2 1.5 0]]
julia> cor_s = correlation_rowwise(ma2, ma3)
julia> sem_density_mean(cor_s, 2)
4-element Vector{Float64}:
0.7393813797301239
0.6420816485652429
0.4496869233815781
0.281150888376636
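The example values are consistent with taking, for each row of the correlation matrix, the mean of its n largest entries. The sketch below does exactly that on the rounded matrix printed for correlation_rowwise; sem_density_sketch is a hypothetical name and the package's own code may differ.

```julia
using Statistics

# Hypothetical helper (not part of JudiLingMeasures).
# For every row, average the n largest correlations.
function sem_density_sketch(s_cor::AbstractMatrix, n::Integer)
    return [mean(sort(collect(row), rev=true)[1:n]) for row in eachrow(s_cor)]
end

# Rounded correlation matrix copied from the correlation_rowwise example.
cor_s = [ 0.662266  0.174078   0.816497 -0.905822;
         -0.41762   0.29554   -0.990148  0.988623;
         -0.308304  0.0368355 -0.863868  0.862538;
          0.207514 -0.0909091 -0.426401  0.354787]

dens = sem_density_sketch(cor_s, 2)
```

For the first row, the two largest correlations are 0.816497 and 0.662266, whose mean is about 0.7394, matching the first value in the example.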