Make Adjacency Matrix
JudiLing.make_full_adjacency_matrix
Function
make_full_adjacency_matrix(i2f)
Make the full adjacency matrix based only on the form of the n-grams, regardless of whether the transitions between them are seen in the training data. This usually takes hours for large datasets, because all possible combinations of n-grams are considered.
Obligatory Arguments
i2f::Dict: the dictionary returning features given indices
Optional Arguments
tokenized::Bool=false: if true, the dataset target is assumed to be tokenized
sep_token::Union{Nothing, String, Char}=nothing: separator token
verbose::Bool=false: if true, more information will be printed
Examples
# without tokenization
i2f = Dict([(1, "#ab"), (2, "abc"), (3, "bc#"), (4, "#bc"), (5, "ab#")])
JudiLing.make_full_adjacency_matrix(i2f)

# with tokenization
i2f = Dict([(1, "#-a-b"), (2, "a-b-c"), (3, "b-c-#"), (4, "#-b-c"), (5, "a-b-#")])
JudiLing.make_full_adjacency_matrix(
    i2f,
    tokenized=true,
    sep_token="-")
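
To make "based only on the form of n-grams" concrete, the following is a minimal sketch of one way such a matrix could be built. It is not JudiLing's implementation. It assumes that n-gram j may follow n-gram i when i without its first unit equals j without its last unit. The names split_units, can_follow and sketch_full_adjacency are hypothetical and do not belong to the package.

```julia
using SparseArrays

# Hypothetical helper: split an n-gram into its units, either character by
# character or at the separator token (mirrors tokenized/sep_token above).
split_units(ngram, sep_token) =
    sep_token === nothing ? string.(collect(ngram)) : String.(split(ngram, sep_token))

# Assumed overlap rule: b can follow a if a without its first unit
# equals b without its last unit.
can_follow(a, b) = a[2:end] == b[1:end-1]

# Hypothetical sketch, not the package function: compare every pair of n-grams.
function sketch_full_adjacency(i2f::Dict; sep_token=nothing)
    n = length(i2f)
    units = [split_units(i2f[i], sep_token) for i in 1:n]
    rows = Int[]
    cols = Int[]
    for i in 1:n, j in 1:n          # n^2 comparisons
        if can_follow(units[i], units[j])
            push!(rows, i)
            push!(cols, j)
        end
    end
    return sparse(rows, cols, ones(Int, length(rows)), n, n)
end

i2f = Dict([(1, "#ab"), (2, "abc"), (3, "bc#"), (4, "#bc"), (5, "ab#")])
A = sketch_full_adjacency(i2f)

i2f_tok = Dict([(1, "#-a-b"), (2, "a-b-c"), (3, "b-c-#"), (4, "#-b-c"), (5, "a-b-#")])
A_tok = sketch_full_adjacency(i2f_tok; sep_token="-")
```

Under this assumed rule, the five example n-grams give four nonzero entries: (1,2), (1,5), (2,3) and (4,3). The double loop compares every n-gram with every other one, so the work grows quadratically with the number of n-grams, which fits the warning about long run times on large datasets.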
JudiLing.make_combined_adjacency_matrix
Method
make_combined_adjacency_matrix(data_train, data_val)
Make the combined adjacency matrix from the training and validation datasets.
Obligatory Arguments
data_train::DataFrame: training dataset
data_val::DataFrame: validation dataset
Optional Arguments
grams=3: the number of grams for cues
target_col=:Words: the column name for target strings
tokenized=false: if true, the dataset target is assumed to be tokenized
sep_token=nothing: separator
keep_sep=false: if true, keep separators in cues
start_end_token="#": start and end token in boundary cues
verbose=false: if true, more information is printed
Examples
JudiLing.make_combined_adjacency_matrix(
    latin_train,
    latin_val,
    grams=3,
    target_col=:Word,
    tokenized=false,
    keep_sep=false
)
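
The description above does not spell out how the two datasets are combined. The sketch below illustrates one plausible reading, and that reading is an assumption: n-grams are collected from the target strings of both datasets, and an entry is set wherever two n-grams occur next to each other in some word. It is not the package's implementation. Plain string vectors stand in for the target column of the DataFrames, and word_ngrams and sketch_combined_adjacency are hypothetical names.

```julia
using SparseArrays

# Hypothetical helper: n-grams of a word padded with the boundary token
# (mirrors grams and start_end_token above; untokenized case only).
function word_ngrams(word::AbstractString; grams=3, start_end_token="#")
    units = collect(start_end_token * word * start_end_token)
    return String[join(units[i:i+grams-1]) for i in 1:length(units)-grams+1]
end

# Hypothetical sketch: index n-grams over both datasets and mark every
# transition between neighbouring n-grams that is observed in a word.
function sketch_combined_adjacency(train_words, val_words; grams=3, start_end_token="#")
    f2i = Dict{String,Int}()        # n-gram => index, shared by both datasets
    rows = Int[]
    cols = Int[]
    for word in vcat(train_words, val_words)
        ngrams = word_ngrams(word; grams=grams, start_end_token=start_end_token)
        ids = [get!(f2i, g, length(f2i) + 1) for g in ngrams]
        for k in 1:length(ids)-1
            push!(rows, ids[k])
            push!(cols, ids[k+1])
        end
    end
    n = length(f2i)
    # `max` keeps repeated transitions at 1 instead of summing them
    A = sparse(rows, cols, ones(Int, length(rows)), n, n, max)
    return A, f2i
end

train_words = ["ab", "abc"]   # stand-in for the target column of data_train
val_words = ["bc"]            # stand-in for the target column of data_val
A, f2i = sketch_combined_adjacency(train_words, val_words)
```

In this toy example the transition from "#bc" to "bc#" comes only from the validation word, yet it is present in the resulting matrix. Under the assumed reading, that is the sense in which the matrix is combined.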