Make Adjacency Matrix

JudiLing.make_adjacency_matrix — Function

make_adjacency_matrix(i2f)

Make the full adjacency matrix based only on the form of the n-grams, regardless of whether they are seen in the training data. Because all possible combinations of n-grams are considered, this usually takes hours for large datasets.

Obligatory Arguments

i2f::Dict: the dictionary returning features given indices

Optional Arguments

tokenized::Bool=false: if true, the dataset target is assumed to be tokenized
sep_token::Union{Nothing, String, Char}=nothing: separator token
verbose::Bool=false: if true, more information will be printed

Examples

# without tokenization
i2f = Dict([(1, "#ab"), (2, "abc"), (3, "bc#"), (4, "#bc"), (5, "ab#")])
JudiLing.make_adjacency_matrix(i2f)

# with tokenization
i2f = Dict([(1, "#-a-b"), (2, "a-b-c"), (3, "b-c-#"), (4, "#-b-c"), (5, "a-b-#")])
JudiLing.make_adjacency_matrix(
    i2f,
    tokenized=true,
    sep_token="-")
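
To make the idea of form-based adjacency concrete, here is a minimal sketch in plain Julia. It is not JudiLing's implementation. It assumes that n-gram j may follow n-gram i when the last n-1 units of i equal the first n-1 units of j, and that the keys of i2f run from 1 to length(i2f). The names ngram_units and sketch_full_adjacency are hypothetical.

```julia
# Hypothetical helper (not part of JudiLing): split an n-gram into its units.
# Tokenized n-grams are split on sep_token; otherwise each character is a unit.
function ngram_units(gram::AbstractString; tokenized::Bool=false, sep_token=nothing)
    tokenized ? String.(split(gram, sep_token)) : string.(collect(gram))
end

# Sketch under the stated assumption: A[i, j] is 1 when the last n-1 units
# of n-gram i equal the first n-1 units of n-gram j.
function sketch_full_adjacency(i2f::Dict; tokenized::Bool=false, sep_token=nothing)
    n = length(i2f)
    A = zeros(Int, n, n)
    # Every pair (i, j) is checked, whether or not it occurs in any word.
    for i in 1:n, j in 1:n
        a = ngram_units(i2f[i]; tokenized=tokenized, sep_token=sep_token)
        b = ngram_units(i2f[j]; tokenized=tokenized, sep_token=sep_token)
        if a[2:end] == b[1:end-1]
            A[i, j] = 1
        end
    end
    return A
end

# without tokenization
i2f = Dict([(1, "#ab"), (2, "abc"), (3, "bc#"), (4, "#bc"), (5, "ab#")])
A = sketch_full_adjacency(i2f)

# with tokenization
i2f_tok = Dict([(1, "#-a-b"), (2, "a-b-c"), (3, "b-c-#"), (4, "#-b-c"), (5, "a-b-#")])
A_tok = sketch_full_adjacency(i2f_tok; tokenized=true, sep_token="-")
```

In this sketch the double loop visits every pair of n-grams, so the work grows with the square of the number of n-grams, which is consistent with the long running times noted above.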


JudiLing.make_combined_adjacency_matrix — Method

make_combined_adjacency_matrix(data_train, data_val)

Make an adjacency matrix from the training and validation datasets combined.

Obligatory Arguments

data_train::DataFrame: training dataset
data_val::DataFrame: validation dataset

Optional Arguments

grams=3: the number of grams for cues
target_col=:Words: the column name for target strings
tokenized=false: if true, the dataset target is assumed to be tokenized
sep_token=nothing: separator token
keep_sep=false: if true, keep separators in cues
start_end_token="#": start and end token in boundary cues
verbose=false: if true, more information is printed

Examples

JudiLing.make_combined_adjacency_matrix(
    latin_train,
    latin_val,
    grams=3,
    target_col=:Word,
    tokenized=false,
    keep_sep=false
    )
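
As a rough illustration of what a combined adjacency matrix could contain, the sketch below uses plain Julia with vectors of words standing in for the target columns of the two datasets. It rests on an assumption rather than on JudiLing's source: only transitions between consecutive n-grams that actually occur in a training or validation word are marked. Tokenization and keep_sep are ignored, and the names word_ngrams and sketch_combined_adjacency are hypothetical.

```julia
# Hypothetical helper: character n-grams of a word padded with boundary tokens.
function word_ngrams(word::AbstractString; grams::Int=3, start_end_token::AbstractString="#")
    chars = collect(start_end_token * word * start_end_token)
    [join(chars[k:k+grams-1]) for k in 1:(length(chars) - grams + 1)]
end

# Sketch under the stated assumption: index the n-grams of both word lists
# together, then mark A[i, j] = 1 for each attested consecutive pair.
function sketch_combined_adjacency(train_words, val_words; grams::Int=3)
    words = vcat(train_words, val_words)
    f2i = Dict{String,Int}()
    for w in words, g in word_ngrams(w; grams=grams)
        get!(f2i, g, length(f2i) + 1)  # assign the next free index to unseen n-grams
    end
    A = zeros(Int, length(f2i), length(f2i))
    for w in words
        gs = word_ngrams(w; grams=grams)
        for k in 1:(length(gs) - 1)
            A[f2i[gs[k]], f2i[gs[k+1]]] = 1
        end
    end
    return A, f2i
end

A, f2i = sketch_combined_adjacency(["ab"], ["bc"])
```

Unlike a full adjacency matrix, this sketch only scans the words themselves, so n-grams that never occur next to each other in either dataset stay unconnected.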
