Model Training — fit • neermatch

Fits the model. The method emulates the behavior of the tf.keras.Model.fit method. It automatically constructs a data generator from the left and right datasets that iterates over all the elements of their Cartesian product. The generator's labels are derived from the matches data frame. The method uses the generator to train the model.
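The pairing and labeling idea can be sketched in base R as follows. This is an illustration only, not the package's generator: the toy data, the column names, the 1-based index convention in `matches`, and the batching step are all assumptions made for the example.

```r
# Hypothetical toy inputs; the column names are illustrative assumptions.
left <- data.frame(title = c("Doom", "Quake", "Myst"))
right <- data.frame(title = c("DOOM", "Myst"))
# Assumed layout: one row per matching pair, holding 1-based row indices.
matches <- data.frame(left = c(1L, 3L), right = c(1L, 2L))

# Enumerate every (left, right) index pair, i.e. the Cartesian product.
pairs <- expand.grid(
  left = seq_len(nrow(left)),
  right = seq_len(nrow(right))
)

# Label a pair 1 if it appears in `matches`, 0 otherwise.
pair_key <- paste(pairs$left, pairs$right)
match_key <- paste(matches$left, matches$right)
pairs$label <- as.integer(pair_key %in% match_key)

# Split the labeled pairs into batches of a fixed size.
batch_size <- 4L
batches <- split(pairs, ceiling(seq_len(nrow(pairs)) / batch_size))
```

With three left and two right records the product has six pairs, which is why the number of candidate pairs, and hence the training cost, grows with the product of the two data set sizes.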

`DLMatchingModel`

The method passes the constructed generator and any additional call arguments directly to tf.keras.Model.fit.

`NSMatchingModel`

The method constructs a data generator from the input data frames using the similarity map with which the model was initialized and fits the model.

The model is trained using a custom training loop. The loss can either be purely defined using fuzzy logic axioms (default case with satisfiability weight 1.0) or as a weighted sum of binary cross-entropy and satisfiability loss (by setting the satisfiability weight to a value between 0 and 1).
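A minimal sketch of such a weighted loss in base R is given below. It assumes that the satisfiability loss is one minus the aggregate satisfiability of the axioms and that the two terms are mixed convexly; the function names `binary_cross_entropy` and `hybrid_loss` are hypothetical and do not belong to the package.

```r
# Binary cross-entropy averaged over pairs; eps guards against log(0).
binary_cross_entropy <- function(labels, predictions, eps = 1e-7) {
  p <- pmin(pmax(predictions, eps), 1 - eps)
  -mean(labels * log(p) + (1 - labels) * log(1 - p))
}

# Assumed combination: a convex mix of the two losses.
# `satisfiability` is the aggregate truth value of the axioms in [0, 1];
# its loss is taken here as 1 - satisfiability.
hybrid_loss <- function(labels, predictions, satisfiability,
                        satisfiability_weight = 1) {
  stopifnot(satisfiability_weight >= 0, satisfiability_weight <= 1)
  sat_loss <- 1 - satisfiability
  bce <- binary_cross_entropy(labels, predictions)
  satisfiability_weight * sat_loss + (1 - satisfiability_weight) * bce
}
```

Under these assumptions a weight of 1 reproduces the purely logical loss, a weight of 0 reduces to plain binary cross-entropy, and intermediate values interpolate between the two.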

`RefutationModel`

The method constructs a data generator and an axiom generator from the input data and uses the model's similarity map to fit the model while trying to refute the refutation claim.

In the default case of satisfiability weight equal to 1, the function minimizes the satisfiability of the refutation claim while penalizing the satisfiability of the matching axioms below the penalty threshold. If the satisfiability weight is less than 1, the model is trained to optimize the satisfiability of the refutation claim, while penalizing a weighted sum of the satisfiability of the matching axioms and the binary cross-entropy loss for values below the penalty threshold.

The penalty threshold sets the tolerance for the matching axioms (and/or the binary cross-entropy loss) below which the penalty is applied. The penalty scale sets the linear scale of the penalty when the threshold is not crossed. The penalty decay sets the exponential decay of the penalty when the threshold is crossed. The linear and exponential parts are combined using the tf.keras.activations.elu function.
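The shape of such a penalty can be sketched in base R. The ELU definition below is the standard one (x for x > 0, alpha * (exp(x) - 1) otherwise). How the package feeds the threshold, scale, and decay into it is an assumption here, and `refutation_penalty` is a hypothetical name.

```r
# Standard ELU: linear for positive inputs, saturating exponential otherwise.
elu <- function(x, alpha = 1) ifelse(x > 0, x, alpha * (exp(x) - 1))

# Assumed wiring: the scaled gap between the threshold and the axiom
# satisfiability is passed through ELU, with the decay acting as alpha.
refutation_penalty <- function(axiom_sat, penalty_threshold = 0.95,
                               penalty_scale = 1, penalty_decay = 0.1) {
  gap <- penalty_threshold - axiom_sat
  elu(penalty_scale * gap, alpha = penalty_decay)
}
```

In this sketch a satisfiability below the threshold yields a positive penalty that grows linearly with the gap, while a satisfiability above the threshold yields a small negative value bounded below by minus the decay.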

Usage

fit(object, left, right, matches, ...)

# S4 method for class 'neer_match.matching_model.DLMatchingModel'
fit(
  object,
  left,
  right,
  matches,
  batch_size = 16L,
  mismatch_share = 0.1,
  shuffle = TRUE,
  ...
)

# S4 method for class 'neer_match.matching_model.NSMatchingModel'
fit(
  object,
  left,
  right,
  matches,
  epochs,
  satisfiability_weight = 1,
  verbose = 1L,
  log_mod_n = 1L,
  ...
)

# S4 method for class 'neer_match.reasoning.RefutationModel'
fit(
  object,
  left,
  right,
  matches,
  epochs,
  refutation,
  penalty_threshold = 0.95,
  penalty_scale = 1,
  penalty_decay = 0.1,
  satisfiability_weight = 1,
  verbose = 1L,
  log_mod_n = 1L,
  ...
)

Arguments

object: A matching model object.
left: A data frame with the left records.
right: A data frame with the right records.
matches: A data frame with the indices of the matching record pairs.
...: Additional arguments passed to tf.keras.Model.fit.
batch_size: The batch size (integer).
mismatch_share: A numeric value in the range \([0, 1]\) representing the share of the mismatched pairs in the input data that is used for training.
shuffle: A logical value indicating whether to shuffle the input data.
epochs: The number of epochs to train the model.
satisfiability_weight: A numeric value in the range \([0, 1]\) representing the weight allocated to the satisfiability loss of a hybrid model.
verbose: An integer indicating the verbosity level.
log_mod_n: A positive integer that determines the frequency of logging. The method logs every log_mod_n epochs.
refutation: The refutation claim. A single-element named list, where the name is a field pair association in the similarity map and the value is a list of one or more similarities. If a string is provided instead, the method uses all the similarities in the similarity map for the refutation.
penalty_threshold: A numeric value in the range \([0, 1]\) that determines the threshold for the penalty. If the loss is below the threshold, the penalty is applied.
penalty_scale: A numeric value that determines the linear scale of the penalty when the threshold is not crossed.
penalty_decay: A numeric value in the range \([0, 1]\) that determines the exponential decay of the penalty when the threshold is crossed.
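The two accepted forms of the refutation argument can be written as plain R values. The sketch below reuses the `score` association and the "gaussian" similarity from the similarity map in the Examples section; the variable names are hypothetical, and reading the string form as an association name is an assumption.

```r
# Named-list form: one field pair association and the similarities to refute.
refutation_subset <- list(score = list("gaussian"))

# String form (assumed to name the association): all of its similarities
# in the similarity map are used for the refutation.
refutation_all <- "score"
```

Either value would be passed as the refutation argument of the RefutationModel method.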

Value

Called for side effects (model training).

Examples

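# Map record fields to the similarity measures used to compare them.
smap <- SimilarityMap(
  instructions = list(
    `score` = list("gaussian", "euclidean"),
    `platform` = list("osa", "indel")
  )
)
# Build and compile a neural-symbolic matching model on that map.
model <- NSMatchingModel(smap)
compile(model)
# Load the example data (left, right, and matches data frames).
matching_data <- fuzzy_games_example_data()
# Train for one epoch with the default satisfiability weight of 1.
fit(
  model,
  matching_data$left, matching_data$right, matching_data$matches,
  epochs = 1L
)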