Model Training — fit • neermatch

Fits the model. The method emulates the behavior of the tf.keras.Model.fit method. It automatically constructs a data generator from the left and right data frames that iterates over all elements of their Cartesian product, that is, every pairing of a left record with a right record. The generator's labels are derived from the matches data frame, and the method uses the generator to train the model.
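The pairing and labelling step can be illustrated with a small base R sketch. It enumerates the Cartesian product with expand.grid() and marks a pair as a match when it appears in matches. This is not neermatch's implementation. label_cartesian_pairs() is a hypothetical helper, and the sketch assumes that matches has integer columns left and right holding 1-based row positions, which this page does not specify.

```r
# Hypothetical helper (not part of neermatch): label every left/right pairing.
# Assumption: `matches` has integer columns `left` and `right` holding
# 1-based row positions into the `left` and `right` data frames.
label_cartesian_pairs <- function(left, right, matches) {
  # All (left row, right row) combinations; the left index varies fastest.
  pair_table <- expand.grid(left = seq_len(nrow(left)),
                            right = seq_len(nrow(right)))
  # A pair is labelled 1 if it is listed in `matches`, 0 otherwise.
  pair_key <- paste(pair_table$left, pair_table$right)
  match_key <- paste(matches$left, matches$right)
  pair_table$label <- as.integer(pair_key %in% match_key)
  pair_table
}

left <- data.frame(title = c("Pong", "Tetris", "Doom"))
right <- data.frame(title = c("Tetris (1984)", "Pong"))
matches <- data.frame(left = c(1L, 2L), right = c(2L, 1L))

pair_table <- label_cartesian_pairs(left, right, matches)
print(pair_table)  # 3 x 2 = 6 pairs, 2 of them labelled as matches
```

Even this toy input shows why mismatches dominate: the number of pairs grows with the product of the two table sizes, while the number of matches typically grows far more slowly.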

`DLMatchingModel`

The method passes the constructed generator and any additional call arguments directly to tf.keras.Model.fit.

`NSMatchingModel`

The method passes the constructed generator and any additional call arguments directly to tf.keras.Model.fit.

`RefutationModel`

The method passes the constructed generator and any additional call arguments directly to tf.keras.Model.fit.
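An argument such as epochs, which is absent from the DLMatchingModel signature, can still reach the Keras training call through R's ... mechanism. The base R sketch below shows that forwarding in isolation. Both functions are hypothetical stand-ins: keras_fit_stub() only reports what it receives, and the list of index batches is a simplification of the generator neermatch actually builds.

```r
# Hypothetical stand-in for tf.keras.Model.fit: it only reports its inputs.
keras_fit_stub <- function(generator, epochs = 1L, verbose = 1L, ...) {
  list(n_batches = length(generator), epochs = epochs, verbose = verbose)
}

# Hypothetical stand-in for the DLMatchingModel fit method. Named arguments
# it does not declare (such as epochs) travel through `...` untouched.
# mismatch_share and shuffle are accepted but unused in this sketch.
fit_dl_sketch <- function(object, left, right, matches,
                          batch_size = 16L, mismatch_share = 0.1,
                          shuffle = TRUE, ...) {
  n_pairs <- nrow(left) * nrow(right)
  # Split the pair indices into batches; this list plays the generator's role.
  generator <- split(seq_len(n_pairs), ceiling(seq_len(n_pairs) / batch_size))
  keras_fit_stub(generator, ...)
}

res <- fit_dl_sketch(NULL, data.frame(x = 1:10), data.frame(y = 1:5), NULL,
                     batch_size = 16L, epochs = 3L)
str(res)
```

Here batch_size is consumed by the outer function, while epochs passes through ... to the inner call. This mirrors why the DLMatchingModel usage lists batch_size but not epochs.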

Usage

fit(object, left, right, matches, ...)

# S4 method for class 'neer_match.matching_model.DLMatchingModel'
fit(
  object,
  left,
  right,
  matches,
  batch_size = 16L,
  mismatch_share = 0.1,
  shuffle = TRUE,
  ...
)

# S4 method for class 'neer_match.matching_model.NSMatchingModel'
fit(
  object,
  left,
  right,
  matches,
  epochs,
  satisfiability_weight = 1,
  verbose = 1L,
  log_mod_n = 1L,
  ...
)

# S4 method for class 'neer_match.reasoning.RefutationModel'
fit(
  object,
  left,
  right,
  matches,
  epochs,
  refutation,
  penalty_threshold = 0.95,
  penalty_scale = 1,
  penalty_decay = 0.1,
  satisfiability_weight = 1,
  verbose = 1L,
  log_mod_n = 1L,
  ...
)

Arguments

object: A matching model object.
left: A data frame with the left records.
right: A data frame with the right records.
matches: A data frame with the indices of the matching record pairs.
...: Additional arguments passed to tf.keras.Model.fit.
batch_size: The batch size (integer).
mismatch_share: A numeric value in the range \([0, 1]\) representing the share of mismatched (non-matching) pairs in the input data that are used for training.
shuffle: A logical value indicating whether to shuffle the input data.
epochs: The number of epochs to train the model.
satisfiability_weight: A numeric value in the range \([0, 1]\) representing the weight allocated to the satisfiability loss of a hybrid model.
verbose: An integer indicating the verbosity level.
log_mod_n: A positive integer that determines the logging frequency. The method logs progress every log_mod_n epochs.
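The base R sketch below gives one possible reading of mismatch_share and satisfiability_weight. It is not neermatch's code. subsample_pairs() and hybrid_loss() are hypothetical helpers. Treating the satisfiability weight as a convex combination of a classification loss and a satisfiability loss is an assumption suggested by its \([0, 1]\) range, not something this page states.

```r
# Hypothetical helper (not part of neermatch): keep every matching pair and a
# random `mismatch_share` of the non-matching pairs.
subsample_pairs <- function(labels, mismatch_share) {
  stopifnot(mismatch_share >= 0, mismatch_share <= 1)
  match_idx <- which(labels == 1L)
  mismatch_idx <- which(labels == 0L)
  n_keep <- round(mismatch_share * length(mismatch_idx))
  # sample.int() draws positions, avoiding sample()'s special case for a
  # single numeric argument.
  kept <- mismatch_idx[sample.int(length(mismatch_idx), n_keep)]
  sort(c(match_idx, kept))
}

# Hypothetical helper: an ASSUMED convex combination of a classification loss
# and a satisfiability loss, weighted by `satisfiability_weight`.
hybrid_loss <- function(classification_loss, satisfiability_loss,
                        satisfiability_weight = 1) {
  stopifnot(satisfiability_weight >= 0, satisfiability_weight <= 1)
  (1 - satisfiability_weight) * classification_loss +
    satisfiability_weight * satisfiability_loss
}

set.seed(42)
labels <- c(rep(1L, 10), rep(0L, 90))
idx <- subsample_pairs(labels, mismatch_share = 0.1)
length(idx)  # 19: all 10 matches plus 9 of the 90 mismatches
hybrid_loss(0.4, 0.2, satisfiability_weight = 0.5)
```

Under this reading, a weight of 1 (the default in the signatures above) trains on the satisfiability loss alone, and a weight of 0 trains on the classification loss alone.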

Value

Called for side effects (model training).

Examples

smap <- SimilarityMap(
  instructions = list(
    `score` = list("gaussian", "euclidean"),
    `platform` = list("osa", "indel")
  )
)
model <- NSMatchingModel(smap)
compile(model)
matching_data <- fuzzy_games_example_data()
fit(
  model,
  matching_data$left, matching_data$right, matching_data$matches,
  epochs = 1L
)