
Fits the model. The method emulates the behavior of the tf.keras.fit method. It automatically constructs a data generator from the left and right datasets, iterating over all elements of their Cartesian product, and derives the generator's labels from the matches data frame. The method then uses this generator to train the model.
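The pair enumeration and labeling described above can be sketched in base R. This is a hypothetical helper (`label_pairs` is not part of the package). It assumes that `matches` stores 1-based row indices in columns named `left` and `right`, which may differ from the package's actual layout. Unlike a generator, it also materializes all pairs at once instead of producing them batch by batch.

```r
# Hypothetical sketch: enumerate every (left, right) row pair and label it.
# Assumes `matches` has 1-based index columns named `left` and `right`.
label_pairs <- function(left, right, matches) {
  # Cartesian product of row indices; the left index varies fastest.
  pairs <- expand.grid(
    left = seq_len(nrow(left)),
    right = seq_len(nrow(right))
  )
  # A pair is a match (label 1) if it appears in `matches`, otherwise 0.
  pair_key <- paste(pairs$left, pairs$right)
  match_key <- paste(matches$left, matches$right)
  pairs$label <- as.integer(pair_key %in% match_key)
  pairs
}

left <- data.frame(name = c("a", "b"))
right <- data.frame(name = c("a", "c", "b"))
matches <- data.frame(left = c(1L, 2L), right = c(1L, 3L))
pairs <- label_pairs(left, right, matches)
```

With two left and three right records, the sketch yields six candidate pairs, two of which are labeled as matches.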

DLMatchingModel

The method passes the constructed generator and any additional call arguments directly to tf.keras.fit.
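The forwarding behavior can be illustrated with a pure R sketch. Both functions below are hypothetical: `keras_fit_stub` stands in for the real Keras fit call and only records what it receives, and `dl_fit_sketch` mimics how method-specific arguments (`batch_size`, `mismatch_share`, `shuffle`) configure the generator while everything in `...` reaches fit unchanged.

```r
# Hypothetical stand-in for the Keras fit call; it only records its arguments.
keras_fit_stub <- function(x, epochs = 1L, verbose = 1L, ...) {
  list(data = x, epochs = epochs, verbose = verbose, extra = list(...))
}

# Hypothetical sketch of the DLMatchingModel method's argument handling:
# method-specific arguments build the generator, and everything in `...`
# is forwarded unchanged to the fit call.
dl_fit_sketch <- function(object, left, right, matches,
                          batch_size = 16L, mismatch_share = 0.1,
                          shuffle = TRUE, ...) {
  generator <- list(
    batch_size = batch_size,
    mismatch_share = mismatch_share,
    shuffle = shuffle
  )
  keras_fit_stub(generator, ...)
}

res <- dl_fit_sketch(NULL, NULL, NULL, NULL, batch_size = 32L, epochs = 5L)
```

Here `epochs` is not a formal argument of the sketch method, so it travels through `...` to the fit stub.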

NSMatchingModel

The method passes the constructed generator and any additional call arguments directly to tf.keras.fit.

RefutationMatchingModel

The method passes the constructed generator and any additional call arguments directly to tf.keras.fit.

Usage

fit(object, left, right, matches, ...)

# S4 method for class 'neer_match.matching_model.DLMatchingModel'
fit(
  object,
  left,
  right,
  matches,
  batch_size = 16L,
  mismatch_share = 0.1,
  shuffle = TRUE,
  ...
)

# S4 method for class 'neer_match.matching_model.NSMatchingModel'
fit(
  object,
  left,
  right,
  matches,
  epochs,
  satisfiability_weight = 1,
  verbose = 1L,
  log_mod_n = 1L,
  ...
)

# S4 method for class 'neer_match.reasoning.RefutationModel'
fit(
  object,
  left,
  right,
  matches,
  epochs,
  refutation,
  penalty_threshold = 0.95,
  penalty_scale = 1,
  penalty_decay = 0.1,
  satisfiability_weight = 1,
  verbose = 1L,
  log_mod_n = 1L,
  ...
)

Arguments

object

A matching model object.

left

A data frame with the left records.

right

A data frame with the right records.

matches

A data frame with the indices of the matching record pairs.

...

Additional arguments passed to tf.keras.fit.

batch_size

The batch size (integer).

mismatch_share

A numeric value in the range \([0, 1]\) giving the share of mismatched record pairs in the input data that are used for training.
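One plausible reading of `mismatch_share` is that all matching pairs are kept while only a random fraction of the (far more numerous) mismatching pairs is sampled. The sketch below implements that reading with a hypothetical helper `subsample_mismatches`. The package's actual sampling scheme (for example, whether it resamples every epoch) is an assumption here, not documented behavior.

```r
# Hypothetical sketch: keep all matching pairs and a random share of the
# mismatching ones. Assumes a data frame with a 0/1 `label` column.
subsample_mismatches <- function(pairs, mismatch_share, seed = NULL) {
  stopifnot(mismatch_share >= 0, mismatch_share <= 1)
  if (!is.null(seed)) set.seed(seed)
  matched <- pairs[pairs$label == 1L, , drop = FALSE]
  mismatched <- pairs[pairs$label == 0L, , drop = FALSE]
  # Number of mismatched pairs to keep, rounded to a whole count.
  n_keep <- round(mismatch_share * nrow(mismatched))
  keep <- sample.int(nrow(mismatched), n_keep)
  rbind(matched, mismatched[keep, , drop = FALSE])
}

pairs <- data.frame(
  left = c(1L, 2L, 1L, 2L, 1L, 2L),
  right = c(1L, 1L, 2L, 2L, 3L, 3L),
  label = c(1L, 0L, 0L, 0L, 0L, 1L)
)
sampled <- subsample_mismatches(pairs, mismatch_share = 0.5, seed = 1L)
```

With two matches and four mismatches, a share of 0.5 keeps both matches and two randomly chosen mismatches.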

shuffle

A logical value indicating whether to shuffle the input data.

epochs

The number of epochs to train the model.

satisfiability_weight

A numeric value in the range \([0, 1]\) representing the weight allocated to the satisfiability loss of a hybrid model.
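As an illustration only, assume the hybrid objective is a convex combination of a supervised loss (such as binary cross-entropy) and a satisfiability loss equal to one minus the satisfiability degree of the model's logical axioms. Under that assumption, the default `satisfiability_weight = 1` trains on the satisfiability loss alone, while `0` trains on the supervised loss alone. The helper `hybrid_loss` is hypothetical, and the package's exact formula may differ.

```r
# Hypothetical sketch of a weighted hybrid loss; the convex-combination form
# is an assumption, not the package's documented formula.
hybrid_loss <- function(bce_loss, satisfiability, satisfiability_weight = 1) {
  stopifnot(satisfiability_weight >= 0, satisfiability_weight <= 1)
  sat_loss <- 1 - satisfiability  # low satisfiability -> high loss
  (1 - satisfiability_weight) * bce_loss + satisfiability_weight * sat_loss
}

loss_half <- hybrid_loss(bce_loss = 0.4, satisfiability = 0.9,
                         satisfiability_weight = 0.5)
loss_sat_only <- hybrid_loss(bce_loss = 0.4, satisfiability = 0.9)
```

With equal weights, the example loss is the average of 0.4 and 0.1, that is 0.25.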

verbose

An integer indicating the verbosity level.

log_mod_n

A positive integer that determines the logging frequency. The method logs every log_mod_n epochs.
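The interplay of `verbose` and `log_mod_n` can be sketched as a simple epoch loop. The function `train_with_logging` is hypothetical, counts epochs from 1 by assumption, and omits the actual training step. It returns the epochs at which it logged so the schedule is easy to inspect.

```r
# Hypothetical sketch of an epoch loop that reports progress every
# `log_mod_n` epochs; epochs are counted from 1 here by assumption.
train_with_logging <- function(epochs, verbose = 1L, log_mod_n = 1L) {
  stopifnot(log_mod_n >= 1L)
  logged <- integer(0)
  for (epoch in seq_len(epochs)) {
    # A real implementation would run one training pass here.
    if (verbose > 0L && epoch %% log_mod_n == 0L) {
      message(sprintf("Epoch %d/%d", epoch, epochs))
      logged <- c(logged, epoch)
    }
  }
  invisible(logged)
}

logged <- suppressMessages(train_with_logging(epochs = 10L, log_mod_n = 3L))
```

With ten epochs and `log_mod_n = 3L`, the sketch logs after epochs 3, 6, and 9; with `verbose = 0L` it logs nothing.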

Value

Called for side effects (model training).

Examples

smap <- SimilarityMap(
  instructions = list(
    `score` = list("gaussian", "euclidean"),
    `platform` = list("osa", "indel")
  )
)
model <- NSMatchingModel(smap)
compile(model)
matching_data <- fuzzy_games_example_data()
fit(
  model,
  matching_data$left, matching_data$right, matching_data$matches,
  epochs = 1L
)