Training wide and deep models for tabular data

or just deep learning models for tabular data.

Here is the documentation for the Trainer class, which does all the heavy lifting.

Trainer is also available from pytorch_widedeep directly; for example, one could do:

from pytorch_widedeep.training import Trainer

or also:

from pytorch_widedeep import Trainer

class pytorch_widedeep.training.Trainer(model, objective, custom_loss_function=None, optimizers=None, lr_schedulers=None, reducelronplateau_criterion='loss', initializers=None, transforms=None, callbacks=None, metrics=None, class_weight=None, lambda_sparse=0.001, alpha=0.25, gamma=2, verbose=1, seed=1)[source]

Class to set the attributes that will be used during the training process.

Parameters
  • model (WideDeep) – An object of class WideDeep

  • objective (str) –

    Defines the objective, loss or cost function.

    Param aliases: loss_function, loss_fn, loss, cost_function, cost_fn, cost

    Possible values are:

    • binary, aliases: logistic, binary_logloss, binary_cross_entropy

    • binary_focal_loss

    • multiclass, aliases: multi_logloss, cross_entropy, categorical_cross_entropy

    • multiclass_focal_loss

    • regression, aliases: mse, l2, mean_squared_error

    • mean_absolute_error, aliases: mae, l1

    • mean_squared_log_error, aliases: msle

    • root_mean_squared_error, aliases: rmse

    • root_mean_squared_log_error, aliases: rmsle

  • custom_loss_function (nn.Module, optional, default = None) –

    Object of class nn.Module. If none of the available loss functions suits the user, it is possible to pass a custom loss function. See, for example, pytorch_widedeep.losses.FocalLoss for the required structure of the object, the Examples folder in the repo, or the minimal sketch included after the Examples section below.

    Note

    If custom_loss_function is not None, objective must be ‘binary’, ‘multiclass’ or ‘regression’, consistent with the loss function

  • optimizers (Optimizer or dict, optional, default=None) –

    • An instance of Pytorch’s Optimizer object (e.g. torch.optim.Adam()) or

    • a dictionary where the keys are the model components (i.e. ‘wide’, ‘deeptabular’, ‘deeptext’, ‘deepimage’ and/or ‘deephead’) and the values are the corresponding optimizers. If multiple optimizers are used the dictionary MUST contain an optimizer per model component.

    If no optimizers are passed, it will default to Adam for all Wide and Deep components

  • lr_schedulers (LRScheduler or dict, optional, default=None) –

    • An instance of Pytorch’s LRScheduler object (e.g. torch.optim.lr_scheduler.StepLR(opt, step_size=5)) or

    • a dictionary where the keys are the model components (i.e. ‘wide’, ‘deeptabular’, ‘deeptext’, ‘deepimage’ and/or ‘deephead’) and the values are the corresponding learning rate schedulers.

  • reducelronplateau_criterion (str, optional. default="loss") – Quantity to be monitored during training if using the ReduceLROnPlateau learning rate scheduler. Possible values are: ‘loss’ or ‘metric’.

  • initializers (Initializer or dict, optional, default=None) –

    • An instance of an Initializer object (see pytorch_widedeep.initializers) or

    • a dictionary where the keys are the model components (i.e. ‘wide’, ‘deeptabular’, ‘deeptext’, ‘deepimage’ and/or ‘deephead’) and the values are the corresponding initializers.

  • transforms (List, optional, default=None) – List with torchvision.transforms to be applied to the image component of the model (i.e. deepimage). See torchvision transforms.

  • callbacks (List, optional, default=None) – List with Callback objects. The three callbacks available in pytorch-widedeep are: LRHistory, ModelCheckpoint and EarlyStopping. The History and the LRSchedulerCallback callbacks are used by default. This can also be a custom callback as long as it is an object of type Callback. See pytorch_widedeep.callbacks.Callback or the Examples folder in the repo.

  • metrics (List, optional, default=None) –

    • List of objects of type Metric. Metrics available are: Accuracy, Precision, Recall, FBetaScore, F1Score and R2Score. This can also be a custom metric as long as it is an object of type Metric. See pytorch_widedeep.metrics.Metric or the Examples folder in the repo

    • List of objects of type torchmetrics.Metric. This can be any metric from the torchmetrics library. This can also be a custom metric as long as it is an object of type Metric. See the instructions.

  • class_weight (float, List or Tuple. optional. default=None) –

    • float indicating the weight of the minority class in binary classification problems (e.g. 9.)

    • a list or tuple with weights for the different classes in multiclass classification problems (e.g. [1., 2., 3.]). The weights do not need to be normalised. See this discussion.

  • lambda_sparse (float. default=1e-3) – Tabnet sparse regularization factor. Used, of course, if the deeptabular component is a Tabnet model

  • alpha (float. default=0.25) – if objective is binary_focal_loss or multiclass_focal_loss, the Focal Loss alpha and gamma parameters can be set directly in the Trainer via the alpha and gamma parameters

  • gamma (float. default=2) – Focal Loss gamma parameter

  • verbose (int, default=1) – Setting it to 0 will print nothing during training.

  • seed (int, default=1) – Random seed to be used internally for train_test_split

Attributes
  • cyclic_lr (bool) – Attribute that indicates if any of the lr_schedulers is cyclic_lr (i.e. CyclicLR or OneCycleLR). See Pytorch schedulers.

  • feature_importance (dict) – dict where the keys are the column names and the values are the corresponding feature importances. This attribute will only exist if the deeptabular component is a Tabnet model
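
For instance, once a model whose deeptabular component is a Tabnet model has been fitted, the importances could be inspected as in the short sketch below (it assumes trainer is an already fitted Trainer instance):

# dict of {column_name: aggregated importance}; only present for a Tabnet deeptabular component
feature_importance = trainer.feature_importance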

Examples

>>> import torch
>>> from torchvision.transforms import ToTensor
>>>
>>> # wide deep imports
>>> from pytorch_widedeep.callbacks import EarlyStopping, LRHistory
>>> from pytorch_widedeep.initializers import KaimingNormal, KaimingUniform, Normal, Uniform
>>> from pytorch_widedeep.models import TabResnet, DeepImage, DeepText, Wide, WideDeep
>>> from pytorch_widedeep import Trainer
>>> from pytorch_widedeep.optim import RAdam
>>>
>>> embed_input = [(u, i, j) for u, i, j in zip(["a", "b", "c"][:4], [4] * 3, [8] * 3)]
>>> column_idx = {k: v for v, k in enumerate(["a", "b", "c"])}
>>> wide = Wide(10, 1)
>>>
>>> # build the model
>>> deeptabular = TabResnet(blocks_dims=[8, 4], column_idx=column_idx, embed_input=embed_input)
>>> deeptext = DeepText(vocab_size=10, embed_dim=4, padding_idx=0)
>>> deepimage = DeepImage(pretrained=False)
>>> model = WideDeep(wide=wide, deeptabular=deeptabular, deeptext=deeptext, deepimage=deepimage)
>>>
>>> # set optimizers and schedulers
>>> wide_opt = torch.optim.Adam(model.wide.parameters())
>>> deep_opt = torch.optim.Adam(model.deeptabular.parameters())
>>> text_opt = RAdam(model.deeptext.parameters())
>>> img_opt = RAdam(model.deepimage.parameters())
>>>
>>> wide_sch = torch.optim.lr_scheduler.StepLR(wide_opt, step_size=5)
>>> deep_sch = torch.optim.lr_scheduler.StepLR(deep_opt, step_size=3)
>>> text_sch = torch.optim.lr_scheduler.StepLR(text_opt, step_size=5)
>>> img_sch = torch.optim.lr_scheduler.StepLR(img_opt, step_size=3)
>>>
>>> optimizers = {"wide": wide_opt, "deeptabular": deep_opt, "deeptext": text_opt, "deepimage": img_opt}
>>> schedulers = {"wide": wide_sch, "deeptabular": deep_sch, "deeptext": text_sch, "deepimage": img_sch}
>>>
>>> # set initializers and callbacks
>>> initializers = {"wide": Uniform, "deeptabular": Normal, "deeptext": KaimingNormal, "deepimage": KaimingUniform}
>>> transforms = [ToTensor]
>>> callbacks = [LRHistory(n_epochs=4), EarlyStopping]
>>>
>>> # set the trainer
>>> trainer = Trainer(model, objective="regression", initializers=initializers, optimizers=optimizers,
... lr_schedulers=schedulers, callbacks=callbacks, transforms=transforms)
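
In relation to the custom_loss_function parameter described above, the snippet below is a minimal sketch of the kind of object that can be passed: any nn.Module whose forward receives the predictions and the target and returns a scalar loss. The class MSLELoss and its internals are illustrative only; see pytorch_widedeep.losses.FocalLoss for the structure actually used in the library.

import torch
from torch import nn
import torch.nn.functional as F

class MSLELoss(nn.Module):
    # illustrative custom loss: mean squared log error, for a 'regression' objective
    def forward(self, input: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # clamp at zero so that log1p is well defined in this sketch
        return F.mse_loss(torch.log1p(input.clamp(min=0)), torch.log1p(target.clamp(min=0)))

# hypothetical usage, with 'model' being a WideDeep instance as in the example above
# trainer = Trainer(model, objective="regression", custom_loss_function=MSLELoss())
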
fit(X_wide=None, X_tab=None, X_text=None, X_img=None, X_train=None, X_val=None, val_split=None, target=None, n_epochs=1, validation_freq=1, batch_size=32, custom_dataloader=None, finetune=False, finetune_epochs=5, finetune_max_lr=0.01, finetune_deeptabular_gradual=False, finetune_deeptabular_max_lr=0.01, finetune_deeptabular_layers=None, finetune_deeptext_gradual=False, finetune_deeptext_max_lr=0.01, finetune_deeptext_layers=None, finetune_deepimage_gradual=False, finetune_deepimage_max_lr=0.01, finetune_deepimage_layers=None, finetune_routine='howard', stop_after_finetuning=False, **kwargs)[source]

Fit method.

The input datasets can be passed either directly via numpy arrays (X_wide, X_tab, X_text or X_img) or alternatively, in dictionaries (X_train or X_val).

Parameters
  • X_wide (np.ndarray, Optional. default=None) – Input for the wide model component. See pytorch_widedeep.preprocessing.WidePreprocessor

  • X_tab (np.ndarray, Optional. default=None) – Input for the deeptabular model component. See pytorch_widedeep.preprocessing.TabPreprocessor

  • X_text (np.ndarray, Optional. default=None) – Input for the deeptext model component. See pytorch_widedeep.preprocessing.TextPreprocessor

  • X_img (np.ndarray, Optional. default=None) – Input for the deepimage model component. See pytorch_widedeep.preprocessing.ImagePreprocessor

  • X_train (Dict, Optional. default=None) – The training dataset can also be passed in a dictionary. Keys are ‘X_wide’, ‘X_tab’, ‘X_text’, ‘X_img’ and ‘target’. Values are the corresponding matrices.

  • X_val (Dict, Optional. default=None) – The validation dataset can also be passed in a dictionary. Keys are ‘X_wide’, ‘X_tab’, ‘X_text’, ‘X_img’ and ‘target’. Values are the corresponding matrices.

  • val_split (float, Optional. default=None) – train/val split fraction

  • target (np.ndarray, Optional. default=None) – target values

  • n_epochs (int, default=1) – number of epochs

  • validation_freq (int, default=1) – epochs validation frequency

  • batch_size (int, default=32) – batch size

  • custom_dataloader (DataLoader, Optional, default=None) – object of class torch.utils.data.DataLoader. Available predefined dataloaders are in pytorch_widedeep.dataloaders. If None, a standard torch DataLoader is used.

  • finetune (bool, default=False) –

    param alias: warmup

    fine-tune individual model components.

    Note

    This functionality can also be used to ‘warm-up’ individual components before the joint training starts, and hence its alias. See the Examples folder in the repo for more details.

    pytorch_widedeep implements 3 fine-tune routines.

    • fine-tune all trainable layers at once. This routine is inspired by the work of Howard & Ruder (2018) in their ULMFiT paper. Using a slanted triangular learning rate (see Leslie N. Smith's paper), the process is the following: i) the learning rate will gradually increase for 10% of the training steps from max_lr/10 to max_lr; ii) it will then gradually decrease to max_lr/10 for the remaining 90% of the steps. The optimizer used in the process is Adam.

    and two gradual fine-tune routines, where only certain layers are trained at a time.

    • The so-called Felbo gradual fine-tune routine, based on the Felbo et al., 2017 DeepMoji paper.

    • The Howard routine, based on the work of Howard & Ruder (2018) in their ULMFiT paper.

    For details on how these routines work, please see the Examples section in this documentation, the Examples folder in the repo, and the fine-tuning sketch after the fit examples below.

  • finetune_epochs (int, default=5) –

    param alias: warmup_epochs

    Number of fine-tune epochs for those model components that will NOT be gradually fine-tuned. Those components with gradual fine-tune follow their corresponding specific routine.

  • finetune_max_lr (float, default=0.01) –

    param alias: warmup_max_lr

    Maximum learning rate during the Triangular Learning rate cycle for those model components that will NOT be gradually fine-tuned

  • finetune_deeptabular_gradual (bool, default=False) –

    param alias: warmup_deeptabular_gradual

    Boolean indicating if the deeptabular component will be fine-tuned gradually

  • finetune_deeptabular_max_lr (float, default=0.01) –

    param alias: warmup_deeptabular_max_lr

    Maximum learning rate during the Triangular Learning rate cycle for the deeptabular component

  • finetune_deeptabular_layers (List, Optional, default=None) –

    param alias: warmup_deeptabular_layers

    List of nn.Modules that will be fine-tuned gradually.

    Note

    These have to be in fine-tune order: the layers or blocks closest to the output neuron(s) first

  • finetune_deeptext_gradual (bool, default=False) –

    param alias: warmup_deeptext_gradual

    Boolean indicating if the deeptext component will be fine-tuned gradually

  • finetune_deeptext_max_lr (float, default=0.01) –

    param alias: warmup_deeptext_max_lr

    Maximum learning rate during the Triangular Learning rate cycle for the deeptext component

  • finetune_deeptext_layers (List, Optional, default=None) –

    param alias: warmup_deeptext_layers

    List of nn.Modules that will be fine-tuned gradually.

    Note

    These have to be in fine-tune order: the layers or blocks closest to the output neuron(s) first

  • finetune_deepimage_gradual (bool, default=False) –

    param alias: warmup_deepimage_gradual

    Boolean indicating if the deepimage component will be fine-tuned gradually

  • finetune_deepimage_max_lr (float, default=0.01) –

    param alias: warmup_deepimage_max_lr

    Maximum learning rate during the Triangular Learning rate cycle for the deepimage component

  • finetune_deepimage_layers (List, Optional, default=None) –

    param alias: warmup_deepimage_layers

    List of nn.Modules that will be fine-tuned gradually.

    Note

    These have to be in fine-tune order: the layers or blocks closest to the output neuron(s) first

  • finetune_routine (str, default = "howard") –

    param alias: warmup_routine

    Warm-up routine. One of “felbo” or “howard”. See the Examples section in this documentation and the corresponding repo for details on how to use the fine-tune routines.

Examples

For a series of comprehensive examples please see the Examples folder in the repo

For completeness, here we include some “fabricated” examples, i.e. they assume you have already built a model and instantiated a Trainer that is ready to fit

# Ex 1. using train input arrays directly and no validation
trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target, n_epochs=10, batch_size=256)
# Ex 2: using train input arrays directly and validation with val_split
trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target, n_epochs=10, batch_size=256, val_split=0.2)
# Ex 3: using train dict and val_split
X_train = {'X_wide': X_wide, 'X_tab': X_tab, 'target': y}
trainer.fit(X_train, n_epochs=10, batch_size=256, val_split=0.2)
# Ex 4: validation using training and validation dicts
X_train = {'X_wide': X_wide_tr, 'X_tab': X_tab_tr, 'target': y_tr}
X_val = {'X_wide': X_wide_val, 'X_tab': X_tab_val, 'target': y_val}
trainer.fit(X_train=X_train, X_val=X_val, n_epochs=10, batch_size=256)
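
As a further illustration of the finetune/warm-up options described above, here is a sketch that reuses the fabricated setup from the previous examples; the attribute model.deeptext.rnn passed to finetune_deeptext_layers is hypothetical and depends on the actual deeptext architecture.

# Ex 5: warm up (fine-tune) all components for a couple of epochs before the joint training
trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target, n_epochs=10, batch_size=256,
            finetune=True, finetune_epochs=2, finetune_max_lr=0.01)
# Ex 6: same, but fine-tuning the deeptext component gradually, layer by layer
trainer.fit(X_wide=X_wide, X_tab=X_tab, X_text=X_text, target=target, n_epochs=10, batch_size=256,
            finetune=True, finetune_routine="howard",
            finetune_deeptext_gradual=True, finetune_deeptext_max_lr=0.005,
            finetune_deeptext_layers=[model.deeptext.rnn])  # hypothetical layer list
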
predict(X_wide=None, X_tab=None, X_text=None, X_img=None, X_test=None, batch_size=256)[source]

Returns the predictions

The input datasets can be passed either directly via numpy arrays (X_wide, X_tab, X_text or X_img) or alternatively, in a dictionary (X_test)

Parameters
  • X_wide (np.ndarray, Optional. default=None) – Input for the wide model component. See pytorch_widedeep.preprocessing.WidePreprocessor

  • X_tab (np.ndarray, Optional. default=None) – Input for the deeptabular model component. See pytorch_widedeep.preprocessing.TabPreprocessor

  • X_text (np.ndarray, Optional. default=None) – Input for the deeptext model component. See pytorch_widedeep.preprocessing.TextPreprocessor

  • X_img (np.ndarray, Optional. default=None) – Input for the deepimage model component. See pytorch_widedeep.preprocessing.ImagePreprocessor

  • X_test (Dict, Optional. default=None) – The test dataset can also be passed in a dictionary. Keys are ‘X_wide’, ‘X_tab’, ‘X_text’ and ‘X_img’. Values are the corresponding matrices.

  • batch_size (int, default=256) – batch size

Return type

ndarray

predict_proba(X_wide=None, X_tab=None, X_text=None, X_img=None, X_test=None, batch_size=256)[source]

Returns the predicted probabilities for the test dataset for binary and multiclass objectives

The input datasets can be passed either directly via numpy arrays (X_wide, X_tab, X_text or X_img) or alternatively, in a dictionary (X_test)

Parameters

The input parameters are the same as those of predict: X_wide, X_tab, X_text, X_img, X_test and batch_size.

Return type

ndarray
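
A brief usage sketch for both methods, assuming the trainer has already been fitted (the *_te arrays are hypothetical, preprocessed test matrices):

# predictions using the same kind of input arrays as in fit
preds = trainer.predict(X_wide=X_wide_te, X_tab=X_tab_te)
# predicted probabilities, only meaningful for binary and multiclass objectives
probs = trainer.predict_proba(X_wide=X_wide_te, X_tab=X_tab_te)
# alternatively, the test set can be passed in a dictionary
X_test = {'X_wide': X_wide_te, 'X_tab': X_tab_te}
preds = trainer.predict(X_test=X_test)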

get_embeddings(col_name, cat_encoding_dict)[source]

Returns the learned embeddings for the categorical features passed through deeptabular.

Note

This function will be deprecated in the next release. Please consider using Tab2Vec instead.

This method is designed to take an encoding dictionary in the same format as that of the LabelEncoder attribute in the class TabPreprocessor. See pytorch_widedeep.preprocessing.TabPreprocessor and pytorch_widedeep.utils.dense_utils.LabelEncoder.

Parameters
  • col_name (str,) – Column name of the feature we want to get the embeddings for

  • cat_encoding_dict (Dict) –

    Dictionary where the keys are the name of the column for which we want to retrieve the embeddings and the values are also of type Dict. These Dict values have keys that are the categories for that column and the values are the corresponding numerical encodings

    e.g.: {‘column’: {‘cat_0’: 1, ‘cat_1’: 2, …}}

Examples

For a series of comprehensive examples please see the Examples folder in the repo

For completeness, here we include a “fabricated” example, i.e. we assume the model has already been trained, that the categorical encodings are in a dictionary named encoding_dict, and that there is a column called ‘education’:

trainer.get_embeddings(col_name="education", cat_encoding_dict=encoding_dict)
Return type

Dict[str, ndarray]

explain(X_tab, save_step_masks=False)[source]

If the deeptabular component is a Tabnet model, returns the aggregated feature importance for each instance (or observation) in the X_tab array. If save_step_masks is set to True, the masks per step will also be returned.

Parameters
  • X_tab (np.ndarray) – Input array corresponding only to the deeptabular component

  • save_step_masks (bool) – Boolean indicating if the masks per step will be returned

Returns

res – Array or Tuple of two arrays with the corresponding aggregated feature importance and the masks per step if save_step_masks is set to True

Return type

np.ndarray, Tuple
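
A minimal usage sketch, assuming the deeptabular component is a Tabnet model and X_tab_te is a hypothetical preprocessed test array:

# aggregated feature importance per observation
feat_imp = trainer.explain(X_tab_te)
# or also return the masks per step
feat_imp, step_masks = trainer.explain(X_tab_te, save_step_masks=True)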

save(path, save_state_dict=False, model_filename='wd_model.pt')[source]

Saves the model, training and evaluation history, and the feature_importance attribute (if the deeptabular component is a Tabnet model) to disk

The Trainer class is built so that it ‘just’ trains a model. With that in mind, all the torch related parameters (such as optimizers, learning rate schedulers, initializers, etc) have to be defined externally and then passed to the Trainer. As a result, the Trainer does not generate any attribute or additional data products that need to be saved other than the model object itself, which can be saved as any other torch model (e.g. torch.save(model, path)).

The exception is Tabnet. If the deeptabular component is a Tabnet model, an attribute (a dict) called feature_importance will be created at the end of the training process. Therefore, a save method was created that will save the feature importance dictionary to a json file and, since we are here, the model weights, training history and learning rate history.

Parameters
  • path (str) – path to the directory where the model and the feature importance attribute will be saved.

  • save_state_dict (bool, default = False) – Boolean indicating whether to save directly the model or the model’s state dictionary

  • model_filename (str, Optional, default = "wd_model.pt") – filename where the model weights will be stored
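
A brief sketch of the two saving modes (the directory name is illustrative):

# save the full model object along with the training and learning rate histories
trainer.save(path="model_weights", model_filename="wd_model.pt")
# or save only the model's state dictionary
trainer.save(path="model_weights", save_state_dict=True, model_filename="wd_model_state_dict.pt")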