The models module
This module contains the models that can be used as the four main components
that comprise a Wide and Deep model (wide, deeptabular, deeptext, deepimage),
as well as the WideDeep "constructor" class. Note that each of the four
components can be used independently. It also contains the documentation for
the models that can be used for self-supervised pre-training with tabular data.
Wide
Wide(input_dim, pred_dim=1)
Bases: Module
Defines a Wide (linear) model where the non-linearities are captured via the
so-called crossed-columns. This can be used as the wide component of a
Wide & Deep model.
Parameters:
- input_dim (int) – size of the Linear layer (implemented via an Embedding layer). input_dim is the sum of the number of individual values of all the features that go through the wide model. For example, if the wide model receives 2 features with 5 individual values each, input_dim = 10.
- pred_dim (int, default: 1) – size of the output tensor containing the predictions. Note that unlike all the other models, the wide model is connected directly to the output neuron(s) when used to build a Wide and Deep model. Therefore, it requires the pred_dim parameter.
Attributes:
- wide_linear (Module) – the linear layer that comprises the wide branch of the model
Examples:
>>> import torch
>>> from pytorch_widedeep.models import Wide
>>> X = torch.empty(4, 4).random_(4)
>>> wide = Wide(input_dim=X.unique().size(0), pred_dim=1)
>>> out = wide(X)
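The `input_dim` rule described above (the sum of the number of individual values over all wide features) can be sketched in plain Python. The helper name `wide_input_dim` and the toy columns are hypothetical and only illustrate the arithmetic; they are not part of the library.

```python
from typing import Dict, Hashable, List


def wide_input_dim(features: Dict[str, List[Hashable]]) -> int:
    """Sum the number of distinct values over all wide features."""
    return sum(len(set(values)) for values in features.values())


# Two toy features with 5 distinct values each, as in the description above.
features = {
    "education": ["hs", "ba", "ms", "phd", "none", "ba"],
    "workclass": ["private", "gov", "self", "none", "other", "gov"],
}
input_dim = wide_input_dim(features)
print(input_dim)  # 10
```

Repeated values within a column are counted once, which is why each 6-row column contributes 5.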
Source code in pytorch_widedeep/models/tabular/linear/wide.py
forward
forward(X)
Forward pass. Simply connects the Embedding layer with the output neuron(s).
Source code in pytorch_widedeep/models/tabular/linear/wide.py
TabMlp
TabMlp(column_idx, *, cat_embed_input=None, cat_embed_dropout=None, use_cat_bias=None, cat_embed_activation=None, continuous_cols=None, cont_norm_layer=None, embed_continuous=None, embed_continuous_method=None, cont_embed_dim=None, cont_embed_dropout=None, cont_embed_activation=None, quantization_setup=None, n_frequencies=None, sigma=None, share_last_layer=None, full_embed_dropout=None, mlp_hidden_dims=[200, 100], mlp_activation='relu', mlp_dropout=0.1, mlp_batchnorm=False, mlp_batchnorm_last=False, mlp_linear_first=True)
Bases: BaseTabularModelWithoutAttention
Defines a TabMlp model that can be used as the deeptabular component of a
Wide & Deep model or independently by itself.
This class combines embedding representations of the categorical features with numerical (aka continuous) features, embedded or not. These are then passed through a series of dense layers (i.e. an MLP).
Most of the parameters for this class are Optional since the use of
categorical or continuous features is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
Parameters:
- column_idx (Dict[str, int]) – Dict containing the index of the columns that will be passed through the TabMlp model. Required to slice the tensors. e.g. {'education': 0, 'relationship': 1, 'workclass': 2, ...}.
- cat_embed_input (Optional[List[Tuple[str, int, int]]], default: None) – List of Tuples with the column name, number of unique values and embedding dimension. e.g. [(education, 11, 32), ...]
- cat_embed_dropout (Optional[float], default: None) – Categorical embeddings dropout. If None, it will default to 0.
- use_cat_bias (Optional[bool], default: None) – Boolean indicating if bias will be used for the categorical embeddings. If None, it will default to False.
- cat_embed_activation (Optional[str], default: None) – Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
- continuous_cols (Optional[List[str]], default: None) – List with the names of the numeric (aka continuous) columns.
- cont_norm_layer (Optional[Literal[batchnorm, layernorm]], default: None) – Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If None, no normalization layer will be used.
- embed_continuous (Optional[bool], default: None) – Boolean indicating if the continuous columns will be embedded using one of the available methods: 'standard', 'periodic' or 'piecewise'. If None, it will default to False. NOTE: This parameter is deprecated and will be removed in future releases. Please use the embed_continuous_method parameter instead.
- embed_continuous_method (Optional[Literal[standard, piecewise, periodic]], default: None) – Method used to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details.
- cont_embed_dim (Optional[int], default: None) – Size of the continuous embeddings. If the continuous columns are embedded, cont_embed_dim must be passed.
- cont_embed_dropout (Optional[float], default: None) – Dropout for the continuous embeddings. If None, it will default to 0.0.
- cont_embed_activation (Optional[str], default: None) – Activation function for the continuous embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If None, no activation function will be applied.
- quantization_setup (Optional[Dict[str, List[float]]], default: None) – This parameter is used when the 'piecewise' method is used to embed the continuous columns. It is a dict where the keys are the names of the continuous columns and the values are lists with the boundaries for the quantization of each column. See the examples for details. If the 'piecewise' method is used, this parameter is required.
- n_frequencies (Optional[int], default: None) – This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq. 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
- sigma (Optional[float], default: None) – This is the sigma parameter in the same paper, and it is used to initialise the 'frequency weights'. See Eq. 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
- share_last_layer (Optional[bool], default: None) – This parameter is not present in the aforementioned paper but it is implemented in the official repo. If True, the linear layer that turns the frequencies into embeddings will be shared across the continuous columns. If False, a different linear layer will be used for each continuous column. If the 'periodic' method is used, this parameter is required.
- full_embed_dropout (Optional[bool], default: None) – If True, the full embedding corresponding to a column will be masked out (dropped). If None, it will default to False.
- mlp_hidden_dims (List[int], default: [200, 100]) – List with the number of neurons per dense layer in the MLP.
- mlp_activation (str, default: 'relu') – Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
- mlp_dropout (Union[float, List[float]], default: 0.1) – float or List of floats with the dropout between the dense layers. e.g. [0.5, 0.5]
- mlp_batchnorm (bool, default: False) – Boolean indicating whether or not batch normalization will be applied to the dense layers.
- mlp_batchnorm_last (bool, default: False) – Boolean indicating whether or not batch normalization will be applied to the last of the dense layers.
- mlp_linear_first (bool, default: True) – Boolean indicating the order of the operations in the dense layer. If True: [LIN -> ACT -> BN -> DP]. If False: [BN -> DP -> LIN -> ACT].
Attributes:
- encoder (Module) – MLP model that will receive the concatenation of the embeddings and the continuous columns
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabMlp
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ["a", "b", "c", "d", "e"]
>>> cat_embed_input = [(u, i, j) for u, i, j in zip(colnames[:4], [4] * 4, [8] * 4)]
>>> column_idx = {k: v for v, k in enumerate(colnames)}
>>> model = TabMlp(mlp_hidden_dims=[8, 4], column_idx=column_idx, cat_embed_input=cat_embed_input,
... continuous_cols=["e"])
>>> out = model(X_tab)
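The example above builds `column_idx` and `cat_embed_input` by hand for synthetic data. As a rough sketch of how the same two objects could be derived from a small table, the following uses plain Python only. The helper `build_tabular_inputs`, the toy table and the fixed embedding dimension of 8 are all hypothetical choices made for illustration, not the library's own preprocessing.

```python
from typing import Dict, List, Tuple

# Hypothetical toy table: column name -> list of raw values.
table = {
    "education": ["hs", "ba", "ms", "ba"],
    "relationship": ["single", "married", "single", "single"],
    "age": [23.0, 41.0, 35.0, 52.0],
}
cat_cols = ["education", "relationship"]
continuous_cols = ["age"]


def build_tabular_inputs(
    table: Dict[str, list],
    cat_cols: List[str],
    continuous_cols: List[str],
    embed_dim: int = 8,  # arbitrary choice for this sketch
) -> Tuple[Dict[str, int], List[Tuple[str, int, int]]]:
    # Position of every column in the input tensor, categorical columns first.
    column_idx = {name: i for i, name in enumerate(cat_cols + continuous_cols)}
    # (column name, number of unique values, embedding dimension) per categorical column.
    cat_embed_input = [(name, len(set(table[name])), embed_dim) for name in cat_cols]
    return column_idx, cat_embed_input


column_idx, cat_embed_input = build_tabular_inputs(table, cat_cols, continuous_cols)
print(column_idx)       # {'education': 0, 'relationship': 1, 'age': 2}
print(cat_embed_input)  # [('education', 3, 8), ('relationship', 2, 8)]
```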
Source code in pytorch_widedeep/models/tabular/mlp/tab_mlp.py
output_dim property
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep class.
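The `quantization_setup` parameter of TabMlp expects a dict that maps each continuous column to a list of bin boundaries. Below is a minimal sketch of one way to build such a dict, assuming quantile-based boundaries that include the column minimum and maximum. The helper `quantile_boundaries` and the toy data are hypothetical, and whether the end points should be included is an assumption rather than something the parameter description states.

```python
import statistics
from typing import Dict, List


def quantile_boundaries(values: List[float], n_bins: int) -> List[float]:
    """Return n_bins + 1 boundaries: min, the inner quantile cut points, max."""
    inner = statistics.quantiles(values, n=n_bins)  # n_bins - 1 cut points
    return [min(values)] + inner + [max(values)]


# Hypothetical continuous columns.
data = {
    "age": [18.0, 22.0, 25.0, 31.0, 38.0, 44.0, 52.0, 67.0],
    "hours_per_week": [10.0, 20.0, 35.0, 40.0, 40.0, 45.0, 50.0, 60.0],
}

quantization_setup: Dict[str, List[float]] = {
    col: quantile_boundaries(vals, n_bins=4) for col, vals in data.items()
}
for col, boundaries in quantization_setup.items():
    print(col, boundaries)
```

Quantiles are only one possible choice; evenly spaced boundaries or any other monotonically increasing list would have the same dict structure.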
TabMlpDecoder
TabMlpDecoder(embed_dim, mlp_hidden_dims=[100, 200], mlp_activation='relu', mlp_dropout=0.1, mlp_batchnorm=False, mlp_batchnorm_last=False, mlp_linear_first=True)
Bases: Module
Companion decoder model for the TabMlp model (which can be considered
an encoder itself).
This class is designed to be used with the EncoderDecoderTrainer when
using self-supervised pre-training (see the corresponding section in the
docs). The TabMlpDecoder will receive the output from the MLP
and 'reconstruct' the embeddings.
Parameters:
- embed_dim (int) – Size of the embeddings tensor that needs to be reconstructed.
- mlp_hidden_dims (List[int], default: [100, 200]) – List with the number of neurons per dense layer in the MLP.
- mlp_activation (str, default: 'relu') – Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
- mlp_dropout (Union[float, List[float]], default: 0.1) – float or List of floats with the dropout between the dense layers. e.g. [0.5, 0.5]
- mlp_batchnorm (bool, default: False) – Boolean indicating whether or not batch normalization will be applied to the dense layers.
- mlp_batchnorm_last (bool, default: False) – Boolean indicating whether or not batch normalization will be applied to the last of the dense layers.
- mlp_linear_first (bool, default: True) – Boolean indicating the order of the operations in the dense layer. If True: [LIN -> ACT -> BN -> DP]. If False: [BN -> DP -> LIN -> ACT].
Attributes:
- decoder (Module) – MLP model that will receive the output of the encoder
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabMlpDecoder
>>> x_inp = torch.rand(3, 8)
>>> decoder = TabMlpDecoder(embed_dim=32, mlp_hidden_dims=[8,16])
>>> res = decoder(x_inp)
>>> res.shape
torch.Size([3, 32])
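The `mlp_linear_first`, `mlp_batchnorm` and `mlp_dropout` parameters together determine which operations each dense layer applies and in which order. The sketch below only lists operation names; it assumes that BN and DP are left out when batch normalization is disabled or the dropout is 0, which the parameter descriptions do not state explicitly. The helper `dense_layer_ops` is hypothetical.

```python
from typing import List


def dense_layer_ops(
    linear_first: bool = True, batchnorm: bool = False, dropout: float = 0.1
) -> List[str]:
    """Return the ordered operation names of one dense layer."""
    # Assumption: BN and DP only appear when they are switched on.
    norm_and_drop = (["BN"] if batchnorm else []) + (["DP"] if dropout > 0 else [])
    linear_and_act = ["LIN", "ACT"]
    if linear_first:
        return linear_and_act + norm_and_drop
    return norm_and_drop + linear_and_act


print(dense_layer_ops(linear_first=True, batchnorm=True))   # ['LIN', 'ACT', 'BN', 'DP']
print(dense_layer_ops(linear_first=False, batchnorm=True))  # ['BN', 'DP', 'LIN', 'ACT']
print(dense_layer_ops())                                    # ['LIN', 'ACT', 'DP']
```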
Source code in pytorch_widedeep/models/tabular/mlp/tab_mlp.py
TabResnet
TabResnet(column_idx, *, cat_embed_input=None, cat_embed_dropout=None, use_cat_bias=None, cat_embed_activation=None, continuous_cols=None, cont_norm_layer=None, embed_continuous=None, embed_continuous_method=None, cont_embed_dim=None, cont_embed_dropout=None, cont_embed_activation=None, quantization_setup=None, n_frequencies=None, sigma=None, share_last_layer=None, full_embed_dropout=None, blocks_dims=[200, 100, 100], blocks_dropout=0.1, simplify_blocks=False, mlp_hidden_dims=None, mlp_activation=None, mlp_dropout=None, mlp_batchnorm=None, mlp_batchnorm_last=None, mlp_linear_first=None)
Bases: BaseTabularModelWithoutAttention
Defines a TabResnet model that can be used as the deeptabular
component of a Wide & Deep model or independently by itself.
This class combines embedding representations of the categorical features
with numerical (aka continuous) features, embedded or not. These are then
passed through a series of Resnet blocks. See
pytorch_widedeep.models.tab_resnet._layers for details on the
structure of each block.
Most of the parameters for this class are Optional since the use of
categorical or continuous features is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
Parameters:
- column_idx (Dict[str, int]) – Dict containing the index of the columns that will be passed through the TabResnet model. Required to slice the tensors. e.g. {'education': 0, 'relationship': 1, 'workclass': 2, ...}.
- cat_embed_input (Optional[List[Tuple[str, int, int]]], default: None) – List of Tuples with the column name, number of unique values and embedding dimension. e.g. [(education, 11, 32), ...]
- cat_embed_dropout (Optional[float], default: None) – Categorical embeddings dropout. If None, it will default to 0.
- use_cat_bias (Optional[bool], default: None) – Boolean indicating if bias will be used for the categorical embeddings. If None, it will default to False.
- cat_embed_activation (Optional[str], default: None) – Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
- continuous_cols (Optional[List[str]], default: None) – List with the names of the numeric (aka continuous) columns.
- cont_norm_layer (Optional[Literal[batchnorm, layernorm]], default: None) – Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If None, no normalization layer will be used.
- embed_continuous (Optional[bool], default: None) – Boolean indicating if the continuous columns will be embedded using one of the available methods: 'standard', 'periodic' or 'piecewise'. If None, it will default to False. NOTE: This parameter is deprecated and will be removed in future releases. Please use the embed_continuous_method parameter instead.
- embed_continuous_method (Optional[Literal[standard, piecewise, periodic]], default: None) – Method used to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details.
- cont_embed_dim (Optional[int], default: None) – Size of the continuous embeddings. If the continuous columns are embedded, cont_embed_dim must be passed.
- cont_embed_dropout (Optional[float], default: None) – Dropout for the continuous embeddings. If None, it will default to 0.0.
- cont_embed_activation (Optional[str], default: None) – Activation function for the continuous embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If None, no activation function will be applied.
- quantization_setup (Optional[Dict[str, List[float]]], default: None) – This parameter is used when the 'piecewise' method is used to embed the continuous columns. It is a dict where the keys are the names of the continuous columns and the values are lists with the boundaries for the quantization of each column. See the examples for details. If the 'piecewise' method is used, this parameter is required.
- n_frequencies (Optional[int], default: None) – This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq. 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
- sigma (Optional[float], default: None) – This is the sigma parameter in the same paper, and it is used to initialise the 'frequency weights'. See Eq. 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
- share_last_layer (Optional[bool], default: None) – This parameter is not present in the aforementioned paper but it is implemented in the official repo. If True, the linear layer that turns the frequencies into embeddings will be shared across the continuous columns. If False, a different linear layer will be used for each continuous column. If the 'periodic' method is used, this parameter is required.
- full_embed_dropout (Optional[bool], default: None) – If True, the full embedding corresponding to a column will be masked out (dropped). If None, it will default to False.
- blocks_dims (List[int], default: [200, 100, 100]) – List of integers that define the input and output units of each block. For example, [200, 100, 100] will generate 2 blocks: the first will receive a tensor of size 200 and output a tensor of size 100, and the second will receive a tensor of size 100 and output a tensor of size 100. See pytorch_widedeep.models.tab_resnet._layers for details on the structure of each block.
- blocks_dropout (float, default: 0.1) – Block's internal dropout.
- simplify_blocks (bool, default: False) – Boolean indicating if the simplest possible residual blocks (X -> [ [LIN, BN, ACT] + X ]) will be used instead of a standard one (X -> [ [LIN1, BN1, ACT1] -> [LIN2, BN2] + X ]).
- mlp_hidden_dims (Optional[List[int]], default: None) – List with the number of neurons per dense layer in the MLP. e.g. [64, 32]. If None, the output of the Resnet Blocks will be connected directly to the output neuron(s).
- mlp_activation (Optional[str], default: None) – Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If mlp_hidden_dims is not None and this parameter is None, it will default to 'relu'.
- mlp_dropout (Optional[float], default: None) – float with the dropout between the dense layers of the MLP. If mlp_hidden_dims is not None and this parameter is None, it will default to 0.0.
- mlp_batchnorm (Optional[bool], default: None) – Boolean indicating whether or not batch normalization will be applied to the dense layers. If mlp_hidden_dims is not None and this parameter is None, it will default to False.
- mlp_batchnorm_last (Optional[bool], default: None) – Boolean indicating whether or not batch normalization will be applied to the last of the dense layers. If mlp_hidden_dims is not None and this parameter is None, it will default to False.
- mlp_linear_first (Optional[bool], default: None) – Boolean indicating the order of the operations in the dense layer. If True: [LIN -> ACT -> BN -> DP]. If False: [BN -> DP -> LIN -> ACT]. If mlp_hidden_dims is not None and this parameter is None, it will default to True.
Attributes:
- encoder (Module) – deep dense Resnet model that will receive the concatenation of the embeddings and the continuous columns
- mlp (Module) – if mlp_hidden_dims is not None, this attribute will be an MLP model that will receive the results of the concatenation of the embeddings and the continuous columns (if present).
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabResnet
>>> X_deep = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i,j) for u,i,j in zip(colnames[:4], [4]*4, [8]*4)]
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = TabResnet(blocks_dims=[16,4], column_idx=column_idx, cat_embed_input=cat_embed_input,
... continuous_cols = ['e'])
>>> out = model(X_deep)
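The `blocks_dims` description states that consecutive entries define the input and output size of each block, so a list of n integers yields n - 1 blocks. A short plain-Python sketch of that pairing follows; the helper name `block_shapes` is hypothetical.

```python
from typing import List, Tuple


def block_shapes(blocks_dims: List[int]) -> List[Tuple[int, int]]:
    """Pair consecutive entries as (input units, output units) per block."""
    return list(zip(blocks_dims[:-1], blocks_dims[1:]))


print(block_shapes([200, 100, 100]))  # [(200, 100), (100, 100)]
print(block_shapes([16, 4]))          # [(16, 4)]
```

With `blocks_dims=[16, 4]`, as in the example above, this gives a single block.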
Source code in pytorch_widedeep/models/tabular/resnet/tab_resnet.py
output_dim property
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep class.
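The `n_frequencies` and `sigma` parameters of the 'periodic' embedding method can be illustrated with a small sketch. It assumes that Eq. 2 of the referenced paper maps a scalar x to the concatenation of sin(2πc_i x) and cos(2πc_i x) for k coefficients c_i initialised from a normal distribution with standard deviation sigma. The function `periodic_features` is hypothetical, the coefficients are fixed here rather than learned, and the linear layer that turns the frequencies into embeddings is omitted.

```python
import math
import random
from typing import List


def periodic_features(x: float, coefficients: List[float]) -> List[float]:
    """Map a scalar to [sin(v_1..v_k), cos(v_1..v_k)] with v_i = 2 * pi * c_i * x."""
    v = [2.0 * math.pi * c * x for c in coefficients]
    return [math.sin(a) for a in v] + [math.cos(a) for a in v]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_frequencies, sigma = 4, 0.1

# Assumption: the 'frequency weights' are drawn from N(0, sigma^2).
coefficients = [rng.gauss(0.0, sigma) for _ in range(n_frequencies)]

features = periodic_features(0.5, coefficients)
print(len(features))  # 8, i.e. 2 * n_frequencies values per continuous column
```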
TabResnetDecoder
TabResnetDecoder(embed_dim, blocks_dims=[100, 100, 200], blocks_dropout=0.1, simplify_blocks=False, mlp_hidden_dims=None, mlp_activation=None, mlp_dropout=None, mlp_batchnorm=None, mlp_batchnorm_last=None, mlp_linear_first=None)
Bases: Module
Companion decoder model for the TabResnet model (which can be
considered an encoder itself).
This class is designed to be used with the EncoderDecoderTrainer when
using self-supervised pre-training (see the corresponding section in the
docs). This class will receive the output from the ResNet blocks or the
MLP (if present) and 'reconstruct' the embeddings.
Parameters:
- embed_dim (int) – Size of the embeddings tensor to be reconstructed.
- blocks_dims (List[int], default: [100, 100, 200]) – List of integers that define the input and output units of each block. For example, [200, 100, 100] will generate 2 blocks: the first will receive a tensor of size 200 and output a tensor of size 100, and the second will receive a tensor of size 100 and output a tensor of size 100. See pytorch_widedeep.models.tab_resnet._layers for details on the structure of each block.
- blocks_dropout (float, default: 0.1) – Block's internal dropout.
- simplify_blocks (bool, default: False) – Boolean indicating if the simplest possible residual blocks (X -> [ [LIN, BN, ACT] + X ]) will be used instead of a standard one (X -> [ [LIN1, BN1, ACT1] -> [LIN2, BN2] + X ]).
- mlp_hidden_dims (Optional[List[int]], default: None) – List with the number of neurons per dense layer in the MLP. e.g. [64, 32]. If None, the output of the Resnet Blocks will be connected directly to the output neuron(s).
- mlp_activation (Optional[str], default: None) – Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If mlp_hidden_dims is not None and this parameter is None, it will default to 'relu'.
- mlp_dropout (Optional[float], default: None) – float with the dropout between the dense layers of the MLP. If mlp_hidden_dims is not None and this parameter is None, it will default to 0.0.
- mlp_batchnorm (Optional[bool], default: None) – Boolean indicating whether or not batch normalization will be applied to the dense layers. If mlp_hidden_dims is not None and this parameter is None, it will default to False.
- mlp_batchnorm_last (Optional[bool], default: None) – Boolean indicating whether or not batch normalization will be applied to the last of the dense layers. If mlp_hidden_dims is not None and this parameter is None, it will default to False.
- mlp_linear_first (Optional[bool], default: None) – Boolean indicating the order of the operations in the dense layer. If True: [LIN -> ACT -> BN -> DP]. If False: [BN -> DP -> LIN -> ACT]. If mlp_hidden_dims is not None and this parameter is None, it will default to True.
Attributes:
- decoder (Module) – deep dense Resnet model that will receive the output of the encoder if mlp_hidden_dims is None
- mlp (Module) – if mlp_hidden_dims is not None, the overall decoder will consist of an MLP that will receive the output of the encoder, followed by the deep dense Resnet.
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabResnetDecoder
>>> x_inp = torch.rand(3, 8)
>>> decoder = TabResnetDecoder(embed_dim=32, blocks_dims=[8, 16, 16])
>>> res = decoder(x_inp)
>>> res.shape
torch.Size([3, 32])
Source code in pytorch_widedeep/models/tabular/resnet/tab_resnet.py
TabNet
TabNet(column_idx, *, cat_embed_input=None, cat_embed_dropout=None, use_cat_bias=None, cat_embed_activation=None, continuous_cols=None, cont_norm_layer=None, embed_continuous=None, embed_continuous_method=None, cont_embed_dim=None, cont_embed_dropout=None, cont_embed_activation=None, quantization_setup=None, n_frequencies=None, sigma=None, share_last_layer=None, full_embed_dropout=None, n_steps=3, step_dim=8, attn_dim=8, dropout=0.0, n_glu_step_dependent=2, n_glu_shared=2, ghost_bn=True, virtual_batch_size=128, momentum=0.02, gamma=1.3, epsilon=1e-15, mask_type='sparsemax')
Bases: BaseTabularModelWithoutAttention
Defines a TabNet model that can be used as the deeptabular
component of a Wide & Deep model or independently by itself.
The implementation in this library is fully based on the one by the
dreamquark-ai team, simply adapted so that it can work within the WideDeep
framework. Therefore, ALL CREDIT TO THE DREAMQUARK-AI TEAM.
Parameters:
- column_idx (Dict[str, int]) – Dict containing the index of the columns that will be passed through the TabNet model. Required to slice the tensors. e.g. {'education': 0, 'relationship': 1, 'workclass': 2, ...}.
- cat_embed_input (Optional[List[Tuple[str, int, int]]], default: None) – List of Tuples with the column name, number of unique values and embedding dimension. e.g. [(education, 11, 32), ...]
- cat_embed_dropout (Optional[float], default: None) – Categorical embeddings dropout. If None, it will default to 0.
- use_cat_bias (Optional[bool], default: None) – Boolean indicating if bias will be used for the categorical embeddings. If None, it will default to False.
- cat_embed_activation (Optional[str], default: None) – Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
- continuous_cols (Optional[List[str]], default: None) – List with the names of the numeric (aka continuous) columns.
- cont_norm_layer (Optional[Literal[batchnorm, layernorm]], default: None) – Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If None, no normalization layer will be used.
- embed_continuous (Optional[bool], default: None) – Boolean indicating if the continuous columns will be embedded using one of the available methods: 'standard', 'periodic' or 'piecewise'. If None, it will default to False. NOTE: This parameter is deprecated and will be removed in future releases. Please use the embed_continuous_method parameter instead.
- embed_continuous_method (Optional[Literal[standard, piecewise, periodic]], default: None) – Method used to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details.
- cont_embed_dim (Optional[int], default: None) – Size of the continuous embeddings. If the continuous columns are embedded, cont_embed_dim must be passed.
- cont_embed_dropout (Optional[float], default: None) – Dropout for the continuous embeddings. If None, it will default to 0.0.
- cont_embed_activation (Optional[str], default: None) – Activation function for the continuous embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If None, no activation function will be applied.
- quantization_setup (Optional[Dict[str, List[float]]], default: None) – This parameter is used when the 'piecewise' method is used to embed the continuous columns. It is a dict where the keys are the names of the continuous columns and the values are lists with the boundaries for the quantization of each column. See the examples for details. If the 'piecewise' method is used, this parameter is required.
- n_frequencies (Optional[int], default: None) – This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq. 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
- sigma (Optional[float], default: None) – This is the sigma parameter in the same paper, and it is used to initialise the 'frequency weights'. See Eq. 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
- share_last_layer (Optional[bool], default: None) – This parameter is not present in the aforementioned paper but it is implemented in the official repo. If True, the linear layer that turns the frequencies into embeddings will be shared across the continuous columns. If False, a different linear layer will be used for each continuous column. If the 'periodic' method is used, this parameter is required.
- full_embed_dropout (Optional[bool], default: None) – If True, the full embedding corresponding to a column will be masked out (dropped). If None, it will default to False.
- n_steps (int, default: 3) – number of decision steps. For a better understanding of the function of n_steps and the upcoming parameters, please see the paper.
- step_dim (int, default: 8) – Step's output dimension. This is the output dimension that WideDeep will collect and connect to the output neuron(s).
- attn_dim (int, default: 8) – Attention dimension.
- dropout (float, default: 0.0) – GLU block's internal dropout.
- n_glu_step_dependent (int, default: 2) – number of GLU Blocks ([FC -> BN -> GLU]) that are step dependent.
- n_glu_shared (int, default: 2) – number of GLU Blocks ([FC -> BN -> GLU]) that will be shared across decision steps.
- ghost_bn (bool, default: True) – Boolean indicating if Ghost Batch Normalization will be used.
- virtual_batch_size (int, default: 128) – Batch size when using Ghost Batch Normalization.
- momentum (float, default: 0.02) – Ghost Batch Normalization's momentum. The dreamquark-ai team advises very low values. However, high values are used in the original publication. During our tests higher values led to better results.
- gamma (float, default: 1.3) – Relaxation parameter in the paper. When gamma = 1, a feature is enforced to be used at only one decision step. As gamma increases, more flexibility is provided to use a feature at multiple decision steps.
- epsilon (float, default: 1e-15) – Float to avoid log(0). Always keep low.
- mask_type (str, default: 'sparsemax') – Mask function to use. Either 'sparsemax' or 'entmax'.
Attributes:
- encoder (Module) – the TabNet encoder. For details see the original publication.
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabNet
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ["a", "b", "c", "d", "e"]
>>> cat_embed_input = [(u, i, j) for u, i, j in zip(colnames[:4], [4] * 4, [8] * 4)]
>>> column_idx = {k: v for v, k in enumerate(colnames)}
>>> model = TabNet(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols=["e"])
>>> out = model(X_tab)
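The effect of the `gamma` relaxation parameter can be seen in a small numeric sketch. It assumes the prior-scale update from the TabNet paper, in which the prior of each feature is multiplied by (gamma - mask) after every decision step. The helper `update_prior` and the toy mask are hypothetical and use plain Python lists instead of tensors.

```python
from typing import List


def update_prior(prior: List[float], mask: List[float], gamma: float) -> List[float]:
    """Scale each feature's prior by (gamma - mask) after one decision step."""
    return [p * (gamma - m) for p, m in zip(prior, mask)]


prior = [1.0, 1.0, 1.0]
mask_step_1 = [1.0, 0.0, 0.0]  # the first step puts all its attention on feature 0

# gamma = 1: feature 0 ends with a prior of 0, so later steps cannot select it again.
strict = update_prior(prior, mask_step_1, gamma=1.0)
print(strict)  # [0.0, 1.0, 1.0]

# gamma = 1.3: feature 0 keeps a prior of about 0.3 and remains available.
relaxed = update_prior(prior, mask_step_1, gamma=1.3)
print(relaxed)
```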
Source code in pytorch_widedeep/models/tabular/tabnet/tab_net.py
output_dim property
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep class.
TabNetDecoder
TabNetDecoder(embed_dim, n_steps=3, step_dim=8, dropout=0.0, n_glu_step_dependent=2, n_glu_shared=2, ghost_bn=True, virtual_batch_size=128, momentum=0.02)
Bases: Module
Companion decoder model for the TabNet model (which can be
considered an encoder itself).
This class is designed to be used with the EncoderDecoderTrainer when
using self-supervised pre-training (see the corresponding section in the
docs). This class will receive the output from the TabNet encoder
(i.e. the output from the so-called 'steps') and 'reconstruct' the
embeddings.
Parameters:
-
embed_dim
(int
) –Size of the embeddings tensor to be reconstructed.
-
n_steps
(int
, default:3
) –number of decision steps. For a better understanding of the function of
n_steps
and the upcoming parameters, please see the paper. -
step_dim
(int
, default:8
) –Step's output dimension. This is the output dimension that
WideDeep
will collect and connect to the output neuron(s). -
dropout
(float
, default:0.0
) –GLU block's internal dropout
-
n_glu_step_dependent
(int
, default:2
) –number of GLU Blocks (
[FC -> BN -> GLU]
) that are step dependent -
n_glu_shared
(int
, default:2
) –number of GLU Blocks (
[FC -> BN -> GLU]
) that will be shared across decision steps -
ghost_bn
(bool
, default:True
) –Boolean indicating if Ghost Batch Normalization will be used.
-
virtual_batch_size
(int
, default:128
) –Batch size when using Ghost Batch Normalization
-
momentum
(float
, default:0.02
) –Ghost Batch Normalization's momentum. The dreamquark-ai implementation advises very low values, whereas high values are used in the original publication. In our tests, higher values led to better results
Attributes:
-
decoder
(Module
) –decoder that will receive the output from the encoder's steps and will reconstruct the embeddings
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabNetDecoder
>>> x_inp = [torch.rand(3, 8), torch.rand(3, 8), torch.rand(3, 8)]
>>> decoder = TabNetDecoder(embed_dim=32, ghost_bn=False)
>>> res = decoder(x_inp)
>>> res.shape
torch.Size([3, 32])
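To clarify what virtual_batch_size controls, here is a minimal plain-Python sketch of the idea behind Ghost Batch Normalization: the batch is split into smaller "virtual" batches and each one is normalised with its own statistics. The sketch assumes a single feature column and omits the learnable affine parameters and the running statistics (and therefore momentum); ghost_batch_norm is a hypothetical name, not a library function.

```python
import math


def ghost_batch_norm(values, virtual_batch_size, eps=1e-5):
    """Normalise a 1-D batch chunk by chunk, using per-chunk mean and variance."""
    normalised = []
    for start in range(0, len(values), virtual_batch_size):
        chunk = values[start:start + virtual_batch_size]
        mean = sum(chunk) / len(chunk)
        var = sum((v - mean) ** 2 for v in chunk) / len(chunk)
        normalised.extend((v - mean) / math.sqrt(var + eps) for v in chunk)
    return normalised


# A batch of 4 values split into two virtual batches of size 2.
batch = [1.0, 3.0, 100.0, 300.0]
out = ghost_batch_norm(batch, virtual_batch_size=2)
```

Because each chunk is normalised independently, the two very differently scaled halves of the batch both end up close to -1 and +1.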
Source code in pytorch_widedeep/models/tabular/tabnet/tab_net.py
ContextAttentionMLP ¶
ContextAttentionMLP(column_idx, *, cat_embed_input=None, cat_embed_dropout=None, use_cat_bias=None, cat_embed_activation=None, shared_embed=None, add_shared_embed=None, frac_shared_embed=None, continuous_cols=None, cont_norm_layer=None, embed_continuous_method='standard', cont_embed_dropout=None, cont_embed_activation=None, quantization_setup=None, n_frequencies=None, sigma=None, share_last_layer=None, full_embed_dropout=None, input_dim=32, attn_dropout=0.2, with_addnorm=False, attn_activation='leaky_relu', n_blocks=3)
Bases: BaseTabularModelWithAttention
Defines a ContextAttentionMLP
model that can be used as the
deeptabular
component of a Wide & Deep model or independently by
itself.
This class combines embedding representations of the categorical features
with numerical (aka continuous) features that are also embedded. These
are then passed through a series of attention blocks. Each attention
block consists of a ContextAttentionEncoder
. This encoder is in
part inspired by the attention mechanism described in
Hierarchical Attention Networks for Document
Classification.
See pytorch_widedeep.models.tabular.mlp._attention_layers
for details.
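To make the idea concrete, below is a minimal plain-Python sketch of context attention in the spirit of the Hierarchical Attention Networks paper: each feature embedding is scored against a context vector, the scores are softmax-normalised across features, and each embedding is rescaled by its weight. This is not the library's ContextAttentionEncoder; the learned projection layer is omitted, and context_attention, context and the toy embeddings are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_attention(embeddings, context):
    """Return (weighted embeddings, weights) for one sample.

    embeddings: F lists of length D (one per feature/column).
    context: list of length D playing the role of the learnable context vector.
    """
    scores = [
        sum(math.tanh(e_d) * c_d for e_d, c_d in zip(emb, context))
        for emb in embeddings
    ]
    weights = softmax(scores)  # one weight per feature
    weighted = [[w * e_d for e_d in emb] for w, emb in zip(weights, embeddings)]
    return weighted, weights


# One sample, F = 3 features, D = 2 embedding dimensions.
sample = [[0.2, -0.1], [1.5, 0.3], [-0.7, 0.9]]
weighted, weights = context_attention(sample, context=[1.0, 0.5])
```

For a batch of N samples this yields one weight per feature and sample, which is consistent with the (N, F) shape reported by the attention_weights property of this class.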
Most of the parameters for this class are Optional
since the use of
categorical or continuous features is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
Parameters:
-
column_idx
(Dict[str, int]
) –Dict containing the index of the columns that will be passed through the
ContextAttentionMLP
model. Required to slice the tensors. e.g. {'education': 0, 'relationship': 1, 'workclass': 2, ...}. -
cat_embed_input
(Optional[List[Tuple[str, int]]]
, default:None
) –List of Tuples with the column name and number of unique values and embedding dimension. e.g. [(education, 11), ...]
-
cat_embed_dropout
(Optional[float]
, default:None
) –Categorical embeddings dropout. If
None
, it will default to 0. -
use_cat_bias
(Optional[bool]
, default:None
) –Boolean indicating if bias will be used for the categorical embeddings. If
None
, it will default to 'False'. -
cat_embed_activation
(Optional[str]
, default:None
) –Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported
-
shared_embed
(Optional[bool]
, default:None
) –Boolean indicating if the embeddings will be "shared". The idea behind
shared_embed
is described in Appendix A of the TabTransformer paper: 'The goal of having column embedding is to enable the model to distinguish the classes in one column from those in the other columns'. In other words, the idea is to let the model learn which column is embedded at the time. See: pytorch_widedeep.models.transformers._layers.SharedEmbeddings
. -
add_shared_embed
(Optional[bool]
, default:None
) –The two embedding sharing strategies are: 1) add the shared embeddings to the column embeddings or 2) replace the first
frac_shared_embed
with the shared embeddings. See pytorch_widedeep.models.embeddings_layers.SharedEmbeddings.
If 'None' is passed, it will default to 'False'. -
frac_shared_embed
(Optional[float]
, default:None
) –The fraction of embeddings that will be shared (if
add_shared_embed = False
) by all the different categories for one particular column. If 'None' is passed, it will default to 0.0. -
continuous_cols
(Optional[List[str]]
, default:None
) –List with the name of the numeric (aka continuous) columns
-
cont_norm_layer
(Optional[Literal[batchnorm, layernorm]]
, default:None
) –Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If
None
, no normalization layer will be used. -
embed_continuous_method
(Optional[Literal[standard, piecewise, periodic]]
, default:'standard'
) –Method to use to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details.
-
cont_embed_dropout
(Optional[float]
, default:None
) –Dropout for the continuous embeddings. If
None
, it will default to 0.0 -
cont_embed_activation
(Optional[str]
, default:None
) –Activation function for the continuous embeddings if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If
None
, no activation function will be applied. -
quantization_setup
(Optional[Dict[str, List[float]]]
, default:None
) –This parameter is used when the 'piecewise' method is used to embed the continuous cols. It is a dict where the keys are the names of the continuous columns and the values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required.
-
n_frequencies
(Optional[int]
, default:None
) –This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
sigma
(Optional[float]
, default:None
) –This is the sigma parameter in the paper mentioned when describing the previous parameters and it is used to initialise the 'frequency weights'. See their Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
share_last_layer
(Optional[bool]
, default:None
) –This parameter is not present in the aforementioned paper but it is implemented in the official repo. If
True
the linear layer that turns the frequencies into embeddings will be shared across the continuous columns. If False
a different linear layer will be used for each continuous column. If the 'periodic' method is used, this parameter is required. -
full_embed_dropout
(Optional[bool]
, default:None
) –If
True
, the full embedding corresponding to a column will be masked out (dropped). If None
, it will default to False
. -
input_dim
(int
, default:32
) –The so-called dimension of the model. It is the size of the embeddings used to encode the categorical and/or continuous columns
-
attn_dropout
(float
, default:0.2
) –Dropout for each attention block
-
with_addnorm
(bool
, default:False
) –Boolean indicating if residual connections will be used in the attention blocks
-
attn_activation
(str
, default:'leaky_relu'
) –String indicating the activation function to be applied to the dense layer in each attention encoder. 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
-
n_blocks
(int
, default:3
) –Number of attention blocks
Attributes:
-
encoder
(Module
) –Sequence of attention encoders.
Examples:
>>> import torch
>>> from pytorch_widedeep.models import ContextAttentionMLP
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i,j) for u,i,j in zip(colnames[:4], [4]*4, [8]*4)]
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = ContextAttentionMLP(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols = ['e'])
>>> out = model(X_tab)
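The quantization_setup parameter described above maps each continuous column to a list of bin boundaries. As a hedged illustration of what the 'piecewise' method does with such boundaries, the sketch below implements a clamped variant of the piecewise linear encoding from On Embeddings for Numerical Features in Tabular Deep Learning in plain Python. The function name and the boundary values are hypothetical, and the library's internal handling may differ.

```python
def piecewise_linear_encode(x, boundaries):
    """Encode scalar x against sorted bin boundaries [b_0, ..., b_T] into T values."""
    encoded = []
    for lower, upper in zip(boundaries[:-1], boundaries[1:]):
        if x >= upper:
            encoded.append(1.0)  # bin fully passed
        elif x < lower:
            encoded.append(0.0)  # bin not reached yet
        else:
            encoded.append((x - lower) / (upper - lower))  # position inside the bin
    return encoded


# Hypothetical boundaries for the continuous column 'e' used in the example above.
quantization_setup = {"e": [0.0, 0.25, 0.5, 1.0]}
encoded = piecewise_linear_encode(0.4, quantization_setup["e"])
```

Here the value 0.4 has fully passed the first bin, sits roughly 60% of the way through the second, and has not reached the third. Values outside the outer boundaries are clamped in this sketch for simplicity.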
Source code in pytorch_widedeep/models/tabular/mlp/context_attention_mlp.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep
class
attention_weights
property
¶
attention_weights
List with the attention weights per block
The shape of the attention weights is \((N, F)\), where \(N\) is the batch size and \(F\) is the number of features/columns in the dataset
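The two embedding-sharing strategies controlled by add_shared_embed and frac_shared_embed (see the parameters of this class) can be sketched in plain Python as follows. This is an illustrative sketch for a single category embedding, not the library's SharedEmbeddings layer; apply_shared_embedding and the toy values are hypothetical.

```python
def apply_shared_embedding(category_embedding, shared_embedding, add_shared_embed):
    """Combine one category embedding with its column's shared embedding."""
    if add_shared_embed:
        # Strategy 1: element-wise sum; both vectors have the full embedding size.
        return [c + s for c, s in zip(category_embedding, shared_embedding)]
    # Strategy 2: overwrite the leading entries with the shared vector.
    n_shared = len(shared_embedding)
    return list(shared_embedding) + list(category_embedding[n_shared:])


embed_dim, frac_shared_embed = 8, 0.25
category_embedding = [0.1 * i for i in range(embed_dim)]

added = apply_shared_embedding(category_embedding, [1.0] * embed_dim, add_shared_embed=True)

n_shared = int(embed_dim * frac_shared_embed)  # 2 of the 8 dimensions are shared
replaced = apply_shared_embedding(category_embedding, [1.0] * n_shared, add_shared_embed=False)
```

In both cases every category in the same column carries the same shared component, which is what lets the model tell columns apart.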
SelfAttentionMLP ¶
SelfAttentionMLP(column_idx, *, cat_embed_input=None, cat_embed_dropout=None, use_cat_bias=None, cat_embed_activation=None, shared_embed=None, add_shared_embed=None, frac_shared_embed=None, continuous_cols=None, cont_norm_layer=None, embed_continuous_method='standard', cont_embed_dropout=None, cont_embed_activation=None, quantization_setup=None, n_frequencies=None, sigma=None, share_last_layer=None, full_embed_dropout=None, input_dim=32, attn_dropout=0.2, n_heads=8, use_bias=False, with_addnorm=False, attn_activation='leaky_relu', n_blocks=3)
Bases: BaseTabularModelWithAttention
Defines a SelfAttentionMLP
model that can be used as the
deeptabular component of a Wide & Deep model or independently by
itself.
This class combines embedding representations of the categorical features
with numerical (aka continuous) features that are also embedded. These
are then passed through a series of attention blocks. Each attention
block consists of what we refer to as a simplified
SelfAttentionEncoder
. See
pytorch_widedeep.models.tabular.mlp._attention_layers
for details. The
reason for using a simplified version of self-attention is that we
observed that the 'standard' attention mechanism used in the
TabTransformer has a notable tendency to overfit.
In more detail, this model only uses Q and K (and not V). Thinking about it intuitively in terms of text, Softmax(QK^T) is the attention mechanism that tells us how much, at each position in the input sentence, each word is represented or 'expressed'. We refer to that as 'attention weights'. These attention weights are normally multiplied by a Value matrix to further strengthen the focus on the words that each word should be attending to (again, intuitively).
In this implementation we skip this last multiplication and instead we multiply the attention weights directly by the input tensor. This is a simplification that we expect is beneficial in terms of avoiding overfitting for tabular data.
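The simplification described above can be sketched in plain Python for a single sample and a single head. This is only an illustration under stated assumptions: the usual 1/sqrt(D) scaling of the scores is assumed, identity matrices stand in for the learned Q and K projections, and none of the function names come from the library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (D_out x D_in) matrix by a vector of length D_in."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def simplified_self_attention(x, w_q, w_k):
    """Single-head attention for one sample that uses Q and K but no V.

    x: F feature embeddings of length D. Returns (output, attention weights).
    """
    dim = len(x[0])
    queries = [matvec(w_q, emb) for emb in x]
    keys = [matvec(w_k, emb) for emb in x]
    weights = [
        softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys])
        for query in queries
    ]  # F x F
    # The weights multiply the input embeddings directly instead of a value matrix.
    output = [
        [sum(w * emb[d] for w, emb in zip(row, x)) for d in range(dim)]
        for row in weights
    ]
    return output, weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # F = 3 features, D = 2
identity = [[1.0, 0.0], [0.0, 1.0]]       # stand-in for learned projections
output, weights = simplified_self_attention(x, identity, identity)
```

The output keeps the shape of the input (F embeddings of size D), while the weights form an F x F matrix per head and sample.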
Most of the parameters for this class are Optional
since the use of
categorical or continuous features is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
Parameters:
-
column_idx
(Dict[str, int]
) –Dict containing the index of the columns that will be passed through the
SelfAttentionMLP
model. Required to slice the tensors. e.g. {'education': 0, 'relationship': 1, 'workclass': 2, ...}. -
cat_embed_input
(Optional[List[Tuple[str, int]]]
, default:None
) –List of Tuples with the column name and number of unique values and embedding dimension. e.g. [(education, 11), ...]
-
cat_embed_dropout
(Optional[float]
, default:None
) –Categorical embeddings dropout. If
None
, it will default to 0. -
use_cat_bias
(Optional[bool]
, default:None
) –Boolean indicating if bias will be used for the categorical embeddings. If
None
, it will default to 'False'. -
cat_embed_activation
(Optional[str]
, default:None
) –Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported
-
shared_embed
(Optional[bool]
, default:None
) –Boolean indicating if the embeddings will be "shared". The idea behind
shared_embed
is described in Appendix A of the TabTransformer paper: 'The goal of having column embedding is to enable the model to distinguish the classes in one column from those in the other columns'. In other words, the idea is to let the model learn which column is embedded at the time. See: pytorch_widedeep.models.transformers._layers.SharedEmbeddings
. -
add_shared_embed
(Optional[bool]
, default:None
) –The two embedding sharing strategies are: 1) add the shared embeddings to the column embeddings or 2) replace the first
frac_shared_embed
with the shared embeddings. See pytorch_widedeep.models.embeddings_layers.SharedEmbeddings.
If 'None' is passed, it will default to 'False'. -
frac_shared_embed
(Optional[float]
, default:None
) –The fraction of embeddings that will be shared (if
add_shared_embed = False
) by all the different categories for one particular column. If 'None' is passed, it will default to 0.0. -
continuous_cols
(Optional[List[str]]
, default:None
) –List with the name of the numeric (aka continuous) columns
-
cont_norm_layer
(Optional[Literal[batchnorm, layernorm]]
, default:None
) –Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If
None
, no normalization layer will be used. -
embed_continuous_method
(Optional[Literal[standard, piecewise, periodic]]
, default:'standard'
) –Method to use to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details.
-
cont_embed_dropout
(Optional[float]
, default:None
) –Dropout for the continuous embeddings. If
None
, it will default to 0.0 -
cont_embed_activation
(Optional[str]
, default:None
) –Activation function for the continuous embeddings if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If
None
, no activation function will be applied. -
quantization_setup
(Optional[Dict[str, List[float]]]
, default:None
) –This parameter is used when the 'piecewise' method is used to embed the continuous cols. It is a dict where the keys are the names of the continuous columns and the values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required.
-
n_frequencies
(Optional[int]
, default:None
) –This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
sigma
(Optional[float]
, default:None
) –This is the sigma parameter in the paper mentioned when describing the previous parameters and it is used to initialise the 'frequency weights'. See their Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
share_last_layer
(Optional[bool]
, default:None
) –This parameter is not present in the aforementioned paper but it is implemented in the official repo. If
True
the linear layer that turns the frequencies into embeddings will be shared across the continuous columns. If False
a different linear layer will be used for each continuous column. If the 'periodic' method is used, this parameter is required. -
full_embed_dropout
(Optional[bool]
, default:None
) –If
True
, the full embedding corresponding to a column will be masked out (dropped). If None
, it will default to False
. -
input_dim
(int
, default:32
) –The so-called dimension of the model. It is the size of the embeddings used to encode the categorical and/or continuous columns
-
attn_dropout
(float
, default:0.2
) –Dropout for each attention block
-
n_heads
(int
, default:8
) –Number of attention heads per attention block.
-
use_bias
(bool
, default:False
) –Boolean indicating whether or not to use bias in the Q, K projection layers.
-
with_addnorm
(bool
, default:False
) –Boolean indicating if residual connections will be used in the attention blocks
-
attn_activation
(str
, default:'leaky_relu'
) –String indicating the activation function to be applied to the dense layer in each attention encoder. 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported.
-
n_blocks
(int
, default:3
) –Number of attention blocks
Attributes:
-
cat_and_cont_embed
(Module
) –This is the module that processes the categorical and continuous columns
-
encoder
(Module
) –Sequence of attention encoders.
Examples:
>>> import torch
>>> from pytorch_widedeep.models import SelfAttentionMLP
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i,j) for u,i,j in zip(colnames[:4], [4]*4, [8]*4)]
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = SelfAttentionMLP(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols = ['e'])
>>> out = model(X_tab)
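For the 'periodic' embedding method, the n_frequencies and sigma parameters described above correspond to the number of frequency coefficients and the standard deviation used to initialise them in On Embeddings for Numerical Features in Tabular Deep Learning. The plain-Python sketch below shows only the sine/cosine expansion of one scalar value; the coefficients are trainable in the paper but fixed here, and periodic_features is a hypothetical name rather than a library function.

```python
import math
import random


def periodic_features(x, coefficients):
    """Map scalar x to [sin(2*pi*c*x), ..., cos(2*pi*c*x), ...] for each coefficient c."""
    angles = [2.0 * math.pi * c * x for c in coefficients]
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


n_frequencies, sigma = 4, 0.1
rng = random.Random(0)
# In the paper the coefficients are trainable and initialised from N(0, sigma).
coefficients = [rng.gauss(0.0, sigma) for _ in range(n_frequencies)]
features = periodic_features(0.37, coefficients)
```

Each continuous value becomes 2 * n_frequencies numbers, which a linear layer (shared across columns or not, depending on share_last_layer) then turns into an embedding.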
Source code in pytorch_widedeep/models/tabular/mlp/self_attention_mlp.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property necessary to build the WideDeep class
attention_weights
property
¶
attention_weights
List with the attention weights per block
The shape of the attention weights is \((N, H, F, F)\), where \(N\) is the batch size, \(H\) is the number of attention heads and \(F\) is the number of features/columns in the dataset
TabTransformer ¶
TabTransformer(column_idx, *, cat_embed_input=None, cat_embed_dropout=None, use_cat_bias=None, cat_embed_activation=None, shared_embed=None, add_shared_embed=None, frac_shared_embed=None, continuous_cols=None, cont_norm_layer=None, embed_continuous=None, embed_continuous_method=None, cont_embed_dropout=None, cont_embed_activation=None, quantization_setup=None, n_frequencies=None, sigma=None, share_last_layer=None, full_embed_dropout=None, input_dim=32, n_heads=8, use_qkv_bias=False, n_blocks=4, attn_dropout=0.2, ff_dropout=0.1, ff_factor=4, transformer_activation='gelu', use_linear_attention=False, use_flash_attention=False, mlp_hidden_dims=None, mlp_activation='relu', mlp_dropout=0.1, mlp_batchnorm=False, mlp_batchnorm_last=False, mlp_linear_first=True)
Bases: BaseTabularModelWithAttention
Defines our adaptation of the
TabTransformer model
that can be used as the deeptabular
component of a
Wide & Deep model or independently by itself.
Most of the parameters for this class are Optional
since the use of
categorical or continuous features is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
NOTE: This is an enhanced adaptation of the model described in the paper. It can be considered the flagship of our transformer family of models for tabular data and offers multiple additional features relative to the original publication (and some other models in the library).
Parameters:
-
column_idx
(Dict[str, int]
) –Dict containing the index of the columns that will be passed through the
TabTransformer
model. Required to slice the tensors. e.g. {'education': 0, 'relationship': 1, 'workclass': 2, ...}. -
cat_embed_input
(Optional[List[Tuple[str, int]]]
, default:None
) –List of Tuples with the column name and number of unique values and embedding dimension. e.g. [(education, 11), ...]
-
cat_embed_dropout
(Optional[float]
, default:None
) –Categorical embeddings dropout. If
None
, it will default to 0. -
use_cat_bias
(Optional[bool]
, default:None
) –Boolean indicating if bias will be used for the categorical embeddings. If
None
, it will default to 'False'. -
cat_embed_activation
(Optional[str]
, default:None
) –Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported
-
shared_embed
(Optional[bool]
, default:None
) –Boolean indicating if the embeddings will be "shared". The idea behind
shared_embed
is described in Appendix A of the TabTransformer paper: 'The goal of having column embedding is to enable the model to distinguish the classes in one column from those in the other columns'. In other words, the idea is to let the model learn which column is embedded at the time. See: pytorch_widedeep.models.transformers._layers.SharedEmbeddings
. -
add_shared_embed
(Optional[bool]
, default:None
) –The two embedding sharing strategies are: 1) add the shared embeddings to the column embeddings or 2) replace the first
frac_shared_embed
with the shared embeddings. See pytorch_widedeep.models.embeddings_layers.SharedEmbeddings.
If 'None' is passed, it will default to 'False'. -
frac_shared_embed
(Optional[float]
, default:None
) –The fraction of embeddings that will be shared (if
add_shared_embed = False
) by all the different categories for one particular column. If 'None' is passed, it will default to 0.0. -
continuous_cols
(Optional[List[str]]
, default:None
) –List with the name of the numeric (aka continuous) columns
-
cont_norm_layer
(Optional[Literal[batchnorm, layernorm]]
, default:None
) –Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If
None
, no normalization layer will be used. -
embed_continuous_method
(Optional[Literal[standard, piecewise, periodic]]
, default:None
) –Method to use to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details.
-
cont_embed_dropout
(Optional[float]
, default:None
) –Dropout for the continuous embeddings. If
None
, it will default to 0.0 -
cont_embed_activation
(Optional[str]
, default:None
) –Activation function for the continuous embeddings if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If
None
, no activation function will be applied. -
quantization_setup
(Optional[Dict[str, List[float]]]
, default:None
) –This parameter is used when the 'piecewise' method is used to embed the continuous cols. It is a dict where the keys are the names of the continuous columns and the values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required.
-
n_frequencies
(Optional[int]
, default:None
) –This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
sigma
(Optional[float]
, default:None
) –This is the sigma parameter in the paper mentioned when describing the previous parameters and it is used to initialise the 'frequency weights'. See their Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
share_last_layer
(Optional[bool]
, default:None
) –This parameter is not present in the aforementioned paper but it is implemented in the official repo. If
True
the linear layer that turns the frequencies into embeddings will be shared across the continuous columns. If False
a different linear layer will be used for each continuous column. If the 'periodic' method is used, this parameter is required. -
full_embed_dropout
(Optional[bool]
, default:None
) –If
True
, the full embedding corresponding to a column will be masked out (dropped). If None
, it will default to False
. -
input_dim
(int
, default:32
) –The so-called dimension of the model. It is the size of the embeddings used to encode the categorical and/or continuous columns
-
n_heads
(int
, default:8
) –Number of attention heads per Transformer block
-
use_qkv_bias
(bool
, default:False
) –Boolean indicating whether or not to use bias in the Q, K, and V projection layers.
-
n_blocks
(int
, default:4
) –Number of Transformer blocks
-
attn_dropout
(float
, default:0.2
) –Dropout that will be applied to the Multi-Head Attention layers
-
ff_dropout
(float
, default:0.1
) –Dropout that will be applied to the FeedForward network
-
ff_factor
(int
, default:4
) –Multiplicative factor applied to the first layer of the FF network in each Transformer block. This is normally set to 4.
-
transformer_activation
(str
, default:'gelu'
) –Transformer Encoder activation function. 'tanh', 'relu', 'leaky_relu', 'gelu', 'geglu' and 'reglu' are supported
-
use_linear_attention
(bool
, default:False
) –Boolean indicating if Linear Attention (from Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention) will be used. The inclusion of this mode of attention is inspired by this post, where the Uber team finds that this attention mechanism leads to the best results for their tabular data.
-
use_flash_attention
(bool
, default:False
) –Boolean indicating if Flash Attention will be used.
-
mlp_hidden_dims
(Optional[List[int]]
, default:None
) –List with the number of neurons per dense layer in the MLP. e.g: [64, 32]. If not provided no MLP on top of the final Transformer block will be used.
-
mlp_activation
(str
, default:'relu'
) –Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to 'relu'. -
mlp_dropout
(float
, default:0.1
) –Float with the dropout between the dense layers of the MLP. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to 0.0. -
mlp_batchnorm
(bool
, default:False
) –Boolean indicating whether or not batch normalization will be applied to the dense layers. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to False. -
mlp_batchnorm_last
(bool
, default:False
) –Boolean indicating whether or not batch normalization will be applied to the last of the dense layers. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to False. -
mlp_linear_first
(bool
, default:True
) –Boolean indicating the order of the operations in the dense layer. If
True: [LIN -> ACT -> BN -> DP]
. If False: [BN -> DP -> LIN -> ACT]
. If 'mlp_hidden_dims' is not None
and this parameter is None
, it will default to True
.
Attributes:
-
encoder
(Module
) –Sequence of Transformer blocks
-
mlp
(Module
) –MLP component in the model
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabTransformer
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i) for u,i in zip(colnames[:4], [4]*4)]
>>> continuous_cols = ['e']
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = TabTransformer(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols=continuous_cols)
>>> out = model(X_tab)
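The 'reglu' and 'geglu' options of transformer_activation refer to gated linear unit variants. As a hedged, plain-Python sketch of the general idea: the activation input is split into two halves and one half gates the other. Which half is passed through the activation is a convention assumed here, and the function names are hypothetical rather than taken from the library.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x):
    """Rectified linear unit."""
    return max(x, 0.0)


def gated_linear_unit(values, activation):
    """Split values into halves (a, b) and return a * activation(b) element-wise."""
    half = len(values) // 2
    a, b = values[:half], values[half:]
    return [a_i * activation(b_i) for a_i, b_i in zip(a, b)]


hidden = [1.0, -2.0, 0.5, 3.0, -1.0, 0.0]    # output of a linear layer, width 6
reglu_out = gated_linear_unit(hidden, relu)  # width 3
geglu_out = gated_linear_unit(hidden, gelu)  # width 3
```

Because the gate halves the width, implementations typically double the width of the preceding linear layer when one of these activations is selected.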
Source code in pytorch_widedeep/models/tabular/transformers/tab_transformer.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep
class
attention_weights
property
¶
attention_weights
List with the attention weights per block
The shape of the attention weights is \((N, H, F, F)\), where \(N\) is the batch size, \(H\) is the number of attention heads and \(F\) is the number of features/columns in the dataset
NOTE: if flash attention or linear attention is used, no attention weights are saved during the training process and calling this property will raise a ValueError
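One way to see why no attention weights are available with linear attention is that the F x F weight matrix is never formed. The plain-Python sketch below follows the formulation in Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention for a single head and a single sample, using the elu(x) + 1 feature map. It is an illustration only, and the function names are hypothetical rather than the library's.

```python
import math


def feature_map(vector):
    """elu(x) + 1, the positive feature map used in the linear attention paper."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vector]


def linear_attention(queries, keys, values):
    """Single-head linear attention for one sample; no F x F weight matrix is built."""
    dim_k, dim_v = len(keys[0]), len(values[0])
    phi_q = [feature_map(q) for q in queries]
    phi_k = [feature_map(k) for k in keys]
    # Summaries over all positions: a (dim_k x dim_v) matrix and a dim_k vector.
    kv = [
        [sum(k[i] * v[j] for k, v in zip(phi_k, values)) for j in range(dim_v)]
        for i in range(dim_k)
    ]
    k_sum = [sum(k[i] for k in phi_k) for i in range(dim_k)]
    outputs = []
    for q in phi_q:
        normaliser = sum(q_i * s_i for q_i, s_i in zip(q, k_sum))
        outputs.append(
            [sum(q[i] * kv[i][j] for i in range(dim_k)) / normaliser for j in range(dim_v)]
        )
    return outputs


tokens = [[0.5, -1.0], [1.0, 0.2], [-0.3, 0.8]]  # F = 3 features, D = 2
out = linear_attention(tokens, tokens, tokens)
```

Each output row is still a weighted average of the value rows, but the weights are only implicit in the two summaries, so there is nothing of shape (N, H, F, F) to store.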
SAINT ¶
SAINT(column_idx, *, cat_embed_input=None, cat_embed_dropout=None, use_cat_bias=None, cat_embed_activation=None, shared_embed=None, add_shared_embed=None, frac_shared_embed=None, continuous_cols=None, cont_norm_layer=None, embed_continuous_method='standard', cont_embed_dropout=None, cont_embed_activation=None, quantization_setup=None, n_frequencies=None, sigma=None, share_last_layer=None, full_embed_dropout=None, input_dim=32, use_qkv_bias=False, n_heads=8, n_blocks=2, attn_dropout=0.1, ff_dropout=0.2, ff_factor=4, transformer_activation='gelu', mlp_hidden_dims=None, mlp_activation=None, mlp_dropout=None, mlp_batchnorm=None, mlp_batchnorm_last=None, mlp_linear_first=None)
Bases: BaseTabularModelWithAttention
Defines a SAINT model that
can be used as the deeptabular
component of a Wide & Deep model or
independently by itself.
Most of the parameters for this class are Optional
since the use of
categorical or continuous features is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
NOTE: This is a slightly modified and enhanced version of the model described in the paper.
Parameters:
-
column_idx
(Dict[str, int]
) –Dict containing the index of the columns that will be passed through the
SAINT
model. Required to slice the tensors. e.g. {'education': 0, 'relationship': 1, 'workclass': 2, ...}. -
cat_embed_input
(Optional[List[Tuple[str, int]]]
, default:None
) –List of Tuples with the column name and number of unique values and embedding dimension. e.g. [(education, 11), ...]
-
cat_embed_dropout
(Optional[float]
, default:None
) –Categorical embeddings dropout. If
None
, it will default to 0. -
use_cat_bias
(Optional[bool]
, default:None
) –Boolean indicating if bias will be used for the categorical embeddings. If
None
, it will default to 'False'. -
cat_embed_activation
(Optional[str]
, default:None
) –Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported
-
shared_embed
(Optional[bool]
, default:None
) –Boolean indicating if the embeddings will be "shared". The idea behind
shared_embed
is described in Appendix A of the TabTransformer paper: 'The goal of having column embedding is to enable the model to distinguish the classes in one column from those in the other columns'. In other words, the idea is to let the model learn which column is embedded at the time. See: pytorch_widedeep.models.transformers._layers.SharedEmbeddings
. -
add_shared_embed
(Optional[bool]
, default:None
) –The two embedding sharing strategies are: 1) add the shared embeddings to the column embeddings or 2) to replace the first
frac_shared_embed
with the shared embeddings. Seepytorch_widedeep.models.embeddings_layers.SharedEmbeddings
If 'None' is passed, it will default to 'False'. -
frac_shared_embed
(Optional[float]
, default:None
) –The fraction of embeddings that will be shared (if
add_shared_embed = False
) by all the different categories for one particular column. If 'None' is passed, it will default to 0.0. -
continuous_cols
(Optional[List[str]]
, default:None
) –List with the name of the numeric (aka continuous) columns
-
cont_norm_layer
(Optional[Literal[batchnorm, layernorm]]
, default:None
) –Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If
None
, no normalization layer will be used. -
embed_continuous_method
(Optional[Literal[standard, piecewise, periodic]]
, default:'standard'
) –Method to use to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details.
-
cont_embed_dropout
(Optional[float]
, default:None
) –Dropout for the continuous embeddings. If
None
, it will default to 0.0 -
cont_embed_activation
(Optional[str]
, default:None
) –Activation function for the continuous embeddings if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If
None
, no activation function will be applied. -
quantization_setup
(Optional[Dict[str, List[float]]]
, default:None
) –This parameter is used when the 'piecewise' method is used to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required.
-
n_frequencies
(Optional[int]
, default:None
) –This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq. 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
sigma
(Optional[float]
, default:None
) –This is the sigma parameter in the same paper, and it is used to initialise the 'frequency weights'. See Eq. 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
share_last_layer
(Optional[bool]
, default:None
) –This parameter is not present in the aforementioned paper but it is implemented in the official repo. If
True
, the linear layer that turns the frequencies into embeddings will be shared across the continuous columns. If False
, a different linear layer will be used for each continuous column. If the 'periodic' method is used, this parameter is required. -
full_embed_dropout
(Optional[bool]
, default:None
) –If
True
, the full embedding corresponding to a column will be masked out (dropped). If None
, it will default to False
. -
input_dim
(int
, default:32
) –The so-called dimension of the model. Is the number of embeddings used to encode the categorical and/or continuous columns
-
n_heads
(int
, default:8
) –Number of attention heads per Transformer block
-
use_qkv_bias
(bool
, default:False
) –Boolean indicating whether or not to use bias in the Q, K, and V projection layers
-
n_blocks
(int
, default:2
) –Number of SAINT-Transformer blocks.
-
attn_dropout
(float
, default:0.1
) –Dropout that will be applied to the Multi-Head Attention column and row layers
-
ff_dropout
(float
, default:0.2
) –Dropout that will be applied to the FeedForward network
-
ff_factor
(int
, default:4
) –Multiplicative factor applied to the first layer of the FF network in each Transformer block. This is normally set to 4.
-
transformer_activation
(str
, default:'gelu'
) –Transformer Encoder activation function. 'tanh', 'relu', 'leaky_relu', 'gelu', 'geglu' and 'reglu' are supported
-
mlp_hidden_dims
(Optional[List[int]]
, default:None
) –List with the number of neurons per dense layer in the MLP, e.g. [64, 32]. If not provided, no MLP will be used on top of the final Transformer block.
-
mlp_activation
(Optional[str]
, default:None
) –Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to 'relu'. -
mlp_dropout
(Optional[float]
, default:None
) –float with the dropout between the dense layers of the MLP. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to 0.0. -
mlp_batchnorm
(Optional[bool]
, default:None
) –Boolean indicating whether or not batch normalization will be applied to the dense layers. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to False. -
mlp_batchnorm_last
(Optional[bool]
, default:None
) –Boolean indicating whether or not batch normalization will be applied to the last of the dense layers. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to False. -
mlp_linear_first
(Optional[bool]
, default:None
) –Boolean indicating the order of the operations in the dense layer. If
True: [LIN -> ACT -> BN -> DP]
. If False: [BN -> DP -> LIN -> ACT]
. If 'mlp_hidden_dims' is not None
and this parameter is None
, it will default to True
.
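The 'periodic' embedding controlled by n_frequencies and sigma can be pictured with a small pure-Python sketch. It follows the form of Eq. 2 in the paper cited above (sine and cosine of 2*pi*c*x for each frequency coefficient c); the helper name `periodic_features` and the numeric values are assumptions made only for illustration, and the coefficients are sampled once here whereas in a model they would be learned. This is not the library's implementation.

```python
import math
import random


def periodic_features(x, coefficients):
    """Return [sin(2*pi*c*x) for each c] followed by [cos(2*pi*c*x) for each c]."""
    angles = [2.0 * math.pi * c * x for c in coefficients]
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


# Hypothetical values chosen only for illustration.
n_frequencies = 4
sigma = 0.1

rng = random.Random(0)
# Coefficients drawn from N(0, sigma) to mimic the initialisation of the
# 'frequency weights'; a real model would then train them.
coefficients = [rng.gauss(0.0, sigma) for _ in range(n_frequencies)]

features = periodic_features(0.5, coefficients)
print(len(features))  # 2 * n_frequencies values for one continuous column
```

The resulting 2 * n_frequencies values per column are what a linear layer (shared or per column, depending on share_last_layer) would turn into an embedding.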
Attributes:
-
encoder
(Module
) –Sequence of SAINT-Transformer blocks
-
mlp
(Module
) –MLP component in the model
Examples:
>>> import torch
>>> from pytorch_widedeep.models import SAINT
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i) for u,i in zip(colnames[:4], [4]*4)]
>>> continuous_cols = ['e']
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = SAINT(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols=continuous_cols)
>>> out = model(X_tab)
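The example above uses the default 'standard' embedding for the continuous column. For the 'piecewise' method, quantization_setup maps each continuous column to a list of bin boundaries. The sketch below shows what such a dict might look like for column 'e' (whose values lie in [0, 1) in the example) and how a value falls into a bin. The boundaries and the helper `bin_index` are hypothetical and only illustrate the idea of quantization; they are not the library's internal code.

```python
from bisect import bisect_right

# Hypothetical boundaries for the continuous column 'e'.
quantization_setup = {"e": [0.0, 0.25, 0.5, 0.75, 1.0]}


def bin_index(value, boundaries):
    """Index of the interval [b_i, b_{i+1}) containing `value`, clamped to valid bins."""
    n_bins = len(boundaries) - 1
    idx = bisect_right(boundaries, value) - 1
    return min(max(idx, 0), n_bins - 1)


print(bin_index(0.6, quantization_setup["e"]))  # 2, i.e. the interval [0.5, 0.75)
```

A dict of this form would be passed as quantization_setup together with embed_continuous_method='piecewise'.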
Source code in pytorch_widedeep/models/tabular/transformers/saint.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep
class.
attention_weights
property
¶
attention_weights
List with the attention weights. Each element of the list is a tuple where the first and second elements are the column and row attention weights, respectively.
The shape of the attention weights is:
-
column attention: \((N, H, F, F)\)
-
row attention: \((1, H, N, N)\)
where \(N\) is the batch size, \(H\) is the number of heads and \(F\) is the number of features/columns in the dataset.
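The two shapes differ because column attention attends across the features of each row, while row attention attends across the rows of the batch. A minimal sketch of the expected shapes follows; `saint_attention_shapes` is a hypothetical helper that only restates the formulas above, and the values assume the 5-row, 5-column example with the default n_heads=8 and no additional tokens.

```python
def saint_attention_shapes(batch_size, n_heads, n_features):
    """Expected (column, row) attention-weight shapes for one SAINT block."""
    column_attention = (batch_size, n_heads, n_features, n_features)
    row_attention = (1, n_heads, batch_size, batch_size)
    return column_attention, row_attention


col_shape, row_shape = saint_attention_shapes(batch_size=5, n_heads=8, n_features=5)
print(col_shape)  # (5, 8, 5, 5)
print(row_shape)  # (1, 8, 5, 5)
```

Note that the row attention weights grow quadratically with the batch size rather than with the number of columns.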
FTTransformer ¶
FTTransformer(column_idx, *, cat_embed_input=None, cat_embed_dropout=None, use_cat_bias=None, cat_embed_activation=None, shared_embed=None, add_shared_embed=None, frac_shared_embed=None, continuous_cols=None, cont_norm_layer=None, embed_continuous_method='standard', cont_embed_dropout=None, cont_embed_activation=None, quantization_setup=None, n_frequencies=None, sigma=None, share_last_layer=None, full_embed_dropout=None, input_dim=64, kv_compression_factor=0.5, kv_sharing=False, use_qkv_bias=False, n_heads=8, n_blocks=4, attn_dropout=0.2, ff_dropout=0.1, ff_factor=1.33, transformer_activation='reglu', mlp_hidden_dims=None, mlp_activation=None, mlp_dropout=None, mlp_batchnorm=None, mlp_batchnorm_last=None, mlp_linear_first=None)
Bases: BaseTabularModelWithAttention
Defines a FTTransformer model that
can be used as the deeptabular
component of a Wide & Deep model or
independently by itself.
Most of the parameters for this class are Optional
since the use of
categorical or continuous features is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
Parameters:
-
column_idx
(Dict[str, int]
) –Dict containing the index of the columns that will be passed through the
FTTransformer
model. Required to slice the tensors, e.g. {'education': 0, 'relationship': 1, 'workclass': 2, ...}. -
cat_embed_input
(Optional[List[Tuple[str, int]]]
, default:None
) –List of tuples with the column name and the number of unique values of each categorical column, e.g. [('education', 11), ...]
-
cat_embed_dropout
(Optional[float]
, default:None
) –Categorical embeddings dropout. If
None
, it will default to 0. -
use_cat_bias
(Optional[bool]
, default:None
) –Boolean indicating if bias will be used for the categorical embeddings. If
None
, it will default to 'False'. -
cat_embed_activation
(Optional[str]
, default:None
) –Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported
-
shared_embed
(Optional[bool]
, default:None
) –Boolean indicating if the embeddings will be "shared". The idea behind
shared_embed
is described in Appendix A of the TabTransformer paper: 'The goal of having column embedding is to enable the model to distinguish the classes in one column from those in the other columns'. In other words, the idea is to let the model learn which column is embedded at the time. See: pytorch_widedeep.models.transformers._layers.SharedEmbeddings
. -
add_shared_embed
(Optional[bool]
, default:None
) –The two embedding sharing strategies are: 1) add the shared embeddings to the column embeddings or 2) replace the first
frac_shared_embed
fraction of the column embeddings with the shared embeddings. See pytorch_widedeep.models.embeddings_layers.SharedEmbeddings
. If 'None' is passed, it will default to 'False'. -
frac_shared_embed
(Optional[float]
, default:None
) –The fraction of embeddings that will be shared (if
add_shared_embed = False
) by all the different categories for one particular column. If 'None' is passed, it will default to 0.0. -
continuous_cols
(Optional[List[str]]
, default:None
) –List with the name of the numeric (aka continuous) columns
-
cont_norm_layer
(Optional[Literal[batchnorm, layernorm]]
, default:None
) –Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If
None
, no normalization layer will be used. -
embed_continuous_method
(Optional[Literal[standard, piecewise, periodic]]
, default:'standard'
) –Method to use to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details.
-
cont_embed_dropout
(Optional[float]
, default:None
) –Dropout for the continuous embeddings. If
None
, it will default to 0.0 -
cont_embed_activation
(Optional[str]
, default:None
) –Activation function for the continuous embeddings if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If
None
, no activation function will be applied. -
quantization_setup
(Optional[Dict[str, List[float]]]
, default:None
) –This parameter is used when the 'piecewise' method is used to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required.
-
n_frequencies
(Optional[int]
, default:None
) –This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq. 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
sigma
(Optional[float]
, default:None
) –This is the sigma parameter in the same paper, and it is used to initialise the 'frequency weights'. See Eq. 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
share_last_layer
(Optional[bool]
, default:None
) –This parameter is not present in the aforementioned paper but it is implemented in the official repo. If
True
, the linear layer that turns the frequencies into embeddings will be shared across the continuous columns. If False
, a different linear layer will be used for each continuous column. If the 'periodic' method is used, this parameter is required. -
full_embed_dropout
(Optional[bool]
, default:None
) –If
True
, the full embedding corresponding to a column will be masked out (dropped). If None
, it will default to False
. -
input_dim
(int
, default:64
) –The so-called dimension of the model. Is the number of embeddings used to encode the categorical and/or continuous columns.
-
kv_compression_factor
(float
, default:0.5
) –By default, the FTTransformer uses Linear Attention (see Linformer: Self-Attention with Linear Complexity). This is the compression factor used to reduce the input sequence length. The resulting sequence length is \(k = int(kv_{compression \space factor} \times s)\), where \(s\) is the input sequence length.
-
kv_sharing
(bool
, default:False
) –Boolean indicating if the \(E\) and \(F\) projection matrices will share weights. See Linformer: Self-Attention with Linear Complexity for details.
-
n_heads
(int
, default:8
) –Number of attention heads per FTTransformer block
-
use_qkv_bias
(bool
, default:False
) –Boolean indicating whether or not to use bias in the Q, K, and V projection layers
-
n_blocks
(int
, default:4
) –Number of FTTransformer blocks
-
attn_dropout
(float
, default:0.2
) –Dropout that will be applied to the Linear-Attention layers
-
ff_dropout
(float
, default:0.1
) –Dropout that will be applied to the FeedForward network
-
ff_factor
(float
, default:1.33
) –Multiplicative factor applied to the first layer of the FF network in each Transformer block. This is normally set to 4, but 4/3 is used in the paper.
-
transformer_activation
(str
, default:'reglu'
) –Transformer Encoder activation function. 'tanh', 'relu', 'leaky_relu', 'gelu', 'geglu' and 'reglu' are supported
-
mlp_hidden_dims
(Optional[List[int]]
, default:None
) –List with the number of neurons per dense layer in the MLP, e.g. [64, 32]. If not provided, no MLP will be used on top of the final FTTransformer block.
-
mlp_activation
(Optional[str]
, default:None
) –Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to 'relu'. -
mlp_dropout
(Optional[float]
, default:None
) –float with the dropout between the dense layers of the MLP. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to 0.0. -
mlp_batchnorm
(Optional[bool]
, default:None
) –Boolean indicating whether or not batch normalization will be applied to the dense layers. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to False. -
mlp_batchnorm_last
(Optional[bool]
, default:None
) –Boolean indicating whether or not batch normalization will be applied to the last of the dense layers. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to False. -
mlp_linear_first
(Optional[bool]
, default:None
) –Boolean indicating the order of the operations in the dense layer. If
True: [LIN -> ACT -> BN -> DP]
. If False: [BN -> DP -> LIN -> ACT]
. If 'mlp_hidden_dims' is not None
and this parameter is None
, it will default to True
.
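The defaulting rules for the mlp_* parameters can be summarised in a short sketch. `resolve_mlp_settings` is a hypothetical helper that only restates the documented behaviour (no MLP when mlp_hidden_dims is None, otherwise the listed defaults and layer order); it is not part of the library.

```python
def resolve_mlp_settings(
    mlp_hidden_dims=None,
    mlp_activation=None,
    mlp_dropout=None,
    mlp_batchnorm=None,
    mlp_batchnorm_last=None,
    mlp_linear_first=None,
):
    """Hypothetical helper mirroring the documented defaults of the MLP head."""
    if mlp_hidden_dims is None:
        return None  # no MLP on top of the final block

    linear_first = True if mlp_linear_first is None else mlp_linear_first
    return {
        "hidden_dims": list(mlp_hidden_dims),
        "activation": "relu" if mlp_activation is None else mlp_activation,
        "dropout": 0.0 if mlp_dropout is None else mlp_dropout,
        "batchnorm": False if mlp_batchnorm is None else mlp_batchnorm,
        "batchnorm_last": False if mlp_batchnorm_last is None else mlp_batchnorm_last,
        # Order of operations inside each dense layer.
        "layer_order": ["LIN", "ACT", "BN", "DP"] if linear_first else ["BN", "DP", "LIN", "ACT"],
    }


settings = resolve_mlp_settings(mlp_hidden_dims=[64, 32])
print(settings["activation"], settings["layer_order"])
```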
Attributes:
-
encoder
(Module
) –Sequence of FTTransformer blocks
-
mlp
(Module
) –MLP component in the model
Examples:
>>> import torch
>>> from pytorch_widedeep.models import FTTransformer
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i) for u,i in zip(colnames[:4], [4]*4)]
>>> continuous_cols = ['e']
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = FTTransformer(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols=continuous_cols)
>>> out = model(X_tab)
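The two sharing strategies described for add_shared_embed and frac_shared_embed can be pictured with plain Python lists standing in for embedding vectors. `apply_shared_embedding` is a hypothetical helper written for illustration; it is not the library's SharedEmbeddings layer, and the truncation with int() is an assumption.

```python
def apply_shared_embedding(column_embedding, shared_embedding,
                           add_shared_embed=False, frac_shared_embed=0.25):
    """Combine one value's embedding with the embedding shared by its column."""
    if add_shared_embed:
        # Strategy 1: the shared vector has the full size and is added.
        return [c + s for c, s in zip(column_embedding, shared_embedding)]
    # Strategy 2: the first fraction of the entries is replaced.
    n_shared = int(len(column_embedding) * frac_shared_embed)
    return list(shared_embedding[:n_shared]) + list(column_embedding[n_shared:])


value_embedding = [1.0, 2.0, 3.0, 4.0]
column_shared = [10.0, 20.0, 30.0, 40.0]

print(apply_shared_embedding(value_embedding, column_shared, add_shared_embed=True))
# [11.0, 22.0, 33.0, 44.0]
print(apply_shared_embedding(value_embedding, column_shared, frac_shared_embed=0.5))
# [10.0, 20.0, 3.0, 4.0]
```

In both cases every value of a given column carries the same shared component, which is what lets the model tell columns apart.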
Source code in pytorch_widedeep/models/tabular/transformers/ft_transformer.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep
class.
attention_weights
property
¶
attention_weights
List with the attention weights per block.
The shape of the attention weights is: \((N, H, F, k)\), where \(N\) is the batch size, \(H\) is the number of attention heads, \(F\) is the number of features/columns and \(k\) is the reduced sequence length or dimension, i.e. \(k = int(kv_{compression \space factor} \times s)\).
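As a worked instance of the formula for \(k\), the sketch below computes the reduced length and the resulting attention-weight shape. It assumes that the sequence length \(s\) equals the number of columns \(F\) (5 in the example above, with no additional tokens); the helper names are hypothetical.

```python
def compressed_length(seq_len, kv_compression_factor=0.5):
    """k = int(kv_compression_factor * s), as in the formula above."""
    return int(kv_compression_factor * seq_len)


def ft_attention_shape(batch_size, n_heads, n_features, kv_compression_factor=0.5):
    """Expected (N, H, F, k) shape, assuming s == F."""
    k = compressed_length(n_features, kv_compression_factor)
    return (batch_size, n_heads, n_features, k)


print(compressed_length(5))         # 2, since int(2.5) truncates
print(ft_attention_shape(5, 8, 5))  # (5, 8, 5, 2)
```

Because int() truncates, small tables combined with a small compression factor can leave very few compressed positions.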
TabPerceiver ¶
TabPerceiver(column_idx, *, cat_embed_input=None, cat_embed_dropout=None, use_cat_bias=None, cat_embed_activation=None, shared_embed=None, add_shared_embed=None, frac_shared_embed=None, continuous_cols=None, cont_norm_layer=None, embed_continuous_method='standard', cont_embed_dropout=None, cont_embed_activation=None, quantization_setup=None, n_frequencies=None, sigma=None, share_last_layer=None, full_embed_dropout=None, input_dim=32, n_cross_attns=1, n_cross_attn_heads=4, n_latents=16, latent_dim=128, n_latent_heads=4, n_latent_blocks=4, n_perceiver_blocks=4, share_weights=False, attn_dropout=0.1, ff_dropout=0.1, ff_factor=4, transformer_activation='geglu', mlp_hidden_dims=None, mlp_activation=None, mlp_dropout=None, mlp_batchnorm=None, mlp_batchnorm_last=None, mlp_linear_first=None)
Bases: BaseTabularModelWithAttention
Defines an adaptation of a Perceiver
that can be used as the deeptabular
component of a Wide & Deep model
or independently by itself.
Most of the parameters for this class are Optional
since the use of
categorical or continuous features is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
NOTE: while there are scientific publications for
the TabTransformer
, SAINT
and FTTransformer
, the TabPerceiver
and the TabFastFormer
are our own adaptations of the
Perceiver and the
FastFormer for tabular data.
Parameters:
-
column_idx
(Dict[str, int]
) –Dict containing the index of the columns that will be passed through the
TabPerceiver
model. Required to slice the tensors, e.g. {'education': 0, 'relationship': 1, 'workclass': 2, ...}. -
cat_embed_input
(Optional[List[Tuple[str, int]]]
, default:None
) –List of tuples with the column name and the number of unique values of each categorical column, e.g. [('education', 11), ...]
-
cat_embed_dropout
(Optional[float]
, default:None
) –Categorical embeddings dropout. If
None
, it will default to 0. -
use_cat_bias
(Optional[bool]
, default:None
) –Boolean indicating if bias will be used for the categorical embeddings. If
None
, it will default to 'False'. -
cat_embed_activation
(Optional[str]
, default:None
) –Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported
-
shared_embed
(Optional[bool]
, default:None
) –Boolean indicating if the embeddings will be "shared". The idea behind
shared_embed
is described in Appendix A of the TabTransformer paper: 'The goal of having column embedding is to enable the model to distinguish the classes in one column from those in the other columns'. In other words, the idea is to let the model learn which column is embedded at the time. See: pytorch_widedeep.models.transformers._layers.SharedEmbeddings
. -
add_shared_embed
(Optional[bool]
, default:None
) –The two embedding sharing strategies are: 1) add the shared embeddings to the column embeddings or 2) replace the first
frac_shared_embed
fraction of the column embeddings with the shared embeddings. See pytorch_widedeep.models.embeddings_layers.SharedEmbeddings
. If 'None' is passed, it will default to 'False'. -
frac_shared_embed
(Optional[float]
, default:None
) –The fraction of embeddings that will be shared (if
add_shared_embed = False
) by all the different categories for one particular column. If 'None' is passed, it will default to 0.0. -
continuous_cols
(Optional[List[str]]
, default:None
) –List with the name of the numeric (aka continuous) columns
-
cont_norm_layer
(Optional[Literal[batchnorm, layernorm]]
, default:None
) –Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If
None
, no normalization layer will be used. -
embed_continuous_method
(Optional[Literal[standard, piecewise, periodic]]
, default:'standard'
) –Method to use to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details.
-
cont_embed_dropout
(Optional[float]
, default:None
) –Dropout for the continuous embeddings. If
None
, it will default to 0.0 -
cont_embed_activation
(Optional[str]
, default:None
) –Activation function for the continuous embeddings if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If
None
, no activation function will be applied. -
quantization_setup
(Optional[Dict[str, List[float]]]
, default:None
) –This parameter is used when the 'piecewise' method is used to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required.
-
n_frequencies
(Optional[int]
, default:None
) –This is the so-called 'k' in the paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See Eq. 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
sigma
(Optional[float]
, default:None
) –This is the sigma parameter in the same paper, and it is used to initialise the 'frequency weights'. See Eq. 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
share_last_layer
(Optional[bool]
, default:None
) –This parameter is not present in the aforementioned paper but it is implemented in the official repo. If
True
, the linear layer that turns the frequencies into embeddings will be shared across the continuous columns. If False
, a different linear layer will be used for each continuous column. If the 'periodic' method is used, this parameter is required. -
full_embed_dropout
(Optional[bool]
, default:None
) –If
True
, the full embedding corresponding to a column will be masked out (dropped). If None
, it will default to False
. -
input_dim
(int
, default:32
) –The so-called dimension of the model. Is the number of embeddings used to encode the categorical and/or continuous columns.
-
n_cross_attns
(int
, default:1
) –Number of times each perceiver block will cross attend to the input data (i.e. number of cross attention components per perceiver block). This should normally be 1. However, the paper describes some architectures (normally for computer vision-related problems) where the Perceiver attends multiple times to the input array. Therefore, multiple cross attention to the input array might also be useful in some cases for tabular data.
-
n_cross_attn_heads
(int
, default:4
) –Number of attention heads for the cross attention component
-
n_latents
(int
, default:16
) –Number of latents. This is the \(N\) parameter in the paper. As indicated in the paper, this number should be significantly lower than \(M\) (the number of columns in the dataset). Setting \(N\) closer to \(M\) defeats the main purpose of the Perceiver, which is to overcome the transformer quadratic bottleneck.
-
latent_dim
(int
, default:128
) –Latent dimension.
-
n_latent_heads
(int
, default:4
) –Number of attention heads per Latent Transformer
-
n_latent_blocks
(int
, default:4
) –Number of transformer encoder blocks (normalised MHA + normalised FF) per Latent Transformer
-
n_perceiver_blocks
(int
, default:4
) –Number of Perceiver blocks defined as [Cross Attention + Latent Transformer]
-
share_weights
(bool
, default:False
) –Boolean indicating if the weights will be shared between Perceiver blocks
-
attn_dropout
(float
, default:0.1
) –Dropout that will be applied to the Multi-Head Attention layers
-
ff_dropout
(float
, default:0.1
) –Dropout that will be applied to the FeedForward network
-
ff_factor
(int
, default:4
) –Multiplicative factor applied to the first layer of the FF network in each Transformer block. This is normally set to 4.
-
transformer_activation
(str
, default:'geglu'
) –Transformer Encoder activation function. 'tanh', 'relu', 'leaky_relu', 'gelu', 'geglu' and 'reglu' are supported
-
mlp_hidden_dims
(Optional[List[int]]
, default:None
) –List with the number of neurons per dense layer in the MLP, e.g. [64, 32]. If not provided, no MLP will be used on top of the final Transformer block.
-
mlp_activation
(Optional[str]
, default:None
) –Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to 'relu'. -
mlp_dropout
(Optional[float]
, default:None
) –float with the dropout between the dense layers of the MLP. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to 0.0. -
mlp_batchnorm
(Optional[bool]
, default:None
) –Boolean indicating whether or not batch normalization will be applied to the dense layers. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to False. -
mlp_batchnorm_last
(Optional[bool]
, default:None
) –Boolean indicating whether or not batch normalization will be applied to the last of the dense layers. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to False. -
mlp_linear_first
(Optional[bool]
, default:None
) –Boolean indicating the order of the operations in the dense layer. If
True: [LIN -> ACT -> BN -> DP]
. If False: [BN -> DP -> LIN -> ACT]
. If 'mlp_hidden_dims' is not None
and this parameter is None
, it will default to True
.
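The advice on n_latents can be made concrete by counting attention scores. The sketch below compares, per head and per sample, the number of scores in full self-attention over \(M\) columns with the cross attention and latent self-attention of a Perceiver block with \(N\) latents. The helper name and the value of 200 columns are assumptions chosen only for illustration.

```python
def attention_score_counts(n_columns, n_latents):
    """Number of attention scores per head and per sample for each attention type."""
    return {
        "full_self_attention": n_columns * n_columns,   # M x M
        "cross_attention": n_latents * n_columns,       # N x M
        "latent_self_attention": n_latents * n_latents, # N x N
    }


counts = attention_score_counts(n_columns=200, n_latents=16)
print(counts["full_self_attention"])    # 40000
print(counts["cross_attention"])        # 3200
print(counts["latent_self_attention"])  # 256
```

When n_latents approaches the number of columns, the cross attention term approaches the quadratic cost it was meant to avoid.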
Attributes:
-
encoder
(ModuleDict
) –ModuleDict with the Perceiver blocks
-
latents
(Parameter
) –Latents that will be used for prediction
-
mlp
(Module
) –MLP component in the model
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabPerceiver
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i) for u,i in zip(colnames[:4], [4]*4)]
>>> continuous_cols = ['e']
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = TabPerceiver(column_idx=column_idx, cat_embed_input=cat_embed_input,
... continuous_cols=continuous_cols, n_latents=2, latent_dim=16,
... n_perceiver_blocks=2)
>>> out = model(X_tab)
Source code in pytorch_widedeep/models/tabular/transformers/tab_perceiver.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep
class.
attention_weights
property
¶
attention_weights
List with the attention weights. If the weights are not shared between perceiver blocks, each element of the list will be a list itself containing the Cross Attention and Latent Transformer attention weights, respectively.
The shape of the attention weights is:
-
Cross Attention: \((N, C, L, F)\)
-
Latent Attention: \((N, T, L, L)\)
where \(N\) is the batch size, \(C\) is the number of Cross Attention heads, \(L\) is the number of Latents, \(F\) is the number of features/columns in the dataset and \(T\) is the number of Latent Attention heads.
TabFastFormer ¶
TabFastFormer(column_idx, *, cat_embed_input=None, cat_embed_dropout=None, use_cat_bias=None, cat_embed_activation=None, shared_embed=None, add_shared_embed=None, frac_shared_embed=None, continuous_cols=None, cont_norm_layer=None, embed_continuous_method='standard', cont_embed_dropout=None, cont_embed_activation=None, quantization_setup=None, n_frequencies=None, sigma=None, share_last_layer=None, full_embed_dropout=None, input_dim=32, n_heads=8, use_bias=False, n_blocks=4, attn_dropout=0.1, ff_dropout=0.2, ff_factor=4, share_qv_weights=False, share_weights=False, transformer_activation='relu', mlp_hidden_dims=None, mlp_activation=None, mlp_dropout=None, mlp_batchnorm=None, mlp_batchnorm_last=None, mlp_linear_first=None)
Bases: BaseTabularModelWithAttention
Defines an adaptation of a FastFormer
that can be used as the deeptabular
component of a Wide & Deep model
or independently by itself.
Most of the parameters for this class are Optional
since the use of
categorical or continuous features is in fact optional (i.e. one can use
categorical features only, continuous features only or both).
NOTE: while there are scientific publications for
the TabTransformer
, SAINT
and FTTransformer
, the TabPerceiver
and the TabFastFormer
are our own adaptations of the
Perceiver and the
FastFormer for tabular data.
Parameters:
-
column_idx
(Dict[str, int]
) –Dict containing the index of the columns that will be passed through the
TabFastFormer
model. Required to slice the tensors, e.g. {'education': 0, 'relationship': 1, 'workclass': 2, ...}. -
cat_embed_input
(Optional[List[Tuple[str, int]]]
, default:None
) –List of tuples with the column name and the number of unique values of each categorical column, e.g. [('education', 11), ...]
-
cat_embed_dropout
(Optional[float]
, default:None
) –Categorical embeddings dropout. If
None
, it will default to 0. -
use_cat_bias
(Optional[bool]
, default:None
) –Boolean indicating if bias will be used for the categorical embeddings. If
None
, it will default to 'False'. -
cat_embed_activation
(Optional[str]
, default:None
) –Activation function for the categorical embeddings, if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported
-
shared_embed
(Optional[bool]
, default:None
) –Boolean indicating if the embeddings will be "shared". The idea behind
shared_embed
is described in Appendix A of the TabTransformer paper: 'The goal of having column embedding is to enable the model to distinguish the classes in one column from those in the other columns'. In other words, the idea is to let the model learn which column is being embedded at any given time. See: pytorch_widedeep.models.transformers._layers.SharedEmbeddings
. -
add_shared_embed
(Optional[bool]
, default:None
) –The two embedding sharing strategies are: 1) add the shared embeddings to the column embeddings or 2) to replace the first
frac_shared_embed
with the shared embeddings. See pytorch_widedeep.models.embeddings_layers.SharedEmbeddings
. If 'None' is passed, it will default to 'False'. -
frac_shared_embed
(Optional[float]
, default:None
) –The fraction of embeddings that will be shared (if
add_shared_embed = False
) by all the different categories for one particular column. If 'None' is passed, it will default to 0.0. -
continuous_cols
(Optional[List[str]]
, default:None
) –List with the name of the numeric (aka continuous) columns
-
cont_norm_layer
(Optional[Literal[batchnorm, layernorm]]
, default:None
) –Type of normalization layer applied to the continuous features. Options are: 'layernorm' and 'batchnorm'. If
None
, no normalization layer will be used. -
embed_continuous_method
(Optional[Literal[standard, piecewise, periodic]]
, default:'standard'
) –Method to use to embed the continuous features. Options are: 'standard', 'periodic' or 'piecewise'. The 'standard' embedding method is based on the FT-Transformer implementation presented in the paper: Revisiting Deep Learning Models for Tabular Data. The 'periodic' and 'piecewise' methods were presented in the paper: On Embeddings for Numerical Features in Tabular Deep Learning. Please read the papers for details.
-
cont_embed_dropout
(Optional[float]
, default:None
) –Dropout for the continuous embeddings. If
None
, it will default to 0.0 -
cont_embed_activation
(Optional[str]
, default:None
) –Activation function for the continuous embeddings if any. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If
None
, no activation function will be applied. -
quantization_setup
(Optional[Dict[str, List[float]]]
, default:None
) –This parameter is used when the 'piecewise' method is used to embed the continuous cols. It is a dict where keys are the names of the continuous columns and values are lists with the boundaries for the quantization of the continuous_cols. See the examples for details. If the 'piecewise' method is used, this parameter is required.
-
n_frequencies
(Optional[int]
, default:None
) –This is the so called 'k' in their paper On Embeddings for Numerical Features in Tabular Deep Learning, and is the number of 'frequencies' that will be used to represent each continuous column. See their Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
sigma
(Optional[float]
, default:None
) –This is the sigma parameter in the paper mentioned when describing the previous parameters and it is used to initialise the 'frequency weights'. See their Eq 2 in the paper for details. If the 'periodic' method is used, this parameter is required.
-
share_last_layer
(Optional[bool]
, default:None
) –This parameter is not present in the aforementioned paper but it is implemented in the official repo. If
True
the linear layer that turns the frequencies into embeddings will be shared across the continuous columns. If False
a different linear layer will be used for each continuous column. If the 'periodic' method is used, this parameter is required. -
full_embed_dropout
(Optional[bool]
, default:None
) –If
True
, the full embedding corresponding to a column will be masked out (dropped). If None
, it will default to False
. -
input_dim
(int
, default:32
) –The so-called dimension of the model. This is the dimension of the embeddings used to encode the categorical and/or continuous columns
-
n_heads
(int
, default:8
) –Number of attention heads per FastFormer block
-
use_bias
(bool
, default:False
) –Boolean indicating whether or not to use bias in the Q, K, and V projection layers
-
n_blocks
(int
, default:4
) –Number of FastFormer blocks
-
attn_dropout
(float
, default:0.1
) –Dropout that will be applied to the Additive Attention layers
-
ff_dropout
(float
, default:0.2
) –Dropout that will be applied to the FeedForward network
-
ff_factor
(int
, default:4
) –Multiplicative factor applied to the first layer of the FF network in each Transformer block. This is normally set to 4.
-
share_qv_weights
(bool
, default:False
) –Following the paper, this is a boolean indicating if the Value (\(V\)) and the Query (\(Q\)) transformation parameters will be shared.
-
share_weights
(bool
, default:False
) –In addition to sharing the \(V\) and \(Q\) transformation parameters, the parameters across different Fastformer layers can also be shared. Please, see
pytorch_widedeep/models/tabular/transformers/tab_fastformer.py
for details -
transformer_activation
(str
, default:'relu'
) –Transformer Encoder activation function. 'tanh', 'relu', 'leaky_relu', 'gelu', 'geglu' and 'reglu' are supported
-
mlp_hidden_dims
(Optional[List[int]]
, default:None
) –List with the number of neurons per dense layer in the MLP. e.g: [64, 32]. If not provided no MLP on top of the final Transformer block will be used.
-
mlp_activation
(Optional[str]
, default:None
) –Activation function for the dense layers of the MLP. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to 'relu'. -
mlp_dropout
(Optional[float]
, default:None
) –Float with the dropout between the dense layers of the MLP. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to 0.0. -
mlp_batchnorm
(Optional[bool]
, default:None
) –Boolean indicating whether or not batch normalization will be applied to the dense layers. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to False. -
mlp_batchnorm_last
(Optional[bool]
, default:None
) –Boolean indicating whether or not batch normalization will be applied to the last of the dense layers. If 'mlp_hidden_dims' is not
None
and this parameter is None
, it will default to False. -
mlp_linear_first
(Optional[bool]
, default:None
) –Boolean indicating the order of the operations in the dense layer. If
True: [LIN -> ACT -> BN -> DP]
. If False: [BN -> DP -> LIN -> ACT]
. If 'mlp_hidden_dims' is not None
and this parameter is None
, it will default to True
.
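To make the 'piecewise' and 'periodic' options of embed_continuous_method more concrete, here is a minimal pure-Python sketch of the two encodings, loosely following On Embeddings for Numerical Features in Tabular Deep Learning. It is not the library's implementation: the column name, the boundary values and the helpers piecewise_linear_encode and periodic_encode are hypothetical, the piecewise version simply clamps values that fall outside the boundaries, and the linear layer that would turn these encodings into embeddings of size input_dim is omitted.

```python
import math
import random
from typing import Dict, List

# Hypothetical boundaries (illustrative values only), in the format expected
# by `quantization_setup`: one list of bin edges per continuous column.
quantization_setup: Dict[str, List[float]] = {
    "age": [18.0, 30.0, 45.0, 65.0, 90.0],
}


def piecewise_linear_encode(x: float, boundaries: List[float]) -> List[float]:
    """One value per bin: 1.0 if the bin lies fully below x, 0.0 if it lies
    fully above x, and the fraction of the bin covered by x otherwise."""
    encoded = []
    for lower, upper in zip(boundaries[:-1], boundaries[1:]):
        if x >= upper:
            encoded.append(1.0)
        elif x <= lower:
            encoded.append(0.0)
        else:
            encoded.append((x - lower) / (upper - lower))
    return encoded


def periodic_encode(x: float, frequencies: List[float]) -> List[float]:
    """Concatenate sin(2*pi*c*x) and cos(2*pi*c*x) for every frequency c."""
    angles = [2.0 * math.pi * c * x for c in frequencies]
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


# 'piecewise': 37.5 fills the first bin and half of the second one.
ple = piecewise_linear_encode(37.5, quantization_setup["age"])

# 'periodic': n_frequencies values drawn from a normal distribution with
# standard deviation sigma; in a real model these would be trainable
# parameters that are merely initialised this way.
n_frequencies, sigma = 4, 0.1
rng = random.Random(0)
frequencies = [rng.gauss(0.0, sigma) for _ in range(n_frequencies)]
periodic = periodic_encode(0.5, frequencies)

print(ple)            # [1.0, 0.5, 0.0, 0.0]
print(len(periodic))  # 8, i.e. 2 * n_frequencies
```

Note that a list of T + 1 boundaries yields T values per column, while the periodic encoding yields 2 * n_frequencies values per column.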
Attributes:
-
encoder
(Module
) –Sequence of FastFormer blocks.
-
mlp
(Module
) –MLP component in the model
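As a rough illustration of what column_idx and cat_embed_input contain, the sketch below derives both from a tiny hypothetical dataset using only the standard library. The rows and column names are made up, and in practice these objects would normally come from a preprocessing step, which typically also label-encodes the categories and may reserve extra indices (for example for unseen values).

```python
from typing import Dict, List, Tuple

# Hypothetical toy dataset: each row maps a column name to its value.
rows = [
    {"education": "BSc", "workclass": "private", "age": 39.0},
    {"education": "MSc", "workclass": "public", "age": 51.0},
    {"education": "BSc", "workclass": "self-employed", "age": 28.0},
]
cat_cols: List[str] = ["education", "workclass"]
continuous_cols: List[str] = ["age"]

# Position of every column in the input tensor (categorical columns first
# here, which is just the ordering chosen for this sketch).
column_idx: Dict[str, int] = {
    name: i for i, name in enumerate(cat_cols + continuous_cols)
}

# (column name, number of unique values) for every categorical column.
cat_embed_input: List[Tuple[str, int]] = [
    (col, len({row[col] for row in rows})) for col in cat_cols
]

print(column_idx)       # {'education': 0, 'workclass': 1, 'age': 2}
print(cat_embed_input)  # [('education', 2), ('workclass', 3)]
```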
Examples:
>>> import torch
>>> from pytorch_widedeep.models import TabFastFormer
>>> X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)
>>> colnames = ['a', 'b', 'c', 'd', 'e']
>>> cat_embed_input = [(u,i) for u,i in zip(colnames[:4], [4]*4)]
>>> continuous_cols = ['e']
>>> column_idx = {k:v for v,k in enumerate(colnames)}
>>> model = TabFastFormer(column_idx=column_idx, cat_embed_input=cat_embed_input, continuous_cols=continuous_cols)
>>> out = model(X_tab)
Source code in pytorch_widedeep/models/tabular/transformers/tab_fastformer.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep
class
attention_weights
property
¶
attention_weights
List with the attention weights. Each element of the list is a tuple where the first and second elements are the \(\alpha\) and \(\beta\) attention weights in the paper.
The shape of the attention weights is \((N, H, F)\) where \(N\) is the batch size, \(H\) is the number of attention heads and \(F\) is the number of features/columns in the dataset
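The reason there is one weight per feature can be seen in a small sketch of additive attention in the spirit of the Fastformer paper: every feature receives a scalar score, and the scores are normalised with a softmax over the \(F\) features. The code below does this for a single sample and a single head with plain Python lists. The helper name and all the numbers are hypothetical, and this is an illustration rather than the library's code.

```python
import math
from typing import List


def additive_attention_weights(vectors: List[List[float]], w: List[float]) -> List[float]:
    """Softmax over the scaled dot products between each vector and w."""
    d = len(w)
    scores = [sum(v_j * w_j for v_j, w_j in zip(v, w)) / math.sqrt(d) for v in vectors]
    max_score = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# F = 3 features, each represented by a query vector of dimension d = 2.
queries = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.9]]
w_q = [1.0, -1.0]  # stands in for a learnable projection vector

alpha = additive_attention_weights(queries, w_q)

# The "global query" is the alpha-weighted sum of the individual queries.
global_query = [sum(a * q[j] for a, q in zip(alpha, queries)) for j in range(len(w_q))]

print(len(alpha))            # 3, one weight per feature
print(round(sum(alpha), 6))  # 1.0
```

Stacking such weight vectors over \(H\) heads and \(N\) samples gives the \((N, H, F)\) shape described above.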
NOTE: when we started developing the library we
thought that combining Deep Learning architectures for tabular data, with
CNN-based architectures (pretrained or not) for images and Transformer-based
architectures for text would be an 'overkill' (also, pretrained
transformer-based models were not as readily available as they are today).
Therefore, at that time we made the decision of including in the library
simple RNN-based architectures for the text dataset. A lot has changed since
then and it is our intention to integrate this library with
Hugging Face's Transformers library
in the near future. Nonetheless, note that it is still possible to use any
custom model as the deeptext
component using this library. Please, see the
example section in this documentation for details
BasicRNN ¶
BasicRNN(vocab_size, embed_dim=None, embed_matrix=None, embed_trainable=True, rnn_type='lstm', hidden_dim=64, n_layers=3, rnn_dropout=0.1, bidirectional=False, use_hidden_state=True, padding_idx=1, head_hidden_dims=None, head_activation='relu', head_dropout=None, head_batchnorm=False, head_batchnorm_last=False, head_linear_first=False)
Bases: BaseWDModelComponent
Standard text classifier/regressor composed of a stack of RNNs
(LSTMs or GRUs) that can be used as the deeptext
component of a Wide &
Deep model or independently by itself.
In addition, there is the option to add a Fully Connected (FC) set of dense layers on top of the stack of RNNs
Parameters:
-
vocab_size
(int
) –Number of words in the vocabulary
-
embed_dim
(Optional[int]
, default:None
) –Dimension of the word embeddings if non-pretrained word vectors are used
-
embed_matrix
(Optional[ndarray]
, default:None
) –Pretrained word embeddings
-
embed_trainable
(bool
, default:True
) –Boolean indicating if the pretrained embeddings are trainable
-
rnn_type
(str
, default:'lstm'
) –String indicating the type of RNN to use. One of 'lstm' or 'gru'
-
hidden_dim
(int
, default:64
) –Hidden dim of the RNN
-
n_layers
(int
, default:3
) –Number of recurrent layers
-
rnn_dropout
(float
, default:0.1
) –Dropout for each RNN layer except the last layer
-
bidirectional
(bool
, default:False
) –Boolean indicating whether the stacked RNNs are bidirectional
-
use_hidden_state
(bool
, default:True
) –Boolean indicating whether to use the final hidden state or the RNN's output as predicting features. Typically the former is used.
-
padding_idx
(int
, default:1
) –index of the padding token in the padded-tokenised sequences. The
TextPreprocessor
class within this library uses fastai's tokenizer where the token index 0 is reserved for the 'unknown' word token. Therefore, the default value is set to 1. -
head_hidden_dims
(Optional[List[int]]
, default:None
) –List with the sizes of the dense layers in the head e.g: [128, 64]
-
head_activation
(str
, default:'relu'
) –Activation function for the dense layers in the head. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported
-
head_dropout
(Optional[float]
, default:None
) –Dropout of the dense layers in the head
-
head_batchnorm
(bool
, default:False
) –Boolean indicating whether or not to include batch normalization in the dense layers that form the 'rnn_mlp'
-
head_batchnorm_last
(bool
, default:False
) –Boolean indicating whether or not to apply batch normalization to the last of the dense layers in the head
-
head_linear_first
(bool
, default:False
) –Boolean indicating the order of the operations in the dense layer. If
True: [LIN -> ACT -> BN -> DP]
. If False: [BN -> DP -> LIN -> ACT]
Attributes:
-
word_embed
(Module
) –word embedding matrix
-
rnn
(Module
) –Stack of RNNs
-
rnn_mlp
(Module
) –Stack of dense layers on top of the RNN. This will only exist if
head_hidden_dims
is not None
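The padding_idx description above implies that two indices are reserved: 0 for the 'unknown' token and 1 for padding. The following sketch shows what padded, tokenised sequences could look like under that convention. The vocabulary and the helper encode_and_pad are hypothetical, and padding on the left is an assumption made for this example, not a statement about what TextPreprocessor does.

```python
from typing import Dict, List

UNK_IDX, PAD_IDX = 0, 1  # reserved indices, as described for `padding_idx`

# Hypothetical vocabulary: real tokens start after the reserved indices.
vocab: Dict[str, int] = {"the": 2, "movie": 3, "was": 4, "great": 5}


def encode_and_pad(tokens: List[str], max_len: int) -> List[int]:
    """Map tokens to indices, truncate to max_len and left-pad with PAD_IDX."""
    ids = [vocab.get(tok, UNK_IDX) for tok in tokens][:max_len]
    return [PAD_IDX] * (max_len - len(ids)) + ids


padded = encode_and_pad(["the", "movie", "was", "superb"], max_len=6)
print(padded)  # [1, 1, 2, 3, 4, 0]

# vocab_size has to count the reserved indices as well as the real tokens.
vocab_size = len(vocab) + 2
print(vocab_size)  # 6
```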
Examples:
>>> import torch
>>> from pytorch_widedeep.models import BasicRNN
>>> X_text = torch.cat((torch.zeros([5,1]), torch.empty(5, 4).random_(1,4)), axis=1)
>>> model = BasicRNN(vocab_size=4, hidden_dim=4, n_layers=2, padding_idx=0, embed_dim=4)
>>> out = model(X_text)
Source code in pytorch_widedeep/models/text/basic_rnn.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep
class
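As a hedged sketch of the bookkeeping behind output_dim, one would expect the value to depend on hidden_dim, bidirectional and head_hidden_dims roughly as follows. The helper expected_output_dim is hypothetical and encodes assumptions (a bidirectional RNN doubles the feature size, and an optional head replaces it with the size of its last layer); the source file referenced above remains the authority.

```python
from typing import List, Optional


def expected_output_dim(
    hidden_dim: int,
    bidirectional: bool = False,
    head_hidden_dims: Optional[List[int]] = None,
) -> int:
    """Assumed output size of an RNN text component (illustrative only)."""
    # A bidirectional RNN concatenates the forward and backward states.
    rnn_output_dim = hidden_dim * 2 if bidirectional else hidden_dim
    # If a head is stacked on top, its last layer determines the output size.
    if head_hidden_dims is not None:
        return head_hidden_dims[-1]
    return rnn_output_dim


print(expected_output_dim(64))                              # 64
print(expected_output_dim(64, bidirectional=True))          # 128
print(expected_output_dim(64, head_hidden_dims=[128, 32]))  # 32
```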
AttentiveRNN ¶
AttentiveRNN(vocab_size, embed_dim=None, embed_matrix=None, embed_trainable=True, rnn_type='lstm', hidden_dim=64, n_layers=3, rnn_dropout=0.1, bidirectional=False, use_hidden_state=True, padding_idx=1, attn_concatenate=True, attn_dropout=0.1, head_hidden_dims=None, head_activation='relu', head_dropout=None, head_batchnorm=False, head_batchnorm_last=False, head_linear_first=False)
Bases: BasicRNN
Text classifier/regressor composed of a stack of RNNs
(LSTMs or GRUs) plus an attention layer. This model can be used as the
deeptext
component of a Wide & Deep model or independently by
itself.
In addition, there is the option to add a Fully Connected (FC) set of dense layers on top of attention layer
Parameters:
-
vocab_size
(int
) –Number of words in the vocabulary
-
embed_dim
(Optional[int]
, default:None
) –Dimension of the word embeddings if non-pretrained word vectors are used
-
embed_matrix
(Optional[ndarray]
, default:None
) –Pretrained word embeddings
-
embed_trainable
(bool
, default:True
) –Boolean indicating if the pretrained embeddings are trainable
-
rnn_type
(str
, default:'lstm'
) –String indicating the type of RNN to use. One of 'lstm' or 'gru'
-
hidden_dim
(int
, default:64
) –Hidden dim of the RNN
-
n_layers
(int
, default:3
) –Number of recurrent layers
-
rnn_dropout
(float
, default:0.1
) –Dropout for each RNN layer except the last layer
-
bidirectional
(bool
, default:False
) –Boolean indicating whether the stacked RNNs are bidirectional
-
use_hidden_state
(bool
, default:True
) –Boolean indicating whether to use the final hidden state or the RNN's output as predicting features. Typically the former is used.
-
padding_idx
(int
, default:1
) –index of the padding token in the padded-tokenised sequences. The
TextPreprocessor
class within this library uses fastai's tokenizer where the token index 0 is reserved for the 'unknown' word token. Therefore, the default value is set to 1. -
attn_concatenate
(bool
, default:True
) –Boolean indicating if the input to the attention mechanism will be the output of the RNN or the output of the RNN concatenated with the last hidden state.
-
attn_dropout
(float
, default:0.1
) –Internal dropout for the attention mechanism
-
head_hidden_dims
(Optional[List[int]]
, default:None
) –List with the sizes of the dense layers in the head e.g: [128, 64]
-
head_activation
(str
, default:'relu'
) –Activation function for the dense layers in the head. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported
-
head_dropout
(Optional[float]
, default:None
) –Dropout of the dense layers in the head
-
head_batchnorm
(bool
, default:False
) –Boolean indicating whether or not to include batch normalization in the dense layers that form the 'rnn_mlp'
-
head_batchnorm_last
(bool
, default:False
) –Boolean indicating whether or not to apply batch normalization to the last of the dense layers in the head
-
head_linear_first
(bool
, default:False
) –Boolean indicating the order of the operations in the dense layer. If
True: [LIN -> ACT -> BN -> DP]
. If False: [BN -> DP -> LIN -> ACT]
Attributes:
-
word_embed
(Module
) –word embedding matrix
-
rnn
(Module
) –Stack of RNNs
-
rnn_mlp
(Module
) –Stack of dense layers on top of the RNN. This will only exist if
head_hidden_dims
is not None
Examples:
>>> import torch
>>> from pytorch_widedeep.models import AttentiveRNN
>>> X_text = torch.cat((torch.zeros([5,1]), torch.empty(5, 4).random_(1,4)), axis=1)
>>> model = AttentiveRNN(vocab_size=4, hidden_dim=4, n_layers=2, padding_idx=0, embed_dim=4)
>>> out = model(X_text)
Source code in pytorch_widedeep/models/text/attentive_rnn.py
attention_weights
property
¶
attention_weights
List with the attention weights
The shape of the attention weights is \((N, S)\), where \(N\) is the batch size and \(S\) is the length of the sequence
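The \((N, S)\) shape means that every position in a sequence gets one weight. The sketch below illustrates this for a single sequence, assuming that padded positions are masked out before the softmax so that they receive zero weight. That masking step is a common design choice and an assumption here; the helper masked_softmax and the numbers are hypothetical.

```python
import math
from typing import List


def masked_softmax(scores: List[float], token_ids: List[int], padding_idx: int) -> List[float]:
    """Softmax over the sequence where padded positions get zero weight."""
    keep = [tok != padding_idx for tok in token_ids]
    max_score = max(s for s, k in zip(scores, keep) if k)
    exps = [math.exp(s - max_score) if k else 0.0 for s, k in zip(scores, keep)]
    total = sum(exps)
    return [e / total for e in exps]


token_ids = [0, 0, 3, 1, 2]           # S = 5, the first two positions are padding
scores = [0.3, -1.2, 0.8, 0.1, 0.5]   # one raw attention score per position

weights = masked_softmax(scores, token_ids, padding_idx=0)
print(weights[:2])             # [0.0, 0.0]
print(round(sum(weights), 6))  # 1.0
```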
StackedAttentiveRNN ¶
StackedAttentiveRNN(vocab_size, embed_dim=None, embed_matrix=None, embed_trainable=True, rnn_type='lstm', hidden_dim=64, bidirectional=False, padding_idx=1, n_blocks=3, attn_concatenate=False, attn_dropout=0.1, with_addnorm=False, head_hidden_dims=None, head_activation='relu', head_dropout=None, head_batchnorm=False, head_batchnorm_last=False, head_linear_first=False)
Bases: BaseWDModelComponent
Text classifier/regressor composed of a stack of blocks:
[RNN + Attention]
. This can be used as the deeptext
component of a
Wide & Deep model or independently by itself.
In addition, there is the option to add a Fully Connected (FC) set of dense layers on top of the attention blocks
Parameters:
-
vocab_size
(int
) –Number of words in the vocabulary
-
embed_dim
(Optional[int]
, default:None
) –Dimension of the word embeddings if non-pretrained word vectors are used
-
embed_matrix
(Optional[ndarray]
, default:None
) –Pretrained word embeddings
-
embed_trainable
(bool
, default:True
) –Boolean indicating if the pretrained embeddings are trainable
-
rnn_type
(str
, default:'lstm'
) –String indicating the type of RNN to use. One of 'lstm' or 'gru'
-
hidden_dim
(int
, default:64
) –Hidden dim of the RNN
-
bidirectional
(bool
, default:False
) –Boolean indicating whether the stacked RNNs are bidirectional
-
padding_idx
(int
, default:1
) –index of the padding token in the padded-tokenised sequences. The
TextPreprocessor
class within this library uses fastai's tokenizer where the token index 0 is reserved for the 'unknown' word token. Therefore, the default value is set to 1. -
n_blocks
(int
, default:3
) –Number of attention blocks. Each block is composed of an RNN and a Context Attention Encoder
-
attn_concatenate
(bool
, default:False
) –Boolean indicating if the input to the attention mechanism will be the output of the RNN or the output of the RNN concatenated with the last hidden state.
-
attn_dropout
(float
, default:0.1
) –Internal dropout for the attention mechanism
-
with_addnorm
(bool
, default:False
) –Boolean indicating if the output of each block will be added to the input and normalised
-
head_hidden_dims
(Optional[List[int]]
, default:None
) –List with the sizes of the dense layers in the head e.g: [128, 64]
-
head_activation
(str
, default:'relu'
) –Activation function for the dense layers in the head. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported
-
head_dropout
(Optional[float]
, default:None
) –Dropout of the dense layers in the head
-
head_batchnorm
(bool
, default:False
) –Boolean indicating whether or not to include batch normalization in the dense layers that form the 'rnn_mlp'
-
head_batchnorm_last
(bool
, default:False
) –Boolean indicating whether or not to apply batch normalization to the last of the dense layers in the head
-
head_linear_first
(bool
, default:False
) –Boolean indicating the order of the operations in the dense layer. If
True: [LIN -> ACT -> BN -> DP]
. If False: [BN -> DP -> LIN -> ACT]
Attributes:
-
word_embed
(Module
) –word embedding matrix
-
rnn
(Module
) –Stack of RNNs
-
rnn_mlp
(Module
) –Stack of dense layers on top of the RNN. This will only exist if
head_hidden_dims
is not None
Examples:
>>> import torch
>>> from pytorch_widedeep.models import StackedAttentiveRNN
>>> X_text = torch.cat((torch.zeros([5,1]), torch.empty(5, 4).random_(1,4)), axis=1)
>>> model = StackedAttentiveRNN(vocab_size=4, hidden_dim=4, padding_idx=0, embed_dim=4)
>>> out = model(X_text)
Source code in pytorch_widedeep/models/text/stacked_attentive_rnn.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep
class
attention_weights
property
¶
attention_weights
List with the attention weights per block
The shape of the attention weights is \((N, S)\), where \(N\) is the batch size and \(S\) is the length of the sequence
Transformer ¶
Transformer(vocab_size, seq_length, input_dim, n_heads, n_blocks, attn_dropout=0.1, ff_dropout=0.1, ff_factor=4, activation='gelu', use_linear_attention=False, use_flash_attention=False, padding_idx=0, with_cls_token=False, *, with_pos_encoding=True, pos_encoding_dropout=0.1, pos_encoder=None)
Bases: Module
Basic Encoder-Only Transformer Model for text classification/regression.
As all other models in the library this model can be used as the
deeptext
component of a Wide & Deep model or independently by itself.
NOTE: This model is introduced in the context of recommendation systems and is intended for sequences of any nature (e.g. items). It can, of course, still be used for text. However, at this stage, we have decided not to include the possibility of loading pretrained word vectors since we aim to integrate the library with Hugging Face in the (hopefully) near future
Parameters:
-
vocab_size
(int
) –Number of words in the vocabulary
-
input_dim
(int
) –Dimension of the token embeddings
Param aliases:
embed_dim
,d_model
. -
seq_length
(int
) –Input sequence length
-
n_heads
(int
) –Number of attention heads per Transformer block
-
n_blocks
(int
) –Number of Transformer blocks
-
attn_dropout
(float
, default:0.1
) –Dropout that will be applied to the Multi-Head Attention layers
-
ff_dropout
(float
, default:0.1
) –Dropout that will be applied to the FeedForward network
-
ff_factor
(int
, default:4
) –Multiplicative factor applied to the first layer of the FF network in each Transformer block. This is normally set to 4.
-
activation
(str
, default:'gelu'
) –Transformer Encoder activation function. 'tanh', 'relu', 'leaky_relu', 'gelu', 'geglu' and 'reglu' are supported
-
padding_idx
(int
, default:0
) –index of the padding token in the padded-tokenised sequences.
-
with_cls_token
(bool
, default:False
) –Boolean indicating if a
'[CLS]'
token is included in the tokenized sequences. If present, the final hidden state corresponding to this token is used as the aggregated representation for classification and regression tasks. NOTE: if included in the tokenized sequences it must be inserted as the first token in the sequences. -
with_pos_encoding
(bool
, default:True
) –Boolean indicating if positional encoding will be used
-
pos_encoding_dropout
(float
, default:0.1
) –Positional encoding dropout
-
pos_encoder
(Optional[Module]
, default:None
) –This model uses by default a standard positional encoding approach. However, any custom positional encoder can also be used and passed to the Transformer model via the 'pos_encoder' parameter
Attributes:
-
embedding
(Module
) –Standard token embedding layer
-
pos_encoder
(Module
) –Positional Encoder
-
encoder
(Module
) –Sequence of Transformer blocks
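For reference, the sketch below computes a sinusoidal positional encoding table in plain Python. It assumes that the "standard positional encoding approach" mentioned for pos_encoder refers to the sinusoidal scheme from Attention Is All You Need, which this section does not state explicitly. The helper name is hypothetical, and a custom encoder passed via pos_encoder could of course do something entirely different.

```python
import math
from typing import List


def sinusoidal_positional_encoding(seq_length: int, input_dim: int) -> List[List[float]]:
    """Return a seq_length x input_dim table; input_dim is assumed to be even."""
    table = []
    for pos in range(seq_length):
        row: List[float] = []
        for i in range(0, input_dim, 2):
            # Even dimensions use sin, odd dimensions use cos, and the
            # wavelength grows geometrically with the dimension index.
            angle = pos / (10000 ** (i / input_dim))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


pe = sinusoidal_positional_encoding(seq_length=5, input_dim=8)
print(len(pe), len(pe[0]))  # 5 8
print(pe[0])                # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Each row of the table would be added to the token embedding at the corresponding position, which is why its width has to match input_dim.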
Examples:
>>> import torch
>>> from pytorch_widedeep.models import Transformer
>>> X_text = torch.cat((torch.zeros([5,1]), torch.empty(5, 4).random_(1,4)), axis=1)
>>> model = Transformer(vocab_size=4, seq_length=5, input_dim=8, n_heads=1, n_blocks=1)
>>> out = model(X_text)
Source code in pytorch_widedeep/models/text/basic_transformer.py
Vision ¶
Vision(pretrained_model_setup=None, n_trainable=None, trainable_params=None, channel_sizes=[64, 128, 256, 512], kernel_sizes=[7, 3, 3, 3], strides=[2, 1, 1, 1], head_hidden_dims=None, head_activation='relu', head_dropout=0.1, head_batchnorm=False, head_batchnorm_last=False, head_linear_first=False)
Bases: BaseWDModelComponent
Defines a standard image classifier/regressor using a pretrained
network or a sequence of convolution layers that can be used as the
deepimage
component of a Wide & Deep model or independently by
itself.
NOTE: this class represents the integration
between pytorch-widedeep
and torchvision
. New architectures will be
available as they are added to torchvision
. In a distant future we aim
to bring transformer-based architectures as well. However, simple
CNN-based architectures (and even MLP-based) seem to produce SoTA
results. For the time being, we describe below the options available
through this class
Parameters:
-
pretrained_model_setup
(Union[str, Dict[str, Union[str, WeightsEnum]]]
, default:None
) –Name of the pretrained model. Should be a variant of the following architectures: 'resnet', 'shufflenet', 'resnext', 'wide_resnet', 'regnet', 'densenet', 'mobilenetv3', 'mobilenetv2', 'mnasnet', 'efficientnet' and 'squeezenet'. If
pretrained_model_setup = None
a basic, fully trainable CNN will be used. Alternatively, since Torchvision 0.13 one can use pretrained models with different weights. Therefore, pretrained_model_setup
can also be a dictionary with the name of the model and the weights (e.g. {'resnet50': ResNet50_Weights.DEFAULT}
or {'resnet50': "IMAGENET1K_V2"}
).
Aliased as pretrained_model_name
. -
n_trainable
(Optional[int]
, default:None
) –Number of trainable layers starting from the layer closer to the output neuron(s). Note that this number DOES NOT take into account the so-called 'head' which is ALWAYS trainable. If
trainable_params
is not None this parameter will be ignored -
trainable_params
(Optional[List[str]]
, default:None
) –List of strings containing the names (or substring within the name) of the parameters that will be trained. For example, if we use a 'resnet18' pretrained model and we set
trainable_params = ['layer4']
only the parameters of 'layer4' of the network (and the head, as mentioned before) will be trained. Note that setting this or the previous parameter involves some knowledge of the architecture used. -
channel_sizes
(List[int]
, default:[64, 128, 256, 512]
) –List of integers with the channel sizes of a CNN in case we choose not to use a pretrained model
-
kernel_sizes
(Union[int, List[int]]
, default:[7, 3, 3, 3]
) –List of integers with the kernel sizes of a CNN in case we choose not to use a pretrained model. Must be of length equal to
len(channel_sizes)
. -
strides
(Union[int, List[int]]
, default:[2, 1, 1, 1]
) –List of integers with the stride sizes of a CNN in case we choose not to use a pretrained model. Must be of length equal to
len(channel_sizes)
. -
head_hidden_dims
(Optional[List[int]]
, default:None
) –List with the number of neurons per dense layer in the head. e.g: [64,32]
-
head_activation
(str
, default:'relu'
) –Activation function for the dense layers in the head. Currently 'tanh', 'relu', 'leaky_relu' and 'gelu' are supported
-
head_dropout
(Union[float, List[float]]
, default:0.1
) –float indicating the dropout between the dense layers.
-
head_batchnorm
(bool
, default:False
) –Boolean indicating whether or not batch normalization will be applied to the dense layers
-
head_batchnorm_last
(bool
, default:False
) –Boolean indicating whether or not batch normalization will be applied to the last of the dense layers
-
head_linear_first
(bool
, default:False
) –Boolean indicating the order of the operations in the dense layer. If
True: [LIN -> ACT -> BN -> DP]
. IfFalse: [BN -> DP -> LIN -> ACT]
Attributes:
-
features
(Module
) –The pretrained model or Standard CNN plus the optional head
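The trainable_params description says that parameters are selected by name or by a substring of the name. A minimal sketch of that matching rule, using hypothetical parameter names in the style of a torchvision resnet and no actual model, could look as follows. It only illustrates the selection logic and is not the library's code.

```python
from typing import Dict, List

# Hypothetical parameter names, in the style of a torchvision resnet.
param_names: List[str] = [
    "conv1.weight",
    "layer3.0.conv1.weight",
    "layer4.0.conv1.weight",
    "layer4.1.bn2.bias",
]
trainable_params: List[str] = ["layer4"]

# A parameter is marked as trainable if any of the given strings occurs in
# its name; everything else would stay frozen.
requires_grad: Dict[str, bool] = {
    name: any(pattern in name for pattern in trainable_params)
    for name in param_names
}

trainable = [name for name, flag in requires_grad.items() if flag]
print(trainable)  # ['layer4.0.conv1.weight', 'layer4.1.bn2.bias']
```

Because the match is a substring match, a short pattern such as 'layer' would select every residual stage, so the patterns should be as specific as the architecture requires.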
Examples:
>>> import torch
>>> from pytorch_widedeep.models import Vision
>>> X_img = torch.rand((2,3,224,224))
>>> model = Vision(channel_sizes=[64, 128], kernel_sizes = [3, 3], strides=[1, 1], head_hidden_dims=[32, 8])
>>> out = model(X_img)
Source code in pytorch_widedeep/models/image/vision.py
output_dim
property
¶
output_dim
The output dimension of the model. This is a required property
necessary to build the WideDeep
class
WideDeep ¶
WideDeep(wide=None, deeptabular=None, deeptext=None, deepimage=None, deephead=None, head_hidden_dims=None, head_activation='relu', head_dropout=0.1, head_batchnorm=False, head_batchnorm_last=False, head_linear_first=True, enforce_positive=False, enforce_positive_activation='softplus', pred_dim=1, with_fds=False, **fds_config)
Bases: Module
Main collector class that combines all wide
, deeptabular
, deeptext
and deepimage
models.
Note that all models described so far in this library must be passed to
the WideDeep
class once constructed. This is because the models output
the last layer before the prediction layer. Such prediction layer is
added by the WideDeep
class as it collects the components for every
data mode.
There are two options to combine these models that correspond to the
two main architectures that pytorch-widedeep
can build.
-
Directly connecting the output of the model components to the output neuron(s).
-
Adding a
Fully-Connected Head
(FC-Head) on top of the deep models. This FC-Head will combine the output from the deeptabular
, deeptext
and deepimage
and will be then connected to the output neuron(s).
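The difference between the two options can be sketched in terms of the linear layers that sit between the deep components and the prediction. The helper prediction_layers and the output dimensions below are hypothetical, and the sketch assumes that the FC-Head receives the concatenation of the deep components' outputs, so that its input size is the sum of their output_dim values. How the per-component predictions of the first option are combined is not covered here.

```python
from typing import Dict, List, Optional, Tuple

LayerDims = List[Tuple[int, int]]  # (in_features, out_features) per linear layer


def prediction_layers(
    deep_output_dims: Dict[str, int],
    head_hidden_dims: Optional[List[int]] = None,
    pred_dim: int = 1,
) -> Dict[str, LayerDims]:
    """Sizes of the layers between the deep components and the prediction."""
    if head_hidden_dims is None:
        # Option 1: every component is connected directly to the output neuron(s).
        return {name: [(dim, pred_dim)] for name, dim in deep_output_dims.items()}
    # Option 2: the deep outputs go through a shared FC-Head first.
    dims = [sum(deep_output_dims.values())] + head_hidden_dims + [pred_dim]
    return {"fc_head": list(zip(dims[:-1], dims[1:]))}


deep_output_dims = {"deeptabular": 32, "deeptext": 64, "deepimage": 512}

option_1 = prediction_layers(deep_output_dims)
option_2 = prediction_layers(deep_output_dims, head_hidden_dims=[128, 64])

print(option_1)  # {'deeptabular': [(32, 1)], 'deeptext': [(64, 1)], 'deepimage': [(512, 1)]}
print(option_2)  # {'fc_head': [(608, 128), (128, 64), (64, 1)]}
```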
Parameters:
-
wide
(Optional[Module]
, default:None
) –Wide
model. This is a linear model where the non-linearities are captured via crossed-columns. -
deeptabular
(Optional[BaseWDModelComponent]
, default:None
) –Currently this library implements a number of possible architectures for the
deeptabular
component. See the documentation of the package. -
deeptext
(Optional[BaseWDModelComponent]
, default:None
) –Currently this library implements a number of possible architectures for the
deeptext
component. See the documentation of the package. -
deepimage
(Optional[BaseWDModelComponent]
, default:None
) –Currently this library uses
torchvision
and implements a number of possible architectures for the deepimage
component. See the documentation of the package. -
deephead
(Optional[BaseWDModelComponent]
, default:None
) –Alternatively, the user can pass a custom model that will receive the output of the deep component. If
deephead
is not None all the previous fc-head parameters will be ignored -
head_hidden_dims
(Optional[List[int]]
, default:None
) –List with the sizes of the dense layers in the head e.g: [128, 64]
-
head_activation
(str
, default:'relu'
) –Activation function for the dense layers in the head. Currently
'tanh'
,'relu'
,'leaky_relu'
and'gelu'
are supported -
head_dropout
(float
, default:0.1
) –Dropout of the dense layers in the head
-
head_batchnorm
(bool
, default:False
) –Boolean indicating whether or not to include batch normalization in the dense layers that form the head
-
head_batchnorm_last
(bool
, default:False
) –Boolean indicating whether or not to apply batch normalization to the last of the dense layers in the head
-
head_linear_first
(bool
, default:True
) –Boolean indicating the order of the operations in the dense layer. If
True: [LIN -> ACT -> BN -> DP]
. If False: [BN -> DP -> LIN -> ACT]
-
enforce_positive
(bool
, default:False
) –Boolean indicating if the output from the final layer must be positive. This is important if you are using loss functions with non-negative input restrictions, e.g. RMSLE, or if you know your predictions are bounded between 0 and inf
-
enforce_positive_activation
(str
, default:'softplus'
) –Activation function to enforce that the final layer has a positive output.
'softplus'
or'relu'
are supported. -
pred_dim
(int
, default:1
) –Size of the final wide and deep output layer containing the predictions.
1
for regression and binary classification or number of classes for multiclass classification. -
with_fds
(bool
, default:False
) –Boolean indicating if Feature Distribution Smoothing (FDS) will be applied before the final prediction layer. Only available for regression problems. See Delving into Deep Imbalanced Regression for details.
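The effect of `head_linear_first`, `head_batchnorm` and `head_dropout` on a single dense layer of the head can be sketched as follows. The function `dense_layer_ops` is a hypothetical helper that only returns the order of the operations; it assumes that the BN and DP steps are simply left out when batch normalization is disabled or the dropout is 0.

```python
from typing import List


def dense_layer_ops(
    linear_first: bool = True,
    batchnorm: bool = False,
    dropout: float = 0.1,
) -> List[str]:
    """Order of operations in one dense layer (illustrative, not library code)."""
    extras = []
    if batchnorm:
        extras.append("BN")  # batch normalization
    if dropout > 0:
        extras.append("DP")  # dropout
    core = ["LIN", "ACT"]  # linear layer followed by its activation
    # linear_first=True:  [LIN -> ACT -> BN -> DP]
    # linear_first=False: [BN -> DP -> LIN -> ACT]
    return core + extras if linear_first else extras + core


linear_first_ops = dense_layer_ops(linear_first=True, batchnorm=True)
linear_last_ops = dense_layer_ops(linear_first=False, batchnorm=True)
default_ops = dense_layer_ops()
```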
Other Parameters:
-
**fds_config
–Dictionary with the parameters to be used when using Feature Distribution Smoothing. Please, see the docs for the
FDSLayer
.
NOTE: Feature Distribution Smoothing is available when using ONLY a deeptabular
component
NOTE: We consider this feature absolutely experimental and we recommend not using it unless the corresponding publication is well understood
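Returning to the `enforce_positive` and `enforce_positive_activation` parameters described above, the following standard-library sketch shows why a positivity constraint matters for a loss such as RMSLE. The helper names (`softplus`, `relu`, `enforce_positive`, `rmsle`) and the sample values are chosen for illustration only and are not the library's implementation.

```python
import math
from typing import List


def softplus(x: float) -> float:
    # Numerically stable form of log(1 + exp(x)); strictly positive
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def relu(x: float) -> float:
    # Non-negative, but exactly 0 for negative inputs
    return max(x, 0.0)


POSITIVE_ACTIVATIONS = {"softplus": softplus, "relu": relu}


def enforce_positive(raw: List[float], activation: str = "softplus") -> List[float]:
    fn = POSITIVE_ACTIVATIONS[activation]
    return [fn(v) for v in raw]


def rmsle(preds: List[float], targets: List[float]) -> float:
    # log1p(p) is undefined for p <= -1, hence the need for constrained outputs
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for p, t in zip(preds, targets)]
    return math.sqrt(sum(sq) / len(sq))


raw_outputs = [-3.0, 0.0, 2.5]  # unconstrained outputs of a final layer
targets = [0.0, 1.0, 3.0]
soft = enforce_positive(raw_outputs, "softplus")
rect = enforce_positive(raw_outputs, "relu")
loss = rmsle(soft, targets)
```

Computing `rmsle(raw_outputs, targets)` directly would fail, since `math.log1p(-3.0)` is undefined.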
Examples:
>>> from pytorch_widedeep.models import TabResnet, Vision, BasicRNN, Wide, WideDeep
>>> embed_input = [(u, i, j) for u, i, j in zip(["a", "b", "c"][:4], [4] * 3, [8] * 3)]
>>> column_idx = {k: v for v, k in enumerate(["a", "b", "c"])}
>>> wide = Wide(10, 1)
>>> deeptabular = TabResnet(blocks_dims=[8, 4], column_idx=column_idx, cat_embed_input=embed_input)
>>> deeptext = BasicRNN(vocab_size=10, embed_dim=4, padding_idx=0)
>>> deepimage = Vision()
>>> model = WideDeep(wide=wide, deeptabular=deeptabular, deeptext=deeptext, deepimage=deepimage)
NOTE: It is possible to use custom components to
build Wide & Deep models. Simply build them and pass them as the
corresponding parameters. Note that the custom models MUST return a last
layer of activations (i.e. not the final prediction) so that these
activations are collected by WideDeep
and combined accordingly. In
addition, the models MUST also contain an attribute output_dim
with
the size of these last layers of activations. See for example
pytorch_widedeep.models.tab_mlp.TabMlp
Source code in pytorch_widedeep/models/wide_deep.py
FDSLayer ¶
FDSLayer(feature_dim, granularity=100, y_max=None, y_min=None, start_update=0, start_smooth=2, kernel='gaussian', ks=5, sigma=2, momentum=0.9, clip_min=None, clip_max=None)
Bases: Module
Feature Distribution Smoothing layer. Please, see Delving into Deep Imbalanced Regression for details.
NOTE: this is NOT an available model per se,
but more a utility that can be used as we run a WideDeep
model.
The parameters of this extra layer can be set when the class
WideDeep
is instantiated via the keyword arguments fds_config
.
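As a sketch of how such keyword arguments might be assembled, the snippet below collects the defaults from the `FDSLayer` signature above into a dictionary and validates any overrides. `FDS_DEFAULTS` and `make_fds_config` are hypothetical names used only for this illustration. `feature_dim` is left out on the assumption that `WideDeep` derives it from the `deeptabular` component.

```python
from typing import Any, Dict

# Defaults copied from the FDSLayer signature (feature_dim deliberately omitted)
FDS_DEFAULTS: Dict[str, Any] = {
    "granularity": 100,
    "y_max": None,
    "y_min": None,
    "start_update": 0,
    "start_smooth": 2,
    "kernel": "gaussian",
    "ks": 5,
    "sigma": 2,
    "momentum": 0.9,
    "clip_min": None,
    "clip_max": None,
}


def make_fds_config(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the defaults, rejecting typos."""
    unknown = sorted(set(overrides) - set(FDS_DEFAULTS))
    if unknown:
        raise ValueError(f"Unknown FDS parameters: {unknown}")
    return {**FDS_DEFAULTS, **overrides}


fds_config = make_fds_config(granularity=50, kernel="triang", clip_min=0.1, clip_max=10.0)

# Assumed usage, with `deeptabular` built elsewhere:
# model = WideDeep(deeptabular=deeptabular, with_fds=True, **fds_config)
```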
NOTE: Feature Distribution Smoothing is
available when using ONLY a deeptabular
component
NOTE: We consider this feature absolutely experimental and we recommend not using it unless the corresponding publication is well understood
The code here is based on the code at the official repo
Parameters:
-
feature_dim
(int
) –input dimension size, i.e. output size of previous layer. This will be the dimension of the output from the
deeptabular
component -
granularity
(int
, default:100
) –number of bins that the target \(y\) is divided into and that will be used to compute the features' statistics (mean and variance)
-
y_max
(Optional[float]
, default:None
) –\(y\) upper limit to be considered when binning
-
y_min
(Optional[float]
, default:None
) –\(y\) lower limit to be considered when binning
-
start_update
(int
, default:0
) –number of 'waiting epochs' after which the FDS layer will start to update its statistics
-
start_smooth
(int
, default:2
) –number of 'waiting epochs' after which the FDS layer will start smoothing the feature distributions
-
kernel
(Literal[gaussian, triang, laplace]
, default:'gaussian'
) –choice of smoothing kernel
-
ks
(int
, default:5
) –kernel window size
-
sigma
(float
, default:2
) –if a 'gaussian' or 'laplace' kernel is used, this is the corresponding standard deviation
-
momentum
(Optional[float]
, default:0.9
) –to train the layer the authors used a momentum update of the running statistics across each epoch. Set to 0.9 in the paper.
-
clip_min
(Optional[float]
, default:None
) –this parameter is used to clip the ratio between the so called running variance and the smoothed variance, and is introduced for numerical stability. We leave it as optional as we did not find a notable improvement in our experiments. The authors used a value of 0.1
-
clip_max
(Optional[float]
, default:None
) –same as
clip_min
but for the upper limit. We leave it as optional as we did not find a notable improvement in our experiments. The authors used a value of 10.
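To make the roles of `kernel`, `ks`, `sigma` and `momentum` more concrete, here is a minimal standard-library sketch. It is not the code used by the library or the official repo: the window weights are evaluated directly from each kernel's formula and normalised to sum to 1, an odd `ks` is assumed so that the window is symmetric, and the momentum update is assumed to take the conventional exponential-moving-average form. The function names are hypothetical.

```python
import math
from typing import List


def kernel_window(kernel: str = "gaussian", ks: int = 5, sigma: float = 2.0) -> List[float]:
    """Normalised smoothing window of size ks (illustrative sketch)."""
    if ks % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    half = (ks - 1) // 2
    offsets = range(-half, half + 1)  # distance, in bins, from the centre bin
    if kernel == "gaussian":
        weights = [math.exp(-(x * x) / (2.0 * sigma ** 2)) for x in offsets]
    elif kernel == "triang":
        weights = [1.0 - abs(x) / (half + 1) for x in offsets]
    elif kernel == "laplace":
        weights = [math.exp(-abs(x) / sigma) for x in offsets]
    else:
        raise ValueError(f"unsupported kernel: {kernel}")
    total = sum(weights)
    return [w / total for w in weights]


def momentum_update(running: List[float], current: List[float], momentum: float = 0.9) -> List[float]:
    # Assumed form: keep `momentum` of the old statistic, blend in the new one
    return [momentum * r + (1.0 - momentum) * c for r, c in zip(running, current)]


gaussian = kernel_window("gaussian", ks=5, sigma=2.0)
triang = kernel_window("triang", ks=5)
laplace = kernel_window("laplace", ks=5, sigma=2.0)
updated = momentum_update([1.0], [2.0], momentum=0.9)
```

A wider `ks` or a larger `sigma` spreads the weight over more neighbouring target bins, and a higher `momentum` makes the running statistics change more slowly from one epoch to the next.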
Source code in pytorch_widedeep/models/fds_layer.py