The Bayesian Models¶
Perhaps one of the most interesting functionalities in the library is the access to full Bayesian models, which can be used in almost exactly the same way as any other model in the library.
Note however that the Bayesian models are ONLY available for tabular data and, at the moment, we do not support combining them to form a Wide and Deep model.
The implementation in this library is based on the publication Weight Uncertainty in Neural Networks, by Blundell et al., 2015. Code-wise, our implementation is inspired by a number of sources:
- https://joshfeldman.net/WeightUncertainty/
- https://www.nitarshan.com/bayes-by-backprop/
- https://github.com/piEsposito/blitz-bayesian-deep-learning
- https://github.com/zackchase/mxnet-the-straight-dope/tree/master/chapter18_variational-methods-and-uncertainty
The two Bayesian models available in the library are:
- BayesianWide: a probabilistic linear model where the non-linearities are captured via crossed columns
- BayesianTabMlp: a standard MLP that receives categorical embeddings and continuous columns (embedded or not), which are then passed through a series of dense layers. All parameters in the model are probabilistic.
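"All parameters are probabilistic" means that each weight is a distribution rather than a point estimate. As a rough sketch of the idea behind Bayes by Backprop (illustrative only, not the library's actual implementation), a Bayesian linear layer keeps a learnable mean and scale per weight and re-samples the weights on every forward pass via the reparameterization trick:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyBayesianLinear(nn.Module):
    """Minimal sketch of a Bayes-by-Backprop linear layer.

    Each weight has a learnable mean (mu) and a learnable rho that is
    mapped to a positive std via softplus: sigma = log(1 + exp(rho)).
    """

    def __init__(self, in_features, out_features, posterior_mu_init=0.0, posterior_rho_init=-7.0):
        super().__init__()
        self.w_mu = nn.Parameter(torch.full((out_features, in_features), float(posterior_mu_init)))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), float(posterior_rho_init)))

    def forward(self, x):
        sigma = F.softplus(self.w_rho)  # always positive
        eps = torch.randn_like(sigma)   # reparameterization trick
        w = self.w_mu + sigma * eps     # weights are re-sampled every call
        return F.linear(x, w)


layer = ToyBayesianLinear(4, 2)
x = torch.randn(8, 4)
out1, out2 = layer(x), layer(x)
# two forward passes over the same input give (slightly) different
# outputs, because the weights are sampled anew each time
```

This per-forward-pass sampling is also why, later in the notebook, predicting several times (`n_samples`) yields several different sets of predictions.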
import numpy as np
import torch
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from pytorch_widedeep.metrics import Accuracy
from pytorch_widedeep.datasets import load_adult
from pytorch_widedeep.callbacks import EarlyStopping, ModelCheckpoint
from pytorch_widedeep.preprocessing import TabPreprocessor, WidePreprocessor
from pytorch_widedeep.bayesian_models import BayesianWide, BayesianTabMlp
from pytorch_widedeep.training.bayesian_trainer import BayesianTrainer
The first few steps are the usual ones, exactly as with any other model described in the other notebooks:
df = load_adult(as_frame=True)
df.columns = [c.replace("-", "_") for c in df.columns]
df["age_buckets"] = pd.cut(
df.age, bins=[16, 25, 30, 35, 40, 45, 50, 55, 60, 91], labels=np.arange(9)
)
df["income_label"] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
df.drop("income", axis=1, inplace=True)
df.head()
|   | age | workclass | fnlwgt | education | educational_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | age_buckets | income_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | Private | 226802 | 11th | 7 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0 | 0 | 40 | United-States | 0 | 0 |
| 1 | 38 | Private | 89814 | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 50 | United-States | 3 | 0 |
| 2 | 28 | Local-gov | 336951 | Assoc-acdm | 12 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States | 1 | 1 |
| 3 | 44 | Private | 160323 | Some-college | 10 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688 | 0 | 40 | United-States | 4 | 1 |
| 4 | 18 | ? | 103497 | Some-college | 10 | Never-married | ? | Own-child | White | Female | 0 | 0 | 30 | United-States | 0 | 0 |
train, test = train_test_split(df, test_size=0.2, stratify=df.income_label)
wide_cols = [
"age_buckets",
"education",
"relationship",
"workclass",
"occupation",
"native_country",
"gender",
]
crossed_cols = [("education", "occupation"), ("native_country", "occupation")]
cat_embed_cols = [
"workclass",
"education",
"marital_status",
"occupation",
"relationship",
"race",
"gender",
"capital_gain",
"capital_loss",
"native_country",
]
continuous_cols = ["age", "hours_per_week"]
target = train["income_label"].values
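As a side note, "crossing" two columns simply means creating a new categorical feature out of every observed combination of their values, which is what lets a linear model capture interactions. A manual pandas equivalent of the idea (illustrative only; the exact separator and encoding `WidePreprocessor` uses internally may differ) would be:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "education": ["11th", "HS-grad", "Assoc-acdm"],
        "occupation": ["Machine-op-inspct", "Farming-fishing", "Protective-serv"],
    }
)

# cross two categorical columns by concatenating their values into a
# new categorical column, one category per observed combination
toy["education_occupation"] = (
    toy["education"].astype(str) + "-" + toy["occupation"].astype(str)
)
```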
1. BayesianWide¶
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_tab = wide_preprocessor.fit_transform(train)
model = BayesianWide(
input_dim=np.unique(X_tab).shape[0],
prior_sigma_1=1.0,
prior_sigma_2=0.002,
prior_pi=0.8,
posterior_mu_init=0,
posterior_rho_init=-7.0,
pred_dim=1, # here the models are NOT passed to a WideDeep constructor class so the output dim MUST be specified
)
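The `prior_sigma_1`, `prior_sigma_2` and `prior_pi` parameters define the scale mixture prior from Blundell et al., 2015: a mixture of two zero-mean Gaussians, one broad (`sigma_1`) and one narrow (`sigma_2`), mixed with probability `pi`. As a quick illustration of the density involved (a sketch of the formula, not the library's code):

```python
import torch


def scale_mixture_log_prob(w, sigma_1=1.0, sigma_2=0.002, pi=0.8):
    """Log density of the Blundell et al. (2015) scale mixture prior:

    p(w) = pi * N(w; 0, sigma_1^2) + (1 - pi) * N(w; 0, sigma_2^2)

    summed over all weights w.
    """
    n1 = torch.distributions.Normal(0.0, sigma_1)
    n2 = torch.distributions.Normal(0.0, sigma_2)
    p = pi * n1.log_prob(w).exp() + (1 - pi) * n2.log_prob(w).exp()
    return torch.log(p).sum()


w = torch.tensor([0.0, 0.1, -0.5])
lp = scale_mixture_log_prob(w)  # scalar log prior over these weights
```

The narrow component concentrates mass near zero (acting a bit like a sparsity-inducing prior), while the broad one still allows large weights; `posterior_rho_init=-7.0` initializes the variational posterior with a very small standard deviation (softplus(-7) is roughly 1e-3).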
trainer = BayesianTrainer(
model,
objective="binary",
optimizer=torch.optim.Adam(model.parameters(), lr=0.01),
metrics=[Accuracy],
)
trainer.fit(
X_tab=X_tab,
target=target,
val_split=0.2,
n_epochs=2,
batch_size=256,
)
epoch 1: 100%|███████████████████████████████████████████████████████████| 123/123 [00:00<00:00, 124.32it/s, loss=163, metrics={'acc': 0.7813}]
valid: 100%|███████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 238.67it/s, loss=141, metrics={'acc': 0.8219}]
epoch 2: 100%|███████████████████████████████████████████████████████████| 123/123 [00:00<00:00, 132.81it/s, loss=140, metrics={'acc': 0.8285}]
valid: 100%|███████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 190.16it/s, loss=140, metrics={'acc': 0.8298}]
2. BayesianTabMlp¶
tab_preprocessor = TabPreprocessor(
cat_embed_cols=cat_embed_cols, continuous_cols=continuous_cols
)
X_tab = tab_preprocessor.fit_transform(train)
/Users/javierrodriguezzaurin/Projects/pytorch-widedeep/pytorch_widedeep/preprocessing/tab_preprocessor.py:358: UserWarning: Continuous columns will not be normalised warnings.warn("Continuous columns will not be normalised")
model = BayesianTabMlp(
column_idx=tab_preprocessor.column_idx,
cat_embed_input=tab_preprocessor.cat_embed_input,
continuous_cols=continuous_cols,
# embed_continuous_method = "standard",
# cont_embed_activation="leaky_relu",
# cont_embed_dim = 8,
mlp_hidden_dims=[128, 64],
prior_sigma_1=1.0,
prior_sigma_2=0.002,
prior_pi=0.8,
posterior_mu_init=0,
posterior_rho_init=-7.0,
pred_dim=1,
)
trainer = BayesianTrainer(
model,
objective="binary",
optimizer=torch.optim.Adam(model.parameters(), lr=0.01),
metrics=[Accuracy],
)
trainer.fit(
X_tab=X_tab,
target=target,
val_split=0.2,
n_epochs=2,
batch_size=256,
)
epoch 1: 100%|███████████████████████████████████████████████████████████| 123/123 [00:04<00:00, 28.74it/s, loss=2e+3, metrics={'acc': 0.8007}]
valid: 100%|███████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 136.89it/s, loss=1.75e+3, metrics={'acc': 0.8418}]
epoch 2: 100%|████████████████████████████████████████████████████████| 123/123 [00:04<00:00, 29.41it/s, loss=1.73e+3, metrics={'acc': 0.8596}]
valid: 100%|███████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 143.87it/s, loss=1.71e+3, metrics={'acc': 0.8569}]
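Note that the reported loss values are far larger than a plain binary cross-entropy would be. This is expected: in Bayes by Backprop the training objective approximates the negative ELBO, i.e. the data negative log-likelihood plus a complexity term (the gap between the log variational posterior and the log prior) summed over all the network's weights. A toy numeric sketch of that structure (the numbers here are made up for illustration, not taken from the trainer):

```python
import torch


def toy_bayes_by_backprop_loss(nll, log_q, log_prior, n_batches):
    """Schematic per-batch variational objective (Blundell et al., 2015):

    complexity cost, spread across mini-batches, plus the data-fit cost.
    """
    return (log_q - log_prior) / n_batches + nll


loss = toy_bayes_by_backprop_loss(
    nll=torch.tensor(0.35),         # plain cross-entropy for the batch
    log_q=torch.tensor(2500.0),     # log variational posterior, summed over weights
    log_prior=torch.tensor(750.0),  # log prior, summed over weights
    n_batches=123,
)
# the complexity term scales with the number of parameters, so it
# dwarfs the NLL and dominates the reported loss
```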
Beyond their predictive performance, these models are powerful because they give us a sense of the uncertainty associated with each prediction. Let's have a look.
X_tab_test = tab_preprocessor.transform(test)
preds = trainer.predict(X_tab_test, return_samples=True, n_samples=5)
predict: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:01<00:00, 33.92it/s]
preds.shape
(5, 9769)
As we can see, the predictions have shape (5, 9769): one set of predictions for each of the `n_samples` times we internally run predict (i.e. sample the network's weights and predict). This gives us an idea of how certain the model is about a given prediction.
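One simple way to turn such sampled predictions into a single label per observation, while keeping a notion of agreement between samples, is a majority vote across the sample axis. A sketch, using fake sampled 0/1 predictions of shape (n_samples, n_obs) in place of the real `preds` array:

```python
import numpy as np

# fake sampled binary predictions: 5 network samples x 10 observations
rng = np.random.default_rng(0)
sampled_preds = rng.integers(0, 2, size=(5, 10))

# majority vote across samples -> one label per observation
vote_share = sampled_preds.mean(axis=0)
majority = (vote_share >= 0.5).astype(int)

# fraction of samples agreeing with the majority label -> crude
# per-observation confidence in [0.5, 1]
agreement = np.where(majority == 1, vote_share, 1 - vote_share)
```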
Similarly, we could obtain the probabilities
probs = trainer.predict_proba(X_tab_test, return_samples=True, n_samples=5)
predict: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:01<00:00, 32.79it/s]
probs.shape
(5, 9769, 2)
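With the sampled probabilities we can also compute a per-observation mean and standard deviation across samples, where a large standard deviation flags observations the model is uncertain about. A sketch, using fake data shaped (n_samples, n_obs, 2) like `probs` above:

```python
import numpy as np

# fake sampled probabilities: 5 samples x 10 observations x 2 classes
rng = np.random.default_rng(1)
raw = rng.random((5, 10, 2))
sampled_probs = raw / raw.sum(axis=2, keepdims=True)  # each row sums to 1

mean_probs = sampled_probs.mean(axis=0)  # (10, 2) average prediction
std_probs = sampled_probs.std(axis=0)    # (10, 2) disagreement across samples

# observations ordered from most to least uncertain on the positive class
most_uncertain = np.argsort(-std_probs[:, 1])
```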
And we can see how the model performs each time we sample the network:
for p in preds:
    print(accuracy_score(test["income_label"].values, p))
0.8559729757395844
0.8564847988535162
0.8567918927218753
0.8562800696079435
0.8558706111167981