The Bayesian Models¶
Perhaps one of the most interesting functionalities in the library is the access to full Bayesian models, which can be used in almost exactly the same way as any other model in the library.
Note however that the Bayesian models are ONLY available for tabular data and, at the moment, we do not support combining them to form a Wide and Deep model.
The implementation in this library is based on the publication Weight Uncertainty in Neural Networks, by Blundell et al., 2015. Code-wise, our implementation is inspired by a number of sources:
- https://joshfeldman.net/WeightUncertainty/
- https://www.nitarshan.com/bayes-by-backprop/
- https://github.com/piEsposito/blitz-bayesian-deep-learning
- https://github.com/zackchase/mxnet-the-straight-dope/tree/master/chapter18_variational-methods-and-uncertainty
The two Bayesian models available in the library are:
- BayesianWide: a probabilistic linear model where the non-linearities are captured via crossed columns
- BayesianTabMlp: a standard MLP that receives categorical embeddings and continuous columns (embedded or not), which are then passed through a series of dense layers. All parameters in the model are probabilistic.
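"All parameters are probabilistic" means that each weight is a distribution rather than a point estimate. As a rough sketch of the idea behind Bayes by Backprop (illustrative only, not the library's actual implementation), a Bayesian linear layer keeps a learnable mean and scale per weight and re-samples the weights on every forward pass via the reparameterization trick:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyBayesianLinear(nn.Module):
    """Minimal sketch of a Bayes-by-Backprop linear layer.

    Each weight has a learnable mean (mu) and a learnable rho that is
    mapped to a positive std via softplus: sigma = log(1 + exp(rho)).
    """

    def __init__(self, in_features, out_features, posterior_mu_init=0.0, posterior_rho_init=-7.0):
        super().__init__()
        self.w_mu = nn.Parameter(torch.full((out_features, in_features), float(posterior_mu_init)))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), float(posterior_rho_init)))

    def forward(self, x):
        sigma = F.softplus(self.w_rho)  # always positive
        eps = torch.randn_like(sigma)   # reparameterization trick
        w = self.w_mu + sigma * eps     # weights are re-sampled every call
        return F.linear(x, w)


layer = ToyBayesianLinear(4, 2)
x = torch.randn(8, 4)
out1, out2 = layer(x), layer(x)
# two forward passes over the same input give (slightly) different
# outputs, because the weights are sampled anew each time
```

This per-forward-pass sampling is also why, later in the notebook, predicting several times (`n_samples`) yields several different sets of predictions.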
import numpy as np
import torch
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from pytorch_widedeep.metrics import Accuracy
from pytorch_widedeep.datasets import load_adult
from pytorch_widedeep.callbacks import EarlyStopping, ModelCheckpoint
from pytorch_widedeep.preprocessing import TabPreprocessor, WidePreprocessor
from pytorch_widedeep.bayesian_models import BayesianWide, BayesianTabMlp
from pytorch_widedeep.training.bayesian_trainer import BayesianTrainer
The first few steps are the usual ones, exactly as with any other model described in the other notebooks:
df = load_adult(as_frame=True)
df.columns = [c.replace("-", "_") for c in df.columns]
df["age_buckets"] = pd.cut(
df.age, bins=[16, 25, 30, 35, 40, 45, 50, 55, 60, 91], labels=np.arange(9)
)
df["income_label"] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
df.drop("income", axis=1, inplace=True)
df.head()
|   | age | workclass | fnlwgt | education | educational_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | age_buckets | income_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | Private | 226802 | 11th | 7 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0 | 0 | 40 | United-States | 0 | 0 |
| 1 | 38 | Private | 89814 | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 50 | United-States | 3 | 0 |
| 2 | 28 | Local-gov | 336951 | Assoc-acdm | 12 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States | 1 | 1 |
| 3 | 44 | Private | 160323 | Some-college | 10 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688 | 0 | 40 | United-States | 4 | 1 |
| 4 | 18 | ? | 103497 | Some-college | 10 | Never-married | ? | Own-child | White | Female | 0 | 0 | 30 | United-States | 0 | 0 |
train, test = train_test_split(df, test_size=0.2, stratify=df.income_label)
wide_cols = [
"age_buckets",
"education",
"relationship",
"workclass",
"occupation",
"native_country",
"gender",
]
crossed_cols = [("education", "occupation"), ("native_country", "occupation")]
cat_embed_cols = [
"workclass",
"education",
"marital_status",
"occupation",
"relationship",
"race",
"gender",
"capital_gain",
"capital_loss",
"native_country",
]
continuous_cols = ["age", "hours_per_week"]
target = train["income_label"].values
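As a side note, "crossing" two columns simply means creating a new categorical feature out of every observed combination of their values, which is what lets a linear model capture interactions. A manual pandas equivalent of the idea (illustrative only; the exact separator and encoding `WidePreprocessor` uses internally may differ) would be:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "education": ["11th", "HS-grad", "Assoc-acdm"],
        "occupation": ["Machine-op-inspct", "Farming-fishing", "Protective-serv"],
    }
)

# cross two categorical columns by concatenating their values into a
# new categorical column, one category per observed combination
toy["education_occupation"] = (
    toy["education"].astype(str) + "-" + toy["occupation"].astype(str)
)
```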
1. BayesianWide¶
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_tab = wide_preprocessor.fit_transform(train)
model = BayesianWide(
input_dim=np.unique(X_tab).shape[0],
prior_sigma_1=1.0,
prior_sigma_2=0.002,
prior_pi=0.8,
posterior_mu_init=0,
posterior_rho_init=-7.0,
pred_dim=1, # here the models are NOT passed to a WideDeep constructor class so the output dim MUST be specified
)
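The `prior_sigma_1`, `prior_sigma_2` and `prior_pi` parameters define the scale mixture prior from Blundell et al., 2015: a mixture of two zero-mean Gaussians, one broad (`sigma_1`) and one narrow (`sigma_2`), mixed with probability `pi`. As a quick illustration of the density involved (a sketch of the formula, not the library's code):

```python
import torch


def scale_mixture_log_prob(w, sigma_1=1.0, sigma_2=0.002, pi=0.8):
    """Log density of the Blundell et al. (2015) scale mixture prior:

    p(w) = pi * N(w; 0, sigma_1^2) + (1 - pi) * N(w; 0, sigma_2^2)

    summed over all weights w.
    """
    n1 = torch.distributions.Normal(0.0, sigma_1)
    n2 = torch.distributions.Normal(0.0, sigma_2)
    p = pi * n1.log_prob(w).exp() + (1 - pi) * n2.log_prob(w).exp()
    return torch.log(p).sum()


w = torch.tensor([0.0, 0.1, -0.5])
lp = scale_mixture_log_prob(w)  # scalar log prior over these weights
```

The narrow component concentrates mass near zero (acting a bit like a sparsity-inducing prior), while the broad one still allows large weights; `posterior_rho_init=-7.0` initializes the variational posterior with a very small standard deviation (softplus(-7) is roughly 1e-3).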
trainer = BayesianTrainer(
model,
objective="binary",
optimizer=torch.optim.Adam(model.parameters(), lr=0.01),
metrics=[Accuracy],
)
trainer.fit(
X_tab=X_tab,
target=target,
val_split=0.2,
n_epochs=2,
batch_size=256,
)
epoch 1: 100%|███████████████████████████████████████████████████████████| 123/123 [00:00<00:00, 124.32it/s, loss=163, metrics={'acc': 0.7813}]
valid: 100%|███████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 238.67it/s, loss=141, metrics={'acc': 0.8219}]
epoch 2: 100%|███████████████████████████████████████████████████████████| 123/123 [00:00<00:00, 132.81it/s, loss=140, metrics={'acc': 0.8285}]
valid: 100%|███████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 190.16it/s, loss=140, metrics={'acc': 0.8298}]
2. BayesianTabMlp¶
tab_preprocessor = TabPreprocessor(
cat_embed_cols=cat_embed_cols, continuous_cols=continuous_cols
)
X_tab = tab_preprocessor.fit_transform(train)
/Users/javierrodriguezzaurin/Projects/pytorch-widedeep/pytorch_widedeep/preprocessing/tab_preprocessor.py:358: UserWarning: Continuous columns will not be normalised warnings.warn("Continuous columns will not be normalised")
model = BayesianTabMlp(
column_idx=tab_preprocessor.column_idx,
cat_embed_input=tab_preprocessor.cat_embed_input,
continuous_cols=continuous_cols,
# embed_continuous_method = "standard",
# cont_embed_activation="leaky_relu",
# cont_embed_dim = 8,
mlp_hidden_dims=[128, 64],
prior_sigma_1=1.0,
prior_sigma_2=0.002,
prior_pi=0.8,
posterior_mu_init=0,
posterior_rho_init=-7.0,
pred_dim=1,
)
trainer = BayesianTrainer(
model,
objective="binary",
optimizer=torch.optim.Adam(model.parameters(), lr=0.01),
metrics=[Accuracy],
)
trainer.fit(
X_tab=X_tab,
target=target,
val_split=0.2,
n_epochs=2,
batch_size=256,
)
epoch 1: 100%|███████████████████████████████████████████████████████████| 123/123 [00:04<00:00, 28.74it/s, loss=2e+3, metrics={'acc': 0.8007}]
valid: 100%|███████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 136.89it/s, loss=1.75e+3, metrics={'acc': 0.8418}]
epoch 2: 100%|████████████████████████████████████████████████████████| 123/123 [00:04<00:00, 29.41it/s, loss=1.73e+3, metrics={'acc': 0.8596}]
valid: 100%|███████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 143.87it/s, loss=1.71e+3, metrics={'acc': 0.8569}]
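Note that the reported loss values are far larger than a plain binary cross-entropy would be. This is expected: in Bayes by Backprop the training objective approximates the negative ELBO, i.e. the data negative log-likelihood plus a complexity term (the gap between the log variational posterior and the log prior) summed over all the network's weights. A toy numeric sketch of that structure (the numbers here are made up for illustration, not taken from the trainer):

```python
import torch


def toy_bayes_by_backprop_loss(nll, log_q, log_prior, n_batches):
    """Schematic per-batch variational objective (Blundell et al., 2015):

    complexity cost, spread across mini-batches, plus the data-fit cost.
    """
    return (log_q - log_prior) / n_batches + nll


loss = toy_bayes_by_backprop_loss(
    nll=torch.tensor(0.35),         # plain cross-entropy for the batch
    log_q=torch.tensor(2500.0),     # log variational posterior, summed over weights
    log_prior=torch.tensor(750.0),  # log prior, summed over weights
    n_batches=123,
)
# the complexity term scales with the number of parameters, so it
# dwarfs the NLL and dominates the reported loss
```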
Beyond their predictive performance, these models are powerful because they give us a sense of the uncertainty associated with each prediction. Let's have a look.
X_tab_test = tab_preprocessor.transform(test)
preds = trainer.predict(X_tab_test, return_samples=True, n_samples=5)
predict: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:01<00:00, 33.92it/s]
preds.shape
(5, 9769)
As we can see, the predictions have shape (5, 9769): one set of predictions for each of the `n_samples` times we internally run predict (i.e. sample the network's weights and predict). This gives us an idea of how certain the model is about a given prediction.
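One simple way to turn such sampled predictions into a single label per observation, while keeping a notion of agreement between samples, is a majority vote across the sample axis. A sketch, using fake sampled 0/1 predictions of shape (n_samples, n_obs) in place of the real `preds` array:

```python
import numpy as np

# fake sampled binary predictions: 5 network samples x 10 observations
rng = np.random.default_rng(0)
sampled_preds = rng.integers(0, 2, size=(5, 10))

# majority vote across samples -> one label per observation
vote_share = sampled_preds.mean(axis=0)
majority = (vote_share >= 0.5).astype(int)

# fraction of samples agreeing with the majority label -> crude
# per-observation confidence in [0.5, 1]
agreement = np.where(majority == 1, vote_share, 1 - vote_share)
```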
Similarly, we could obtain the probabilities
probs = trainer.predict_proba(X_tab_test, return_samples=True, n_samples=5)
predict: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:01<00:00, 32.79it/s]
probs.shape
(5, 9769, 2)
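With the sampled probabilities we can also compute a per-observation mean and standard deviation across samples, where a large standard deviation flags observations the model is uncertain about. A sketch, using fake data shaped (n_samples, n_obs, 2) like `probs` above:

```python
import numpy as np

# fake sampled probabilities: 5 samples x 10 observations x 2 classes
rng = np.random.default_rng(1)
raw = rng.random((5, 10, 2))
sampled_probs = raw / raw.sum(axis=2, keepdims=True)  # each row sums to 1

mean_probs = sampled_probs.mean(axis=0)  # (10, 2) average prediction
std_probs = sampled_probs.std(axis=0)    # (10, 2) disagreement across samples

# observations ordered from most to least uncertain on the positive class
most_uncertain = np.argsort(-std_probs[:, 1])
```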
And we can see how the model performs each time we sample the network:
for p in preds:
    print(accuracy_score(test["income_label"].values, p))
0.8559729757395844
0.8564847988535162
0.8567918927218753
0.8562800696079435
0.8558706111167981