Regression with Images and Text¶
In this notebook we will go through a series of examples on how to combine all Wide & Deep components.
To that end I will use the Airbnb listings dataset for London, which you can download from here. I use this dataset simply because it contains tabular data, images and text.
I have taken a sample of 1000 listings to keep the data tractable in this notebook. I have also preprocessed the data and prepared it for this exercise. All preprocessing steps can be found in the notebook airbnb_data_preprocessing.ipynb in the examples folder.
import numpy as np
import pandas as pd
import os
import torch
from torchvision.transforms import ToTensor, Normalize
from pytorch_widedeep import Trainer
from pytorch_widedeep.preprocessing import (
WidePreprocessor,
TabPreprocessor,
TextPreprocessor,
ImagePreprocessor,
)
from pytorch_widedeep.models import (
Wide,
TabMlp,
Vision,
BasicRNN,
WideDeep,
)
from pytorch_widedeep.losses import RMSELoss
from pytorch_widedeep.initializers import *
from pytorch_widedeep.callbacks import *
df = pd.read_csv("../tmp_data/airbnb/airbnb_sample.csv")
df.head()
| | id | host_id | description | host_listings_count | host_identity_verified | neighbourhood_cleansed | latitude | longitude | is_location_exact | property_type | ... | amenity_wide_entrance | amenity_wide_entrance_for_guests | amenity_wide_entryway | amenity_wide_hallways | amenity_wifi | amenity_window_guards | amenity_wine_cooler | security_deposit | extra_people | yield |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 13913.jpg | 54730 | My bright double bedroom with a large window h... | 4.0 | f | Islington | 51.56802 | -0.11121 | t | apartment | ... | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 100.0 | 15.0 | 12.00 |
1 | 15400.jpg | 60302 | Lots of windows and light. St Luke's Gardens ... | 1.0 | t | Kensington and Chelsea | 51.48796 | -0.16898 | t | apartment | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 150.0 | 0.0 | 109.50 |
2 | 17402.jpg | 67564 | Open from June 2018 after a 3-year break, we a... | 19.0 | t | Westminster | 51.52098 | -0.14002 | t | apartment | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 350.0 | 10.0 | 149.65 |
3 | 24328.jpg | 41759 | Artist house, bright high ceiling rooms, priva... | 2.0 | t | Wandsworth | 51.47298 | -0.16376 | t | other | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 250.0 | 0.0 | 215.60 |
4 | 25023.jpg | 102813 | Large, all comforts, 2-bed flat; first floor; ... | 1.0 | f | Wandsworth | 51.44687 | -0.21874 | t | apartment | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 250.0 | 11.0 | 79.35 |
5 rows × 223 columns
Regression with the defaults¶
The set up
# There are a number of columns that are already binary, so there is no need to one-hot encode them
crossed_cols = [("property_type", "room_type")]
already_dummies = [c for c in df.columns if "amenity" in c] + ["has_house_rules"]
wide_cols = [
"is_location_exact",
"property_type",
"room_type",
"host_gender",
"instant_bookable",
] + already_dummies
cat_embed_cols = [(c, 16) for c in df.columns if "catg" in c] + [
("neighbourhood_cleansed", 64),
("cancellation_policy", 16),
]
continuous_cols = ["latitude", "longitude", "security_deposit", "extra_people"]
# text and image colnames
text_col = "description"
img_col = "id"
# path to pretrained word embeddings and the images
word_vectors_path = "../tmp_data/glove.6B/glove.6B.100d.txt"
img_path = "../tmp_data/airbnb/property_picture"
# target
target_col = "yield"
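Before moving on, it is worth being clear about what the crossed columns are: a cross is just the co-occurrence of two categorical columns, treated as a new categorical feature. A minimal pandas sketch with toy values (hypothetical, not taken from the Airbnb sample):

```python
import pandas as pd

# Toy data to illustrate what crossing ("property_type", "room_type") means
toy = pd.DataFrame(
    {
        "property_type": ["apartment", "house", "apartment"],
        "room_type": ["private", "entire", "entire"],
    }
)

# The cross is simply the concatenation of the two columns, which is then
# label-encoded like any other categorical column
toy["property_type_room_type"] = toy["property_type"] + "_" + toy["room_type"]
codes = toy["property_type_room_type"].astype("category").cat.codes
print(codes.tolist())  # → [1, 2, 0]
```

The `WidePreprocessor` below takes care of this (plus the one-hot/label encoding) for you.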
Prepare the data¶
I will focus here on how to prepare the data and run the model. Check notebooks 1 and 2 to see what's going on behind the scenes.
Preparing the data is rather simple:
target = df[target_col].values
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = wide_preprocessor.fit_transform(df)
tab_preprocessor = TabPreprocessor(
cat_embed_cols=cat_embed_cols,
continuous_cols=continuous_cols,
)
X_tab = tab_preprocessor.fit_transform(df)
/Users/javierrodriguezzaurin/Projects/pytorch-widedeep/pytorch_widedeep/preprocessing/tab_preprocessor.py:358: UserWarning: Continuous columns will not be normalised warnings.warn("Continuous columns will not be normalised")
text_preprocessor = TextPreprocessor(
word_vectors_path=word_vectors_path, text_col=text_col
)
X_text = text_preprocessor.fit_transform(df)
The vocabulary contains 2192 tokens Indexing word vectors... Loaded 400000 word vectors Preparing embeddings matrix... 2175 words in the vocabulary had ../tmp_data/glove.6B/glove.6B.100d.txt vectors and appear more than 5 times
image_processor = ImagePreprocessor(img_col=img_col, img_path=img_path)
X_images = image_processor.fit_transform(df)
Reading Images from ../tmp_data/airbnb/property_picture Resizing
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 1001/1001 [00:01<00:00, 638.00it/s]
Computing normalisation metrics
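"Computing normalisation metrics" simply means computing the per-channel mean and standard deviation over the resized images, which are later used to normalise them. A numpy sketch with fake image data (assuming pixel values already scaled to [0, 1]):

```python
import numpy as np

# Fake batch of 4 RGB images of size 8x8, values in [0, 1]
# (a stand-in for the resized Airbnb property pictures)
rng = np.random.default_rng(0)
imgs = rng.random((4, 8, 8, 3))

# Per-channel mean and std over all images and pixels: these are the
# "normalisation metrics" that a Normalize transform would consume
channel_mean = imgs.mean(axis=(0, 1, 2))
channel_std = imgs.std(axis=(0, 1, 2))
print(channel_mean.shape, channel_std.shape)  # → (3,) (3,)
```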
Build the model components¶
# Linear model
wide = Wide(input_dim=np.unique(X_wide).shape[0], pred_dim=1)
# deeptabular: an MLP with 2 dense layers
tab_mlp = TabMlp(
column_idx=tab_preprocessor.column_idx,
cat_embed_input=tab_preprocessor.cat_embed_input,
continuous_cols=continuous_cols,
mlp_hidden_dims=[128, 64],
mlp_dropout=0.1,
)
# DeepText: a stack of 2 LSTMs
basic_rnn = BasicRNN(
vocab_size=len(text_preprocessor.vocab.itos),
embed_matrix=text_preprocessor.embedding_matrix,
n_layers=2,
hidden_dim=64,
rnn_dropout=0.5,
)
# Pretrained Resnet 18
resnet = Vision(pretrained_model_setup="resnet18", n_trainable=4)
Combine them all with the "collector" class WideDeep
model = WideDeep(
wide=wide,
deeptabular=tab_mlp,
deeptext=basic_rnn,
deepimage=resnet,
head_hidden_dims=[256, 128],
)
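Schematically, when `head_hidden_dims` is passed, `WideDeep` concatenates the outputs of the deep components, runs them through the MLP head, and adds the wide component's linear output to the result. A toy numpy sketch of that forward pass (random stand-in values and weights, not the library's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in outputs of each deep component for one sample: the tabular and
# text components emit 64-dim vectors here, the image backbone a 512-dim one
tab_out = rng.random(64)
text_out = rng.random(64)
img_out = rng.random(512)

# head_hidden_dims=[256, 128]: the concatenated deep outputs go through an
# MLP head that ends in a single output unit (random stand-in weights)
concat = np.concatenate([tab_out, text_out, img_out])
h1 = np.maximum(rng.random((256, 640)) @ concat, 0.0)  # dense + ReLU
h2 = np.maximum(rng.random((128, 256)) @ h1, 0.0)      # dense + ReLU
deep_pred = rng.random((1, 128)) @ h2                  # final output unit

# The wide (linear) component's logit is simply added to the head output
wide_pred = np.array([0.5])  # stand-in wide logit
prediction = wide_pred + deep_pred
print(concat.shape, prediction.shape)  # → (640,) (1,)
```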
Build the trainer and fit¶
trainer = Trainer(model, objective="rmse")
trainer.fit(
X_wide=X_wide,
X_tab=X_tab,
X_text=X_text,
X_img=X_images,
target=target,
n_epochs=1,
batch_size=32,
val_split=0.2,
)
epoch 1: 100%|███████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:19<00:00, 1.28it/s, loss=115] valid: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:04<00:00, 1.62it/s, loss=94.1]
Both the text and the image components allow FC heads of their own (have a look at the documentation).
Now let's go "kaggle crazy". Let's use different optimizers, initializers and schedulers for the different components. Moreover, let's use different learning rates for different parameter groups within the deeptabular component.
deep_params = []
for childname, child in model.named_children():
if childname == "deeptabular":
for n, p in child.named_parameters():
if "embed_layer" in n:
deep_params.append({"params": p, "lr": 1e-4})
else:
deep_params.append({"params": p, "lr": 1e-3})
wide_opt = torch.optim.Adam(model.wide.parameters(), lr=0.03)
deep_opt = torch.optim.Adam(deep_params)
text_opt = torch.optim.AdamW(model.deeptext.parameters())
img_opt = torch.optim.AdamW(model.deepimage.parameters())
head_opt = torch.optim.Adam(model.deephead.parameters())
wide_sch = torch.optim.lr_scheduler.StepLR(wide_opt, step_size=5)
deep_sch = torch.optim.lr_scheduler.MultiStepLR(deep_opt, milestones=[3, 8])
text_sch = torch.optim.lr_scheduler.StepLR(text_opt, step_size=5)
img_sch = torch.optim.lr_scheduler.MultiStepLR(img_opt, milestones=[3, 8])
head_sch = torch.optim.lr_scheduler.StepLR(head_opt, step_size=5)
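For reference, `StepLR` with `step_size=5` multiplies the learning rate by `gamma` (0.1 by default) every 5 epochs, and `MultiStepLR` does so at each of the given milestones. The schedules above can be reproduced with plain Python arithmetic:

```python
# Plain-Python reproduction of the decay rules used by StepLR / MultiStepLR
# (gamma defaults to 0.1 in torch.optim.lr_scheduler)
def step_lr(base_lr, epoch, step_size, gamma=0.1):
    return base_lr * gamma ** (epoch // step_size)

def multi_step_lr(base_lr, epoch, milestones, gamma=0.1):
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

# The wide optimizer starts at 0.03 and decays at epochs 5, 10, ...
print([step_lr(0.03, e, 5) for e in range(7)])
# A 1e-3 group with milestones [3, 8] decays at epochs 3 and 8
print([multi_step_lr(0.001, e, [3, 8]) for e in range(10)])
```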
# remember: one optimizer per model component; for lr schedulers and initializers this is not necessary
optimizers = {
"wide": wide_opt,
"deeptabular": deep_opt,
"deeptext": text_opt,
"deepimage": img_opt,
"deephead": head_opt,
}
schedulers = {
"wide": wide_sch,
"deeptabular": deep_sch,
"deeptext": text_sch,
"deepimage": img_sch,
"deephead": head_sch,
}
# Now... we have used pretrained word embeddings, so we do not want to
# initialise these embeddings. However, you might still want to initialise
# the other layers in the deeptext component. No problem: you can do that
# with the pattern parameter and your knowledge of regular expressions.
# Here we are telling the KaimingNormal initializer NOT to touch the
# parameters whose name contains the string word_embed.
initializers = {
"wide": KaimingNormal,
"deeptabular": KaimingNormal,
"deeptext": KaimingNormal(pattern=r"^(?!.*word_embed).*$"),
"deepimage": KaimingNormal,
}
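The pattern above uses a negative lookahead: it matches any parameter name that does not contain `word_embed`. A quick check with the standard `re` module (the parameter names other than `word_embed.weight` are hypothetical examples):

```python
import re

# The pattern passed to KaimingNormal above: match any parameter name that
# does NOT contain the string "word_embed"
pattern = re.compile(r"^(?!.*word_embed).*$")

param_names = [
    "word_embed.weight",   # pretrained embeddings: must NOT match
    "rnn.weight_ih_l0",    # hypothetical RNN weight name: should match
    "rnn.bias_hh_l0",      # hypothetical RNN bias name: should match
]
matched = [n for n in param_names if pattern.match(n)]
print(matched)  # → ['rnn.weight_ih_l0', 'rnn.bias_hh_l0']
```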
mean = [0.406, 0.456, 0.485] # BGR
std = [0.225, 0.224, 0.229] # BGR
transforms = [ToTensor, Normalize(mean=mean, std=std)]
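The mean and std above are the standard ImageNet RGB statistics reversed into BGR channel order (the images are read with OpenCV, which loads in BGR). `Normalize` then subtracts the per-channel mean and divides by the per-channel std; a quick numpy check of that arithmetic:

```python
import numpy as np

mean = np.array([0.406, 0.456, 0.485])  # ImageNet stats in BGR order
std = np.array([0.225, 0.224, 0.229])   # ImageNet stats in BGR order

# A single "pixel" that sits exactly at the channel means normalises to zeros
pixel = mean.copy()
normalised = (pixel - mean) / std
print(normalised)  # → [0. 0. 0.]
```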
callbacks = [
LRHistory(n_epochs=10),
EarlyStopping,
ModelCheckpoint(filepath="model_weights/wd_out"),
]
trainer = Trainer(
model,
objective="rmse",
initializers=initializers,
optimizers=optimizers,
lr_schedulers=schedulers,
callbacks=callbacks,
transforms=transforms,
)
/Users/javierrodriguezzaurin/Projects/pytorch-widedeep/pytorch_widedeep/initializers.py:34: UserWarning: No initializer found for deephead warnings.warn(
trainer.fit(
X_wide=X_wide,
X_tab=X_tab,
X_text=X_text,
X_img=X_images,
target=target,
n_epochs=1,
batch_size=32,
val_split=0.2,
)
epoch 1: 100%|███████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:19<00:00, 1.25it/s, loss=101] valid: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:04<00:00, 1.62it/s, loss=90.6]
Model weights after training corresponds to the those of the final epoch which might not be the best performing weights. Use the 'ModelCheckpoint' Callback to restore the best epoch weights.
We have only run one epoch, but let's check that the LRHistory callback records the lr values for each parameter group.
trainer.lr_history
{'lr_wide_0': [0.03, 0.03], 'lr_deeptabular_0': [0.0001, 0.0001], 'lr_deeptabular_1': [0.0001, 0.0001], 'lr_deeptabular_2': [0.0001, 0.0001], 'lr_deeptabular_3': [0.0001, 0.0001], 'lr_deeptabular_4': [0.0001, 0.0001], 'lr_deeptabular_5': [0.0001, 0.0001], 'lr_deeptabular_6': [0.0001, 0.0001], 'lr_deeptabular_7': [0.0001, 0.0001], 'lr_deeptabular_8': [0.0001, 0.0001], 'lr_deeptabular_9': [0.001, 0.001], 'lr_deeptabular_10': [0.001, 0.001], 'lr_deeptabular_11': [0.001, 0.001], 'lr_deeptabular_12': [0.001, 0.001], 'lr_deeptext_0': [0.001, 0.001], 'lr_deepimage_0': [0.001, 0.001], 'lr_deephead_0': [0.001, 0.001]}