Regression with Images and Text¶
In this notebook we will go through a series of examples on how to combine all Wide & Deep components.
To that end I will use the Airbnb listings dataset for London, which you can download from here. I use this dataset simply because it contains tabular data, images and text.
I have taken a sample of 1000 listings to keep the data tractable in this notebook. I have also preprocessed the data and prepared it for this exercise. All preprocessing steps can be found in the notebook airbnb_data_preprocessing.ipynb in the examples folder.
import numpy as np
import pandas as pd
import os
import torch
from torchvision.transforms import ToTensor, Normalize
from pytorch_widedeep import Trainer
from pytorch_widedeep.preprocessing import (
WidePreprocessor,
TabPreprocessor,
TextPreprocessor,
ImagePreprocessor,
)
from pytorch_widedeep.models import (
Wide,
TabMlp,
Vision,
BasicRNN,
WideDeep,
)
from pytorch_widedeep.losses import RMSELoss
from pytorch_widedeep.initializers import *
from pytorch_widedeep.callbacks import *
df = pd.read_csv("../tmp_data/airbnb/airbnb_sample.csv")
df.head()
| | id | host_id | description | host_listings_count | host_identity_verified | neighbourhood_cleansed | latitude | longitude | is_location_exact | property_type | ... | amenity_wide_entrance | amenity_wide_entrance_for_guests | amenity_wide_entryway | amenity_wide_hallways | amenity_wifi | amenity_window_guards | amenity_wine_cooler | security_deposit | extra_people | yield |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 13913.jpg | 54730 | My bright double bedroom with a large window h... | 4.0 | f | Islington | 51.56802 | -0.11121 | t | apartment | ... | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 100.0 | 15.0 | 12.00 |
1 | 15400.jpg | 60302 | Lots of windows and light. St Luke's Gardens ... | 1.0 | t | Kensington and Chelsea | 51.48796 | -0.16898 | t | apartment | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 150.0 | 0.0 | 109.50 |
2 | 17402.jpg | 67564 | Open from June 2018 after a 3-year break, we a... | 19.0 | t | Westminster | 51.52098 | -0.14002 | t | apartment | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 350.0 | 10.0 | 149.65 |
3 | 24328.jpg | 41759 | Artist house, bright high ceiling rooms, priva... | 2.0 | t | Wandsworth | 51.47298 | -0.16376 | t | other | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 250.0 | 0.0 | 215.60 |
4 | 25023.jpg | 102813 | Large, all comforts, 2-bed flat; first floor; ... | 1.0 | f | Wandsworth | 51.44687 | -0.21874 | t | apartment | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 250.0 | 11.0 | 79.35 |
5 rows × 223 columns
Regression with the defaults¶
The set up
# There are a number of columns that are already binary, so there is no need to one-hot encode them
crossed_cols = [("property_type", "room_type")]
already_dummies = [c for c in df.columns if "amenity" in c] + ["has_house_rules"]
wide_cols = [
"is_location_exact",
"property_type",
"room_type",
"host_gender",
"instant_bookable",
] + already_dummies
cat_embed_cols = [(c, 16) for c in df.columns if "catg" in c] + [
("neighbourhood_cleansed", 64),
("cancellation_policy", 16),
]
continuous_cols = ["latitude", "longitude", "security_deposit", "extra_people"]
# text and image colnames
text_col = "description"
img_col = "id"
# path to pretrained word embeddings and the images
word_vectors_path = "../tmp_data/glove.6B/glove.6B.100d.txt"
img_path = "../tmp_data/airbnb/property_picture"
# target
target_col = "yield"
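Before moving on, it is worth being clear about what the crossed columns are: a cross is just the co-occurrence of two categorical columns, treated as a new categorical feature. A minimal pandas sketch with toy values (hypothetical, not taken from the Airbnb sample):

```python
import pandas as pd

# Toy data to illustrate what crossing ("property_type", "room_type") means
toy = pd.DataFrame(
    {
        "property_type": ["apartment", "house", "apartment"],
        "room_type": ["private", "entire", "entire"],
    }
)

# The cross is simply the concatenation of the two columns, which is then
# label-encoded like any other categorical column
toy["property_type_room_type"] = toy["property_type"] + "_" + toy["room_type"]
codes = toy["property_type_room_type"].astype("category").cat.codes
print(codes.tolist())  # → [1, 2, 0]
```

The `WidePreprocessor` below takes care of this (plus the one-hot/label encoding) for you.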
Prepare the data¶
I will focus here on how to prepare the data and run the model. Check notebooks 1 and 2 to see what's going on behind the scenes.
Preparing the data is rather simple:
target = df[target_col].values
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = wide_preprocessor.fit_transform(df)
tab_preprocessor = TabPreprocessor(
cat_embed_cols=cat_embed_cols,
continuous_cols=continuous_cols,
)
X_tab = tab_preprocessor.fit_transform(df)
/Users/javierrodriguezzaurin/Projects/pytorch-widedeep/pytorch_widedeep/preprocessing/tab_preprocessor.py:358: UserWarning: Continuous columns will not be normalised warnings.warn("Continuous columns will not be normalised")
text_preprocessor = TextPreprocessor(
word_vectors_path=word_vectors_path, text_col=text_col
)
X_text = text_preprocessor.fit_transform(df)
The vocabulary contains 2192 tokens Indexing word vectors... Loaded 400000 word vectors Preparing embeddings matrix... 2175 words in the vocabulary had ../tmp_data/glove.6B/glove.6B.100d.txt vectors and appear more than 5 times
image_processor = ImagePreprocessor(img_col=img_col, img_path=img_path)
X_images = image_processor.fit_transform(df)
Reading Images from ../tmp_data/airbnb/property_picture Resizing
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 1001/1001 [00:01<00:00, 638.00it/s]
Computing normalisation metrics
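"Computing normalisation metrics" simply means computing the per-channel mean and standard deviation over the resized images, which are later used to normalise them. A numpy sketch with fake image data (assuming pixel values already scaled to [0, 1]):

```python
import numpy as np

# Fake batch of 4 RGB images of size 8x8, values in [0, 1]
# (a stand-in for the resized Airbnb property pictures)
rng = np.random.default_rng(0)
imgs = rng.random((4, 8, 8, 3))

# Per-channel mean and std over all images and pixels: these are the
# "normalisation metrics" that a Normalize transform would consume
channel_mean = imgs.mean(axis=(0, 1, 2))
channel_std = imgs.std(axis=(0, 1, 2))
print(channel_mean.shape, channel_std.shape)  # → (3,) (3,)
```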
Build the model components¶
# Linear model
wide = Wide(input_dim=np.unique(X_wide).shape[0], pred_dim=1)
# deeptabular: an MLP with 2 dense layers
tab_mlp = TabMlp(
column_idx=tab_preprocessor.column_idx,
cat_embed_input=tab_preprocessor.cat_embed_input,
continuous_cols=continuous_cols,
mlp_hidden_dims=[128, 64],
mlp_dropout=0.1,
)
# DeepText: a stack of 2 LSTMs
basic_rnn = BasicRNN(
vocab_size=len(text_preprocessor.vocab.itos),
embed_matrix=text_preprocessor.embedding_matrix,
n_layers=2,
hidden_dim=64,
rnn_dropout=0.5,
)
# Pretrained Resnet 18
resnet = Vision(pretrained_model_setup="resnet18", n_trainable=4)
Combine them all with the "collector" class WideDeep
model = WideDeep(
wide=wide,
deeptabular=tab_mlp,
deeptext=basic_rnn,
deepimage=resnet,
head_hidden_dims=[256, 128],
)
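Schematically, when `head_hidden_dims` is passed, `WideDeep` concatenates the outputs of the deep components, runs them through the MLP head, and adds the wide component's linear output to the result. A toy numpy sketch of that forward pass (random stand-in values and weights, not the library's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in outputs of each deep component for one sample: the tabular and
# text components emit 64-dim vectors here, the image backbone a 512-dim one
tab_out = rng.random(64)
text_out = rng.random(64)
img_out = rng.random(512)

# head_hidden_dims=[256, 128]: the concatenated deep outputs go through an
# MLP head that ends in a single output unit (random stand-in weights)
concat = np.concatenate([tab_out, text_out, img_out])
h1 = np.maximum(rng.random((256, 640)) @ concat, 0.0)  # dense + ReLU
h2 = np.maximum(rng.random((128, 256)) @ h1, 0.0)      # dense + ReLU
deep_pred = rng.random((1, 128)) @ h2                  # final output unit

# The wide (linear) component's logit is simply added to the head output
wide_pred = np.array([0.5])  # stand-in wide logit
prediction = wide_pred + deep_pred
print(concat.shape, prediction.shape)  # → (640,) (1,)
```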
Build the trainer and fit¶
trainer = Trainer(model, objective="rmse")
trainer.fit(
X_wide=X_wide,
X_tab=X_tab,
X_text=X_text,
X_img=X_images,
target=target,
n_epochs=1,
batch_size=32,
val_split=0.2,
)
epoch 1: 100%|███████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:19<00:00, 1.28it/s, loss=115] valid: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:04<00:00, 1.62it/s, loss=94.1]
Both the text and the image components allow FC heads of their own (have a look at the documentation).
Now let's go "kaggle crazy". Let's use different optimizers, initializers and schedulers for the different components. Moreover, let's use different learning rates for different parameter groups within the deeptabular component.
deep_params = []
for childname, child in model.named_children():
if childname == "deeptabular":
for n, p in child.named_parameters():
if "embed_layer" in n:
deep_params.append({"params": p, "lr": 1e-4})
else:
deep_params.append({"params": p, "lr": 1e-3})
wide_opt = torch.optim.Adam(model.wide.parameters(), lr=0.03)
deep_opt = torch.optim.Adam(deep_params)
text_opt = torch.optim.AdamW(model.deeptext.parameters())
img_opt = torch.optim.AdamW(model.deepimage.parameters())
head_opt = torch.optim.Adam(model.deephead.parameters())
wide_sch = torch.optim.lr_scheduler.StepLR(wide_opt, step_size=5)
deep_sch = torch.optim.lr_scheduler.MultiStepLR(deep_opt, milestones=[3, 8])
text_sch = torch.optim.lr_scheduler.StepLR(text_opt, step_size=5)
img_sch = torch.optim.lr_scheduler.MultiStepLR(img_opt, milestones=[3, 8])
head_sch = torch.optim.lr_scheduler.StepLR(head_opt, step_size=5)
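For reference, `StepLR` with `step_size=5` multiplies the learning rate by `gamma` (0.1 by default) every 5 epochs, and `MultiStepLR` does so at each of the given milestones. The schedules above can be reproduced with plain Python arithmetic:

```python
# Plain-Python reproduction of the decay rules used by StepLR / MultiStepLR
# (gamma defaults to 0.1 in torch.optim.lr_scheduler)
def step_lr(base_lr, epoch, step_size, gamma=0.1):
    return base_lr * gamma ** (epoch // step_size)

def multi_step_lr(base_lr, epoch, milestones, gamma=0.1):
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

# The wide optimizer starts at 0.03 and decays at epochs 5, 10, ...
print([step_lr(0.03, e, 5) for e in range(7)])
# A 1e-3 group with milestones [3, 8] decays at epochs 3 and 8
print([multi_step_lr(0.001, e, [3, 8]) for e in range(10)])
```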
# remember: one optimizer per model component; for lr schedulers and initializers this is not necessary
optimizers = {
"wide": wide_opt,
"deeptabular": deep_opt,
"deeptext": text_opt,
"deepimage": img_opt,
"deephead": head_opt,
}
schedulers = {
"wide": wide_sch,
"deeptabular": deep_sch,
"deeptext": text_sch,
"deepimage": img_sch,
"deephead": head_sch,
}
# Now... we have used pretrained word embeddings, so we do not want to
# initialise these embeddings. However, you might still want to initialise
# the other layers in the deeptext component. No problem: you can do that
# with the pattern parameter and your knowledge of regular expressions.
# Here we are telling the KaimingNormal initializer NOT to touch the
# parameters whose name contains the string word_embed.
initializers = {
"wide": KaimingNormal,
"deeptabular": KaimingNormal,
"deeptext": KaimingNormal(pattern=r"^(?!.*word_embed).*$"),
"deepimage": KaimingNormal,
}
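The pattern above uses a negative lookahead: it matches any parameter name that does not contain `word_embed`. A quick check with the standard `re` module (the parameter names other than `word_embed.weight` are hypothetical examples):

```python
import re

# The pattern passed to KaimingNormal above: match any parameter name that
# does NOT contain the string "word_embed"
pattern = re.compile(r"^(?!.*word_embed).*$")

param_names = [
    "word_embed.weight",   # pretrained embeddings: must NOT match
    "rnn.weight_ih_l0",    # hypothetical RNN weight name: should match
    "rnn.bias_hh_l0",      # hypothetical RNN bias name: should match
]
matched = [n for n in param_names if pattern.match(n)]
print(matched)  # → ['rnn.weight_ih_l0', 'rnn.bias_hh_l0']
```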
mean = [0.406, 0.456, 0.485] # BGR
std = [0.225, 0.224, 0.229] # BGR
transforms = [ToTensor, Normalize(mean=mean, std=std)]
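The mean and std above are the standard ImageNet RGB statistics reversed into BGR channel order (the images are read with OpenCV, which loads in BGR). `Normalize` then subtracts the per-channel mean and divides by the per-channel std; a quick numpy check of that arithmetic:

```python
import numpy as np

mean = np.array([0.406, 0.456, 0.485])  # ImageNet stats in BGR order
std = np.array([0.225, 0.224, 0.229])   # ImageNet stats in BGR order

# A single "pixel" that sits exactly at the channel means normalises to zeros
pixel = mean.copy()
normalised = (pixel - mean) / std
print(normalised)  # → [0. 0. 0.]
```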
callbacks = [
LRHistory(n_epochs=10),
EarlyStopping,
ModelCheckpoint(filepath="model_weights/wd_out"),
]
trainer = Trainer(
model,
objective="rmse",
initializers=initializers,
optimizers=optimizers,
lr_schedulers=schedulers,
callbacks=callbacks,
transforms=transforms,
)
/Users/javierrodriguezzaurin/Projects/pytorch-widedeep/pytorch_widedeep/initializers.py:34: UserWarning: No initializer found for deephead warnings.warn(
trainer.fit(
X_wide=X_wide,
X_tab=X_tab,
X_text=X_text,
X_img=X_images,
target=target,
n_epochs=1,
batch_size=32,
val_split=0.2,
)
epoch 1: 100%|███████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:19<00:00, 1.25it/s, loss=101] valid: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:04<00:00, 1.62it/s, loss=90.6]
Model weights after training corresponds to the those of the final epoch which might not be the best performing weights. Use the 'ModelCheckpoint' Callback to restore the best epoch weights.
We have only run one epoch, but let's check that the LRHistory callback records the lr values for each parameter group.
trainer.lr_history
{'lr_wide_0': [0.03, 0.03], 'lr_deeptabular_0': [0.0001, 0.0001], 'lr_deeptabular_1': [0.0001, 0.0001], 'lr_deeptabular_2': [0.0001, 0.0001], 'lr_deeptabular_3': [0.0001, 0.0001], 'lr_deeptabular_4': [0.0001, 0.0001], 'lr_deeptabular_5': [0.0001, 0.0001], 'lr_deeptabular_6': [0.0001, 0.0001], 'lr_deeptabular_7': [0.0001, 0.0001], 'lr_deeptabular_8': [0.0001, 0.0001], 'lr_deeptabular_9': [0.001, 0.001], 'lr_deeptabular_10': [0.001, 0.001], 'lr_deeptabular_11': [0.001, 0.001], 'lr_deeptabular_12': [0.001, 0.001], 'lr_deeptext_0': [0.001, 0.001], 'lr_deepimage_0': [0.001, 0.001], 'lr_deephead_0': [0.001, 0.001]}