PyTorch: Style Transfer
- March 26, 2020
- Notes

Author | Joseph Nelson
Source | Medium
Editor | Code Doctor Team
In this post, we will recreate, in PyTorch, the style-transfer method laid out in the paper "Image Style Transfer Using Convolutional Neural Networks".
Image style transfer is a technique that aims to render the content of one image in the style of another. It is compelling for both practical and scientific reasons: style-transfer techniques are widely used in image-processing applications such as mobile camera filters and creative image generation.
In this article, a pretrained 19-layer VGG (Visual Geometry Group) network is used to carry out the style-transfer task. The VGG network consists of a series of convolutional, pooling, and fully connected layers. In the figure below, the convolutional layers are named by stack and by their order within the stack: for example, conv1_1 is the first convolutional layer in the first stack, and conv2_1 is the first convolutional layer in the second stack. The deepest convolutional layer in the network is conv5_4.
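To see how these stack-based names line up with the numeric indices PyTorch assigns, you can simply print the VGG19 feature extractor. A minimal sketch, assuming torchvision is installed (the vgg_preview name is just for illustration):

from torchvision import models

# printing the Sequential module lists each layer with its numeric index;
# indices 0, 5, 10, 19, 21, and 28 are the conv layers used later on
vgg_preview = models.vgg19(pretrained=True).features
print(vgg_preview)

These numeric indices are what the get_features() function further down maps back to the paper-style names.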

For the style-transfer task, we first need two images:
- Content image - the image whose content you want to keep and restyle
- Style image - carries the style, colors, and textures to apply to the content image
These content and style images are used to create a new target image that combines the properties of both. Here, a beautiful picture of Gal Gadot serves as the content image and an abstract art design as the style image. The goal is to apply the style image to the Gal Gadot picture and obtain an artistic sketch portrait of Gal Gadot.

Starting the Magic in PyTorch
We will use the pretrained VGG19 network to extract content and style features. We will then formalize the notions of content loss and style loss and apply them to iteratively update the target image until we get the desired result. First, import the necessary resources for the model.
# import resources
%matplotlib inline

from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
import numpy as np

import torch
import torch.optim as optim
import requests
from torchvision import transforms, models
Loading the Pretrained VGG Model
PyTorch's pretrained VGG19 model has two parts: vgg19.features contains the convolutional and pooling layers, while vgg19.classifier holds the three fully connected classifier layers. Only vgg19.features is needed to extract the content and style features of an image, so we load that portion and freeze its weights.
# get the "features" portion of VGG19 (we will not need the "classifier" portion)
vgg = models.vgg19(pretrained=True).features

# freeze all VGG parameters since we're only optimizing the target image
for param in vgg.parameters():
    param.requires_grad_(False)

# move the model to GPU, if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vgg.to(device)
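As an optional sanity check, you can confirm that every VGG parameter is now frozen:

# should print True: no VGG parameter will receive gradient updates
print(all(not p.requires_grad for p in vgg.parameters()))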
Below are helper functions for loading the content and style images and converting them into normalized tensors. They also resize the style image to match the content image, so both end up with the same dimensions. Provide the appropriate image locations to these functions.
def load_image(img_path, max_size=400, shape=None):
    ''' Load in and transform an image, making sure the image
        is <= 400 pixels in the x-y dims.'''
    if "http" in img_path:
        response = requests.get(img_path)
        image = Image.open(BytesIO(response.content)).convert('RGB')
    else:
        image = Image.open(img_path).convert('RGB')

    # large images will slow down processing
    if max(image.size) > max_size:
        size = max_size
    else:
        size = max(image.size)

    if shape is not None:
        size = shape

    in_transform = transforms.Compose([
        transforms.Resize(size),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406),
                             (0.229, 0.224, 0.225))])

    # discard the transparent, alpha channel (that's the :3) and add the batch dimension
    image = in_transform(image)[:3, :, :].unsqueeze(0)

    return image


# helper function for un-normalizing an image
# and converting it from a Tensor image to a NumPy image for display
def im_convert(tensor):
    """ Display a tensor as an image. """
    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1, 2, 0)
    image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))
    image = image.clip(0, 1)
    return image


# load in content and style image
content = load_image('/PATH_TO/content.jpg').to(device)
# Resize style to match content, makes code easier
style = load_image('/PATH_TO/style.jpg', shape=content.shape[-2:]).to(device)
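To confirm the two images loaded and resized as expected, a quick side-by-side display using the im_convert() helper might look like this (the figure size is arbitrary):

# show the content and style images next to each other
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
ax1.imshow(im_convert(content))
ax1.set_title('Content Image')
ax2.imshow(im_convert(style))
ax2.set_title('Style Image')
plt.show()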
Features and the Gram Matrix
To obtain the content and style representations of an image, we pass the image forward through the VGG19 network until we reach the desired layers and take the outputs from those layers. Following the style-transfer paper, conv1_1 (layer 0), conv2_1 (layer 5), conv3_1 (layer 10), conv4_1 (layer 19), and conv5_1 (layer 28) are used for the style representation, and conv4_2 (layer 21) is used for the content representation. The function get_features() below does exactly this.

To put the style representation into the form needed to train our model, we need something called a Gram matrix; the function gram_matrix() below computes it. We use get_features() to extract the content and style features, then compute the Gram matrix for each layer of the style representation.

Finally, we create a "target" image that will combine the style and content representations: we copy the content image as a starting point, then iteratively change its style.
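For reference, if a layer's feature map with d channels and spatial size h x w is flattened into a matrix F of shape d x (h*w), one row per channel (exactly what gram_matrix() below does), the Gram matrix is simply

G = F F^{\top}, \qquad G_{ij} = \sum_{k} F_{ik} F_{jk}

Each entry G_ij is the correlation between channels i and j, which is why the Gram matrix captures texture and style while discarding spatial arrangement.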
def get_features(image, model, layers=None):
    """ Run an image forward through a model and get the features for
        a set of layers. Default layers are for VGGNet matching Gatys et al (2016)
    """
    ## The mapping of PyTorch's VGGNet layer names to names from the paper
    ## Need the layers for the content and style representations of an image
    if layers is None:
        layers = {'0': 'conv1_1',
                  '5': 'conv2_1',
                  '10': 'conv3_1',
                  '19': 'conv4_1',
                  '21': 'conv4_2',  ## content representation
                  '28': 'conv5_1'}

    features = {}
    x = image
    # model._modules is a dictionary holding each module in the model
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x

    return features


def gram_matrix(tensor):
    """ Calculate the Gram Matrix of a given tensor
        Gram Matrix: https://en.wikipedia.org/wiki/Gramian_matrix
    """
    ## get the batch_size, depth, height, and width of the Tensor,
    ## reshape it so we're multiplying the features for each channel,
    ## then calculate the gram matrix
    _, d, h, w = tensor.size()
    matrix1 = tensor.view(d, h * w)
    matrix2 = matrix1.t()
    gram = torch.mm(matrix1, matrix2)
    return gram


# get content and style features only once before forming the target image
content_features = get_features(content, vgg)
style_features = get_features(style, vgg)

# calculate the gram matrices for each layer of our style representation
style_grams = {layer: gram_matrix(style_features[layer]) for layer in style_features}

# create a third "target" image and prep it for change
# it is a good idea to start off with the target as a copy of our *content* image
# then iteratively change its style
target = content.clone().requires_grad_(True).to(device)
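As a quick illustration of the shapes involved (the input tensor here is made up for the example), a feature map with 64 channels yields a 64 x 64 Gram matrix regardless of its spatial size:

# hypothetical feature map: batch of 1, 64 channels, 50 x 40 spatial dims
dummy = torch.randn(1, 64, 50, 40)
print(gram_matrix(dummy).shape)  # torch.Size([64, 64])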
Per-Layer Style Weights
Below, you can choose how much to weight the style representation at each relevant layer; weights in the range 0-1 are recommended. Weighting the earlier layers (conv1_1 and conv2_1) more heavily will produce larger style effects in the final target image.
Content and Style Weights
Just as in the paper, we define an alpha (content_weight) and a beta (style_weight). Their ratio influences how stylized the final image is. It is recommended to leave content_weight = 1 and set style_weight to achieve the ratio you want.
Training the Model
We decide on the number of steps for which to update the target image; only the target image changes, while nothing about VGG19 is updated. The training loop here uses 3000 steps. In each iteration, the content and style losses are computed and the target image is updated. The content loss is the MSE between the target and content features at conv4_2. The style loss is computed in a similar fashion, iterating over the layers listed in style_weights. Finally, the total loss is created by adding the style and content losses, weighted with the specified alpha and beta values. The code snippet below gives the content weight, style weight, and training loop.
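In equation form (matching the code below; \lambda_l denotes the per-layer weight from style_weights, and d_l, h_l, w_l are the depth, height, and width of layer l's feature map):

\mathcal{L}_{content} = \mathrm{MSE}\big(T^{conv4\_2},\, C^{conv4\_2}\big)

\mathcal{L}_{style} = \sum_{l} \frac{\lambda_l}{d_l h_l w_l}\, \mathrm{MSE}\big(G(T^l),\, G(S^l)\big)

\mathcal{L}_{total} = \alpha\, \mathcal{L}_{content} + \beta\, \mathcal{L}_{style}

where T, C, and S are the target, content, and style feature maps, respectively.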
# weights for each style layer
# weighting earlier layers more will result in *larger* style artifacts
# notice we are excluding `conv4_2`, our content representation
style_weights = {'conv1_1': 1.,
                 'conv2_1': 0.8,
                 'conv3_1': 0.5,
                 'conv4_1': 0.3,
                 'conv5_1': 0.1}

# you may choose to leave these as is
content_weight = 1  # alpha
style_weight = 1e6  # beta

# for displaying the target image, intermittently
show_every = 400

# iteration hyperparameters
optimizer = optim.Adam([target], lr=0.003)
steps = 3000  # decide how many iterations to update your image

for ii in range(1, steps + 1):

    ## get the features from your target image
    ## then calculate the content loss
    target_features = get_features(target, vgg)
    content_loss = torch.mean((target_features['conv4_2'] - content_features['conv4_2'])**2)

    # the style loss
    # initialize the style loss to 0
    style_loss = 0
    # iterate through each style layer and add to the style loss
    for layer in style_weights:
        # get the "target" style representation for the layer
        target_feature = target_features[layer]
        _, d, h, w = target_feature.shape

        ## calculate the target gram matrix
        target_gram = gram_matrix(target_feature)

        ## get the "style" style representation
        style_gram = style_grams[layer]
        ## calculate the style loss for one layer, weighted appropriately
        layer_style_loss = style_weights[layer] * torch.mean((target_gram - style_gram)**2)

        # add to the style loss
        style_loss += layer_style_loss / (d * h * w)

    ## calculate the *total* loss
    total_loss = content_weight * content_loss + style_weight * style_loss

    # update your target image
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    # display intermediate images and print the loss
    if ii % show_every == 0:
        print('Total loss: ', total_loss.item())
        plt.imshow(im_convert(target))
        plt.show()
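Once the loop finishes, the stylized result can be displayed and written to disk. A minimal sketch (the output filename is illustrative):

# convert the final target tensor back to a displayable image and save it
final_img = im_convert(target)
plt.imshow(final_img)
plt.axis('off')
plt.show()
plt.imsave('stylized_portrait.png', final_img)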
Below are the training results, printed every 400 steps.

Finally, after 3000 steps, we obtain the artistic sketch portrait of Gal Gadot.

Original image vs. the style-transferred image
Check the code snippets for details.
References:
Neural Transfer Using PyTorch - PyTorch Tutorials 1.4.0 documentation
https://pytorch.org/tutorials/advanced/neural_style_tutorial.html
Code for this article
https://github.com/udacity/deep-learning-v2-pytorch/tree/master/style-transfer