PyTorch: Style Transfer

  • March 26, 2020
  • Notes

Author | Joseph Nelson

Source | Medium

Editor | 代碼醫生團隊 (Code Doctor Team)

In this post, we will recreate in PyTorch the style-transfer method laid out in the paper "Image Style Transfer Using Convolutional Neural Networks" (Gatys et al., 2016).

Image style transfer is a technique that aims to render the content of one image in the style of another. It matters for both practical and scientific reasons: style-transfer techniques are widely used in image-processing applications such as mobile camera filters and creative image generation.

In this article, we will implement the style-transfer task using a pre-trained 19-layer VGG (Visual Geometry Group) network. The VGG network consists of a series of convolutional, pooling, and fully connected layers. In the figure below, the convolutional layers are named by the stack they belong to and their order within that stack. For example, conv_1_1 is the first convolutional layer in the first stack, and conv_2_1 is the first convolutional layer in the second stack. In the architecture shown, the deepest convolutional layer in the network is conv_5_4.

For the style-transfer task, we first need two images:

  1. Content image - the image whose content you want to keep and restyle.
  2. Style image - supplies the style, colors, and textures to apply to the content image.

We use these content and style images to create a new target image that combines the properties of both. Here, we take a beautiful picture of Gal Gadot as the content image and an abstract art design as the style image. The goal is to apply the style image to the Gal Gadot picture and obtain an artistic sketch-like portrait of Gal Gadot.

Let the magic begin in PyTorch

We will use the pre-trained VGG19 net to extract content and style features. We will then formalize the notions of content loss and style loss and use them to iteratively update the target image until we get the desired result. First, import the necessary resources for the model.

# import resources
%matplotlib inline

from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
import numpy as np

import torch
import torch.optim as optim
import requests
from torchvision import transforms, models

Loading the pre-trained VGG model

PyTorch's pre-trained VGG19 model has two parts: vgg19.features contains the convolutional and pooling layers, and vgg19.classifier contains the three fully connected classifier layers. We only need vgg19.features to extract the content and style features of an image, so we load that portion and freeze its weights.

# get the "features" portion of VGG19 (we will not need the "classifier" portion)
vgg = models.vgg19(pretrained=True).features

# freeze all VGG parameters since we're only optimizing the target image
for param in vgg.parameters():
    param.requires_grad_(False)

# move the model to GPU, if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vgg.to(device)
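The numeric indices that vgg19.features exposes are what we will later map to the layer names used in the paper. As an optional sanity check (not part of the original post), you can print the modules to see which index corresponds to which layer:

# optional: print each module of vgg19.features with its string index;
# these indices ('0', '5', '10', '19', '21', '28') are the keys mapped
# to paper-style layer names in get_features() further below
for name, layer in vgg._modules.items():
    print(name, layer)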

Below are helper functions that load the content and style images and convert them into normalized tensors. These functions also resize the content and style images to the same size. Provide the appropriate image paths to these functions.

def load_image(img_path, max_size=400, shape=None):
    ''' Load in and transform an image, making sure the image
        is <= 400 pixels in the x-y dims.'''
    if "http" in img_path:
        response = requests.get(img_path)
        image = Image.open(BytesIO(response.content)).convert('RGB')
    else:
        image = Image.open(img_path).convert('RGB')

    # large images will slow down processing
    if max(image.size) > max_size:
        size = max_size
    else:
        size = max(image.size)

    if shape is not None:
        size = shape

    in_transform = transforms.Compose([
                        transforms.Resize(size),
                        transforms.ToTensor(),
                        transforms.Normalize((0.485, 0.456, 0.406),
                                             (0.229, 0.224, 0.225))])

    # discard the transparent, alpha channel (that's the :3) and add the batch dimension
    image = in_transform(image)[:3,:,:].unsqueeze(0)

    return image


# helper function for un-normalizing an image
# and converting it from a Tensor image to a NumPy image for display
def im_convert(tensor):
    """ Display a tensor as an image. """

    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1,2,0)
    image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))
    image = image.clip(0, 1)

    return image


# load in content and style image
content = load_image('/PATH_TO/content.jpg').to(device)
# Resize style to match content, makes code easier
style = load_image('/PATH_TO/style.jpg', shape=content.shape[-2:]).to(device)

Features and the Gram Matrix

To get the content and style representations of an image, we pass the image forward through the VGG19 network until we reach the desired layers and take the outputs from those layers. In the style-transfer paper, the authors use conv1_1 (layer 0), conv2_1 (layer 5), conv3_1 (layer 10), conv4_1 (layer 19), and conv5_1 (layer 28) for the style representation, and conv4_2 (layer 21) for the content representation. The function get_features() below does exactly this.

To put the style representation into the form needed to train our model, we need something called a Gram matrix; the function gram_matrix() below computes it. We use get_features() to extract the content and style features, then compute the Gram matrix for each layer of the style representation. Finally, we create a "target" image that will combine the style and content representations: we start from a copy of the content image and then iteratively change its style.
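Concretely (a clarifying note, using the same shapes as the code below): if the feature maps of a layer with d channels and spatial size h × w are flattened into a matrix $F \in \mathbb{R}^{d \times hw}$, the Gram matrix is

$$G = F F^{\top}, \qquad G_{ij} = \sum_{k=1}^{hw} F_{ik} F_{jk},$$

so $G$ is $d \times d$, and each entry measures how strongly channels i and j co-activate. This captures texture and color statistics independently of spatial layout, which is why it works as a style representation.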

def get_features(image, model, layers=None):
    """ Run an image forward through a model and get the features for
        a set of layers. Default layers are for VGGNet matching Gatys et al (2016)
    """

    ## Map PyTorch's VGGNet layer indices to the layer names from the paper
    ## We need the layers for the content and style representations of an image
    if layers is None:
        layers = {'0': 'conv1_1',
                  '5': 'conv2_1',
                  '10': 'conv3_1',
                  '19': 'conv4_1',
                  '21': 'conv4_2',  ## content representation
                  '28': 'conv5_1'}

    features = {}
    x = image
    # model._modules is a dictionary holding each module in the model
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x

    return features


def gram_matrix(tensor):
    """ Calculate the Gram Matrix of a given tensor
        Gram Matrix: https://en.wikipedia.org/wiki/Gramian_matrix
    """

    ## get the batch_size, depth, height, and width of the Tensor
    ## then reshape it so we're multiplying the features for each channel
    _, d, h, w = tensor.size()
    matrix1 = tensor.view(d, h * w)
    matrix2 = matrix1.t()
    gram = torch.mm(matrix1, matrix2)

    return gram


# get content and style features only once before forming the target image
content_features = get_features(content, vgg)
style_features = get_features(style, vgg)

# calculate the gram matrices for each layer of our style representation
style_grams = {layer: gram_matrix(style_features[layer]) for layer in style_features}

# create a third "target" image and prep it for change
# it is a good idea to start off with the target as a copy of our *content* image
# then iteratively change its style
target = content.clone().requires_grad_(True).to(device)
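As a quick illustration (a hypothetical check, not part of the original code), the Gram matrix of a 64-channel feature map is 64 × 64 regardless of its spatial size:

# hypothetical sanity check for gram_matrix():
# a batch of one 64-channel feature map with 50 x 50 spatial dims
dummy = torch.randn(1, 64, 50, 50)
print(gram_matrix(dummy).shape)  # torch.Size([64, 64])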

Style weights for individual layers

Below, you can choose how much to weight the style representation at each relevant layer. It is recommended to keep these weights in the 0–1 range. Weighting the earlier layers (conv1_1 and conv2_1) more heavily will give you larger style features in the final target image; a hypothetical example follows.
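For instance, a weighting that emphasizes the earliest layers might look like the sketch below. The dictionary name and the values are illustrative only (not from the original post); the keys match the layer names returned by get_features().

# hypothetical alternative weighting: emphasizing conv1_1/conv2_1
# produces larger, more prominent style artifacts in the target image
early_heavy_style_weights = {'conv1_1': 1.0,
                             'conv2_1': 1.0,
                             'conv3_1': 0.5,
                             'conv4_1': 0.3,
                             'conv5_1': 0.1}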

Content and style weights

Just as in the paper, we define an alpha (content_weight) and a beta (style_weight). The beta-to-alpha ratio affects how stylized the final image is. It is recommended to keep content_weight = 1 and set style_weight to achieve the ratio you want.
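In the paper's notation, the total loss we minimize is the weighted sum

$$\mathcal{L}_{\text{total}} = \alpha \, \mathcal{L}_{\text{content}} + \beta \, \mathcal{L}_{\text{style}},$$

where alpha is content_weight and beta is style_weight, so a larger beta/alpha ratio yields a more heavily stylized result.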

Training the model

We decide on a number of steps over which to update the target image; only the target image changes, while everything about VGG19 stays fixed. The training loop here uses 3000 steps. In each iteration, we compute the content and style losses and update the target image. The content loss is the MSE between the target and content features at conv4_2. The style loss is computed similarly, iterating over the layers listed in style_weights and comparing Gram matrices. Finally, the total loss is formed by adding the style and content losses, weighted by the specified alpha and beta values. The code snippet below gives the content weight, style weight, and training loop.

# weights for each style layer
# weighting earlier layers more will result in *larger* style artifacts
# notice we are excluding `conv4_2`, our content representation
style_weights = {'conv1_1': 1.,
                 'conv2_1': 0.8,
                 'conv3_1': 0.5,
                 'conv4_1': 0.3,
                 'conv5_1': 0.1}

# you may choose to leave these as is
content_weight = 1  # alpha
style_weight = 1e6  # beta


# for displaying the target image, intermittently
show_every = 400

# iteration hyperparameters
optimizer = optim.Adam([target], lr=0.003)
steps = 3000  # decide how many iterations to update your image (5000)

for ii in range(1, steps+1):

    ## get the features from your target image
    ## then calculate the content loss
    target_features = get_features(target, vgg)
    content_loss = torch.mean((target_features['conv4_2'] - content_features['conv4_2'])**2)

    # the style loss
    # initialize the style loss to 0
    style_loss = 0
    # iterate through each style layer and add to the style loss
    for layer in style_weights:
        # get the "target" style representation for the layer
        target_feature = target_features[layer]
        _, d, h, w = target_feature.shape

        ## calculate the target gram matrix
        target_gram = gram_matrix(target_feature)

        ## get the "style" style representation
        style_gram = style_grams[layer]
        ## calculate the style loss for one layer, weighted appropriately
        layer_style_loss = style_weights[layer] * torch.mean((target_gram - style_gram)**2)

        # add to the style loss
        style_loss += layer_style_loss / (d * h * w)

    ## calculate the *total* loss
    total_loss = content_weight * content_loss + style_weight * style_loss

    # update your target image
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    # display intermediate images and print the loss
    if ii % show_every == 0:
        print('Total loss: ', total_loss.item())
        plt.imshow(im_convert(target))
        plt.show()

Below are the training results, printed every 400 steps.

Finally, after 3000 steps, we obtain an artistic sketch-like portrait of Gal Gadot.

Original image vs. the style-transferred image
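To reproduce such a side-by-side comparison yourself, here is a minimal sketch (not in the original post) that reuses the im_convert() helper defined earlier:

# display the content image and the final stylized target side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
ax1.imshow(im_convert(content))
ax1.set_title('Content image')
ax2.imshow(im_convert(target))
ax2.set_title('Stylized target')
plt.show()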

See the code snippets above for the details.

References:

Neural Transfer Using PyTorch - PyTorch Tutorials 1.4.0 documentation

https://pytorch.org/tutorials/advanced/neural_style_tutorial.html

Code for this article

https://github.com/udacity/deep-learning-v2-pytorch/tree/master/style-transfer