PyTorch: Style Transfer

  • March 26, 2020
  • Notes

Author | Joseph Nelson

Source | Medium

Editor | 代码医生团队

In this post, we re-create in PyTorch the style transfer method outlined in the paper Image Style Transfer Using Convolutional Neural Networks.

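Image style transfer is a technique that renders the content of one image in the style of another. It matters for both practical and scientific reasons, and it is widely used in image-processing applications such as mobile camera filters and creative image generation.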

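This article uses a pretrained 19-layer VGG (Visual Geometry Group) network to implement style transfer. The VGG network consists of a series of convolutional, pooling, and fully connected layers. Convolutional layers are named by stack and by their order within the stack: conv_1_1 is the first convolutional layer in the first stack, and conv_2_1 is the first convolutional layer in the second stack. The deepest convolutional layer in the network is conv_5_4.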
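The naming scheme can be made concrete with a small sketch. It assumes the standard VGG19 layout without batch normalization, in which every convolution is followed by a ReLU and each stack ends with a max-pooling layer; the names `VGG19_CFG` and `conv_layer_indices` are hypothetical helpers for illustration, not part of torchvision.

```python
# Standard VGG19 "features" configuration: numbers are conv output channels,
# "M" marks the max-pooling layer that closes a stack.
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]


def conv_layer_indices(cfg=VGG19_CFG):
    """Map names such as 'conv4_2' to their position in a flat layer list
    in which every convolution is immediately followed by a ReLU."""
    names = {}
    index = 0            # position in the flattened layer list
    stack, conv = 1, 0   # current stack and conv counter within that stack
    for entry in cfg:
        if entry == "M":
            index += 1   # the pooling layer occupies one slot
            stack += 1
            conv = 0
        else:
            conv += 1
            names[f"conv{stack}_{conv}"] = index
            index += 2   # conv + ReLU occupy two slots
    return names


layer_index = conv_layer_indices()
print(layer_index["conv1_1"], layer_index["conv4_2"], layer_index["conv5_4"])
# → 0 21 34
```

Under this assumption, the indices line up with the string keys ('0', '5', '10', '19', '21', '28') used later when features are pulled out of the network.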

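The style transfer task starts from two images:

  1. Content image: the image you want to stylize.
  2. Style image: the image that supplies the style, colors, and textures to apply to the content image.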

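These content and style images are used to create a new target image that has attributes of both. Here, a picture of Gal Gadot serves as the content image and an abstract art design serves as the style image. The goal is to apply the style image to the Gal Gadot picture to obtain an artistic sketch portrait of her.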

Starting the magic in PyTorch

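A pretrained VGG19 network extracts the content and style features. We then formalize the notions of content loss and style loss and use them to iteratively update the target image until we get the desired result. First, import the resources the model needs.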

# import resources
# (%matplotlib inline is a Jupyter notebook magic; omit it in a plain script)
%matplotlib inline

from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
import numpy as np

import torch
import torch.optim as optim
import requests
from torchvision import transforms, models

Loading the pretrained VGG model

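PyTorch's pretrained VGG19 model has two parts. vgg19.features contains the convolutional and pooling layers, while vgg19.classifier contains the three fully connected classifier layers. Only vgg19.features is needed to extract the content and style features of an image, so we load that part and freeze its weights.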

# get the "features" portion of VGG19 (we will not need the "classifier" portion)
# note: newer torchvision releases (0.13+) prefer the `weights=` argument
# over `pretrained=True`
vgg = models.vgg19(pretrained=True).features

# freeze all VGG parameters since we're only optimizing the target image
for param in vgg.parameters():
    param.requires_grad_(False)

# move the model to GPU, if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vgg.to(device)

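Below are helper functions that load the content and style images and convert them to normalized tensors. The style image is resized to the same size as the content image. Supply the appropriate image paths to these functions.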

def load_image(img_path, max_size=400, shape=None):
    ''' Load in and transform an image, resizing it so that its shorter
        side is at most max_size pixels (400 by default).'''
    if "http" in img_path:
        response = requests.get(img_path)
        image = Image.open(BytesIO(response.content)).convert('RGB')
    else:
        image = Image.open(img_path).convert('RGB')

    # large images will slow down processing
    if max(image.size) > max_size:
        size = max_size
    else:
        size = max(image.size)

    # an explicit (height, width) shape overrides the computed size
    if shape is not None:
        size = shape

    # an int size makes Resize match the shorter side and keep the aspect ratio
    in_transform = transforms.Compose([
                        transforms.Resize(size),
                        transforms.ToTensor(),
                        transforms.Normalize((0.485, 0.456, 0.406),
                                             (0.229, 0.224, 0.225))])

    # discard the transparent, alpha channel (that's the :3) and add the batch dimension
    image = in_transform(image)[:3,:,:].unsqueeze(0)

    return image


# helper function for un-normalizing an image
# and converting it from a Tensor image to a NumPy image for display
def im_convert(tensor):
    """ Convert a normalized tensor to a NumPy image for display. """

    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1,2,0)
    image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))
    image = image.clip(0, 1)

    return image


# load in content and style image
content = load_image('/PATH_TO/content.jpg').to(device)
# Resize style to match content, makes code easier
style = load_image('/PATH_TO/style.jpg', shape=content.shape[-2:]).to(device)
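To see why im_convert() multiplies by the standard deviation and adds the mean, the normalization and its inverse can be checked on a small array. This is a minimal NumPy sketch rather than the torchvision transform itself; the helper names `normalize` and `unnormalize` and the random 4x4 image are assumptions made for illustration.

```python
import numpy as np

# ImageNet channel statistics, the same values used in load_image() and im_convert()
MEAN = np.array((0.485, 0.456, 0.406))
STD = np.array((0.229, 0.224, 0.225))


def normalize(image_hwc):
    """Per-channel (x - mean) / std on an H x W x 3 array with values in [0, 1]."""
    return (image_hwc - MEAN) / STD


def unnormalize(image_hwc):
    """Inverse mapping, clipped to the displayable range [0, 1]."""
    return (image_hwc * STD + MEAN).clip(0, 1)


rng = np.random.default_rng(0)
pixels = rng.random((4, 4, 3))            # hypothetical 4x4 RGB image
restored = unnormalize(normalize(pixels))
print(np.allclose(restored, pixels))
# → True
```

The clip step matters once the target image is being optimized, because its pixel values are no longer guaranteed to stay inside the displayable range.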

Features and the Gram Matrix

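To get the content and style representations of an image, we pass the image forward through the VGG19 network until it reaches the desired layer, then take the output of that layer. The style transfer paper uses conv1_1 (layer 0), conv2_1 (layer 5), conv3_1 (layer 10), conv4_1 (layer 19), and conv5_1 (layer 28) for the style representation, and conv4_2 (layer 21) for the content representation. The function get_features() below does this.

The style representation also needs to be put into the form used by the style loss, which is the Gram matrix; the function gram_matrix() below computes it. We use get_features() to obtain the content and style features, then compute the Gram matrix for each layer of the style representation. Finally, we create a "target image" that will combine the style and content representations: it starts as a copy of the content image, and its style is then changed iteratively.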

def get_features(image, model, layers=None):
    """ Run an image forward through a model and get the features for
        a set of layers. Default layers are for VGGNet matching Gatys et al (2016)
    """

    ## The mapping layer names of PyTorch's VGGNet to names from the paper
    ## Need the layers for the content and style representations of an image
    if layers is None:
        layers = {'0': 'conv1_1',
                  '5': 'conv2_1',
                  '10': 'conv3_1',
                  '19': 'conv4_1',
                  '21': 'conv4_2',  ## content representation
                  '28': 'conv5_1'}

    features = {}
    x = image
    # model._modules is a dictionary holding each module in the model
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x

    return features


def gram_matrix(tensor):
    """ Calculate the Gram Matrix of a given tensor
        Gram Matrix: https://en.wikipedia.org/wiki/Gramian_matrix
    """

    ## get the batch_size, depth, height, and width of the Tensor
    ## reshape it, so we're multiplying the features for each channel
    ## calculate the gram matrix
    _, d, h, w = tensor.size()
    matrix1 = tensor.view(d,h*w)
    matrix2 = matrix1.t()
    gram = torch.mm(matrix1,matrix2)

    return gram


# get content and style features only once before forming the target image
content_features = get_features(content, vgg)
style_features = get_features(style, vgg)

# calculate the gram matrices for each layer of our style representation
style_grams = {layer: gram_matrix(style_features[layer]) for layer in style_features}

# create a third "target" image and prep it for change
# it is a good idea to start off with the target as a copy of our *content* image
# then iteratively change its style
target = content.clone().requires_grad_(True).to(device)
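A tiny worked example may help show what the Gram matrix holds. The sketch below mirrors gram_matrix() in NumPy on a hypothetical feature map with 2 channels and a 1 x 3 spatial grid; `gram_matrix_np` and the sample values are illustrative assumptions, not part of the article's code.

```python
import numpy as np


def gram_matrix_np(features):
    """Gram matrix of a (1, d, h, w) feature array: a d x d matrix of
    dot products between flattened channels."""
    _, d, h, w = features.shape
    flat = features.reshape(d, h * w)   # one row per channel
    return flat @ flat.T


# Hypothetical feature map: batch of 1, 2 channels, 1 x 3 spatial grid
features = np.array([[[[1.0, 2.0, 3.0]],
                      [[0.0, 1.0, 0.0]]]])
gram = gram_matrix_np(features)
print(gram)
# → [[14.  2.]
#    [ 2.  1.]]

# Reversing the spatial positions leaves the Gram matrix unchanged
flipped = features[..., ::-1]
print(np.array_equal(gram_matrix_np(flipped), gram))
# → True
```

Because the result does not depend on where in the image each activation occurred, the Gram matrix records which channels fire together but not the spatial layout, which is why it can stand in for style rather than content.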

Individual layer style weights

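Below, you can choose how to weight the style representation at each relevant layer. A range of 0–1 is suggested for these weights. Weighting the earlier layers (conv1_1 and conv2_1) more heavily can be expected to produce larger style artifacts in the final target image.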
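As a rough illustration of what the per-layer weights do, the sketch below applies the weights from this article's training code to per-layer losses that are assumed, purely for illustration, to be equal. The `raw_layer_losses` values are hypothetical; real values depend on the images.

```python
# Style weights as used in this article's training code
style_weights = {"conv1_1": 1.0, "conv2_1": 0.8, "conv3_1": 0.5,
                 "conv4_1": 0.3, "conv5_1": 0.1}

# Hypothetical, equal per-layer Gram-matrix losses (illustration only)
raw_layer_losses = {layer: 2.0 for layer in style_weights}

# Each layer contributes its raw loss scaled by its weight
weighted = {layer: style_weights[layer] * raw_layer_losses[layer]
            for layer in style_weights}
total = sum(weighted.values())

# Fraction of the style loss coming from each layer
shares = {layer: value / total for layer, value in weighted.items()}
for layer, share in shares.items():
    print(f"{layer}: {share:.2f}")
```

With equal raw losses, the share of each layer is simply proportional to its weight, so conv1_1 and conv2_1 dominate the style term under these settings.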

Content and style weights

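As in the paper, define an alpha (content_weight) and a beta (style_weight). Their ratio affects how stylized the final image is. It is recommended to keep content_weight = 1 and set style_weight to achieve the desired ratio.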
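Written out, the objective that this article's training code optimizes can be sketched as follows. The notation is an assumption introduced here: $F_T$ and $F_C$ are the target and content feature maps, $G_T^{l}$ and $G_S^{l}$ are the target and style Gram matrices at layer $l$, $\lambda_l$ is the per-layer style weight, and $d_l$, $h_l$, $w_l$ are the depth, height, and width of the layer's feature map. The normalization follows the code in this article and is not necessarily the exact form used in the paper.

```latex
\begin{aligned}
\mathcal{L}_{\text{content}} &= \operatorname{mean}\!\left[\left(F_T^{\text{conv4\_2}} - F_C^{\text{conv4\_2}}\right)^2\right] \\
\mathcal{L}_{\text{style}} &= \sum_{l} \frac{\lambda_l}{d_l\, h_l\, w_l}\, \operatorname{mean}\!\left[\left(G_T^{l} - G_S^{l}\right)^2\right] \\
\mathcal{L}_{\text{total}} &= \alpha\, \mathcal{L}_{\text{content}} + \beta\, \mathcal{L}_{\text{style}}
\end{aligned}
```

With content_weight = 1 and style_weight = 1e6, as in the training code, the ratio $\alpha / \beta$ is $10^{-6}$.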

Training the model

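Decide how many steps to run when updating the target image. Only the target image changes; everything about VGG19 stays fixed. The training loop here uses 3000 steps. In each iteration, the content and style losses are computed and the target image is updated. The content loss is the MSE between the target and content features at conv4_2. The style loss is computed in a similar way, iterating over the layers listed in style_weights. Finally, the total loss is the sum of the content and style losses, weighted by the specified alpha and beta values. The following code gives the content weight, the style weights, and the training loop.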

# weights for each style layer
# weighting earlier layers more will result in *larger* style artifacts
# notice we are excluding `conv4_2` our content representation
style_weights = {'conv1_1': 1.,
                 'conv2_1': 0.8,
                 'conv3_1': 0.5,
                 'conv4_1': 0.3,
                 'conv5_1': 0.1}

# you may choose to leave these as is
content_weight = 1  # alpha
style_weight = 1e6  # beta

# for displaying the target image, intermittently
show_every = 400

# iteration hyperparameters
optimizer = optim.Adam([target], lr=0.003)
steps = 3000  # decide how many iterations to update your image (5000)

for ii in range(1, steps+1):

    ## get the features from your target image
    ## Then calculate the content loss
    target_features = get_features(target, vgg)
    content_loss = torch.mean((target_features['conv4_2'] - content_features['conv4_2'])**2)

    # the style loss
    # initialize the style loss to 0
    style_loss = 0
    # iterate through each style layer and add to the style loss
    for layer in style_weights:
        # get the "target" style representation for the layer
        target_feature = target_features[layer]
        _, d, h, w = target_feature.shape

        ## Calculate the target gram matrix
        target_gram = gram_matrix(target_feature)

        ## get the "style" style representation
        style_gram = style_grams[layer]
        ## Calculate the style loss for one layer, weighted appropriately
        layer_style_loss = style_weights[layer] * torch.mean((target_gram - style_gram)**2)

        # add to the style loss
        style_loss += layer_style_loss / (d * h * w)

    ## calculate the *total* loss
    total_loss = content_weight * content_loss + style_weight * style_loss

    # update your target image
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    # display intermediate images and print the loss
    if ii % show_every == 0:
        print('Total loss: ', total_loss.item())
        plt.imshow(im_convert(target))
        plt.show()

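The training loop prints the loss and displays the intermediate result every 400 steps.

Finally, after 3000 steps, we obtain an artistic sketch portrait of Gal Gadot.

Original image vs. style-transferred image

See the code snippets above for details.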

References:

Neural Transfer Using PyTorch - PyTorch Tutorials 1.4.0 documentation

https://pytorch.org/tutorials/advanced/neural_style_tutorial.html

Code for this article

https://github.com/udacity/deep-learning-v2-pytorch/tree/master/style-transfer