
  • March 26, 2020
  • Notes

Author | Joseph Nelson

Source | Medium

Editor | 代碼醫生團隊





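Style transfer starts from two images:

  1. Content image: the image you want to stylize.
  2. Style image: the image whose style, colors, and textures will be applied to the content image.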

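The content image and the style image are used to create a new target image that has properties of both. Here, a picture of Gal Gadot serves as the content image and an abstract art design serves as the style image. The goal is to apply the style image to the picture and obtain an artistic sketch portrait of Gal Gadot.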


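A pretrained VGG19 network is used to extract the content and style features. We then formalize the notions of content loss and style loss and apply them to update the target image iteratively until the result looks right. First, import the resources the model needs.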

# import resources
%matplotlib inline

from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
import numpy as np

import torch
import torch.optim as optim
import requests
from torchvision import transforms, models



# get the "features" portion of VGG19 (we will not need the "classifier" portion)
# note: newer torchvision releases take a `weights=` argument instead of `pretrained=True`
vgg = models.vgg19(pretrained=True).features

# freeze all VGG parameters since we're only optimizing the target image
for param in vgg.parameters():
    param.requires_grad_(False)

# move the model to GPU, if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vgg.to(device)


def load_image(img_path, max_size=400, shape=None):
    ''' Load in and transform an image, making sure the image
        is <= 400 pixels in the x-y dims.'''
    if "http" in img_path:
        response = requests.get(img_path)
        image = Image.open(BytesIO(response.content)).convert('RGB')
    else:
        image = Image.open(img_path).convert('RGB')

    # large images will slow down processing
    if max(image.size) > max_size:
        size = max_size
    else:
        size = max(image.size)

    if shape is not None:
        size = shape

    in_transform = transforms.Compose([
                        transforms.Resize(size),
                        transforms.ToTensor(),
                        transforms.Normalize((0.485, 0.456, 0.406),
                                             (0.229, 0.224, 0.225))])

    # discard the transparent, alpha channel (that's the :3) and add the batch dimension
    image = in_transform(image)[:3,:,:].unsqueeze(0)

    return image


# helper function for un-normalizing an image
# and converting it from a Tensor image to a NumPy image for display
def im_convert(tensor):
    """ Convert a tensor to a NumPy image that can be displayed. """

    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1,2,0)
    image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))
    image = image.clip(0, 1)

    return image


# load in content and style image
content = load_image('/PATH_TO/content.jpg').to(device)
# Resize style to match content, makes code easier
style = load_image('/PATH_TO/style.jpg', shape=content.shape[-2:]).to(device)
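To see why im_convert() multiplies by the standard deviation and adds the mean, it may help to check that this undoes the normalization applied in load_image(). The sketch below uses plain NumPy on a small random array instead of a real image; the helper names normalize and unnormalize and the 2 x 2 test image are made up for illustration. It works on a height x width x channel layout, which is the layout im_convert() has after its transpose.

```python
import numpy as np

# ImageNet channel statistics, the same values used in load_image() and im_convert()
MEAN = np.array((0.485, 0.456, 0.406))
STD = np.array((0.229, 0.224, 0.225))

def normalize(image_hwc):
    """Mimic the per-channel normalization on an H x W x 3 array in [0, 1]."""
    return (image_hwc - MEAN) / STD

def unnormalize(image_hwc):
    """Invert normalize() and clip to [0, 1], as im_convert() does."""
    return np.clip(image_hwc * STD + MEAN, 0, 1)

# hypothetical 2 x 2 RGB image with random values in [0, 1)
rng = np.random.default_rng(0)
pixels = rng.random((2, 2, 3))

restored = unnormalize(normalize(pixels))
# True when the round trip recovers the original pixels
print(np.allclose(restored, pixels))
```

The final clip matters during optimization: the target image can drift outside the valid pixel range, and clipping keeps it displayable.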

Features and the Gram Matrix

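To obtain the content and style representations of an image, the image is passed forward through the VGG19 network up to the desired layer, and the output of that layer is collected. The style transfer paper uses conv1_1 (layer 0), conv2_1 (layer 5), conv3_1 (layer 10), conv4_1 (layer 19), and conv5_1 (layer 28) for the style representation, and conv4_2 (layer 21) for the content representation. The get_features() function below does this.

The style representation also has to be converted into a Gram matrix before it can be used in the style loss; the gram_matrix() function below does that. We call get_features() to get the content and style features, then compute the Gram matrix for each layer of the style representation.

Finally, we create a "target image" that will combine the style and content representations. It starts as a copy of the content image, and its style is then changed iteratively.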
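The layer numbers quoted above can be derived rather than memorized. The sketch below is pure Python and never loads the model. It assumes the layout of torchvision's VGG19 without batch normalization, where every convolution is followed by a ReLU and each block ends in one max-pool. The names VGG19_CFG and conv_layer_indices are illustrative and not part of torchvision's API.

```python
# VGG19 layout: numbers are conv output channels, 'M' is a max-pool.
# Assumption: each conv is followed by a ReLU, so a conv takes two slots
# in the sequential `features` module and a pool takes one.
VGG19_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M',
             512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']

def conv_layer_indices(cfg):
    """Return {index in the features module: paper-style conv name}."""
    names = {}
    index = 0            # position inside the sequential module
    block, conv = 1, 1   # paper naming: conv<block>_<conv>
    for entry in cfg:
        if entry == 'M':
            index += 1                  # the pooling layer takes one slot
            block, conv = block + 1, 1  # next block restarts the conv counter
        else:
            names[str(index)] = f'conv{block}_{conv}'
            index += 2                  # Conv2d plus its ReLU
            conv += 1
    return names

indices = conv_layer_indices(VGG19_CFG)
wanted = ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv4_2', 'conv5_1']
for idx, name in indices.items():
    if name in wanted:
        print(idx, name)
```

Under that assumption the printed indices are 0, 5, 10, 19, 21, and 28, matching the keys of the layers dictionary in get_features().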

def get_features(image, model, layers=None):
    """ Run an image forward through a model and get the features for
        a set of layers. Default layers are for VGGNet matching Gatys et al (2016)
    """

    ## The mapping layer names of PyTorch's VGGNet to names from the paper
    ## Need the layers for the content and style representations of an image
    if layers is None:
        layers = {'0': 'conv1_1',
                  '5': 'conv2_1',
                  '10': 'conv3_1',
                  '19': 'conv4_1',
                  '21': 'conv4_2',  ## content representation
                  '28': 'conv5_1'}

    features = {}
    x = image
    # model._modules is a dictionary holding each module in the model
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x

    return features


def gram_matrix(tensor):
    """ Calculate the Gram Matrix of a given tensor
        Gram Matrix:
    """

    ## get the batch_size, depth, height, and width of the Tensor
    ## reshape it, so we're multiplying the features for each channel
    ## calculate the gram matrix
    _, d, h, w = tensor.size()
    matrix1 = tensor.view(d, h * w)
    matrix2 = matrix1.t()
    gram = torch.mm(matrix1, matrix2)

    return gram


# get content and style features only once before forming the target image
content_features = get_features(content, vgg)
style_features = get_features(style, vgg)

# calculate the gram matrices for each layer of our style representation
style_grams = {layer: gram_matrix(style_features[layer]) for layer in style_features}

# create a third "target" image and prep it for change
# it is a good idea to start off with the target as a copy of our *content* image
# then iteratively change its style
target = content.clone().requires_grad_(True).to(device)
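A tiny worked example may make gram_matrix() more concrete. The sketch below redoes the same computation in NumPy instead of PyTorch, without the batch dimension, on a made-up feature map with two channels. The function name gram_matrix_np and the input values are illustrative only.

```python
import numpy as np

def gram_matrix_np(features):
    """Gram matrix of a (depth, height, width) feature array."""
    d, h, w = features.shape
    flat = features.reshape(d, h * w)   # one row per channel
    return flat @ flat.T                # (d, d) channel-by-channel dot products

# hypothetical feature map: 2 channels, each 1 x 2
features = np.array([[[1.0, 2.0]],
                     [[3.0, 4.0]]])
gram = gram_matrix_np(features)
print(gram)
# [[ 5. 11.]
#  [11. 25.]]

# swapping the two spatial positions leaves the Gram matrix unchanged
shuffled = features[:, :, ::-1]
print(np.array_equal(gram_matrix_np(shuffled), gram))
# True
```

The second check illustrates why the Gram matrix suits style: it records how strongly channels co-activate but discards where in the image they do so.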




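As in the paper, define an alpha (content_weight) and a beta (style_weight). Their ratio determines how stylized the final image is. It is recommended to keep content_weight = 1 and adjust style_weight to reach the desired ratio.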
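Written out, the quantity minimized by the training code in this article takes roughly the following form. The notation is introduced here for illustration: F denotes feature maps, G Gram matrices, λ_l the per-layer style weight, and d_l, h_l, w_l the depth, height, and width of layer l. The normalization follows this article's code and may differ from the constants used in the paper.

```latex
\begin{aligned}
\mathcal{L}_{\text{total}} &= \alpha\,\mathcal{L}_{\text{content}} + \beta\,\mathcal{L}_{\text{style}} \\
\mathcal{L}_{\text{content}} &= \operatorname{mean}\!\left[\left(F^{\text{target}}_{\text{conv4\_2}} - F^{\text{content}}_{\text{conv4\_2}}\right)^{2}\right] \\
\mathcal{L}_{\text{style}} &= \sum_{l} \frac{\lambda_l}{d_l\,h_l\,w_l}\,
  \operatorname{mean}\!\left[\left(G^{\text{target}}_{l} - G^{\text{style}}_{l}\right)^{2}\right]
\end{aligned}
```

With α = 1 and β = 1e6, as set in the code, the style term is weighted heavily because the per-layer Gram differences are divided by d_l · h_l · w_l and would otherwise be small next to the content term.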



# weights for each style layer
# weighting earlier layers more will result in *larger* style artifacts
# notice we are excluding `conv4_2` our content representation
style_weights = {'conv1_1': 1.,
                 'conv2_1': 0.8,
                 'conv3_1': 0.5,
                 'conv4_1': 0.3,
                 'conv5_1': 0.1}

# you may choose to leave these as is
content_weight = 1  # alpha
style_weight = 1e6  # beta

# for displaying the target image, intermittently
show_every = 400

# iteration hyperparameters
optimizer = optim.Adam([target], lr=0.003)
steps = 3000  # decide how many iterations to update your image (5000)

for ii in range(1, steps+1):

    ## get the features from your target image
    ## Then calculate the content loss
    target_features = get_features(target, vgg)
    content_loss = torch.mean((target_features['conv4_2'] - content_features['conv4_2'])**2)

    # the style loss
    # initialize the style loss to 0
    style_loss = 0
    # iterate through each style layer and add to the style loss
    for layer in style_weights:
        # get the "target" style representation for the layer
        target_feature = target_features[layer]
        _, d, h, w = target_feature.shape

        ## Calculate the target gram matrix
        target_gram = gram_matrix(target_feature)

        ## get the "style" style representation
        style_gram = style_grams[layer]
        ## Calculate the style loss for one layer, weighted appropriately
        layer_style_loss = style_weights[layer] * torch.mean((target_gram - style_gram)**2)

        # add to the style loss
        style_loss += layer_style_loss / (d * h * w)

    ## calculate the *total* loss
    total_loss = content_weight * content_loss + style_weight * style_loss

    # update your target image
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    # display intermediate images and print the loss
    if ii % show_every == 0:
        print('Total loss: ', total_loss.item())
        plt.imshow(im_convert(target))
        plt.show()
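One point that is easy to miss in the loop above is that the optimizer updates the pixels of target, not any network weights, because VGG is frozen. The toy sketch below isolates that idea under heavily simplified assumptions. It uses plain gradient descent instead of Adam and a raw pixel-wise mean squared error instead of VGG feature losses. The arrays content_px and reference_px are random stand-ins and not real images.

```python
import numpy as np

rng = np.random.default_rng(0)
content_px = rng.random((4, 4))     # hypothetical stand-in for the content image
reference_px = rng.random((4, 4))   # hypothetical stand-in for what the loss pulls toward

target_px = content_px.copy()       # start from the content, as the article does
lr = 1.0                            # illustrative step size, not the article's 0.003

losses = []
for step in range(50):
    diff = target_px - reference_px
    losses.append(np.mean(diff ** 2))
    grad = 2 * diff / diff.size     # gradient of the mean squared error w.r.t. the pixels
    target_px -= lr * grad          # only the image changes; there are no model weights

print(losses[0], losses[-1])
```

The loss shrinks at every step because each update moves the pixels a fixed fraction of the way toward the reference. In the real loop the same mechanism applies, except that the gradient reaches the pixels by backpropagating through VGG19.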


In the end, after 3,000 steps, we obtain an artistic sketch portrait of Gal Gadot.




