基础分类网络VGG

  • 2019 年 10 月 3 日
  • 筆記

vgg16是牛津大学视觉几何组(Oxford Visual Geometry Group)2014年提出的一个模型. vgg模型也得名于此.
2014年,vgg16拿了Imagenet Large Scale Visual Recognition Challenge 2014 (ILSVRC2014)
比赛的冠军.

论文连接:https://arxiv.org/abs/1409.1556

http://www.robots.ox.ac.uk/~vgg/research/very_deep/牛津大学视觉研究小组在这里放出了他们在ImageNet比赛训练得到的模型文件.

网上有很多vgg16的实现,下面

vgg的模型结构如下:

每一层的卷积核的大小都是3*3.

现在的keras里已经集成了很多模型,具体可以参考keras的文档.
https://keras.io/applications/#models-for-image-classification-with-weights-trained-on-imagenet

下面是keras_applications/vgg16.py的实现.比tensorflow的代码更易于理解.

"""VGG16 model for Keras.    # Reference    - [Very Deep Convolutional Networks for Large-Scale Image Recognition](      https://arxiv.org/abs/1409.1556) (ICLR 2015)    """  from __future__ import absolute_import  from __future__ import division  from __future__ import print_function    import os    from . import get_submodules_from_kwargs  from . import imagenet_utils  from .imagenet_utils import decode_predictions  from .imagenet_utils import _obtain_input_shape    preprocess_input = imagenet_utils.preprocess_input    WEIGHTS_PATH = ('https://github.com/fchollet/deep-learning-models/'                  'releases/download/v0.1/'                  'vgg16_weights_tf_dim_ordering_tf_kernels.h5')  WEIGHTS_PATH_NO_TOP = ('https://github.com/fchollet/deep-learning-models/'                         'releases/download/v0.1/'                         'vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5')      def VGG16(include_top=True,            weights='imagenet',            input_tensor=None,            input_shape=None,            pooling=None,            classes=1000,            **kwargs):      """Instantiates the VGG16 architecture.        Optionally loads weights pre-trained on ImageNet.      Note that the data format convention used by the model is      the one specified in your Keras config at `~/.keras/keras.json`.        # Arguments          include_top: whether to include the 3 fully-connected              layers at the top of the network.          weights: one of `None` (random initialization),                'imagenet' (pre-training on ImageNet),                or the path to the weights file to be loaded.          input_tensor: optional Keras tensor              (i.e. output of `layers.Input()`)              to use as image input for the model.          input_shape: optional shape tuple, only to be specified              if `include_top` is False (otherwise the input shape              has to be `(224, 224, 3)`              (with `channels_last` data format)              or `(3, 224, 224)` (with `channels_first` data format).              It should have exactly 3 input channels,              and width and height should be no smaller than 32.              E.g. `(200, 200, 3)` would be one valid value.          pooling: Optional pooling mode for feature extraction              when `include_top` is `False`.              - `None` means that the output of the model will be                  the 4D tensor output of the                  last convolutional block.              - `avg` means that global average pooling                  will be applied to the output of the                  last convolutional block, and thus                  the output of the model will be a 2D tensor.              - `max` means that global max pooling will                  be applied.          classes: optional number of classes to classify images              into, only to be specified if `include_top` is True, and              if no `weights` argument is specified.        # Returns          A Keras model instance.        # Raises          ValueError: in case of invalid argument for `weights`,              or invalid input shape.      """      backend, layers, models, keras_utils = get_submodules_from_kwargs(kwargs)        if not (weights in {'imagenet', None} or os.path.exists(weights)):          raise ValueError('The `weights` argument should be either '                           '`None` (random initialization), `imagenet` '                           '(pre-training on ImageNet), '                           'or the path to the weights file to be loaded.')        if weights == 'imagenet' and include_top and classes != 1000:          raise ValueError('If using `weights` as `"imagenet"` with `include_top`'                           ' as true, `classes` should be 1000')      # Determine proper input shape      input_shape = _obtain_input_shape(input_shape,                                        default_size=224,                                        min_size=32,                                        data_format=backend.image_data_format(),                                        require_flatten=include_top,                                        weights=weights)        if input_tensor is None:          img_input = layers.Input(shape=input_shape)      else:          if not backend.is_keras_tensor(input_tensor):              img_input = layers.Input(tensor=input_tensor, shape=input_shape)          else:              img_input = input_tensor      # Block 1      x = layers.Conv2D(64, (3, 3),                        activation='relu',                        padding='same',                        name='block1_conv1')(img_input)      x = layers.Conv2D(64, (3, 3),                        activation='relu',                        padding='same',                        name='block1_conv2')(x)      x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)        # Block 2      x = layers.Conv2D(128, (3, 3),                        activation='relu',                        padding='same',                        name='block2_conv1')(x)      x = layers.Conv2D(128, (3, 3),                        activation='relu',                        padding='same',                        name='block2_conv2')(x)      x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)        # Block 3      x = layers.Conv2D(256, (3, 3),                        activation='relu',                        padding='same',                        name='block3_conv1')(x)      x = layers.Conv2D(256, (3, 3),                        activation='relu',                        padding='same',                        name='block3_conv2')(x)      x = layers.Conv2D(256, (3, 3),                        activation='relu',                        padding='same',                        name='block3_conv3')(x)      x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)        # Block 4      x = layers.Conv2D(512, (3, 3),                        activation='relu',                        padding='same',                        name='block4_conv1')(x)      x = layers.Conv2D(512, (3, 3),                        activation='relu',                        padding='same',                        name='block4_conv2')(x)      x = layers.Conv2D(512, (3, 3),                        activation='relu',                        padding='same',                        name='block4_conv3')(x)      x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)        # Block 5      x = layers.Conv2D(512, (3, 3),                        activation='relu',                        padding='same',                        name='block5_conv1')(x)      x = layers.Conv2D(512, (3, 3),                        activation='relu',                        padding='same',                        name='block5_conv2')(x)      x = layers.Conv2D(512, (3, 3),                        activation='relu',                        padding='same',                        name='block5_conv3')(x)      x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)        if include_top:          # Classification block          x = layers.Flatten(name='flatten')(x)          x = layers.Dense(4096, activation='relu', name='fc1')(x)          x = layers.Dense(4096, activation='relu', name='fc2')(x)          x = layers.Dense(classes, activation='softmax', name='predictions')(x)      else:          if pooling == 'avg':              x = layers.GlobalAveragePooling2D()(x)          elif pooling == 'max':              x = layers.GlobalMaxPooling2D()(x)        # Ensure that the model takes into account      # any potential predecessors of `input_tensor`.      if input_tensor is not None:          inputs = keras_utils.get_source_inputs(input_tensor)      else:          inputs = img_input      # Create model.      model = models.Model(inputs, x, name='vgg16')        # Load weights.      if weights == 'imagenet':          if include_top:              weights_path = keras_utils.get_file(                  'vgg16_weights_tf_dim_ordering_tf_kernels.h5',                  WEIGHTS_PATH,                  cache_subdir='models',                  file_hash='64373286793e3c8b2b4e3219cbf3544b')          else:              weights_path = keras_utils.get_file(                  'vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5',                  WEIGHTS_PATH_NO_TOP,                  cache_subdir='models',                  file_hash='6d6bbae143d832006294945121d1f1fc')          model.load_weights(weights_path)          if backend.backend() == 'theano':              keras_utils.convert_all_kernels_in_model(model)      elif weights is not None:          model.load_weights(weights)        return model  

可以清楚地看出来,所用的卷积核全部是3*3的.

用keras做预测也很简单,

from keras.applications.vgg16 import VGG16  model = VGG16()  print(model.summary())

上面代码会把权重文件下载到

这里贴一段网上找的代码

from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions    from keras.preprocessing.image import load_img, img_to_array  import numpy as np  # VGG-16 instance  model = VGG16(weights='imagenet', include_top=True)    image = load_img('C:/Pictures/Pictures/test_imgs/golden.jpg', target_size=(224, 224))  image_data = img_to_array(image)    # reshape it into the specific format  image_data = image_data.reshape((1,) + image_data.shape)  print(image_data.shape)    # prepare the image data for VGG  image_data = preprocess_input(image_data)    # using the pre-trained model to predict  prediction = model.predict(image_data)    # decode the prediction results  results = decode_predictions(prediction, top=3)    print(results)

很简单

  • 加载模型
  • 加载图片,预处理
  • 前向传播
  • 解释输出tensor

 


vgg19和vgg16结构基本一致的,就是多了几个卷积层.