[Series, Part 18] GoogLeNet Inception V3


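GoogLeNet Inception V3 was proposed in "Rethinking the Inception Architecture for Computer Vision". Note that this paper calls the architecture Inception-v2; here we follow the version numbering of the later Inception-v4 paper. The paper's main contributions are: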

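  • General design principles for network architectures
  • Convolution factorization to improve efficiency
  • Efficient feature-map (grid-size) reduction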

Design Principles for Network Architecture

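As mentioned before, exploring deep network architectures is largely an experimental science. People have distilled some design principles from their experiments, but honestly I don't think all of them are actionable in practice: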

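  • Avoid representational bottlenecks, especially in the first few layers of the network. A neural network extracts features automatically, for example through stacked convolutions. The intuitive, common-sense reading is this: if features are extracted too coarsely early in the network, the details are already lost, and no matter how refined the later structure is, it cannot represent them effectively. To take an extreme example, to identify a planet in the universe you would normally zoom out step by step, from houses and trees to oceans and continental plates, then to the whole planet, and only then to the universe. If you jumped straight to the scale of the universe, every planet would look like a sphere and you could not tell Earth from Mercury. Therefore the feature-map size should shrink gradually as the network gets deeper, while the number of channels grows gradually so that features can still be represented and combined effectively. The figure below violates this principle: its very first step downsamples 35×35×320 directly to 17×17×320, so much of the feature detail is lost, and the Inception module that follows cannot recover it however it extracts and combines features.
  • For a given layer, more activation output branches can produce mutually disentangled feature representations, which yield higher-order sparse features and thereby speed up convergence. Note the 1×3 and 3×1 activation outputs in the figure below: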
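  • Used judiciously, dimension reduction does not hurt the network's representational power and can even speed up convergence. A typical example is replacing one 5×5 convolution with two stacked 3×3 convolutions: ignoring padding and assuming the same number of channels throughout, this saves 1-(3×3+3×3)/(5×5) = 28% of the computation.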

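Likewise, an n×n convolution can be factorized into a 1×n convolution followed by an n×1 convolution (somewhat like matrix factorization). For n=3 this saves 1-(3+3)/9 ≈ 33% of the computation, although from a high-performance-computing perspective this factorization may raise the L1 cache miss rate.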

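  • Optimize computational cost by balancing the network's width and depth (this one is especially hard to act on).
  • Downsampling (grid-size reduction). The traditional approach combines pooling with convolution, and avoiding a representational bottleneck usually requires more convolution kernels. For example, with n input feature maps of size d×d, k convolution kernels and pooling with stride=2, k is usually set to 2n so that no bottleneck appears. Pooling first keeps the convolution cheap but creates the bottleneck, while convolving first avoids the bottleneck but runs the expensive convolution on the full d×d grid. Introducing an Inception-module structure, in which stride-2 convolution branches and a stride-2 pooling branch run in parallel and their outputs are concatenated, both lowers the computational cost and avoids the bottleneck. It can be implemented in the following two ways: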
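The cost figures in the principles above can be reproduced with a few lines of arithmetic. The sketch below counts multiply-adds only, ignores padding and bias, and assumes every map in a comparison has the same channel count. The helper names (`conv_madds`, `stacked_cost`, `saving`, `grid_reduction_costs`) are illustrative, and the "parallel" case is a deliberately simplified stand-in for the paper's grid-reduction module, not its exact branch layout.

```python
# Multiply-add counts behind the savings quoted above.
# Assumptions: padding and bias are ignored, and every map in a comparison
# has the same channel count c unless stated otherwise.


def conv_madds(h, w, kh, kw, c_in, c_out):
    """Multiply-adds of a kh x kw convolution producing an h x w x c_out output."""
    return h * w * kh * kw * c_in * c_out


def stacked_cost(h, w, c, kernels):
    """Cost of stacked stride-1 convolutions, each given as (kh, kw), all with c channels."""
    return sum(conv_madds(h, w, kh, kw, c, c) for kh, kw in kernels)


def saving(original, factorized):
    """Fraction of the original cost that the factorized version saves."""
    return 1 - factorized / original


h = w = 17
c = 64
five = stacked_cost(h, w, c, [(5, 5)])
two_threes = stacked_cost(h, w, c, [(3, 3), (3, 3)])
three = stacked_cost(h, w, c, [(3, 3)])
asym3 = stacked_cost(h, w, c, [(1, 3), (3, 1)])
seven = stacked_cost(h, w, c, [(7, 7)])
asym7 = stacked_cost(h, w, c, [(1, 7), (7, 1)])

print(f"5x5 -> two 3x3:   {saving(five, two_threes):.0%} saved")  # 28%
print(f"3x3 -> 1x3 + 3x1: {saving(three, asym3):.0%} saved")     # 33%
print(f"7x7 -> 1x7 + 7x1: {saving(seven, asym7):.0%} saved")     # 71%


def grid_reduction_costs(d, n, k=3):
    """Three ways to go from d x d x n to (d/2) x (d/2) x 2n with k x k kernels."""
    half = d // 2
    conv_then_pool = conv_madds(d, d, k, k, n, 2 * n)        # no bottleneck, but expensive
    pool_then_conv = conv_madds(half, half, k, k, n, 2 * n)  # cheap, but bottlenecked
    # Simplified parallel module (assumption): one stride-2 conv with n filters next to
    # a stride-2 pooling branch that keeps the n input channels; concatenated -> 2n.
    parallel = conv_madds(half, half, k, k, n, n)
    return conv_then_pool, pool_then_conv, parallel


ctp, ptc, par = grid_reduction_costs(d=34, n=64)
print(ptc / ctp, par / ctp)  # 0.25 0.125
```

Under these assumptions, pooling first costs a quarter of convolving first, which is why it is tempting despite the bottleneck; the parallel layout halves that again while still producing 2n output channels.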

Label Smoothing

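Labels for multi-class samples are usually one-hot, e.g. [0,0,0,1]. With a cross-entropy-style loss, this pushes the model to assign overly confident probabilities to the ground-truth label; because the ground-truth logit ends up far larger than all the others, the model overfits and generalizes worse. One remedy is to add a regularization term, i.e. to adjust each sample's label with a probability distribution so that the label becomes "soft", e.g. [0.1,0.2,0.1,0.6]. In the paper's experiments this lowered both the top-1 and the top-5 error rate by about 0.2%.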
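In the paper, the soft label is obtained by mixing the one-hot vector with a uniform distribution over the K classes, q'(k) = (1 - ε)·δ(k, y) + ε/K, with ε = 0.1 in the ImageNet experiments. Below is a minimal NumPy sketch of that rule; `smooth_labels` and `cross_entropy` are illustrative names, not part of any library.

```python
import numpy as np


def smooth_labels(one_hot, epsilon=0.1):
    """Uniform label smoothing: q' = (1 - epsilon) * q + epsilon / K."""
    num_classes = one_hot.shape[-1]
    return (1.0 - epsilon) * one_hot + epsilon / num_classes


def cross_entropy(target, probs, eps=1e-12):
    """Mean cross-entropy between target distributions and predicted probabilities."""
    return float(-np.mean(np.sum(target * np.log(probs + eps), axis=-1)))


y = np.array([[0.0, 0.0, 0.0, 1.0]])
y_soft = smooth_labels(y, epsilon=0.1)
print(y_soft)  # [[0.025 0.025 0.025 0.925]]

# Under the smoothed target, an extremely confident prediction is no longer optimal:
confident = np.array([[0.001, 0.001, 0.001, 0.997]])
calibrated = np.array([[0.025, 0.025, 0.025, 0.925]])
print(cross_entropy(y_soft, confident) > cross_entropy(y_soft, calibrated))  # True
```

In tf.keras the same uniform rule can be applied inside the loss instead of preprocessing the labels, via `tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)`.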

Network Architecture

Code in Practice

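To make the model run on a single machine, the number of filters in every layer is divided by filter_control; to fit CIFAR-10's 32×32 input, the strides in the stem before the Inception modules are reduced (small_mode). The code is as follows.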

# -*- coding: utf-8 -*-
# Inception V3 with reduced width for CIFAR-10, written for TensorFlow 2.x (tf.keras).
import os

# Choose the GPU before TensorFlow is imported; setting it afterwards has no effect.
# GPU 3 is what the original run used; change or remove this for your machine.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "3")

import warnings

import numpy as np
from tensorflow.keras import layers, models, optimizers, regularizers
from tensorflow.keras.callbacks import CSVLogger, EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import plot_model, to_categorical

warnings.filterwarnings('ignore')

# Every filter count is divided by this factor so the model fits on a single machine.
filter_control = 8


def conv(x, filters, kernel_size, name, strides=(1, 1), padding='same'):
    """Conv2D + ReLU with the settings shared by every convolution in the model."""
    return layers.Conv2D(filters=filters // filter_control,
                         kernel_size=kernel_size,
                         strides=strides,
                         padding=padding,
                         activation='relu',
                         kernel_initializer='he_normal',
                         # The original l1_l2(0.00001) set only l1 and left l2 at the Keras 2
                         # default of 0.01; using 1e-5 for both is an assumption about the intent.
                         kernel_regularizer=regularizers.L1L2(l1=1e-5, l2=1e-5),
                         name=name)(x)


def bn_relu(x):
    """Helper to build a BN -> ReLU block."""
    x = layers.BatchNormalization()(x)
    return layers.Activation('relu')(x)


def before_inception(x, small_mode=False):
    # small_mode keeps every stride at 1 so that a 32x32 CIFAR-10 image survives the stem.
    strides = (1, 1) if small_mode else (2, 2)
    x = conv(x, 32, (3, 3), 'before_conv1_3x3', strides=strides, padding='valid')
    x = conv(x, 32, (3, 3), 'before_conv2_3x3', padding='valid')
    x = conv(x, 64, (3, 3), 'before_conv3_3x3')
    x = layers.MaxPooling2D((3, 3), strides=strides, padding='valid', name='before_pool1_3x3')(x)
    x = conv(x, 80, (3, 3), 'before_conv4_3x3', padding='valid')
    x = conv(x, 192, (3, 3), 'before_conv5_3x3', strides=strides, padding='valid')
    return conv(x, 288, (3, 3), 'before_conv6_3x3', padding='valid')


def inception_A(i, x):
    # With small_mode=True the input is 20x20 with 288 // filter_control channels.
    p = 'inception_A' + i
    b1 = conv(x, 64, (1, 1), p + '_conv1_1x1')
    b1 = conv(b1, 96, (3, 3), p + '_conv2_3x3')
    b1 = conv(b1, 96, (3, 3), p + '_conv3_3x3')
    b2 = conv(x, 48, (1, 1), p + '_conv4_1x1')
    b2 = conv(b2, 64, (3, 3), p + '_conv5_3x3')
    b3 = layers.AveragePooling2D((3, 3), strides=(1, 1), padding='same', name=p + '_pool1_3x3')(x)
    b3 = conv(b3, 64, (1, 1), p + '_conv6_1x1')
    b4 = conv(x, 64, (1, 1), p + '_conv7_1x1')
    return bn_relu(layers.concatenate([b1, b2, b3, b4]))


def inception_B(i, x):
    p = 'inception_B' + i
    b1 = conv(x, 128, (1, 1), p + '_conv1_1x1')
    # Previously named "inception_A_conv2_3x3", which clashed with a layer in inception_A.
    b1 = conv(b1, 128, (1, 7), p + '_conv2_1x7')
    b1 = conv(b1, 128, (7, 1), p + '_conv3_7x1')
    b1 = conv(b1, 128, (1, 7), p + '_conv4_1x7')
    b1 = conv(b1, 192, (7, 1), p + '_conv5_7x1')
    b2 = conv(x, 128, (1, 1), p + '_conv6_1x1')
    b2 = conv(b2, 128, (1, 7), p + '_conv7_1x7')
    b2 = conv(b2, 192, (7, 1), p + '_conv8_7x1')
    b3 = layers.AveragePooling2D((3, 3), strides=(1, 1), padding='same', name=p + '_pool1_3x3')(x)
    b3 = conv(b3, 192, (1, 1), p + '_conv9_1x1')
    b4 = conv(x, 192, (1, 1), p + '_conv10_1x1')
    return bn_relu(layers.concatenate([b1, b2, b3, b4]))


def inception_C(i, x):
    p = 'inception_C' + i
    b1 = conv(x, 448, (1, 1), p + '_conv1_1x1')
    b1 = conv(b1, 384, (3, 3), p + '_conv2_3x3')
    # The 1x3 and 3x1 outputs are concatenated side by side rather than stacked.
    b1 = layers.concatenate([conv(b1, 384, (1, 3), p + '_conv3_1x3'),
                             conv(b1, 384, (3, 1), p + '_conv4_3x1')])
    b2 = conv(x, 384, (1, 1), p + '_conv5_1x1')
    b2 = layers.concatenate([conv(b2, 384, (1, 3), p + '_conv6_1x3'),
                             conv(b2, 384, (3, 1), p + '_conv7_3x1')])
    b3 = layers.AveragePooling2D((3, 3), strides=(1, 1), padding='same', name=p + '_pool1_3x3')(x)
    b3 = conv(b3, 192, (1, 1), p + '_conv8_1x1')
    b4 = conv(x, 320, (1, 1), p + '_conv9_1x1')
    return bn_relu(layers.concatenate([b1, b2, b3, b4]))


def create_inception_v3(input_shape, nb_classes=10, small_mode=False):
    input_layer = layers.Input(input_shape)
    x = before_inception(input_layer, small_mode)
    for i in range(3):  # 3 x Inception A
        x = inception_A(str(i), x)
    for i in range(5):  # 5 x Inception B
        x = inception_B(str(i), x)
    for i in range(2):  # 2 x Inception C
        x = inception_C(str(i), x)
    x = layers.AveragePooling2D((8, 8), strides=(1, 1))(x)
    # Keras Dropout takes the fraction to DROP: 0.2 here means keep-probability 0.8
    # (the original Dropout(0.8) dropped 80% of the activations).
    x = layers.Dropout(0.2)(x)
    x = layers.Flatten()(x)
    out = layers.Dense(nb_classes, activation='softmax')(x)
    return models.Model(input_layer, out, name='Inception-v3')


if __name__ == "__main__":
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    # Scale pixels to [0, 1]; CIFAR-10 is already channels-last (N, 32, 32, 3).
    x_train = x_train.astype('float32') / 255.
    x_test = x_test.astype('float32') / 255.
    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    batch_size = 128
    nb_epoch = 10
    nb_classes = 10
    # Convert class vectors to one-hot class matrices.
    y_train = to_categorical(y_train, nb_classes)
    y_test = to_categorical(y_test, nb_classes)

    # small_mode=True is required for 32x32 inputs: with stride 2 the stem shrinks the
    # map to 1x1 before its last 'valid' 3x3 convolution and the model cannot be built.
    model = create_inception_v3(x_train.shape[1:], nb_classes, small_mode=True)
    model.summary()
    try:
        plot_model(model, to_file="GoogLeNet-Inception-V3.jpg", show_shapes=True)
    except ImportError:  # plot_model needs pydot and graphviz
        print('Skipping plot_model: pydot/graphviz not installed.')

    # learning_rate=1.0 matches the old Keras Adadelta default; tf.keras defaults to 0.001.
    model.compile(optimizer=optimizers.Adadelta(learning_rate=1.0),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Stage 1: plain training without augmentation.
    model.fit(x_train, y_train,
              batch_size=batch_size, epochs=nb_epoch, verbose=1,
              validation_data=(x_test, y_test), shuffle=True)

    # Stage 2: continue training with real-time data augmentation and callbacks.
    lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.5), cooldown=0, patience=3, min_lr=1e-6)
    early_stopper = EarlyStopping(monitor='val_accuracy', min_delta=0.0005, patience=15)
    csv_logger = CSVLogger('inception_v3_cifar10.csv')
    # Full-model checkpoints; recent tf.keras / Keras 3 expects the ".keras" extension.
    checkpointer = ModelCheckpoint("weights-improvement-{epoch:02d}-{val_accuracy:.2f}.keras",
                                   monitor='val_loss', verbose=0,
                                   save_best_only=False, save_weights_only=False, mode='auto')
    print('Using real-time data augmentation.')
    datagen_train = ImageDataGenerator(
        width_shift_range=0.125,
        height_shift_range=0.125,
        horizontal_flip=True)
    history = model.fit(datagen_train.flow(x_train, y_train, batch_size=batch_size, shuffle=True),
                        steps_per_epoch=x_train.shape[0] // batch_size,
                        epochs=nb_epoch, verbose=1,
                        validation_data=(x_test, y_test),
                        callbacks=[lr_reducer, early_stopper, csv_logger, checkpointer])
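Why the stride adjustment matters for CIFAR-10 can be checked without TensorFlow by tracing the spatial size through the stem layers in before_inception(). The sketch below is pure Python and uses the standard output-size formulas for 'valid' and 'same' padding; `STEM`, `out_size` and `trace_stem` are hypothetical helpers written only for this check, and the layer list assumes the kernel sizes, strides and paddings of before_inception() as written, with filter_control = 8.

```python
# Pure-Python trace of the spatial size through before_inception() for a 32x32 input.
# STEM, out_size and trace_stem are hypothetical helpers written only for this check.

filter_control = 8  # same value as in the model code


def out_size(n, kernel, stride, padding):
    """Side length after a conv/pool layer on an n x n input."""
    if padding == 'same':
        return -(-n // stride)          # ceil(n / stride)
    return (n - kernel) // stride + 1  # 'valid'


# (layer name, kernel size, uses the small_mode-dependent stride?, padding)
STEM = [
    ('before_conv1_3x3', 3, True, 'valid'),
    ('before_conv2_3x3', 3, False, 'valid'),
    ('before_conv3_3x3', 3, False, 'same'),
    ('before_pool1_3x3', 3, True, 'valid'),
    ('before_conv4_3x3', 3, False, 'valid'),
    ('before_conv5_3x3', 3, True, 'valid'),
    ('before_conv6_3x3', 3, False, 'valid'),
]


def trace_stem(size, small_mode):
    """Return the list of side lengths and whether every stem layer can be built."""
    stride = 1 if small_mode else 2
    sizes = [size]
    for _name, kernel, variable_stride, padding in STEM:
        if padding == 'valid' and size < kernel:
            return sizes, False  # a 'valid' kernel larger than its input cannot be applied
        size = out_size(size, kernel, stride if variable_stride else 1, padding)
        sizes.append(size)
    return sizes, True


print(trace_stem(32, small_mode=True))   # ([32, 30, 28, 28, 26, 24, 22, 20], True)
print(trace_stem(32, small_mode=False))  # ([32, 15, 13, 13, 6, 4, 1], False)

# The Inception modules keep 20x20; the final 8x8 average pooling (stride 1, 'valid')
# then leaves 13x13, and Inception C concatenates these channel counts.
final = out_size(20, 8, 1, 'valid')
inception_c_channels = sum(f // filter_control for f in (384, 384, 384, 384, 192, 320))
print(final, inception_c_channels, final * final * inception_c_channels)  # 13 256 43264
```

With stride 2 the map is already 1×1 before before_conv6_3x3, so a 32×32 input cannot pass through the stem; with stride 1 it reaches the Inception modules at 20×20, matching the size noted in inception_A. The final 8×8 average pooling then leaves a 13×13 map rather than the 1×1 map it produces in the ImageNet-sized network.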
