Siamese Networks in Caffe
- December 25, 2019
- Notes
"Siamese" originally means "of Siam; Thai", and the related term "Siamese twins" means conjoined twins. A Siamese network takes its name from this image: two branches with identical structure, joined at the end, that are trained jointly while sharing the parameters of every layer. Siamese networks work well for similarity-comparison tasks, and because the two branches share parameters, they also cut down the number of parameters to train. These slides (see the references) explain applications of Siamese networks in deep learning. Below, following the Caffe Siamese documentation and using LeNet as the example, I briefly summarize how to write the prototxt files for a Siamese network in Caffe.
1. The Data layer
The Data layer takes input in LMDB or LevelDB format; such a database can be generated with $CAFFE_ROOT/examples/siamese/create_mnist_siamese.sh (the script builds the DB from the original MNIST files; to build one from JPEG images instead, it needs some modification). The Data layer has two outputs: pair_data, the paired images, and sim, which indicates whether the two images of a pair share the same label.
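Each DB entry packs the two images of a pair along the channel axis of a single Datum, with the similarity flag stored in the label field; this is what the C++ tool driven by create_mnist_siamese.sh does. A minimal Python sketch of that packing (the image arrays here are stand-ins):

import numpy as np
from caffe.proto import caffe_pb2

# Stand-in 28x28 grayscale pair and its similarity flag
img_a = np.zeros((28, 28), dtype=np.uint8)
img_b = np.ones((28, 28), dtype=np.uint8)
same = 1  # 1 if the two images share a label, else 0

datum = caffe_pb2.Datum()
datum.channels = 2                 # the pair lives on the channel axis
datum.height, datum.width = 28, 28
datum.data = img_a.tobytes() + img_b.tobytes()  # image A, then image B
datum.label = same                 # surfaces as the "sim" blob at training time
serialized = datum.SerializeToString()          # value stored in the LevelDB/LMDB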
2. The Slice layer
The Slice layer is a utility layer in Caffe that splits a bottom blob into several top blobs. The official documentation gives this example:
layer { name: "slicer_label" type: "Slice" bottom: "label" ## Example of label with a shape N x 3 x 1 x 1 top: "label1" top: "label2" top: "label3" slice_param { axis: 1 slice_point: 1 slice_point: 2 } }
This splits the label blob into three parts. axis is the dimension along which to slice (1 means the second dimension), and each slice_point is a cut position along that dimension: the points 1 and 2 cut the size-3 axis into the ranges [0,1), [1,2), and [2,3). In a Siamese network, so that each image of a pair can be fed through its own branch, a Slice layer is attached right after the Data layer to split pair_data evenly into two parts; a numpy sketch of the semantics follows.
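Slicing is just indexing along the chosen axis. A small numpy sketch of both the example above and the Siamese case (shapes are illustrative):

import numpy as np

# Official example: a label blob of shape N x 3 x 1 x 1,
# sliced on axis 1 at points 1 and 2
label = np.zeros((4, 3, 1, 1))
label1, label2, label3 = label[:, 0:1], label[:, 1:2], label[:, 2:3]

# Siamese case: pair_data of shape N x 2 x 28 x 28,
# sliced on axis 1 at point 1 into the two branch inputs
pair_data = np.zeros((4, 2, 28, 28))
data, data_p = pair_data[:, 0:1], pair_data[:, 1:2]
print(data.shape, data_p.shape)  # (4, 1, 28, 28) (4, 1, 28, 28)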
3. Shared layers
The convolution, pooling, and ReLU layers that follow are identical for the two branches, so you can write one branch and then copy it to form the other; only the name, bottom, and top fields must be renamed (in the example, every layer of the second branch takes the name of the corresponding first-branch layer with a _p suffix, for "pair").
4. How parameters are shared
How do the two branches actually share parameters? In Caffe this is done by giving the corresponding layers of each branch a param entry with the same name; parameters that share a name are backed by the same blob, so every update is applied to both branches at once. For example:
... layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" param { name: "ip2_w" lr_mult: 1 } } ... layer { name: "ip2_p" type: "InnerProduct" bottom: "ip1_p" top: "ip2_p" param { name: "ip2_w" lr_mult: 1 } } ...
In this example, the corresponding layers of both branches declare a parameter named ip2_w, so they share its values throughout training. The sketch below shows one way to check this.
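Loading the net through pycaffe, the weight arrays of the two layers should alias the same memory. A sketch, assuming the train/test prototxt and the LevelDBs produced by create_mnist_siamese.sh are in place:

import numpy as np
import caffe

caffe.set_mode_cpu()
net = caffe.Net('examples/siamese/mnist_siamese_train_test.prototxt', caffe.TRAIN)

w   = net.params['ip2'][0].data    # weights of ip2   (param "ip2_w")
w_p = net.params['ip2_p'][0].data  # weights of ip2_p (same named param)

# Both layers declare param { name: "ip2_w" }, so Caffe backs them
# with a single blob and the two arrays alias the same memory.
print(np.shares_memory(w, w_p))  # expected: True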
5. The feat layer
After the two fully connected layers, the original softmax classification layer is replaced by a fully connected layer that outputs a 2-dimensional feature vector:
layer { name: "feat" type: "InnerProduct" bottom: "ip2" top: "feat" param { name: "feat_w" lr_mult: 1 } param { name: "feat_b" lr_mult: 2 } inner_product_param { num_output: 2 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
6. The ContrastiveLoss layer
Once the two feature vectors have been produced, the loss can be computed from them together with the sim blob defined earlier. The Siamese network uses a loss called ContrastiveLoss, which pulls the features of similar pairs together and pushes the features of dissimilar pairs apart: a similar pair contributes more loss the farther apart its features are, while a dissimilar pair contributes more loss the deeper it sits inside the margin.
layer { name: "loss" type: "ContrastiveLoss" bottom: "feat" bottom: "feat_p" bottom: "sim" top: "loss" contrastive_loss_param { margin: 1 } }
7. Visualizing the network structure
That completes the network definition. The script $CAFFE_ROOT/python/draw_net.py can draw the network structure, as shown in the figure:
[Figure: the mnist_siamese network structure as rendered by draw_net.py]
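draw_net.py is a thin wrapper around the caffe.draw module, so the same image can be produced directly from Python; a sketch, assuming the prototxt path from the Caffe example:

from google.protobuf import text_format
from caffe.proto import caffe_pb2
import caffe.draw

# Parse the net definition, then render it with graphviz
net = caffe_pb2.NetParameter()
with open('examples/siamese/mnist_siamese_train_test.prototxt') as f:
    text_format.Merge(f.read(), net)
caffe.draw.draw_net_to_file(net, 'mnist_siamese.png', 'LR')  # 'LR' = left-to-right layout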
The complete network definition follows:
name: "mnist_siamese_train_test" layer { name: "pair_data" type: "Data" top: "pair_data" top: "sim" include { phase: TRAIN } transform_param { scale: 0.00390625 } data_param { source: "examples/siamese/mnist_siamese_train_leveldb" batch_size: 64 } } layer { name: "pair_data" type: "Data" top: "pair_data" top: "sim" include { phase: TEST } transform_param { scale: 0.00390625 } data_param { source: "examples/siamese/mnist_siamese_test_leveldb" batch_size: 100 } } layer { name: "slice_pair" type: "Slice" bottom: "pair_data" top: "data" top: "data_p" slice_param { slice_dim: 1 slice_point: 1 } } layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { name: "conv1_w" lr_mult: 1 } param { name: "conv1_b" lr_mult: 2 } convolution_param { num_output: 20 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { name: "conv2_w" lr_mult: 1 } param { name: "conv2_b" lr_mult: 2 } convolution_param { num_output: 50 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { name: "ip1" type: "InnerProduct" bottom: "pool2" top: "ip1" param { name: "ip1_w" lr_mult: 1 } param { name: "ip1_b" lr_mult: 2 } inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" } layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" param { name: "ip2_w" lr_mult: 1 } param { name: "ip2_b" lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "feat" type: "InnerProduct" bottom: "ip2" top: "feat" param { name: "feat_w" lr_mult: 1 } param { name: "feat_b" lr_mult: 2 } inner_product_param { num_output: 2 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "conv1_p" type: "Convolution" bottom: "data_p" top: "conv1_p" param { name: "conv1_w" lr_mult: 1 } param { name: "conv1_b" lr_mult: 2 } convolution_param { num_output: 20 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "pool1_p" type: "Pooling" bottom: "conv1_p" top: "pool1_p" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { name: "conv2_p" type: "Convolution" bottom: "pool1_p" top: "conv2_p" param { name: "conv2_w" lr_mult: 1 } param { name: "conv2_b" lr_mult: 2 } convolution_param { num_output: 50 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "pool2_p" type: "Pooling" bottom: "conv2_p" top: "pool2_p" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { name: "ip1_p" type: "InnerProduct" bottom: "pool2_p" top: "ip1_p" param { name: "ip1_w" lr_mult: 1 } param { name: "ip1_b" lr_mult: 2 } inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "relu1_p" type: "ReLU" bottom: "ip1_p" top: "ip1_p" } layer { name: "ip2_p" type: "InnerProduct" bottom: "ip1_p" top: "ip2_p" param { name: "ip2_w" lr_mult: 1 } param { name: "ip2_b" lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { type: "xavier" } bias_filler { 
type: "constant" } } } layer { name: "feat_p" type: "InnerProduct" bottom: "ip2_p" top: "feat_p" param { name: "feat_w" lr_mult: 1 } param { name: "feat_b" lr_mult: 2 } inner_product_param { num_output: 2 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name: "loss" type: "ContrastiveLoss" bottom: "feat" bottom: "feat_p" bottom: "sim" top: "loss" contrastive_loss_param { margin: 1 } }
8. Training
Training proceeds exactly as for any other network, so I won't go into detail here; a minimal launch sketch follows.
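For completeness, one way to kick off training from Python, assuming the solver prototxt shipped with the Caffe siamese example:

import caffe

caffe.set_mode_cpu()  # or caffe.set_mode_gpu()

# The solver prototxt points at the train/test net shown above
solver = caffe.SGDSolver('examples/siamese/mnist_siamese_solver.prototxt')
solver.solve()  # runs the full training loop defined by the solver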
9. References
- https://www.quora.com/What-are-Siamese-neural-networks-what-applications-are-they-good-for-and-why
- http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf