Convolutional Neural Networks in Practice: MNIST
- October 6, 2019
- Notes
Introduction
For the theory behind convolutional neural networks, see the earlier post on convolutional neural networks.
This section follows Stanford's CS20 course. The source code for it has been synced to GitHub; stars, shares, and bookmarks are welcome!
CS20 is a TensorFlow course for deep learning researchers. This post covers lectures six and seven, which were very rewarding, and the content is gradually being written up in Jupyter notebooks. For the source code and repository, click "read the original" or copy the link below.
Direct link: https://github.com/Light-City/Translating_documents
Convolution in TensorFlow
To perform convolution in TensorFlow, there are many built-in layers we can use. You can run a 1-D convolution over 2-D data, a 2-D convolution over 3-D data, or a 3-D convolution over 4-D data (counting the dimensions of a single example, without the batch dimension); the 2-D convolution is by far the most common.
# Function signature
tf.nn.conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=True,
    data_format='NHWC',
    dilations=[1, 1, 1, 1],
    name=None)

Input: Batch size (N) x Height (H) x Width (W) x Channels (C)
Filter: Height x Width x Input Channels x Output Channels (e.g. [5, 5, 3, 64])
Strides: 4-element 1-D tensor, the stride in each direction (often [1, 1, 1, 1] or [1, 2, 2, 1])
Padding: 'SAME' or 'VALID'
Dilations: the dilation factor; if set to k > 1, there will be k-1 skipped cells between each filter element in that dimension
Data_format: defaults to NHWC
A fun exercise: look at the values of some well-known kernels in the kernels.py file of the GitHub repo above, and see how they are used in 07_run_kernels.py.
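To make the signature above concrete, here is a minimal, self-contained sketch (not the course's kernels.py; the 3×3 sharpening kernel and the random image are made up for illustration) that runs a single hand-written kernel over a one-channel image:

import numpy as np
import tensorflow as tf

# A dummy 1 x 28 x 28 x 1 "image" (N x H x W x C) with random values
image = tf.constant(np.random.rand(1, 28, 28, 1), dtype=tf.float32)

# A hand-written 3 x 3 sharpening kernel, shaped H x W x in_channels x out_channels
kernel = tf.reshape(tf.constant([[ 0., -1.,  0.],
                                 [-1.,  5., -1.],
                                 [ 0., -1.,  0.]]), [3, 3, 1, 1])

# 'SAME' padding keeps the spatial size at 28 x 28
output = tf.nn.conv2d(image, kernel, strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    print(sess.run(output).shape)  # (1, 28, 28, 1)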
Using a CNN on MNIST
In lecture three we used logistic regression on MNIST; now let's use a CNN and see how the results compare!
The architecture is as follows: two convolutional layers with stride 1, each followed by a ReLU activation and a max-pooling layer, and finally two fully connected layers.

1. Convolutional layer
- Input size (W)
- Filter size (F)
- Stride (S)
- Zero padding (P)
Before defining the function, let's look at the formula for the output size. Given the quantities above, the output size is:
O = (W − F + 2P) / S + 1
In our MNIST model the input is 28×28 and the filter is 5×5, with a stride of 1 and a padding of 2. The output size is therefore:
O = (28 − 5 + 2×2) / 1 + 1 = 28
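As a quick sanity check, here is a tiny helper implementing the formula above (the name conv_output_size is mine, not from the course code):

def conv_output_size(w, f, s, p):
    # Output size of a convolution: (W - F + 2P) / S + 1
    return (w - f + 2 * p) // s + 1

print(conv_output_size(28, 5, 1, 2))  # 28
print(conv_output_size(28, 2, 2, 0))  # 14, the max-pooling case used below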
def conv_relu(inputs, filters, k_size, stride, padding, scope_name):
    with tf.variable_scope(scope_name, reuse=tf.AUTO_REUSE) as scope:
        # Number of input channels (e.g. 1 for grayscale, 3 for RGB)
        in_channels = inputs.shape[-1]
        # Convolution kernel
        kernel = tf.get_variable('kernel',
                                 [k_size, k_size, in_channels, filters],
                                 initializer=tf.truncated_normal_initializer())
        biases = tf.get_variable('biases', [filters],
                                 initializer=tf.random_normal_initializer())
        # Convolution result
        conv = tf.nn.conv2d(inputs, kernel,
                            strides=[1, stride, stride, 1],
                            padding=padding)
        # Apply ReLU to the convolution output
        return tf.nn.relu(conv + biases, name=scope.name)
2. Pooling layer
Pooling reduces the dimensionality of the feature maps, extracts features, and shortens execution time.
Max-pooling or average-pooling is typically used.
Since max-pooling is used in this model, we define a max-pooling function below. The relevant quantities are:
- Input size (W)
- Pool size (K)
- Pool stride (S)
- Zero padding (P)
O = (W − K + 2P) / S + 1
In our model the input is 28×28, the pool size is 2×2, the stride is 2, and there is no zero padding, so the output size is:
O = (28 − 2 + 0) / 2 + 1 = 14
def maxpool(inputs, ksize, stride, padding='VALID', scope_name='pool'):
    with tf.variable_scope(scope_name, reuse=tf.AUTO_REUSE) as scope:
        pool = tf.nn.max_pool(inputs,
                              ksize=[1, ksize, ksize, 1],
                              strides=[1, stride, stride, 1],
                              padding=padding)
        return pool
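As a quick shape check (the placeholder below is my own scaffolding, not part of the course code), feeding MNIST-sized images through conv_relu and maxpool reproduces the 28 → 14 reduction computed above:

img = tf.placeholder(tf.float32, [None, 28, 28, 1])
conv1 = conv_relu(img, filters=32, k_size=5, stride=1, padding='SAME', scope_name='conv1')
pool1 = maxpool(conv1, 2, 2, 'VALID', 'pool1')
print(conv1.shape)  # (?, 28, 28, 32)
print(pool1.shape)  # (?, 14, 14, 32)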
3. Fully connected layer
def fully_connected(inputs, out_dim, scope_name='fc'):
    with tf.variable_scope(scope_name, reuse=tf.AUTO_REUSE) as scope:
        in_dim = inputs.shape[-1]
        w = tf.get_variable('weights', [in_dim, out_dim],
                            initializer=tf.truncated_normal_initializer())
        b = tf.get_variable('biases', [out_dim],
                            initializer=tf.constant_initializer(0.0))
        out = tf.matmul(inputs, w) + b
        return out
4. Putting it all together
Now let's build the whole model by calling the functions we created above, in order.
One thing to note: when going from the last pooling layer to the fc layer, the three-dimensional feature map must be flattened into a one-dimensional vector whose length is the product of its dimensions (here 7 × 7 × 64 = 3136), as done in the inference code below.
Finally, dropout is applied to the fc layer.
def inference(self):
    conv1 = conv_relu(inputs=self.img, filters=32, k_size=5, stride=1,
                      padding='SAME', scope_name='conv1')
    pool1 = maxpool(conv1, 2, 2, 'VALID', 'pool1')
    conv2 = conv_relu(inputs=pool1, filters=64, k_size=5, stride=1,
                      padding='SAME', scope_name='conv2')
    pool2 = maxpool(conv2, 2, 2, 'VALID', 'pool2')
    # Flatten the 7 x 7 x 64 feature map into a 1-D vector for the fc layer
    feature_dim = pool2.shape[1] * pool2.shape[2] * pool2.shape[3]
    pool2 = tf.reshape(pool2, [-1, feature_dim])
    fc = tf.nn.relu(fully_connected(pool2, 1024, 'fc'))
    dropout = tf.layers.dropout(fc, self.keep_prob,
                                training=self.training, name='dropout')
    self.logits = fully_connected(dropout, self.n_classes, 'logits')
5. Loss
def loss(self):
    '''
    Define the loss function.
    Use softmax cross entropy with logits as the loss function;
    compute the mean cross entropy (softmax is applied internally).
    '''
    with tf.name_scope('loss'):
        entropy = tf.nn.softmax_cross_entropy_with_logits(labels=self.label,
                                                          logits=self.logits)
        self.loss = tf.reduce_mean(entropy, name='loss')
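The training op that minimizes this loss is not shown in this excerpt. A minimal sketch, assuming an Adam optimizer and hypothetical attributes self.lr (learning rate) and self.gstep (global step), would be:

def optimize(self):
    # Minimize the loss with Adam and advance the (hypothetical) global step
    self.opt = tf.train.AdamOptimizer(self.lr).minimize(self.loss,
                                                        global_step=self.gstep)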
6. Evaluation
During training, we need to evaluate the accuracy at each epoch.
def eval(self):
    '''
    Count the number of right predictions in a batch
    '''
    with tf.name_scope('predict'):
        preds = tf.nn.softmax(self.logits)
        correct_preds = tf.equal(tf.argmax(preds, 1), tf.argmax(self.label, 1))
        self.accuracy = tf.reduce_sum(tf.cast(correct_preds, tf.float32))
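Note that self.accuracy is the number of correct predictions in a batch, not a ratio. The per-epoch accuracy reported below (e.g. 0.9825) comes from summing it over all test batches and dividing by the number of test examples. A minimal sketch of such an evaluation loop, assuming a tf.data iterator with a hypothetical test-set initializer test_init and test-set size n_test:

def eval_once(self, sess, test_init, n_test):
    sess.run(test_init)  # switch the shared iterator to the test set
    total_correct = 0
    try:
        while True:
            total_correct += sess.run(self.accuracy)
    except tf.errors.OutOfRangeError:
        pass
    print('Accuracy: {0}'.format(total_correct / n_test))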
7. Running
7.1 How the tensor shapes change
conv: Tensor("conv1_1:0", shape=(?, 28, 28, 32), dtype=float32)
pool1: Tensor("pool1/MaxPool:0", shape=(?, 14, 14, 32), dtype=float32)
conv2: Tensor("conv2_1:0", shape=(?, 14, 14, 64), dtype=float32)
pool2: Tensor("pool2/MaxPool:0", shape=(?, 7, 7, 64), dtype=float32)
feature_dim: 3136
pool2: Tensor("Reshape:0", shape=(?, 3136), dtype=float32)
fc: Tensor("fc/add:0", shape=(?, 1024), dtype=float32)
dropout: Tensor("relu_dropout/mul:0", shape=(?, 1024), dtype=float32)
self.logits: Tensor("logits/add:0", shape=(?, 10), dtype=float32)

7.2 Loss and accuracy
...
Loss at step 19: 15894.556640625
Loss at step 39: 8952.953125
Loss at step 59: 6065.05322265625
Loss at step 79: 2913.25048828125
Loss at step 99: 2803.952392578125
Loss at step 119: 1727.0462646484375
Loss at step 139: 2886.213134765625
Loss at step 159: 2611.1953125
Loss at step 179: 1743.4693603515625
Loss at step 199: 898.48046875
Loss at step 219: 2171.2890625
Loss at step 239: 475.59246826171875
Loss at step 259: 1289.218017578125
Loss at step 279: 933.6298828125
Loss at step 299: 614.7198486328125
Loss at step 319: 1771.800048828125
Loss at step 339: 1211.3431396484375
Loss at step 359: 1274.873291015625
Loss at step 379: 820.397705078125
Loss at step 399: 633.9185791015625
Loss at step 419: 830.4837646484375
Average loss at epoch 0: 3882.1572788682097
...
Average loss at epoch 29: 3.834926734323245
Took: 13.498510360717773 seconds
Accuracy at epoch 29: 0.9825
Took: 0.7468070983886719 seconds