Happy Year of the Pig: Implementing word2vec in TensorFlow and How to Structure a TensorFlow Model
- October 6, 2019
- Notes
Introduction
Today is the first day of the Chinese New Year 2019. First of all, happy Year of the Pig! May the new year bring you plenty of learning, plenty of exercise, good health, and all the best!
This section is based on Stanford's CS20 course. The source code for it has been synced to GitHub; stars, shares, and bookmarks are welcome!
CS20 is a TensorFlow course for deep learning researchers. I went through four lectures today, got a lot out of them, and am gradually writing the content up as Jupyter notebooks. For the source code and repository, click "Read the original" or copy the link below.
Link: https://github.com/Light-City/Translating_documents
Today's agenda
- word2vec
- Embedding visualization
- Structure your TensorFlow model
- Variable sharing
- Manage experiments
- Autodiff
In today's session, we build a more complex model, word2vec, and use it to discuss variables, model sharing, and experiment management.
word2vec in TensorFlow
How can we represent words in an efficient way?
One-hot Representation
Each word is represented by a vector that has a single 1 and 0s everywhere else.
For example:
```
Vocab: i, it, california, meh
i          = [1 0 0 0]
it         = [0 1 0 0]
california = [0 0 1 0]
meh        = [0 0 0 1]
```
- The vocabulary can be huge => high-dimensional vectors, inefficient computation
- Cannot represent relationships between words => "anxious" and "nervous" are similar but get completely unrelated representations (see the short sketch below)
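To make the second point concrete, here is a small numpy sketch (my own illustration, using the toy vocabulary above): any two different one-hot vectors are orthogonal, so they carry no information about how similar the words are.

```python
import numpy as np

vocab = ['i', 'it', 'california', 'meh']
# one-hot vector for each word: the idx-th row of the identity matrix
one_hot = {w: np.eye(len(vocab))[idx] for idx, w in enumerate(vocab)}

# the dot product between any two different words is 0,
# so one-hot vectors cannot express that two words are related
print(np.dot(one_hot['i'], one_hot['california']))   # 0.0
print(np.dot(one_hot['it'], one_hot['it']))          # 1.0
```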
Word embeddings
- Distributed representation
- Continuous values
- Low dimensional
- Capture the semantic relationships between words
Implementing word2vec in TensorFlow
Imports
```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import numpy as np
from tensorflow.contrib.tensorboard.plugins import projector
import tensorflow as tf

import utils
import word2vec_utils
```
Hyperparameters
```python
VOCAB_SIZE = 50000
BATCH_SIZE = 128
EMBED_SIZE = 128            # dimension of the word embedding vectors
SKIP_WINDOW = 1             # the context window
NUM_SAMPLED = 64            # number of negative examples to sample
LEARNING_RATE = 1.0
NUM_TRAIN_STEPS = 100000
VISUAL_FLD = 'visualization'
SKIP_STEP = 5000

DOWNLOAD_URL = 'http://mattmahoney.net/dc/text8.zip'
EXPECTED_BYTES = 31344016
NUM_VISUALIZE = 3000        # number of tokens to visualize
```
```python
def gen():
    yield from word2vec_utils.batch_gen(DOWNLOAD_URL, EXPECTED_BYTES, VOCAB_SIZE,
                                        BATCH_SIZE, SKIP_WINDOW, VISUAL_FLD)

dataset = tf.data.Dataset.from_generator(gen,
                                         (tf.int32, tf.int32),
                                         (tf.TensorShape([BATCH_SIZE]), tf.TensorShape([BATCH_SIZE, 1])))
iterator = dataset.make_initializable_iterator()
center_words, target_words = iterator.get_next()
```
The parameters of the skip-gram model form a matrix whose row vectors are the word embedding vectors, so the matrix has shape [VOCAB_SIZE, EMBED_SIZE]. The parameter matrix is usually initialized from a random distribution; here we initialize it from a uniform distribution.
```python
embed_matrix = tf.get_variable('embed_matrix',
                               shape=[VOCAB_SIZE, EMBED_SIZE],
                               initializer=tf.random_uniform_initializer())
```
In the skip-gram model, words start out one-hot encoded and are multiplied by the parameter matrix, which means all of the products are computed even though most of them are zero. TensorFlow provides tf.nn.embedding_lookup to solve this problem: it fetches only the rows of the embedding matrix that correspond to the words in the batch.
Function prototype:
```python
tf.nn.embedding_lookup(
    params,
    ids,
    partition_strategy='mod',
    name=None,
    validate_indices=True,
    max_norm=None
)
```
Using this function, we get:
embed = tf.nn.embedding_lookup(embed_matrix, center_words, name='embedding')
Now we need to define the loss function. We will use the NCE loss, which is already implemented in TensorFlow, so let's just use it. Its structure is as follows.
Function prototype:
```python
tf.nn.nce_loss(
    weights,
    biases,
    labels,
    inputs,
    num_sampled,
    num_classes,
    num_true=1,
    sampled_values=None,
    remove_accidental_hits=False,
    partition_strategy='mod',
    name='nce_loss'
)
```
(Note the argument order: the third argument is the labels and the fourth is the inputs, which is easy to mix up.)
To use the NCE loss, we define nce_weight and nce_bias and then define the loss function.
```python
nce_weight = tf.get_variable('nce_weight',
                             shape=[VOCAB_SIZE, EMBED_SIZE],
                             initializer=tf.truncated_normal_initializer(stddev=1.0 / (EMBED_SIZE ** 0.5)))
nce_bias = tf.get_variable('nce_bias', initializer=tf.zeros([VOCAB_SIZE]))

# define loss function to be NCE loss function
loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight,
                                     biases=nce_bias,
                                     labels=target_words,
                                     inputs=embed,
                                     num_sampled=NUM_SAMPLED,
                                     num_classes=VOCAB_SIZE),
                      name='loss')
```
Now we just need to define the optimizer; we will use the gradient descent optimizer.
optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss)
Now we can run the graph we defined. Let's do it through a session.
```python
utils.safe_mkdir('checkpoints')

with tf.Session() as sess:
    sess.run(iterator.initializer)
    sess.run(tf.global_variables_initializer())

    total_loss = 0.0  # we use this to calculate late average loss in the last SKIP_STEP steps
    writer = tf.summary.FileWriter('graphs/word2vec_simple', sess.graph)

    for index in range(NUM_TRAIN_STEPS):
        try:
            loss_batch, _ = sess.run([loss, optimizer])
            total_loss += loss_batch
            if (index + 1) % SKIP_STEP == 0:
                print('Average loss at step {}: {:5.1f}'.format(index, total_loss / SKIP_STEP))
                total_loss = 0.0
        except tf.errors.OutOfRangeError:
            sess.run(iterator.initializer)
    writer.close()
```
```
data/text8.zip already exists
Average loss at step 4999:  65.5
Average loss at step 9999:  18.2
Average loss at step 14999:   9.5
Average loss at step 19999:   6.7
Average loss at step 24999:   5.7
Average loss at step 29999:   5.2
Average loss at step 34999:   5.0
Average loss at step 39999:   4.8
Average loss at step 44999:   4.8
Average loss at step 49999:   4.8
Average loss at step 54999:   4.7
Average loss at step 59999:   4.7
Average loss at step 64999:   4.6
Average loss at step 69999:   4.7
Average loss at step 74999:   4.6
Average loss at step 79999:   4.6
Average loss at step 84999:   4.7
Average loss at step 89999:   4.7
Average loss at step 94999:   4.6
Average loss at step 99999:   4.6
```
How to Structure a TensorFlow Model
1. Steps
Phase 1: assemble the graph
- Load the data (tf.data or placeholders)
- Define the parameters
- Define the inference model
- Define the loss function
- Define the optimizer
Phase 2: execute the computation
- Initialize all variables
- Initialize the data iterator / feed in the data
- Run the inference model (compute the output for each input)
- Compute the cost
- Update the parameters (a minimal skeleton of both phases follows this list)
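The sketch below is a toy end-to-end example of these two phases (a tiny linear-regression graph of my own, not code from the original post), just to make the step order concrete:

```python
import numpy as np
import tensorflow as tf

# ---- Phase 1: assemble the graph ----
x = tf.placeholder(tf.float32, shape=[None, 1])                     # load data via placeholders
y = tf.placeholder(tf.float32, shape=[None, 1])
w = tf.get_variable('w', shape=[1, 1])                              # define the parameters
b = tf.get_variable('b', shape=[1], initializer=tf.zeros_initializer())
pred = tf.matmul(x, w) + b                                          # define the inference model
loss = tf.reduce_mean(tf.square(pred - y))                          # define the loss function
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)    # define the optimizer

# ---- Phase 2: execute the computation ----
x_data = np.random.rand(100, 1).astype(np.float32)
y_data = 3.0 * x_data + 1.0
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())                     # initialize all variables
    for step in range(100):
        # run inference, compute the cost, and update the parameters in one sess.run call
        _, loss_val = sess.run([train_op, loss], feed_dict={x: x_data, y: y_data})
```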
2. Encapsulation
Define functions to encapsulate the steps, for example:
```python
def gen():
    yield from word2vec_utils.batch_gen(DOWNLOAD_URL, EXPECTED_BYTES, VOCAB_SIZE,
                                        BATCH_SIZE, SKIP_WINDOW, VISUAL_FLD)

def main():
    dataset = tf.data.Dataset.from_generator(gen,
                                             (tf.int32, tf.int32),
                                             (tf.TensorShape([BATCH_SIZE]), tf.TensorShape([BATCH_SIZE, 1])))
    word2vec(dataset)
```
3. Name scopes
In the TensorBoard graph, the nodes are scattered all over the place. If the model were even slightly more complex than word2vec, the graph would be hard to read. Wouldn't it be nice to group related nodes together? tf.name_scope makes grouping easy.
tf.name_scope is used like this:

```python
with tf.name_scope(name_of_that_scope):
    # declare op_1
    # declare op_2
    # ...
```
```python
with tf.name_scope('data'):
    iterator = dataset.make_initializable_iterator()
    center_words, target_words = iterator.get_next()

with tf.name_scope('embed'):
    embed_matrix = tf.get_variable('embed_matrix',
                                   shape=[VOCAB_SIZE, EMBED_SIZE],
                                   initializer=tf.random_uniform_initializer())
    embed = tf.nn.embedding_lookup(embed_matrix, center_words, name='embedding')
```
Before:

After:

4. Variable sharing (variable_scope)
When using TensorFlow, you may sometimes wonder when to use name_scope and when to use variable_scope. Here, let's look at variable_scope. Consider a neural network with two hidden layers and two inputs.
Suppose we define the network as a function and call it once per input. Every call to two_hidden_layers() executes get_variable with the same variable names, and get_variable refuses to create a variable that already exists unless reuse is enabled. The second call therefore raises ValueError: Variable h1_weights already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? because duplicates would be generated. We use variable_scope to share these variables instead of recreating them.
```python
def fully_connected(x, output_dim, scope):
    with tf.variable_scope(scope) as scope:
        w = tf.get_variable("weights", [x.shape[1], output_dim],
                            initializer=tf.random_normal_initializer())
        b = tf.get_variable("biases", [output_dim],
                            initializer=tf.constant_initializer(0.0))
        return tf.matmul(x, w) + b

def two_hidden_layers(x):
    h1 = fully_connected(x, 50, 'h1')
    h2 = fully_connected(h1, 10, 'h2')
    return h2

# x1, x2: input tensors defined elsewhere
with tf.variable_scope('two_layers') as scope:
    logits1 = two_hidden_layers(x1)
    scope.reuse_variables()
    logits2 = two_hidden_layers(x2)
```
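If you would rather not call scope.reuse_variables() by hand, TF 1.x also accepts reuse=tf.AUTO_REUSE in tf.variable_scope, as hinted by the error message above. A small sketch, reusing the fully_connected helper defined above and assuming x1 and x2 are existing input tensors:

```python
def two_hidden_layers_shared(x):
    # AUTO_REUSE creates the variables on the first call and reuses them afterwards
    with tf.variable_scope('two_layers', reuse=tf.AUTO_REUSE):
        h1 = fully_connected(x, 50, 'h1')
        return fully_connected(h1, 10, 'h2')

logits1 = two_hidden_layers_shared(x1)   # creates two_layers/h1/weights, ...
logits2 = two_hidden_layers_shared(x2)   # reuses the same variables
```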
5. Graph collections
As you build a model, you may want to group variables that live in different parts of the graph. tf.get_collection lets you access a specific collection of variables.
tf.get_collection(key, scope=None)
By default, all variables are placed in tf.GraphKeys.GLOBAL_VARIABLES. To get all variables in scope 'my_scope', you can use:
tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='my_scope')
If you only want the variables that were created with trainable=True, use the tf.GraphKeys.TRAINABLE_VARIABLES collection.
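For example, a small sketch (the scope name 'my_scope' and the collection name 'my_losses' are just illustrative, and loss is assumed to be a tensor like the one defined earlier):

```python
# all global variables whose name starts with 'my_scope'
scoped_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='my_scope')

# only the variables created with trainable=True
trainable_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)

# you can also maintain your own collections
tf.add_to_collection('my_losses', loss)
my_losses = tf.get_collection('my_losses')
```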
6. word2vec
We ran word2vec on a fairly small dataset and the results looked quite good. In practice, though, you need a much larger dataset, which means training takes a lot of time.
The more complex the model, the longer it takes to train. In machine translation, for example, training takes at least a day, and in some cases much longer.
If a model takes several days to train, we will not see any results until training finishes, and if the machine crashes in the middle, we cannot even check intermediate results.
Another problem is that when we experiment with a model by varying different factors, it is hard to compare the effect of those factors.
So being able to stop training at any point and resume from where we left off is essential. Let's look at a few features we can use while experimenting with models: tf.train.Saver(), TensorFlow's random state, and visualization.
tf.train.Saver()
You can use tf.train.Saver() to periodically save the model's parameter values; it stores the graph's variables in binary files. The class's save method has the following structure.
```python
tf.train.Saver.save(
    sess,
    save_path,
    global_step=None,
    latest_filename=None,
    meta_graph_suffix='meta',
    write_meta_graph=True,
    write_state=True
)
```
For example, if we want to save the graph's variables every 1000 steps:
```python
# define the model

# create a saver object
saver = tf.train.Saver()

# launch a session to execute the computation
with tf.Session() as sess:
    # actual training loop
    for step in range(training_steps):
        sess.run([optimizer])
        if (step + 1) % 1000 == 0:
            saver.save(sess, 'checkpoint_directory/model_name', global_step=global_step)
```
In TensorFlow, the step at which you save the graph's variables is called a checkpoint. Since we will create many checkpoints, it helps to add a variable named global_step to our model to record the training step. You will see this variable in many TensorFlow programs; we first create it, initialize it to 0, and mark it as not trainable (because we don't want TensorFlow to optimize it).
global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')
We need to pass global_step to the optimizer so that it increments global_step at every training step.
optimizer = tf.train.AdamOptimizer(lr).minimize(loss, global_step=global_step)
To save the variable values into the checkpoints directory, we use:
saver.save(sess, 'checkpoints/model-name', global_step=global_step)
To restore the variables, we use tf.train.Saver.restore(sess, save_path). For example, to restore from the checkpoint at step 10000:
saver.restore(sess, 'checkpoints/skip-gram-10000')
Of course, we can only load variables when a checkpoint actually exists; otherwise we train from scratch. TensorFlow lets us read a checkpoint from a directory with tf.train.get_checkpoint_state('directory-name/checkpoint').
```python
ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/checkpoint'))
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
```
The 'checkpoint' file automatically keeps track of the most recent checkpoint; its contents look like this:
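The original post shows a screenshot here. A typical 'checkpoint' file is plain text and looks roughly like this (the step numbers are illustrative):

```
model_checkpoint_path: "skip-gram-99999"
all_model_checkpoint_paths: "skip-gram-89999"
all_model_checkpoint_paths: "skip-gram-94999"
all_model_checkpoint_paths: "skip-gram-99999"
```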

So the training loop of word2vec looks like this:
```python
initial_step = 0
utils.safe_mkdir('checkpoints')

with tf.Session() as sess:
    sess.run(self.iterator.initializer)
    sess.run(tf.global_variables_initializer())
    ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/checkpoint'))

    # if that checkpoint exists, restore from checkpoint
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)

    total_loss = 0.0  # we use this to calculate late average loss in the last SKIP_STEP steps
    writer = tf.summary.FileWriter('graphs/word2vec/lr' + str(self.lr), sess.graph)
    initial_step = self.global_step.eval()

    for index in range(initial_step, initial_step + num_train_steps):
        try:
            loss_batch, _, summary = sess.run([self.loss, self.optimizer, self.summary_op])
            writer.add_summary(summary, global_step=index)
            total_loss += loss_batch
            if (index + 1) % self.skip_step == 0:
                print('Average loss at step {}: {:5.1f}'.format(index, total_loss / self.skip_step))
                total_loss = 0.0
                saver.save(sess, 'checkpoints/skip-gram', index)
        except tf.errors.OutOfRangeError:
            sess.run(self.iterator.initializer)
    writer.close()
```
Look in the 'checkpoints' directory and you will see these files:
By default, saver.save() saves all variables of the graph, which is what TensorFlow recommends. However, you can also choose which variables to save by passing them as a list or a dict when creating the saver object.
```python
v1 = tf.Variable(..., name='v1')
v2 = tf.Variable(..., name='v2')

saver = tf.train.Saver({'v1': v1, 'v2': v2})
saver = tf.train.Saver([v1, v2])
saver = tf.train.Saver({v.op.name: v for v in [v1, v2]})
```
tf.summary
We usually rely on matplotlib to visualize our loss, accuracy, and so on, but with TensorBoard it is easy to visualize such summary data directly.
Let's visualize the loss, the average loss, and the accuracy, which are the values people most commonly track. Summaries can be visualized as scalars, histograms, or images.
First, we define the summary ops for the values we want to track, grouped under a name scope:
```python
def _create_summaries(self):
    with tf.name_scope("summaries"):
        tf.summary.scalar("loss", self.loss)
        tf.summary.scalar("accuracy", self.accuracy)
        tf.summary.histogram("histogram loss", self.loss)
        # because you have several summaries, we should merge them all
        # into one op to make it easier to manage
        self.summary_op = tf.summary.merge_all()
```
Since the merged summary is an op, it must be executed with sess.run():
loss_batch, _, summary = sess.run([model.loss, model.optimizer, model.summary_op], feed_dict=feed_dict)
Now that you have the summary, you still need to write it to a file with a FileWriter object so it can be visualized:
writer.add_summary(summary, global_step=step)
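Putting the writer side together, a minimal sketch (paths are illustrative; loss, optimizer, and summary_op are assumed to be the ops defined earlier):

```python
writer = tf.summary.FileWriter('graphs/word2vec', tf.get_default_graph())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(NUM_TRAIN_STEPS):
        loss_batch, _, summary = sess.run([loss, optimizer, summary_op])
        writer.add_summary(summary, global_step=step)
writer.close()
```

Then launch TensorBoard with `tensorboard --logdir=graphs` and open the Scalars and Histograms tabs.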
Complete code
```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import numpy as np
from tensorflow.contrib.tensorboard.plugins import projector
import tensorflow as tf

import utils
import word2vec_utils

# Model hyperparameters
VOCAB_SIZE = 50000
BATCH_SIZE = 128
EMBED_SIZE = 128            # dimension of the word embedding vectors
SKIP_WINDOW = 1             # the context window
NUM_SAMPLED = 64            # number of negative examples to sample
LEARNING_RATE = 0.5
NUM_TRAIN_STEPS = 100000
VISUAL_FLD = 'visualization'
SKIP_STEP = 5000

# Parameters for downloading data
DOWNLOAD_URL = 'http://mattmahoney.net/dc/text8.zip'
EXPECTED_BYTES = 31344016
NUM_VISUALIZE = 3000        # number of tokens to visualize


class SkipGramModel:
    """ Build the graph for word2vec model """
    def __init__(self, dataset, vocab_size, embed_size, batch_size, num_sampled, learning_rate):
        self.vocab_size = vocab_size
        self.embed_size = embed_size
        self.batch_size = batch_size
        self.num_sampled = num_sampled
        self.lr = learning_rate
        self.global_step = tf.get_variable('global_step', initializer=tf.constant(0), trainable=False)
        self.skip_step = SKIP_STEP
        self.dataset = dataset

    def _import_data(self):
        """ Step 1: import data """
        with tf.name_scope('data'):
            self.iterator = self.dataset.make_initializable_iterator()
            self.center_words, self.target_words = self.iterator.get_next()

    def _create_embedding(self):
        """ Step 2 + 3: define weights and embedding lookup.
        In word2vec, it's actually the weights that we care about
        """
        with tf.name_scope('embed'):
            self.embed_matrix = tf.get_variable('embed_matrix',
                                                shape=[self.vocab_size, self.embed_size],
                                                initializer=tf.random_uniform_initializer())
            self.embed = tf.nn.embedding_lookup(self.embed_matrix, self.center_words, name='embedding')

    def _create_loss(self):
        """ Step 4: define the loss function """
        with tf.name_scope('loss'):
            # construct variables for NCE loss
            nce_weight = tf.get_variable('nce_weight',
                                         shape=[self.vocab_size, self.embed_size],
                                         initializer=tf.truncated_normal_initializer(stddev=1.0 / (self.embed_size ** 0.5)))
            nce_bias = tf.get_variable('nce_bias', initializer=tf.zeros([VOCAB_SIZE]))

            # define loss function to be NCE loss function
            self.loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight,
                                                      biases=nce_bias,
                                                      labels=self.target_words,
                                                      inputs=self.embed,
                                                      num_sampled=self.num_sampled,
                                                      num_classes=self.vocab_size), name='loss')

    def _create_optimizer(self):
        """ Step 5: define optimizer """
        self.optimizer = tf.train.GradientDescentOptimizer(self.lr).minimize(self.loss,
                                                                             global_step=self.global_step)

    def _create_summaries(self):
        with tf.name_scope('summaries'):
            tf.summary.scalar('loss', self.loss)
            tf.summary.histogram('histogram loss', self.loss)
            # because you have several summaries, we should merge them all
            # into one op to make it easier to manage
            self.summary_op = tf.summary.merge_all()

    def build_graph(self):
        """ Build the graph for our model """
        self._import_data()
        self._create_embedding()
        self._create_loss()
        self._create_optimizer()
        self._create_summaries()

    def train(self, num_train_steps):
        saver = tf.train.Saver()  # defaults to saving all variables - in this case embed_matrix, nce_weight, nce_bias

        initial_step = 0
        utils.safe_mkdir('checkpoints')
        with tf.Session() as sess:
            sess.run(self.iterator.initializer)
            sess.run(tf.global_variables_initializer())
            ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/checkpoint'))

            # if that checkpoint exists, restore from checkpoint
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)

            total_loss = 0.0  # we use this to calculate late average loss in the last SKIP_STEP steps
            writer = tf.summary.FileWriter('graphs/word2vec/lr' + str(self.lr), sess.graph)
            initial_step = self.global_step.eval()

            for index in range(initial_step, initial_step + num_train_steps):
                try:
                    loss_batch, _, summary = sess.run([self.loss, self.optimizer, self.summary_op])
                    writer.add_summary(summary, global_step=index)
                    total_loss += loss_batch
                    if (index + 1) % self.skip_step == 0:
                        print('Average loss at step {}: {:5.1f}'.format(index, total_loss / self.skip_step))
                        total_loss = 0.0
                        saver.save(sess, 'checkpoints/skip-gram', index)
                except tf.errors.OutOfRangeError:
                    sess.run(self.iterator.initializer)
            writer.close()

    def visualize(self, visual_fld, num_visualize):
        """ run "'tensorboard --logdir='visualization'" to see the embeddings """
        # create the list of num_variable most common words to visualize
        word2vec_utils.most_common_words(visual_fld, num_visualize)

        saver = tf.train.Saver()
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/checkpoint'))

            # if that checkpoint exists, restore from checkpoint
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)

            final_embed_matrix = sess.run(self.embed_matrix)

            # you have to store embeddings in a new variable
            embedding_var = tf.Variable(final_embed_matrix[:num_visualize], name='embedding')
            sess.run(embedding_var.initializer)

            config = projector.ProjectorConfig()
            summary_writer = tf.summary.FileWriter(visual_fld)

            # add embedding to the config file
            embedding = config.embeddings.add()
            embedding.tensor_name = embedding_var.name

            # link this tensor to its metadata file, in this case the first NUM_VISUALIZE words of vocab
            embedding.metadata_path = 'vocab_' + str(num_visualize) + '.tsv'

            # saves a configuration file that TensorBoard will read during startup.
            projector.visualize_embeddings(summary_writer, config)
            saver_embed = tf.train.Saver([embedding_var])
            saver_embed.save(sess, os.path.join(visual_fld, 'model.ckpt'), 1)


def gen():
    yield from word2vec_utils.batch_gen(DOWNLOAD_URL, EXPECTED_BYTES, VOCAB_SIZE,
                                        BATCH_SIZE, SKIP_WINDOW, VISUAL_FLD)


def main():
    dataset = tf.data.Dataset.from_generator(gen,
                                             (tf.int32, tf.int32),
                                             (tf.TensorShape([BATCH_SIZE]), tf.TensorShape([BATCH_SIZE, 1])))
    model = SkipGramModel(dataset, VOCAB_SIZE, EMBED_SIZE, BATCH_SIZE, NUM_SAMPLED, LEARNING_RATE)
    model.build_graph()
    model.train(NUM_TRAIN_STEPS)
    model.visualize(VISUAL_FLD, NUM_VISUALIZE)


if __name__ == '__main__':
    main()
```
```
INFO:tensorflow:Summary name histogram loss is illegal; using histogram_loss instead.
data/text8.zip already exists
Average loss at step 4999:  64.1
Average loss at step 9999:  17.3
Average loss at step 14999:   9.1
Average loss at step 19999:   6.3
Average loss at step 24999:   5.3
Average loss at step 29999:   4.9
Average loss at step 34999:   4.8
Average loss at step 39999:   4.7
Average loss at step 44999:   4.6
Average loss at step 49999:   4.6
Average loss at step 54999:   4.6
Average loss at step 59999:   4.6
Average loss at step 64999:   4.6
Average loss at step 69999:   4.6
Average loss at step 74999:   4.6
Average loss at step 79999:   4.6
Average loss at step 84999:   4.6
Average loss at step 89999:   4.6
Average loss at step 94999:   4.6
Average loss at step 99999:   4.5
INFO:tensorflow:Restoring parameters from checkpoints/skip-gram-99999
```
Now open TensorBoard: on the Scalars page you can see the scalar summary plot, which is the summary of your loss.



Random seed at the operation level
When using TensorFlow, you often need random values. There are many ways to obtain them, and there is also a way to control this randomness: seeds, which come in two kinds (op level and graph level).
Here is how to assign a random seed at the op level. Let's look at the examples below to see how to use it.
1. Set the random seed at the op level. Every random tensor accepts a seed argument when it is created.
```python
c = tf.random_uniform([], -10, 10, seed=2)

with tf.Session() as sess:
    print(sess.run(c))  # >> 3.57493
    print(sess.run(c))  # >> -5.97319
```
```
3.574932
-5.9731865
```
2. Use tf.Graph.seed to set the random seed at the graph level:
tf.set_random_seed(seed)
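For comparison, here is a minimal sketch of the graph-level seed (my own example, not from the post): with only tf.set_random_seed, each op still gets a different value, but the whole program becomes repeatable from run to run.

```python
tf.set_random_seed(2)       # graph-level seed; no op-level seeds below
a = tf.random_uniform([])
b = tf.random_uniform([])

with tf.Session() as sess:
    print(sess.run(a))      # differs from b, but identical across program runs
    print(sess.run(b))
```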
Autodiff (how TensorFlow computes gradients)
TensorFlow provides automatic differentiation, and tf.gradients() lets you use it explicitly: it differentiates the functions you choose with respect to the variables you specify. Its structure is as follows.
```python
tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None
)
```
Here ys is the function (or list of functions) to differentiate, and xs is the variable (or list of variables) to differentiate with respect to. It can differentiate with respect to several variables and applies the chain rule automatically. Consider the examples below.
```python
x = tf.Variable(2.0)
y = 2.0 * (x ** 3)

grad_y = tf.gradients(y, x)
with tf.Session() as sess:
    sess.run(x.initializer)
    print(sess.run(grad_y))  # >> 24.0
```
[24.0]
```python
x = tf.Variable(2.0)
y = 2.0 * (x ** 3)
z = 3.0 + y ** 2

grad_z = tf.gradients(z, [x, y])
with tf.Session() as sess:
    sess.run(x.initializer)
    print(sess.run(grad_z))  # >> [768.0, 32.0]
# 768 is the gradient of z with respect to x, 32 with respect to y
```
[768.0, 32.0]
So the question is: why do we still learn how to compute gradients by hand? Why do Chris Manning and Richard Socher still make us compute the gradients of cross entropy and softmax? Will computing gradients by hand one day feel as obsolete as computing square roots by hand after the calculator was invented?
Maybe. But for now, while TensorFlow can compute gradients for us, it cannot give us intuition about which functions to use. It does not tell us whether a function will suffer from exploding or vanishing gradients. We still need to understand gradients in order to understand why one model works while another does not.
References:
1.https://www.jianshu.com/p/d49ae48312d3
2.https://reniew.github.io/36/