Learning PyTorch with Examples

February 11, 2022
Notes
PyTorch Official Tutorials (Chinese)

Note: This is part of a tutorial for an older version of PyTorch. You can find the latest getting-started material in Learn the Basics.

This tutorial introduces the fundamental concepts of PyTorch through self-contained examples.

At its core, PyTorch provides two main features:

An n-dimensional Tensor, similar to a NumPy array but able to run on GPUs
Automatic differentiation for building and training neural networks

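We will use the problem of fitting \(y=\sin(x)\) with a third-order polynomial as our running example. The network will have four parameters and will be trained with gradient descent to fit the data (2000 points of \(\sin(x)\) sampled on \([-\pi, \pi]\)) by minimizing the squared Euclidean distance, i.e. the sum of squared errors, between the network output and the true output.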
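For reference, the loss and the gradients that the manual backward passes below compute can be written out explicitly. This is a short derivation under the setup above, with \(N = 2000\) sample points, \(\hat{y}_i\) the prediction at \(x_i\), and \(\eta\) the learning rate:

```latex
\hat{y}_i = a + b x_i + c x_i^2 + d x_i^3, \qquad
L = \sum_{i=1}^{N} (\hat{y}_i - y_i)^2

\frac{\partial L}{\partial \hat{y}_i} = 2(\hat{y}_i - y_i)

\frac{\partial L}{\partial a} = \sum_{i} 2(\hat{y}_i - y_i), \quad
\frac{\partial L}{\partial b} = \sum_{i} 2(\hat{y}_i - y_i)\, x_i, \quad
\frac{\partial L}{\partial c} = \sum_{i} 2(\hat{y}_i - y_i)\, x_i^2, \quad
\frac{\partial L}{\partial d} = \sum_{i} 2(\hat{y}_i - y_i)\, x_i^3

\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}, \qquad \theta \in \{a, b, c, d\}
```

In the code, grad_y_pred is \(\partial L / \partial \hat{y}_i\) for every sample, and grad_a through grad_d are the four sums.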

Tensors

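Warm-up: NumPy

Before introducing PyTorch, we first implement the network using NumPy.

NumPy provides an n-dimensional array object and many functions for manipulating these arrays. NumPy is a generic framework for scientific computing; it knows nothing about computation graphs, deep learning, or gradients. However, we can easily use NumPy to fit a third-order polynomial to a sine function by manually implementing the forward and backward passes through the network with NumPy operations: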

# -*- coding: utf-8 -*-
import numpy as np
import math

# Create random input and output data
x = np.linspace(-math.pi, math.pi, 2000)  # 2000 evenly spaced points from -π to π
y = np.sin(x)

# Randomly initialize weights
a = np.random.randn()  # a single float drawn from N(0, 1)
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    # y = a + b x + c x^2 + d x^3
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    
    # Compute and print loss
    loss = np.square(y_pred - y).sum()  # sum of squared errors over all samples
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)  # d(loss)/d(y_pred) for each sample, not summed yet
    grad_a = grad_y_pred.sum()  # here and below: sum the per-sample gradients over all samples
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()
    
    # Update weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f"Result: y = {a} + {b} x + {c} x^2 + {d} x^3")
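A cheap way to gain confidence in hand-written gradients like the ones above is to compare them with central finite differences. The sketch below is an illustration rather than part of the original tutorial, and the helper names (`loss_fn`, `analytic_grads`, `numeric_grads`) are hypothetical. Because the loss is quadratic in a, b, c, d, the central difference is exact up to floating-point rounding, so the relative error should be tiny.

```python
import math
import numpy as np

def loss_fn(params, x, y):
    """Sum-of-squared-errors loss of the cubic model, as in the loop above."""
    a, b, c, d = params
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    return np.square(y_pred - y).sum()

def analytic_grads(params, x, y):
    """Hand-derived gradients, using the same formulas as the backprop step above."""
    a, b, c, d = params
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    g = 2.0 * (y_pred - y)
    return np.array([g.sum(), (g * x).sum(), (g * x ** 2).sum(), (g * x ** 3).sum()])

def numeric_grads(params, x, y, eps=1e-4):
    """Central differences: (L(p + eps * e_i) - L(p - eps * e_i)) / (2 * eps)."""
    grads = np.zeros(len(params))
    for i in range(len(params)):
        step = np.zeros(len(params))
        step[i] = eps
        grads[i] = (loss_fn(params + step, x, y) - loss_fn(params - step, x, y)) / (2 * eps)
    return grads

x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)
rng = np.random.default_rng(0)
params = rng.standard_normal(4)

ga = analytic_grads(params, x, y)
gn = numeric_grads(params, x, y)
rel_err = np.abs(ga - gn).max() / np.abs(ga).max()
print(rel_err)
```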

PyTorch: Tensors

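NumPy is a great framework, but it cannot use GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately NumPy alone is not enough for modern deep learning.

Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a NumPy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on Tensors. Behind the scenes, Tensors can keep track of a computational graph and gradients, but they are also useful as a generic tool for scientific computing.

Also unlike NumPy, PyTorch Tensors can use GPUs to accelerate numeric computations. To run a PyTorch Tensor on a GPU, you simply need to specify the correct device.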
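Rather than commenting device lines in and out by hand, a common pattern is to select the device at runtime. The sketch below relies only on standard PyTorch calls (`torch.cuda.is_available`, `Tensor.to`, `torch.from_numpy`) and also shows how Tensors relate to the NumPy arrays of the warm-up; the variable names are illustrative.

```python
import numpy as np
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.linspace(-1.0, 1.0, 5, device=device)  # created directly on `device`
y = (x * 2).to('cpu')                              # .to() copies a tensor to another device

# On the CPU, torch.from_numpy shares memory with the NumPy array (no copy).
arr = np.zeros(3)
t = torch.from_numpy(arr)
arr[0] = 7.0
print(t[0].item())  # 7.0, because t and arr share the same buffer
```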

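Here we use PyTorch Tensors to fit a third-order polynomial to a sine function. Like the NumPy example above, we need to manually implement the forward and backward passes through the network: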

# -*- coding: utf-8 -*-

import torch
import math

dtype = torch.float
device = torch.device('cpu')
# device = torch.device('cuda:0') # Uncomment this to run on GPU

# Create random input and output data
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()  # .item() returns the tensor's value as a Python number
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f"Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3")

Autograd

PyTorch: Tensors and autograd

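In the above examples, we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but it quickly becomes very hard for large, complex networks.

Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network defines a computational graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows you to easily compute gradients.

This sounds complicated, but it is pretty simple to use in practice. Each Tensor represents a node in a computational graph. If x is a Tensor with x.requires_grad=True, then after a backward pass x.grad is another Tensor holding the gradient of some scalar value (typically the loss) with respect to x.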
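A minimal illustration of requires_grad and .grad on a single scalar (the function z is chosen only for the example):

```python
import torch

# A leaf tensor whose gradient we want autograd to track.
x = torch.tensor(3.0, requires_grad=True)

# The forward pass builds the graph: z = x^2 + 2x
z = x ** 2 + 2 * x

# Backward pass: dz/dx = 2x + 2, which is 8 at x = 3.
z.backward()
print(x.grad)  # tensor(8.)

# A tensor created without requires_grad gets no gradient.
w = torch.tensor(1.0)
print(w.requires_grad, w.grad)  # False None
```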

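Here we use PyTorch Tensors and autograd to implement our example of fitting a third-order polynomial; we no longer need to manually implement the backward pass through the network.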

# -*- coding: utf-8 -*-
import torch
import math

dtype = torch.float
device = torch.device('cpu')
# device = torch.device('cuda:0') # Uncomment this to run on GPU

# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

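# Create random Tensors for weights. For a third-order polynomial, we need
# 4 weights: y = a + b x + c x^2 + d x^3
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.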
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss using operations on Tensors.
    # Now loss is a 0-dimensional (scalar) Tensor of shape ().
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

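    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call a.grad, b.grad, c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.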
    loss.backward()
    
    # Manually update weights using gradient descent. Wrap this in torch.no_grad()
    # because the weights have requires_grad=True, but we don't need autograd to
    # track the update itself:
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

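        # Manually reset the gradients (set them to None) after updating the weights;
        # otherwise the next backward() call would accumulate into them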
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
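Why the gradients have to be reset: backward() accumulates into .grad rather than overwriting it. A small illustration, separate from the tutorial's model:

```python
import torch

a = torch.tensor(2.0, requires_grad=True)

# Each backward() call adds into a.grad instead of overwriting it.
for _ in range(3):
    loss = a * 5.0       # d(loss)/da = 5
    loss.backward()
accumulated = a.grad.clone()
print(accumulated)  # tensor(15.): three accumulated contributions of 5

# Resetting, as the training loop does with `a.grad = None`, starts fresh.
a.grad = None
(a * 5.0).backward()
print(a.grad)  # tensor(5.)
```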

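PyTorch: Defining new autograd functions

Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of the output Tensors with respect to some scalar value, and computes the gradient of the input Tensors with respect to that same scalar value.

In PyTorch we can easily define our own autograd operator by defining a subclass of torch.autograd.Function and implementing the forward and backward functions. We can then use our new autograd operator by calling its apply method like a function, passing Tensors containing input data.

In this example we define our model as \(y = a + b P_3(c + dx)\) instead of \(y = a + bx + cx^2 + dx^3\), where \(P_3(x) = \frac{1}{2}(5x^3-3x)\) is the Legendre polynomial of degree three. We write our own custom autograd function that computes the forward and backward passes of \(P_3\), and use it to implement our model.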
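The backward function relies on the derivative of \(P_3\) together with the chain rule. Written out, with \(u = c + dx\) the input to \(P_3\):

```latex
P_3(u) = \tfrac{1}{2}(5u^3 - 3u), \qquad
P_3'(u) = \tfrac{1}{2}(15u^2 - 3) = \tfrac{3}{2}(5u^2 - 1)

\frac{\partial L}{\partial u} = \frac{\partial L}{\partial P_3(u)} \cdot P_3'(u)
```

Here \(\partial L / \partial P_3(u)\) is grad_output, so the returned value is grad_output * 1.5 * (5 * input ** 2 - 1).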

# -*- coding: utf-8 -*-
import torch
import math

class LegendrePolynomial3(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes,
    which operate on Tensors.
    """
    
    @staticmethod
    def forward(ctx, input):
        """
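        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache tensors for
        use in the backward pass using the ctx.save_for_backward method; other
        objects can be stored directly as attributes of ctx.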
        """
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)
    
    @staticmethod
    def backward(ctx, grad_output):
        """
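        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input. By the chain rule this is grad_output times
        P3'(input) = 1.5 * (5 * input ** 2 - 1).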
        """
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 -1)

dtype = torch.float
device = torch.device('cpu')
# device = torch.device('cuda:0') # Uncomment this to run on GPU

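# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Create Tensors for weights. For this example, we need 4 weights:
# y = a + b * P3(c + d * x). These weights need to be initialized not too far
# from the correct result to ensure convergence.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)  # scalar (0-dim) tensor holding 0.0
b = torch.full((), -1.0, device=device, dtype=dtype, requires_grad=True)
c = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
d = torch.full((), 0.3, device=device, dtype=dtype, requires_grad=True)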

learning_rate = 5e-6
for t in range(2000):
    # To apply our Function, we use the Function.apply method and alias it as 'P3'.
    P3 = LegendrePolynomial3.apply

    # Forward pass: compute predicted y using operations; we compute
    # P3 using our custom autograd operation.
    y_pred = a + b * P3(c + d * x)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())
    
    # Use autograd to compute the backward pass
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} * P3({c.item()} + {d.item()} x)')
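When writing a custom backward, it is worth checking it numerically. PyTorch provides torch.autograd.gradcheck for this: it compares the analytic backward against finite differences and expects double-precision inputs with requires_grad=True. The sketch below re-declares the Function so that it is self-contained; the random test input is illustrative.

```python
import torch

class LegendrePolynomial3(torch.autograd.Function):
    """Same P3 Function as above: forward computes P3(x), backward grad * P3'(x)."""

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)

# gradcheck returns True when forward and backward agree (and raises otherwise).
inp = torch.randn(10, dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(LegendrePolynomial3.apply, (inp,))
print(ok)  # True

# Sanity check of the forward: P3(1) = 0.5 * (5 - 3) = 1
p3_at_one = LegendrePolynomial3.apply(torch.tensor(1.0, dtype=torch.double)).item()
```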

nn module

PyTorch: nn

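Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks raw autograd can be a bit too low-level.

When building neural networks, we frequently think of arranging the computation into layers, some of which have learnable parameters that will be optimized during training.

In TensorFlow, packages like Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.

In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of loss functions that are commonly used when training neural networks.

In this example we use the nn package to implement our polynomial model network: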

# -*- coding: utf-8 -*-
import torch
import math

# Create Tensors to hold input and outputs
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

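# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network.
# Prepare the tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)  # unsqueeze: (2000,) -> (2000, 1); pow(p) then gives (2000, 3)
# In the above code, x.unsqueeze(-1) has shape (2000, 1) and p has shape (3,);
# broadcasting semantics apply to produce a tensor of shape (2000, 3).

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules and applies them in sequence to
# produce its output. The Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# The Flatten layer flattens the output of the linear layer to a 1D tensor,
# to match the shape of `y`.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE). With reduction='sum' it returns
# the sum of the squared errors rather than their mean.
loss_fn = torch.nn.MSELoss(reduction='sum')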

learning_rate = 1e-6
for t in range(2000):

    # Forward pass: compute predicted y by passing xx to the model. Module objects
    # override the __call__ operator, so you can call them like functions.
    y_pred = model(xx)
    
    # Compute and print loss
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())
    
    # Zero the gradients before running the backward pass.
    model.zero_grad()
    
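    # Backward pass: compute the gradient of the loss with respect to all the
    # learnable parameters of the model. Internally, the parameters of each Module
    # are stored in Tensors with requires_grad=True, so this call will compute
    # gradients for all learnable parameters in the model.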
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# You can access the first layer of `model` like accessing the first item of a list
linear_layer = model[0]

# For a linear layer, its parameters are stored as `weight` and `bias`.
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + '
      f'{linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
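The shape bookkeeping described in the comments above can be reproduced with NumPy, which follows the same broadcasting rules as PyTorch. This sketch mirrors unsqueeze, pow, Linear(3, 1) and Flatten(0, 1) with plain arrays; the weight and bias values are hypothetical, chosen only to show the shapes.

```python
import math
import numpy as np

x = np.linspace(-math.pi, math.pi, 2000)   # shape (2000,)
p = np.array([1, 2, 3])                    # shape (3,)

# x[:, None] is the NumPy analogue of x.unsqueeze(-1): shape (2000, 1).
# Broadcasting (2000, 1) against (3,) yields (2000, 3): column j holds x ** p[j].
xx = x[:, None] ** p
print(xx.shape)  # (2000, 3)

# A Linear(3, 1) layer computes xx @ W.T + bias, giving shape (2000, 1);
# flattening dims 0 and 1 gives (2000,), matching y. Hypothetical weights here.
W = np.array([[0.5, 0.0, -0.1]])           # shape (1, 3)
bias = np.array([0.2])
out = (xx @ W.T + bias).reshape(-1)
print(out.shape)  # (2000,)
```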

PyTorch: optim

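Up to this point, we have updated the weights of our models by manually mutating the Tensors holding learnable parameters inside torch.no_grad(). This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks using more sophisticated optimizers such as AdaGrad, RMSprop, and Adam.

The optim package in PyTorch provides implementations of commonly used optimization algorithms.

In this example we use the nn package to define our model as before, but we optimize the model using the RMSprop algorithm provided by the optim package: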

# -*- coding: utf-8 -*-
import torch
import math

# Create Tensors to hold input and outputs
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Prepare the input tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

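# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we use RMSprop; the optim package contains many other
# optimization algorithms. The first argument to the RMSprop constructor tells
# the optimizer which Tensors it should update.
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)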

for t in range(2000):
    # Forward pass: compute predicted y by passing xx to the model.
    y_pred = model(xx)
    
    # Compute and print loss
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

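    # Before the backward pass, use the optimizer object to zero all of the gradients
    # for the variables it will update (i.e., the learnable weights of the model).
    # This is because by default, gradients are accumulated in buffers (not
    # overwritten) whenever .backward() is called. See the docs of
    # torch.autograd.backward for more details.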
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()

linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + '
      f'{linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
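To make the idea of a "more sophisticated optimizer" concrete, here is a pure-Python sketch of the basic RMSprop update (no momentum, not centered), with default hyperparameters chosen to mirror the documented defaults of torch.optim.RMSprop (lr=0.01, alpha=0.99, eps=1e-8). It is an illustration on a one-dimensional quadratic, not PyTorch's implementation; the function name rmsprop_minimize is hypothetical.

```python
import math

def rmsprop_minimize(grad_fn, x0, lr=0.01, alpha=0.99, eps=1e-8, steps=3000):
    """Minimal RMSprop sketch (no momentum, not centered).

    square_avg is a running average of squared gradients; each step divides
    the gradient by its square root, so step sizes stay roughly around lr
    regardless of the raw gradient scale.
    """
    x, square_avg = x0, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        square_avg = alpha * square_avg + (1 - alpha) * g * g
        x -= lr * g / (math.sqrt(square_avg) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
result = rmsprop_minimize(lambda x: 2 * (x - 3), x0=-5.0)
print(result)  # close to 3
```

Dividing by the running root-mean-square gradient is what lets RMSprop use a much larger learning rate (1e-3 in the training loop above) than the plain gradient descent examples (1e-6).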

PyTorch: Custom nn Modules

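Sometimes you will want to specify models that are more complex than a sequence of existing Modules. In these cases you can define your own Modules by subclassing nn.Module and defining a forward method, which receives input Tensors and produces output Tensors using other Modules or other autograd operations on Tensors.

In this example we implement our third-order polynomial as a custom Module subclass: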

# -*- coding: utf-8 -*-
import torch
import math

class Polynomial3(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate four parameters and assign them as
        member parameters.
        """
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        """
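        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3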

    def string(self):
        """
        Just like any class in Python, you can also define custom methods on PyTorch modules
        """
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3'

# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Construct our model by instantiating the class defined above
model = Polynomial3()

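# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor returns the learnable parameters (the members of the
# model defined with torch.nn.Parameter).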
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)
for t in range(2000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')
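Only attributes wrapped in torch.nn.Parameter are registered with the Module, which is why model.parameters() hands exactly a, b, c, d to the optimizer. The sketch below checks this with named_parameters(); the extra plain-tensor attribute `scale` is hypothetical and exists only to show what is not registered.

```python
import torch

class Polynomial3(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Attributes wrapped in nn.Parameter are registered automatically.
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        # Hypothetical plain tensor attribute: NOT registered, so optimizers never see it.
        self.scale = torch.ones(())

    def forward(self, x):
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

model = Polynomial3()
names = [name for name, _ in model.named_parameters()]
print(names)  # ['a', 'b', 'c', 'd']
```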

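PyTorch: Control Flow + Weight Sharing

As an example of dynamic graphs and weight sharing, we implement a very strange model: a polynomial of degree three to five that, on each forward pass, chooses a random degree between 3 and 5 and reuses the same weight multiple times to compute the fourth- and fifth-order terms.

For this model we can use normal Python control flow to implement the loop, and we can implement weight sharing by simply reusing the same parameter multiple times when defining the forward pass.

We can easily implement this model as a Module subclass: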

# -*- coding: utf-8 -*-
import random
import torch
import math

class DynamicNet(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate five parameters and assign them as members.
        """
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        self.e = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        """
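        For the forward pass of the model, we randomly choose a degree of 3, 4 or 5
        (random.randint(4, 6) includes both endpoints, so the loop adds no term,
        x^4, or x^4 and x^5) and reuse the e parameter for every extra term.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same parameter many
        times when defining a computational graph.
        """
        y = self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3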
        for exp in range(4, random.randint(4, 6)):
            y = y + self.e * x ** exp
        return y

    def string(self):
        """
        Just like any class in Python, you can also define custom methods on PyTorch modules
        """
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3 + {self.e.item()} x^4 ? + {self.e.item()} x^5 ?'

# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Construct our model by instantiating the class defined above
model = DynamicNet()

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla (batch) gradient descent is tough, so we use momentum.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-8, momentum=0.9)

for t in range(30000):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(x)
    
    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 2000 == 1999:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')
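Two details of DynamicNet can be checked in isolation. First, when one parameter is used several times in a graph, autograd sums the gradient contributions from every use. Second, random.randint(4, 6) is inclusive, so the loop's exponent list is always one of (), (4,) or (4, 5). A small sketch with illustrative values (e = 1, x = 2):

```python
import random
import torch

e = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(2.0)

# e is used twice, like self.e for the x^4 and x^5 terms in DynamicNet.
y = e * x ** 4 + e * x ** 5

# Autograd sums the contributions from every use: dy/de = x^4 + x^5 = 16 + 32.
y.backward()
print(e.grad)  # tensor(48.)

# The exponent lists the forward loop can produce on different passes.
exponent_lists = {tuple(range(4, random.randint(4, 6))) for _ in range(100)}
print(sorted(exponent_lists))
```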
