Empowering Machine Learning with Deep Learning (1)

March 29, 2021
AI

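This post covers a fairly interesting trick for using torch with lightgbm. It is similar to the autograd approach from before, but simpler and more convenient.

As everyone knows, the built-in loss functions of frameworks like lightgbm and xgboost are not exactly "flashy": they implement only a handful of common losses. Deep learning, the flashiest of them all, has exotic losses flying around everywhere. The example below shows briefly how to plug a loss function that ships with torch, or one implemented in torch on GitHub, into lightgbm.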

The approach is actually very simple:

Method 1: use a loss function that ships with torch

Take torch's smooth L1 loss as an example:

torch.nn.SmoothL1Loss(reduction='mean')

where the inputs are prepared as follows:

import torch
from torch import autograd
import numpy as np
y_pred=np.array([1.5,1.4,1.3,1.2,1.4])
y_pred=torch.from_numpy(y_pred)
y_pred.requires_grad=True

y_true=np.array([1.2,1.3,1.2,1.1,1.5])
y_true=torch.from_numpy(y_true)
y_true.requires_grad=False

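lgb and xgb hand y_pred and y_true to a custom loss function in numpy form, so they must first be converted to tensors. torch offers two ways to do this: torch.tensor and torch.from_numpy. The former allocates new memory and copies the numpy data, so it is somewhat slower. torch.from_numpy, like Tensor.numpy() in the other direction, shares memory with the original array, so the conversion is very fast.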
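The difference between copying and sharing can be checked directly. The following is a minimal sketch with made-up values: it converts one array both ways and then modifies the array in place.

```python
import numpy as np
import torch

arr = np.array([1.5, 1.4, 1.3])

shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # copies the data into new memory

# An in-place change to the numpy array is visible through the shared tensor only.
arr[0] = 9.0
print(shared[0].item())  # 9.0
print(copied[0].item())  # 1.5

# Tensor.numpy() shares memory as well, so writing through it changes arr too.
back = shared.numpy()
back[1] = -1.0
print(arr[1])  # -1.0
```

Because of this sharing, a tensor created with torch.from_numpy should not be modified in place if the original array is still needed.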

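Next, set requires_grad to True on the variable we want gradients for. This tells torch that the tensor is a variable to differentiate with respect to, so gradients can be computed for it in the subsequent calculations.

After that it is straightforward: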

from torch import autograd
loss=torch.nn.SmoothL1Loss()(y_pred,y_true)
dy_dx = torch.autograd.grad(loss,y_pred,create_graph=True,retain_graph=True)[0]

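Note that y_pred comes first and y_true second; torch's argument order here is the reverse of the usual (y_true, y_pred) convention...

Note that create_graph=True is needed here so that a computation graph is built for the gradient itself, and retain_graph=True keeps that graph alive (retain_graph defaults to the value of create_graph, so passing it explicitly is optional). The second-order gradient is computed on top of the first-order gradient, so the graph of the first-order computation has to be kept for the next step to continue from it. You can think of it as successive computations being chained together, in order, on the same graph.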
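As a small illustration of what create_graph changes, the sketch below computes the same first-order gradient twice, once without and once with create_graph=True. The three-element inputs are made up for the demonstration.

```python
import torch

y_pred = torch.tensor([1.5, 1.4, 1.3], dtype=torch.float64, requires_grad=True)
y_true = torch.tensor([1.2, 1.3, 1.2], dtype=torch.float64)
loss_fn = torch.nn.SmoothL1Loss()  # default reduction='mean'

# Without create_graph the gradient is a plain tensor with no history,
# so it cannot be differentiated a second time.
g_plain = torch.autograd.grad(loss_fn(y_pred, y_true), y_pred)[0]

# With create_graph=True the gradient carries its own graph (grad_fn is set),
# which is what makes a second-order gradient possible.
g_graph = torch.autograd.grad(loss_fn(y_pred, y_true), y_pred, create_graph=True)[0]

print(g_plain.requires_grad, g_plain.grad_fn)              # False None
print(g_graph.requires_grad, g_graph.grad_fn is not None)  # True True
```

Both calls return the same numbers; only the second result can be passed to torch.autograd.grad again.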

This gives us the first-order gradient. The second-order gradient is then computed from it:

dy_dx2 = torch.autograd.grad(dy_dx,y_pred,
                          grad_outputs=torch.ones(y_pred.shape), 
                          create_graph=False)[0]

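When computing the second-order gradient, note that grad_outputs=torch.ones(y_pred.shape) must be set.

This is because the loss we differentiated for the first-order gradient is a scalar, whereas the first-order gradient dy_dx that we differentiate now is a vector.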

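torch's design here closely resembles how a schoolchild thinks (which is also what makes it user-friendly): a schoolchild can compute partial derivatives of a scalar but is helpless when asked to differentiate a vector-valued quantity directly.

For example, Z can be differentiated only if it is a scalar; if it is not, torch does not know what to do. All we need to do is turn it into a scalar:

sum its elements, then compute the partial derivatives of that sum.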

The example comes from:

marsggbo: Pytorch autograd,backward详解 (zhuanlan.zhihu.com)

In autograd, setting grad_outputs=torch.ones(y_pred.shape) is all it takes, because the dot product of a vector with an all-ones vector of the same size is exactly a sum.
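The equivalence between the all-ones grad_outputs and an explicit sum can be verified with a short sketch. The inputs are made up, and reduction='sum' is my own choice here (not the default 'mean' used in the text) so that each entry is a plain per-sample derivative.

```python
import torch

y_pred = torch.tensor([1.5, 3.0, 1.3], dtype=torch.float64, requires_grad=True)
y_true = torch.tensor([1.2, 1.0, 1.2], dtype=torch.float64)

loss = torch.nn.SmoothL1Loss(reduction="sum")(y_pred, y_true)
dy_dx = torch.autograd.grad(loss, y_pred, create_graph=True)[0]

# Option A: differentiate the vector dy_dx, weighting each entry by 1.
# retain_graph=True keeps the graph so that option B can reuse it.
hess_ones = torch.autograd.grad(
    dy_dx, y_pred, grad_outputs=torch.ones_like(dy_dx), retain_graph=True
)[0]

# Option B: reduce dy_dx to a scalar by summing, then differentiate that scalar.
hess_sum = torch.autograd.grad(dy_dx.sum(), y_pred)[0]

print(dy_dx.detach())  # first-order gradient
print(hess_ones)       # second-order gradient, identical to hess_sum
```

The middle sample has an absolute error above 1, so it falls in the linear part of smooth L1: its first derivative is 1 and its second derivative is 0.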

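This gives us the first- and second-order gradients of smooth L1.

dy_dx was computed with create_graph=True, so it is still attached to the computation graph. dy_dx2 was computed with create_graph=False, so it is not part of any graph. Since dy_dx is still attached, we use detach to separate it from the graph.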

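torch functions often come in pairs such as detach and detach_, identical except for a trailing "_". The trailing underscore marks an in-place operation: detach_ modifies the tensor itself, while detach returns a new tensor object and leaves the original untouched. Note that the tensor returned by detach still shares its underlying data with the original rather than copying it; for most other pairs (add and add_, for instance) the version without the underscore does allocate new memory for its result. (Python's id function tells you whether two names refer to the same object, and Tensor.data_ptr() gives the address of the underlying data, so a quick comparison makes this clear.)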
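A quick way to see the difference is to compare object identity and data addresses. This is a minimal sketch on a made-up two-element tensor.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x * 2  # non-leaf tensor that is part of the graph

# detach() returns a new tensor object without history...
d = y.detach()
new_object = d is not y                       # True
same_storage = d.data_ptr() == y.data_ptr()   # True: the data is shared, not copied
print(new_object, same_storage, d.requires_grad)

# ...while y itself is still attached to the graph.
attached_before = y.grad_fn is not None       # True

# detach_() changes y itself: afterwards it has no grad_fn and no requires_grad.
y.detach_()
detached_after = y.grad_fn is None and not y.requires_grad  # True
print(attached_before, detached_after)
```

Since the detached tensor shares data with the original, call clone() on it first if an independent copy is needed.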

Finally, return dy_dx as grad and dy_dx2 as hessian in the format that lgb or xgb requires of a custom loss function.
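Putting the steps together, a custom objective might look like the sketch below. The function name smooth_l1_objective is hypothetical, and the (y_true, y_pred) argument order assumes the scikit-learn style interface. I use reduction='sum' rather than the default 'mean' so that each returned entry is the derivative of one sample's own loss instead of being scaled by 1/N; that is my assumption about what the booster should receive, not something stated above.

```python
import numpy as np
import torch

def smooth_l1_objective(y_true, y_pred):
    """Return (grad, hess) of smooth L1 as numpy arrays, one entry per sample."""
    # Convert the numpy inputs to float64 tensors; only the prediction needs gradients.
    pred = torch.from_numpy(np.asarray(y_pred, dtype=np.float64)).requires_grad_(True)
    target = torch.from_numpy(np.asarray(y_true, dtype=np.float64))

    loss = torch.nn.SmoothL1Loss(reduction="sum")(pred, target)

    # First-order gradient, keeping the graph so it can be differentiated again.
    grad = torch.autograd.grad(loss, pred, create_graph=True)[0]
    # Second-order gradient via the all-ones grad_outputs trick.
    hess = torch.autograd.grad(grad, pred, grad_outputs=torch.ones_like(pred))[0]

    # grad is still attached to the graph, so detach it before converting to numpy.
    return grad.detach().numpy(), hess.numpy()

# Same numbers as the example in the text.
g, h = smooth_l1_objective(
    np.array([1.2, 1.3, 1.2, 1.1, 1.5]),
    np.array([1.5, 1.4, 1.3, 1.2, 1.4]),
)
print(g)
print(h)
```

With lightgbm's scikit-learn wrapper, a callable with this signature can be passed as the objective argument, for example LGBMRegressor(objective=smooth_l1_objective); check this wiring against the installed lightgbm version. Note also that the hessian of smooth L1 is 0 wherever the absolute error exceeds 1, which may need special handling in practice.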

Method 2: a loss function written in torch on GitHub

Take the open-source loss toolbox library for torch as an example:

import torch
from torch import nn
from torch.nn import functional as F
class BinaryFocalLoss(nn.Module):
    """
    This is an implementation of Focal Loss with smooth label cross entropy supported which is proposed in
    'Focal Loss for Dense Object Detection. (//arxiv.org/abs/1708.02002)'
        Focal_Loss= -1*alpha*(1-pt)^gamma*log(pt)
    :param alpha: (tensor) 3D or 4D the scalar factor for this criterion
    :param gamma: (float,double) gamma > 0 reduces the relative loss for well-classified examples (p>0.5) putting more
                    focus on hard misclassified example
    :param reduction: `none`|`mean`|`sum`
    :param **kwargs
        balance_index: (int) balance class index, should be specific when alpha is float
    """

    def __init__(self, alpha=3, gamma=2, ignore_index=None, reduction='mean',**kwargs):
        super(BinaryFocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma
        self.smooth = 1e-6 # set '1e-4' when train with FP16
        self.ignore_index = ignore_index
        self.reduction = reduction

        assert self.reduction in ['none', 'mean', 'sum']

        # if self.alpha is None:
        #     self.alpha = torch.ones(2)
        # elif isinstance(self.alpha, (list, np.ndarray)):
        #     self.alpha = np.asarray(self.alpha)
        #     self.alpha = np.reshape(self.alpha, (2))
        #     assert self.alpha.shape[0] == 2, \
        #         'the `alpha` shape is not match the number of class'
        # elif isinstance(self.alpha, (float, int)):
        #     self.alpha = np.asarray([self.alpha, 1.0 - self.alpha], dtype=np.float).view(2)

        # else:
        #     raise TypeError('{} not supported'.format(type(self.alpha)))

    def forward(self, output, target):
        prob = torch.sigmoid(output)
        prob = torch.clamp(prob, self.smooth, 1.0 - self.smooth)

        valid_mask = None
        if self.ignore_index is not None:
            valid_mask = (target != self.ignore_index).float()

        pos_mask = (target == 1).float()
        neg_mask = (target == 0).float()
        if valid_mask is not None:
            pos_mask = pos_mask * valid_mask
            neg_mask = neg_mask * valid_mask

        pos_weight = (pos_mask * torch.pow(1 - prob, self.gamma)).detach()
        pos_loss = -torch.sum(pos_weight * torch.log(prob)) / (torch.sum(pos_weight) + 1e-4)
        
        
        neg_weight = (neg_mask * torch.pow(prob, self.gamma)).detach()
        neg_loss = -self.alpha * torch.sum(neg_weight * F.logsigmoid(-output)) / (torch.sum(neg_weight) + 1e-4)
        loss = pos_loss + neg_loss

        return loss

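The usage is basically the same. In general, custom torch loss functions found on GitHub are written the same way as a custom model, as described in the answers to:

Pytorch如何自定义损失函数（Loss Function）？ (www.zhihu.com)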
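Since such losses are ordinary nn.Module objects, the gradient code can be written once and reused. The helper below, grad_hess_from_module, is a hypothetical name of my own. To keep the sketch self-contained it uses torch's built-in BCEWithLogitsLoss as a stand-in; an instance of a GitHub loss such as BinaryFocalLoss would be passed in the same position.

```python
import numpy as np
import torch
from torch import nn

def grad_hess_from_module(loss_module, y_true, y_pred):
    """Compute (grad, hess) of any nn.Module loss called as loss_module(pred, target)."""
    pred = torch.from_numpy(np.asarray(y_pred, dtype=np.float64)).requires_grad_(True)
    target = torch.from_numpy(np.asarray(y_true, dtype=np.float64))

    loss = loss_module(pred, target)
    grad = torch.autograd.grad(loss, pred, create_graph=True)[0]
    hess = torch.autograd.grad(grad, pred, grad_outputs=torch.ones_like(pred))[0]
    return grad.detach().numpy(), hess.numpy()

# Stand-in loss: binary cross entropy on raw scores (logits), summed over samples.
raw_scores = np.array([0.0, 2.0, -1.0])
labels = np.array([1.0, 0.0, 1.0])
grad, hess = grad_hess_from_module(nn.BCEWithLogitsLoss(reduction="sum"), labels, raw_scores)

# Closed form for comparison: grad = p - y, hess = p * (1 - p), with p = sigmoid(score).
p = 1.0 / (1.0 + np.exp(-raw_scores))
print(np.allclose(grad, p - labels), np.allclose(hess, p * (1.0 - p)))
```

One caveat: the all-ones trick returns the row sums of the Hessian matrix. These equal the per-sample second derivatives only when each sample's loss term depends on its own prediction alone, so it is worth checking this for any loss that normalizes across the batch.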

torch's design philosophy: everything can be an nn.Module. Custom models, custom layers, and custom loss functions all follow the same API template.
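To illustrate that template, here is a minimal sketch of a custom loss written as an nn.Module. The class LogCoshLoss is a hypothetical example of my own, not part of torch or of the toolbox above; the point is only that it needs nothing more than __init__ and forward, and that autograd can then produce both derivatives.

```python
import torch
from torch import nn

class LogCoshLoss(nn.Module):
    """Hypothetical custom loss: sum of log(cosh(pred - target))."""

    def __init__(self):
        super().__init__()

    def forward(self, output, target):
        return torch.log(torch.cosh(output - target)).sum()

pred = torch.tensor([0.5, -1.0, 2.0], dtype=torch.float64, requires_grad=True)
target = torch.tensor([0.0, 0.0, 1.5], dtype=torch.float64)

loss = LogCoshLoss()(pred, target)
grad = torch.autograd.grad(loss, pred, create_graph=True)[0]
hess = torch.autograd.grad(grad, pred, grad_outputs=torch.ones_like(pred))[0]

# Closed form for comparison: grad = tanh(d), hess = 1 - tanh(d)^2, with d = pred - target.
d = pred.detach() - target
print(torch.allclose(grad.detach(), torch.tanh(d)))
print(torch.allclose(hess, 1 - torch.tanh(d) ** 2))
```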