Model Compression Basics

April 11, 2022
Notes
nlp

Model Compression

Network Pruning

Remove the parameters that contribute little to the network's output.

An interesting figure: the number of connections first increases and then decreases.

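Train a large model.
Evaluate importance:
1. Parameter importance (individual parameters are the pruning unit)
  1. e.g., the absolute value of each weight
2. Neuron importance (neurons are the pruning unit)
  1. e.g., how often a neuron's output is zero
Remove the unimportant parameters or neurons.
Fine-tune the smaller model, then repeat (evaluate, prune, fine-tune) until the model is small enough.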
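The prune-and-fine-tune loop above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the weights are a flat list, importance is the absolute value of each weight (the parameter-level criterion), the sparsity schedule is linear, and `fine_tune` is a hypothetical callback standing in for real re-training.

```python
from typing import Callable, List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Set the `sparsity` fraction of weights with the smallest |w| to zero."""
    n_prune = int(len(weights) * sparsity)
    # Indices from least to most important (importance = |w|).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def iterative_prune(
    weights: List[float],
    target_sparsity: float,
    rounds: int,
    fine_tune: Callable[[List[float]], List[float]],
) -> List[float]:
    """Prune a little, fine-tune, and repeat until `target_sparsity` is reached."""
    for r in range(1, rounds + 1):
        # Assumed schedule: sparsity grows linearly from round to round.
        sparsity = target_sparsity * r / rounds
        weights = magnitude_prune(weights, sparsity)
        weights = fine_tune(weights)  # hypothetical stand-in for real re-training
    return weights


if __name__ == "__main__":
    w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08]
    no_op = lambda ws: ws  # a real fine_tune would update the remaining weights
    print(iterative_prune(w, target_sparsity=0.5, rounds=2, fine_tune=no_op))
```

Real implementations usually keep a binary mask so that fine-tuning cannot bring pruned weights back; this sketch simply re-prunes at a higher sparsity each round.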

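Weight pruning

After pruning individual weights, the network's shape becomes irregular, which makes it hard to implement and hard to speed up on a GPU. The pruned weights can be filled with 0 instead, but then the network does not actually get any smaller.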
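A small plain-Python sketch (helper names are hypothetical) of why zero-filling does not help: after weight pruning the matrix keeps its shape, so an ordinary dense kernel still performs every multiplication, including those with the zeros.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def prune_small_weights(w: Matrix, threshold: float) -> Matrix:
    """Weight pruning by masking: zero every weight with |w| < threshold.
    The matrix keeps its original shape."""
    return [[x if abs(x) >= threshold else 0.0 for x in row] for row in w]


def dense_matvec(w: Matrix, x: List[float]) -> Tuple[List[float], int]:
    """Ordinary dense matrix-vector product; also counts the multiplications."""
    out, mults = [], 0
    for row in w:
        acc = 0.0
        for wij, xj in zip(row, x):
            acc += wij * xj  # zero weights are multiplied just like any other
            mults += 1
        out.append(acc)
    return out, mults


if __name__ == "__main__":
    w = [[0.5, -0.01, 0.2], [0.03, -0.8, 0.02]]
    pw = prune_small_weights(w, threshold=0.1)
    nonzero = sum(1 for row in pw for v in row if v != 0.0)
    print(pw, nonzero)  # half of the weights are now zero
    # Same number of multiplications before and after pruning.
    print(dense_matvec(w, [1.0, 1.0, 1.0])[1], dense_matvec(pw, [1.0, 1.0, 1.0])[1])
```

Actually exploiting this kind of sparsity typically needs sparse storage formats or specialised kernels.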

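Neuron pruning

Removing whole neurons keeps the network regular: what remains is simply a smaller dense network, which is easy to implement and to speed up on a GPU.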
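For neuron pruning, here is a sketch on a two-layer ReLU network in plain Python. Assumptions: importance is measured as the fraction of inputs on which a neuron is non-zero (one reading of the criterion in the procedure above), and all function names are illustrative. Removing neuron j deletes a row of the first weight matrix and a column of the second, so the result stays an ordinary dense network.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def hidden_activations(w1: Matrix, b1: List[float], x: List[float]) -> List[float]:
    """ReLU hidden layer: h[j] = max(0, w1[j] . x + b1[j]); row j of w1 is neuron j."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]


def forward(w1: Matrix, b1: List[float], w2: Matrix, x: List[float]) -> List[float]:
    """Two-layer MLP: y = w2 @ relu(w1 @ x + b1); column j of w2 reads neuron j."""
    h = hidden_activations(w1, b1, x)
    return [sum(w * hj for w, hj in zip(row, h)) for row in w2]


def neuron_importance(w1: Matrix, b1: List[float], data: Sequence[List[float]]) -> List[float]:
    """One possible criterion: fraction of inputs on which each neuron is non-zero."""
    counts = [0] * len(w1)
    for x in data:
        for j, h in enumerate(hidden_activations(w1, b1, x)):
            if h != 0.0:
                counts[j] += 1
    return [c / len(data) for c in counts]


def prune_neuron(w1: Matrix, b1: List[float], w2: Matrix, j: int) -> Tuple[Matrix, List[float], Matrix]:
    """Delete neuron j: row j of w1, entry j of b1 and column j of w2.
    The result is an ordinary, smaller dense network."""
    new_w1 = [row for k, row in enumerate(w1) if k != j]
    new_b1 = [b for k, b in enumerate(b1) if k != j]
    new_w2 = [[w for k, w in enumerate(row) if k != j] for row in w2]
    return new_w1, new_b1, new_w2


if __name__ == "__main__":
    w1 = [[1.0, 0.0], [-1.0, -1.0], [0.5, 0.5]]
    b1 = [0.0, -0.1, 0.0]
    w2 = [[1.0, 2.0, 3.0]]
    data = [[1.0, 2.0], [0.5, 0.1], [2.0, 0.0]]
    imp = neuron_importance(w1, b1, data)
    j = imp.index(min(imp))  # least active neuron
    small = prune_neuron(w1, b1, w2, j)
    print(imp, j, [forward(w1, b1, w2, x) == forward(*small, x) for x in data])
```

In this toy example the middle neuron is never active on the data, so removing it leaves every output unchanged.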

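Why go the long way round instead of training a small model directly?

Because small networks are hard to train well. Why?

One explanation is the Lottery Ticket Hypothesis.

Think of it as running more trials or drawing a larger sample: in a big enough pool, some candidates are bound to be good. A large model contains many small sub-networks, so training it is like buying many lottery tickets at once; only one of them needs to win.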
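The lottery-ticket experiment can be sketched as follows, assuming the weights are a flat list and `train` is a hypothetical function that updates only the unmasked weights. The distinctive step is rewinding: the surviving weights are reset to their original initial values rather than re-randomised. The hypothesis is that this sub-network, retrained from that initialization, can reach accuracy comparable to the full network, while the same sub-network with a fresh random initialization often cannot.

```python
from typing import Callable, List, Tuple

Mask = List[bool]


def find_ticket(
    init: List[float],
    train: Callable[[List[float], Mask], List[float]],
    prune_fraction: float,
) -> Tuple[List[float], Mask]:
    """One round of the lottery-ticket procedure:
    train the full network, prune the smallest trained weights, then rewind
    the surviving weights to their ORIGINAL initial values."""
    trained = train(init, [True] * len(init))
    n_prune = int(len(init) * prune_fraction)
    order = sorted(range(len(init)), key=lambda i: abs(trained[i]))
    mask = [True] * len(init)
    for i in order[:n_prune]:
        mask[i] = False
    # Rewinding (not re-randomising) is the key step of the hypothesis.
    ticket = [w if keep else 0.0 for w, keep in zip(init, mask)]
    return ticket, mask


if __name__ == "__main__":
    # Toy stand-in for training: doubles every weight that is not masked out.
    toy_train = lambda w, m: [wi * 2 if keep else 0.0 for wi, keep in zip(w, m)]
    ticket, mask = find_ticket([0.1, -0.5, 0.3, 0.05], toy_train, prune_fraction=0.5)
    print(ticket, mask)
    print(toy_train(ticket, mask))  # retrain only the sub-network
```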

With a larger learning rate, experiments may give results that differ from what the Lottery Ticket Hypothesis predicts.

Knowledge Distillation

The Student Net is trained to fit the outputs (class probabilities) of the Teacher Net.

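Temperature softmax

It uses the idea of smoothing: each logit \(y_i\) is divided by a temperature \(T\) before the softmax,

\[y_i' = \frac{\exp(y_i / T)}{\sum_j \exp(y_j / T)}\]

With \(T > 1\) the teacher's output becomes flatter, so the student learns how the teacher ranks the other classes instead of seeing an almost one-hot target.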
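A plain-Python sketch of temperature softmax and a distillation loss. Using the KL divergence between the softened teacher and student outputs, scaled by \(T^2\), follows a common formulation (Hinton et al.'s knowledge distillation) and is an assumption here; in practice it is usually combined with the ordinary cross-entropy on the true labels, which this sketch omits.

```python
import math
from typing import List


def softmax_t(logits: List[float], t: float = 1.0) -> List[float]:
    """Softmax with temperature t; a larger t gives a smoother, flatter distribution."""
    scaled = [z / t for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float], t: float = 4.0) -> float:
    """KL(teacher || student) between temperature-softened outputs, scaled by t**2."""
    p = softmax_t(teacher_logits, t)  # soft targets from the teacher
    q = softmax_t(student_logits, t)  # student's softened prediction
    return t * t * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = [5.0, 2.0, 0.5]
    print(softmax_t(teacher, t=1.0))  # dominated by the top class
    print(softmax_t(teacher, t=4.0))  # flatter: other classes get visible mass
    print(distillation_loss([3.0, 2.5, 0.0], teacher))
```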

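Parameter Quantization

1. Mixed precision: use fewer bits to represent each value
2. Weight clustering: group similar weights into clusters and store, for each weight, only its cluster index plus one shared value per cluster
3. Represent frequently occurring values with fewer bits (and rare ones with more)
  - e.g., Huffman encoding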
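The clustering and variable-length coding ideas can be combined in a small sketch. Assumptions: a simple 1-D k-means with evenly spaced initial centroids (one of several possible choices), each weight stored as its cluster index, and Huffman coding applied to those indices; all names are illustrative.

```python
import heapq
from collections import Counter
from typing import Dict, List, Tuple


def _assign(weights: List[float], centroids: List[float]) -> List[int]:
    """Index of the nearest centroid for every weight."""
    return [min(range(len(centroids)), key=lambda c: abs(w - centroids[c])) for w in weights]


def cluster_weights(weights: List[float], k: int, iters: int = 20) -> Tuple[List[float], List[int]]:
    """1-D k-means. Returns the k shared values (centroids) and one index per weight."""
    lo, hi = min(weights), max(weights)
    # Assumed initialisation: centroids spread evenly over the weight range.
    centroids = [lo + (hi - lo) * c / (k - 1) for c in range(k)] if k > 1 else [lo]
    for _ in range(iters):
        idx = _assign(weights, centroids)
        for c in range(k):
            members = [w for w, i in zip(weights, idx) if i == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = sum(members) / len(members)
    return centroids, _assign(weights, centroids)


def huffman_code_lengths(symbols: List[int]) -> Dict[int, int]:
    """Huffman code length (in bits) per symbol; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {s: 1 for s in freq}
    # Heap entries: (total frequency, tie-breaker, {symbol: depth so far}).
    heap = [(f, n, {s: 0}) for n, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


if __name__ == "__main__":
    ws = [-1.0, -0.98, 0.0, 0.01, -0.01, 0.02, 0.0, 1.0, 0.99, 0.01]
    centroids, idx = cluster_weights(ws, k=3)
    lengths = huffman_code_lengths(idx)
    print(centroids, idx)
    print("Huffman bits:", sum(lengths[i] for i in idx), "vs fixed 2-bit:", 2 * len(idx))
```

For these sample weights the cluster indices are `[0, 0, 1, 1, 1, 1, 1, 2, 2, 1]`; Huffman coding stores them in 14 bits instead of 20 bits at a fixed 2 bits per index, plus a table of three centroids.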

Architecture Design

Depthwise Separable Convolution

1. Depthwise Convolution

Number of filters = number of input channels.
Each filter only considers one channel.
The filters are \(k \times k\) matrices.
There is no interaction between channels.

2. Pointwise Convolution

Used specifically to mix information across channels.

Only \(1 \times 1\) filters are used.
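The two steps (depthwise, then pointwise) can be written out directly. This is a minimal plain-Python sketch, assuming a `[channel][row][col]` layout, square \(k \times k\) filters, stride 1 and no padding; the names are illustrative. In deep learning frameworks the same thing is usually expressed as a grouped convolution (groups equal to the number of input channels) followed by a \(1 \times 1\) convolution.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][col]


def depthwise_conv(x: Tensor3, filters: Tensor3) -> Tensor3:
    """Step 1: one k x k filter per input channel (stride 1, no padding).
    Each output channel depends on exactly one input channel."""
    k = len(filters[0])
    out = []
    for ch, f in zip(x, filters):
        rows, cols = len(ch), len(ch[0])
        out.append([
            [sum(f[a][b] * ch[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(cols - k + 1)]
            for i in range(rows - k + 1)
        ])
    return out


def pointwise_conv(y: Tensor3, weights: List[List[float]]) -> Tensor3:
    """Step 2: 1 x 1 convolution. weights[o][c] mixes input channel c into
    output channel o; this is the only place channels interact."""
    rows, cols = len(y[0]), len(y[0][0])
    return [
        [[sum(w_o[c] * y[c][i][j] for c in range(len(y))) for j in range(cols)]
         for i in range(rows)]
        for w_o in weights
    ]


if __name__ == "__main__":
    x = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]  # 2 channels, 3 x 3
    filters = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]  # one 2 x 2 filter per channel
    y = depthwise_conv(x, filters)  # 2 channels, 2 x 2
    z = pointwise_conv(y, [[1.0, 1.0], [1.0, -1.0], [0.5, 0.0]])  # 3 output channels
    print(z)
```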

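Change in the number of parameters (depthwise separable convolution relative to a standard convolution with \(k \times k\) filters, biases ignored):

\[\frac{k \times k \times I + I \times O}{k \times k \times I \times O}=\frac{1}{O}+\frac{1}{k \times k}\]

I: number of input channels

O: number of output channels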
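A quick numeric check of this parameter ratio, counting weights only (biases ignored). The sizes \(k = 3\), \(I = 64\), \(O = 128\) are arbitrary illustrative choices, and the function names are hypothetical.

```python
def standard_conv_params(k: int, i: int, o: int) -> int:
    """O filters, each of size k x k x I."""
    return k * k * i * o


def separable_conv_params(k: int, i: int, o: int) -> int:
    """Depthwise: I filters of k x k. Pointwise: O filters of 1 x 1 x I."""
    return k * k * i + i * o


if __name__ == "__main__":
    k, i, o = 3, 64, 128
    ratio = separable_conv_params(k, i, o) / standard_conv_params(k, i, o)
    print(standard_conv_params(k, i, o), separable_conv_params(k, i, o))
    print(ratio, 1 / o + 1 / (k * k))  # the two numbers agree
```

Here the separable version needs 8,768 weights instead of 73,728, about 11.9%, which matches \(1/128 + 1/9\); for large \(O\) the ratio approaches \(1/(k \times k)\).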

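Why it works

Low rank approximation: a layer whose weight matrix \(W\) is \(M \times N\) can be replaced by two layers \(U\) (\(M \times K\)) and \(V\) (\(K \times N\)) with \(W \approx UV\). This needs \(K(M+N)\) parameters instead of \(MN\), fewer whenever \(K < MN/(M+N)\), at the cost of restricting \(UV\) to rank at most \(K\). A depthwise separable convolution imposes a similar constraint on a standard convolution: each filter weight becomes the product of a per-channel \(k \times k\) depthwise filter and a per-channel \(1 \times 1\) mixing weight.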
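A small sketch of the low-rank idea: the product of an \(M \times K\) and a \(K \times N\) matrix is a full \(M \times N\) matrix, yet it is described by far fewer numbers when \(K\) is small. The sizes (3 × 1 × 2 for the product, 512 × 512 with \(K = 32\) for the count) are arbitrary illustrative choices, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain product of an (M x K) matrix and a (K x N) matrix."""
    return [[sum(a[r][t] * b[t][c] for t in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def full_params(m: int, n: int) -> int:
    """Parameters of an unconstrained M x N weight matrix."""
    return m * n


def low_rank_params(m: int, n: int, k: int) -> int:
    """Parameters of U (M x K) plus V (K x N)."""
    return m * k + k * n


if __name__ == "__main__":
    u = [[1.0], [2.0], [3.0]]  # M x K with M = 3, K = 1
    v = [[4.0, 5.0]]           # K x N with N = 2
    print(matmul(u, v))        # a 3 x 2 matrix of rank 1
    print(full_params(512, 512), low_rank_params(512, 512, 32))
```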

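Dynamic Computation

Allocate computation according to the available resources: the same network should run with more computation when resources are plentiful and with less when they are scarce.

Approaches:

1. Dynamic depth: attach an output (an extra classifier) to every intermediate layer and train them all together; at inference time, choose which layer's output to use.
  - e.g., Multi-Scale Dense Network (MSDNet)
2. Dynamic width: run the same network with only part of each layer's neurons.
3. Computation based on sample difficulty: easy inputs use less computation, hard inputs use more.
  - SkipNet: Learning Dynamic Routing in Convolutional Networks
  - Runtime Neural Pruning
  - BlockDrop: Dynamic Inference Paths in Residual Networks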
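A sketch combining dynamic depth with computation based on sample difficulty: an exit classifier after every layer, stop as soon as the prediction is confident enough, and cap the depth at a resource budget. The confidence-threshold rule, the toy layers and all names are assumptions for illustration; this is not the specific mechanism of MSDNet, SkipNet, Runtime Neural Pruning or BlockDrop.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def early_exit_predict(
    x: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    exits: Sequence[Callable[[Vector], Vector]],
    threshold: float,
    max_layers: int,
) -> Tuple[int, int]:
    """Run the layers one by one; after each, its exit classifier gives class
    probabilities. Stop once the top probability reaches `threshold` (an easy
    input) or the budget `max_layers` is used up. Returns (class, layers used)."""
    if max_layers < 1:
        raise ValueError("max_layers must be at least 1")
    h = x
    probs: Vector = []
    used = 0
    for layer, exit_head in zip(layers[:max_layers], exits[:max_layers]):
        h = layer(h)
        used += 1
        probs = exit_head(h)
        if max(probs) >= threshold:
            break
    return probs.index(max(probs)), used


if __name__ == "__main__":
    # Toy network: each "layer" adds 1; the exit's confidence grows with depth.
    layers = [lambda h: [h[0] + 1.0]] * 4
    exits = [lambda h: [min(1.0, h[0] / 4), 1.0 - min(1.0, h[0] / 4)]] * 4
    print(early_exit_predict([3.0], layers, exits, threshold=0.9, max_layers=4))  # easy input
    print(early_exit_predict([0.0], layers, exits, threshold=0.9, max_layers=4))  # hard input
    print(early_exit_predict([0.0], layers, exits, threshold=0.9, max_layers=2))  # tight budget
```

The easy input exits after one layer, the hard one uses all four, and a smaller budget forces an answer from an earlier exit.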

