Coursera Study Notes | Machine Learning by Stanford University – Andrew Ng

April 4, 2022

/ 20220404 Week 1 – 2 /

Chapter 1 – Introduction

1.1 Definition

Arthur Samuel
The field of study that gives computers the ability to learn without being explicitly programmed.
Tom Mitchell
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

1.2 Concepts

1.2.1 Classification of Machine Learning

Supervised Learning: given a labeled data set, so we already know what a correct output should look like
- Regression: continuous output
- Classification: discrete output
Unsupervised Learning: given an unlabeled data set, or a data set whose examples all carry the same label; we have to find structure in the data ourselves
- Clustering: group the data into different clusters
- Non-Clustering
Others: Reinforcement Learning, Recommender Systems…

1.2.2 Model Representation

Training Set

\[\begin{matrix}
x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n&&y^{(1)}\\
x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n&&y^{(2)}\\
\vdots&\vdots&\ddots&\vdots&&\vdots\\
x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n&&y^{(m)}
\end{matrix}\]
Notation
$m=$ the number of training examples (rows)
$n=$ the number of features (columns)
$x=$ input variable/feature
$y=$ output variable/target variable
$(x^{(i)}_j,y^{(i)})$: the value of the $j$-th feature in the $i$-th training example, together with that example's target, where $i=1,\dots,m$ and $j=1,\dots,n$
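
A minimal sketch of this notation in plain Python. The training set below is made up for illustration, and the helper name `feature` is hypothetical. Python lists are 0-indexed, so $x^{(i)}_j$ corresponds to `X[i - 1][j - 1]`.

```python
# Hypothetical training set: m = 3 examples, n = 2 features.
X = [
    [2104.0, 3.0],  # x^(1)
    [1600.0, 3.0],  # x^(2)
    [2400.0, 4.0],  # x^(3)
]
y = [400.0, 330.0, 369.0]  # y^(1), y^(2), y^(3)

m = len(X)     # number of training examples (rows)
n = len(X[0])  # number of features (columns)


def feature(X, i, j):
    """Return x^(i)_j using the 1-based indices of the notes."""
    return X[i - 1][j - 1]
```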

1.2.3 Cost Function

1.2.4 Gradient Descent

Chapter 2 – Linear Regression

\[\begin{matrix}
x_0&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n&&y^{(1)}\\
x_0&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n&&y^{(2)}\\
\vdots&\vdots&\vdots&\ddots&\vdots&&\vdots\\
x_0&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n&&y^{(m)}\\
\\
\theta_0&\theta_1&\theta_2&\cdots&\theta_n&&
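\end{matrix}\]
Here $x_0=1$ for every training example (the intercept term), and the last row lists the parameter $\theta_j$ that multiplies each column.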

2.1 Linear Regression with One Variable

Hypothesis Function

\[h_{\theta}(x)=\theta_0+\theta_1x
\]
Cost Function – Squared Error Cost Function

\[J(\theta_0,\theta_1)=\frac{1}{2m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2
\]

Goal

\[\min_{(\theta_0,\theta_1)}J(\theta_0,\theta_1)
\]
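
The hypothesis and squared error cost can be sketched directly from the two formulas. This is an illustrative sketch with made-up data; the function names `hypothesis` and `cost` are assumptions, not taken from the course material.

```python
def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) over m training examples."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Made-up data lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

perfect = cost(0.0, 1.0, xs, ys)  # 0.0: the line passes through every point
worse = cost(0.0, 0.5, xs, ys)    # 3.5 / 6, about 0.583
```

Minimizing $J$ means searching for the pair $(\theta_0,\theta_1)$ that drives this value as low as possible.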

2.2 Multivariate Linear Regression

Hypothesis Function

\[\theta=
\left[
\begin{matrix}
\theta_0\\
\theta_1\\
\vdots\\
\theta_n
\end{matrix}
\right],\
x=
\left[
\begin{matrix}
x_0\\
x_1\\
\vdots\\
x_n
\end{matrix}
\right]\]

\[\begin{aligned}h_\theta(x)&=\theta_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n\\
&=\theta^Tx
\end{aligned}\]
Cost Function

\[J(\theta)=\frac{1}{2m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2
\]
Goal

\[\min_{\theta}J(\theta)
\]
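
A small sketch of the multivariate hypothesis $h_\theta(x)=\theta^Tx$ and its cost, in plain Python. It assumes each input row is extended with $x_0=1$ so that $\theta_0$ acts as the intercept; the helper names and the data are made up for illustration.

```python
def add_intercept(X):
    """Prepend x_0 = 1 to every training example."""
    return [[1.0] + list(row) for row in X]


def h(theta, x):
    """h_theta(x) = theta^T x, with x already containing x_0 = 1."""
    return sum(t * xj for t, xj in zip(theta, x))


def cost(theta, X, y):
    """Squared error cost J(theta) over the m rows of X."""
    m = len(X)
    return sum((h(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


# Made-up data generated from y = 1 + 2*x1 + 3*x2.
X = add_intercept([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]])
y = [9.0, 5.0, 10.0]

at_truth = cost([1.0, 2.0, 3.0], X, y)  # 0.0 at the generating parameters
at_zero = cost([0.0, 0.0, 0.0], X, y)   # (81 + 25 + 100) / 6
```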

2.3 Algorithm Optimization

2.3.1 Gradient Descent

Algorithm
Repeat until convergence (simultaneous update for each $j=0,\dots,n$, with $x^{(i)}_0=1$)

\[\begin{aligned}
\theta_j
&:=\theta_j-\alpha{\partial\over\partial\theta_j}J(\theta)\\
&\;=\theta_j-\alpha{1\over{m}}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_j
\end{aligned}\]
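
The update rule can be sketched as batch gradient descent in plain Python. This is a minimal sketch, not the course's reference implementation: it assumes every row of `X` already starts with $x_0=1$, it runs a fixed number of iterations instead of testing for convergence, and the values of `alpha` and `iterations` are illustrative choices for this made-up data.

```python
def gradient_descent(X, y, alpha, iterations):
    """Batch gradient descent for linear regression.

    Each row of X is expected to start with x_0 = 1.
    """
    m = len(X)
    n_params = len(X[0])
    theta = [0.0] * n_params
    for _ in range(iterations):
        # Prediction error h_theta(x^(i)) - y^(i) for every example.
        errors = [
            sum(t * xj for t, xj in zip(theta, xi)) - yi
            for xi, yi in zip(X, y)
        ]
        # Partial derivative of J with respect to each theta_j.
        gradient = [
            sum(e * xi[j] for e, xi in zip(errors, X)) / m
            for j in range(n_params)
        ]
        # Simultaneous update: all theta_j use the same old theta.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Made-up data on the line y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = gradient_descent(X, y, alpha=0.1, iterations=2000)
# theta approaches [1.0, 2.0]
```

Building the full `gradient` list before touching `theta` is what makes the update simultaneous; updating $\theta_0$ first and then reusing it to compute the gradient for $\theta_1$ would be a different algorithm.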

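Feature Scaling
For each feature $x_j$, replace it with $$x_j:={{x_j-\mu_j}\over{s_j}}$$
where $\mu_j$ is the mean of feature $x_j$ over the $m$ training examples, and $s_j$ is either its range (maximum minus minimum) or its standard deviation.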
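
A sketch of this scaling step using only the Python standard library. The function name `scale_features` and the `use_std` flag are hypothetical. The sketch assumes `X` holds only the real features (no constant $x_0$ column) and that no feature is constant, since a constant column would give $s_j=0$.

```python
import statistics


def scale_features(X, use_std=False):
    """Mean-normalize each column of X.

    Returns (scaled, mus, ss) so the same mu_j and s_j can be reused
    on new inputs at prediction time.
    """
    columns = list(zip(*X))
    mus = [statistics.fmean(col) for col in columns]
    if use_std:
        ss = [statistics.pstdev(col) for col in columns]  # population std dev
    else:
        ss = [max(col) - min(col) for col in columns]     # range
    scaled = [
        [(v - mu) / s for v, mu, s in zip(row, mus, ss)]
        for row in X
    ]
    return scaled, mus, ss


# Made-up features on very different scales.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
scaled, mus, ss = scale_features(X)
```

After scaling, every feature has mean zero and a comparable spread, which typically lets gradient descent converge in fewer iterations.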
Learning Rate

2.3.2 Normal Equation(s)

Let

\[X=\left[
\begin{matrix}
x_0&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n\\
x_0&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
x_0&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n\\
\end{matrix}
\right],\
y=\left[
\begin{matrix}
y^{(1)}\\
y^{(2)}\\
\vdots\\
y^{(m)}\\
\end{matrix}
\right]\]

where $X$ is an $m\times(n+1)$ matrix and $y$ is an $m$-dimensional column vector. Then

\[\theta=(X^TX)^{-1}X^Ty
\]
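
This closed form comes from setting the gradient of the cost to zero. A short derivation sketch, writing the squared error cost in matrix form and assuming $X^TX$ is invertible:

\[\begin{aligned}
J(\theta)&=\frac{1}{2m}(X\theta-y)^T(X\theta-y)\\
\nabla_\theta J(\theta)&=\frac{1}{m}X^T(X\theta-y)=0\\
X^TX\theta&=X^Ty\\
\theta&=(X^TX)^{-1}X^Ty
\end{aligned}\]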

If $X^TX$ is noninvertible, the likely causes are:

Redundant features: two features are linearly dependent, so one of them should be removed;
Too many features, e.g. $m\leq n$: remove some features, or apply regularization.
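
A dependency-free sketch of the normal equation in plain Python. Rather than forming the inverse explicitly, it solves $X^TX\theta=X^Ty$ by Gauss-Jordan elimination, which is a substitution made here for simplicity. All helper names are hypothetical, and the `1e-12` pivot threshold is an arbitrary illustrative cutoff for detecting a noninvertible $X^TX$.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (noninvertible)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def normal_equation(X, y):
    """Return theta satisfying (X^T X) theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Made-up data on the line y = 1 + 2x; each row starts with x_0 = 1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = normal_equation(X, y)  # close to [1.0, 2.0]

# A redundant feature (third column = 2 * second column) makes X^T X singular.
X_redundant = [[1.0, 1.0, 2.0], [1.0, 2.0, 4.0], [1.0, 3.0, 6.0]]
```

Calling `normal_equation(X_redundant, [1.0, 2.0, 3.0])` raises `ValueError`, which illustrates the redundant-feature case listed above.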

2.4 Polynomial Regression

If a linear $h_\theta(x)$ can’t fit the data well, we can change the shape of its curve by making it a quadratic, cubic, or square root function (or any other form).
e.g.

$h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2,\ x_2=x_1^2$
$h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2+\theta_3x_1^3,\ x_2=x_1^2,\ x_3=x_1^3$
$h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2\sqrt{x_1},\ x_2=\sqrt{x_1}$
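
Because each new term is just an extra feature, polynomial regression reduces to ordinary linear regression on a transformed input. A small sketch of building those features in plain Python; the function names are hypothetical.

```python
import math


def polynomial_features(x1_values, degree):
    """Map each x1 to [x1, x1^2, ..., x1^degree]."""
    return [[x1 ** p for p in range(1, degree + 1)] for x1 in x1_values]


def sqrt_features(x1_values):
    """Map each x1 >= 0 to [x1, sqrt(x1)]."""
    return [[x1, math.sqrt(x1)] for x1 in x1_values]


cubic = polynomial_features([1.0, 2.0, 3.0], degree=3)
# [[1.0, 1.0, 1.0], [2.0, 4.0, 8.0], [3.0, 9.0, 27.0]]
root = sqrt_features([4.0, 9.0])
# [[4.0, 2.0], [9.0, 3.0]]
```

The powers of $x_1$ quickly end up on very different scales, so feature scaling matters even more here when gradient descent is used.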
