Statistical Learning Methods 6: Logistic Regression and the Maximum Entropy Model
- October 3, 2019
- Notes
1. Logistic regression model
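Logistic regression is a generalized linear model. Linear regression forms a linear combination wx+b of the input, where w and b are parameters learned from the data, and uses it directly as the prediction, y = wx+b. Logistic regression instead passes the linear term through a function g and reads the result as a probability, p = g(wx+b); p and 1-p are then the probabilities of the two classes. The function g is the logistic function, which gives the model its name.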
In this basic form, logistic regression is a binary classifier. Its generalization to more than two classes is softmax regression (also called multinomial logistic regression).
1.1 The logistic distribution
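Let X be a continuous random variable. X follows the logistic distribution if its distribution function is
\[ F(x) = P(X \leqslant x) = \frac{1}{1+\exp\left(-\frac{x-\mu}{\gamma}\right)} \]
where \(\mu\) is the location parameter and \(\gamma > 0\) is the shape (scale) parameter.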
In particular, when \(\mu = 0\) and \(\gamma = 1\), the logistic distribution function is the sigmoid function:
\[ F(x) = \frac{1}{1+e^{-x}} \]
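The two formulas above can be written as a minimal Python sketch using only the standard library. The function names `logistic_cdf` and `sigmoid` are illustrative and do not come from the book. The sigmoid uses a numerically stable form, so very negative inputs do not overflow `math.exp`.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function 1 / (1 + e^{-z}), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For z < 0, rewrite as e^z / (1 + e^z) so exp never sees a large positive argument.
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_cdf(x: float, mu: float = 0.0, gamma: float = 1.0) -> float:
    """F(x) = P(X <= x) for the logistic distribution with location mu and shape gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return sigmoid((x - mu) / gamma)


if __name__ == "__main__":
    print(logistic_cdf(0.0))                     # 0.5, since F(mu) = 1/2
    print(logistic_cdf(1.0, mu=1.0, gamma=3.0))  # also 0.5
    print(sigmoid(-1000.0), sigmoid(1000.0))     # 0.0 1.0, no OverflowError at the tails
```

Setting `mu=0, gamma=1` reduces `logistic_cdf` to `sigmoid`, which mirrors the special case stated above.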
1.2 Binomial logistic regression model
Despite its name, binomial logistic regression is a classification model: it uses the logistic function to turn a linear score into a class probability.
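Binomial logistic regression is represented by the conditional probability distribution \(P(Y|X)\). The random variable X takes real values, and the random variable Y takes the value 1 or 0. The model is the following pair of conditional probabilities:
\[ \begin{aligned} P(Y=1|x) &= \frac{1}{1+\exp(-(w\cdot x+b))} \\ P(Y=0|x) &= 1-P(Y=1|x) = \frac{1}{1+\exp(w\cdot x+b)} \end{aligned} \]
Here \(w\) is the weight vector, \(b\) is the bias, and \(w\cdot x\) is the inner product of \(w\) and \(x\).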
Given an input x, the model computes both conditional probabilities and assigns x to the class with the larger probability.
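The classification rule can be sketched in Python as follows. The parameter values are hypothetical and hand-picked, not learned. The tie-breaking choice (a tie goes to class 0) is also an assumption, because the text does not specify it.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Return (P(Y=1|x), P(Y=0|x)) for binomial logistic regression."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear score w.x + b
    p1 = sigmoid(z)
    return p1, 1.0 - p1


def predict(w, b, x):
    """Assign x to the class with the larger conditional probability (ties go to 0)."""
    p1, p0 = predict_proba(w, b, x)
    return 1 if p1 > p0 else 0


if __name__ == "__main__":
    w, b = [2.0, -1.0], 0.5              # hypothetical parameters
    print(predict(w, b, [1.0, 1.0]))     # score 1.5 -> class 1
    print(predict(w, b, [0.0, 3.0]))     # score -2.5 -> class 0
```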
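Why use the logistic function? Consider the odds of an event. If the event occurs with probability p, its odds are \(\frac{p}{1-p}\), the ratio of the probability that it occurs to the probability that it does not. The log-odds, or logit, is \(\log\frac{p}{1-p}\). For logistic regression, the log-odds is linear in x. This is why the sigmoid is the natural link between a linear score and a probability:
\[ \log\frac{p}{1-p} = \log\frac{P(Y=1|x)}{P(Y=0|x)} = w\cdot x+b \]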
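So in logistic regression, the log-odds of the output Y=1 is a linear function of the input x. Seen from the other direction, the linear score \(w\cdot x+b\) can take any real value, and the logistic function maps it into \((0, 1)\), so it can be read as a probability. The closer \(w\cdot x+b\) is to \(+\infty\), the closer \(P(Y=1|x)\) is to 1. The closer it is to \(-\infty\), the closer the probability is to 0.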
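The log-odds identity can be checked numerically with the Python sketch below. The parameter values are arbitrary. For each input, the script prints the linear score, the resulting probability, and the log-odds of that probability, which matches the score up to rounding.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_odds(p: float) -> float:
    """logit(p) = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    w, b = [0.8, -1.5], 0.2                            # arbitrary illustrative parameters
    for x in ([1.0, 0.0], [0.0, 2.0], [3.0, 1.0]):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear score, any real value
        p = sigmoid(z)                                 # P(Y=1|x), always in (0, 1)
        print(f"w.x+b = {z:+.3f}  P(Y=1|x) = {p:.3f}  log-odds = {log_odds(p):+.3f}")
```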
1.3 Estimating the model parameters
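The parameters of logistic regression can be estimated by maximum likelihood. Let \(P(Y=1|x) = \pi(x)\) and \(P(Y=0|x) = 1-\pi(x)\). Given N training samples \((x_i, y_i)\) with \(y_i \in \{0,1\}\), the likelihood function is
\[ \prod_{i=1}^N [\pi(x_i)]^{y_i}[1-\pi(x_i)]^{1-y_i} \]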
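The log-likelihood function is given below. For convenience, b is absorbed into w by appending a constant 1 to every \(x_i\), so the bias no longer appears explicitly.
\[ \begin{aligned} L(w) &= \sum_{i=1}^N\left[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))\right] \\ &= \sum_{i=1}^N\left[y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i))\right] \\ &= \sum_{i=1}^N\left[y_i(w\cdot x_i)-\log(1+\exp(w\cdot x_i))\right] \end{aligned} \]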
Maximizing \(L(w)\) gives the estimate of \(w\):
\[ \widehat w = \arg\max_w L(w) \]
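The learning problem thus becomes an optimization problem with the log-likelihood \(L(w)\) as its objective. Since \(L(w)\) is concave in \(w\), it is typically maximized with gradient methods or quasi-Newton methods.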
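The text mentions gradient methods and quasi-Newton methods. The sketch below uses plain batch gradient ascent on \(L(w)\), one of those options, chosen here for brevity. It absorbs the bias into w by appending a constant 1 to each input, as in the log-likelihood above. It uses the gradient \(\nabla L(w) = \sum_i (y_i - \pi(x_i))\,x_i\). The toy dataset, the learning rate and the iteration count are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def augment(X):
    """Append a constant 1 to each input so the bias b becomes the last weight."""
    return [list(x) + [1.0] for x in X]


def log_likelihood(w, Xa, Y):
    """L(w) = sum_i [y_i (w.x_i) - log(1 + exp(w.x_i))], with a stable log(1 + e^z)."""
    total = 0.0
    for x, y in zip(Xa, Y):
        z = dot(w, x)
        log1pexp = z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))
        total += y * z - log1pexp
    return total


def gradient(w, Xa, Y):
    """dL/dw = sum_i (y_i - pi(x_i)) x_i."""
    g = [0.0] * len(w)
    for x, y in zip(Xa, Y):
        err = y - sigmoid(dot(w, x))
        for j, xj in enumerate(x):
            g[j] += err * xj
    return g


def fit(X, Y, lr=0.1, n_iter=5000):
    """Batch gradient ascent on the log-likelihood, starting from w = 0."""
    Xa = augment(X)
    w = [0.0] * len(Xa[0])
    for _ in range(n_iter):
        g = gradient(w, Xa, Y)
        w = [wj + lr * gj for wj, gj in zip(w, g)]
    return w


if __name__ == "__main__":
    # Overlapping (non-separable) 1-D toy data, so the maximum-likelihood w is finite.
    X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
    Y = [0, 0, 1, 0, 1, 1]
    w = fit(X, Y)
    print("w =", w, "L(w) =", log_likelihood(w, augment(X), Y))
```

The data are chosen to overlap on purpose. On linearly separable data, \(L(w)\) has no finite maximizer, and plain gradient ascent would keep growing \(\|w\|\).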
2. Maximum entropy model
2.1 The maximum entropy principle
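The maximum entropy principle is a criterion for learning probability models. It states that, among all possible probability models (distributions), the one with the largest entropy is the best. The set of candidate models is usually defined by constraints, so the principle can also be stated this way: among the models that satisfy the constraints, choose the one with maximum entropy.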
Suppose a discrete random variable X has probability distribution \(P(X)\). Its entropy is
\[ H(P) = -\sum_x P(x)\log P(x) \]
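Entropy satisfies
\[ 0 \leqslant H(P) \leqslant \log|X| \]
where \(|X|\) is the number of values X can take. The right-hand equality holds if and only if X is uniformly distributed. With \(n\) equally likely values,
\[ H(P) = -\sum_{i=1}^n \frac{1}{n}\log\frac{1}{n} = \log n \]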
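In other words, a model chosen by the maximum entropy principle must first agree with the known facts, that is, the constraints. About anything the constraints leave open, it assumes nothing and treats the remaining possibilities as equally likely, which is exactly the situation in which entropy is largest.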
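The bound can be illustrated with a short Python sketch. It compares a uniform, a skewed and a deterministic distribution over four values, using natural logarithms, with the convention that terms with \(P(x)=0\) contribute nothing. The three distributions are arbitrary examples.

```python
import math


def entropy(p):
    """H(P) = -sum_x P(x) log P(x) in nats; terms with P(x) = 0 contribute 0."""
    return sum(-px * math.log(px) for px in p if px > 0)


if __name__ == "__main__":
    n = 4
    for name, dist in [("uniform", [1 / n] * n),
                       ("skewed", [0.7, 0.1, 0.1, 0.1]),
                       ("deterministic", [1.0, 0.0, 0.0, 0.0])]:
        print(f"{name:>13}: H = {entropy(dist):.4f}  (upper bound log n = {math.log(n):.4f})")
```

Only the uniform distribution reaches \(\log n\), and the deterministic one has entropy 0.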
2.2 The maximum entropy model
The maximum entropy principle is a general principle of statistical learning, and applying it to classification gives the maximum entropy model. The constraints that define the candidate models come from the training data.
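Suppose the classification model is a conditional probability distribution \(P(Y|X)\), where \(X\) denotes the input and \(Y\) the output. Given a training set, we can determine the empirical distributions of the joint distribution \(P(X,Y)\) and of the marginal distribution \(P(X)\), written \(\widetilde P(X,Y)\) and \(\widetilde P(X)\). Each is the relative frequency of the corresponding value in the training set.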
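A feature function \(f(x,y)\) describes some fact relating the input \(x\) and the output \(y\):
\[ f(x,y) = \begin{cases} 1, & x \text{ and } y \text{ satisfy a certain fact} \\ 0, & \text{otherwise} \end{cases} \]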
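If the model captures the information in the training data, then combining the empirical marginal of \(X\) with the model \(P(Y|X)\) should reproduce the empirical joint distribution of \((X,Y)\):
\[ \widetilde P(X)\, P(Y|X) = \widetilde P(X,Y) \]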
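Rather than imposing this for every pair \((x,y)\), we impose it in expectation, one feature function at a time. The expected value of the feature function \(f(x,y)\) with respect to the empirical distribution \(\widetilde P(X,Y)\) is
\[ E_{\widetilde P}(f) = \sum_{x,y}\widetilde P(x,y)f(x,y) \]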
The expected value of \(f(x,y)\) with respect to the model \(P(Y|X)\) and the empirical marginal \(\widetilde P(X)\) is
\[ E_P(f) = \sum_{x,y}\widetilde P(x)P(y|x)f(x,y) \]
We require the two expectations to be equal, which gives one constraint per feature function:
\[ E_{\widetilde P}(f) = E_P(f) \]
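The two expectations can be made concrete with a small Python sketch. It uses a hypothetical toy dataset with inputs `"a"`/`"b"` and labels 0/1, and a hypothetical feature function that fires when x = "a" and y = 1. The model is passed in as a callable `model(y, x)` that returns P(y|x). All names here are illustrative.

```python
from collections import Counter


def empirical(data):
    """Empirical joint P~(x, y) and marginal P~(x) from a list of (x, y) samples."""
    n = len(data)
    pxy = {xy: c / n for xy, c in Counter(data).items()}
    px = {x: c / n for x, c in Counter(x for x, _ in data).items()}
    return pxy, px


def expect_empirical(f, pxy):
    """E_{P~}(f) = sum_{x,y} P~(x, y) f(x, y)."""
    return sum(p * f(x, y) for (x, y), p in pxy.items())


def expect_model(f, px, model, labels):
    """E_P(f) = sum_{x,y} P~(x) P(y|x) f(x, y); model(y, x) returns P(y|x)."""
    return sum(px[x] * model(y, x) * f(x, y) for x in px for y in labels)


def f(x, y):
    """Hypothetical feature: 1 if x == "a" and y == 1, else 0."""
    return 1 if (x == "a" and y == 1) else 0


def uniform(y, x):
    """A model that ignores the data: P(y|x) = 1/2."""
    return 0.5


def fitted(y, x):
    """A model with P(1|a) = 2/3, matching the data for x = "a"."""
    if x == "a":
        return 2 / 3 if y == 1 else 1 / 3
    return 0.5


if __name__ == "__main__":
    data = [("a", 1), ("a", 1), ("a", 0), ("b", 0)]   # hypothetical training set
    pxy, px = empirical(data)
    print(expect_empirical(f, pxy))                    # 0.5
    print(expect_model(f, px, uniform, [0, 1]))        # 0.375: constraint violated
    print(expect_model(f, px, fitted, [0, 1]))         # about 0.5: constraint satisfied
```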
- Definition of the maximum entropy model
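Suppose there are \(n\) feature functions \(f_i(x,y),\ i=1,2,\cdots,n\), and let the set of models that satisfy all the constraints be
\[ C \equiv \{P\in\mathbb{P} \mid E_P(f_i) = E_{\widetilde P}(f_i),\; i=1,2,\cdots,n\} \]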
The conditional entropy defined on the conditional probability distribution \(P(Y|X)\) is
\[ H(P) = -\sum_{x,y}\widetilde P(x)P(y|x)\log P(y|x) \]
The model in \(C\) with the largest conditional entropy is called the maximum entropy model \(P^*\):
\[ P^* = \arg\max_{P\in C} H(P) \]
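The sketch below gives a brute-force illustration of \(P^*\) under a hypothetical toy setup. The empirical marginal is \(\widetilde P(a)=0.75\), \(\widetilde P(b)=0.25\). A single feature (fires when x = "a" and y = 1, with empirical expectation 0.5) pins \(P(y=1|a)\) to 2/3, because \(E_P(f) = 0.75\,P(y=1|a)\) must equal 0.5. The constraint says nothing about \(P(y|b)\), so the sketch scans candidate values of \(P(y=1|b)\) and keeps the one with the largest conditional entropy. The grid search is for illustration only; it is not how maximum entropy models are actually fitted.

```python
import math


def cond_entropy(px, model):
    """H(P) = -sum_{x,y} P~(x) P(y|x) log P(y|x); model[x][y] = P(y|x)."""
    h = 0.0
    for x, p_x in px.items():
        for p in model[x].values():
            if p > 0:
                h -= p_x * p * math.log(p)
    return h


def best_unconstrained_q(px, steps=100):
    """Scan q = P(y=1 | x="b") on a grid; P(y=1 | x="a") is fixed at 2/3 by the constraint."""
    best_q, best_h = None, -math.inf
    for k in range(steps + 1):
        q = k / steps
        model = {"a": {1: 2 / 3, 0: 1 / 3}, "b": {1: q, 0: 1 - q}}
        h = cond_entropy(px, model)
        if h > best_h:
            best_q, best_h = q, h
    return best_q, best_h


if __name__ == "__main__":
    px = {"a": 0.75, "b": 0.25}   # hypothetical empirical marginal
    q, h = best_unconstrained_q(px)
    print(q, h)   # q = 0.5: with no constraint on x = "b", probability is spread evenly
```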
2.3 Learning the maximum entropy model
Learning the maximum entropy model means solving for it, and the learning problem can be formalized as a constrained optimization problem.
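Given a training set and feature functions \(f_i(x,y),\ i=1,2,\cdots,n\), learning the maximum entropy model is equivalent to the constrained optimization problem
\[ \begin{aligned} \max_{P\in C}\quad & H(P) = -\sum_{x,y}\widetilde P(x)P(y|x)\log P(y|x) \\ \text{s.t.}\quad & E_P(f_i) = E_{\widetilde P}(f_i),\quad i=1,2,\cdots,n \\ & \sum_y P(y|x) = 1 \end{aligned} \]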
Following the usual convention in optimization, we rewrite the maximization as the equivalent minimization problem:
\[ \begin{aligned} \min_{P\in C}\quad & -H(P) = \sum_{x,y}\widetilde P(x)P(y|x)\log P(y|x) \\ \text{s.t.}\quad & E_P(f_i) = E_{\widetilde P}(f_i),\quad i=1,2,\cdots,n \\ & \sum_y P(y|x) = 1 \end{aligned} \]
Using Lagrange duality, we convert the constrained primal problem into an unconstrained dual problem and solve the original problem through the dual. Introduce Lagrange multipliers \(w_0, w_1, \cdots, w_n\) and define the Lagrangian \(L(P,w)\):
\[ \begin{aligned} L(P,w) &= -H(P) + w_0\left(1-\sum_y P(y|x)\right) + \sum_{i=1}^n w_i\left(E_{\widetilde P}(f_i) - E_P(f_i)\right) \\ &= \sum_{x,y}\widetilde P(x)P(y|x)\log P(y|x) + w_0\left(1-\sum_y P(y|x)\right) \\ &\quad + \sum_{i=1}^n w_i\left(\sum_{x,y}\widetilde P(x,y)f_i(x,y) - \sum_{x,y}\widetilde P(x)P(y|x)f_i(x,y)\right) \end{aligned} \]
The Lagrangian is a function of both the model \(P\) and the multipliers \(w\), and the constraints no longer appear explicitly. The idea is to first maximize over \(w\) for a fixed \(P\), and then minimize the result over \(P\).
The primal problem is therefore to minimize \(\max_w L(P,w)\) over \(P\):
\[ \min_{P\in C}\max_w L(P,w) \]
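Why does this recover the original problem? Fix \(P\) and maximize over \(w\). Suppose \(P\) violates some constraint. Then the corresponding term \(E_{\widetilde P}(f_i) - E_P(f_i)\) (or \(1-\sum_y P(y|x)\)) is nonzero, and choosing \(w_i\) (or \(w_0\)) large in magnitude with the right sign drives \(L(P,w)\) to \(+\infty\). If instead \(P\) satisfies every constraint, all the multiplier terms are 0 and \(\max_w L(P,w) = -H(P)\). Minimizing \(\max_w L(P,w)\) over \(P\) therefore rules out infeasible models automatically and minimizes \(-H(P)\) over the feasible ones, which is exactly the constrained problem above. For details, see Convex Optimization Overview (cnt’d).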
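Solving the primal problem directly is inconvenient, because the inner maximization over \(w\) has to be done for every candidate \(P\). We therefore swap the order of min and max and solve the dual problem instead. \(L(P,w)\) is convex in \(P\): the negative entropy is convex, and the constraints are linear in \(P\). As a result, the solutions of the primal and dual problems coincide, and the conditions under which this holds are the KKT conditions.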
The dual problem is
\[ \max_w\min_{P\in C} L(P,w) \]
First solve the inner minimization over \(P\). Its value depends only on \(w\), so it defines a function of \(w\), called the dual function:
\[ \Psi(w) = \min_{P\in C} L(P,w) = L(P_w,w) \]
The minimizer, which also depends on \(w\), is written as
\[ P_w = \arg\min_{P\in C} L(P,w) = P_w(y|x) \]
That is, for each fixed \(w\), \(P_w\) is the model that minimizes the Lagrangian.
The solution proceeds in two steps.
- Step 1: minimize \(L(P,w)\) over \(P\)
Take the partial derivative of \(L(P,w)\) with respect to \(P(y|x)\):
\[ \begin{aligned} \frac{\partial L(P,w)}{\partial P(y|x)} &= \sum_{x,y}\widetilde P(x)\left(\log P(y|x)+1\right) - \sum_y w_0 - \sum_{x,y}\left(\widetilde P(x)\sum_{i=1}^n w_i f_i(x,y)\right) \\ &= \sum_{x,y}\widetilde P(x)\left(\log P(y|x)+1\right) - \sum_{x,y}\widetilde P(x)\,w_0 - \sum_{x,y}\widetilde P(x)\sum_{i=1}^n w_i f_i(x,y) \\ &= \sum_{x,y}\widetilde P(x)\left(\log P(y|x)+1-w_0-\sum_{i=1}^n w_i f_i(x,y)\right) \end{aligned} \]
The second step uses \(\sum_x \widetilde P(x) = 1\). Setting the derivative to 0 and using \(\widetilde P(x) > 0\) gives
\[ P(y|x) = \exp\left(\sum_{i=1}^n w_i f_i(x,y) + w_0 - 1\right) = \frac{\exp\left(\sum_{i=1}^n w_i f_i(x,y)\right)}{\exp(1-w_0)} \]
This expression still contains the unknown multiplier \(w_0\). The normalization constraint fixes it, because the constraint must hold for every \(x\):
\[ \sum_y P(y|x) = 1 \]
Summing the numerator over \(y\) gives the normalization factor
\[ Z_w(x) = \sum_y \exp\left(\sum_{i=1}^n w_i f_i(x,y)\right) \]
PS: the normalization constraint forces \(\exp(1-w_0)\) to equal exactly this normalization factor \(Z_w(x)\).
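Hence the model that minimizes the Lagrangian is
\[ P_w(y|x) = \frac{1}{Z_w(x)}\exp\left(\sum_{i=1}^n w_i f_i(x,y)\right) \]
This \(P_w(y|x)\) is the maximum entropy model, where \(w\) is the parameter vector that remains to be determined.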
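\(P_w(y|x)\) for a finite label set can be computed with the minimal Python sketch below. It assumes the feature functions are plain Python callables `f(x, y)`. The helper name `maxent_prob` and the feature `f1` are hypothetical. Subtracting the largest score before exponentiating leaves the ratio unchanged but avoids overflow.

```python
import math


def maxent_prob(w, features, x, labels):
    """Return {y: P_w(y|x)} with P_w(y|x) = exp(sum_i w_i f_i(x, y)) / Z_w(x)."""
    scores = {y: sum(wi * f(x, y) for wi, f in zip(w, features)) for y in labels}
    m = max(scores.values())                      # shift for numerical stability
    unnorm = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(unnorm.values())                      # Z_w(x) * exp(-m); the shift cancels below
    return {y: u / z for y, u in unnorm.items()}


def f1(x, y):
    """Hypothetical feature: fires when x == "a" and y == 1."""
    return 1.0 if (x == "a" and y == 1) else 0.0


if __name__ == "__main__":
    w = [math.log(2.0)]                           # chosen so that P_w(1|a) = 2 / (2 + 1)
    print(maxent_prob(w, [f1], "a", [0, 1]))      # approximately {0: 0.333, 1: 0.667}
    print(maxent_prob(w, [f1], "b", [0, 1]))      # {0: 0.5, 1: 0.5}: the feature never fires
```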
- Step 2: maximize the dual function \(\Psi(w)\)
Substituting \(P_w\) back into the Lagrangian gives \(\Psi(w) = L(P_w,w)\). The outer problem is to maximize \(\Psi(w)\) over \(w\):
\[ \max_w \Psi(w) \]
Its solution is
\[ w^* = \arg\max_w \Psi(w) \]
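This again amounts to setting the partial derivatives of \(\Psi(w)\) with respect to \(w\) to 0. In general there is no closed-form solution, so the maximization is carried out iteratively.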
Substituting \(w^*\) into \(P_w(y|x)\) gives the learned maximum entropy model \(P^* = P_{w^*}(y|x)\).
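Li Hang's book solves this step with improved iterative scaling or quasi-Newton methods. As a simpler stand-in, the Python sketch below runs plain gradient ascent on \(\Psi(w)\). It relies on the closed form of \(\Psi(w)\) derived in the maximum likelihood section, whose gradient is \(\partial\Psi/\partial w_i = E_{\widetilde P}(f_i) - E_{P_w}(f_i)\). At convergence this gradient is zero, so the constraints \(E_P(f_i) = E_{\widetilde P}(f_i)\) hold. The data, the feature `f1`, the learning rate and the iteration count are all hypothetical.

```python
import math
from collections import Counter


def fit_maxent(data, features, labels, lr=1.0, n_iter=500):
    """Gradient ascent on the dual Psi(w); dPsi/dw_i = E_{P~}(f_i) - E_{P_w}(f_i)."""
    n = len(data)
    pxy = {xy: c / n for xy, c in Counter(data).items()}
    px = {x: c / n for x, c in Counter(x for x, _ in data).items()}
    emp = [sum(p * f(x, y) for (x, y), p in pxy.items()) for f in features]
    w = [0.0] * len(features)
    for _ in range(n_iter):
        model = [0.0] * len(features)             # E_{P_w}(f_i), accumulated below
        for x, p_x in px.items():
            scores = {y: sum(wi * f(x, y) for wi, f in zip(w, features)) for y in labels}
            z = sum(math.exp(s) for s in scores.values())   # Z_w(x)
            for y, s in scores.items():
                p = math.exp(s) / z                          # P_w(y|x)
                for i, f in enumerate(features):
                    model[i] += p_x * p * f(x, y)
        w = [wi + lr * (e - m) for wi, e, m in zip(w, emp, model)]
    return w


def f1(x, y):
    """Hypothetical feature: fires when x == "a" and y == 1."""
    return 1.0 if (x == "a" and y == 1) else 0.0


if __name__ == "__main__":
    data = [("a", 1), ("a", 1), ("a", 0), ("b", 0)]  # hypothetical toy data
    w = fit_maxent(data, [f1], [0, 1])
    print(w, math.log(2.0))   # w approaches log 2, i.e. P_w(1|a) = 2/3
```

On this data the fixed point satisfies \(0.75\,P_w(1|a) = 0.5\), which gives \(w = \log 2\). Features whose empirical expectation is 0 would push their weight toward \(-\infty\), so the toy data avoid them.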
3. Maximum likelihood estimation
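Next we show that maximizing the dual function is equivalent to maximum likelihood estimation of the maximum entropy model. Given the empirical distribution \(\widetilde P(X,Y)\) of the training data, the log-likelihood of the conditional distribution \(P(Y|X)\) is
\[ L_{\widetilde P}(P) = \log\prod_{x,y}P(y|x)^{\widetilde P(x,y)} = \sum_{x,y}\widetilde P(x,y)\log P(y|x) \]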
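Substituting \(P_w(y|x)\), so that \(\log P_w(y|x) = \sum_{i=1}^n w_i f_i(x,y) - \log Z_w(x)\), gives
\[ \begin{aligned} L_{\widetilde P}(P_w) &= \sum_{x,y}\widetilde P(x,y)\sum_{i=1}^n w_i f_i(x,y) - \sum_{x,y}\widetilde P(x,y)\log Z_w(x) \\ &= \sum_{x,y}\widetilde P(x,y)\sum_{i=1}^n w_i f_i(x,y) - \sum_x\widetilde P(x)\log Z_w(x) \end{aligned} \]
where the second line uses \(\sum_y \widetilde P(x,y) = \widetilde P(x)\).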
Now expand the dual function \(\Psi(w) = L(P_w,w)\):
\[ \begin{aligned} \Psi(w) &= \sum_{x,y}\widetilde P(x)P_w(y|x)\log P_w(y|x) + \sum_{i=1}^n w_i\left(\sum_{x,y}\widetilde P(x,y)f_i(x,y) - \sum_{x,y}\widetilde P(x)P_w(y|x)f_i(x,y)\right) \\ &= \sum_{x,y}\widetilde P(x,y)\sum_{i=1}^n w_i f_i(x,y) + \sum_{x,y}\widetilde P(x)P_w(y|x)\left(\log P_w(y|x) - \sum_{i=1}^n w_i f_i(x,y)\right) \\ &= \sum_{x,y}\widetilde P(x,y)\sum_{i=1}^n w_i f_i(x,y) - \sum_{x,y}\widetilde P(x)P_w(y|x)\log Z_w(x) \\ &= \sum_{x,y}\widetilde P(x,y)\sum_{i=1}^n w_i f_i(x,y) - \sum_x\widetilde P(x)\log Z_w(x) \end{aligned} \]
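The \(w_0\) term is absent from the first line because \(P_w\) is normalized. The third line uses \(\log P_w(y|x) - \sum_{i=1}^n w_i f_i(x,y) = -\log Z_w(x)\), and the last step uses \(\sum_y P_w(y|x) = 1\).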
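Comparing the two results shows that \(\Psi(w) = L_{\widetilde P}(P_w)\): the dual function equals the log-likelihood of the model \(P_w\). Maximizing the dual function is therefore equivalent to maximum likelihood estimation of the maximum entropy model.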
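The identity \(\Psi(w) = L_{\widetilde P}(P_w)\) can be checked numerically with the Python sketch below. It evaluates \(\Psi(w)\) directly from the Lagrangian definition and the log-likelihood directly from its definition, for a few arbitrary weight vectors. The two features and the toy empirical distribution are hypothetical.

```python
import math


def f1(x, y):
    """Hypothetical feature: fires when x == "a" and y == 1."""
    return 1.0 if (x == "a" and y == 1) else 0.0


def f2(x, y):
    """Second hypothetical feature: fires whenever y == 1."""
    return 1.0 if y == 1 else 0.0


def p_w(w, features, x, labels):
    """P_w(y|x) for every y, normalized by Z_w(x)."""
    scores = {y: sum(wi * f(x, y) for wi, f in zip(w, features)) for y in labels}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}


def dual_psi(w, features, pxy, px, labels):
    """Psi(w) = -H(P_w) + sum_i w_i (E_{P~}(f_i) - E_{P_w}(f_i)); the w_0 term is 0."""
    neg_entropy = 0.0
    model_exp = [0.0] * len(features)
    for x, p_x in px.items():
        for y, p in p_w(w, features, x, labels).items():
            neg_entropy += p_x * p * math.log(p)
            for i, f in enumerate(features):
                model_exp[i] += p_x * p * f(x, y)
    emp_exp = [sum(p * f(x, y) for (x, y), p in pxy.items()) for f in features]
    return neg_entropy + sum(wi * (e - m) for wi, e, m in zip(w, emp_exp, model_exp))


def log_likelihood(w, features, pxy, labels):
    """L_{P~}(P_w) = sum_{x,y} P~(x, y) log P_w(y|x)."""
    return sum(p * math.log(p_w(w, features, x, labels)[y]) for (x, y), p in pxy.items())


if __name__ == "__main__":
    # Empirical distributions of the hypothetical data [("a",1), ("a",1), ("a",0), ("b",0)].
    pxy = {("a", 1): 0.5, ("a", 0): 0.25, ("b", 0): 0.25}
    px = {"a": 0.75, "b": 0.25}
    for w in ([0.0, 0.0], [0.3, -1.2], [2.0, 0.5]):
        print(dual_psi(w, [f1, f2], pxy, px, [0, 1]), log_likelihood(w, [f1, f2], pxy, [0, 1]))
```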
4. Deriving logistic regression from the maximum entropy model
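Logistic regression is in fact a special case of the maximum entropy model. With a suitable choice of feature function, the maximum entropy model for binary classification reduces exactly to logistic regression.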
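Consider binary classification, where \(y\) takes only the two values 0 and 1, i.e. \(y\in\{0,1\}\). Define the feature function of \(x\) and \(y\) as
\[ f(x,y) = \begin{cases} g(x), & y=1 \\ 0, & y=0 \end{cases} \]
where \(g(x)\) is a vector of features of the input.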
Substitute this feature function into the maximum entropy model \(P_w(y|x)\), with \(w\cdot f(x,y)\) playing the role of \(\sum_i w_i f_i(x,y)\). This gives
\[ \begin{aligned} P(y=1|x) &= \frac{\exp(w\cdot f(x,1))}{\exp(w\cdot f(x,0))+\exp(w\cdot f(x,1))} \\ &= \frac{\exp(w\cdot g(x))}{\exp(0)+\exp(w\cdot g(x))} \\ &= \frac{1}{\exp(-w\cdot g(x))+1} \end{aligned} \]
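This is the sigmoid function of \(w\cdot g(x)\). Taking \(g(x) = x\), with a constant component appended so that one weight acts as the bias, recovers exactly the binomial logistic regression model.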
Similarly, \(P(y=0|x) = \frac{1}{1+\exp(w\cdot g(x))}\).
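Logistic regression is therefore the maximum entropy model for binary classification with this particular feature function.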
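The reduction can be checked numerically with the Python sketch below. It assumes \(g(x)\) is the input with a constant 1 appended, so that the last weight plays the role of b. The feature vector is \(g(x)\) for y = 1 and the zero vector for y = 0. The weights and inputs are arbitrary illustrative values.

```python
import math


def sigmoid(z):
    """Numerically stable 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def g(x):
    """Hypothetical choice g(x) = (x_1, ..., x_d, 1); the trailing 1 acts as the bias input."""
    return list(x) + [1.0]


def f(x, y):
    """Feature vector f(x, y): g(x) when y = 1, the zero vector when y = 0."""
    return g(x) if y == 1 else [0.0] * (len(x) + 1)


def maxent_p1(w, x):
    """P(y=1|x) from the maximum entropy model with the feature vector above."""
    s = {y: sum(wi * fi for wi, fi in zip(w, f(x, y))) for y in (0, 1)}
    return math.exp(s[1]) / (math.exp(s[0]) + math.exp(s[1]))


if __name__ == "__main__":
    w = [1.5, -0.7, 0.3]   # arbitrary weights; the last entry acts as b
    for x in ([0.0, 0.0], [1.0, 2.0], [-2.0, 1.0]):
        z = sum(wi * gi for wi, gi in zip(w, g(x)))
        print(maxent_p1(w, x), sigmoid(z))   # the two columns agree up to rounding
```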
5. Summary
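Logistic regression models the log-odds of the positive class as a linear function of the input. It is trained by maximizing the log-likelihood, typically with gradient methods or quasi-Newton methods.
The maximum entropy model applies the maximum entropy principle to classification. Among all conditional distributions whose feature expectations match the empirical ones, it chooses the one with the largest conditional entropy.
Learning the maximum entropy model is a constrained optimization problem. Through Lagrange duality (the primal and dual solutions coincide because the objective is convex, with the KKT conditions as the criterion), it reduces to maximizing the dual function \(\Psi(w)\), and this is the same as maximum likelihood estimation of \(P_w(y|x)\).
Both models have the same exponential form, and logistic regression is the special case of the maximum entropy model obtained for binary labels with the feature function \(f(x,1) = g(x)\), \(f(x,0) = 0\).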
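6. References
Li Hang, 统计学习方法 (Statistical Learning Methods)
Convex Optimization Overview (cnt’d)
Christopher M. Bishop, Pattern Recognition and Machine Learning (PRML)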