Statistical Learning Methods 6: logistic regression and the maximum entropy model

  • October 3, 2019
  • Notes

Logistic regression and the maximum entropy model

1. Logistic regression model

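Logistic regression, like linear regression, is a generalized linear model. Both start from the linear form wx+b, where w and b are the parameters to be learned. Linear regression uses wx+b directly as the output, y = wx+b, whereas logistic regression passes it through a function g to obtain a hidden state p = g(wx+b) and then decides the value of the dependent variable by comparing p with 1-p. When g is the logistic function, the model is logistic regression.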

??logistic????????????????????????????????????????????????softmax?????????????????????logistic???

1.1 Logistic distribution

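Let X be a continuous random variable. X follows the logistic distribution if it has the following distribution function:
\[ F(x) = P(X \leqslant x) = \frac{1}{1+\exp\left(\frac{-(x-\mu)}{\gamma}\right)} \]
where \(\mu\) is the location parameter and \(\gamma > 0\) is the shape parameter.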

In particular, when \(\mu = 0, \gamma = 1\), the logistic distribution function is the sigmoid function:
\[ F(x) = \frac{1}{1+e^{-x}} \]
[Figure: plot of the sigmoid function]
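The distribution function F(x) above is easy to evaluate directly. The following is a minimal sketch using only the standard library; the function names `logistic_cdf` and `sigmoid` are illustrative choices, and the sign-based branch is an added numerical safeguard rather than part of the definition.

```python
import math

def logistic_cdf(x, mu=0.0, gamma=1.0):
    """F(x) = 1 / (1 + exp(-(x - mu) / gamma)), with gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    z = (x - mu) / gamma
    # Branch on the sign of z so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid(x):
    """Special case mu = 0, gamma = 1."""
    return logistic_cdf(x, 0.0, 1.0)

print(sigmoid(0.0))  # 0.5
```

The curve is symmetric about the point (mu, 1/2), so sigmoid(x) + sigmoid(-x) = 1 for every x.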

1.2 Binomial logistic regression model

????logistic?????????????????????????????????logistic???????????????logistic?????????????????????????????????????????????????????????????

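The binomial logistic regression model is represented by the conditional probability distribution \(P(Y|X)\), where the random variable X takes real values and the random variable Y takes the value 1 or 0. The binomial logistic regression model is the following conditional probability distribution:
\[ \begin{aligned} P(Y=1|x) =& \frac{1}{1+\exp(-(w\cdot x+b))} \\ P(Y=0|x) =& 1-P(Y=1|x) \\ =& \frac{1}{1+\exp(w\cdot x+b)} \end{aligned} \]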
?????????????????????logistic???????????????x???????????

??????logistic?????????????????????????????????logistic???????????????????logistic???????????(frac {p}{1-p})????????????????????????????logistic????????????????????sigmoid?
\[ \log\frac{p}{1-p} = \log\frac{P(Y=1|x)}{P(Y=0|x)} = w\cdot x+b \]
?????????logistic???????????????????????????????????(wcdot x +b)?????????????logistic????????(0sim 1)???????????????????????
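The log-odds identity above can be checked numerically. This is a small sketch under assumed values: the weights `w`, input `x`, and bias `b` are hypothetical numbers chosen only for illustration.

```python
import math

def predict_proba(w, x, b):
    """P(Y=1|x) = 1 / (1 + exp(-(w . x + b)))."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """log(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Hypothetical parameters and input.
w, x, b = [0.5, -1.0], [2.0, 0.25], 0.1
linear = sum(wi * xi for wi, xi in zip(w, x)) + b
p = predict_proba(w, x, b)

# The log-odds of the predicted probability recovers the linear score w . x + b.
print(linear, log_odds(p))
```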

1.3 Model parameter estimation

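The parameters of the logistic regression model can be estimated by maximum likelihood. Writing \(\pi(x) = P(Y=1|x)\), the likelihood function for N training samples is: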
\[ \prod_{i=1}^N [\pi(x_i)]^{y_i}[1-\pi(x_i)]^{1-y_i} \]
The log-likelihood function is:
\[ \begin{aligned} L(w) =& \sum_{i=1}^N \left[ y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i)) \right] \\ =& \sum_{i=1}^N \left[ y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i)) \right] \\ =& \sum_{i=1}^N \left[ y_i(w\cdot x_i)-\log(1+\exp(w\cdot x_i)) \right] \end{aligned} \]
Solving
\[ \widehat w = \arg\underset{w}{\max}\; L(w) \]
gives the estimate of \(w\).

??(L(w))????????????????????????????????????????????????????????
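As an illustration of maximizing L(w), here is a minimal sketch that uses plain gradient ascent; this is one possible optimizer, not necessarily the one the notes have in mind. The toy data, the learning rate, and the step count are assumptions, and the bias is folded into w by giving every input a leading 1.

```python
import math

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def log_likelihood(w, X, y):
    """L(w) = sum_i [ y_i (w . x_i) - log(1 + exp(w . x_i)) ]."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = dot(w, xi)
        # Stable log(1 + exp(z)) = max(z, 0) + log1p(exp(-|z|)).
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

def gradient(w, X, y):
    """dL/dw = sum_i (y_i - pi(x_i)) x_i, with pi(x) = sigmoid(w . x)."""
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-dot(w, xi)))
        for j, xij in enumerate(xi):
            grad[j] += (yi - p) * xij
    return grad

def fit(X, y, lr=0.1, steps=500):
    """Gradient ascent on L(w), starting from w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = gradient(w, X, y)
        w = [wj + lr * gj for wj, gj in zip(w, g)]
    return w

# Hypothetical, non-separable toy data; the first column is the constant 1 (bias).
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 1.0], [1.0, 2.5], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]

w_hat = fit(X, y)
print(log_likelihood([0.0, 0.0], X, y), log_likelihood(w_hat, X, y))
```

Because L(w) is concave, a sufficiently small step size makes each iteration increase the log-likelihood; the data is deliberately non-separable so that a finite maximizer exists.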

2. Maximum entropy model

2.1 Maximum entropy principle

??????????????????????????????????????????????????????????????????????????????????????“????”?????????????????????????????????????????????????????????????????????

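Suppose the discrete random variable X has probability distribution \(P(X)\). Its entropy is:
\[ H(P) = -\sum_x P(x)\log P(x) \]
The entropy satisfies the inequality:
\[ 0 \leqslant H(P) \leqslant \log |num(X)| \]
where \(|num(X)|\) is the number of values X can take. The upper bound is attained when X is uniformly distributed over its n values:
\[ H(P) = -\sum_{i=1}^{n}\frac{1}{n}\log\frac{1}{n} = \log n \]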
????????????????????????????????????????????????????????????????
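The entropy bounds above can be checked on a few small distributions. This is a minimal sketch; the three example distributions are made up for illustration, and the convention 0 log 0 = 0 is applied by skipping zero probabilities.

```python
import math

def entropy(p):
    """H(P) = -sum_x P(x) log P(x), using the convention 0 * log 0 = 0."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # attains the upper bound log 4
skewed = [0.7, 0.1, 0.1, 0.1]        # strictly between 0 and log 4
point = [1.0, 0.0, 0.0, 0.0]         # attains the lower bound 0

for name, p in [("uniform", uniform), ("skewed", skewed), ("point", point)]:
    print(name, entropy(p), "upper bound:", math.log(len(p)))
```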

2.2 Maximum entropy model

???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

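Suppose the classification model is a conditional probability distribution \(P(Y|X)\), where \(X\) is the input and \(Y\) is the output. Given a training data set, we can determine the empirical distributions of the joint distribution \(P(X,Y)\) and of the marginal distribution \(P(X)\), written \(\widetilde P(X,Y)\) and \(\widetilde P(X)\).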

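A feature function describes a fact relating the input \(x\) and the output \(y\). It is defined as:
\[ f(x,y) = \begin{cases} 1, & x \text{ and } y \text{ satisfy the fact} \\ 0, & \text{otherwise} \end{cases} \]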
If the model \(P(Y|X)\) captures the information in the training data about the joint distribution of \((X,Y)\) and the marginal distribution of \(X\), it should satisfy:
\[ \widetilde P(X)\cdot P(Y|X) = \widetilde P(X,Y) \]
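The model \(P(Y|X)\) is unknown, so this requirement is expressed through expectations of the feature function. The expectation of \(f(x,y)\) with respect to the empirical distribution \(\widetilde P(X,Y)\) is:
\[ E_{\widetilde P}(f) = \sum_{x,y}\widetilde P(x,y) f(x,y) \]
Its expectation with respect to the model \(P(Y|X)\) and the empirical distribution \(\widetilde P(X)\) is:
\[ E_P(f) = \sum_{x,y}\widetilde P(x) P(y|x) f(x,y) \]
The constraint is that the two expectations are equal:
\[ E_{\widetilde P}(f) = E_P(f) \]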
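To make the two expectations concrete, here is a small sketch that computes them from a list of samples. The weather-style data, the feature `f`, and both candidate models are hypothetical and exist only to illustrate the constraint.

```python
from collections import Counter

# Hypothetical training samples (x, y).
samples = [("sunny", "out"), ("sunny", "out"), ("sunny", "in"), ("rainy", "in")]
labels = ["out", "in"]

n = len(samples)
joint = {xy: c / n for xy, c in Counter(samples).items()}                    # empirical P~(x, y)
marginal = {x: c / n for x, c in Counter(x for x, _ in samples).items()}     # empirical P~(x)

def f(x, y):
    """Indicator feature: 1 when the input is sunny and the output is out."""
    return 1.0 if (x == "sunny" and y == "out") else 0.0

def empirical_expectation(f):
    """E_P~(f) = sum_{x,y} P~(x, y) f(x, y)."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

def model_expectation(cond, f):
    """E_P(f) = sum_{x,y} P~(x) P(y|x) f(x, y), where cond(y, x) = P(y|x)."""
    return sum(px * cond(y, x) * f(x, y) for x, px in marginal.items() for y in labels)

def uniform_model(y, x):
    return 1.0 / len(labels)

def empirical_model(y, x):
    return joint.get((x, y), 0.0) / marginal[x]

print(empirical_expectation(f))                 # 0.5
print(model_expectation(uniform_model, f))      # violates the constraint
print(model_expectation(empirical_model, f))    # satisfies the constraint
```

The uniform model has the highest entropy but breaks the constraint, which is why the maximum entropy model is searched for only among models that satisfy it.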

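  • Definition of the maximum entropy model

Suppose the set of models that satisfy all the constraints is:
\[ C \equiv \{ P\in \mathbb{P} \mid E_{\widetilde P}(f_i) = E_P(f_i),\; i=1,2,\cdots,n \} \]
The conditional entropy defined on the conditional probability distribution \(P(Y|X)\) is:
\[ H(P) = -\sum_{x,y}\widetilde P(x)P(y|x)\log P(y|x) \]
The maximum entropy model is denoted \(P^*\):
\[ P^* = \arg\underset{P\in C}{\max}\; H(P) \]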

2.3 Learning the maximum entropy model

??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

Learning the maximum entropy model is the constrained optimization problem:
\[ \begin{aligned} \underset{P\in C}{\max}\;\;\; & H(P) = -\sum_{x,y}\widetilde P(x)P(y|x)\log P(y|x) \\ s.t.\;\;\;\; & E_P(f_i) = E_{\widetilde P}(f_i),\;\;\;\; i=1,2,\cdots,n \\ & \sum_y P(y|x) = 1 \end{aligned} \]
Following the usual convention for optimization problems, the maximization is rewritten as the equivalent minimization:
\[ \begin{aligned} \underset{P\in C}{\min}\;\;\; & -H(P) = \sum_{x,y}\widetilde P(x)P(y|x)\log P(y|x) \\ s.t.\;\;\;\; & E_P(f_i) = E_{\widetilde P}(f_i),\;\;\;\; i=1,2,\cdots,n \\ & \sum_y P(y|x) = 1 \end{aligned} \]
??????????????????????????????????????????????????????????????????????????????????????????????????????????????????
\[ \begin{aligned} L(P,w) =& -H(P) + w_0\left(1-\sum_y P(y|x)\right) + \sum_{i=1}^n w_i\left(E_{\widetilde P}(f_i) - E_P(f_i)\right) \\ =& \sum_{x,y}\widetilde P(x)P(y|x)\log P(y|x) + w_0\left(1-\sum_y P(y|x)\right) \\ &+ \sum_{i=1}^n w_i\left(\sum_{x,y}\widetilde P(x,y)f_i(x,y) - \sum_{x,y}\widetilde P(x)P(y|x)f_i(x,y)\right) \end{aligned} \]
?????????????????????????????????????????????(P,w)?????????????????????????(w)????????????(w)???????????????????????(P)???????(P)??????????????

Taking \(\underset{w}{\max}\; L(P,w)\) first, the primal problem can be written as:
\[ \underset{P\in C}{\min}\; \underset{w}{\max}\;\;\; L(P,w) \]
???????????????????????????????????????????????????????(w)?????????????????????(w)??????????0????????????(w)??????????????????????????????????????????????(-H(P))??????????????????????????(P,w)??????????????????????????????????????????????????????????????????Convex Optimization Overview (cnt’d) ?????????

????????(w)?????????????????????????????????????????(L(P,w))???????????????????????????????????????????——??KKT???

The dual problem is:
\[ \underset{w}{\max}\; \underset{P\in C}{\min}\; L(P,w) \]
First solve the inner minimization over \(P\). The result is a function of \(w\), written:
\[ \Psi(w) = \underset{P\in C}{\min}\; L(P,w) = L(P_w,w) \]
This is the dual function. For a given \(w\), the minimizing \(P\) is written:
\[ P_w = \arg\underset{P\in C}{\min}\; L(P,w) = P_w(y|x) \]
??????????(w)????????(P_w)?????????

?????????

  1. Solve the inner minimization problem

Take the partial derivative with respect to \(P\):
\[ \begin{aligned} \frac{\partial L(P,w)}{\partial P(y|x)} =& \sum_{x,y}\widetilde P(x)\left(\log P(y|x)+1\right) - \sum_y w_0 - \sum_{x,y}\left(\widetilde P(x)\sum_{i=1}^n w_i f_i(x,y)\right) \\ =& \sum_{x,y}\widetilde P(x)\left(\log P(y|x)+1\right) - \sum_x\widetilde P(x)\sum_y w_0 - \sum_{x,y}\widetilde P(x)\sum_{i=1}^n w_i f_i(x,y) \\ =& \sum_{x,y}\widetilde P(x)\left(\log P(y|x)+1-w_0-\sum_{i=1}^n w_i f_i(x,y)\right) \end{aligned} \]
Setting the derivative to 0, and since \(\widetilde P(x) > 0\), gives:
\[ P(y|x) = \exp\left(\sum_{i=1}^n w_i f_i(x,y) + w_0 - 1\right) = \frac{\exp\left(\sum_{i=1}^n w_i f_i(x,y)\right)}{\exp(1-w_0)} \]
???????(P(y|x))??????????????????????????????(widetilde P(x))????????????????????????????????????????????????
\[ \sum_y P(y|x) = 1 \]
This gives the normalization factor:
\[ Z_w(x) = \sum_y \exp\left(\sum_{i=1}^{n} w_i f_i(x,y)\right) \]
ps: \(\exp(1-w_0)\) is the normalization factor.

The model is therefore:
\[ P_w(y|x) = \frac{1}{Z_w(x)}\exp\left(\sum_{i=1}^{n} w_i f_i(x,y)\right) \]
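The closed form for \(P_w(y|x)\) translates directly into code. The sketch below is illustrative: the two feature functions and the weights are hypothetical, and subtracting the largest score before exponentiating is an added numerical safeguard that leaves the ratio unchanged.

```python
import math

def maxent_conditional(w, features, x, labels):
    """P_w(y|x) = exp(sum_i w_i f_i(x, y)) / Z_w(x), returned as {y: probability}."""
    scores = [sum(wi * fi(x, y) for wi, fi in zip(w, features)) for y in labels]
    m = max(scores)                        # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)                          # Z_w(x), up to the common factor exp(m)
    return {y: e / z for y, e in zip(labels, exps)}

# Hypothetical features and weights.
labels = ["out", "in"]
features = [
    lambda x, y: 1.0 if (x == "sunny" and y == "out") else 0.0,
    lambda x, y: 1.0 if y == "in" else 0.0,
]
w = [1.0, 0.5]

p = maxent_conditional(w, features, "sunny", labels)
print(p)
```

With all weights set to zero every label gets the same score, so the model falls back to the uniform distribution over labels.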

  2. Maximize the dual function \(\Psi(w)\)

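Substituting \(P_w\) gives \(\Psi(w) = L(P_w,w)\). The next step is to maximize \(\Psi(w)\):
\[ \underset{w}{\max}\; \Psi(w) \]
with solution
\[ w^* = \arg\underset{w}{\max}\; \Psi(w) \]
That is, take the derivative with respect to \(w\) and set it to 0.

Substituting the resulting \(w^*\) into \(P_w(y|x)\) gives the learned model.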
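Setting the derivative of \(\Psi(w)\) to zero rarely has a closed-form solution, so in practice it is solved iteratively. The sketch below swaps in plain gradient ascent (not an algorithm named in these notes), using the fact that \(\partial\Psi/\partial w_i = E_{\widetilde P}(f_i) - E_{P_w}(f_i)\). The samples, features, step size, and step count are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical training samples (x, y) and indicator features.
samples = ([("sunny", "out")] * 3 + [("sunny", "in")]
           + [("rainy", "in")] * 3 + [("rainy", "out")])
labels = ["out", "in"]
features = [
    lambda x, y: 1.0 if (x, y) == ("sunny", "out") else 0.0,
    lambda x, y: 1.0 if (x, y) == ("rainy", "in") else 0.0,
]

n = len(samples)
p_xy = {xy: c / n for xy, c in Counter(samples).items()}                 # P~(x, y)
p_x = {x: c / n for x, c in Counter(x for x, _ in samples).items()}      # P~(x)

def p_w(w, x):
    """P_w(y|x) for every label y, as a dict."""
    scores = [sum(wi * f(x, y) for wi, f in zip(w, features)) for y in labels]
    z = sum(math.exp(s) for s in scores)
    return {y: math.exp(s) / z for y, s in zip(labels, scores)}

def dual_gradient(w):
    """Component i is E_P~(f_i) - E_{P_w}(f_i)."""
    grad = []
    for f in features:
        emp = sum(p * f(x, y) for (x, y), p in p_xy.items())
        mod = sum(px * p_w(w, x)[y] * f(x, y) for x, px in p_x.items() for y in labels)
        grad.append(emp - mod)
    return grad

def fit(lr=1.0, steps=500):
    """Gradient ascent on the dual function, starting from w = 0."""
    w = [0.0] * len(features)
    for _ in range(steps):
        w = [wi + lr * gi for wi, gi in zip(w, dual_gradient(w))]
    return w

w_star = fit()
print(w_star, p_w(w_star, "sunny"))
```

At the optimum the gradient vanishes, which is exactly the constraint \(E_{\widetilde P}(f_i) = E_P(f_i)\); for this toy data that means \(P_w(\text{out}|\text{sunny}) = 3/4\), i.e. \(w_1 = \log 3\).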

3. Maximum likelihood estimation

?????????????????????????????????????????????????????????????????????????????????????????????????????
\[ L_{\widetilde P}(P) = \log\prod_{x,y}P(y|x)^{\widetilde P(x,y)} = \sum_{x,y}\widetilde P(x,y)\log P(y|x) \]
Substituting \(P_w(y|x)\) gives:
\[ \begin{aligned} L_{\widetilde P}(P_w) =& \sum_{x,y}\widetilde P(x,y)\sum_{i=1}^n w_i f_i(x,y) - \sum_{x,y}\widetilde P(x,y)\log Z_w(x) \\ =& \sum_{x,y}\widetilde P(x,y)\sum_{i=1}^n w_i f_i(x,y) - \sum_x\widetilde P(x)\log Z_w(x) \end{aligned} \]
Now consider the dual function:
\[ \begin{aligned} \Psi(w) =& \sum_{x,y}\widetilde P(x)P_w(y|x)\log P_w(y|x) + \sum_{i=1}^n w_i\left(\sum_{x,y}\widetilde P(x,y)f_i(x,y) - \sum_{x,y}\widetilde P(x)P_w(y|x)f_i(x,y)\right) \\ =& \sum_{x,y}\widetilde P(x,y)\sum_{i=1}^n w_i f_i(x,y) + \sum_{x,y}\widetilde P(x)P_w(y|x)\left(\log P_w(y|x) - \sum_{i=1}^n w_i f_i(x,y)\right) \\ =& \sum_{x,y}\widetilde P(x,y)\sum_{i=1}^n w_i f_i(x,y) - \sum_{x,y}\widetilde P(x)P_w(y|x)\log Z_w(x) \\ =& \sum_{x,y}\widetilde P(x,y)\sum_{i=1}^n w_i f_i(x,y) - \sum_x\widetilde P(x)\log Z_w(x) \end{aligned} \]
The last step uses \(\sum_y P(y|x) = 1\).

????????????????????????????????????????
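The equality between the dual function \(\Psi(w)\) and the log-likelihood \(L_{\widetilde P}(P_w)\) can be verified numerically for arbitrary weights. The sketch below is only a sanity check; the empirical distribution, features, and test weights are hypothetical.

```python
import math

# Hypothetical empirical joint distribution P~(x, y) and its marginal P~(x).
p_xy = {("a", 0): 0.3, ("a", 1): 0.2, ("b", 0): 0.1, ("b", 1): 0.4}
p_x = {"a": 0.5, "b": 0.5}
labels = [0, 1]
features = [
    lambda x, y: 1.0 if y == 1 else 0.0,
    lambda x, y: 1.0 if (x == "a" and y == 0) else 0.0,
]

def p_w(w, x):
    """P_w(y|x) for every label y, as a dict."""
    scores = [sum(wi * f(x, y) for wi, f in zip(w, features)) for y in labels]
    z = sum(math.exp(s) for s in scores)
    return {y: math.exp(s) / z for y, s in zip(labels, scores)}

def log_likelihood(w):
    """L_P~(P_w) = sum_{x,y} P~(x, y) log P_w(y|x)."""
    return sum(p * math.log(p_w(w, x)[y]) for (x, y), p in p_xy.items())

def dual(w):
    """Psi(w) = sum P~(x) P_w log P_w + sum_i w_i (E_P~(f_i) - E_{P_w}(f_i))."""
    neg_entropy = sum(px * q * math.log(q)
                      for x, px in p_x.items() for q in p_w(w, x).values())
    constraint = 0.0
    for wi, f in zip(w, features):
        emp = sum(p * f(x, y) for (x, y), p in p_xy.items())
        mod = sum(px * p_w(w, x)[y] * f(x, y) for x, px in p_x.items() for y in labels)
        constraint += wi * (emp - mod)
    return neg_entropy + constraint

for w in ([0.0, 0.0], [0.7, -1.2], [2.0, 3.0]):
    print(w, dual(w), log_likelihood(w))
```

The two columns agree for every weight vector, so maximizing the dual function and maximizing the likelihood of \(P_w\) select the same \(w\).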

4. Relationship between maximum entropy and logistic regression

???????logistic??????????????????????????????????????logistic?????

Suppose the class label y takes only the two values 0 and 1, that is \(y\in\{0,1\}\), and define the feature function of x and y as:
\[ f(x,y) = \begin{cases} g(x) & y=1 \\ 0 & y=0 \end{cases} \]
Substituting this feature function into the maximum entropy model gives:
\[ \begin{aligned} P(y=1|x) =& \frac{\exp(w\cdot f(x,1))}{\exp(w\cdot f(x,0))+\exp(w\cdot f(x,1))} \\ =& \frac{\exp(w\cdot g(x))}{\exp(0)+\exp(w\cdot g(x))} \\ =& \frac{1}{\exp(-w\cdot g(x))+1} \end{aligned} \]
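This is the sigmoid function applied to \(w\cdot g(x)\); with \(g(x) = x\) it is exactly the logistic regression model.

\(P(y=0|x)\) follows in the same way.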

????????????????
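The reduction above can be confirmed numerically: with \(f(x,1) = g(x)\) and \(f(x,0) = 0\), the maximum entropy formula returns the same number as the sigmoid of \(w\cdot g(x)\). In this sketch \(g(x) = x\) is assumed, and the weight vector and inputs are hypothetical.

```python
import math

def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def g(x):
    """Assumed feature map; the identity recovers ordinary logistic regression."""
    return x

def maxent_p1(w, x):
    """P(y=1|x) from the maximum entropy form with f(x,1) = g(x), f(x,0) = 0."""
    s1 = dot(w, g(x))     # w . f(x, 1)
    s0 = 0.0              # w . f(x, 0), since f(x, 0) is the zero vector
    return math.exp(s1) / (math.exp(s0) + math.exp(s1))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = [0.8, -0.4]                                   # hypothetical weights
inputs = [[1.0, 2.0], [-3.0, 0.5], [0.0, 0.0]]    # hypothetical inputs
for x in inputs:
    print(maxent_p1(w, x), sigmoid(dot(w, x)))
```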

5. Summary

??logistic????????????????????????????????????????????????????????????????????????

????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

?????????????????????????????????????????????????????????????????????????????????KKT????????????????????

????????logistic?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

6. Reference

??????

?????????????

Convex Optimization Overview (cnt’d)

PRML

?????????????
