"Pattern Recognition and Intelligent Computing": PCA-Based Template Matching
- January 16, 2020
- Notes
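Algorithm steps:
- Stack all training samples from every class into a matrix X, and take the sample to be classified
- Compute the covariance matrix S
- Based on the eigenvalues of S, choose a projection matrix C whose columns are the leading eigenvectors
- Use C to reduce the dimensionality of X and of the sample
- Apply template matching to perform the multi-class classification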
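The steps above can be written compactly as follows. This is only a sketch: the symbols (m samples x_i with labels y_i, mean μ, retained variance fraction p, test sample x) are introduced here for illustration and are not taken from the book.

```latex
\begin{aligned}
\mu &= \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
S = \frac{1}{m-1}\sum_{i=1}^{m} (x_i-\mu)(x_i-\mu)^{\mathsf T},\\
S c_j &= \lambda_j c_j, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0,\\
k &= \min\Bigl\{ r : \frac{\sum_{j=1}^{r}\lambda_j}{\sum_{j=1}^{n}\lambda_j} > p \Bigr\}, \qquad
C = [\,c_1 \; c_2 \; \cdots \; c_k\,],\\
z_i &= C^{\mathsf T}(x_i-\mu), \qquad z = C^{\mathsf T}(x-\mu),\\
\hat{y} &= y_{i^\ast}, \qquad i^\ast = \arg\min_{i}\, \lVert z - z_i \rVert^2 .
\end{aligned}
```

Dropping the 1/(m-1) factor rescales every eigenvalue by the same constant, so it changes neither C nor the variance ratio used to pick k.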
Implementation
PCA dimensionality reduction
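import numpy as np

def pca(x, k=0, percent=0.9):
    """
    :function: principal component analysis
    :param x: data matrix, m*n; m is the number of samples, n the number of features
    :param k: number of dimensions to keep; if 0, it is chosen from percent
    :param percent: fraction of the total variance to retain
    :return: projection matrix whose columns are the leading eigenvectors
    """
    m, n = x.shape
    mean = np.mean(x, axis=0).reshape(1, n)
    x_norm = (x - mean).T  # n*m: one row per feature, so the product below is n*n
    cov = np.dot(x_norm, x_norm.T)  # scatter matrix, proportional to the covariance
    # cov is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors
    eigval, eigvec = np.linalg.eigh(cov)
    index = np.argsort(-eigval)
    eigval_sort = eigval[index]
    eigvec_sort = eigvec[:, index]  # eigenvectors are columns, so reorder the columns
    if k > 0:
        return eigvec_sort[:, :k]
    eigval_ratio = eigval_sort / np.sum(eigval_sort)
    total = 0.0
    for i in range(eigval_ratio.shape[0]):
        total += eigval_ratio[i]
        if total > percent:
            return eigvec_sort[:, :i + 1]
    return eigvec_sort  # the sum never exceeded percent: keep every component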
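One detail in the sorting step is easy to get wrong: NumPy's eigen-solvers return eigenvectors as the columns of the result, so sorting by eigenvalue has to permute columns, not rows. The small check below illustrates this on an arbitrary symmetric matrix chosen for the example (the matrix and the variable names are assumptions, not part of the original notes).

```python
import numpy as np

# Arbitrary symmetric matrix used only for this illustration.
S = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh returns eigenvalues in ascending order and eigenvectors as columns.
eigval, eigvec = np.linalg.eigh(S)
order = np.argsort(-eigval)          # indices for descending eigenvalues
eigval_sorted = eigval[order]

by_columns = eigvec[:, order]        # permute columns
by_rows = eigvec[order]              # permute rows

# A valid ordering must still satisfy S v_j = lambda_j v_j for every column j.
columns_ok = np.allclose(S @ by_columns, by_columns * eigval_sorted)
rows_ok = np.allclose(S @ by_rows, by_rows * eigval_sorted)
print(columns_ok, rows_ok)
```

Only the column-permuted matrix keeps each eigenvector paired with its eigenvalue; the row-permuted one scrambles the components inside every vector.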
Template matching
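def neartemplet(x_train, y_train, sample):
    """
    :function: template matching (nearest template)
    :param x_train: training set, M*N; M is the number of samples, N the number of features
    :param y_train: training labels, length M
    :param sample: the sample to classify
    :return: the predicted class
    """
    n_train = x_train.shape[0]
    dis = []
    for i in range(n_train):
        # squared Euclidean distance to the i-th template
        dis.append(np.sum((sample - x_train[i, :]) ** 2))
    minIndx = np.argmin(dis)
    return y_train[minIndx]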
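The loop over templates can also be expressed with NumPy broadcasting. The following is a minimal sketch of that alternative, assuming the same inputs as above; the name `nearest_template_vectorized` is hypothetical and not part of the original module.

```python
import numpy as np

def nearest_template_vectorized(x_train, y_train, sample):
    """Return the label of the training row closest to sample (squared Euclidean)."""
    # ravel() accepts a sample of shape (N,) or (1, N).
    diff = x_train - np.ravel(sample)
    dist = np.sum(diff ** 2, axis=1)   # one distance per template
    return y_train[np.argmin(dist)]

# Tiny illustrative data set: three templates with labels 7, 8, 9.
templates = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]])
labels = np.array([7, 8, 9])
predicted = nearest_template_vectorized(templates, labels, np.array([[2.5, 3.5]]))
print(predicted)
```

The result is the same as the explicit loop; the difference is only that the distance computation runs inside NumPy instead of in Python.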
Splitting the data set
import random

def train_test_split(x, y, ratio=3):
    """
    :function: split the data set into a training set and a test set
    :param x: data matrix, m*n; m is the number of samples, n the number of features
    :param y: labels
    :param ratio: train:test ratio, 3:1 by default
    :return: x_train, y_train, x_test, y_test
    """
    n_samples = x.shape[0]
    n_train = int(n_samples * ratio / (1 + ratio))
    # draw the training indices without replacement; the rest form the test set
    train_id = random.sample(range(n_samples), n_train)
    x_train = x[train_id, :]
    y_train = y[train_id]
    x_test = np.delete(x, train_id, axis=0)
    y_test = np.delete(y, train_id, axis=0)
    return x_train, y_train, x_test, y_test
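Because the split is random, each run produces a different training set and possibly a different result. If reproducible runs are wanted, one option is to drive the split from a seeded generator. The sketch below assumes this variant is acceptable; `split_with_seed` and its `seed` parameter are hypothetical additions, not part of the original code.

```python
import numpy as np

def split_with_seed(x, y, ratio=3, seed=0):
    """Split x, y into train/test parts at ratio:1 using a seeded permutation."""
    rng = np.random.default_rng(seed)
    n_samples = x.shape[0]
    n_train = int(n_samples * ratio / (1 + ratio))
    perm = rng.permutation(n_samples)            # shuffled sample indices
    train_id, test_id = perm[:n_train], perm[n_train:]
    return x[train_id], y[train_id], x[test_id], y[test_id]

# Illustrative data: 20 samples with 2 features each.
x_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)
x_tr, y_tr, x_te, y_te = split_with_seed(x_demo, y_demo)
print(x_tr.shape, x_te.shape)
```

Calling it again with the same seed returns the same partition, which makes it easier to compare results across runs.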
Test code
from sklearn import datasets
from Include.chapter3 import function
import numpy as np

# load the data
digits = datasets.load_digits()
x, y = digits.data, digits.target

# split the data set and pick one random test sample
x_train, y_train, x_test, y_test = function.train_test_split(x, y)
testId = np.random.randint(0, x_test.shape[0])
sample = x_test[testId, :]

eigVec = function.pca(x_train)

# center with the training mean, the same mean that pca() subtracts internally
mean = np.mean(x_train, axis=0).reshape((1, 64))
x_train = x_train - mean
sample = sample - mean

# reduce dimensionality
x_train = np.dot(x_train, eigVec)
sample = np.dot(sample, eigVec)

# template matching
ans = function.neartemplet(x_train, y_train, sample)
print(ans == y_test[testId])
Result
True
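A single True for one randomly chosen sample says little about how well the classifier works overall. A more informative check is the fraction of test samples classified correctly. The sketch below assumes the inputs have already been centered and projected; `template_match_accuracy` is a hypothetical helper, and the synthetic two-class data merely stands in for the digits set so the example runs without extra dependencies.

```python
import numpy as np

def template_match_accuracy(x_train, y_train, x_test, y_test):
    """Fraction of test rows whose nearest training row has the correct label."""
    # Squared Euclidean distance between every test row and every training row,
    # expanded as |a|^2 - 2 a.b + |b|^2.
    d2 = (np.sum(x_test ** 2, axis=1)[:, None]
          - 2.0 * x_test @ x_train.T
          + np.sum(x_train ** 2, axis=1)[None, :])
    predictions = y_train[np.argmin(d2, axis=1)]
    return float(np.mean(predictions == y_test))

# Synthetic stand-in data: two well-separated 2-D clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 6.0]])
y_train = np.repeat([0, 1], 30)
y_test = np.repeat([0, 1], 10)
x_train = centers[y_train] + rng.normal(scale=0.5, size=(60, 2))
x_test = centers[y_test] + rng.normal(scale=0.5, size=(20, 2))

accuracy = template_match_accuracy(x_train, y_train, x_test, y_test)
print(accuracy)
```

With the digits data, the same function could be applied to the projected training set and the projected full test set instead of a single sample.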