"Pattern Recognition and Intelligent Computing": Euclidean Distance Classification Based on Class Centers

  • January 16, 2020
  • Notes

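Algorithm steps
  1. For each class, select the training samples X that belong to it
  2. Compute the class center (the mean feature vector of those samples)
  3. Use the Euclidean distance measure to compute the distance from the sample to be classified to each class center
  4. The class whose center is nearest is the class assigned to the sample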
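
The steps above can be written compactly as a decision rule. The symbols used here ($X_c$ for the training samples of class $c$, $\mu_c$ for its center, $s$ for the sample to classify, $N$ for the number of features, $\hat{y}$ for the predicted class) are notation introduced for illustration and do not appear in the original notes.

```latex
\mu_c = \frac{1}{|X_c|} \sum_{x \in X_c} x, \qquad
d(s, \mu_c) = \sqrt{\sum_{k=1}^{N} \left(s_k - \mu_{c,k}\right)^2}, \qquad
\hat{y} = \arg\min_{c} \; d(s, \mu_c)
```

Because the square root is monotonic, minimizing $d(s, \mu_c)^2$ selects the same class as minimizing $d(s, \mu_c)$.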
Algorithm implementation

Computing the distance

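import numpy as np


def euclid(x_train, y_train, sample):
    """
    Template matching based on class centers.

    :param x_train: training set, M*N (M samples, N features)
    :param y_train: training labels, 1*M
    :param sample: the sample to classify
    :return: the predicted class
    """
    disMin = np.inf
    label = 0
    # Remove duplicate labels to get the set of classes
    target = np.unique(y_train)
    for i in target:
        # Collect the indices of all training samples in class i
        trainId = [j for j, y in enumerate(y_train) if y == i]
        train = x_train[trainId, :]
        # Class center: the mean feature vector of the class
        trainMean = np.mean(train, axis=0)
        # Squared Euclidean distance; it has the same minimizer as the distance itself
        dis = np.dot((sample - trainMean), (sample - trainMean).T)
        if dis < disMin:
            disMin = dis
            label = i
    return label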
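
As a sanity check on the distance step, here is a minimal sketch of the same rule on a hand-computable toy problem. The function name `nearest_center` and the toy arrays are hypothetical and not part of the original notes. This variant uses `np.linalg.norm` to get the actual Euclidean distance instead of the squared distance, which does not change which class is nearest.

```python
import numpy as np


def nearest_center(x_train, y_train, sample):
    """Hypothetical vectorized variant of the class-center rule."""
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    # One mean vector per class, stacked into a (n_classes, n_features) array
    centers = np.stack([x_train[y_train == c].mean(axis=0) for c in classes])
    # Euclidean distance from the sample to every class center
    dists = np.linalg.norm(centers - sample, axis=1)
    return classes[np.argmin(dists)], dists


# Toy data: two classes in 2-D.
# Class 0 has center (0, 1); class 1 has center (5, 4).
x_toy = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]])
y_toy = np.array([0, 0, 1, 1])

# The point (1, 1) is at distance 1 from center 0 and 5 from center 1
label, dists = nearest_center(x_toy, y_toy, np.array([1.0, 1.0]))
print(label, dists)
```

Computing all centers once and reusing them is also cheaper than recomputing the class means for every sample, which matters if many samples are classified.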

Splitting the dataset

import random

import numpy as np


def train_test_split(x, y, ratio=3):
    """
    Split a dataset into a training set and a test set.

    :param x: data, m*n (m samples, n features)
    :param y: labels
    :param ratio: train:test = ratio:1 (default 3:1)
    :return: x_train, y_train, x_test, y_test
    """
    n_samples = x.shape[0]
    n_train = int(n_samples * ratio / (1 + ratio))
    # Draw n_train distinct row indices at random for the training set
    train_id = random.sample(range(0, n_samples), n_train)
    x_train = x[train_id, :]
    y_train = y[train_id]
    # Every row that was not drawn goes to the test set
    x_test = np.delete(x, train_id, axis=0)
    y_test = np.delete(y, train_id, axis=0)
    return x_train, y_train, x_test, y_test
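
The split above draws indices with an unseeded `random.sample`, so each run produces a different partition. If repeatable runs are wanted, one possible alternative is a seeded NumPy permutation. The following is only a sketch: the name `split_with_seed` and the `seed` parameter are assumptions introduced here, not part of the original code.

```python
import numpy as np


def split_with_seed(x, y, ratio=3, seed=0):
    """Hypothetical reproducible split with train:test = ratio:1."""
    rng = np.random.default_rng(seed)
    n_samples = x.shape[0]
    n_train = int(n_samples * ratio / (1 + ratio))
    # Shuffle all row indices once, then cut the shuffled order in two
    order = rng.permutation(n_samples)
    train_id, test_id = order[:n_train], order[n_train:]
    return x[train_id], y[train_id], x[test_id], y[test_id]


# Demo on 20 rows: with ratio=3 the split is 15 training rows and 5 test rows
x_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)
x_tr, y_tr, x_te, y_te = split_with_seed(x_demo, y_demo)
print(x_tr.shape, x_te.shape)
```

Calling the function twice with the same `seed` returns the same partition, and the training and test index sets are disjoint by construction.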

Test

from sklearn import datasets
from Include.chapter3 import function
import numpy as np

# Load the data
digits = datasets.load_digits()
x, y = digits.data, digits.target

# Split the dataset
x_train, y_train, x_test, y_test = function.train_test_split(x, y)
# Pick one test sample at random
testId = np.random.randint(0, x_test.shape[0])
sample = x_test[testId, :]

# Classify the sample and compare the prediction with its true label
ans = function.euclid(x_train, y_train, sample)
print(ans == y_test[testId])

Result
True
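
A single `True` only shows that one randomly chosen sample was classified correctly. A fuller check would score every test sample and report the accuracy. The sketch below shows one way to do that. It uses synthetic two-class data as a stand-in so that it runs with NumPy alone, and the helper names `class_centers` and `predict_all` are hypothetical. It makes no claim about the accuracy obtained on the digits dataset.

```python
import numpy as np


def class_centers(x_train, y_train):
    """Return the class labels and one mean vector per class."""
    classes = np.unique(y_train)
    centers = np.stack([x_train[y_train == c].mean(axis=0) for c in classes])
    return classes, centers


def predict_all(x_test, classes, centers):
    """Assign every test row to the class with the nearest center."""
    # Broadcasting gives pairwise differences of shape (n_test, n_classes, n_features)
    diffs = x_test[:, None, :] - centers[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return classes[np.argmin(dists, axis=1)]


# Synthetic stand-in data: two well-separated Gaussian blobs in 2-D
rng = np.random.default_rng(0)
x0 = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
x1 = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
x_all = np.vstack([x0, x1])
y_all = np.array([0] * 50 + [1] * 50)

# Simple deterministic split for the demo: even rows train, odd rows test
x_train, y_train = x_all[::2], y_all[::2]
x_test, y_test = x_all[1::2], y_all[1::2]

classes, centers = class_centers(x_train, y_train)
pred = predict_all(x_test, classes, centers)
accuracy = np.mean(pred == y_test)
print(f"accuracy: {accuracy:.3f}")
```

With the arrays from the test script above, the same two helpers could be applied to `x_train`, `y_train`, `x_test`, and `y_test` from the digits split to get an accuracy over the whole test set.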