Pattern Recognition and Intelligent Computing (《模式识别与智能计算》): Euclidean Distance Classification Based on Class Centers

  • January 16, 2020
  • Notes

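Algorithm Steps
  1. For one class, take its training samples X.
  2. Compute the class center, i.e. the mean of those samples. Repeat steps 1 and 2 for every class.
  3. Use the Euclidean distance to measure how far the test sample is from each class center.
  4. Assign the test sample to the class whose center is nearest.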
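Written out as formulas (the notation here is an assumption, not taken from the book): let class $\omega_i$ have $N_i$ training samples $\mathbf{X}_1^{(i)},\dots,\mathbf{X}_{N_i}^{(i)}$ with $n$ features each. Steps 2 to 4 then read:

```latex
\mathbf{m}_i = \frac{1}{N_i}\sum_{j=1}^{N_i}\mathbf{X}_j^{(i)}, \qquad
d_i(\mathbf{X}) = \lVert \mathbf{X}-\mathbf{m}_i \rVert
  = \sqrt{\sum_{k=1}^{n}\left(x_k - m_{ik}\right)^2}, \qquad
\mathbf{X} \in \omega_{i^*},\quad i^* = \arg\min_{i} d_i(\mathbf{X})
```

Because the square root is monotonic, minimizing $d_i^2(\mathbf{X})$ selects the same class, which is why the implementation can compare squared distances.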
Implementation

Computing the distance

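```python
import numpy as np


def euclid(x_train, y_train, sample):
    """
    :function: template matching based on class centers
    :param x_train: training set, M*N (M = number of samples, N = number of features)
    :param y_train: training labels, 1*M
    :param sample: the sample to classify
    :return: the predicted class
    """
    disMin = np.inf
    label = 0
    # Remove duplicate labels to get the list of classes
    target = np.unique(y_train)
    for i in target:
        # Collect the indices of all training samples that belong to class i
        trainId = [j for j, y in enumerate(y_train) if y == i]
        train = x_train[trainId, :]
        # Class center: feature-wise mean of the class samples
        trainMean = np.mean(train, axis=0)
        # Squared Euclidean distance; it has the same minimizer as the distance itself
        dis = np.dot((sample - trainMean), (sample - trainMean).T)
        if disMin > dis:
            disMin = dis
            label = i
    return label
```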
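The loop in `euclid` recomputes every class center for each sample it classifies. A vectorized sketch, assuming NumPy and 2-D float arrays, computes the centers once and classifies many samples at the same time. `class_centers` and `predict_nearest_center` are hypothetical helper names, not part of the original code. Like `euclid`, it compares squared distances and, on a tie, keeps the first (smallest) label.

```python
import numpy as np


def class_centers(x_train, y_train):
    """Return the sorted class labels and the mean vector (center) of each class."""
    labels = np.unique(y_train)
    # Boolean masks select the rows of each class; the mean over axis 0 is the center.
    centers = np.stack([x_train[y_train == c].mean(axis=0) for c in labels])
    return labels, centers


def predict_nearest_center(x_train, y_train, samples):
    """Classify each row of `samples` by its nearest class center (Euclidean)."""
    labels, centers = class_centers(x_train, y_train)
    samples = np.atleast_2d(samples)
    # Broadcasting gives differences of shape (n_samples, n_classes, n_features).
    diff = samples[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Squared distance has the same argmin as the distance, so sqrt is skipped.
    return labels[np.argmin(sq_dist, axis=1)]


if __name__ == "__main__":
    x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
    y = np.array([0, 0, 1, 1])
    print(predict_nearest_center(x, y, [[0.2, 0.3], [5.5, 4.8]]))  # [0 1]
```

With the digits data, `predict_nearest_center(x_train, y_train, x_test)` would label the whole test set in one call.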

Splitting the dataset

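```python
import random

import numpy as np


def train_test_split(x, y, ratio=3):
    """
    :function: split the dataset into a training set and a test set
    :param x: m*n array (m = number of samples, n = number of features)
    :param y: labels
    :param ratio: train:test ratio, 3:1 by default
    :return: x_train, y_train, x_test, y_test
    """
    n_samples, n_train = x.shape[0], int(x.shape[0] * ratio / (1 + ratio))
    # Randomly pick n_train distinct row indices for the training set
    train_id = random.sample(range(0, n_samples), n_train)
    x_train = x[train_id, :]
    y_train = y[train_id]
    # Every row not picked for training goes to the test set
    x_test = np.delete(x, train_id, axis=0)
    y_test = np.delete(y, train_id, axis=0)
    return x_train, y_train, x_test, y_test
```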
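Because `train_test_split` draws from the global `random` module, each run produces a different split and a different result. A sketch of a reproducible variant, assuming a NumPy `Generator` is acceptable: `train_test_split_seeded` and its `seed` parameter are hypothetical additions, not part of the original code. It uses the same `ratio:1` split size, but it shuffles the test rows as well instead of keeping their original order.

```python
import numpy as np


def train_test_split_seeded(x, y, ratio=3, seed=None):
    """Split x, y into train:test = ratio:1; the same seed gives the same split."""
    rng = np.random.default_rng(seed)
    n_samples = x.shape[0]
    n_train = int(n_samples * ratio / (1 + ratio))
    perm = rng.permutation(n_samples)  # shuffled row indices
    train_id, test_id = perm[:n_train], perm[n_train:]
    return x[train_id], y[train_id], x[test_id], y[test_id]


if __name__ == "__main__":
    x = np.arange(20).reshape(10, 2)
    y = np.arange(10)
    x_tr, y_tr, x_te, y_te = train_test_split_seeded(x, y, seed=0)
    print(x_tr.shape, x_te.shape)  # (7, 2) (3, 2)
```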

Test

```python
from sklearn import datasets
from Include.chapter3 import function
import numpy as np

# Load the data
digits = datasets.load_digits()
x, y = digits.data, digits.target

# Split the dataset
x_train, y_train, x_test, y_test = function.train_test_split(x, y)
# Pick one random test sample
testId = np.random.randint(0, x_test.shape[0])
sample = x_test[testId, :]

ans = function.euclid(x_train, y_train, sample)
# Prints True if the predicted class matches the true label
print(ans == y_test[testId])
```
Result
True
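The test checks one randomly chosen test sample, so a single `True` says little about overall performance. A sketch of a fuller check, assuming NumPy only: `nearest_center_accuracy` is a hypothetical helper that classifies every test sample by its nearest class center and returns the fraction classified correctly. The demo uses synthetic clusters so it runs without scikit-learn. On the digits data you would pass the `x_train, y_train, x_test, y_test` produced by the split. scikit-learn's `sklearn.neighbors.NearestCentroid` implements the same nearest-class-center rule and could serve as a cross-check.

```python
import numpy as np


def nearest_center_accuracy(x_train, y_train, x_test, y_test):
    """Fraction of test samples whose nearest class center matches the true label."""
    labels = np.unique(y_train)
    centers = np.stack([x_train[y_train == c].mean(axis=0) for c in labels])
    # Squared Euclidean distance from every test sample to every class center.
    sq_dist = ((x_test[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    pred = labels[sq_dist.argmin(axis=1)]
    return float(np.mean(pred == y_test))


if __name__ == "__main__":
    # Synthetic stand-in data: three well-separated Gaussian clusters in 2-D.
    rng = np.random.default_rng(0)
    means = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
    y = np.repeat(np.arange(3), 40)
    x = means[y] + rng.normal(scale=0.5, size=(y.size, 2))
    perm = rng.permutation(y.size)
    train, test = perm[:90], perm[90:]
    print(nearest_center_accuracy(x[train], y[train], x[test], y[test]))
```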