"Pattern Recognition and Intelligent Computing": Euclidean-Distance Classification Based on Class Centers
- January 16, 2020
- Notes
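Algorithm steps
- Take the training samples X that belong to one class.
- Compute the center of that class, i.e. the mean vector of its samples. Repeat these first two steps for every class.
- Use the Euclidean distance to measure how far the sample to be classified lies from each class center.
- Assign the sample to the class whose center is nearest.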
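The steps above can be written as formulas. The notation is introduced here for illustration and may differ from the book's: $\omega_i$ is class $i$ (out of $c$ classes), $N_i$ is its number of training samples, and $\mathbf{m}_i$ is its center. A small worked example with two 2-D classes follows the decision rule. Squaring a distance does not change which class is nearest, which is why the code below can compare squared distances.

```latex
\begin{aligned}
\mathbf{m}_i &= \frac{1}{N_i}\sum_{\mathbf{x}\in\omega_i}\mathbf{x}, \qquad i = 1,\dots,c \\
D_i(\mathbf{X}) &= \lVert\mathbf{X}-\mathbf{m}_i\rVert
  = \sqrt{(\mathbf{X}-\mathbf{m}_i)^{T}(\mathbf{X}-\mathbf{m}_i)} \\
\mathbf{X} &\in \omega_k \quad\text{if}\quad D_k(\mathbf{X}) = \min_{1\le i\le c} D_i(\mathbf{X}) \\[6pt]
&\text{Example: } \omega_1=\{(0,0),(2,0)\},\ \omega_2=\{(4,4),(6,4)\},\ \mathbf{X}=(2,1) \\
&\mathbf{m}_1=(1,0),\quad \mathbf{m}_2=(5,4) \\
&D_1^2=(2-1)^2+(1-0)^2=2,\quad D_2^2=(2-5)^2+(1-4)^2=18
  \;\Rightarrow\; \mathbf{X}\in\omega_1
\end{aligned}
```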
Implementation
Computing the distances
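```python
import numpy as np


def euclid(x_train, y_train, sample):
    """
    :function: template matching based on class centers
    :param x_train: training set, M*N (M samples, N features)
    :param y_train: training labels, 1-D array of length M
    :param sample: sample to classify, 1-D array of length N
    :return: predicted class label
    """
    disMin = np.inf
    label = 0
    # distinct class labels (sorted ascending)
    target = np.unique(y_train)
    for i in target:
        # gather all training samples of class i
        train = x_train[y_train == i, :]
        # class center: mean vector of the class
        trainMean = np.mean(train, axis=0)
        # squared Euclidean distance; skipping the square root does not change the nearest class
        dis = np.dot(sample - trainMean, sample - trainMean)
        if dis < disMin:
            disMin = dis
            label = i
    return label
```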
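The loop in `euclid()` recomputes every class center for each sample it classifies. A minimal sketch that splits the work in two could look like the following. It assumes the same array layout as above: rows are samples, and `y_train` is a 1-D label array. The centers are computed once, and then many samples are classified at a time with NumPy broadcasting. The names `fit_centers` and `predict` are hypothetical helpers, not from the book. For a library version of the same nearest-class-center idea, scikit-learn provides `sklearn.neighbors.NearestCentroid`.

```python
import numpy as np


def fit_centers(x_train, y_train):
    """Return the sorted class labels and one mean vector (class center) per label."""
    classes = np.unique(y_train)
    # the boolean mask y_train == c selects the rows of class c
    centers = np.stack([x_train[y_train == c].mean(axis=0) for c in classes])
    return classes, centers


def predict(classes, centers, x):
    """Label each row of x with the class whose center is nearest (squared Euclidean distance)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    # (S, 1, N) - (1, K, N) -> (S, K, N): every sample minus every center
    diff = x[:, None, :] - centers[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)  # (S, K) squared distances
    # argmin returns the first minimum, so ties go to the smaller label, as in euclid()
    return classes[np.argmin(dist2, axis=1)]


# toy usage: two 2-D classes
x_train = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 4.0], [6.0, 4.0]])
y_train = np.array([0, 0, 1, 1])
classes, centers = fit_centers(x_train, y_train)
print(predict(classes, centers, [[2.0, 1.0], [5.0, 5.0]]))  # [0 1]
```

With the digits split used in the test, `predict(*fit_centers(x_train, y_train), x_test)` would label the whole test set in one call. The `(S, K, N)` intermediate array is small for this data: about 450 samples, 10 classes and 64 features.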
Splitting the dataset
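```python
import random

import numpy as np


def train_test_split(x, y, ratio=3):
    """
    :function: split the dataset into a training set and a test set
    :param x: m*n array, m samples and n features
    :param y: labels, length m
    :param ratio: train:test ratio, default 3 (i.e. 3:1)
    :return: x_train, y_train, x_test, y_test
             (note: a different order from sklearn.model_selection.train_test_split)
    """
    n_samples = x.shape[0]
    n_train = int(n_samples * ratio / (1 + ratio))
    # draw n_train distinct row indices at random for the training set
    train_id = random.sample(range(n_samples), n_train)
    x_train = x[train_id, :]
    y_train = y[train_id]
    # every row not drawn goes to the test set
    x_test = np.delete(x, train_id, axis=0)
    y_test = np.delete(y, train_id, axis=0)
    return x_train, y_train, x_test, y_test
```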
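`train_test_split()` draws its indices with the unseeded `random` module, so each run produces a different split. If a reproducible split is wanted, one option is a seedable NumPy `Generator`. The function name `train_test_split_seeded` and its `seed` parameter are hypothetical additions, not part of the original code. The split ratio and return order are kept the same.

```python
import numpy as np


def train_test_split_seeded(x, y, ratio=3, seed=None):
    """Same ratio:1 train/test split as train_test_split, but reproducible for a fixed seed."""
    rng = np.random.default_rng(seed)
    n_samples = x.shape[0]
    n_train = int(n_samples * ratio / (1 + ratio))
    # a shuffled copy of the indices 0..n_samples-1; the first n_train go to training
    perm = rng.permutation(n_samples)
    train_id, test_id = perm[:n_train], perm[n_train:]
    return x[train_id], y[train_id], x[test_id], y[test_id]
```

Unlike `np.delete`, this version also shuffles the order of the test rows, which does not matter for the classifier.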
Test
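```python
import numpy as np
from sklearn import datasets

from Include.chapter3 import function  # module holding euclid() and train_test_split() above

# load the handwritten-digits data (8x8 images flattened to 64 features)
digits = datasets.load_digits()
x, y = digits.data, digits.target
# split into training and test sets (3:1 by default)
x_train, y_train, x_test, y_test = function.train_test_split(x, y)
# pick one test sample at random and classify it
testId = np.random.randint(0, x_test.shape[0])
sample = x_test[testId, :]
ans = function.euclid(x_train, y_train, sample)
# True if the predicted label matches the true label
print(ans == y_test[testId])
```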
Result (one randomly chosen test sample)
True
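A single `True` only shows that one randomly drawn test sample was classified correctly. It says little about how well the method works overall. A fuller check is the fraction of the whole test set that is classified correctly. The sketch below assumes the classifier has the same `(x_train, y_train, sample)` signature as `euclid()`. `accuracy` and `nearest_center` are hypothetical names, and `nearest_center` is a compact stand-in for `euclid()` so that the block runs on its own. The toy data is made up for illustration.

```python
import numpy as np


def accuracy(classify, x_train, y_train, x_test, y_test):
    """Fraction of test samples for which classify(x_train, y_train, sample) returns the true label."""
    preds = np.array([classify(x_train, y_train, s) for s in x_test])
    return np.mean(preds == y_test)


def nearest_center(x_train, y_train, sample):
    """Stand-in for euclid(): the label whose class mean is nearest (squared Euclidean distance)."""
    return min(np.unique(y_train),
               key=lambda c: np.sum((sample - x_train[y_train == c].mean(axis=0)) ** 2))


# toy data: class centers are (1, 0) and (5, 4)
x_train = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 4.0], [6.0, 4.0]])
y_train = np.array([0, 0, 1, 1])
# the third point is labeled 0 but lies nearer the class-1 center, so it is misclassified
x_test = np.array([[1.0, 1.0], [5.0, 5.0], [4.0, 3.0]])
y_test = np.array([0, 1, 0])
print(accuracy(nearest_center, x_train, y_train, x_test, y_test))  # 2 of 3 correct
```

With the digits split from the test script, the same call would be `accuracy(function.euclid, x_train, y_train, x_test, y_test)`.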