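Splitting and Extracting the Training, Validation, and Test Sets of a PASCAL VOC Dataset
- December 10, 2019
- Notes
Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/weixin_36670529/article/details/103426238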
1. Splitting into training, validation, and test sets by exact proportion
# Split the dataset
import os
import random

root_dir = './park_voc/VOC2007/'

# Target ratios over the whole dataset: 0.7 train, 0.1 val, 0.2 test
trainval_percent = 0.8   # train + val
train_percent = 0.7      # train

xmlfilepath = root_dir + 'Annotations'
txtsavepath = root_dir + 'ImageSets/Main'
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)               # e.g. 100
indices = range(num)               # named `indices` so the built-in `list` is not shadowed
tv = int(num * trainval_percent)   # 100 * 0.8 = 80
# Take train as a share of the whole dataset: 100 * 0.7 = 70.
# int(tv * train_percent) would give 80 * 0.7 = 56, i.e. a 0.56/0.24/0.2 split.
tr = int(num * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open(txtsavepath + '/trainval.txt', 'w')
ftest = open(txtsavepath + '/test.txt', 'w')
ftrain = open(txtsavepath + '/train.txt', 'w')
fval = open(txtsavepath + '/val.txt', 'w')

for i in indices:
    # Drop the '.xml' extension and write one image ID per line
    name = total_xml[i][:-4] + '\n'
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
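The split logic can also be checked without touching the file system. The following is a minimal sketch, not the original author's code: the helper name `split_ids`, its parameters, and the fixed seed are assumptions made for illustration. It shuffles the IDs with a seeded generator so the split is reproducible, then slices the shuffled list so the three subsets cannot overlap.

```python
import random


def split_ids(ids, train_ratio=0.7, val_ratio=0.1, seed=0):
    """Shuffle ids reproducibly and cut them into train/val/test lists.

    Hypothetical helper: the ratios are fractions of the whole dataset,
    and whatever is left after train and val becomes the test set.
    """
    shuffled = sorted(ids)                  # fixed starting order
    random.Random(seed).shuffle(shuffled)   # seeded, so repeatable
    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 100 made-up image IDs in the six-digit VOC style
ids = ['%06d' % i for i in range(100)]
train, val, test = split_ids(ids)
print(len(train), len(val), len(test))  # 70 10 20
```

Slicing one shuffled list, rather than sampling twice, makes it easy to see that every ID lands in exactly one subset; trainval is then simply train + val.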
2. Extracting the training, validation, and test sets (only the extraction of the train files is shown)
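# -*- coding: UTF-8 -*-
import shutil

# Windows paths are written with forward slashes so that sequences such as
# '\t' in '\trainval.txt' or '\train' are not read as a tab character.
# Adjust these paths to your own dataset location.
f_txt = open('D:/dataset/VOCdevkit/split/VOC2007/ImageSets/Main/trainval.txt', 'r')
f_train = 'D:/dataset/VOCdevkit/VOC2007/train'   # destination folder, must already exist
context = list(f_txt)
f_txt.close()

for imagename in context:
    imagename = imagename.strip()      # drop the trailing newline, e.g. '000001'
    imagename = imagename + '.jpg'
    imagepath = 'D:/dataset/VOCdevkit/VOC2007/JPEGImages/' + imagename
    shutil.copy(imagepath, f_train)
    # Delete the training and validation images so that the remaining
    # images form the test set (requires `import os`):
    # os.remove(imagepath)

# Annotations are handled the same way: just change .jpg to .xml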
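Since the images and the annotations are extracted the same way, both can be copied in one pass. The sketch below is an illustration under assumptions, not the author's script: the function name `extract_split` and the output layout (one `JPEGImages` and one `Annotations` subfolder per split) are hypothetical, and the code assumes the standard VOC folders `ImageSets/Main`, `JPEGImages`, and `Annotations` under the dataset root.

```python
import shutil
from pathlib import Path


def extract_split(voc_root, split, out_dir):
    """Copy the .jpg and .xml files listed in ImageSets/Main/<split>.txt.

    Hypothetical helper: returns the number of files copied and
    silently skips IDs whose image or annotation file is missing.
    """
    voc_root, out_dir = Path(voc_root), Path(out_dir)
    list_file = voc_root / 'ImageSets' / 'Main' / (split + '.txt')
    copied = 0
    for line in list_file.read_text().splitlines():
        image_id = line.strip()
        if not image_id:
            continue  # ignore blank lines
        for folder, ext in (('JPEGImages', '.jpg'), ('Annotations', '.xml')):
            src = voc_root / folder / (image_id + ext)
            dst_dir = out_dir / folder
            dst_dir.mkdir(parents=True, exist_ok=True)
            if src.exists():
                shutil.copy(src, dst_dir)
                copied += 1
    return copied
```

With the directory from section 1, a call might look like `extract_split('./park_voc/VOC2007', 'train', './park_voc/train')`; repeating it with `'val'` and `'test'` would fill the other two folders.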
Reference: https://www.cnblogs.com/sdu20112013/p/10801383.html