Splitting and Extracting the Training, Validation, and Test Sets of the PASCAL VOC Dataset

  • December 10, 2019
  • Notes

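Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.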

Article link: https://blog.csdn.net/weixin_36670529/article/details/103426238

1. Splitting the dataset into training, validation, and test sets by exact proportions

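    # Dataset split: write trainval/train/val/test ID lists for a VOC-style dataset
    import os
    import random

    root_dir = './park_voc/VOC2007/'

    # trainval = 80% of all images, test = the remaining 20%.
    # train = 70% of trainval (56% overall), val = the rest of trainval (24% overall).
    # For an exact 0.7 train / 0.1 val / 0.2 test split, use train_percent = 0.875
    # (0.8 * 0.875 = 0.7).
    trainval_percent = 0.8
    train_percent = 0.7
    xmlfilepath = os.path.join(root_dir, 'Annotations')
    txtsavepath = os.path.join(root_dir, 'ImageSets', 'Main')
    os.makedirs(txtsavepath, exist_ok=True)

    # Only count annotation files; each one corresponds to one image ID
    total_xml = [f for f in os.listdir(xmlfilepath) if f.endswith('.xml')]

    num = len(total_xml)              # e.g. 100
    indices = range(num)              # avoid shadowing the built-in list()
    tv = int(num * trainval_percent)  # e.g. 80
    tr = int(tv * train_percent)      # e.g. 80 * 0.7 = 56
    trainval_list = random.sample(indices, tv)
    train = set(random.sample(trainval_list, tr))
    trainval = set(trainval_list)     # sets make the membership tests below O(1)

    with open(os.path.join(txtsavepath, 'trainval.txt'), 'w') as ftrainval, \
         open(os.path.join(txtsavepath, 'test.txt'), 'w') as ftest, \
         open(os.path.join(txtsavepath, 'train.txt'), 'w') as ftrain, \
         open(os.path.join(txtsavepath, 'val.txt'), 'w') as fval:
        for i in indices:
            # Strip ".xml" to get the image ID and write one ID per line
            name = os.path.splitext(total_xml[i])[0] + '\n'
            if i in trainval:
                ftrainval.write(name)
                if i in train:
                    ftrain.write(name)
                else:
                    fval.write(name)
            else:
                ftest.write(name)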
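The two-stage sampling above (first trainval from all IDs, then train from trainval) can also be wrapped in a pure function that works on a list of image IDs rather than on the filesystem, which makes it easy to check the resulting proportions. This is a minimal sketch: the name `split_ids`, the `seed` parameter and the sorted output order are assumptions added for reproducibility, not part of the original script, which iterates in `os.listdir` order and uses the global random generator.

```python
import random
from typing import Dict, List, Sequence


def split_ids(ids: Sequence[str], trainval_percent: float = 0.8,
              train_percent: float = 0.7, seed: int = 0) -> Dict[str, List[str]]:
    """Hypothetical helper: randomly split image IDs into trainval/train/val/test.

    trainval is sampled from all IDs, train is sampled from trainval,
    val is the rest of trainval, and test is everything outside trainval.
    """
    rng = random.Random(seed)  # local RNG so the same seed gives the same split
    ordered = sorted(ids)      # fixed input order, independent of directory listing
    tv = int(len(ordered) * trainval_percent)
    tr = int(tv * train_percent)
    trainval = set(rng.sample(ordered, tv))
    train = set(rng.sample(sorted(trainval), tr))

    splits: Dict[str, List[str]] = {"trainval": [], "train": [], "val": [], "test": []}
    for image_id in ordered:  # output lists keep the sorted ID order
        if image_id in trainval:
            splits["trainval"].append(image_id)
            splits["train" if image_id in train else "val"].append(image_id)
        else:
            splits["test"].append(image_id)
    return splits


if __name__ == "__main__":
    ids = [f"{i:06d}" for i in range(100)]
    sizes = {name: len(part) for name, part in split_ids(ids).items()}
    print(sizes)  # {'trainval': 80, 'train': 56, 'val': 24, 'test': 20}
```

With the default 0.8 / 0.7 settings this reproduces the 56 / 24 / 20 split of the script above; passing `train_percent=0.875` yields 70 / 10 / 20 for 100 IDs.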

2. Extracting the training, validation, and test sets (only the extraction of the training set is shown)

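    # -*- coding:UTF-8 -*-
    import os
    import shutil

    # Raw strings keep Windows backslashes intact; in a normal string,
    # '\trainval' and '\train' would start with a tab character.
    f_txt_path = r'D:\dataset\VOCdevkit\split\VOC2007\ImageSets\Main\trainval.txt'
    f_train = r'D:\dataset\VOCdevkit\VOC2007\train'
    jpeg_dir = r'D:\dataset\VOCdevkit\VOC2007\JPEGImages'

    os.makedirs(f_train, exist_ok=True)

    # split() drops the trailing newlines, so IDs of any length work
    # (slicing [0:6] only works for 6-digit names such as 000005)
    with open(f_txt_path, 'r') as f_txt:
        context = f_txt.read().split()

    for imagename in context:
        imagename = imagename + '.jpg'
        imagepath = os.path.join(jpeg_dir, imagename)
        shutil.copy(imagepath, f_train)
        # Alternatively, delete the train/val images so the remaining ones form the test set
        # os.remove(imagepath)

    # Annotations are handled the same way: just change .jpg to .xml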
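The same extraction can be generalized to any split and to images and annotations at once. This is a minimal sketch, assuming a standard VOC layout in which `ImageSets/Main/<split>.txt`, `JPEGImages/` and `Annotations/` all sit under one root (the script above reads `trainval.txt` from a separate `split` copy); the function name `extract_split` and the output layout `<out_dir>/JPEGImages`, `<out_dir>/Annotations` are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def extract_split(voc_root: str, split: str, out_dir: str) -> List[str]:
    """Hypothetical helper: copy the .jpg and .xml files listed in ImageSets/Main/<split>.txt.

    Images are copied to <out_dir>/JPEGImages and annotations to
    <out_dir>/Annotations. Returns the list of image IDs that were copied.
    """
    root = Path(voc_root)
    out = Path(out_dir)
    # One image ID per line; split() also discards blank lines and newlines
    ids = (root / "ImageSets" / "Main" / f"{split}.txt").read_text().split()

    for sub, ext in (("JPEGImages", ".jpg"), ("Annotations", ".xml")):
        dest = out / sub
        dest.mkdir(parents=True, exist_ok=True)
        for image_id in ids:
            shutil.copy(root / sub / f"{image_id}{ext}", dest)
    return ids
```

Calling it once each for `train`, `val` and `test` builds the test set directly from `test.txt`, so the source `JPEGImages` folder is never modified, unlike the `os.remove` approach.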

Reference: https://www.cnblogs.com/sdu20112013/p/10801383.html