How to shuffle images for training and testing


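I want to split an image dataset into two parts, a training set and a testing set, with an 80/20 split between them. I need to take 80% of the images from every class for training and the remaining 20% of the images from every class for testing. How can I shuffle the images for training and testing?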

This is how I am currently splitting the data into two sets, but it does not work properly:

import glob
from collections import defaultdict
from itertools import groupby

image_filenames = glob.glob("./imagenet-dogs/n02*/*.jpg")

training_dataset = defaultdict(list)
testing_dataset = defaultdict(list)

# Pair each file with its breed, i.e. its n02* directory name (index 2 after splitting on "/")
image_filename_with_breed = map(lambda filename: (filename.split("/")[2], filename), image_filenames)

# groupby() only merges consecutive items with the same key, so sort by breed first
for dog_breed, breed_images in groupby(sorted(image_filename_with_breed), lambda x: x[0]):
    # Append training/testing images to the respective dictionaries:
    # enumerate each breed's images and send every 5th one (~20%) to the testing set
    for i, breed_image in enumerate(breed_images):
        if i % 5 == 0:
            testing_dataset[dog_breed].append(breed_image[1])
        else:
            training_dataset[dog_breed].append(breed_image[1])
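A common pitfall with this pattern: `itertools.groupby` only groups consecutive items with equal keys, and `glob.glob` does not promise any particular ordering. If the filenames are not already sorted by breed, the same breed can appear as several separate groups, and the `i % 5` counter restarts in each one (note also that `i % 5` picks every fifth file deterministically, not at random). A small demonstration with hypothetical (breed, filename) pairs:

```python
from itertools import groupby

# Hypothetical (breed, filename) pairs, deliberately not sorted by breed
pairs = [("a", "a1.jpg"), ("b", "b1.jpg"), ("a", "a2.jpg"), ("a", "a3.jpg")]

# Without sorting, breed "a" is split into two separate groups
unsorted_groups = [(k, [f for _, f in g]) for k, g in groupby(pairs, key=lambda x: x[0])]

# Sorting first makes each breed appear exactly once
sorted_groups = [(k, [f for _, f in g]) for k, g in groupby(sorted(pairs), key=lambda x: x[0])]

print(unsorted_groups)
print(sorted_groups)
```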

What am I doing wrong, and what is the best way to make sure the images are taken from all classes?

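Would it work to shuffle the images, take 80% and append them to the training set, then take the remaining 20% and append them to the testing set? If so, what is the best way to do that?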
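Shuffling everything and slicing off the first 80% does give an 80/20 split overall, but each individual class only lands near 80/20 on average; a small class can end up under- or over-represented in the test set. A minimal sketch of that plain (non-stratified) approach using only the standard library; `shuffle_split` is a hypothetical helper and the filenames are made up:

```python
import random

def shuffle_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    items = list(items)                      # don't mutate the caller's list
    random.Random(seed).shuffle(items)       # seeded so the split is reproducible
    cut = int(len(items) * train_fraction)   # number of training items
    return items[:cut], items[cut:]

# Hypothetical filenames standing in for the glob() result
filenames = [f"img_{i}.jpg" for i in range(10)]
train, test = shuffle_split(filenames)
print(len(train), len(test))
```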


2017-07-07 Micah Mca

Don't reinvent the wheel. scikit-learn is the de facto library for ML utilities in Python. For example, it has `train_test_split` built in.
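For instance, `sklearn.model_selection.train_test_split` accepts a `stratify` argument that keeps each class's proportion the same in both parts. A sketch, assuming scikit-learn is installed and that each image's class is its parent directory name (as in the question's `n02*` layout); the paths below are hypothetical:

```python
import os
from sklearn.model_selection import train_test_split

# Hypothetical paths laid out like ./imagenet-dogs/<breed>/<file>.jpg
filenames = [f"./imagenet-dogs/n02_breed{b}/img{i}.jpg" for b in range(3) for i in range(10)]

# Label each image by its parent directory (the breed)
labels = [os.path.basename(os.path.dirname(f)) for f in filenames]

# 80/20 split; stratify=labels keeps the per-breed ratio in both parts
train_files, test_files, train_labels, test_labels = train_test_split(
    filenames, labels, test_size=0.2, stratify=labels, random_state=42
)
print(len(train_files), len(test_files))
```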

This is exactly what a stratified train/test split does, and sklearn implements it, for example: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html
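The linked `StratifiedShuffleSplit` yields index arrays rather than the split data itself. A sketch with `n_splits=1` for a single 80/20 split, assuming scikit-learn and NumPy are installed; the filenames and labels are hypothetical. Since only `y` drives the stratification, a zeros array serves as the placeholder `X`:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical dataset: 20 images of class "a" and 10 of class "b"
filenames = np.array([f"img{i}.jpg" for i in range(30)])
labels = np.array(["a"] * 20 + ["b"] * 10)

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
# Only y (labels) is used for stratification; X is just a placeholder of the right length
train_idx, test_idx = next(sss.split(np.zeros(len(labels)), labels))

train_files, test_files = filenames[train_idx], filenames[test_idx]
print(len(train_files), len(test_files))
```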

The general idea is simple:

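For each class C:
- take all samples belonging to class C
- shuffle them (just in case)
- take the first K% for training and the rest for testing
Then shuffle the training and test sets again (separately).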
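The steps above can be sketched in plain Python without scikit-learn. This is a minimal illustration, not lejlot's code; `stratified_split` is a hypothetical helper, and the labels are assumed to be available as a list parallel to the items (e.g. the breed directory of each file):

```python
import random
from collections import defaultdict

def stratified_split(items, labels, train_fraction=0.8, seed=0):
    """Split items into (train, test) lists, keeping each label's proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)          # take all samples of each class C

    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)                  # shuffle within the class
        cut = int(len(members) * train_fraction)
        train.extend(members[:cut])           # first K% -> training
        test.extend(members[cut:])            # remainder -> testing

    rng.shuffle(train)                        # shuffle train and test again, separately
    rng.shuffle(test)
    return train, test

# Hypothetical data: 10 images of breed "a", 5 of breed "b"
items = [f"a{i}.jpg" for i in range(10)] + [f"b{i}.jpg" for i in range(5)]
labels = ["a"] * 10 + ["b"] * 5
train, test = stratified_split(items, labels)
print(len(train), len(test))
```

Rounding down with `int()` means a class whose size is not a multiple of 5 puts slightly more than 20% into the test set; that choice is arbitrary and could just as well round the other way.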

This way you are guaranteed to pick the right proportion of each class (hence the word "stratified").


2017-07-07 17:55:53 lejlot
