Saving Keras augmented data as a numpy array

Using the Keras ImageDataGenerator, we can save the augmented images as PNG or JPG:

for X_batch, y_batch in datagen.flow(train_data, train_labels, batch_size=batch_size,
        save_to_dir='images', save_prefix='aug', save_format='png'):
    break  # flow() loops forever; break once enough images have been written

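I have a dataset of shape (1600, 4, 100, 100), that is, 1600 images with 4 channels of 100x100 pixels each. How can I save the augmented data as a single numpy array of shape (N, 4, 100, 100) instead of as individual images?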

2017-08-10 FJ_Abbasi

Do you want to save each batch in its own file? Something like np.save('batch.npy', X_batch)? – niklascp

I want to save all the augmented data in a single file.

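You can't, at least not all of it. Read the documentation: "flow(x, y): Takes numpy data & label arrays, and generates batches of augmented/normalized data. Yields batches indefinitely, in an infinite loop." The generator never runs out, so there is no finite "all" to save. You could, however, take just the first M batches and concatenate them.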
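The suggestion in this comment, keeping only the first M batches of an endless generator, can be sketched with itertools.islice. This is only an illustration: fake_flow is a hypothetical stand-in for datagen.flow(...) so that the snippet runs without Keras, the random noise merely plays the role of an augmentation, and the array is a toy-sized substitute for the (1600, 4, 100, 100) dataset.

```python
import itertools
import numpy as np

def fake_flow(x, batch_size):
    """Hypothetical stand-in for datagen.flow(x, ...): yields batches forever."""
    rng = np.random.default_rng(0)
    n = x.shape[0]
    while True:  # never terminates on its own, like flow()
        for start in range(0, n, batch_size):
            batch = x[start:start + batch_size]
            # Small random noise plays the role of a random augmentation.
            yield batch + rng.normal(scale=0.01, size=batch.shape)

# Toy stand-in for the (1600, 4, 100, 100) dataset from the question.
train_data = np.zeros((16, 4, 10, 10), dtype=np.float32)

M = 6  # number of batches to keep (assumed value)
batches = list(itertools.islice(fake_flow(train_data, batch_size=4), M))
first_m = np.concatenate(batches)  # stack along the sample axis
print(first_m.shape)  # (24, 4, 10, 10)
```

islice stops pulling from the generator after M items, so no explicit counter or break is needed.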

Since you know the number of samples (1600), you can stop datagen.flow() as soon as that many augmented samples have been produced.

import numpy as np

augmented_data = []
num_augmented = 0
for X_batch, y_batch in datagen.flow(train_data, train_labels, batch_size=batch_size, shuffle=False):
    augmented_data.append(X_batch)
    # Count the samples actually returned; the last batch of a pass may be smaller.
    num_augmented += X_batch.shape[0]
    if num_augmented >= train_data.shape[0]:
        break  # flow() never stops on its own
augmented_data = np.concatenate(augmented_data)
np.save(...)

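Note that you should choose batch_size so that it divides the number of samples evenly (e.g. batch_size=10), so that no extra augmented images are generated.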
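As a self-contained sketch of the same idea, the snippet below collects a fixed number of augmented samples together with their labels, saves the images to a single .npy file, and reloads them to check the shape. The names fake_flow, collect_augmented and augmented.npy are hypothetical. fake_flow only imitates the endless batching of datagen.flow(..., shuffle=False) so that the example runs without Keras, and the flip stands in for a real augmentation.

```python
import os
import tempfile
import numpy as np

def fake_flow(x, y, batch_size):
    """Hypothetical stand-in for datagen.flow(x, y, batch_size=..., shuffle=False)."""
    n = x.shape[0]
    while True:  # like flow(), never stops on its own
        for start in range(0, n, batch_size):
            stop = start + batch_size
            # Flipping the last axis plays the role of an augmentation.
            yield x[start:stop, ..., ::-1], y[start:stop]

def collect_augmented(flow, n_samples):
    """Pull batches from `flow` until n_samples samples have been collected."""
    xs, ys, seen = [], [], 0
    for x_batch, y_batch in flow:
        xs.append(x_batch)
        ys.append(y_batch)
        seen += x_batch.shape[0]  # count real samples; a batch may be short
        if seen >= n_samples:
            break
    # Trim in case the final batch overshot n_samples.
    return np.concatenate(xs)[:n_samples], np.concatenate(ys)[:n_samples]

# Toy data: 10 images, 4 channels, 5x5 pixels.
train_data = np.arange(10 * 4 * 5 * 5, dtype=np.float32).reshape(10, 4, 5, 5)
train_labels = np.arange(10)

# Two passes over the data, with a batch size that does not divide 10 evenly.
flow = fake_flow(train_data, train_labels, batch_size=4)
aug_x, aug_y = collect_augmented(flow, n_samples=20)

path = os.path.join(tempfile.mkdtemp(), "augmented.npy")  # hypothetical file name
np.save(path, aug_x)
reloaded = np.load(path)
print(reloaded.shape)  # (20, 4, 5, 5)
```

With Keras, fake_flow(...) would be replaced by datagen.flow(train_data, train_labels, batch_size=batch_size, shuffle=False). The labels can be saved the same way with a second np.save call, or both arrays can go into one file with np.savez.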

2017-08-10 15:40:23

So I can make 'num_augmented' as large as I like, I suppose? Is this a proper way to create a large training dataset from a small amount of data?

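Hmm... in my own experience, there is not much extra information that image augmentation can "squeeze" out of the data. By applying small random transformations to the data, augmentation makes your model more robust. If you distort the images too much, your model may learn undesirable patterns from them. The effect of a large 'num_augmented' is closer to running over the same dataset for several epochs than to having a dataset several times larger.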
