Keras: Fitting image augmentation to training data when using flow_from_directory

I want to use image augmentation in Keras. My current code looks like this:

from keras.preprocessing.image import ImageDataGenerator

# define image augmentations
train_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    zca_whitening=True)

# generate image batches from directory
train_datagen.flow_from_directory(train_dir)

When I run this model, I get the following error:

"ImageDataGenerator specifies `featurewise_std_normalization`, but it hasn't been fit on any training data."

However, I couldn't find any clear information on how to use train_datagen.fit() together with flow_from_directory.

Thanks for your help. Mario


2017-10-12 Mario Kreutzfeldt

You're right, the docs are not very enlightening on this...

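What you actually need is a 4-step process:

1. Define your data augmentation
2. Fit the augmentation
3. Set up your generator using flow_from_directory()
4. Train your model with fit_generator()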

Here is the necessary code for a hypothetical image classification case:

from keras.preprocessing.image import ImageDataGenerator

# define data augmentation configuration
train_datagen = ImageDataGenerator(featurewise_center=True,
                                   featurewise_std_normalization=True,
                                   zca_whitening=True)

# fit the data augmentation (x_train must be a NumPy array of images already in memory)
train_datagen.fit(x_train)

# setup generator
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')

# train model; steps are counted in batches, not in samples
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,  # optional - if used, needs to be defined
    validation_steps=nb_validation_samples // batch_size)

Obviously, there are several parameters to define (train_data_dir, nb_train_samples, etc.), but hopefully you get the idea.

If you also need a validation_generator (as in my example), it should be defined the same way as train_generator.
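Note that steps_per_epoch and validation_steps count batches (generator calls), not samples, so passing the raw sample count makes each epoch roughly batch_size times longer than intended. A minimal sketch of the arithmetic, with hypothetical dataset sizes chosen only for illustration:

```python
import math

def steps_for(n_samples: int, batch_size: int) -> int:
    """Number of generator batches needed to see every sample once.

    Ceiling division keeps the final partial batch; plain floor division
    (n_samples // batch_size) also works and simply skips that partial batch.
    """
    return math.ceil(n_samples / batch_size)

# hypothetical sizes, for illustration only
nb_train_samples, nb_validation_samples, batch_size = 2000, 800, 32
steps_per_epoch = steps_for(nb_train_samples, batch_size)        # 62 full batches + 1 partial
validation_steps = steps_for(nb_validation_samples, batch_size)  # 25 full batches
```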

UPDATE (after comments)

Step 2 needs some discussion: here, x_train is the actual data, which ideally should fit into main memory. Also, according to the documentation, this step is

Only required if featurewise_center or featurewise_std_normalization or zca_whitening.
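To make clear what "fitting" means for the first two options: roughly, fit() computes one mean and one standard deviation per color channel across all samples and pixels of the array it receives, and the generator later subtracts and divides by them. The NumPy sketch below approximates that for channels-last data; it is an illustration under that assumption, not Keras's actual source (the epsilon value in particular is assumed). ZCA whitening additionally needs a covariance matrix over all flattened pixels and is not shown.

```python
import numpy as np

def featurewise_stats(x: np.ndarray):
    """Per-channel mean and std over (samples, rows, cols) for channels-last x."""
    x = x.astype(np.float64)
    mean = x.mean(axis=(0, 1, 2))  # one value per channel
    std = x.std(axis=(0, 1, 2))
    return mean, std

def standardize(batch: np.ndarray, mean: np.ndarray, std: np.ndarray, eps: float = 1e-6):
    """Featurewise centering and scaling; eps (an assumed value) avoids division by zero."""
    return (batch - mean) / (std + eps)

# toy data: 10 images of 4x4 pixels with 3 channels
rng = np.random.default_rng(0)
x_sample = rng.uniform(0, 255, size=(10, 4, 4, 3))
mean, std = featurewise_stats(x_sample)
z = standardize(x_sample, mean, std)
print(z.mean(axis=(0, 1, 2)), z.std(axis=(0, 1, 2)))
```

After standardization each channel has mean close to 0 and standard deviation close to 1, which is the point of these two options.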

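However, there are many real-world cases where the requirement of fitting all the training data into memory is clearly unrealistic. In such cases, how to center/standardize/whiten the data is itself a (huge) subfield, and arguably the main reason why big-data processing frameworks such as Spark exist.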
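For plain centering and standardization (not ZCA), the per-channel statistics can in fact be computed exactly without holding the whole dataset in memory, by accumulating sums batch by batch. ZCA is a different story: it needs a covariance matrix over all h*w*c pixel values, which is why sampling is the practical route there. In the sketch below, `batches` is a hypothetical stand-in for whatever loads your images chunk by chunk; it is an assumption, not a Keras API.

```python
import numpy as np

def running_channel_stats(batches):
    """Exact per-channel mean/std over an iterable of channels-last batches.

    Only one batch is held in memory at a time; sums are kept in float64.
    """
    count = 0
    total = None
    total_sq = None
    for batch in batches:
        b = np.asarray(batch, dtype=np.float64)
        flat = b.reshape(-1, b.shape[-1])  # (pixels, channels)
        if total is None:
            total = np.zeros(flat.shape[1])
            total_sq = np.zeros(flat.shape[1])
        total += flat.sum(axis=0)
        total_sq += (flat ** 2).sum(axis=0)
        count += flat.shape[0]
    if count == 0:
        raise ValueError("no batches were provided")
    mean = total / count
    var = total_sq / count - mean ** 2
    std = np.sqrt(np.maximum(var, 0.0))  # clip tiny negative rounding errors
    return mean, std

# toy check: streaming over uneven chunks of an in-memory array
rng = np.random.default_rng(1)
data = rng.uniform(0, 255, size=(20, 8, 8, 3))
chunks = (data[i:i + 6] for i in range(0, len(data), 6))  # last chunk has 2 images
mean, std = running_channel_stats(chunks)
```

One possible (assumed, not documented as a recipe) way to use the result is to turn the featurewise flags off and pass something like `preprocessing_function=lambda img: (img - mean) / (std + 1e-6)` to ImageDataGenerator, since that argument is applied to each image individually.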

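So, what is to be done here in practice? Well, the next logical step in such a case is to sample your data; in fact, this is exactly what the community advises. Here is Keras creator François Chollet on Working with large datasets like Imagenet: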

datagen.fit(X_sample) # let's say X_sample is a small-ish but statistically representative sample of your data

and another quote from an ongoing open discussion about extending ImageDataGenerator:

fit is required for feature-wise standardization and ZCA , and it only takes an array as parameter, there is no fit for directory. For now, we need to manually read a subset of the image to do this fit for a directory. One idea is we can change fit() to accept the generator itself(flow_from_directory), of course, standardization should be disabled during fit.
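Following that advice, here is a minimal sketch of "manually reading a subset" from a directory laid out the way flow_from_directory() expects (one subfolder per class). The helper names and the extension list are hypothetical; decoding is delegated to a `load_fn` callable so the sketch does not depend on a particular image library. With Keras this could be `load_img` followed by `img_to_array`, using the same target_size as the generator.

```python
import os
import random
import numpy as np

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}  # assumed subset of accepted formats

def sample_image_paths(directory, n, seed=0):
    """Randomly pick up to n image paths from the class subfolders of `directory`."""
    paths = []
    for cls in sorted(os.listdir(directory)):
        cls_dir = os.path.join(directory, cls)
        if not os.path.isdir(cls_dir):
            continue
        for fname in sorted(os.listdir(cls_dir)):
            if os.path.splitext(fname)[1].lower() in IMAGE_EXTS:
                paths.append(os.path.join(cls_dir, fname))
    rng = random.Random(seed)
    return rng.sample(paths, min(n, len(paths)))

def load_sample(paths, load_fn):
    """Stack the arrays returned by load_fn(path) into one (n, h, w, c) array."""
    return np.stack([np.asarray(load_fn(p), dtype=np.float32) for p in paths])

# hypothetical usage with Keras (not executed here):
# from keras.preprocessing.image import load_img, img_to_array
# paths = sample_image_paths(train_data_dir, 1000)
# X_sample = load_sample(paths, lambda p: img_to_array(load_img(p, target_size=(img_height, img_width))))
# train_datagen.fit(X_sample)
```

The sample size is a trade-off: it must fit in memory, yet be large and random enough to be statistically representative of the full training set.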

Hope this helps...


2017-10-12 12:26:30 desertnaut

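Thanks, desertnaut. That's how I set it up. However, what I don't understand is exactly point 2: how do I define x_train in train_datagen.fit(x_train)? It won't take my training_folder as input, and due to memory constraints I can't create a numpy array from all the images. –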

@MarioKreutzfeldt You're right; see the update – desertnaut

This was really helpful! Things like this make using Keras hard for a beginner. Even Chollet's book isn't very clear on this issue! –
