2017-06-14 69 views

CNN learning stalls

I've built a mock CNN that I'm trying to use on a video dataset. I set up the test data so that positive examples have every frame set to a single image, and negative examples are all zeros. I figured this would learn very quickly, but it doesn't move at all. Running the current versions of Keras & Tensorflow on Windows 10 64-bit.

First question: is my logic wrong? Should I expect learning on this test data to reach high accuracy quickly?

Is there something wrong with my model or parameters? I've been trying a number of changes but keep running into the same problem.

Is the sample size (56) too small?

# testing feature extraction model. 
import time 
import numpy as np, cv2 
import sys 
import os 
import keras 
import tensorflow as tf 

from keras.models import Sequential 
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization 
from keras.layers import Conv3D, MaxPooling3D 

from keras.optimizers import SGD, RMSprop, Adam

from keras import regularizers 
from keras.initializers import Constant 

from keras.models import Model 

#set gpu options 
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=.99, allocator_type = 'BFC') 
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options)) 
config = tf.ConfigProto() 

batch_size = 5 
num_classes = 1 
epochs = 50 
nvideos = 56 
nframes = 55 
nchan = 3 
nrows = 480 
ncols = 640 

#load any single image, resize if needed 
img = cv2.imread('C:\\Users\\david\\Documents\\AutonomousSS\\single frame.jpg',cv2.IMREAD_COLOR) 
img = cv2.resize(img,(640,480)) 

x_learn = np.random.randint(0,255,(nvideos,nframes,nrows,ncols,nchan),dtype=np.uint8) 
y_learn = np.array([[1],[1],[1],[0],[1],[0],[1],[0],[1],[0], 
        [1],[0],[0],[1],[0],[0],[1],[0],[1],[0], 
        [1],[0],[1],[1],[0],[1],[0],[0],[1],[1], 
        [1],[0],[1],[0],[1],[0],[1],[0],[1],[0], 
        [0],[1],[0],[0],[1],[0],[1],[0],[1],[0], 
        [1],[1],[0],[1],[0],[0]],np.uint8) 

#each sample: each frame is either the single image for positive examples, or 0 for negative examples.

for i in range(nvideos):
    if y_learn[i] == 0:
        x_learn[i] = 0
    else:
        x_learn[i, :nframes] = img



#build model  
m_loss = 'mean_squared_error' 
m_opt = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True) 
m_met = 'acc' 


model = Sequential() 

# 1st layer group 
model.add(Conv3D(32, (3, 3,3), activation='relu',padding="same", name="conv1a", strides=(3, 3, 3), 
       kernel_initializer = 'glorot_normal', 
       trainable=False, 
       input_shape=(nframes,nrows,ncols,nchan))) 
#model.add(BatchNormalization(axis=1)) 
model.add(Conv3D(32, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv1b", activation="relu")) 
#model.add(BatchNormalization(axis=1)) 
model.add(MaxPooling3D(padding="valid", trainable=False, pool_size=(1, 5, 5), name="pool1", strides=(2, 2, 2))) 


# 2nd layer group 
model.add(Conv3D(128, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv2a", activation="relu")) 
model.add(Conv3D(128, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv2b", activation="relu")) 
#model.add(BatchNormalization(axis=1)) 
model.add(MaxPooling3D(padding="valid", trainable=False, pool_size=(1, 5, 5), name="pool2", strides=(2, 2, 2))) 

# 3rd layer group 
model.add(Conv3D(256, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv3a", activation="relu")) 
model.add(Conv3D(256, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv3b", activation="relu")) 
#model.add(BatchNormalization(axis=1)) 
model.add(MaxPooling3D(padding="valid", trainable=False, pool_size=(1, 5, 5), name="pool3", strides=(2, 2, 2))) 

# 4th layer group 
model.add(Conv3D(512, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv4a", activation="relu")) 
model.add(Conv3D(512, (3, 3, 3), trainable=False, strides=(1, 1, 1), padding="same", name="conv4b", activation="relu")) 
#model.add(BatchNormalization(axis=1)) 
model.add(MaxPooling3D(padding="valid", trainable=False, pool_size=(1, 5, 5), name="pool4", strides=(2, 2, 2))) 

model.add(Flatten(name='flatten',trainable=False)) 

model.add(Dense(512,activation='relu', trainable=True,name='den0')) 

model.add(Dense(num_classes,activation='softmax',name='den1')) 
print (model.summary()) 

#compile model 
model.compile(loss=m_loss, 
       optimizer=m_opt, 
       metrics=[m_met]) 
print ('compiled') 


#set callbacks 
from keras import backend as K 
K.set_learning_phase(0) #set learning phase 
tb = keras.callbacks.TensorBoard(log_dir=sample_root_path+'logs', histogram_freq=0, 
          write_graph=True, write_images=False) 
tb.set_model(model) 
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.2,verbose=1, 
       patience=2, min_lr=0.000001) 
reduce_lr.set_model(model) 
ear_stop = keras.callbacks.EarlyStopping(monitor='loss', min_delta=0, patience=4, verbose=1, mode='auto') 
ear_stop.set_model(model) 


#fit 

history = model.fit(x_learn, y_learn, 
        batch_size=batch_size, 
        callbacks=[reduce_lr,tb, ear_stop], 
        verbose=1, 
        validation_split=0.1, 
        shuffle = True, 
        epochs=epochs) 


score = model.evaluate(x_learn, y_learn, batch_size=batch_size) 
print(str(model.metrics_names) + ": " + str(score)) 

As always, thanks for any and all help.

Added the output...

_________________________________________________________________ 
Layer (type)     Output Shape    Param # 
================================================================= 
conv1a (Conv3D)    (None, 19, 160, 214, 32) 2624  
_________________________________________________________________ 
conv1b (Conv3D)    (None, 19, 160, 214, 32) 27680  
_________________________________________________________________ 
pool1 (MaxPooling3D)   (None, 10, 78, 105, 32) 0   
_________________________________________________________________ 
conv2a (Conv3D)    (None, 10, 78, 105, 128) 110720  
_________________________________________________________________ 
conv2b (Conv3D)    (None, 10, 78, 105, 128) 442496  
_________________________________________________________________ 
pool2 (MaxPooling3D)   (None, 5, 37, 51, 128) 0   
_________________________________________________________________ 
conv3a (Conv3D)    (None, 5, 37, 51, 256) 884992  
_________________________________________________________________ 
conv3b (Conv3D)    (None, 5, 37, 51, 256) 1769728 
_________________________________________________________________ 
pool3 (MaxPooling3D)   (None, 3, 17, 24, 256) 0   
_________________________________________________________________ 
conv4a (Conv3D)    (None, 3, 17, 24, 512) 3539456 
_________________________________________________________________ 
conv4b (Conv3D)    (None, 3, 17, 24, 512) 7078400 
_________________________________________________________________ 
pool4 (MaxPooling3D)   (None, 2, 7, 10, 512)  0   
_________________________________________________________________ 
flatten (Flatten)   (None, 71680)    0   
_________________________________________________________________ 
den0 (Dense)     (None, 512)    36700672 
_________________________________________________________________ 
den1 (Dense)     (None, 1)     513  
================================================================= 
Total params: 50,557,281 
Trainable params: 36,701,185 
Non-trainable params: 13,856,096 
_________________________________________________________________ 
None 
compiled 
Train on 50 samples, validate on 6 samples 
Epoch 1/50 
50/50 [==============================] - 20s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000 
Epoch 2/50 
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000 
Epoch 3/50 
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000 
Epoch 4/50 
45/50 [==========================>...] - ETA: 1s - loss: 0.5111 - acc: 0.4889 
Epoch 00003: reducing learning rate to 0.00020000000949949026. 
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000 
Epoch 5/50 
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000 
Epoch 6/50 
45/50 [==========================>...] - ETA: 1s - loss: 0.5111 - acc: 0.4889 
Epoch 00005: reducing learning rate to 4.0000001899898055e-05. 
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000 
Epoch 7/50 
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000 
Epoch 8/50 
45/50 [==========================>...] - ETA: 1s - loss: 0.4889 - acc: 0.5111 
Epoch 00007: reducing learning rate to 8.000000525498762e-06. 
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000 
Epoch 9/50 
50/50 [==============================] - 16s - loss: 0.5000 - acc: 0.5000 - val_loss: 0.5000 - val_acc: 0.5000 
Epoch 00008: early stopping 
56/56 [==============================] - 12s  
['loss', 'acc']: [0.50000001516725334, 0.5000000127724239] 

Could you give more detail about your goal and the data you eventually want to train on, and in particular why you are trying to train on a single image? Also, since you set all layers to non-trainable (except the last dense layer): did you load any pretrained weights? I don't see you importing a Keras application like VGG or Inception, or loading any weights otherwise. – petezurich


The final goal is to train on a certain action, which is a series of movements between frames. The code above is just a test case. I get the same behavior whether the positive examples are 1) all frames identical, 2) random frames, or 3) real video sequences. I don't think building on a Keras application would help. – DSP209


OK, thanks for clarifying some points. As I understand it, you want to fine-tune a pretrained model. If that's the case: how do you load the weights? The Keras applications are exactly a good way to do this. See for example this Keras tutorial: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html Otherwise your network won't learn anything, because almost all of its layers are set to non-trainable. From your code: `... trainable = False ...` – petezurich

Answer


Your layers are set to trainable=False (except the last dense layer). Therefore your CNN cannot learn. Besides that, you can't train on just a single sample.
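The effect of freezing can be sketched without Keras at all: a frozen parameter is simply excluded from the optimizer's update step, so no number of epochs changes it. This is a toy gradient-descent sketch of that idea, not the actual Keras internals:

```python
# Toy sketch of why frozen layers cannot learn: gradient descent on a
# one-parameter "model" y = w * x with squared-error loss, where the
# weight can be frozen just like a trainable=False Keras layer.

def train(w, trainable, lr=0.1, epochs=50):
    """Fit y = w * x to the single target pair (x=1.0, y=1.0)."""
    x, y = 1.0, 1.0
    for _ in range(epochs):
        grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
        if trainable:
            w -= lr * grad           # the update is skipped when frozen
    return w

frozen = train(w=0.0, trainable=False)
learned = train(w=0.0, trainable=True)

print(frozen)   # stays at 0.0, no matter how many epochs run
print(learned)  # converges toward 1.0
```

In the posted model every conv and pooling layer is frozen this way, so only the two dense layers on top receive updates.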

If you run into performance problems with your GPU, switch to a CPU or to AWS, or shrink the image size.
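On the image-size point: with the dimensions from the posted code, the uint8 input array alone is sizable before the model's activations are even counted. The arithmetic is easy to check:

```python
# Memory footprint of x_learn, using the dimensions from the question:
# 56 videos x 55 frames x 480 x 640 pixels x 3 channels, one byte each (uint8).
nvideos, nframes, nrows, ncols, nchan = 56, 55, 480, 640, 3

nbytes = nvideos * nframes * nrows * ncols * nchan
gib = nbytes / 2**30

print(nbytes)         # 2838528000 bytes
print(round(gib, 2))  # ~2.64 GiB for the raw input alone
```

Halving the spatial resolution (240 x 320) would cut this by a factor of four.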


I may not have been clear: I want to train a 3D network. The Keras applications are for 2D images, so I don't see how their pretrained weights would help. – DSP209


OK. Then again: why do you set 95% of your ConvNet to trainable=False? – petezurich


I was following the approach described in a recent research paper. I have tried reducing the data size and making more of the layers trainable. It made no difference. – DSP209