2017-04-22 136 views

I retrained the final layer of Inception using this tutorial from tensorflow.com. I am a beginner with TensorFlow, and my goal is to classify 30,000 photos for a project at work. TensorFlow Inception batch classification gets slower with every iteration.

After retraining the final layer on my own labels, I grabbed about 20 unseen photos and added them (with their full file paths) to a pandas DataFrame. Next, I fed each image in the DataFrame to the image classifier and, after classification, added the corresponding top prediction label and reliability score to two other columns in the same row.

To feed the pictures to the classifier, I tried df.iterrows(), df.apply(function), and three separate hard-coded file paths (see the code below; I left them in as comments). However, I noticed that classifying the photos takes longer and longer, no matter how I feed them in. Picture[0] starts with a classification time of 2.2 seconds, but by picture[19] this has increased to 23 seconds. Imagine how long it would take at picture 10,000, 20,000, and so on. In addition, CPU and memory usage also creep up slowly while the files are being classified, although they do not increase dramatically.
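This pattern (every call slower than the last, memory creeping up) is consistent with work accumulating across iterations rather than a fixed per-image cost. As a rough, TensorFlow-free illustration only, here is a toy sketch of what happens when a "graph" is re-imported on every call instead of being built once — the class and node counts are made up for the demonstration:

```python
class ToyGraph:
    """Toy stand-in for a TF default graph: just a growing list of nodes."""
    def __init__(self):
        self.nodes = []

    def import_graph_def(self, n_nodes=1000):
        # Every call ADDS another full copy of the model's nodes.
        self.nodes.extend(range(n_nodes))

    def run(self):
        # The cost of one step grows with the number of nodes in the graph.
        return sum(1 for _ in self.nodes)

graph = ToyGraph()
sizes = []
for picture in range(3):
    graph.import_graph_def()   # analogous to calling create_graph() per image
    graph.run()
    sizes.append(len(graph.nodes))

print(sizes)  # the graph keeps growing: [1000, 2000, 3000]
```

Each "classification" pays for every copy of the graph imported so far, which is why the per-image time keeps climbing.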

See the code below (most of it, apart from the pandas and classification-loop parts, is taken from this example mentioned in the TensorFlow tutorial above).

import os
import sys
import tensorflow as tf
import pandas as pd
import gc
import numpy as np
import time
import psutil


modelFullPath = '/Users/jaap/tf_files/retrained_graph.pb' 
labelsFullPath = '/Users/jaap/tf_files/retrained_labels.txt'  

def create_graph():
    """Creates a graph from saved GraphDef file and returns a saver."""
    # Creates graph from saved graph_def.pb.
    with tf.gfile.FastGFile(modelFullPath, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def, name='')


def run_inference_on_image(image):
    answer = None
    imagePath = image
    print(imagePath)
    if not tf.gfile.Exists(imagePath):
        tf.logging.fatal('File does not exist %s', imagePath)
        return answer

    image_data = tf.gfile.FastGFile(imagePath, 'rb').read()

    # Creates graph from saved GraphDef.
    create_graph()

    with tf.Session() as sess:

        softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
        predictions = sess.run(softmax_tensor,
                               {'DecodeJpeg/contents:0': image_data})
        predictions = np.squeeze(predictions)

        top_k = predictions.argsort()[-5:][::-1]  # Getting top 5 predictions
        with open(labelsFullPath, 'r') as f:
            lines = f.readlines()
        labels = [str(w).replace("\n", "") for w in lines]
        for node_id in top_k:
            human_string = labels[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))
            return human_string, score


werkmap = '/Users/jaap/tf_files/test/'
filelist = []
files_in_dir = os.listdir('/Users/jaap/tf_files/test/')
for f in files_in_dir:
    if f != '.DS_Store':
        filelist.append(werkmap + f)

df = pd.DataFrame(filelist, index=None, columns=['Pics']) 
df = df.drop_duplicates() 
df['Class'] = '' 
df['Reliability'] = ''  

print(df)  


#-------------------------------------------------------- 
for index, pic in df.iterrows(): 
    start = time.time() 
    df['Class'][index] = run_inference_on_image(pic[0]) 
    stop = time.time() 
    duration = stop - start 
    print("duration = %s" % duration) 
    print("cpu usage: %s" % psutil.cpu_percent()) 
    print("memory usage: %s " % psutil.virtual_memory()) 
    print("") 

df['Class'] = df['Class'].astype(str) 
df['Class'], df['Reliability'] = df['Class'].str.split(',', 1).str  

#-------------------------------------------------   

# df['Class'] = df['Pics'].apply(run_inference_on_image) 
# df['Class'] = df['Class'].astype(str) 
# df['Class'], df['Reliability'] = df['Class'].str.split(',', 1).str 
# print(df)  

#-------------------------------------------------------------- 
# start = time.time() 
# ja = run_inference_on_image('/Users/jaap/tf_files/test/12345_1.jpg') 
# stop = time.time() 
# duration = stop - start 
# print("duration = %s" % duration) 

# start = time.time() 
# ja = run_inference_on_image('/Users/jaap/tf_files/test/12345_2.jpg') 
# stop = time.time() 
# duration = stop - start 
# print("duration = %s" % duration)  

# start = time.time() 
# ja = run_inference_on_image('/Users/jaap/tf_files/test/12345_3.jpg') 
# stop = time.time() 
# duration = stop - start 
# print("duration = %s" % duration)  
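Incidentally, since run_inference_on_image returns a (label, score) tuple, the tuple can be unpacked straight into the two columns, avoiding the stringify-then-split-on-',' round-trip used above. A minimal sketch with made-up file names and scores:

```python
import pandas as pd

df = pd.DataFrame({'Pics': ['a.jpg', 'b.jpg']})

# Pretend these tuples came back from run_inference_on_image
preds = [('daisy', 0.91), ('rose', 0.84)]

# Unzip the (label, score) pairs straight into the two columns
df['Class'], df['Reliability'] = zip(*preds)
print(df)
```

This also keeps Reliability numeric, so it can be sorted and filtered without converting back from a string.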

I appreciate any help!


Solved this by calling the Python script from a shell script for each picture. The classification time now stays stable at ~2.5 s. Python seems to keep adding classification data to memory, making the script heavier on every iteration. – Poehe

Answer


It looks like you are creating the entire graph for every inference, which is what makes it slower and slower. Instead, you can do something like the following:

with tf.Graph().as_default(): 
    create_graph() 
    with tf.Session() as sess: 
        for index, pic in df.iterrows(): 
            start = time.time() 
            df['Class'][index] = run_inference_on_image(pic[0], sess) 
            stop = time.time()
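For this loop to work, run_inference_on_image has to accept the session instead of building its own graph and session per call (and must no longer call create_graph itself). A hedged sketch of such a refactor — the tensor names follow the question's code, but the exact signature, including passing the labels list in, is an assumption:

```python
import numpy as np

def run_inference_on_image(image_path, sess, labels):
    """Classify one image with a session whose graph was built ONCE by the caller."""
    with open(image_path, 'rb') as f:
        image_data = f.read()
    # Look up the output tensor in the already-imported graph
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = np.squeeze(sess.run(softmax_tensor,
                                      {'DecodeJpeg/contents:0': image_data}))
    top = int(predictions.argmax())
    return labels[top], float(predictions[top])
```

Reading the labels file can likewise move out of the loop, so it is parsed once rather than once per image; this keeps the per-image cost constant instead of growing with each iteration.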