TensorFlow Inception batch classification slows down with each iteration

I retrained the last layer of Inception using this tutorial on tensorflow.com. I am a TensorFlow beginner, and my goal is to classify 30,000 photos for a project at work.
After retraining the last layer on my own labels, I grabbed about 20 unseen photos and added them (their full file paths) to a pandas DataFrame. Next, I fed each image in the DataFrame to the image classifier and, after classification, wrote the corresponding top predicted label and its reliability score into two other columns in the same row.
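For the bookkeeping step, one pattern (a sketch with a hypothetical stand-in classifier, not the retrained model) is to map the classifier over the path column and unpack the resulting (label, score) tuples into the two result columns:

```python
import pandas as pd

def classify(path):
    # hypothetical stand-in for the real image classifier
    return "cat", 0.87

df = pd.DataFrame({"Pics": ["a.jpg", "b.jpg"]})
predictions = df["Pics"].map(classify)   # Series of (label, score) tuples
df["Class"] = [label for label, _ in predictions]
df["Reliability"] = [score for _, score in predictions]
print(df)
```

This avoids round-tripping the tuple through a string and splitting it on commas afterwards.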
To feed the pictures to the classifier I used df.iterrows(), df.apply(function), and also three separate hard-coded file paths (see the code below; I left them in as comments). However, I found that classifying each photo takes longer and longer, regardless of how I feed the pictures in. Image [0] starts with a classification time of 2.2 s, but by image [19] this has grown to 23 s. Imagine how long it will take at image 10,000, 20,000, and so on. In addition, CPU and memory usage also creep up slowly while the files are being classified, although they do not increase dramatically.
See the code below (most of it, apart from the pandas handling and the timing/monitoring parts, is taken from this example mentioned in the tensorflow tutorial above).
import os
import sys
import gc
import time
import psutil
import numpy as np
import pandas as pd
import tensorflow as tf
modelFullPath = '/Users/jaap/tf_files/retrained_graph.pb'
labelsFullPath = '/Users/jaap/tf_files/retrained_labels.txt'
def create_graph():
    """Creates a graph from saved GraphDef file and returns a saver."""
    # Creates graph from saved graph_def.pb.
    with tf.gfile.FastGFile(modelFullPath, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def, name='')
def run_inference_on_image(image):
    answer = None
    imagePath = image
    print(imagePath)
    if not tf.gfile.Exists(imagePath):
        tf.logging.fatal('File does not exist %s', imagePath)
        return answer
    image_data = tf.gfile.FastGFile(imagePath, 'rb').read()
    # Creates graph from saved GraphDef.
    create_graph()
    with tf.Session() as sess:
        softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
        predictions = sess.run(softmax_tensor,
                               {'DecodeJpeg/contents:0': image_data})
        predictions = np.squeeze(predictions)
        top_k = predictions.argsort()[-5:][::-1]  # Getting top 5 predictions
        f = open(labelsFullPath, 'r')
        lines = f.readlines()
        labels = [str(w).replace("\n", "") for w in lines]
        for node_id in top_k:
            human_string = labels[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))
            return human_string, score
werkmap = '/Users/jaap/tf_files/test/'
filelist = []
files_in_dir = os.listdir(werkmap)
for f in files_in_dir:
    if f != '.DS_Store':
        filelist.append(werkmap + f)
df = pd.DataFrame(filelist, index=None, columns=['Pics'])
df = df.drop_duplicates()
df['Class'] = ''
df['Reliability'] = ''
print(df)
#--------------------------------------------------------
for index, pic in df.iterrows():
    start = time.time()
    df['Class'][index] = run_inference_on_image(pic[0])
    stop = time.time()
    duration = stop - start
    print("duration = %s" % duration)
    print("cpu usage: %s" % psutil.cpu_percent())
    print("memory usage: %s " % psutil.virtual_memory())
    print("")
df['Class'] = df['Class'].astype(str)
df['Class'], df['Reliability'] = df['Class'].str.split(',', 1).str
#-------------------------------------------------
# df['Class'] = df['Pics'].apply(run_inference_on_image)
# df['Class'] = df['Class'].astype(str)
# df['Class'], df['Reliability'] = df['Class'].str.split(',', 1).str
# print(df)
#--------------------------------------------------------------
# start = time.time()
# ja = run_inference_on_image('/Users/jaap/tf_files/test/12345_1.jpg')
# stop = time.time()
# duration = stop - start
# print("duration = %s" % duration)
# start = time.time()
# ja = run_inference_on_image('/Users/jaap/tf_files/test/12345_2.jpg')
# stop = time.time()
# duration = stop - start
# print("duration = %s" % duration)
# start = time.time()
# ja = run_inference_on_image('/Users/jaap/tf_files/test/12345_3.jpg')
# stop = time.time()
# duration = stop - start
# print("duration = %s" % duration)
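One thing worth noting in the code above: run_inference_on_image() calls create_graph() for every single image, and tf.import_graph_def adds the imported nodes to the default graph each time rather than replacing them, so the graph (and the work done per sess.run) keeps growing. A minimal stand-in, with a plain list playing the role of the default graph, shows the shape of the problem:

```python
graph = []  # stand-in for TensorFlow's default graph

def create_graph():
    # import_graph_def appends nodes; it never clears the old ones
    graph.extend(["node"] * 1000)

def run_inference(image):
    create_graph()        # called once per image, as in the script above
    return len(graph)     # per-image work grows with the graph

sizes = [run_inference(i) for i in range(5)]
print(sizes)  # [1000, 2000, 3000, 4000, 5000] -- strictly growing
```

Each call does more work than the last, which matches the 2.2 s to 23 s growth described above.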
I appreciate any help!
Solved this by calling the Python script through a shell script for each picture. Classification time now stays stable at ~2.5 s. Python seems to keep adding classification information to memory, which makes the script heavier with every iteration. – Poehe
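An alternative to the shell-script workaround is to keep everything in one process but hoist the one-time setup out of the loop: call create_graph() once before iterating and reuse a single tf.Session for every image, so nothing accumulates per iteration. A sketch of that structure, with a dummy load step standing in for graph creation and session setup:

```python
def load_model():
    # one-time setup; stands in for create_graph() + opening tf.Session()
    return lambda image: ("label", 0.9)  # dummy classifier

def classify_all(images):
    classify = load_model()              # built once, outside the loop
    return [classify(img) for img in images]

results = classify_all(["a.jpg", "b.jpg", "c.jpg"])
print(results)
```

In the original script this corresponds to moving create_graph() and the `with tf.Session() as sess:` block out of run_inference_on_image() and passing the session (or the softmax tensor) in, so each image costs only one sess.run.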