2017-10-04

Tensorflow - batching problem

I'm new to TensorFlow, and I'm trying to train on batches read from my CSV file.

Here is the code that reads the CSV file and builds the batches:

filename_queue = tf.train.string_input_producer(
    ['BCHARTS-BITSTAMPUSD.csv'], shuffle=False, name='filename_queue') 

reader = tf.TextLineReader() 
key, value = reader.read(filename_queue) 

# Default values, in case of empty columns. Also specifies the type of the 
# decoded result. 
record_defaults = [[0.], [0.], [0.], [0.], [0.], [0.], [0.], [0.]]
xy = tf.decode_csv(value, record_defaults=record_defaults) 

# collect batches of csv in 
train_x_batch, train_y_batch = \ 
    tf.train.batch([xy[0:-1], xy[-1:]], batch_size=100) 

And here is the training loop:

# initialize 
sess = tf.Session() 
sess.run(tf.global_variables_initializer()) 

# Start populating the filename queue. 
coord = tf.train.Coordinator() 
threads = tf.train.start_queue_runners(sess=sess, coord=coord) 


# train my model 
for epoch in range(training_epochs): 
    avg_cost = 0 
    total_batch = int(2193 / batch_size) 

    for i in range(total_batch): 
        batch_xs, batch_ys = sess.run([train_x_batch, train_y_batch]) 
        feed_dict = {X: batch_xs, Y: batch_ys} 
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict) 
        avg_cost += c / total_batch 

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost)) 

coord.request_stop() 
coord.join(threads) 

Here are my questions:

1.

My CSV file has 2193 records, and my batch size is 100. So what I want is this: every epoch starts from the first record and trains on 21 batches of 100 records plus one final batch of 93 records, i.e. 22 batches in total.

However, I found that every batch has size 100, even the last one. Moreover, the second epoch does not start from the first record.
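For reference, the batch breakdown described above can be computed in plain Python (num_records and batch_size mirror the values in the question):

```python
num_records = 2193
batch_size = 100

full_batches, remainder = divmod(num_records, batch_size)  # 21 full batches, 93 left over
batch_sizes = [batch_size] * full_batches + ([remainder] if remainder else [])

print(len(batch_sizes))   # 22
print(batch_sizes[-1])    # 93
```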

2.

How can I get the number of records (2193 in this example)? Should I hard-code it, or is there a smarter way? I tried tensor.get_shape().as_list(), but it doesn't work on batch_xs; it just returns an empty shape [].

Answer


We recently added a new API to TensorFlow called tf.contrib.data, which makes it easier to solve problems like this. (The "queue runner"-based APIs make it difficult to write computations with exact epoch boundaries, because the epoch boundary information is lost.)

Here's an example of how you can rewrite your program using tf.contrib.data:

lines = tf.contrib.data.TextLineDataset("BCHARTS-BITSTAMPUSD.csv") 

def decode(line): 
    record_defaults = [[0.], [0.], [0.], [0.], [0.], [0.], [0.], [0.]]
    xy = tf.decode_csv(line, record_defaults=record_defaults) 
    return xy[0:-1], xy[-1:] 

decoded = lines.map(decode) 

batched = decoded.batch(100) 

iterator = batched.make_initializable_iterator() 

train_x_batch, train_y_batch = iterator.get_next() 
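The pipeline above is just map-then-batch over the lines of the file. As a plain-Python sketch of the same logic (the stand-in CSV lines and the batch() helper here are illustrative, not part of the TensorFlow API), note that batch() keeps the trailing partial batch, which is exactly the behaviour the question asks for:

```python
lines = ["1,2,3,4,5,6,7,8"] * 5   # stand-in for five CSV records

def decode(line):
    xy = [float(v) for v in line.split(",")]
    return xy[0:-1], xy[-1:]      # (features, label)

def batch(iterable, n):
    """Group consecutive elements into lists of at most n."""
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) == n:
            yield buf
            buf = []
    if buf:                        # trailing partial batch is kept
        yield buf

batches = list(batch(map(decode, lines), 2))
print(len(batches))        # 3: two full batches and a final batch of 1
print(len(batches[-1]))    # 1
```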

The training part then becomes a little simpler:

# initialize 
sess = tf.Session() 
sess.run(tf.global_variables_initializer()) 

# train my model 
for epoch in range(training_epochs): 
    total_cost = 0.0 
    total_batch = 0 

    # Re-initialize the iterator for another epoch. 
    sess.run(iterator.initializer) 

    while True: 
        # NOTE: It is inefficient to make a separate sess.run() call to get each 
        # batch of input data and then feed it into a different sess.run() call. 
        # For better performance, define your training graph to take 
        # train_x_batch and train_y_batch directly as inputs. 
        try: 
            batch_xs, batch_ys = sess.run([train_x_batch, train_y_batch]) 
        except tf.errors.OutOfRangeError: 
            break 

        feed_dict = {X: batch_xs, Y: batch_ys} 
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict) 
        total_cost += c 
        total_batch += batch_xs.shape[0] 

    avg_cost = total_cost / total_batch 

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost)) 
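
The terminate-on-exhaustion pattern above — loop until the iterator is exhausted, counting records as you go — can be sketched with a plain Python iterator, where StopIteration plays the role of tf.errors.OutOfRangeError:

```python
def batches(records, batch_size):
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

records = list(range(2193))         # stand-in for the decoded CSV records
it = iter(batches(records, 100))

total_batch = 0                     # counts records, as in the answer's code
num_batches = 0
while True:
    try:
        batch = next(it)
    except StopIteration:           # analogous to tf.errors.OutOfRangeError
        break
    total_batch += len(batch)
    num_batches += 1

print(total_batch)   # 2193 -- the record count the question asked about
print(num_batches)   # 22
```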

For more details on how to use the new API, see the "Importing Data" programmer's guide.

So there's still no way to get the number of records (2193)? – BlakStar

The `total_batch` variable will contain 2193 (or the actual number of records) at the end of the `while` loop. – mrry

I ran it today... and it raised an error. That's because batch_xs has shape [7, 100], so it cannot be fed to X, which has shape [?, 7]. I read the guide you linked and found that it is intentionally shaped [7, 100]. But I don't understand why batch_xs is shaped [7, 100] instead of [100, 7]... So should I change my training model, or is there another way? – BlakStar
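One likely cause of the shape mismatch (a guess, not verified against the full program): decode() returns xy[0:-1] as a list of 7 separate scalar tensors, so batching produces 7 components of 100 values each — [7, 100] overall — rather than 100 rows of 7 features. Packing the features into a single vector inside decode() (for example with tf.stack(xy[0:-1]), assuming that fits the rest of the model) should give batches of shape [100, 7]. A plain-Python analogy of the two layouts:

```python
# Each "record" decodes to 7 scalar features.
records = [[float(i * 7 + j) for j in range(7)] for i in range(100)]

# Batching the 7 features as separate components (the current decode()):
per_feature = [list(col) for col in zip(*records)]   # 7 sequences of 100
# Batching whole feature vectors (decode() returning one 7-vector per record):
per_record = records                                 # 100 rows of 7

print(len(per_feature), len(per_feature[0]))   # 7 100
print(len(per_record), len(per_record[0]))     # 100 7
```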