TensorFlow Distributed: passing devices

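I recently installed the version of TensorFlow built for distributed processing. Following the trend, I am trying to use multiple GPUs across multiple computers, and I also found some additional specifications in the white paper. I can run a server and a worker on 2 different computers, and with a gRPC session I can place and run the program in remote or local mode.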

I ran TensorFlow locally on the remote computer with:

bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server \
--cluster_spec='local|localhost:2500' --job_name=local --task_id=0 &

and on the server with:

bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server \
--cluster_spec='worker|192.168.170.193:2500,prs|192.168.170.226:2500' --job_name=worker --task_id=0 \
--job_name=prs --task_id=0 &

However, when I try to specify devices so that the program runs on the 2 computers at the same time, Python shows me this error:

Could not satisfy explicit device specification '/job:worker/task:0'

when I use

with tf.device("/job:prs/task:0/device:gpu:0"):
    x = tf.placeholder(tf.float32, [None, 784], name='x-input')
    W = tf.Variable(tf.zeros([784, 10]), name='weights')
with tf.device("/job:prs/task:0/device:gpu:1"):
    b = tf.Variable(tf.zeros([10], name='bias'))
# Use a name scope to organize nodes in the graph visualizer
with tf.device("/job:worker/task:0/device:gpu:0"):
    with tf.name_scope('Wx_b'):
        y = tf.nn.softmax(tf.matmul(x, W) + b)

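even after changing the name of the job. So I am wondering whether I need to add a new device, or whether I am doing something wrong when initializing the cluster.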


2016-03-06 Nelson Yalta

`worker` is actually the name of the cluster (in TensorFlow's terms, the job name).

Your first call to the bazel-built server should look like this:

bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server \
--cluster_spec='worker|192.168.170.193:2500;192.168.170.226:2501' --job_name=worker --task_id=0 &

Run it on your first node, 192.168.170.193.

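Your cluster name is worker, and it includes the IP addresses of both nodes. Each task ID then refers to one of those nodes, in the order they are listed. You must start the server on both nodes, giving each a different task ID, i.e. on your second node, 192.168.170.226:

bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server \
--cluster_spec='worker|192.168.170.193:2500;192.168.170.226:2501' --job_name=worker --task_id=1 &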
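To make the mapping from the spec string to task IDs concrete, here is a small sketch in plain Python. It assumes the `--cluster_spec` format seen in the commands in this thread (jobs separated by `,`, the job name separated from its hosts by `|`, hosts separated by `;`). `parse_cluster_spec` and `task_prefixes` are hypothetical helpers written for illustration, not part of TensorFlow.

```python
def parse_cluster_spec(spec):
    """Parse 'job|host:port;host:port,job2|host:port' into {job: [addresses]}.

    The separators are an assumption inferred from the commands in this thread.
    """
    cluster = {}
    for job_spec in spec.split(","):
        job_name, _, hosts = job_spec.partition("|")
        cluster[job_name] = hosts.split(";")
    return cluster


def task_prefixes(cluster):
    """List the '/job:<name>/task:<index>' prefixes that the spec defines."""
    return [
        "/job:%s/task:%d" % (job, index)
        for job, hosts in cluster.items()
        for index in range(len(hosts))
    ]


cluster = parse_cluster_spec("worker|192.168.170.193:2500;192.168.170.226:2501")
for prefix in task_prefixes(cluster):
    print(prefix)
```

With this spec only `/job:worker/task:0` and `/job:worker/task:1` are defined, so a device string that names any other job, such as `/job:prs/task:0`, has nothing to match.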

Then run:

with tf.device("/job:worker/task:0/device:gpu:0"):
    x = tf.placeholder(tf.float32, [None, 784], name='x-input')
    W = tf.Variable(tf.zeros([784, 10]), name='weights')
with tf.device("/job:worker/task:0/device:gpu:1"):
    b = tf.Variable(tf.zeros([10], name='bias'))
# Use a name scope to organize nodes in the graph visualizer
with tf.device("/job:worker/task:1/device:gpu:0"):
    with tf.name_scope('Wx_b'):
        y = tf.nn.softmax(tf.matmul(x, W) + b)
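The same two-task cluster can probably also be described from Python instead of through the bazel-built binary. The following is only a minimal sketch, assuming a TensorFlow 1.x-era installation in which `tf.train.ClusterSpec` and `tf.train.Server` are available. The names `WORKER_HOSTS`, `CLUSTER` and `start_worker` are hypothetical and do not come from the original answer.

```python
# Addresses taken from the commands above; the list position is the task index.
WORKER_HOSTS = ["192.168.170.193:2500", "192.168.170.226:2501"]
CLUSTER = {"worker": WORKER_HOSTS}


def start_worker(task_index):
    """Start the in-process server for one task of the 'worker' job and block."""
    # Imported here so the definitions above can be inspected without TensorFlow.
    import tensorflow as tf  # assumption: a TF 1.x-style API

    cluster = tf.train.ClusterSpec(CLUSTER)
    server = tf.train.Server(cluster, job_name="worker", task_index=task_index)
    server.join()  # keep serving until the process is killed

# Hypothetical usage: call start_worker(0) on 192.168.170.193
# and start_worker(1) on 192.168.170.226.
```

Each node would run the same script with its own task index, mirroring the two `--task_id` values passed on the command line.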


2016-03-10 06:24:42 LKT

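Welcome to SO. Please visit the help center to see how to [answer](http://stackoverflow.com/help/answering) questions. If you are facing the same problem as the one in the question, please add it as a comment instead. – UditS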

This answer is good, but it could be even better if you explained how to use `ClusterSpec` and `Server`. – AkiRoss

The TF developers have greatly improved their documentation, which can be found here: https://www.tensorflow.org/how_tos/distributed/ – LKT
