"Embarrassingly parallel" programming using Python and PBS on a cluster

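I have a function (a neural network model) that produces figures. I want to test several parameters, methods and different inputs (meaning hundreds of runs of the function) from Python, using PBS on a standard cluster with Torque.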

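Note: I tried parallelpython, ipython and the like, and was never completely satisfied, because I want something simpler. The cluster is in a given configuration that I cannot change, so a solution integrating python + qsub would certainly benefit the community.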

To simplify things, I have a simple function such as:

import pylab
from myModule import do_lots_number_crunching

def model(input, a=1., N=100):
    do_lots_number_crunching(input, a, N)
    pylab.savefig('figure_' + input.name + '_' + str(a) + '_' + str(N) + '.png')

where input is an object representing the input, input.name is a string, and do_lots_number_crunching may run for hours.

My question is: is there a correct way to turn a parameter scan such as

for a in pylab.linspace(0., 1., 100): 
    model(input, a)

into "something" that launches one PBS script for each call to the model function?

#PBS -l ncpus=1 
#PBS -l mem=1000mb
#PBS -l cput=24:00:00 
#PBS -V 
cd /data/work/ 
python experiment_model.py

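I was thinking of a function that would include the PBS template and be called from the Python script, but I have not been able to figure it out yet (a decorator?).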

2010-07-22 meduz

pbs_python [1] could work for this. If experiment_model.py takes 'a' as a command-line argument, you could do

import os

import pbs
import pylab

# Connect to the default PBS server
server_name = pbs.pbs_default()
c = pbs.pbs_connect(server_name)

# Resource requests, mirroring the #PBS lines of the question
attropl = pbs.new_attropl(4)
attropl[0].name = pbs.ATTR_l
attropl[0].resource = 'ncpus'
attropl[0].value = '1'

attropl[1].name = pbs.ATTR_l
attropl[1].resource = 'mem'
attropl[1].value = '1000mb'

attropl[2].name = pbs.ATTR_l
attropl[2].resource = 'cput'
attropl[2].value = '24:00:00'

attropl[3].name = pbs.ATTR_V

# Job script template; %f is filled in with the value of a
script = '''
cd /data/work/
python experiment_model.py %f
'''

jobs = []

for a in pylab.linspace(0., 1., 100):
    # Write a temporary job script, submit it, then delete it
    script_name = 'experiment_model.job' + str(a)
    with open(script_name, 'w') as scriptf:
        scriptf.write(script % a)
    job_id = pbs.pbs_submit(c, attropl, script_name, 'NULL', 'NULL')
    jobs.append(job_id)
    os.remove(script_name)

print(jobs)

[1]: https://oss.trac.surfsara.nl/pbs_python/wiki/TorqueUsage (pbs_python)
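
If pbs_python is not installed on the cluster, roughly the same loop can be sketched with the standard library alone, by filling a PBS template and piping it to qsub, which reads the job script from standard input when no file name is given. This is only a minimal sketch and not part of pbs_python: PBS_TEMPLATE, render_job and submit are names made up for illustration, the resource requests repeat those in the question (with the memory written as 1000mb), and experiment_model.py is assumed to take a as its only command-line argument.

```python
import subprocess

# Hypothetical template; resource requests as in the question's PBS script.
PBS_TEMPLATE = """#!/bin/sh
#PBS -l ncpus=1
#PBS -l mem=1000mb
#PBS -l cput=24:00:00
#PBS -V
cd {workdir}
python experiment_model.py {a}
"""


def render_job(a, workdir="/data/work/"):
    """Return the text of a PBS script that runs one value of `a`."""
    # repr(float(a)) keeps full precision, unlike a fixed-width %f
    return PBS_TEMPLATE.format(workdir=workdir, a=repr(float(a)))


def submit(script_text, dry_run=True):
    """Pipe the script to qsub and return whatever qsub prints (the job id).

    With dry_run=True nothing is submitted and the script text is returned,
    so the template can be checked away from the cluster.
    """
    if dry_run:
        return script_text
    result = subprocess.run(
        ["qsub"], input=script_text, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # 100 evenly spaced values in [0, 1], like pylab.linspace(0., 1., 100)
    values = [i / 99.0 for i in range(100)]
    for a in values[:2]:
        print(submit(render_job(a)))
```

With dry_run=True the script text comes back unchanged, so the template can be inspected before anything reaches the queue; passing dry_run=False on a machine where qsub is available would submit the job instead.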

2010-07-22 12:29:09 macedoine

You can do this easily using jug (which I developed for a similar setup).

You would write in a file (e.g., model.py):

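import numpy as np
from jug import TaskGenerator

@TaskGenerator
def model(param1, param2):
    # complex_computation and pyplot.coolgraph are placeholders
    res = complex_computation(param1, param2)
    pyplot.coolgraph(res)


for param1 in np.linspace(0, 1., 100):
    for param2 in range(2000):
        model(param1, param2)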

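And that's it!

Now you can launch "jug jobs" on your queue: jug execute model.py and this will parallelise automatically. What happens is that each job will, in a loop, do something like: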

while not all_done():
    for t in tasks_that_i_can_run():
        if t.lock_for_me(): t.run()

(It's actually more complicated than that, but you get the point.)

It uses the filesystem for locking (if you're on an NFS system) or a redis server if you prefer. It can also handle dependencies between tasks.
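
To make the locking idea concrete, here is a rough sketch of workers claiming tasks through lock files in a shared directory. It is not jug's actual implementation: try_lock and run_available are hypothetical names, and the sketch assumes the shared filesystem honours exclusive file creation (O_CREAT | O_EXCL).

```python
import os
import tempfile


def try_lock(lock_dir, task_name):
    """Atomically claim a task by creating its lock file.

    O_CREAT | O_EXCL makes os.open fail if the file already exists, so at
    most one process obtains the lock for a given task name.
    """
    path = os.path.join(lock_dir, task_name + ".lock")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def run_available(tasks, lock_dir):
    """Run every task this worker manages to claim; return their names."""
    done = []
    for name, func in tasks.items():
        if try_lock(lock_dir, name):
            func()
            done.append(name)
    return done


if __name__ == "__main__":
    results = []
    tasks = {
        "a=%.1f" % a: (lambda a=a: results.append(a * a))
        for a in (0.0, 0.5, 1.0)
    }
    with tempfile.TemporaryDirectory() as lock_dir:
        first = run_available(tasks, lock_dir)   # claims and runs everything
        second = run_available(tasks, lock_dir)  # every task is already locked
    print(first, second, results)
```

Every PBS job can run the same script: whichever worker creates a lock file first runs that task, and the others skip it.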

This is not exactly what you asked for, but I believe it is a cleaner architecture to separate this from the job queuing system.

2010-10-22 17:26:57 luispedro

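It looks like I'm a little late to the party, but a few years ago I had the same question of how to map embarrassingly parallel problems onto a cluster in Python, and I wrote my own solution. I recently uploaded it to GitHub here: https://github.com/plediii/pbs_util

To write your program with pbs_util, I would first create a pbs_util.ini in the working directory containing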

[PBSUTIL] 
numnodes=1 
numprocs=1 
mem=1000mb
walltime=24:00:00

Then a Python script like this

import pbs_util.pbs_map as ppm

import pylab
import myModule

class ModelWorker(ppm.Worker):

    def __init__(self, input, N):
        self.input = input
        self.N = N

    def __call__(self, a):
        myModule.do_lots_number_crunching(self.input, a, self.N)
        pylab.savefig('figure_' + self.input.name + '_' + str(a) + '_' + str(self.N) + '.png')



# You need "main" protection like this since pbs_map will import this file on the compute nodes
if __name__ == "__main__":
    input, N = something, picklable
    # Use list to force the iterator
    list(ppm.pbs_map(ModelWorker, pylab.linspace(0., 1., 100),
                     startup_args=(input, N),
                     num_clients=100))

And that would do it.

2012-04-10 06:14:12 plediii

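I just started working with clusters and EP (embarrassingly parallel) applications. My goal (I'm with the Library) is to learn enough to help other researchers on campus access HPC with EP applications, especially researchers outside of STEM. I'm still very new, but I thought it might help this question to point out the use of GNU Parallel in a PBS script to launch basic Python scripts with varying arguments. In the .pbs file, there are two lines to point out: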

module load gnu-parallel # this is required on my environment 

parallel -j 4 --env PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \ 
--workdir $NODE_LOCAL_DIR --transfer --return 'output.{#}' --clean \ 
`pwd`/simple.py '{#}' '{}' ::: $INPUT_DIR/input.* 

# `-j 4` is the number of processors to use per node, will be cluster-specific 
# {#} will substitute the process number into the string 
# `pwd`/simple.py `{#}` `{}` this is the command that will be run multiple times 
# ::: $INPUT_DIR/input.* all of the files in $INPUT_DIR/ that start with 'input.' 
#  will be substituted into the python call as the second(3rd) argument where the 
#  `{}` resides. These can be simple text files that you use in your 'simple.py' 
#  script to pass the parameter sets, filenames, etc.

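As a newbie to EP supercomputing, even though I don't yet understand all the other options of "parallel", this command allowed me to launch Python scripts in parallel with different parameters. It works well if you can generate a slew of parameter files ahead of time that parallelize your problem, for example running simulations across a parameter space, or processing many files with the same code.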
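
Generating the parameter files ahead of time could look something like the sketch below. The JSON layout, the helper names write_param_files and read_params, and the zero-padded input.NNNN naming are assumptions for illustration; the only thing taken from the command above is that the files match $INPUT_DIR/input.*.

```python
import json
import os
import tempfile


def write_param_files(input_dir, a_values, N=100):
    """Write one JSON file per parameter set: input.0000, input.0001, ..."""
    os.makedirs(input_dir, exist_ok=True)
    paths = []
    for i, a in enumerate(a_values):
        # Zero padding keeps the shell glob input.* in numeric order
        path = os.path.join(input_dir, "input.%04d" % i)
        with open(path, "w") as f:
            json.dump({"a": a, "N": N}, f)
        paths.append(path)
    return paths


def read_params(path):
    """What a script like simple.py could do with the file name it is given."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as input_dir:
        # 100 evenly spaced values of a in [0, 1]
        paths = write_param_files(input_dir, [i / 99.0 for i in range(100)])
        print(len(paths), read_params(paths[0]))
```

Each file then becomes one `{}` argument in the parallel call, and the script reads its parameters from that file instead of parsing them from the command line.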

2013-11-06 18:06:44
