ipcluster on Sun Grid Engine: every engine reports rank 0
I set up an IPython parallel ipcluster using Sun Grid Engine, and things seem to work fine:
ipcluster start -n 100 --profile=sge
2016-07-15 14:47:09.749 [IPClusterStart] Starting ipcluster with [daemon=False]
2016-07-15 14:47:09.751 [IPClusterStart] Creating pid file: /home/USERNAME/.ipython/profile_sge/pid/ipcluster.pid
2016-07-15 14:47:09.751 [IPClusterStart] Starting Controller with SGEControllerLauncher
2016-07-15 14:47:09.789 [IPClusterStart] Job submitted with job id: u'6354583'
2016-07-15 14:47:10.790 [IPClusterStart] Starting 100 Engines with SGEEngineSetLauncher
2016-07-15 14:47:10.826 [IPClusterStart] Job submitted with job id: u'6354584'
2016-07-15 14:47:40.856 [IPClusterStart] Engines appear to have started successfully
Then I connect from a notebook using
rc = ipp.Client(profile='sge')
But when I use the parallel magic, all of my processes report rank 0.
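The cell that was run is not included in this post. A minimal sketch of code that would print lines in the same format as the output that follows, assuming mpi4py is installed on the engines; `format_rank_line` and `report` are hypothetical names chosen for illustration:

```python
def format_rank_line(rank, size, host):
    """Build the line that each process prints."""
    return "I am #{} of {} and run on {}".format(rank, size, host)


def report():
    """Print this process's rank in MPI.COMM_WORLD, the world size, and the host."""
    # Assumption: mpi4py is available on the engines. It is imported lazily
    # so that defining these functions does not require it.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    print(format_rank_line(comm.Get_rank(), comm.Get_size(),
                           MPI.Get_processor_name()))
```

In a notebook, `report()` would be called inside a `%%px` cell so that it runs on every engine.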
The output is:
[stdout:0] I am #0 of 1 and run on compute-8-13.local
[stdout:1] I am #0 of 1 and run on compute-8-13.local
[stdout:2] I am #0 of 1 and run on compute-3-3.local
[stdout:3] I am #0 of 1 and run on compute-3-3.local
[stdout:4] I am #0 of 1 and run on compute-3-3.local
...
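Read literally, each line reports a world size of 1, which would mean every engine sits alone in its own MPI world instead of sharing one with the other engines (assuming the numbers come from `MPI.COMM_WORLD`). A small sketch that tallies this from the captured text; `summarize` is a hypothetical helper, and the sample is just the five lines shown above:

```python
import re
from collections import Counter

# Matches lines such as "[stdout:2] I am #0 of 1 and run on compute-3-3.local"
LINE = re.compile(r"\[stdout:(\d+)\] I am #(\d+) of (\d+) and run on (\S+)")


def summarize(output):
    """Return (engine count, Counter of (rank, size) pairs, Counter of hosts)."""
    pairs, hosts = Counter(), Counter()
    engines = 0
    for line in output.splitlines():
        m = LINE.match(line.strip())
        if m is None:
            continue  # skip "..." and any other non-matching line
        engines += 1
        pairs[(int(m.group(2)), int(m.group(3)))] += 1
        hosts[m.group(4)] += 1
    return engines, pairs, hosts


sample = """\
[stdout:0] I am #0 of 1 and run on compute-8-13.local
[stdout:1] I am #0 of 1 and run on compute-8-13.local
[stdout:2] I am #0 of 1 and run on compute-3-3.local
[stdout:3] I am #0 of 1 and run on compute-3-3.local
[stdout:4] I am #0 of 1 and run on compute-3-3.local
...
"""

engines, pairs, hosts = summarize(sample)
print(engines, dict(pairs), dict(hosts))
```

For this sample every engine falls into the single (rank 0, size 1) bucket, even though the engines are spread over two hosts.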
Here are my setup files:
ipcluster_config.py:
c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
c.SlurmEngineSetLauncher.batch_template_file = '/home/USERNAME/.ipython/profile_sge/sge.engine.template'
c.SlurmControllerLauncher.batch_template_file = '/home/USERNAME/.ipython/profile_sge/sge.controller.template'
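One detail worth noting in this file: the launcher classes selected are the SGE ones, while the template paths are set on the Slurm launcher classes, so the SGE launchers would not pick up those paths. A sketch of the same settings addressed to the SGE classes, assuming the intent is for the SGE launchers to read these two templates; whether this accounts for the rank 0 output is not confirmed here:

```python
# ipcluster_config.py (sketch, not a verified fix)
c = get_config()  # provided by IPython when it loads a profile config file

c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'

# Assumption: the templates are meant for the SGE launchers chosen above,
# so the trait is set on the SGE classes instead of the Slurm ones.
c.SGEEngineSetLauncher.batch_template_file = '/home/USERNAME/.ipython/profile_sge/sge.engine.template'
c.SGEControllerLauncher.batch_template_file = '/home/USERNAME/.ipython/profile_sge/sge.controller.template'
```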
ipcontroller_config.py:
c.HubFactory.ip = '*'
sge.controller.template:
#!/bin/sh
#$ -S /bin/sh
#$ -pe orte 1
#$ -q sThC.q
#$ -cwd
#$ -N ipyparallel_controller
#$ -o ipyparallel_controller.log
#$ -e ipyparallel_controller.err
module load gcc/5.3/openmpi
source activate parallel
ipcontroller --profile-dir={profile_dir}
sge.engine.template:
#!/bin/sh
#$ -S /bin/sh
#$ -pe orte {n}
#$ -q sThC.q
#$ -cwd
#$ -N ipyparallel_engines
#$ -o ipyparallel_engines.log
#$ -e ipyparallel_engines.err
module load gcc/5.3/openmpi
source activate parallel
mpiexec -n {n} ipengine --profile-dir={profile_dir} --timeout=30