2012-07-16

Having the following problems starting newbeam: "Fail to start beam on host" and "Can't start newbeam on host"

==> 20120712-1611/[email protected] <== 

=INFO REPORT==== 12-Jul-2012::16:12:45 === 
    ts_config_server:(0:<0.100.0>) Can't start newbeam on host tester1 (reason: timeout) ! Aborting! 

=INFO REPORT==== 12-Jul-2012::16:12:45 === 
    ts_config_server:(0:<0.99.0>) Can't start newbeam on host tester2 (reason: timeout) ! Aborting! 

=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(3:<0.74.0>) Fail to start beam on host "web1-1b" ({error, 
            timeout}) 

=ERROR REPORT==== 12-Jul-2012::16:12:46 === 
** Generic server <0.74.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.372>,start_beam} 
** When Server state == {state,{global,ts_mon}, 
        10000,undefined,"web1-1b",undefined} 
** Reason for termination == 
** {error,timeout} 

=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"web1-1b", 
            {},10000, 
            {global, 
            ts_mon}} 
=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(3:<0.82.0>) Fail to start beam on host "master3" ({error, 
            timeout}) 

=ERROR REPORT==== 12-Jul-2012::16:12:46 === 
** Generic server <0.82.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.405>,start_beam} 
** When Server state == {state,{global,ts_mon}, 
        10000,undefined,"master3",undefined} 
** Reason for termination == 
** {error,timeout} 

=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"master3", 
            {},10000, 
            {global, 
            ts_mon}} 
=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(3:<0.80.0>) Fail to start beam on host "master1" ({error, 
            timeout}) 

=ERROR REPORT==== 12-Jul-2012::16:12:46 === 
** Generic server <0.80.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.397>,start_beam} 
** When Server state == {state,{global,ts_mon}, 
        10000,undefined,"master1",undefined} 
** Reason for termination == 
** {error,timeout} 

=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"master1", 
            {},10000, 
            {global, 
            ts_mon}} 
=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(3:<0.81.0>) Fail to start beam on host "master2" ({error, 
            timeout}) 

=ERROR REPORT==== 12-Jul-2012::16:12:46 === 
** Generic server <0.81.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.400>,start_beam} 
** When Server state == {state,{global,ts_mon}, 
        10000,undefined,"master2",undefined} 
** Reason for termination == 
** {error,timeout} 

=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"master2", 
            {},10000, 
            {global, 
            ts_mon}} 
=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(3:<0.78.0>) Fail to start beam on host "memcache-1a" ({error, 
             timeout}) 

=ERROR REPORT==== 12-Jul-2012::16:12:46 === 
** Generic server <0.78.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.386>,start_beam} 
** When Server state == {state,{global,ts_mon}, 
        10000,undefined,"memcache-1a",undefined} 
** Reason for termination == 
** {error,timeout} 

=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"memcache-1a", 
            {},10000, 
            {global, 
            ts_mon}} 
=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(3:<0.79.0>) Fail to start beam on host "memcache-1b" ({error, 
             timeout}) 

=ERROR REPORT==== 12-Jul-2012::16:12:46 === 
** Generic server <0.79.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.392>,start_beam} 
** When Server state == {state,{global,ts_mon}, 
        10000,undefined,"memcache-1b",undefined} 
** Reason for termination == 
** {error,timeout} 

=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"memcache-1b", 
            {},10000, 
            {global, 
            ts_mon}} 
=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(3:<0.76.0>) Fail to start beam on host "task1" ({error, 
             timeout}) 

=ERROR REPORT==== 12-Jul-2012::16:12:46 === 
** Generic server <0.76.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.374>,start_beam} 
** When Server state == {state,{global,ts_mon}, 
        10000,undefined,"task1",undefined} 
** Reason for termination == 
** {error,timeout} 

=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"task1", 
            {},10000, 
            {global, 
            ts_mon}} 
=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(3:<0.77.0>) Fail to start beam on host "ffmpeg1" ({error, 
            timeout}) 

=ERROR REPORT==== 12-Jul-2012::16:12:46 === 
** Generic server <0.77.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.380>,start_beam} 
** When Server state == {state,{global,ts_mon}, 
        10000,undefined,"ffmpeg1",undefined} 
** Reason for termination == 
** {error,timeout} 

=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"ffmpeg1", 
            {},10000, 
            {global, 
            ts_mon}} 
=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(3:<0.73.0>) Fail to start beam on host "web1-1a" ({error, 
            timeout}) 

=ERROR REPORT==== 12-Jul-2012::16:12:46 === 
** Generic server <0.73.0> terminating 
** Last message in was {timeout,#Ref<0.0.0.364>,start_beam} 
** When Server state == {state,{global,ts_mon}, 
        10000,undefined,"web1-1a",undefined} 
** Reason for termination == 
** {error,timeout} 

=INFO REPORT==== 12-Jul-2012::16:12:46 === 
    ts_os_mon_erlang:(5:<0.67.0>) starting os_mon_erlang with args {"web1-1a", 
            {},10000, 
            {global, 
            ts_mon}} 

Here is my /etc/hosts file (on tester0):

127.0.0.1 localhost 

# The following lines are desirable for IPv6 capable hosts 
::1 ip6-localhost ip6-loopback 
fe00::0 ip6-localnet 
ff00::0 ip6-mcastprefix 
ff02::1 ip6-allnodes 
ff02::2 ip6-allrouters 
ff02::3 ip6-allhosts 

10.0.3.192 vm0 
10.0.3.199 vm1 
10.0.3.242 vm2 

10.100.238.56 master1 
10.90.245.66 master2 
10.70.78.51 master3 
10.37.53.46 web1-1a 
10.94.245.79 web1-1b 
10.127.46.19 task1 
10.35.99.161 ffmpeg1 
10.243.63.212 memcache-1a 
10.223.50.72 memcache-1b 
10.29.155.171 tester0 
10.78.159.23 tester1 
10.78.149.115 tester2 

Every time I launch the instances (they all have the same version of Erlang, which I built from source), I run the following script:

#!/bin/bash 

for i in web1-1a web1-1b task1 ffmpeg1 master1 master2 master3 memcache-1a memcache-1b tester1 tester2; do 
    ssh $i -i ~/.ssh/amazon-key.pem "echo \"<MY PUB SSH KEY IN HERE>" | tee -a ~/.ssh/authorized_keys; ssh-keygen -t rsa << hereintime 



hereintime; sudo hostname $i; exit" &> /dev/null 
    ssh $i "echo \"host * 
    user <myuser> 
    StrictHostKeyChecking no\" | tee -a .ssh/config; sudo sed -i.bak -e \"s/localhost/localhost $i/\" -e \"/$i/d\" /etc/hosts; echo \"# need to have ssh-agent running 
eval \`ssh-agent\` 
[ -e /home/<myuser>/.ssh/id_rsa.pub ] && ssh-add\" | tee -a ~/.bashrc" &> /dev/null 
    newhostline=`grep $i /etc/hosts` 
    ssh $i "sudo sed -i -e \"/$i/d\" /etc/hosts; echo $newhostline | sudo tee -a /etc/hosts" &> /dev/null 
    [ "${i:0:-1}" == "tester" ] && tester0=`grep tester0 /etc/hosts` && ssh $i "sudo sed -i -e '/tester0/d' /etc/hosts" &> /dev/null 
    ssh $i "rm ~/.ssh/known_hosts; echo $tester0 | sudo tee -a /etc/hosts; ssh tester0 \"exit\"" &> /dev/null 
    ssh $i "cat ~/.ssh/id_rsa.pub" | tee -a ~/.ssh/authorized_keys 
    ssh $i "sudo hostname $i; exit" 
done 

And I am fully able to run the tests that your documentation prescribes, such as:

# ssh tester1 erl 
Eshell V5.9.1 (abort with ^G) 
1> inet:gethostname(). 
{ok,"tester1"} 

as well as things like the example described on this page: https://support.process-one.net/doc/display/ERL/Starting+a+set+of+Erlang+cluster+nodes

-module(cluster). 
-export([slaves/1]). 

%% Argument: 
%% %% Hosts: List of hostname (string) 
slaves([]) -> 
ok; 
slaves([Host|Hosts]) -> 
Args = erl_system_args(), 
NodeName = "cluster", 
{ok, Node} = slave:start_link(Host, NodeName, Args), 
io:format("Erlang node started = [~p]~n", [Node]), 
slaves(Hosts). 

erl_system_args()-> 
Shared = case init:get_argument(shared) of 
    error -> " "; 
    {ok,[[]]} -> " -shared " 
end, 
    lists:append(["-rsh ssh -setcookie", 
     atom_to_list(erlang:get_cookie()), 
     Shared, " +Mea r10b "]). 

%% Do not forget to start erlang with a command like: 
%% erl -rsh ssh -sname clustmaster 

Then I run (on tester0):

# erl -rsh ssh -sname clustmaster 
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [async-threads:0] [kernel-poll:false] 

Eshell V5.9.1 (abort with ^G) 
(clustmaster@tester0)1> c(cluster). 
{ok,cluster} 
(clustmaster@tester0)2> cluster:slaves(["tester1","tester2"]). 
** exception error: no match of right hand side value {error,timeout} 
    in function cluster:slaves/1 (cluster.erl, line 11) 
(clustmaster@tester0)3> cluster:slaves(["tester0"]).   
Erlang node started = [cluster@tester0] 
ok 

Which makes sense, because:

(clustmaster@tester0)14> slave:start_link("tester0", "cluster", " -rsh ssh -setcookieVTJKCGTPGNTMRAUDYLBU +Mea r10b"). 
{ok,cluster@tester0} 
(clustmaster@tester0)15> slave:start_link("tester1", "cluster", " -rsh ssh -setcookieVTJKCGTPGNTMRAUDYLBU +Mea r10b"). 
{error,timeout} 

Strange?

(clustmaster@tester0)5> inet:gethostbyname("tester1"). 
{ok,{hostent,"tester1",[],inet,4,[{10,78,159,23}]}} 
(clustmaster@tester0)6> inet:gethostbyname("tester2"). 
{ok,{hostent,"tester2",[],inet,4,[{10,78,149,115}]}} 

# ping -c 1 tester1 
PING tester1 (10.78.159.23) 56(84) bytes of data. 
64 bytes from tester1 (10.78.159.23): icmp_req=1 ttl=56 time=1.69 ms 
# ping -c 1 tester2 
PING tester2 (10.78.149.115) 56(84) bytes of data. 
64 bytes from tester2 (10.78.149.115): icmp_req=1 ttl=56 time=2.03 ms 

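Your 'bash' script that logs in, fiddles with the SSH settings, and changes hostnames looks strange; are you actually changing the hostname of the machines, rather than just using '/etc/hostname' to set the hostname at boot? I wonder how well programs tolerate a hostname that changes while they are running. (The whole thing looks ripe for replacement with a script that is copied to the target machine and run there, rather than logging in to each machine six times (!).) You use the key '~/.ssh/amazon-key.pem' when logging in but do not use it for the 'erl -rsh' command; are you using '~/.ssh/config' to set the keys for these hosts? – sarnold 2012-07-17 02:12:54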


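I know the bash script could be changed to log in only once, but I also put the standard id_rsa.pub into the hosts to be logged into (from the load-testing box), and then get one back from the client/server. I am able to ssh freely (without a password) to any of my clients/servers, and back to the load-testing host itself. I can also run: ssh erl and get the Erlang command line just fine. It just seems that Erlang itself is not connecting properly to anything other than the master controller. Also, on EC2, changing /etc/hostname makes no difference and gets clobbered on reboot. – ikosuave 2012-07-18 13:43:04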

Answer


Found the problem: it was in my script that pushes the SSH keys to the remote servers. I was using:

sudo sed -i.bak -e \"s/localhost/localhost $i/\" -e \"/$i/d\" /etc/hosts; 

[ "${i:0:-1}" == "tester" ] && tester0=`grep tester0 /etc/hosts` && ssh $i "sudo sed -i -e '/tester0/d' /etc/hosts" &> /dev/null 

This was wiping out the address of my "controller" on the slave servers. Now that it has been removed, I can run:

slave:start_link(Host, Name, Args)
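
The effect of that delete expression can be reproduced in isolation. What follows is only a minimal sketch under stated assumptions: it builds a throwaway hosts file from three of the entries shown in the question, applies the same /tester0/d expression to a copy, and reports whether the controller entry survives. The helper name has_host_entry and the temporary files are hypothetical and are not part of the original script.

```shell
#!/bin/bash
# Sketch only: shows what the '/tester0/d' sed expression does to a hosts file.
# has_host_entry and the temp files are hypothetical names used for illustration.

# Return 0 if hosts file $1 lists $2 as a host name (any column after the address).
has_host_entry() {
    awk -v name="$2" '
        $1 ~ /^#/ { next }                                   # skip comment lines
        { for (f = 2; f <= NF; f++) if ($f == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

sample=$(mktemp)
stripped=$(mktemp)

# A few entries copied from the /etc/hosts listing in the question.
printf '%s\n' \
    '127.0.0.1 localhost' \
    '10.29.155.171 tester0' \
    '10.78.159.23 tester1' > "$sample"

has_host_entry "$sample" tester0 && before=present || before=missing

# Same delete expression as in the script, written to a copy instead of in place.
sed -e '/tester0/d' "$sample" > "$stripped"

has_host_entry "$stripped" tester0 && after=present || after=missing

echo "tester0 before: $before, after: $after"
rm -f "$sample" "$stripped"
```

On a real slave, a check along the lines of ssh tester1 getent hosts tester0 should print the controller's address once the entry is back in place. As far as I can tell, the freshly started slave node has to reach the master by name to report that it is up, so an empty result there would be consistent with the {error,timeout} seen above.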