2016-05-12

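Marathon tasks not migrating off a Mesos node that goes into draining mode

From the documentation, it appears that when a node is scheduled for maintenance, Mesos sends inverse offers to all frameworks. My interpretation is that a framework such as Marathon should accept these inverse offers and work to migrate its tasks off the node scheduled for maintenance.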

I schedule a maintenance window starting 60 seconds from now using the API:

curl -X POST leader.mesos:5050/maintenance/schedule \
    --data '{"windows": [{"machine_ids":[{"hostname": "host43.local"}], "unavailability": {"start": {"nanoseconds": '$(($(date +%s) + 60))'000000000}, "duration": {"nanoseconds": 3600000000000}}}]}'
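
The nanosecond arithmetic in that one-liner is easy to get wrong, so here is a minimal sketch that builds the same payload from values given in seconds. The function name `build_schedule` and its parameters are hypothetical helpers, not part of Mesos; only the JSON shape is taken from the request above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesos): print a maintenance schedule
# payload for one host. Arguments: hostname, start (Unix seconds),
# duration (seconds). Nanoseconds are formed by appending nine zeros.
build_schedule() {
    host=$1
    start_s=$2
    duration_s=$3
    printf '{"windows": [{"machine_ids": [{"hostname": "%s"}], "unavailability": {"start": {"nanoseconds": %s000000000}, "duration": {"nanoseconds": %s000000000}}}]}\n' \
        "$host" "$start_s" "$duration_s"
}

# Example: a one-hour window starting 60 seconds from now.
# build_schedule host43.local "$(( $(date +%s) + 60 ))" 3600 |
#     curl -X POST leader.mesos:5050/maintenance/schedule --data @-
```

Keeping the start time and duration as separate arguments makes it easier to check that the window is really the 3600 seconds used above before posting it.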

Then I query the maintenance status and can confirm that the machine is draining:

$ curl leader.mesos:5050/maintenance/status | jq .
{
    "draining_machines": [
        {
            "id": {
                "hostname": "host43.local"
            }
        }
    ]
}

Finally, once the window starts, I mark the machine as down:

curl -X POST leader.mesos:5050/machine/down --data '[{"hostname": "host43.local"}]' 

I confirm that it took effect:

$ curl leader.mesos:5050/maintenance/status | jq .
{
    "down_machines": [
        {
            "hostname": "host43.local"
        }
    ]
}
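
The two status responses above have slightly different shapes (draining entries wrap the hostname in an `id` object, down entries do not), so a small filter can save some squinting. This is only a sketch: `machine_state` is a hypothetical helper name, it assumes jq 1.5 or later, and it relies solely on the response shapes shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: read /maintenance/status JSON on stdin and print
# "draining", "down", or "up" for the given hostname.
# "up" here only means the host appears in neither list.
machine_state() {
    jq -r --arg h "$1" '
        if any(.draining_machines[]?; .id.hostname == $h) then "draining"
        elif any(.down_machines[]?; .hostname == $h) then "down"
        else "up" end'
}

# Example:
# curl -s leader.mesos:5050/maintenance/status | machine_state host43.local
```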

Then I check Marathon (via the UI) and see that there are still tasks running on host43.local.
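
The same check can be made without the UI by filtering Marathon's task list by host. A rough sketch, assuming Marathon's REST API is reachable at `marathon.mesos:8080` (an assumed address, adjust for your cluster) and that jq is installed; `tasks_on_host` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read Marathon's GET /v2/tasks JSON on stdin and print
# "<appId> <taskId>" for each task placed on the given host. Requires jq.
tasks_on_host() {
    jq -r --arg h "$1" '.tasks[] | select(.host == $h) | "\(.appId) \(.id)"'
}

# Example (marathon.mesos:8080 is an assumed address):
# curl -s -H 'Accept: application/json' marathon.mesos:8080/v2/tasks |
#     tasks_on_host host43.local
```

Empty output means Marathon reports no tasks on that host.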

I see this error message in the Marathon logs, but I don't know whether it is related:

May 12 11:46:02 host43.local start[126170]: [2016-05-12 11:46:02,581] ERROR not currently active (Actor[akka://marathon/user/taskTracker#-1732573467]) (akka.actor.OneForOneStrategy:marathon-akka.actor.default-dispatcher-17) 
May 12 11:46:02 host43.local start[126170]: java.lang.IllegalStateException: not currently active (Actor[akka://marathon/user/taskTracker#-1732573467]) 
May 12 11:46:02 host43.local start[126170]: at mesosphere.marathon.core.leadership.impl.WhenLeaderActor$$anonfun$1.applyOrElse(WhenLeaderActor.scala:38) ~[marathon-assembly-1.1.1.jar:1.1.1] 
May 12 11:46:02 host43.local start[126170]: at akka.actor.Actor$class.aroundReceive(Actor.scala:465) ~[marathon-assembly-1.1.1.jar:1.1.1] 
May 12 11:46:02 host43.local start[126170]: at mesosphere.marathon.core.leadership.impl.WhenLeaderActor.aroundReceive(WhenLeaderActor.scala:20) ~[marathon-assembly-1.1.1.jar:1.1.1] 
May 12 11:46:02 host43.local start[126170]: at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) ~[marathon-assembly-1.1.1.jar:1.1.1] 
May 12 11:46:02 host43.local start[126170]: at akka.actor.ActorCell.invoke(ActorCell.scala:487) ~[marathon-assembly-1.1.1.jar:1.1.1] 
May 12 11:46:02 host43.local start[126170]: at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) ~[marathon-assembly-1.1.1.jar:1.1.1] 
May 12 11:46:02 host43.local start[126170]: at akka.dispatch.Mailbox.run(Mailbox.scala:221) ~[marathon-assembly-1.1.1.jar:1.1.1] 
May 12 11:46:02 host43.local start[126170]: at akka.dispatch.Mailbox.exec(Mailbox.scala:231) ~[marathon-assembly-1.1.1.jar:1.1.1] 
May 12 11:46:02 host43.local start[126170]: at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) ~[marathon-assembly-1.1.1.jar:1.1.1] 
May 12 11:46:02 host43.local start[126170]: at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) ~[marathon-assembly-1.1.1.jar:1.1.1] 
May 12 11:46:02 host43.local start[126170]: at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [marathon-assembly-1.1.1.jar:1.1.1] 
May 12 11:46:02 host43.local start[126170]: at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [marathon-assembly-1.1.1.jar:1.1.1] 
May 12 11:46:02 host43.local start[126170]: [2016-05-12 11:46:02,581] INFO Killing 1 instances from 1 (mesosphere.marathon.upgrade.TaskKillActor:marathon-akka.actor.default-dispatcher-17) 

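If I manually kill the tasks from Marathon, the replacements do not appear to get assigned to the node undergoing maintenance. It seems like the expected behavior is that tasks are migrated off the node automatically. I don't know whether I'm doing something wrong, hitting a bug, or misunderstanding the documentation and the expected behavior.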

Running Marathon 1.1.1 and Mesos 0.28.

Answers