I have Spark (1.6, 2.0, and 2.1) deployed on YARN (Hadoop 2.6.0 / CDH 5.5). I'm trying to guarantee that a certain application is never starved of resources on our YARN cluster, no matter what else is running there. How do Spark scheduler pools work when running on YARN?
I have enabled the shuffle service and set up some Fair Scheduler Pools as described in the Spark documentation. I created a separate pool for the high-priority applications that I never want to be starved of resources, and gave it a minShare of resources:
<?xml version="1.0"?>
<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
  <pool name="high_priority">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>24</minShare>
  </pool>
</allocations>
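For context, this is roughly how I'm wiring the pool up at submit time — a sketch, where the allocation file path and application jar are placeholders:

```shell
# Enable Spark's FAIR scheduling mode and point it at the allocation file above,
# then request the high_priority pool on the command line.
# (/path/to/fairscheduler.xml and my-app.jar are placeholders.)
spark-submit \
  --master yarn \
  --conf spark.scheduler.mode=FAIR \
  --conf spark.scheduler.allocation.file=/path/to/fairscheduler.xml \
  --conf spark.scheduler.pool=high_priority \
  my-app.jar
```

The Spark docs also describe assigning a pool per thread inside the application with `sc.setLocalProperty("spark.scheduler.pool", "high_priority")`; whether the command-line conf alone is sufficient is part of what I'm unsure about.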
When I run Spark applications on our YARN cluster, I can see that the pools I configured are recognized:
17/04/04 11:38:20 INFO scheduler.FairSchedulableBuilder: Created pool default, schedulingMode: FAIR, minShare: 0, weight: 1
17/04/04 11:38:20 INFO scheduler.FairSchedulableBuilder: Created pool high_priority, schedulingMode: FAIR, minShare: 24, weight: 1
However, I do not see my application using the new high_priority pool, even though I am setting spark.scheduler.pool to high_priority in my invocation. So when the cluster is pegged by regular activity, my high-priority application does not get the resources it needs:
17/04/04 11:39:49 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
17/04/04 11:39:50 INFO scheduler.FairSchedulableBuilder: Added task set TaskSet_0 tasks to pool default
17/04/04 11:39:50 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
17/04/04 11:40:05 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
What am I missing here? My colleagues and I tried enabling preemption in YARN, but that didn't change anything. Then we realized that YARN has a concept very similar to Spark scheduler pools, called YARN queues. So now we are not sure whether the two concepts conflict with each other.
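For reference, YARN's own Fair Scheduler queues (the CDH default) are configured in a separate file on the YARN side — a hypothetical minimal fair-scheduler.xml, with placeholder resource values:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- A YARN queue with a guaranteed minimum; 24576 mb / 24 vcores are placeholders. -->
  <queue name="high_priority">
    <minResources>24576 mb, 24 vcores</minResources>
    <weight>1.0</weight>
  </queue>
</allocations>
```

This file is independent of the Spark-side fairscheduler.xml shown earlier in the question, which is part of what makes the relationship between the two confusing.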
How can we get our high-priority pool to work as expected? Is there some kind of conflict between Spark scheduler pools and YARN queues?