I am trying to tune the Spark job below, because I am running into a severe Java heap space error.
Looking at the Spark UI, there is a cogroup that behaves in a very strange way. Before that stage everything looks well balanced (at the moment the number of partitions is hard-coded to 48). Inside the method loadParentMPoint there is the cogroup transformation: when the next count is executed the cogroup gets computed, 48 tasks are scheduled, but 47 of them terminate immediately (they seem to have nothing to process), while a single one starts doing the shuffle read until it fills the heap space and the exception is thrown.
I have launched the process several times with the same data set and the outcome is always the same: every time only one executor does the work, whereas before this stage the load is balanced.
Why am I seeing this behaviour? Am I missing something? I tried to repartition the data before the cogroup, because I assumed it was unbalanced, but it did not help; the same happened when I tried partitionBy (the sketch just below shows roughly what I mean).
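The partitionBy attempt was roughly of this shape: both sides of the cogroup were put on the same HashPartitioner before the wide transformation. This is only a minimal sketch with dummy data and a hypothetical helper name, not the production code:

    import org.apache.spark.{HashPartitioner, SparkContext}

    // Minimal sketch of the partitionBy attempt: both sides of the cogroup are put
    // on the same HashPartitioner before the wide transformation. The dummy RDDs
    // here just stand in for the real pair RDDs that feed the cogroup.
    def coGroupCoPartitioned(sc: SparkContext): Long = {
      val left  = sc.parallelize(Seq(("podA", 1), ("podB", 2)))      // stands in for the left input
      val right = sc.parallelize(Seq(("podA", "x"), ("podC", "y")))  // stands in for the right input
      val partitioner = new HashPartitioner(48)
      val coGrouped = left.partitionBy(partitioner)
        .cogroup(right.partitionBy(partitioner))
      coGrouped.count()  // in my runs the shuffle read still landed on a single executor
    }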
This is an excerpt of the code:
class BillingOrderGeneratorProcess extends SparkApplicationErrorHandler {
implicit val ctx = sc
val log = LoggerFactory.getLogger(classOf[BillingOrderGeneratorProcess])
val ipc = new Handler[ConsumptionComputationBigDataIPC]
val billingOrderDao = new Handler[BillingOrderDao]
val mPointDao = new Handler[MeasurementPointDAO]
val billingOrderBDao = new Handler[BillingOrderBDAO]
val ccmDiscardBdao = new Handler[CCMDiscardBDAO]
val ccmService = new Handler[ConsumptionComputationBillingService]
val registry = new Handler[IncrementalRegistryTableData]
val podTimeZoneHelper = new Handler[PodDateTimeUtils]
val billingPodStatusDao = new Handler[BillingPodStatusBDAO]
val config = new Handler[PropertyManager]
val paramFacade = new Handler[ConsumptionParameterFacade]
val consumptionMethods = new Handler[ConsumptionMethods]
val partitions = config.get.defaultPartitions()
val appName = sc.appName
val appId = sc.applicationId
val now = new DateTime
val extracted = ctx.accumulator(0l, "Extracted from planning")
val generated = ctx.accumulator(0l, "Billing orders generated")
val discarded = ctx.accumulator(0l, "Billing orders discarded")
// initialize staging
val staging = new TxStagingTable(config.get().billingOrderGeneratorStagingArea())
staging.prepareReading
val rddExtractedFromPlanning = staging
.read[ExtractedPO]()
.repartition(48)
.setName("rddExtractedFromPlanning")
.cache
val rddExtracted = rddExtractedFromPlanning
.filter { x =>
extracted += 1
(x.getExtracted == EExtractedType.EXTRACTED ||
x.getExtracted == EExtractedType.EXTRACTED_BY_USER ||
x.getExtracted == EExtractedType.EXTRACTED_BY_TDC)
}
.map { x =>
log.info("1:extracted>{}", x)
val bo = MapperUtil.mapExtractedPOtoBO(x)
bo
}
val podWithExtractedAndLastBillingOrderPO = rddExtracted.map { e =>
val billOrdr = CCMIDGenerator.newIdentifier(CCMIDGenerator.Context.GENERATOR, e.getPod, e.getCycle(), e.getExtractionDate())
val last = billingOrderDao.get.getLastByPodExcludedActual(e.getPod, billOrdr)
log.info("2:last Billing order>{}", last);
(e.getPod, e, last)
}
.setName("podWithExtractedAndLastBillingOrderPO")
.cache()
val podWithExtractedAndLastBillingOrder = podWithExtractedAndLastBillingOrderPO.map(e => (e._1, (e._2, MapperUtil.mapBillingOrderPOtoBO(e._3))))
val rddRegistryFactoryKeys = podWithExtractedAndLastBillingOrderPO
.map(e => (e._1,1))
.reduceByKey(_+_)
.keys
val rddRegistryFactory = registry.get().createIncrementalRegistryFromPods(rddRegistryFactoryKeys, List())
val rddExtractedWithMPoint = ConsumptionComputationUtil
.groupPodWithMPoint(podWithExtractedAndLastBillingOrder, rddRegistryFactory)
.filter{ e =>
val mPoint = e._3
val condition = mPoint != null
condition match {
case false => log.error("MPoint is NULL for POD -> " + e._1)
case true =>
}
condition
}
.setName("rddExtractedWithMPoint")
.cache
rddExtractedWithMPoint.count
val rddExtractedWithMPointWithParent = ConsumptionComputationUtil
.groupWithParent(rddExtractedWithMPoint)
.map{
case (pod, extracted, measurementPoint, billOrder, parentMpointId, factory) =>
if (!parentMpointId.isEmpty) {
val mPointParent = mPointDao.get.findByMPoint(parentMpointId.get)
log.info("2.1:parentMpoin>Mpoint=" + parentMpointId + " parent for pod -> " + pod)
(pod, extracted, measurementPoint, billOrder, mPointParent.getPod, factory)
} else {
log.info("2.1:parentMpoin>Mpoint=null parent for pod -> " + pod)
(pod, extracted, measurementPoint, billOrder, null, factory)
}
}
.setName("rddExtractedWithMPointWithParent")
.cache()
rddExtractedWithMPointWithParent.count
val rddRegistryFactoryParentKeys = rddExtractedWithMPointWithParent
.filter(e => Option(e._5).isDefined)
.map(e => (e._5,1))
.reduceByKey(_+_)
.keys
rddRegistryFactoryParentKeys.count
val rddRegistryFactoryParent = registry.get().createIncrementalRegistryFromPods(rddRegistryFactoryParentKeys, List())
rddRegistryFactoryParent.count
val imprb = new Handler[IncrementalMeasurementPointRegistryBuilder]
val rddNew = rddExtractedWithMPointWithParent.map({
case (pod, extracted, measurementPoint, billingOrder, parentPod, factory) =>
(parentPod, (pod, extracted, measurementPoint, billingOrder, factory))
})
rddNew.count
val p = rddNew.cogroup(rddRegistryFactoryParent)
p.count
val rddExtractedWithMPointWithMpointParent = p.filter{ case (pod, (inputs, mpFactories)) => inputs.nonEmpty }
.flatMap{ case (pod, (inputs, mpFactories)) =>
val factory = mpFactories.headOption // at most one factory, possibly none
val results = inputs.map{e =>
val measurementPointTupla = factory.flatMap{f =>
Option(imprb.get.buildSparkDecorator(new MeasurementPointFactoryAdapter(f)).getMeasurementPointByDate(e._2.getRequestDate), f)
}
val tupla = measurementPointTupla.getOrElse(null)
val toBeBilled = if(tupla!=null && tupla._1!=null) false else true
val m = if(tupla!=null && tupla._1!=null) tupla._1 else null
val f = if(tupla!=null && tupla._2!=null) tupla._2 else null
(e._1, e._2, e._3, e._4, m, toBeBilled, e._5 , f)
}
results
}
.setName("rddExtractedWithMPointWithMpointParent")
.cache()
rddExtractedWithMPointWithMpointParent.foreach({ e =>
log.info("2.2:parentMpoint>MpointComplete=" + e._5 + " parent for pod -> " + e._1)
})
}
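For completeness, this is the kind of check that could be added just before p.count to see whether a single key or a single partition holds almost all the records (only a sketch; countByKey-style aggregation and collect assume the number of distinct keys and partitions is small enough for the driver):

    // Sketch: measure how the cogroup's left input is spread across partitions.
    val partitionSizes = rddNew
      .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
      .collect()
    partitionSizes.sortBy(-_._2).take(5).foreach { case (idx, n) =>
      log.info("partition " + idx + " -> " + n + " records")
    }

    // Sketch: find the heaviest keys; one dominant parentPod (note: it may be null
    // for PODs without a parent) would explain a single executor doing all the work.
    val heaviestKeys = rddNew
      .map { case (parentPod, _) => (parentPod, 1L) }
      .reduceByKey(_ + _)
      .top(5)(Ordering.by(_._2))
    heaviestKeys.foreach { case (k, n) =>
      log.info("key " + k + " -> " + n + " records")
    }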
These are the stages of the two RDDs involved in the cogroup operation (Spark UI screenshots), rddNew:
rddRegistryFactory:
and this is the cogroup stage:
This is the storage situation:
This is the Executors tab:
Note: I added the count actions only for debugging purposes.
UPDATE:
- I tried removing the cache and launching the process again; now each executor uses about 100 MB for storage, but the behaviour is the same: the shuffle read hits exactly one executor.
- I also tried a join between the same two RDDs before the cogroup, just to find out whether the problem is specific to cogroup or affects all wide transformations; with the join the behaviour was exactly the same (a sketch of that join follows below).
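The join variant was roughly this (a sketch using the RDD names from the excerpt above, not the exact code):

    // Sketch of the join variant: same two inputs as the cogroup, joined instead.
    // join is a wide transformation too, so it shuffles by the same key.
    val joined = rddNew.join(rddRegistryFactoryParent)
    joined.count  // same symptom: the whole shuffle read ends up on one executor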
It seems your caching is creating the memory pressure. Why cache here at all? Have you tried without caching? –
I added two images showing the storage and executors situation. Maybe there is a bit of heap pressure, but the behaviour is strange; could it really just be cache misuse? – Giorgio
There are several factors at play, not just one; please remove the cache and see. –
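Regarding the caching remarks: if dropping cache entirely is too slow because the RDDs are reused, a less heap-hungry option would be persist with a serialized, disk-spilling storage level instead of the MEMORY_ONLY level that cache uses. A sketch, applied to the first cached RDD from the excerpt:

    import org.apache.spark.storage.StorageLevel

    // Sketch: keep the data serialized and let it spill to disk instead of caching
    // deserialized objects on the heap (cache() is MEMORY_ONLY).
    val rddExtractedFromPlanning = staging
      .read[ExtractedPO]()
      .repartition(48)
      .setName("rddExtractedFromPlanning")
      .persist(StorageLevel.MEMORY_AND_DISK_SER)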