
Spark Structured Streaming with S3 fails fatally

I am running a Structured Streaming job on a Spark 2.2 cluster on AWS, using an S3 bucket in eu-central-1 for checkpointing. Some commit actions on the workers appear to fail at random with the following error:

17/10/04 13:20:34 WARN TaskSetManager: Lost task 62.0 in stage 19.0 (TID 1946, 0.0.0.0, executor 0): java.lang.IllegalStateException: Error committing version 1 into HDFSStateStore[id=(op=0,part=62),dir=s3a://bucket/job/query/state/0/62] 
at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:198) 
at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3$$anon$1.hasNext(statefulOperators.scala:230) 
at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$doExecute$1$$anonfun$4.apply(HashAggregateExec.scala:99) 
at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$doExecute$1$$anonfun$4.apply(HashAggregateExec.scala:97) 
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797) 
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797) 
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) 
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) 
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) 
at org.apache.spark.scheduler.Task.run(Task.scala:108) 
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745) 
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: XXXXXXXXXXX, AWS Error Code: SignatureDoesNotMatch, AWS Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method., S3 Extended Request ID: abcdef== 
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798) 
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421) 
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) 
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) 
at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1507) 
at com.amazonaws.services.s3.transfer.internal.CopyCallable.copyInOneChunk(CopyCallable.java:143) 
at com.amazonaws.services.s3.transfer.internal.CopyCallable.call(CopyCallable.java:131) 
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.copy(CopyMonitor.java:189) 
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:134) 
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:46) 
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
... 3 more 

The job was submitted with the following options to allow access to the eu-central-1 bucket:

--packages org.apache.hadoop:hadoop-aws:2.7.4 
--conf spark.hadoop.fs.s3a.endpoint=s3.eu-central-1.amazonaws.com 
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem 
--conf spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true 
--conf spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true 
--conf spark.hadoop.fs.s3a.access.key=xxxxx 
--conf spark.hadoop.fs.s3a.secret.key=xxxxx 

I have already tried generating access keys without special characters, and also using instance policies; both had the same effect.
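For reference, the same flags can be applied at runtime through the Hadoop configuration of the session. A minimal sketch, assuming an existing `SparkSession` named `spark` and placeholder keys (not part of the original question):

```scala
// Hypothetical sketch: the same S3A settings applied programmatically
// instead of via --conf flags. The keys are the standard Hadoop S3A names.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoopConf.set("fs.s3a.access.key", "xxxxx")
hadoopConf.set("fs.s3a.secret.key", "xxxxx")

// V4 signing must still be enabled on the JVM before any S3 client is created;
// on the executors this can only be done via the extraJavaOptions flags above.
System.setProperty("com.amazonaws.services.s3.enableV4", "true")
```

Note that setting the system property in the driver alone is not enough, which is why both `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` are passed.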

Comment: Do not use S3 for checkpointing. Since S3 only provides eventual consistency on read-after-write, there is no guarantee that when `HDFSBackedStateStore` lists files or tries to rename a file it will actually exist in the S3 bucket, even if it was just written.

Comment (asker): What else can I use? With HDFS, the changelog eventually grows so large that the job can no longer start.

Comment: Use HDFS. Which changelog are we talking about?

Answers

This comes up often enough that the Hadoop team provide a troubleshooting guide. But as Yuval says: committing work directly to S3 is too dangerous, and it gets slower the more data you create; the risk of listing inconsistencies means data can sometimes be lost, at least with the S3A connector in Apache Hadoop 2.6-2.8.
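A minimal sketch of what the comments recommend: keep the checkpoint on HDFS and let only the results land in S3. Paths and the `df` DataFrame are placeholders, not from the original question:

```scala
// Hypothetical sketch: checkpoint the streaming query to HDFS instead of S3,
// so HDFSBackedStateStore's list/rename operations hit a consistent filesystem.
val query = df.writeStream
  .format("parquet")
  .option("path", "s3a://bucket/job/output")                  // output may still go to S3
  .option("checkpointLocation", "hdfs:///checkpoints/query")  // state and offsets on HDFS
  .start()
```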

Comment (asker): Yes, I have read a lot about that, but the problem does not always occur, so I assumed it was a missing folder or file caused by eventual consistency.

Comment: No, it's not that: that would surface as a FileNotFoundException. This is authentication, and it is not easy to track down, especially because for security reasons the code doesn't dare log useful details such as which specific secret was used. If it only happens against Frankfurt, it may be a V4 API issue.

Your log says:

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: XXXXXXXXXXX, AWS Error Code: SignatureDoesNotMatch, AWS Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method., S3 Extended Request ID: abcdef==

This means the credentials being used are incorrect.

// Credentials as passed to the AWS SDK; replace with the real key pair
val credentials = new com.amazonaws.auth.BasicAWSCredentials(
  "ACCESS_KEY_ID",
  "SECRET_ACCESS_KEY"
)
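One way to test the credentials in isolation, outside of Spark, with the same SDK the connector uses. A sketch only: the bucket name, prefix, and endpoint are placeholders taken from the error message above:

```scala
// Hypothetical sketch: verify the key pair and V4 signing against the
// Frankfurt endpoint directly, to separate auth problems from Spark problems.
import com.amazonaws.ClientConfiguration
import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.AmazonS3Client

val credentials = new BasicAWSCredentials("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")

val clientConfig = new ClientConfiguration()
clientConfig.setSignerOverride("AWSS3V4SignerType") // force SigV4, as Frankfurt requires

val s3 = new AmazonS3Client(credentials, clientConfig)
s3.setEndpoint("s3.eu-central-1.amazonaws.com")

// Both calls fail with a 403 if the signature or key pair is wrong.
println(s3.doesBucketExist("bucket"))
println(s3.listObjects("bucket", "job/query/state/").getObjectSummaries.size)
```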

For debugging purposes, check:

1) that the access key and secret key are both valid,

2) that the bucket name is correct,

3) turn on logging in the CLI and compare it with the SDK,

4) enable SDK logging as described here:

http://docs.aws.amazon.com/AWSSdkDocsJava/latest/DeveloperGuide/java-dg-logging.html

You will need to provide the log4j jar and a sample log4j.properties file.
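A minimal log4j.properties along those lines, enabling request and wire-level logging for the AWS client (logger names per the linked guide):

```properties
log4j.rootLogger=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d [%t] %-5p %c - %m%n

# Show each AWS request/response, including the inputs to signature calculation
log4j.logger.com.amazonaws.request=DEBUG
log4j.logger.org.apache.http.wire=DEBUG
```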

http://docs.aws.amazon.com/ses/latest/DeveloperGuide/get-aws-keys.html