2016-03-01 98 views
3

剛試圖將基本RDD數據集導入到DynamoDB時出現問題。這是代碼:在Spark中將簡單RDD寫入DynamoDB

import org.apache.hadoop.mapred.JobConf 

var rdd = sc.parallelize(Array(("", Map("col1" -> Map("s" -> "abc"), "col2" -> Map("n" -> "123"))))) 

var jobConf = new JobConf(sc.hadoopConfiguration) 
jobConf.set("dynamodb.output.tableName", "table_x") 
jobConf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat") 

rdd.saveAsHadoopDataset(jobConf) 

,這是錯誤我得到:

16/02/28 15:40:38 WARN TaskSetManager: Lost task 7.0 in stage 1.0 (TID 18, ip-172-31-9-224.eu-west-1.compute.internal): java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.hadoop.io.Text 
at org.apache.hadoop.dynamodb.write.DefaultDynamoDBRecordWriter.convertValueToDynamoDBItem(DefaultDynamoDBRecordWriter.java:10) 
at org.apache.hadoop.dynamodb.write.AbstractDynamoDBRecordWriter.write(AbstractDynamoDBRecordWriter.java:90) 
at org.apache.spark.SparkHadoopWriter.write(SparkHadoopWriter.scala:96) 
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1199) 
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197) 
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197) 
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1250) 
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1205) 
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185) 
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
at org.apache.spark.scheduler.Task.run(Task.scala:89) 
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745) 

我能做些什麼來解決這個問題?

回答