
Error converting from CSV to JavaRDD

I don't know why I keep getting a NoSuchMethodError when trying to store CSV data in a JavaRDD. I defined the following class, whose instances are meant to represent the records in the CSV file.

public class Historical_Data_Record implements Serializable { 
    String tripduration; 
    String starttime; 
    String stoptime; 
    String start_station_id; 
    String start_station_name; 
    long start_station_latitude; 
    long start_station_longitude; 
    String stop_station_id; 
    String stop_station_name; 
    long stop_station_latitude; 
    long stop_station_longitude; 
    String bikeid; 
    String usertype; 
    String birth_year; 
    int gender; 
    // if 1, male, if 0, female 
} 

I then have the following code, which creates Historical_Data_Record objects by reading the data from the CSV file and storing them in a JavaRDD.

public static final JavaRDD<Historical_Data_Record> get_Historical_Data(JavaSparkContext sc, String filename) {
    // read the csv file and map each line to a record object
    final JavaRDD<Historical_Data_Record> rdd_records = sc.textFile(filename).map(
        new Function<String, Historical_Data_Record>() {
            private static final long serialVersionUID = 1L;

            public Historical_Data_Record call(String line) throws Exception {
                String[] fields = line.split(",");

                Historical_Data_Record sd = new Historical_Data_Record();
                sd.tripduration = fields[0];
                sd.starttime = fields[1];
                sd.stoptime = fields[2];
                sd.start_station_id = fields[3];
                sd.start_station_name = fields[4];
                sd.start_station_latitude = Long.valueOf(fields[5]).longValue();
                sd.start_station_longitude = Long.valueOf(fields[6]).longValue();
                sd.stop_station_id = fields[7];
                sd.stop_station_name = fields[8];
                sd.stop_station_latitude = Long.valueOf(fields[9]).longValue();
                sd.stop_station_longitude = Long.valueOf(fields[10]).longValue();
                sd.bikeid = fields[11];
                sd.usertype = fields[12];
                sd.birth_year = fields[13];
                sd.gender = Integer.parseInt(fields[14]);
                return sd;
            }
        });

    return rdd_records;
}

But when I run the following code,

JavaRDD<Historical_Data_Record> aData = Spark.get_Historical_Data(sc, filename); 

where sc is the JavaSparkContext and filename is simply a String containing the path to the file, the call fails.
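For context, a minimal sketch of how the driver might be set up before making that call is shown below; the app name, master URL, and file path are illustrative assumptions, not taken from the original project.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class Main {
    public static void main(String[] args) {
        // Hypothetical driver setup (Spark 1.x Java API); the values below are placeholders.
        SparkConf conf = new SparkConf()
                .setAppName("citibike_history")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        String filename = "data/citibike_trips.csv";

        JavaRDD<Historical_Data_Record> aData = Spark.get_Historical_Data(sc, filename);
        System.out.println(aData.count());
        sc.stop();
    }
}

The full error is as follows: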

2014-11-03 11:04:42.959 java[5856:1b03] Unable to load realm info from SCDynamicStore 
14/11/03 11:04:43 WARN storage.BlockManager: Putting block broadcast_0 failed 
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode; 
    at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261) 
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165) 
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102) 
    at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214) 
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) 
    at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210) 
    at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169) 
    at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161) 
    at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155) 
    at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75) 
    at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92) 
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661) 
    at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546) 
    at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812) 
    at org.apache.spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:52) 
    at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35) 
    at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29) 
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) 
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776) 
    at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:545) 
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:457) 
    at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:164) 
    at com.big_data.citibike_project.Spark.get_Historical_Data(Spark.java:19) 
    at com.big_data.citibike_project.Main.main(Main.java:18) 

At first I thought it might be because the file has a header row, so I removed it, but I got the same error again. Can someone help me?
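As an aside, the header row can also be skipped without editing the file by filtering it out after reading. A minimal sketch, assuming the header is the first line of the file and occurs nowhere else (using the same org.apache.spark.api.java.function.Function as above):

JavaRDD<String> lines = sc.textFile(filename);
// Take the header line, then keep only the lines that are not the header.
final String header = lines.first();
JavaRDD<String> data = lines.filter(new Function<String, Boolean>() {
    public Boolean call(String line) {
        return !line.equals(header);
    }
});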


Actually, I just ran textFile("csv_file") on its own and it gives me the same error. Does anyone know what is going on? – 2014-11-03 17:17:30
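For what it's worth, that points away from the mapping code: the failure happens inside sc.textFile() itself. The missing method in the trace, com.google.common.hash.HashFunction.hashInt(int), was added in Guava 12, so this NoSuchMethodError typically means an older Guava jar (for example one bundled with a Hadoop distribution) is shadowing the version Spark was built against. A small diagnostic sketch, not from the original post, to check which Guava the JVM actually loads:

import com.google.common.hash.HashFunction;

public class GuavaCheck {
    public static void main(String[] args) throws Exception {
        // Print which jar HashFunction is loaded from (e.g. a Hadoop-bundled guava-11.x).
        System.out.println(HashFunction.class.getProtectionDomain()
                .getCodeSource().getLocation());
        // Throws NoSuchMethodException on Guava versions older than 12, matching
        // the NoSuchMethodError for hashInt(I) in the stack trace above.
        System.out.println(HashFunction.class.getMethod("hashInt", int.class));
    }
}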

Answer